If I’ve learned one thing in my 18 years of working in telecoms, it’s that people don’t like outages. And they especially don’t like outages that impact 911 service.
Unfortunately, that’s exactly what happened to subscribers in four West Virginia counties over a 16-day period in April 2022.
Apparently the local ILEC, Shenandoah Telecommunications Company (or Shentel), was performing two separate network updates simultaneously:
- Shentel was “transitioning customers to a new 911 routing service” – which I assume means they were turning up some new 911 trunks, probably for a SIP next-gen 911 service.
- Shentel was also replacing its Session Border Controllers – it sounds like this was a migration rather than a flash-cut, which would usually be safer… but apparently not in this case.
Unfortunately, if a subscriber was still connected to the old SBCs but their 911 service had been moved to the new trunks, they experienced the following scenario:
- The call would complete just fine from a signaling perspective.
- However, the audio path was only working in one direction: the caller could hear the PSAP operator, but the PSAP operator couldn’t hear the caller.
- Since the caller couldn’t communicate their emergency, these calls weren’t very helpful to anyone.
- Thankfully, since signaling was working correctly, the emergency operator was able to identify the caller and call them back – and this return call worked fine.
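To make the failure condition concrete, here is a rough sketch of how you might flag subscribers caught in that in-between state during two overlapping migrations. Everything in it is hypothetical: the `Subscriber` record, the `sbc` and `trunk_911` fields, and the phone numbers are made up for illustration and have nothing to do with Shentel's actual systems. The FCC report only describes the old-SBC/new-trunk combination as failing, so that is the only combination this sketch flags.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Subscriber:
    """Hypothetical migration record for one subscriber."""
    number: str
    sbc: str        # "old" or "new": which SBCs the subscriber is connected to
    trunk_911: str  # "old" or "new": which 911 trunks their emergency calls use


def at_risk(subscribers):
    """Return subscribers still on the old SBCs whose 911 service moved to the new trunks."""
    return [s for s in subscribers if s.sbc == "old" and s.trunk_911 == "new"]


# Made-up inventory covering all four combinations of the two migrations.
subscribers = [
    Subscriber("304-555-0101", sbc="old", trunk_911="old"),
    Subscriber("304-555-0102", sbc="old", trunk_911="new"),
    Subscriber("304-555-0103", sbc="new", trunk_911="new"),
    Subscriber("304-555-0104", sbc="new", trunk_911="old"),
]

for s in at_risk(subscribers):
    print(f"{s.number}: old SBC + new 911 trunks, test before proceeding")
```

Two migrations running at once give four possible states per subscriber, and each one is a combination somebody has to test. Running them one after the other cuts that down to two at any given time.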
I don’t want to be too judgmental here – “let him who is without sin throw the first stone” – but I do want to highlight a couple of lessons we can learn from this situation.
- Only change one thing at a time: If you’re going to make a change to your production network, keep it simple. It’s very tempting to batch up a bunch of changes into a single maintenance window so you don’t have to stay up multiple nights, but it’s much safer to take things slow, and make one change at a time.
- Always have a backout plan: I don’t know any details of this situation beyond the FCC report, but it sounds like both the 911 and SBC changes were migrations rather than flash cuts (since only some subscribers were impacted). In theory this sounds good – because if you have both the new and old systems in service simultaneously, you can easily move things back if there’s a problem. I don’t know why the outage lasted so long for Shentel, but any time you’re executing a maintenance activity, make sure you have planned ahead of time what to do if it doesn’t work.
- Test 911 really well: Anytime you’re doing anything significant on your network, even if it’s not obviously related to 911, it’s always worth making a couple of test calls (with the PSAP’s permission) just to make sure everything’s working. 911 is really important, and the FCC takes outages very seriously.
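The Shentel failure is a good reminder that "the call connected" is not the same as "the call worked". As a minimal sketch, a post-change test call checklist could record what the people on each end actually observed and refuse to pass unless audio worked in both directions. The `CallCheck` fields and the `check_test_call` helper are my own invention for illustration, not part of any real tool.

```python
from dataclasses import dataclass


@dataclass
class CallCheck:
    """Hypothetical record of one 911 test call, filled in by the people on the call."""
    answered: bool           # signaling completed and the PSAP answered
    caller_heard_psap: bool  # audio worked from the PSAP to the caller
    psap_heard_caller: bool  # audio worked from the caller to the PSAP


def check_test_call(result):
    """Return a list of problems; an empty list means the test call passed."""
    problems = []
    if not result.answered:
        problems.append("call did not complete")
    if not result.caller_heard_psap:
        problems.append("caller could not hear PSAP")
    if not result.psap_heard_caller:
        problems.append("PSAP could not hear caller")
    return problems


# The scenario from the FCC report: signaling fine, audio in one direction only.
one_way = CallCheck(answered=True, caller_heard_psap=True, psap_heard_caller=False)
problems = check_test_call(one_way)
if problems:
    print("Back out the change:", "; ".join(problems))
```

A check that only looked at `answered` would have passed this call, which is exactly why it's worth having someone at each end confirm the audio.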
This 911 outage proved to be a very expensive mistake for Shentel. After negotiating with the FCC, the company has agreed to pay a civil penalty of $227,200 and implement a detailed compliance plan to make sure it follows best practices in future.
In other 911 news, the FCC is beefing up the reporting requirements for service providers to notify PSAPs of any potential or actual outages in service. Check out this article for more information, and to see how it might impact you.