Bandwidth.com operates (in their words) “the world’s largest directly-connected network”. They have PSTN connectivity in over 60 countries, and provide originating and terminating service to carriers large and small across the world. My impression is that Bandwidth is a well-run company, with a strong technical foundation and good people.
Despite this, from September 25-29, 2021, Bandwidth customers experienced severe service outages as the result of a sustained, large-scale DDOS attack.
What does this mean for the industry, and what could a Bandwidth customer have done about it?
Is anyone safe from DDOS attacks?
We have discussed DDOS attacks before in an episode of our podcast, and there are certainly steps that all carriers should be taking to prepare for (and protect themselves from) DDOS attacks. The general approach for most DDOS mitigation efforts is for inbound traffic to be handled by a larger capacity network which scrubs the traffic – removing the bad traffic and only allowing “good” traffic through to your network.
The scrubber basically acts as a bouncer / security guard, standing at the door, and only allowing legitimate guests to go through the door and enter your party. This keeps your party from becoming too crowded with undesirable characters, but for VoIP service (in particular) it’s also very important that legitimate guests don’t have to wait in line. So you need enough security guards to quickly handle the throngs of people trying to crash the party, and quickly allow access for the VIPs.
(Check out this excellent video for a more in-depth explanation.)
If you have a small party and a large number of security guards this all works splendidly. The problem in recent weeks is that we’ve seen attackers with huge networks of bots attack large service providers. At the beginning of September we saw attackers take down multiple UK VoIP providers before this most recent significant outage at Bandwidth.
These were large companies with significant resources and skilled technical staff, and yet they still couldn’t prevent the outage. I have to imagine that Bandwidth already had DDOS mitigation measures in place, but the scale of this attack must have been large enough to overwhelm them. To continue our analogy, these are very large parties with a lot of legitimate guests (VoIP traffic), so mitigation would require a huge number of bouncers (scrubbers) to filter out the bad traffic. I’d speculate that the attack was on a much larger scale than they were prepared to handle.
What if Bandwidth provides my DIDs?
Many smaller carriers use Bandwidth either for outbound LD routing, or as a provider of DIDs (phone numbers) for inbound calling. If you’re not familiar with this latter service, allow me to briefly explain.
An ILEC may own a number block, and may be able to port numbers to its own LRN within its local LATA, but if it wants to offer CLEC service in a wider geographic area, it can use Bandwidth (or Inteliquent, which also has significant US market share) to gain access to numbers in those areas. Bandwidth owns numbers across the US, and can either provide a DID it owns, or an existing DID can be ported to Bandwidth. Standard PSTN routing would then deliver the call to Bandwidth (as the carrier that owns the number), and Bandwidth would then deliver the call to the CLEC over a SIP trunk. The CLEC then routes the call to the subscriber’s phone.
If Bandwidth has a major outage, it’s fairly simple for a service provider to reroute outbound calls via a different carrier. However, since Bandwidth owns the DID, all inbound calls to that number will be routed to Bandwidth, and (in an outage scenario) will fail. With regular numbers there’s no option for a “backup carrier” – the number can only be owned by one carrier at a time, and if you want to change which carrier owns a number, the porting process typically takes a few days (at best).
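To make the asymmetry concrete, here is a minimal sketch of the difference between outbound and inbound routing during an outage. The carrier names and the health-status dictionary are hypothetical placeholders, not anything from a real carrier's API.

```python
from typing import Dict, Optional

# Hypothetical carrier names, listed in order of preference for outbound calls.
OUTBOUND_CARRIERS = ["carrier_a", "carrier_b"]


def pick_outbound_carrier(healthy: Dict[str, bool]) -> Optional[str]:
    """Return the first preferred outbound carrier that is currently healthy."""
    for carrier in OUTBOUND_CARRIERS:
        if healthy.get(carrier, False):
            return carrier
    return None


def inbound_reachable(did_owner: str, healthy: Dict[str, bool]) -> bool:
    """A regular DID has exactly one owning carrier, so there is no fallback."""
    return healthy.get(did_owner, False)


# Example: carrier_a is down, carrier_b is up.
status = {"carrier_a": False, "carrier_b": True}
outbound = pick_outbound_carrier(status)          # falls back to carrier_b
inbound_ok = inbound_reachable("carrier_a", status)  # DIDs owned by carrier_a fail
```

Outbound calls have a list to fall back through, while an inbound call to a regular DID depends entirely on the one carrier that owns the number.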
Toll-free to the rescue?
While that’s all true for regular 10-digit phone numbers, the situation for toll-free numbers is a little different.
As you may know, when placing an outbound call to a toll-free number, a carrier usually does a “dip” – a database query – to find out where to send the toll-free call. The database will either provide a ring-to number (a real 10-digit phone number) or else a carrier to whom the call should be routed.
Whereas porting a number to a new carrier is a slow (and often frustrating) process, the toll-free database can be updated in real time. Toll-free numbers are owned and managed by a RespOrg. These entities are often carriers themselves (you can apply to become a RespOrg), and a RespOrg has the ability to dynamically update the toll-free database.
This means that in an outage event, you could update your toll-free number to route via a different carrier, or you could update the ring-to number to be a different 10-digit number that is owned by a different carrier.
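As a rough illustration of the dip and the update, the sketch below models the toll-free database as a simple in-memory lookup. The `TollFreeDatabase` class, its method names, the carrier name, and the phone numbers are all hypothetical; a real RespOrg would make these changes through the toll-free registry's own tools, which are not shown here.

```python
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass
class TollFreeRoute:
    # One of these is expected to be set, mirroring the two possible dip results.
    ring_to: Optional[str] = None   # a regular 10-digit number
    carrier: Optional[str] = None   # a carrier to hand the call to


class TollFreeDatabase:
    """In-memory stand-in for the toll-free routing database (hypothetical)."""

    def __init__(self) -> None:
        self._routes: Dict[str, TollFreeRoute] = {}

    def update(self, toll_free: str, route: TollFreeRoute) -> None:
        # Models a RespOrg changing the routing; it applies to the next dip.
        self._routes[toll_free] = route

    def dip(self, toll_free: str) -> Optional[TollFreeRoute]:
        # Models the query an originating carrier makes before routing the call.
        return self._routes.get(toll_free)


db = TollFreeDatabase()
db.update("8005550100", TollFreeRoute(carrier="carrier_a"))
# carrier_a has an outage: repoint the number at a DID owned by another carrier.
db.update("8005550100", TollFreeRoute(ring_to="2125550123"))
```

The point of the model is that the change is a single record update, rather than a multi-day port.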
So with the benefit of hindsight, if you’re providing Hosted PBX / UCaaS services to businesses, I’d recommend a structure along these lines for each business:
- A toll-free number, where you have access to the database and can directly update the ring-to number.
- DIDs on (at least) two different carriers (e.g. Bandwidth and Inteliquent) which both route to an IVR system / front desk.
- If you use an IVR, a dial-by-extension / dial-by-name option that allows access to individual phone lines.
- DIDs for each employee.
If one of your inbound carriers experiences a major outage you can update the toll-free number to route via the other carrier, and through access to the front desk / IVR, callers will still be able to reach everyone at the business – even if all the DIDs to individual phones are unavailable.
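The failover decision for a single business could be sketched roughly as follows. The data layout, function names, carrier names, and phone numbers are illustrative assumptions only, and the actual ring-to update would still be made through your RespOrg access.

```python
from typing import Dict, List, Optional

# Hypothetical layout for one business; all numbers and carriers are placeholders.
business = {
    "toll_free": "8005550100",
    # Front desk / IVR DIDs, one per carrier.
    "front_desk_dids": {"carrier_a": "2125550101", "carrier_b": "3125550102"},
    # Employee DID -> owning carrier.
    "employee_dids": {"2125550111": "carrier_a", "2125550112": "carrier_a"},
}


def choose_ring_to(front_desk_dids: Dict[str, str], down: List[str]) -> Optional[str]:
    """Pick a front desk / IVR DID whose carrier is not in the outage list."""
    for carrier, did in front_desk_dids.items():
        if carrier not in down:
            return did
    return None


def unreachable_employee_dids(employee_dids: Dict[str, str], down: List[str]) -> List[str]:
    """Direct lines on an affected carrier fail; callers must use the IVR instead."""
    return sorted(did for did, carrier in employee_dids.items() if carrier in down)


# Example: carrier_a is down, so the toll-free number should ring the carrier_b DID.
new_ring_to = choose_ring_to(business["front_desk_dids"], ["carrier_a"])
affected = unreachable_employee_dids(business["employee_dids"], ["carrier_a"])
```

In this example the employee DIDs on the affected carrier stay unreachable, but the toll-free number still reaches the IVR, and dial-by-extension covers the rest.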
Should I port away from Bandwidth?
It’s very tempting to look at what happened and conclude that Bandwidth is unreliable, and that you should port your numbers elsewhere. I personally think that would be a bad choice.
Bandwidth suffered a DDOS attack on a scale that they were not ready for, and they have now enhanced their systems and processes to a point where they could provide service despite the attack.
Would you rather get service from a carrier that experienced the attack and fixed the problem, or from a carrier who has not yet been attacked?
These attackers have been targeting VoIP carriers repeatedly over the past month or so. I see no reason to suppose that they’re done. Hopefully all major VoIP providers are busy preparing themselves for the next DDOS attack, but if you’re relying on a single carrier for inbound calls, this might be a good time to review your contingency plans.