Award Consulting

Metaswitch consultants

When CenturyLink has an outage, do you follow?

September 2, 2020 By Andrew

On the morning of Sunday, August 30, CenturyLink’s IP network was reported to be down for about 4.5 hours – probably centered on the part of the network that came with their acquisition of Level 3.

from DownDetector.com

From what I’ve read (simplifying greatly), a CenturyLink customer had requested that they block a certain IP address from hitting their network – which seems like a very routine event for anyone used to dealing with DoS or hacking attempts. Except that someone entered the configuration update incorrectly. Apparently…

“this request was accidentally implemented with wildcards, rather than isolated to a specific IP address”

ThousandEyes CenturyLink Outage analysis

I don’t know a lot about BGP routing, but I’m pretty sure that if you were supposed to block a specific IP and instead you use a wildcard, you end up blocking ALL IP addresses. Which would be bad.
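To make the difference concrete, here is a minimal sketch using Python’s standard `ipaddress` module. It is only an illustration of prefix matching, not a model of how CenturyLink’s routers actually process block rules, and the addresses are documentation-range examples I made up.

```python
import ipaddress


def is_blocked(rule: str, source: str) -> bool:
    """Return True if the source address falls inside the blocked prefix."""
    return ipaddress.ip_address(source) in ipaddress.ip_network(rule)


# Hypothetical rules: what was presumably intended vs. a wildcard.
INTENDED_RULE = "203.0.113.7/32"  # one specific host
WILDCARD_RULE = "0.0.0.0/0"       # every IPv4 address

if __name__ == "__main__":
    for source in ["203.0.113.7", "198.51.100.20", "192.0.2.1"]:
        print(
            source,
            "intended rule blocks:", is_blocked(INTENDED_RULE, source),
            "| wildcard rule blocks:", is_blocked(WILDCARD_RULE, source),
        )
```

The intended rule matches only the one offending address. The wildcard rule matches every address you feed it, including the ones you would use to log in and fix the mistake.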

This request was distributed across all the routers in the network, which (presumably) could then no longer be reached because they were blocking all traffic. Fantastic.

It’s always fun to read outage reports – when it’s someone else who had the outage – and see what we can learn from them. In this case there are two big lessons.

Lesson 1: Even a customer-specific change to the core network can cause big problems

What could be more mundane than blocking a certain IP from reaching a particular customer? I’m sure many of you update ACLs on your firewalls all the time.

It’s easy to be complacent about changes that only impact a specific customer – but anytime you’re modifying the core network there’s a risk that something big can go wrong.

Obviously using a wildcard instead of the specific IP address is really bad, and I’ve also seen several issues where an IP address conflict led to a big outage.

It’s not a big deal to add a new device to the network. But if you accidentally reuse the IP of a critical piece of equipment…

Action: Always write a MOP (method of procedure) for any core network maintenance, and always have someone review it.
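A human review is the main safeguard, but a MOP can also include a simple automated sanity check before anything is pushed. The sketch below is one way that might look, assuming IPv4 only. The `MIN_BLOCK_PREFIX_V4` threshold, the function names, and the inventory format are all hypothetical choices of mine, not anything from a vendor tool.

```python
import ipaddress

# Hypothetical policy: a block rule must be at least this specific (IPv4).
MIN_BLOCK_PREFIX_V4 = 24


def check_block_rule(prefix: str) -> list[str]:
    """Return a list of problems found in a proposed block rule."""
    try:
        net = ipaddress.ip_network(prefix, strict=True)
    except ValueError as exc:
        # Catches wildcards such as "*" and other malformed input.
        return [f"{prefix!r} is not a valid prefix: {exc}"]
    if net.version == 4 and net.prefixlen < MIN_BLOCK_PREFIX_V4:
        return [f"{net} covers {net.num_addresses} addresses; too broad to block"]
    return []


def check_new_device_ip(address: str, inventory: dict[str, str]) -> list[str]:
    """Return a problem if the address is already assigned.

    `inventory` maps an IP address string to the device that owns it.
    """
    addr = str(ipaddress.ip_address(address))
    owner = inventory.get(addr)
    if owner is not None:
        return [f"{addr} is already assigned to {owner}"]
    return []


if __name__ == "__main__":
    print(check_block_rule("203.0.113.7/32"))
    print(check_block_rule("0.0.0.0/0"))
    print(check_new_device_ip("192.0.2.1", {"192.0.2.1": "core-router-1"}))
```

A check like this will not catch every mistake, but it would flag both of the failure modes described above before the change reaches a production router.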

Lesson 2: You always need a backup route

If you had SIP trunks to Level 3 (now CenturyLink) this weekend, I’m guessing they didn’t work too well. If you had SIP trunks to anyone else and your only connection to the internet backbone was via CenturyLink, then I’m guessing they also didn’t work too well.

For critical traffic, you can’t rely on a single provider – even if they’re providing a supposedly redundant network.

This applies both to your IP-based carrier internet connections AND to your voice trunks (whether SIP or TDM-based). You can’t assume that outages won’t happen. You need to plan for them.
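As a rough illustration of what planning for an outage means, the sketch below models a set of trunks, picks the best healthy one, and flags the case where every trunk depends on the same upstream provider. The `Trunk` fields, the trunk names, and the second provider are hypothetical; a real audit would pull this from your switch and carrier records.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Trunk:
    name: str
    provider: str        # upstream carrier this trunk depends on
    priority: int        # lower number = preferred route
    healthy: bool = True


def pick_trunk(trunks: list[Trunk]) -> Optional[Trunk]:
    """Return the preferred healthy trunk, or None if nothing is up."""
    candidates = sorted((t for t in trunks if t.healthy), key=lambda t: t.priority)
    return candidates[0] if candidates else None


def single_provider_risk(trunks: list[Trunk]) -> bool:
    """Return True if all trunks ride on fewer than two distinct providers."""
    return len({t.provider for t in trunks}) < 2


if __name__ == "__main__":
    trunks = [
        Trunk("sip-primary", "CenturyLink", priority=1),
        Trunk("sip-backup", "OtherCarrier", priority=2),
    ]
    trunks[0].healthy = False  # simulate the primary provider going down
    print(pick_trunk(trunks))
    print(single_provider_risk(trunks))
```

Two trunks from the same provider would pass a basic redundancy check but fail `single_provider_risk`, which is exactly the situation that hurt people during this outage.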

If you have no plan, and would like us to perform an audit of your voice network to identify common issues, drop us a line and we can talk.

About Andrew

Award Consulting is focused on helping ILECs and CLECs who use Metaswitch products to thrive as they improve their networks through migrations, strategic projects and improved service offerings.

Our goal is to create highly specific, highly valuable content targeted specifically at US regional service providers, and especially those who are running Metaswitch equipment. Join our email list to be notified of new content.


Copyright © Award Consulting Services 2023