If you’ve worked in telecoms for any length of time you’ll likely have some kind of psychological scars from dealing with an outage.
Everyone tries to build a redundant network that is resilient to any problem, but even with the best of intentions outages happen.
I’ve heard stories of:
- human error – that time someone thought they were shutting down the lab system but actually shut down the production switch…
- software bugs – the software patch that was intended to fix a minor problem with a feature but instead caused a system to continually restart…
- network issues – that time someone added a new PC to the core network with an IP address that conflicted with the voice switch…
- cascading failures – this was before my time in the industry, but in 1990 the entire AT&T long-distance network (114 switches) went down. For 9 hours.
And many more. By the way, if you think I’m talking about you in the above examples then (a) it’s possible, but don’t worry because (b) these things have happened to multiple people so it might not be you, and (c) I’ve heard so many stories over the years that I’ve forgotten the details anyway.
One good thing about the telecoms industry is that while outages do happen, significant outages are rare, and when they do happen people act very quickly to resolve them.
A 40+ hour outage
This week I’ve been on the receiving end of an extended outage. But thankfully not in my work life.
Some of you may know that I’m a keen runner. “Keen” being a euphemism for “slow and overweight” – but nevertheless I run 3-4 times each week and I particularly love getting out into the hills and running on hiking trails.
For Father’s Day my wife and kids treated me to a cool new GPS watch from Garmin that lets you see a map as you run and upload your planned route ahead of time so you don’t get lost. This is a good investment, as I have an unfortunate habit of getting lost somewhere in the mountains as sunset approaches.
So Wednesday night I was really frustrated when I couldn’t access the Garmin website. I wanted to upload my route for my Thursday morning run and it was “down for maintenance”.
Worse still, after my run on Thursday morning (where I, predictably, got a little lost) the Garmin app on my phone still couldn’t connect and upload my activity. [Because if I can’t add it to my heat-map, then did it really happen?]
As a telecoms engineer I’m thinking, “What kind of maintenance causes a 12-hour outage?!”
Well… it turns out that Garmin have been the victim of a ransomware attack that has impacted:
- their website
- their app that lets users upload workouts
- their call centers
- their factories!
As I write this (on Friday July 24, 2020), this issue has astonishingly still not been resolved.
Security is super important!
As I’ve been watching this situation at Garmin I can’t help but imagine what would happen if something similar happened to a telco.
- I’ve written in the past about security as it relates to toll-fraud, and you should definitely address that, but…
- If a hacker got access to your corporate network, what damage could they do?
- If hackers were able to take down not only Garmin’s website but also their call centers and their factories, do we really think we would not be vulnerable to a similarly advanced attack?
- Once a hacker got into your corporate network wouldn’t they be able to access your provisioning tools, your billing system, your servers…?
You don’t tend to hear about voice switches or SBCs getting compromised (although if you have more stories let me know), but your core network is only as secure as your corporate network.
This would be a great moment for me to reveal some kind of IT security service that we offer to clients, but the truth is we don’t have expertise in this area – so there’s no sales pitch here.
Instead consider this a public service announcement to take IT security seriously. Virus scanners, firewalls, strong passwords… all that good stuff.
And if anyone knows of a good IT security consultant that I should recommend, let me know. 🙂