As a kid growing up in England, I used to watch the Formula 1 Grand Prix races on Sunday afternoons. My Dad and I were TV buddies, sitting enthralled as the drivers raced at over 200 miles per hour on twisting, winding tracks in exotic cities across the world.
When I moved to the US, I stopped paying attention to Formula 1, which wasn’t covered much here. Then, a few years ago, Netflix started releasing Drive to Survive, a documentary series covering each season, and suddenly a bunch of my coworkers and friends from church were experts in the sport. So I started paying attention again, and I will openly admit… I’m hooked.
All races are exciting, but the most exciting part, both when I was a kid and now, is when they crash. This tends to happen either right at the start of the race or whenever it rains. I’ll bring this back to telecoms shortly, but first, take a moment to watch this video of an epic crash at the start of the British Grand Prix a few weeks ago.
Safety measures come in layers
I’m going to ask you to watch it again shortly, but before you do, consider all the safety measures that are in place:
- As the commentators mentioned, these cars have incredibly low centers of gravity, so it’s almost unheard of for one to flip upside down. But just in case it happens, there’s a protective bar (known as the “Halo”) above the driver’s head to protect him.
- At the edge of the track there’s a “run-off” area, a large stretch of gravel designed to slow the cars down if they leave the track.
- If a car doesn’t slow down enough on the gravel, there’s a thick wall of tires that is supposed to be the final barrier: it stops a car quickly, but with enough give that the driver isn’t hurt.
- In this case, something freakish happened: just before the tire wall, the car flipped up into the air and jumped right over it, skipping that layer of protection entirely. No one had planned for this eventuality.
- Luckily, right in front of the spectators, there was one final fence. Its stated purpose was to catch debris (e.g. small bits of metal flung off a car in a crash), but in this case it became the last barrier that stopped a fast-moving race car from flying into a large crowd.
- Obviously, the driver also wears a helmet and flame-retardant racing overalls, and is firmly strapped into his seat. F1 drivers also have famously strong necks to help them withstand the G-forces they experience.
Now watch the crash again, and see how all these measures worked together to limit the damage. The driver, Zhou Guanyu, walked away with barely a scratch.
What’s the telecoms equivalent of a Halo?
Formula 1 takes safety very seriously. Any time there’s a significant accident, they investigate thoroughly, work out what can be learned from it, and update the rules and regulations to reduce the risk of it happening again.
If we are to provide carrier-class service to our subscribers, we need to take the same approach when running our networks. It starts with a mindset, so try to apply these principles to your own role:
- Build a system that is resilient to failure. You can’t prevent every failure in your network, but you can design it so that the failure of any individual component doesn’t take the whole service down.
- Build in layers of redundancy. Redundant IP links, redundant hardware, redundant power, overflow routes… it all adds up. And don’t forget to back up everything, in case all else fails.
- Monitor your network for signs of trouble. Formula 1 cars are covered in sensors, letting the team keep an eye on every aspect of the car during the race so they can warn the driver before something fails catastrophically. For you, this means watching your logs and alarms: there’s no point having redundancy if you don’t notice a failed device and fix it.
- Stress-test your network. It’s all very well to believe that your network will withstand a particular failure, but until you’ve tested it (in a maintenance window, please!), you won’t know for sure.
- Conduct in-depth reviews any time something goes wrong. We like the “5 Whys” approach, where you keep asking “why?” to dig past the surface cause to the underlying problem. You never want to make the same mistake twice.
If you’d like a head start on improving your network resilience, check out our system audit service: we review your voice network as it stands today and give you concrete guidance on what needs work.
And if there are any other F1 fans out there, drop me a line!