I recently came across this post on Reddit, where someone had found a group of about a dozen driverless cars all stopped in the middle of the same city block, effectively preventing any traffic from using that street.
Obviously, any time a group of artificially intelligent beings decides to gather together in one place, it’s a cause for concern. Are they plotting to overthrow their human creators? Or was it just a software bug?
In this case, the vehicles were all manufactured by Cruise, a startup that has funding from GM and Honda among others. This isn’t an isolated incident, but so far the cars haven’t been violent. Perhaps they’re concerned about the long working hours, and so they decided to go on strike? Or form a union?
Personally, driverless cars kinda freak me out. I’m sure the tech is amazing, and I know they’re probably safer than a lot of drivers I see out there – but I’m still not keen to get into one of these cars.
On the plus side, based on this video, if they encounter a bug, they seem to be programmed to stop. Which isn’t great if they all do it in the same place, but is much safer than the alternative.
Inside the Cruise Network Operations Center
I just love to imagine the Cruise NOC that night.
- First there’s one yellow alarm, or error log – there’s some kind of issue with one car, and it has pulled over. One of the NOC techs starts investigating – but it’s not a big deal, no sense of urgency.
- Then a second one comes in – same problem, and the car’s in the same location. How odd. A couple of people chat about it, trying to see what’s going on.
- Then a third, fourth, and fifth alarm come in. Now everyone’s paying attention – this is really weird! What is going on?
- Now more alarms come in, and then they start getting calls from the city police department.
- The urgency escalates rapidly. How close are we to figuring this out? Is there a workaround? Can we solve it remotely? Who’s going to wake up the boss?
- We need to send a team on-site to fix this. Go! Go! Go!
Or at least that’s my guess as to how it went down. According to the guy who shared this, eventually some real human people (who presumably worked for Cruise) had to come out and reset the vehicles – so I got the ending right at least.
How does that compare to your NOC?
My guess is that much of the above description sounds familiar. Most of us have been responsible for monitoring logs and alarms. Most of us have been in situations where a small problem quickly turned into a total freakin’ disaster.
But the NOC I described had a bunch of people working in it – which I assume would be true for an organization like Cruise. Whereas most of our clients are fairly small ILECs, and in most cases the team responsible for the voice network is pretty small – and may well be responsible for all the other networks too. Maybe the network operations team is just two people. Or one person. Or maybe it was one person, and that person just quit/retired.
So you may find yourself responsible for monitoring logs and alarms – really monitoring the health of your network and services – but with very few people to share the burden. Which is why it’s critical that you run a very efficient operation – to make the most of the resources you have.
An efficient NOC
So what does an efficient NOC look like – one that can run well with a small team? I’d love to hear your experiences, but here are a few ideas:
- Automate as much as possible. For most foreseeable failure scenarios, you want your network to automatically switch to Plan B – e.g. overflow trunks, alternative IP routes, backup power, etc. This is more efficient and provides better availability than having to manually intervene. Maybe the cars could have stopped, reset themselves, and then returned to base – without human intervention.
- Get notified! You want all your systems to send alarm notifications if there’s a problem, and you want them all to be sent to you using a method that you’ll notice. That could be a phone call, or a text message, or a siren in the CO, or an email (provided your phone alerts you to new emails) – but you don’t want to be in a situation where you only notice a problem if you go looking for it. Hopefully Cruise knew about the problem with their cars before the police called them.
- Keep it clean. If you have some kind of dashboard / alarm panel, you need to keep it clean so you can spot new issues. If your system always has a bunch of error logs and alarms, it will be really hard (a) to notice new issues, and (b) to investigate them.
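To make the "get notified" and "keep it clean" ideas a little more concrete, here is a minimal Python sketch of alarm correlation: instead of showing a dozen separate alarms, it groups them by location and type and flags only the groups that look like a wider problem. Everything in it is hypothetical and for illustration only. The `Alarm` fields, the `find_clusters` function, the 15-minute window and the threshold of three sources are all assumptions, not anything taken from Cruise or from a real monitoring product.

```python
from collections import defaultdict
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass(frozen=True)
class Alarm:
    source: str          # hypothetical device or vehicle ID
    location: str        # hypothetical site or street label
    kind: str            # hypothetical alarm type
    raised_at: datetime  # when the alarm was raised


def find_clusters(alarms, window=timedelta(minutes=15), threshold=3):
    """Return {(location, kind): [sources]} for groups worth escalating.

    A group qualifies when at least `threshold` distinct sources raised
    the same kind of alarm at the same location within `window` of the
    newest alarm in that group.
    """
    # Bucket alarms by where they happened and what kind they are.
    groups = defaultdict(list)
    for alarm in alarms:
        groups[(alarm.location, alarm.kind)].append(alarm)

    clusters = {}
    for key, items in groups.items():
        newest = max(a.raised_at for a in items)
        # Count each source once, so a repeating alarm does not inflate the total.
        recent_sources = sorted(
            {a.source for a in items if newest - a.raised_at <= window}
        )
        if len(recent_sources) >= threshold:
            clusters[key] = recent_sources
    return clusters
```

In the driverless car story, a check like this would turn "one car has pulled over" into "several cars have stopped on the same block" as soon as the third alarm arrived, which is the point at which you want a phone call rather than another line in a log.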
If you’re having a hard time keeping on top of all the logs and alarms in MetaView Explorer, or if you don’t know how to actually resolve the issues – please contact us to see how we can help. We regularly work with clients to clean up excess alarms, or to investigate and resolve those that look concerning. This kind of housekeeping helps you troubleshoot problems more quickly and helps prevent outages. It’s important! So let us know if you need help.
P.S. If you want to see more fun with driverless cars, check out this video where the police pull over a Cruise vehicle. The car stops, waits for the policeman to get out of his vehicle, and then promptly drives off!