Phone service is important. Providing reliable service can mean the difference between life and death.
People don’t like risk – and they especially don’t want to be responsible for an unnecessary risk. That’s why every telecom vendor offers some kind of redundancy.
Geographic vs ‘regular’ redundancy
I’d define regular redundancy as the ability to maintain service in the event of a single hardware failure – typically by offering a pair of blades or servers that work together to provide the service, where either one can act independently if the other fails.
But if that’s not enough for you, you can also buy geographic redundancy (or “geo-redundancy”). In this case the system providing (e.g.) voice service is geographically dispersed so that if an earthquake / tornado / lunatic with an axe were to thoroughly destroy one physical location, service would still be available from the second location.
Of course, all VoIP redundancy could be geographic – these are just computers running software. If you have a redundant pair of servers there’s no reason why you couldn’t create homemade geo-redundancy. Just put one of them in Denver and one in St. Louis. As long as you can create an IP network across those two sites that meets the IP requirements of the system, you’re all good.
So what features make a product explicitly geo-redundant? And for today’s purposes, specifically, what does it mean when we say that a Metaswitch Perimeta Session Border Controller is geo-redundant?
In the Perimeta architecture, geo-redundancy actually refers to multiple redundant pairs of SBCs acting together to provide a distributed, redundant network.
In contrast to my example above, a geo-redundant Perimeta network would have a redundant pair of SBCs in Denver AND a redundant pair of SBCs in St. Louis – a total of four servers rather than two.
And if you desire, it can be extended to a third site and beyond.
What’s more, unlike my homemade geo-redundant network, there are no special IP network requirements. The SBC pairs at different sites aren’t sharing a virtual IP address, so each site can face the public internet with its own IP address, and there’s no need to create a Layer 2 WAN for communication between the two sites.
The little diagram above compares a two-site homemade geo-redundant Perimeta to an “official” geo-redundant system. [Note that I’ve used private IP addresses for the example, although these would likely be public in real life.]
The top half of the diagram has a single server in Denver on the left, and a separate server in St. Louis on the right. These share a virtual IP, but to do that (and to communicate with each other) they have to be on the same Ethernet subnet.
In contrast, the bottom half of the diagram shows a redundant pair of Perimeta SBCs at each site, with each pair publishing its own virtual IP.
Official geo-redundancy is better
I’m going to go out on a limb here and say that I don’t recommend the “homemade” option. It’s possible to make it work, but the IP networking requirements are pretty complex.
- You need a single IP subnet (actually multiple, because you’ll have multiple interfaces on the SBC – not just the one shown here) that spans the two physical locations.
- You need that subnet to be accessible if either location is destroyed.
- You need to avoid a scenario where the two sites can’t connect to each other, but are both externally visible.
Fundamentally, by trying to do something this complex, you end up creating more risk. The chance of breaking something through the added complexity is greater than the risk of a tornado was in the first place.
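To see why that last requirement matters, here is a minimal sketch of a generic active/standby pair sharing one virtual IP. This is my own simplified model, not a description of how Perimeta implements failover: the function names and the heartbeat rule are assumptions made purely for illustration.

```python
def claims_virtual_ip(is_primary, peer_heartbeat_ok):
    """Naive failover rule (an assumption for this sketch): the primary always
    holds the virtual IP, and the backup takes it when it stops hearing the primary."""
    return is_primary or not peer_heartbeat_ok


def virtual_ip_holders(inter_site_link_up, denver_alive=True, st_louis_alive=True):
    """Return which sites are answering on the shared virtual IP."""
    holders = []
    # Denver is the primary, St. Louis the backup. A heartbeat only arrives
    # if the peer is alive AND the link between the sites is up.
    if denver_alive and claims_virtual_ip(True, inter_site_link_up and st_louis_alive):
        holders.append("denver")
    if st_louis_alive and claims_virtual_ip(False, inter_site_link_up and denver_alive):
        holders.append("st-louis")
    return holders


normal = virtual_ip_holders(inter_site_link_up=True)
denver_destroyed = virtual_ip_holders(inter_site_link_up=True, denver_alive=False)
link_cut = virtual_ip_holders(inter_site_link_up=False)
```

In this model `normal` contains only Denver and `denver_destroyed` contains only St. Louis, which is the behaviour you want. But `link_cut` contains both sites: each one believes the other is gone and both answer on the same address. Avoiding that situation is what pushes the homemade design toward extra network engineering.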
In contrast, the official geo-redundant Perimeta design avoids all that complexity – you simply have two sites, each internally redundant, each accessible through a separate virtual IP. Better yet, you can extend this design beyond two sites if you wish – even creating a nationwide network of SBCs (which has quality benefits as I’ve previously described).
How does a geo-redundant Perimeta work?
What I’ve described so far doesn’t really sound like a geo-redundant system – it’s just two different SBCs. Sure, it’s simple to configure, but that’s because it looks like two separate SBCs. So what’s the big deal?
When you purchase Perimeta SBCs with the Geographic Redundancy feature, you’re really purchasing a feature that replicates configuration among the devices.
Imagine having a GR cluster of 4 SBC pairs without this feature – each time you needed to add an adjacency or update Perimeta routing, you would need to make that change on all 4 of them. Would your whole team consistently make all those updates?
Despite your best intentions, I promise you that 2 years from now the SBCs will have different configurations – which means that when you actually need the redundancy, some configuration will be missing, and some of your customers will be out of service.
With Metaswitch’s Geographic Redundancy feature, any time you update part of the “global configuration” (i.e. shared configuration) that update is automatically copied over to all the other SBCs in the cluster – so they can never get out of sync.
This seems simple, but from an operational point-of-view it’s absolutely necessary – it saves time AND gives you confidence that your redundant network will actually work when you need it.
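As a rough illustration of the difference between manual updates and replicated ones, here is a small sketch. The `Sbc` class, the `global_config` / `local_config` split, and the adjacency name are hypothetical stand-ins I made up for the example; they are not Perimeta's actual data model or CLI.

```python
import copy
from dataclasses import dataclass, field


@dataclass
class Sbc:
    """Hypothetical stand-in for one SBC pair in a GR cluster."""
    name: str
    global_config: dict = field(default_factory=dict)  # shared across the cluster
    local_config: dict = field(default_factory=dict)   # per-site, e.g. IP addresses


def apply_global_change(cluster, key, value):
    """Copy one global-configuration change to every SBC in the cluster."""
    for sbc in cluster:
        sbc.global_config[key] = copy.deepcopy(value)


def find_drift(cluster):
    """Return the names of SBCs whose global config differs from the first SBC's."""
    reference = cluster[0].global_config
    return [sbc.name for sbc in cluster[1:] if sbc.global_config != reference]


denver = Sbc("denver", local_config={"service_ip": "10.1.0.10"})
st_louis = Sbc("st-louis", local_config={"service_ip": "10.2.0.10"})
cluster = [denver, st_louis]

# Manual change made at only one site: the shared configuration drifts apart.
denver.global_config["adjacency:pbx-example"] = {"remote": "192.0.2.50"}
drift_after_manual_change = find_drift(cluster)

# Replicated change: every SBC receives the same update.
apply_global_change(cluster, "adjacency:pbx-example", {"remote": "192.0.2.50"})
drift_after_replication = find_drift(cluster)
```

After the manual change, `drift_after_manual_change` names St. Louis as out of sync; after the replicated change, `drift_after_replication` is empty. Note that the per-site `local_config` stays different on purpose, since each site keeps its own IP addresses.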
But it’s still two SBCs from an IP perspective!
You noticed that, eh? Yes – that’s true.
Each SBC in the GR cluster has a separate IP address for each interface (including core interfaces facing your Metaswitch CFS), and this means some extra configuration is required on the other SIP devices to make this work.
For registered SIP devices you have two options:
- Many phones support a backup SIP proxy server, in which case you can simply configure a primary and backup IP address on the SIP endpoint (which can be managed centrally through SIP provisioning server phone profiles). In this model the phone itself will re-register to a backup SBC if the first one drops off the network.
- Alternatively you can configure DNS SRV records, a DNS mechanism that allows a single DNS name (e.g. sip.mytelco.com) to be associated with multiple redundant SBCs at different IP addresses. This moves the redundancy from the SIP endpoint (e.g. phone) to the DNS system.
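To make the SRV option a bit more concrete, here is a minimal sketch of how a client might order the SBCs returned for one name. It is a simplification of the selection rules in RFC 2782 (lowest priority value first, then a weighted random pick among records with equal priority). The SBC hostnames, priorities, and port are assumptions for the example, and a real phone does this internally rather than running code like this.

```python
import random
from dataclasses import dataclass


@dataclass(frozen=True)
class SrvRecord:
    priority: int  # lower value is tried first
    weight: int    # relative share of traffic within one priority level
    port: int
    target: str    # hostname of one SBC (hypothetical names below)


def order_targets(records, rng=None):
    """Return the records in the order a client would try them.

    Simplified from RFC 2782: ascending priority, then a weighted random
    pick among the records that share a priority.
    """
    rng = rng or random.Random()
    ordered = []
    for priority in sorted({r.priority for r in records}):
        group = [r for r in records if r.priority == priority]
        while group:
            weights = [r.weight for r in group]
            if sum(weights) == 0:
                pick = rng.choice(group)  # all weights zero: pick uniformly
            else:
                pick = rng.choices(group, weights=weights, k=1)[0]
            ordered.append(pick)
            group.remove(pick)
    return ordered


# Hypothetical SRV answer for _sip._udp.sip.mytelco.com
records = [
    SrvRecord(priority=20, weight=0, port=5060, target="sbc-stlouis.mytelco.com"),
    SrvRecord(priority=10, weight=0, port=5060, target="sbc-denver.mytelco.com"),
]
try_order = [r.target for r in order_targets(records)]
```

With these two records, `try_order` lists Denver first and St. Louis second, so St. Louis is only used if Denver is unavailable. Giving both records the same priority would instead spread endpoints across the two sites.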
For static / configured SIP bindings (e.g. network SIP trunks or PBXs that do not register) things get more complicated. I won’t get into all the details here, but fundamentally you need to configure both the CFS and the remote SIP endpoint with multiple connections (one to each SBC) so that the PBXs / network SIP trunk providers can receive traffic from any of them and can send traffic to any of them in the event of an outage.
This can all be a little complex, but there are also some cool benefits – you have the option of defining different priorities for different devices.
For example, you could have a persistent profile used by business groups in Colorado that uses Denver as the primary and St. Louis as the backup, and another persistent profile for business groups in Chicago that uses St. Louis as the primary and Denver as the backup.
You end up with a redundancy model where each SBC is active at all times – providing the best quality of service available – rather than having one SBC sit unused, present solely for redundancy. As you add more SBCs you can create further combinations of priorities so all endpoints can make use of their closest Perimeta.
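The per-region priorities can be sketched as a simple lookup. The profile names, site names, and the `select_sbc` helper below are hypothetical and only mirror the Colorado / Chicago example above; in practice the ordering lives in the phone profiles or DNS records rather than in code like this.

```python
# Each profile lists SBC sites in preference order (illustrative names only).
PROFILES = {
    "colorado": ["denver", "st-louis"],
    "chicago": ["st-louis", "denver"],
}


def select_sbc(profile, reachable_sites):
    """Return the first site in the profile's preference list that is reachable."""
    for site in PROFILES[profile]:
        if site in reachable_sites:
            return site
    return None  # no SBC reachable at all


both_up = {"denver", "st-louis"}
denver_down = {"st-louis"}

colorado_normal = select_sbc("colorado", both_up)
chicago_normal = select_sbc("chicago", both_up)
colorado_failover = select_sbc("colorado", denver_down)
```

With both sites up, Colorado endpoints land on Denver and Chicago endpoints on St. Louis, so both SBCs carry traffic. When Denver drops out, Colorado endpoints fall back to St. Louis.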
This is a complex topic, and some of the terms can be confusing at first. Hopefully things are slightly clearer after reading this article. If you want to learn more (and have Metaswitch Communities access) you can go deeper into this topic in Communities documents 448089 and 49502.
If you’d like to build out a redundant SBC network and would like some expert help to get it right first time, we can offer design and/or implementation services to help with your specific situation. Just contact us and we’ll schedule an introductory call.