There’s no chatty preamble today, because I’ve got a puzzle for you. Are you ready?
Let’s suppose that you want to build a nice redundant VoIP network, and so you deploy two Session Border Controllers in geo-diverse locations, along with a single Metaswitch CFS. These two SBCs have IP addresses 220.127.116.11 and 18.104.22.168 (because you got very lucky when the IP Gods assigned IP addresses). You create DNS records for your domain name, and point that domain name at both SBCs for redundancy:
sip.mytelco.com. IN A 220.127.116.11
sip.mytelco.com. IN A 18.104.22.168
You then configure all your SIP endpoints to register with sip.mytelco.com.
This all sounds very sensible, but unfortunately you’ve made a big mistake. What is it?
Here’s the problem – you’ve configured the two DNS entries with equal weight, so any given lookup may hand a client either address. This doesn’t seem like it should be an issue, but when you get into the details of the SIP flows it becomes a problem very quickly.
- Bob’s phone does a DNS query for sip.mytelco.com and gets back 220.127.116.11.
- Bob sends a SIP REGISTER to 220.127.116.11.
- The first SBC (acting as a back-to-back UA, as is typical) sends an internal registration from its core IP address (call it 10.10.10.10) to the CFS.
- The CFS confirms the registration and notes that Bob can be contacted via 10.10.10.10.
- Some time passes, and Bob decides to make a phone call.
- Bob’s phone does another DNS query for sip.mytelco.com and this time gets 18.104.22.168.
- Bob sends a SIP INVITE to 18.104.22.168 in an attempt to set up a call to Alice.
- The second SBC (again, acting as a B2B UA) sends an internal INVITE from its core IP (call it 10.10.10.20) to the CFS.
- The CFS looks at the INVITE, and rejects it, because there’s no registration associated with Bob at the 10.10.10.20 address.
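To make the mismatch concrete, here is a minimal Python sketch of the flow above. It is a toy model rather than a description of how a real CFS works: the `Cfs` class, the `run_scenario` helper and the `sbc1`/`sbc2` labels are invented for illustration, and only the two core-side addresses come from the example.

```python
class Cfs:
    """Toy registrar: remembers which SBC core IP relayed each REGISTER."""

    def __init__(self):
        # subscriber name -> core IP of the SBC that forwarded the REGISTER
        self.registrations = {}

    def register(self, subscriber, sbc_core_ip):
        self.registrations[subscriber] = sbc_core_ip
        return "200 OK"

    def invite(self, subscriber, sbc_core_ip):
        # Accept the INVITE only if it arrives from the same core IP
        # that the subscriber registered through.
        if self.registrations.get(subscriber) == sbc_core_ip:
            return "200 OK"
        return "REJECTED"


# Hypothetical labels for the two SBCs, mapped to their core-side addresses.
SBC_CORE_IP = {"sbc1": "10.10.10.10", "sbc2": "10.10.10.20"}


def run_scenario(register_via, invite_via):
    """Register Bob through one SBC, then place a call through another."""
    cfs = Cfs()
    cfs.register("bob", SBC_CORE_IP[register_via])
    return cfs.invite("bob", SBC_CORE_IP[invite_via])


print(run_scenario("sbc1", "sbc1"))  # DNS returned the same SBC both times
print(run_scenario("sbc1", "sbc2"))  # DNS returned a different SBC for the INVITE
```

The first scenario is accepted and the second is rejected, which is exactly what happens to Bob when the two lookups return different addresses.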
That’s complicated, I know, so let’s make a terrible analogy, in case that helps. Imagine Bob is an 8-year-old flying on a plane all by himself to go visit his Grandma Alice.
It’s as if Alice is waiting at the airport to meet Bob, and Bob told her he was going to be arriving at Terminal 1 on Southwest (that’s the registration), but then Southwest had a massive system failure and cancelled all their flights (implausible, I know), so instead Bob gets re-routed via Delta and arrives at Terminal 2 (the INVITE). But because Alice isn’t expecting Bob to arrive at Terminal 2, she’s not there to meet him. And because he’s a little boy with no one to meet him, the airline rejects his arrival and sends him all the way back to where he started with a 404 Not Found sticker on his shirt.
How do I avoid this?
I should first note that this problem can only impact you if you have two separate SBCs. If you have a single Perimeta SBC with two blades that act together as a redundant pair, then this is not an issue, because the blades share both public and core-side IP addresses.
However, if you genuinely do have two SBCs, then the best way to handle this is to assign a priority to each one in your DNS configuration (so you have a designated primary and secondary).
You can’t do this using A records, so you have to get more advanced and explore DNS NAPTR and DNS SRV records, which give you more control: they allow priorities and weighting, and let you assign different DNS mappings for different services. For example, something like this might be used:
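sip.mytelco.com. IN NAPTR 10 100 "s" "SIP+D2U" "" _sip._udp.sip.mytelco.com.
; SRV fields are priority, weight, port, target; the lowest priority value is tried first.
; sbc1 and sbc2 are placeholder host names, because an SRV target must be a name, not an IP address.
_sip._udp.sip.mytelco.com. IN SRV 10 100 5060 sbc1.mytelco.com.
_sip._udp.sip.mytelco.com. IN SRV 20 100 5060 sbc2.mytelco.com.
sbc1.mytelco.com. IN A 220.127.116.11
sbc2.mytelco.com. IN A 18.104.22.168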
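If it helps to see why SRV priorities give you a stable primary, here is a rough Python sketch of how a client might order SRV targets, loosely following the selection rules in RFC 2782. It is a simplification (the RFC's special handling of zero weights is reduced to a plain random pick), and the `SrvRecord` type, the `order_targets` function and the `sbc1`/`sbc2` host names are assumptions made for this example, not part of any real SIP stack.

```python
import random
from dataclasses import dataclass


@dataclass(frozen=True)
class SrvRecord:
    priority: int
    weight: int
    port: int
    target: str


def order_targets(records, rng=random):
    """Return the records in the order a client would try them."""
    ordered = []
    # Lower priority values are always tried before higher ones.
    for priority in sorted({r.priority for r in records}):
        group = [r for r in records if r.priority == priority]
        # Within one priority level, pick at random in proportion to weight.
        while group:
            weights = [r.weight for r in group]
            if sum(weights) == 0:
                pick = rng.choice(group)
            else:
                pick = rng.choices(group, weights=weights, k=1)[0]
            ordered.append(pick)
            group.remove(pick)
    return ordered


# Hypothetical host names standing in for the two SBCs.
records = [
    SrvRecord(priority=20, weight=100, port=5060, target="sbc2.mytelco.com."),
    SrvRecord(priority=10, weight=100, port=5060, target="sbc1.mytelco.com."),
]

for record in order_targets(records):
    print(record.priority, record.target)
```

Because the two records have different priorities, the ordering never varies: every client tries the priority 10 target first and only falls back to the priority 20 target if the first is unreachable, so the REGISTER and the INVITE land on the same SBC.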
If you want to get really fancy, you could have different priorities given to SIP clients based on their location – learn more about GeoDNS here.
I hope you enjoyed the puzzle. I’m continuing to play with ChatGPT, and asked it to write you a short verse to conclude this message.
May your flights take off on time,
Never delayed, never denied.
May your networks run smoothly, fast,
With no issues to disturb your rest.