I’m not the only one to think so:
- “SD-WAN will probably make your VoIP phone system work, even if you have one location, with a shoddy Internet connection” – from Network World
- “Software-defined WANs for VoIP: When you absolutely, positively need high quality” – from Citrix
- [VeloCloud SD-WAN] “enables enterprises to leverage the Internet to roll out rich media UC applications which demand high bandwidth and reliable WAN” – from VeloCloud.com
VeloCloud also have a side-by-side comparison showing a video conference experiencing 2% packet loss next to (allegedly) the same video run through their product.
There are plenty of claims out there about the amazing impact SD-WAN has on voice quality, and today we are going to dig deeper and look behind the curtain.
How does the magic happen?
If you’re like me, you tend to be skeptical of the claims of tech marketing teams, and need to understand what’s actually happening under the covers to make this all work. So let’s look at four common features offered by SD-WAN devices:
- Prioritization: pushing VoIP traffic to the front of the queue ahead of data.
- Path selection: if you have two or more uplinks, SD-WAN products will dynamically select the uplink to use for each particular packet, hiding outages or quality problems from the user.
- Packet duplication / cloning: by duplicating RTP packets (which costs bandwidth) you can improve both the performance and resilience of an RTP stream.
- Forward error correction: similar to packet cloning (but with less overhead), forward error correction allows missing RTP packets to be re-created at the destination.
Prioritization: An argument for skipping the line
We all hate standing in line, right? But it’s much worse when you’re in a hurry.
I remember arriving at the airport one time about an hour before my flight, only to discover that the line for security stretched far beyond the checkpoint, past all the check-in counters and then wound back on itself several times… this line must have been 200 yards long with maybe 500 people between me and the checkpoint.
Not good.
It’s at times like this that I wish there was a better system for handling airport security lines. If someone’s about to miss their flight (which causes the passenger and the airlines problems down the road) shouldn’t they be able to cut in line ahead of those people who arrived 4 hours early? But if you do that would everyone start arriving later because they know they’ll be able to skip the line? And if everyone arrives late wouldn’t that make everything even worse?
After a few moments of pondering this problem I realized that I was suffering from analysis paralysis, and that actually there is a way to cut the line. I found a United check-in machine, paid $15 to purchase Premier Access, then used the special Premier Access lane (with a much shorter line) to go through security and catch my flight.
Now unfortunately IP routers can’t charge applications money to decide which packets get to cut in line. Instead they rely on a scheduling process (an over-worked airport employee who evaluates every new arrival to decide which line they should join) to figure out how to handle all the packets.
What’s special about SD-WAN is simply that the product developers have gone to the trouble to simplify this process – they’ve defined a bunch of rules (policies for the airport employee, if you really want to continue the analogy) so that VoIP traffic (which is very sensitive to delays) is automatically categorized and can easily be prioritized above data traffic.
So the improvement in upload prioritization is really in the user interface. That sounds small, but when you consider how complicated it is to configure the average IP router, a better user interface probably makes the difference between “prioritization implemented incorrectly and not working” and “prioritization implemented and working”, so that’s a big win.
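To make the "over-worked airport employee" a little more concrete, here is a minimal sketch of a strict-priority scheduler. It is not how any particular SD-WAN product works: the `Packet` fields, the `classify` rule and the RTP port range are all assumptions I've made up for illustration, and real products use richer rules to recognize voice traffic.

```python
from collections import deque
from dataclasses import dataclass

# Hypothetical classification rule: treat UDP packets in this port range as RTP.
RTP_PORTS = range(16384, 32768)


@dataclass
class Packet:
    protocol: str  # "udp" or "tcp"
    dst_port: int
    payload: str


def classify(packet: Packet) -> str:
    """Return 'voice' for packets that look like RTP, otherwise 'data'."""
    if packet.protocol == "udp" and packet.dst_port in RTP_PORTS:
        return "voice"
    return "data"


class StrictPriorityScheduler:
    """Two queues; the voice queue is always drained before the data queue."""

    def __init__(self) -> None:
        self.queues = {"voice": deque(), "data": deque()}

    def enqueue(self, packet: Packet) -> None:
        self.queues[classify(packet)].append(packet)

    def dequeue(self):
        for name in ("voice", "data"):
            if self.queues[name]:
                return self.queues[name].popleft()
        return None  # nothing waiting in either line


scheduler = StrictPriorityScheduler()
scheduler.enqueue(Packet("tcp", 443, "web-1"))
scheduler.enqueue(Packet("tcp", 443, "web-2"))
scheduler.enqueue(Packet("udp", 20000, "rtp-1"))  # arrives last

order = []
while (pkt := scheduler.dequeue()) is not None:
    order.append(pkt.payload)
print(order)  # ['rtp-1', 'web-1', 'web-2']
```

The voice packet arrived last but leaves first. The hard part in practice is not the queue itself but getting the classification rules right, which is exactly the part the SD-WAN vendors have pre-packaged.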
The other part of the prioritization feature is download prioritization. You may remember the above diagram from my previous article, which includes a device labelled “SD-WAN cloud gateway”. This cloud gateway is both an intermediate destination for all the packets (which allows it to spot quality problems) and the entry point for all traffic headed in the other direction, so VoIP packets can be prioritized not only on the uplink from the CPE to the internet, but also in the download direction from the internet to the CPE.
In our airport analogy this means that your Premier Access pass would also let you take the fast lane through immigration after your flight, so prioritization is applied both to passengers entering the airport system and to passengers leaving it. In a phone call both are equally important – it doesn’t matter much if your caller can hear you clearly if you can’t understand a word they’re saying!
Path selection
Perhaps the most well-known feature of SD-WAN solutions for voice is the ability to connect multiple internet uplinks to the router and have the SD-WAN device dynamically select the path to use in order to hide service outages and maximize the quality of the voice path.
While the precise features and technology vary between vendors, the ideas are pretty compelling.
- The most basic functionality is redundancy – if you have two links and service fails on one of them then the second link is available as a backup. This functionality is not solely available with SD-WAN, but it’s a core building block for businesses that want high service availability.
- The presence of a cloud gateway at the other end of the link allows the SD-WAN network to monitor the performance of each uplink to quickly spot outages or quality problems, or simply to track the relative performance (latency, packet loss, jitter) of the links. This information is available to the SD-WAN devices throughout the network, which means high priority traffic can be re-routed to the best performing link in real time.
- A traditional network with a backup link might prioritize a primary link and only route traffic over the backup link in the event of an outage, whereas SD-WAN can use all the available bandwidth at all times, so you get the full benefit of the second link – it’s not just an insurance policy.
- On the flip side, if you want to prioritize a particular link (e.g. if you have one business broadband connection and one metered wireless LTE connection) you can configure the system to only use the second link when necessary.
- There are a variety of videos on YouTube showing a live demo of what happens when you lose an internet uplink with SD-WAN enabled.
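The ideas in the list above can be sketched in a few lines of code. This is only an illustration under my own assumptions: the `LinkStats` fields, the scoring weights and the `allow_metered` flag are hypothetical, and each vendor has its own (usually proprietary) way of measuring and ranking links.

```python
from dataclasses import dataclass


@dataclass
class LinkStats:
    name: str
    latency_ms: float
    jitter_ms: float
    loss_pct: float
    up: bool = True
    metered: bool = False


def voice_score(link: LinkStats) -> float:
    """Lower is better. The weights are made-up values for illustration."""
    return link.latency_ms + 2.0 * link.jitter_ms + 50.0 * link.loss_pct


def select_link(links, allow_metered=True):
    """Pick the best working link, avoiding metered links if asked to."""
    candidates = [l for l in links if l.up and (allow_metered or not l.metered)]
    if not candidates:
        # Nothing else is available, so fall back to any link that is up.
        candidates = [l for l in links if l.up]
    if not candidates:
        return None
    return min(candidates, key=voice_score)


broadband = LinkStats("broadband", latency_ms=30, jitter_ms=5, loss_pct=0.0)
lte = LinkStats("lte", latency_ms=60, jitter_ms=10, loss_pct=0.1, metered=True)
links = [broadband, lte]

print(select_link(links).name)  # broadband

broadband.loss_pct = 2.0  # the broadband link starts dropping packets
print(select_link(links).name)  # lte
print(select_link(links, allow_metered=False).name)  # broadband

broadband.up = False  # full outage on the broadband link
print(select_link(links, allow_metered=False).name)  # lte
```

With `allow_metered=False` the LTE link behaves like a traditional backup that is only used during an outage; with the default setting, voice follows whichever link currently scores best.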
Packet duplication / cloning
While the first two features strive to give your RTP packets the maximum chance of being routed promptly over the best available link, they won’t prevent an audio glitch if the “best link” suddenly starts dropping packets. However, with packet cloning you can choose to allocate extra bandwidth to your voice calls to reduce the chance of any quality disruption.
With the path selection feature, the cloud gateway tracks the quality of each link and notifies the CPE if one of the links starts dropping packets. This allows the CPE to quickly switch to the other link, but the packets lost before the switch are gone for good.
However, if packet cloning is enabled, every single RTP packet is sent over both uplinks simultaneously. This means the cloud gateway can continue to provide flawless service if a packet is dropped on one of the links. Even if no packets are dropped, it can use whichever copy arrives first, thereby keeping latency to a minimum (combined with some dynamic jitter buffering that allows latency to increase or decrease as network conditions change).
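Here is a rough sketch of what the receiving end of packet cloning might look like. The `Deduplicator` class is my own invention for illustration: it keeps the first copy of each RTP sequence number and discards the later one. It ignores sequence number wraparound and jitter buffering, both of which a real implementation would have to handle.

```python
from collections import deque


class Deduplicator:
    """Keeps the first copy of each sequence number and drops later copies."""

    def __init__(self, window: int = 1024) -> None:
        self.window = window  # how many recent sequence numbers to remember
        self.seen = set()
        self.recent = deque()

    def accept(self, seq: int) -> bool:
        if seq in self.seen:
            return False  # the copy from the other link already arrived
        self.seen.add(seq)
        self.recent.append(seq)
        if len(self.recent) > self.window:
            self.seen.discard(self.recent.popleft())
        return True


# (link, sequence number) in arrival order.
# Packet 2 was lost on link A and packet 4 was lost on link B.
arrivals = [("A", 1), ("B", 1), ("B", 2), ("A", 3), ("B", 3), ("A", 4)]

dedup = Deduplicator()
delivered = [(link, seq) for link, seq in arrivals if dedup.accept(seq)]
print(delivered)  # [('A', 1), ('B', 2), ('A', 3), ('A', 4)]
```

Each link lost a packet, yet the delivered stream has no gaps, and for every packet the faster copy is the one that gets used.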
Some of you might be thinking, “Hang on, if I duplicate every packet I’m doubling the bandwidth requirements and therefore I’m increasing the likelihood of packet drops in the first place!”
This was my initial thought too, and it makes sense if the only traffic on the network is voice calls. However, there will typically be many different applications using the same network, so while doubling the bandwidth used by voice will reduce the bandwidth available to data traffic, that may be insignificant compared to the improvement in voice quality.
For example, if 20% of your bandwidth is used by voice calls, and 80% is used by data traffic, packet duplication would increase the voice bandwidth to 40%, reducing your data bandwidth from 80% to 60%. This could potentially slow your data performance by 25% (if you’re maxing out bandwidth) but if that gives you great voice quality your users may not care (or even notice).
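If you want to plug in your own numbers, the arithmetic above fits in a small helper. The function name and the assumption that data simply gets whatever bandwidth voice leaves behind are mine; real traffic is burstier than a pair of percentages suggests.

```python
def duplication_impact(voice_pct: float, data_pct: float) -> dict:
    """Estimate the effect of duplicating all voice packets.

    Percentages are shares of total link bandwidth. Assumes data traffic is
    squeezed only when the duplicated voice traffic leaves it too little room.
    """
    new_voice_pct = voice_pct * 2
    if new_voice_pct > 100:
        raise ValueError("duplicated voice traffic alone would exceed the link")
    new_data_pct = min(data_pct, 100 - new_voice_pct)
    slowdown = (data_pct - new_data_pct) / data_pct * 100 if data_pct else 0.0
    return {
        "voice_pct": new_voice_pct,
        "data_pct": new_data_pct,
        "data_slowdown_pct": slowdown,
    }


impact = duplication_impact(20, 80)
print(impact)
# {'voice_pct': 40, 'data_pct': 60, 'data_slowdown_pct': 25.0}
```

On a link with spare capacity, for example `duplication_impact(10, 30)`, the data slowdown comes out as zero because the extra voice traffic fits in the unused bandwidth.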
Forward Error Correction
Forward Error Correction is a technique where extra packets carrying redundant data are added to a stream (similar to the way RAID arrays work), allowing the recipient to check whether the received data is complete or, in our case, to fill in the gaps left by dropped RTP packets. In other words, this feature also tries to mitigate packet loss, but with lower overhead than packet cloning.
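To give a feel for the RAID comparison, here is a minimal sketch of the simplest possible scheme: one XOR parity packet per group of equal-length packets. This is an assumption chosen for clarity, not a description of what any SD-WAN vendor actually ships, and the function names are hypothetical.

```python
from functools import reduce


def xor_bytes(a: bytes, b: bytes) -> bytes:
    """XOR two equal-length byte strings."""
    return bytes(x ^ y for x, y in zip(a, b))


def make_parity(packets):
    """Build one loss recovery packet by XOR-ing a group of packets together."""
    return reduce(xor_bytes, packets)


def recover(received, parity):
    """Fill in the single missing packet (marked as None) using the parity."""
    missing = [i for i, p in enumerate(received) if p is None]
    if len(missing) != 1:
        raise ValueError("one parity packet can repair exactly one lost packet")
    present = [p for p in received if p is not None]
    repaired = list(received)
    # XOR-ing the parity with every surviving packet leaves the missing one.
    repaired[missing[0]] = reduce(xor_bytes, present, parity)
    return repaired


group = [b"pkt0", b"pkt1", b"pkt2", b"pkt3", b"pkt4"]
parity = make_parity(group)

received = list(group)
received[2] = None  # packet 2 was dropped in transit

print(recover(received, parity) == group)  # True
```

The catch is visible in `recover`: a single parity packet can only repair one loss per group, so if two packets from the same group go missing they stay missing.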
Forward error correction is particularly beneficial in situations where missing packets can’t easily be retransmitted (e.g. in communications to distant space probes where the latency is measured in minutes or hours), but it’s also valuable for real-time applications where we can’t afford to wait for missing packets to be resent.
I considered trying to explain the math behind forward error correction in this article, but for the good of mankind I will instead link to the FEC Wikipedia article so that those who care can indulge to their heart’s content, while the rest of us can simply accept that it works.
Obviously there is a trade-off between using additional bandwidth for “loss recovery packets” and the effectiveness of the error correction. This white paper from Silver Peak gives some examples of the impact.
- 1 loss recovery packet for 10 regular packets: reduces 1% packet loss to 0.09%.
- 1 loss recovery packet for 5 regular packets: reduces 1% packet loss to 0.04%.
In other words, even if you only add a 10% overhead with 1:10 forward error correction you can turn a noticeable 1% packet loss into an unnoticeable 0.09% packet loss.
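As a sanity check on those figures, here is a back-of-the-envelope model. I'm assuming a single parity packet per group and that each packet is lost independently with the same probability; I don't know what model the white paper used, and real networks tend to lose packets in bursts, so treat this as a rough estimate only.

```python
def residual_loss(loss_rate: float, group_size: int) -> float:
    """Estimated loss rate after single-parity FEC.

    A data packet stays lost only if it is dropped AND at least one of the
    other packets in its block (the remaining group_size - 1 data packets
    plus the 1 parity packet) is dropped as well.
    """
    others = group_size  # (group_size - 1) data packets + 1 parity packet
    return loss_rate * (1 - (1 - loss_rate) ** others)


for group_size in (10, 5):
    print(f"1:{group_size} FEC: {residual_loss(0.01, group_size):.3%}")
```

Under these assumptions 1% packet loss comes out at a little under 0.1% with 1:10 FEC and a little under 0.05% with 1:5, which is in the same ballpark as the white paper's numbers.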
While FEC gives a much better “bang-for-buck” than packet duplication in compensating for packet loss on a link, it doesn’t help in the situation where a link dies entirely – nor does it provide the performance benefits (latency reduction) of packet duplication.
Whether to use one or the other (or even both?) will depend on the particular characteristics of your network – how heavily the bandwidth is used, and by what applications – but if you’re able to use these features your users will see a dramatic improvement in voice quality even when the quality of the network is significantly degraded.
What’s next?
Over the coming weeks and months I’m planning to publish further articles exploring what SD-WAN means for VoIP service providers and how best to deploy it in different scenarios, before trying to answer the core business question – is SD-WAN worth the investment?
Please join my mailing list to be notified when the next article is published. I’m also working on a guide to help you decide how best to deploy SD-WAN (if at all) for different types of businesses, which I’ll share with my email list once it’s ready.
Award Consulting is focused on helping regional service providers who use Metaswitch products to thrive as they improve their networks through migrations, strategic projects and improved service offerings. If your business is being damaged by recurring quality or stability issues, contact us to see how we can help.