Some of these problems are an annoyance – distracting your team from today’s tasks – and some of them are serious problems that leave your customers feeling unhappy and your sales team feeling frustrated and demoralized. Does any of that sound familiar?
Of course, absolute perfection may be out of reach, but that doesn’t mean we shouldn’t strive for it. Today we’re going to look at a tool (a process) that we can use to significantly reduce the number of trouble tickets over time, and get gradually closer to perfection.
I was first introduced to this idea in the Metaswitch Operations team, where we used this methodology to reduce the number of shipment quality issues by 80% in 3 years. This tool is commonly used in lean manufacturing and is known as the 5 Whys of Root Cause Analysis, one of the most effective problem-solving tools that Toyota integrated into its famous lean manufacturing philosophy that became the Toyota Production System (TPS).
And I say, “Try to figure out what really went wrong”.
It’s an obvious question to ask, but in real life, in a busy NOC environment, most of us succumb to the temptation to ask “Is the problem fixed?” instead. And that’s where we go wrong – because when we say “is it fixed”, we really mean “is it still happening”, and in many cases we’ll close a trouble ticket because the problem mysteriously disappeared, or because we changed a few settings and now it’s gone.
And maybe that’s fine for this particular customer, but if we don’t know how our process caused this problem to occur in the first place then it’s going to happen at other customers, and our quality will never improve.
We need to view every trouble ticket as evidence of a flaw in our operational processes, and therefore every ticket gives us a hint as to where we can improve our processes so that our quality will be higher tomorrow.
The 5 Whys is simply the process of asking the question “Why?” five times in succession, drilling down five layers into a failure mode to identify the true root cause of a problem at the most basic level.
In order to do this, we state the problem in the simplest terms possible, then we ask why that problem occurred, and each time we get an answer, we ask why again, until we go at least 5 layers deep.
The 5 Whys: Real-Life Examples
Let’s start with a famous example that has nothing to do with manufacturing and nothing to do with telecoms – but was itself of monumental concern.
Problem: The Washington Monument is deteriorating.
- Question: Why is the monument deteriorating?
  Answer: The chemicals used to clean the monument are corrosive.
- Question: Why are these chemicals needed?
  Answer: To remove the bird droppings from the monument.
- Question: Why are there birds on the monument?
  Answer: There are a lot of spiders and birds eat the spiders.
- Question: Why are there a lot of spiders?
  Answer: The spiders are there to eat the gnats.
- Question: Why are there gnats?
  Answer: The gnats are attracted by the lights when they are turned on at dusk.
So what was the solution to this problem? The maintenance staff determined that turning the monument lights on later, after dusk rather than at dusk, greatly reduced the number of gnats, and with them the spiders, the birds, and the droppings.
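One way to keep such a chain on record is as a simple list of question/answer pairs. The sketch below is purely illustrative: the `WhyChain` class and its method names are hypothetical, invented for this example, and are not part of any standard tool.

```python
from dataclasses import dataclass, field


@dataclass
class WhyChain:
    """A problem statement plus an ordered list of (question, answer) pairs."""
    problem: str
    steps: list = field(default_factory=list)

    def ask(self, question: str, answer: str) -> None:
        # Each recorded answer becomes the subject of the next "Why?"
        self.steps.append((question, answer))

    def depth(self) -> int:
        return len(self.steps)

    def root_cause(self, min_depth: int = 5):
        # Treat the last answer as the candidate root cause, but only once
        # we have gone at least `min_depth` layers deep (5 is a guideline).
        if self.depth() < min_depth:
            return None
        return self.steps[-1][1]


chain = WhyChain("The Washington Monument is deteriorating.")
chain.ask("Why is the monument deteriorating?",
          "The chemicals used to clean the monument are corrosive.")
chain.ask("Why are these chemicals needed?",
          "To remove the bird droppings from the monument.")
chain.ask("Why are there birds on the monument?",
          "There are a lot of spiders and birds eat the spiders.")
chain.ask("Why are there a lot of spiders?",
          "The spiders are there to eat the gnats.")
print(chain.root_cause())  # None: only four layers deep so far
chain.ask("Why are there gnats?",
          "The gnats are attracted by the lights when they are turned on at dusk.")
print(chain.root_cause())
```

Writing the chain down like this also makes it obvious when an investigation stopped after only one or two layers.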
Let’s consider an example of another, slightly less serious everyday problem that can be examined in the same way.
Problem: John got a speeding ticket.
- Question: Why did he get a ticket?
  Answer: Because he was driving too fast (ok, ok…)
- Question: Why was he driving so fast?
  Answer: John was late for work.
- Question: Why was John late for work?
  Answer: He overslept.
- Question: Why did he oversleep?
  Answer: He didn’t go to bed until 2.30am.
- Question: Why did he go to bed so late?
  Answer: He was watching Australia and New Zealand play rugby on TV.
The solution? Perhaps John should learn how to use TiVo for future antipodean sporting events. In any case, it should be easy to see the point, which is that by looking at the problem step-by-step, we can often arrive at a simple solution that is fairly easy to implement.
Of course the idea of this article is to take Lean operating methodologies and apply them to network operations, so let’s look at a couple of examples that apply the 5 Whys process to network problems we might expect to see in real life.
Problem: A business customer reports garbled audio on calls around lunch time.
- Question: Why is the audio garbled?
  Answer: because the IP network is overloaded and so packets are being dropped.
- Question: Why is the IP network overloaded?
  Answer: because several employees are watching Netflix on their lunch break.
- Question: Why is Netflix impacting phone calls?
  Answer: because the customer’s IP network doesn’t use VLANs/QoS to prioritize voice traffic.
- Question: Why doesn’t the IP network prioritize voice traffic?
  Answer: because their IT team doesn’t know how to do it.
- Question: Why doesn’t the IT team know how to do it?
  Answer: because we (the service provider) don’t provide any guidance, documentation or tools.
Or how about this one?
Problem: Calls to a certain town in the next county are failing.
- Question: Why are the calls failing?
  Answer: because all circuits are busy on the EAS trunk group to that provider.
- Question: Why are all the circuits busy?
  Answer: because that town is growing and so traffic has been increasing over the past year.
- Question: Why didn’t we notice that traffic was increasing?
  Answer: because no-one monitors our trunk usage statistics.
- Question: Why doesn’t anyone monitor the usage stats?
  Answer: Bob used to do it, but after he retired no-one thought to look at them.
- Question: Why did no-one think to look at the stats?
  Answer: because we had no clear definition of Bob’s duties, and no system to remind people of regular tasks.
In this case the solution might include some or, ideally, all of the following:
- making sure that everyone has clear job descriptions
- making sure there’s a reminder system in place (perhaps automatic emails, or calendar events) so that everyone remembers to take care of the tasks (like checking the usage stats) that need to be done weekly or monthly
- making sure there are alarms in place on each trunk group to notify the NOC if usage ever goes above 75%.
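The last item in the list above can be sketched in a few lines. This is a minimal illustration only: the `TrunkGroup` class, the trunk group names and the circuit counts are all hypothetical, and a real deployment would pull utilization from the switch or monitoring system rather than from hard-coded values.

```python
from dataclasses import dataclass

ALARM_THRESHOLD = 0.75  # 75% utilization, the figure suggested in the list above


@dataclass
class TrunkGroup:
    name: str
    circuits_total: int
    circuits_busy: int

    @property
    def utilization(self) -> float:
        # Guard against an unprovisioned trunk group with no circuits
        if self.circuits_total == 0:
            return 0.0
        return self.circuits_busy / self.circuits_total


def trunk_alarms(groups, threshold=ALARM_THRESHOLD):
    """Return the names of trunk groups whose utilization exceeds the threshold."""
    return [g.name for g in groups if g.utilization > threshold]


# Hypothetical sample data, for illustration only
groups = [
    TrunkGroup("EAS-NextCounty", circuits_total=24, circuits_busy=23),
    TrunkGroup("LD-Primary", circuits_total=48, circuits_busy=12),
]
print(trunk_alarms(groups))  # ['EAS-NextCounty']
```

Run on a schedule and wired to a NOC notification, a check like this would have flagged the growing traffic long before calls started to fail.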
You may also notice that by asking slightly different questions we might equally have ended up discussing how overflow routes are set up in the switch translations and what process is in place for verifying and testing overflow routes to each destination. In a good 5 Whys process you will often identify several different things that went wrong (perhaps several answers to a particular “Why”, or several different questions that could be asked), and by fixing all of them you can significantly reduce the chances of something similar happening in the future.
Implementation and training
This means that by getting up and observing the problem, asking why the problem occurs, and in doing so reserving judgement and criticism, you can gain a better understanding of how to fix the problem.
- Be sure to formalize and describe the problem completely and in writing; doing so will keep your focus on the problem and not on any distractions that you may be exposed to.
- When asking “why,” look only for answers that are supported by observable fact, not mere possibilities; doing so keeps the process systematic and objective rather than based on subjective reasoning.
- Put your questions and answers in writing as complete sentences and statements so that they are clearly understood by all.
- Resist the temptation to oversimplify more complex issues and those that are critical to major operations. In some cases you may not know the answers, in which case you may need to create a hypothesis and test some theories.
As you train your team on Lean processes such as the 5 Whys, it is important to remember that what you really want to promote is a culture of continuous improvement.
Continuous improvement is embodied in the PDCA (Plan, Do, Check, Act) cycle used at Toyota, and Root Cause Analysis is an important component of it. In the “plan” stage the problem is defined; in the “do” stage the root cause analysis (5 Whys) is performed; in the “check” stage the solution is tested; and in the “act” stage it is fully implemented before the cycle starts over.
It is wise to include in the group a variety of people who are knowledgeable about the particular failure in question, including customers in some cases.
If you are training more than 5 people, break them up into teams and assign one problem at a time to each team.
1. Start by defining the problem on a whiteboard or flip chart in the simplest and most accurate terms. The problem should be defined without implying any possible cause and refined by consensus of the team.
2. Ask the first “Why?” in writing, under the problem statement, and write down the answer. The answer should be a single sentence that is factual and in simple terms.
3. Remember that as you drill down into the root cause, you are at first really just identifying symptoms of the problem. If the answer to step 2 does not solve the problem, repeat steps 1 and 2.
4. You can suggest that the question “why” be asked 5 times, but what is most important is that the final answer addresses the root cause rather than merely another symptom. This may not always need 5 iterations, but often it will be more – we use 5 as a guideline to make sure that people are really digging down to the root cause.
5. If you arrive at a likely solution that is agreeable to the team, write that down as your proposed solution to the original problem.
Getting everyone on board
Overcoming the first challenge, resistance, requires both management and employee buy-in. Since the 5 Whys is so simple and cost effective, with very little effort required, getting that buy-in shouldn’t be too hard, especially when you place an emphasis on leadership and on showing respect to those who will be part of the team.
Another challenge is to figure out how to implement the “check” stage of the PDCA cycle, which requires some sort of metric or a way to measure the effectiveness of your solution. As you define the problem and arrive at the solution, be sure to include some sort of self-evident measure of success or failure so you know whether you need to keep working on the problem. In many cases you’ll have concrete evidence that the problem was caused by a certain piece of configuration, in which case you’re fine – but in other cases you may not be entirely sure why the problem occurred and may need to do some lab testing to check your theory before moving forward.
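As one possible measure for the “check” stage, you could compare how many tickets with the same root cause were raised in a period before the fix and in an equal period after it. The sketch below assumes you can count those tickets; the function names and the 50% target are hypothetical choices for illustration, not a recommended standard.

```python
def percent_reduction(before: int, after: int) -> float:
    """Percentage drop in ticket count between two equal-length periods."""
    if before <= 0:
        raise ValueError("need at least one ticket in the 'before' period")
    return 100.0 * (before - after) / before


def fix_is_effective(before: int, after: int, target_pct: float = 50.0) -> bool:
    # target_pct is an assumed success criterion; agree on your own
    # figure when you define the problem.
    return percent_reduction(before, after) >= target_pct


# Hypothetical counts: 20 tickets in the quarter before the fix, 4 after
print(percent_reduction(20, 4))  # 80.0
print(fix_is_effective(20, 4))   # True
```

If the count does not fall as expected, that is a hint that the chain of whys stopped at a symptom and the cycle should start again.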
Finally, once a solution has been confirmed, you can overcome the challenge of implementation by building the solution into your processes so that its steps are not easily missed – so that the processes become “error proof.”
It may seem as though you are adding more work to your processes by adding new requirements and preventative measures, but keep in mind that it takes more effort and resources to correct a problem than it does to prevent one. Every time work needs to be redone, 100% of that effort is wasted, so when you take sufficient time to effectively solve a problem, the waste you eliminate translates to an improvement in the bottom line.
This is a key point: by implementing the 5 Whys process you are making an up-front investment in quality. It WILL require more work to thoroughly investigate your trouble tickets, but over time you’ll see a massive return on that investment, both in less time spent on trouble tickets and in better service quality for your customers.
Is it worth it?
Implementing Lean in any service sector places an emphasis on customer service and problem resolution to increase efficiency and to reduce waste. These efforts should then result in significant reductions in the time it takes to solve problems, and in the number of problems encountered.
When you focus your efforts on taking a “customer first” approach, and create a culture of continuous improvement where everyone has the basic tools to quickly and effectively solve problems, you will find that fewer and fewer problems occur.
Fewer problems LEAD TO happier customers AND a happier team.
Sounds good to me.