Root Cause Analysis Explained: Definition, Examples, and Methods
People often use root cause analysis in everyday situations as one of many analysis methods without realizing it.
A doctor essentially performs a root cause analysis on a patient to determine the causal factors behind an illness’s symptoms. Similarly, a technician identifies the source of the problem, or its causal factors, when a vehicle or piece of machinery is not performing as expected.
Although a company’s under-performance can be caused by different potential problems, a root cause analysis will determine the reason.
Once the cause of the problem has been identified, the problem itself can be corrected, after which corrective action can be implemented to prevent it from occurring again. Companies often implement a corrective action tracking process to manage RCAs.
Definition of Root-Cause Analysis
Root-cause analysis (RCA) is a systematic process for investigating an issue. It uses proven techniques to gather information about the problem, identify its causes (often more than one), prioritize them, and determine potential solutions. It can be used in virtually all industries, from software development and IT to consumer goods and manufacturing, as well as the provision of services.
A root cause is an action or event that leads to a nonconformance and has to be removed permanently, often by improving a process. The root cause, or core issue, is what began the chain of events that ultimately led to the problem. It is also the underlying reason.
Root cause analysis is a broad term that covers several procedures, analysis tools, and methodologies that are commonly used to identify an issue’s root causes. Although RCA is used in ISO 9001 to help resolve non-conformances, it is also often used in IT sectors such as infrastructure management, DevOps, and software development where the company is not necessarily ISO certified.
Because RCA is part of nonconformance reporting in ISO 9001, nonconformance software contains a section where the RCA can be recorded.
Root Cause Analysis Goals
The goal of RCA is to identify an issue’s fundamental source by using a specific set of processes and their tools. This will allow you to:
- Determine the circumstances under which the issue occurred.
- Carefully analyze the issue’s cause.
- Implement solutions to prevent the issue from happening again (corrective action).
RCA presumes system and event interdependence. This means that a specific action will lead to a chain of consequent actions in related areas. By going back through these actions, you can find the origin of the problem and how it developed into the issue that ultimately caused the non-conformance.
The root cause analysis procedure is reactive and it is carried out after an occurrence of a problem. Once a root cause analysis has been performed, it turns into a proactive mechanism as it may be used to identify future issues before they occur.
If only the symptom of an issue is addressed and the underlying cause is not, there is a very good chance the failure will happen again.
One example is replacing a broken belt on a piece of machinery. An RCA may reveal that the underlying cause is a misaligned component that made the belt overheat and break. If only the belt is replaced and the misaligned component is left in place, the belt will fail again sooner rather than later.
RCA attempts to identify the root cause that, when addressed, will eliminate all the subsequent defects by tracing the causal chain of events from one problem to the next. This process can be applied to IT infrastructure, physical systems, as well as business processes.
Root Cause Analysis Techniques
There are several root cause analysis techniques available depending on the type of issue that needs to be resolved.
· Production-based RCA: This approach is typically used by manufacturers to improve quality control. You might use it, for example, to identify the cause of warped injection-molded plastic parts leaving a production line.
· Safety-based RCA: This approach combines the fields of workplace health and safety with accident analysis. This type of root cause analysis is used to identify the underlying cause of accidents in the workplace, such as why someone cut themselves or why a worker fell from a height.
· Failure-based RCA: This is used in maintenance and engineering to identify the main reasons why equipment fails.
· Process-based RCA: In manufacturing as well as in business, process-based RCA can be used to identify any process or system problems. One example would be to use this in accounting to determine why suppliers aren’t paid on time.
· Systems-based RCA: This technique combines two or more of the RCA techniques described above. It can be used for many different applications.
Effective Root Cause Analysis
The fundamental steps of an effective root cause analysis process are described below:
Problem Identification
The adage “A problem described properly is a problem half solved” is frequently used in the problem-solving community. An accurate problem description can help to focus and advance the diagnosis.
Before the problem is defined, it is worth deliberating whether it is important enough to work on compared with other problems, and whether its scope is constrained enough to permit analysis with a high signal-to-noise ratio. In other words, first evaluate whether the RCA will yield enough useful information relative to the amount of information that is of no use.
If you apply a wide focus that includes several similar problems that each have a distinct cause, a poor signal-to-noise ratio will make it difficult to differentiate between cause-and-effect linkages. It may therefore be necessary to apply a filter that separates issues that have to be fixed from those that only need to be tracked before you implement solutions and take corrective action.
This screening will however not be required for all problems identified in an organization.
It is also crucial to analyze a company’s other issues before looking at a problem and to decide which ones require resources to be reallocated to address the situation. This will require examining the relative frequency of the problem, the cost associated with the non-conformance, hazards, and opportunity costs.
Understanding The Procedure
Once they have identified the problem, many businesses skip the crucial step of performing a change analysis or evaluating the procedures that may have failed. Instead, they form an intuitive or hasty conclusion about where the issue was likely first encountered.
This results in many other possible explanations never even being considered. Taking a step back and looking at the overall issue before starting to focus on potential reasons is key to fully understanding the procedure. In cases where a problem was previously believed to have been solved but has now resurfaced, this is especially helpful. To begin understanding the procedure, a set of parameters for the diagnostic must be developed as the first problem-solving step.
Potential Cause Determination
This is the core of the RCA. In this stage, you reconstruct a chronology of events that will help you in identifying the specific event or events that contributed to the issue and any other issues that coexist with it. This technique must be used to identify specific causative elements.
There are 3 ways to identify potential causes:
(1) View each step on a flowchart as a possible cause.
(2) To find potential causes at a systems level, use a logic tree.
(3) Use a cause-and-effect diagram to generate a list of likely causes.
Data Collection
The aim of collecting data is to determine whether there is a correlation between two variables, A and B. Parameter B is the result of a procedure and is what the problem statement refers to. There are often several candidate A factors, each of which is believed to influence B.
The data-gathering method is aimed at working through the various factors to determine which one(s) contributed to the problem. This often means having to determine which entity led to the issue before determining the condition or state of that entity. If the issue is widespread, data can be layered (stratified) in different ways to identify patterns that may increase the probability that a specific factor did or did not cause the issue.
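As a minimal sketch of layering data, assume a handful of inspection records that each carry two candidate A factors (shift and machine) and the outcome B (whether the part was defective). The records, the field names, and the `defect_rate_by` helper are all hypothetical and only illustrate the idea.

```python
from collections import defaultdict

# Hypothetical inspection records: candidate A factors plus the outcome B.
records = [
    {"shift": "day", "machine": "M1", "defective": False},
    {"shift": "day", "machine": "M2", "defective": True},
    {"shift": "night", "machine": "M1", "defective": False},
    {"shift": "night", "machine": "M2", "defective": True},
    {"shift": "day", "machine": "M1", "defective": False},
    {"shift": "night", "machine": "M2", "defective": True},
    {"shift": "day", "machine": "M2", "defective": False},
    {"shift": "night", "machine": "M1", "defective": False},
]


def defect_rate_by(factor, rows):
    """Return the share of defective rows for each value of one factor."""
    totals = defaultdict(int)
    defects = defaultdict(int)
    for row in rows:
        totals[row[factor]] += 1
        defects[row[factor]] += int(row["defective"])
    return {value: defects[value] / totals[value] for value in totals}


# Layer the same data two different ways and compare the patterns.
by_shift = defect_rate_by("shift", records)
by_machine = defect_rate_by("machine", records)
print("By shift:  ", by_shift)
print("By machine:", by_machine)
```

In this made-up data set the machine layer separates the groups far more sharply than the shift layer, which would point the investigation toward machine M2. With so few records that is only a hint, not proof of causation.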
Analyzing The Data
Analyzing the data aims at finding causative factors and the core causes of those causal variables. This will help in determining the root cause(s) for each causative element. If this procedure misses a causative element, its fundamental causes will be overlooked later. The primary goals of data and systems analysis are to assess relevance, organize the data, and use the result to develop a model of the origin of the problem.
Recommendations
Recommendations are developed after data analysis has been completed and the root causes have been discovered and form part of process improvement. Implementing a proposal should eliminate the causal element and fundamental underlying reasons for the non-conformance. Implementing the recommendations should therefore prevent the series of actions that led to the non-conformance.
This should stop the incident and its underlying causes from recurring. Only recommendations that are implemented and can afterward be shown to have been successful will benefit an organization. Recommendations must therefore be applicable, doable, and attainable.
Root-Cause Analysis Methods
Various methods are commonly used to perform root cause analysis.
5 WHYs Analysis
You can use the 5 Whys analysis to further explore until you find the actual root cause of an issue. When a problem is caused by more than one reason, you can split the process, or you can ask why repeatedly until the core cause has been determined.
You can organize the Whys into boxes like a fault tree or flowchart, although that’s not required. When using the 5 Whys analysis, the ultimate cause should follow the proximate cause.
Remember these points when using the 5 whys for conducting root cause analysis:
- The real root cause of an issue can usually be found by asking why about five times.
- The 5 whys root cause analysis technique may be used together with other quantitative approaches which may be more accurate.
- You can use the 5 Whys analysis to investigate incidents as well as failures.
5 Whys Examples
5 Whys Example 1
Problem statement – A customer refused to pay a progress payment during a big project.
- Why did the customer refuse to pay? Because the activity was completed late.
- Why was the activity completed late? Because it took longer than we thought it would.
- Why did the activity take longer? Because too little material was bought.
- Why was too little material bought? Because we didn’t place the order on time.
- Why was the order not placed on time? Because the work schedule was not analyzed properly before the start of the project.
The root cause of the problem is that the work schedule was not analyzed properly before the start of the project.
Some possible corrective actions would be to improve communication channels within the project team or to set up a project checklist that includes a timely work schedule analysis.
5 Whys Example 2
Problem statement – a piece of machinery in a production line stopped working.
- Why did the machine stop? Because a fuse blew due to the electrical circuit overloading.
- Why did the circuit overload? Because the bearings had insufficient lubrication, causing them to seize.
- Why did the bearings not have enough lubrication? Because the machine’s oil pump did not circulate enough oil.
- Why did the pump not circulate enough oil? Because the intake of the pump was clogged with debris.
- Why is the pump intake clogged with debris? Because the pump does not have a filter.
The root cause of the problem is that there is no filter on the pump.
A possible corrective action is to fit a filter to the pump’s intake.
Note: Although asking the question “why?” 5 times with this method will generally lead you to the root cause of the problem, this rule is not cast in stone. In some cases, you will get to the root cause with fewer questions, while in others you may need more.
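A 5 Whys chain can also be recorded as simple data, which makes it easy to store alongside a nonconformance report. The sketch below uses the questions and answers from Example 2; the `why_chain` list and the `root_cause` helper are illustrative names, not part of any standard tool, and the chain may be shorter or longer than five steps.

```python
# Each entry pairs a "why" question with its answer (taken from Example 2).
why_chain = [
    ("Why did the machine stop?",
     "A fuse blew because the electrical circuit overloaded."),
    ("Why did the circuit overload?",
     "The bearings had insufficient lubrication and seized."),
    ("Why did the bearings not have enough lubrication?",
     "The oil pump did not circulate enough oil."),
    ("Why did the pump not circulate enough oil?",
     "The pump intake was clogged with debris."),
    ("Why is the pump intake clogged with debris?",
     "The pump does not have a filter."),
]


def root_cause(chain):
    """Return the final answer in the chain, which the method treats as the root cause."""
    if not chain:
        raise ValueError("the chain needs at least one why/because pair")
    return chain[-1][1]


# Print the chain from the proximate cause down to the ultimate cause.
for depth, (question, answer) in enumerate(why_chain, start=1):
    print(f"{depth}. {question} {answer}")
print("Root cause:", root_cause(why_chain))
```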
Ishikawa diagram
The Ishikawa diagram, developed by Kaoru Ishikawa and also known as a cause-and-effect diagram or fishbone diagram, is used to illustrate the probable causes that may result in a specific effect. It is often helpful for stimulating brainstorming when a problem’s likely cause is not certain.
When an Ishikawa diagram is used, the 6 Ms (measures, man, milieu, material, machines, and methods) need to be considered. These categories can and should be adapted to match specific procedures. They may, for example, be altered if the fundamental cause has already been narrowed down to a specific area. The affected process may, however, still be influenced by other contributing factors, such as material or measurement, so those aspects may still be relevant to the root cause.
Pareto chart
The Pareto principle is commonly known as the 80/20 rule. This rule states that roughly 20% of the issues cause roughly 80% of the expenses. A Pareto chart can be used to identify problems where improvement will be the most beneficial. The main application of this tool is to prioritize issues.
The Pareto principle should be seen as a suggestion and not a law. You should for example solve occasional safety problems with a big potential impact before you solve more prevalent ones where the consequences are less severe.
A Pareto chart can be used as part of the continuous improvement process for failure costs, failure locations, failure types, or other categories as required.
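The arithmetic behind a Pareto chart is a sorted list with a running cumulative percentage. The following is a minimal sketch using made-up failure costs; the category names, the figures, and the 80% threshold in `vital_few` are assumptions chosen for illustration.

```python
# Hypothetical failure costs per category (illustrative figures only).
failure_costs = {
    "misaligned parts": 4200,
    "late material": 2600,
    "operator error": 1500,
    "software fault": 900,
    "packaging damage": 500,
    "other": 300,
}


def pareto_table(costs):
    """Sort categories by cost, largest first, and add the cumulative share in percent."""
    total = sum(costs.values())
    running = 0
    rows = []
    for category, cost in sorted(costs.items(), key=lambda item: item[1], reverse=True):
        running += cost
        rows.append((category, cost, round(100 * running / total, 1)))
    return rows


def vital_few(costs, threshold=80.0):
    """Return the top categories needed to reach the threshold share of total cost."""
    selected = []
    for category, _cost, cumulative in pareto_table(costs):
        selected.append(category)
        if cumulative >= threshold:
            break
    return selected


for category, cost, cumulative in pareto_table(failure_costs):
    print(f"{category:<18} {cost:>6} {cumulative:>6}%")
print("Prioritize:", vital_few(failure_costs))
```

In this example three of the six categories account for 83% of the total cost, so they would be the first candidates for improvement. The split will rarely be exactly 80/20 in real data.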
Fault Tree Analysis
Fault tree analysis (FTA) is used in incident investigations to find the root causes of system failures. In fault tree analysis, risks are ranked in importance, and this allows the most significant issues to be fixed first. It uses a top-down strategy to identify the component-level failures (basic events) that result in system-level failures (top events) and combines them with Boolean logic.
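The Boolean logic of a fault tree can be sketched with AND and OR gates over basic events. The tree below is hypothetical: the event names and the tuple-based tree format are assumptions made for this example, and real FTA tools usually add probabilities and further gate types.

```python
def evaluate(node, failed):
    """Return True if the event described by node occurs, given the failed basic events."""
    if isinstance(node, str):  # a basic event (leaf of the tree)
        return node in failed
    gate, children = node
    results = [evaluate(child, failed) for child in children]
    if gate == "AND":
        return all(results)
    if gate == "OR":
        return any(results)
    raise ValueError(f"unknown gate: {gate}")


# Hypothetical top event "machine stops": it occurs if power is lost,
# OR if the primary pump AND the backup pump both fail.
machine_stops = (
    "OR",
    [
        "power_loss",
        ("AND", ["primary_pump_fails", "backup_pump_fails"]),
    ],
)

print(evaluate(machine_stops, {"primary_pump_fails"}))
print(evaluate(machine_stops, {"primary_pump_fails", "backup_pump_fails"}))
```

A single pump failure does not trigger the top event in this tree because of the AND gate, while losing both pumps, or losing power alone, does.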
Failure mode and effects analysis (FMEA)
This proactive root cause analysis method identifies probable equipment or system failure.
FMEA diagrams typically show:
- Failure causes, effects, and scenarios.
- The safeguards that have been implemented against every type of failure.
- The ratings for severity, occurrence, and detection (SOD), which are used to determine the risk priority number and the next course of action.
FMEA combines reliability engineering, safety engineering, and quality control efforts. It analyzes past data to forecast failures and flaws in the future.
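The risk priority number (RPN) is conventionally calculated by multiplying the three SOD ratings. The sketch below assumes each rating is on a 1 to 10 scale, which is a common convention; the failure modes and their ratings are invented for illustration.

```python
from dataclasses import dataclass


@dataclass
class FailureMode:
    name: str
    severity: int    # 1 (negligible) to 10 (most severe)
    occurrence: int  # 1 (rare) to 10 (very frequent)
    detection: int   # 1 (almost certain to be detected) to 10 (unlikely to be detected)

    def rpn(self):
        """Risk priority number: severity x occurrence x detection."""
        for rating in (self.severity, self.occurrence, self.detection):
            if not 1 <= rating <= 10:
                raise ValueError("ratings are expected on a 1-10 scale")
        return self.severity * self.occurrence * self.detection


# Hypothetical failure modes with assumed ratings.
modes = [
    FailureMode("belt breaks", severity=7, occurrence=4, detection=3),
    FailureMode("pump intake clogs", severity=8, occurrence=5, detection=6),
    FailureMode("fuse blows", severity=5, occurrence=3, detection=2),
]

# The highest RPN is addressed first.
ranked = sorted(modes, key=lambda mode: mode.rpn(), reverse=True)
for mode in ranked:
    print(f"{mode.name:<20} RPN = {mode.rpn()}")
```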
Scatter diagrams
Scatter diagrams, also known as scatter plots, are used to analyze potential associations between paired data, for example the temperature used to heat-treat steel and the resulting hardness of the steel.
Using a scatter plot, analysts can see whether there is no correlation, a weak correlation, or a strong correlation, and whether the relationship is positive or negative.
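The strength of the relationship seen in a scatter diagram is often summarized with the Pearson correlation coefficient. The sketch below computes it from scratch for hypothetical temperature and hardness readings; the data values and the 0.3 and 0.7 cut-offs in `describe` are assumptions for illustration, not a standard.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation coefficient for two equally long lists of numbers."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two lists of equal length with at least two points")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    covariance = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined when one variable is constant")
    return covariance / math.sqrt(var_x * var_y)


def describe(r, weak=0.3, strong=0.7):
    """Label the correlation using illustrative thresholds (assumed, not standard)."""
    size = abs(r)
    if size < weak:
        return "little or no correlation"
    strength = "strong" if size >= strong else "weak"
    direction = "positive" if r > 0 else "negative"
    return f"{strength} {direction} correlation"


# Hypothetical paired readings: process temperature and resulting hardness.
temperatures = [800, 820, 840, 860, 880]
hardness = [52, 54, 55, 58, 60]

r = pearson_r(temperatures, hardness)
print(f"r = {r:.2f}: {describe(r)}")
```

A high coefficient only shows that the two variables move together. It does not by itself show that one causes the other, so the result should feed back into the causal analysis rather than replace it.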
Hazard and operability analysis (HAZOP)
This approach is a systematic risk management and system analysis tool.
The HAZOP technique is often used to detect operability issues and possible risks in systems that may lead to nonconforming goods. HAZOP is based on the theory that deviations from operating or design goals result in risk occurrences.
The Challenger Interview
Much like the 5 Whys analysis, the Challenger Interview method for root cause analysis emphasizes asking why repeatedly. However, it does not try to figure out what happened, but rather focuses on the question “Why does it matter?” The Challenger Interview reveals people’s underlying ambitions and motivations, which helps to identify the actual problem, opportunity, or challenge that should be solved.
Role-playing
The goal of role-playing is to take on another person’s role. Deep insights into the underlying root causes of an issue can be provided by understanding another person’s viewpoint. This person may be someone whose issue you need to fix or a potential user.