This article originally appeared in the 2018 September/October issue of PARCEL.

In our personal lives, it can be uncomfortable to answer the scary “what if?” questions about our cars, our homes, our health, and our lives themselves. But out of respect for Murphy’s Law, we do answer them. Let’s face it: Stuff happens, so most of us buy insurance to protect ourselves and our families from the economic risks associated with those “what if?” events.

Businesses buy insurance too, and most establish policies that address health and safety risks. They comply with safety regulations such as building codes, electrical codes, fire codes, and the standards set in the US by the Occupational Safety and Health Administration (OSHA). Beyond these more obvious health and safety risks, management teams are often so focused on running their businesses that they don’t pay much attention to the operational risks their businesses are exposed to.

Uncovering Operational Risks

Operational risks can seriously hinder the day-to-day operations of a business — and even prevent it from doing business at all. These operational risks are not so obvious, and they are commonly unforeseen — out of sight and out of mind. Yet the value of consistently and proactively identifying and mitigating them flows straight to the bottom line.

A typical distribution center (DC) is filled with operational risks, and these risks can have serious implications. After all, the primary mission of the modern DC is to serve as the backbone of the supply chain. Profits come from customers, so distribution-intensive businesses rely implicitly on their DCs to fulfill their customers’ orders promptly, accurately, and inexpensively. If a DC is unable to perform its vital mission, the business will lose sales, customers, and profits.

An Introduction to FMEA

A business can identify and eliminate operational risks in the DC by applying the principles of Failure Mode and Effects Analysis (FMEA). FMEA is an objective, systematic, iterative approach to mitigating risk in a design, process, product, or service. It identifies and prioritizes potential failures so that users can reduce or eliminate them.

As a practice, FMEA was developed by the US military in 1949. After that, FMEA was adopted commercially by the aerospace, chemical, nuclear, petroleum, and automotive industries. It was used extensively by the aerospace industry and NASA during the Apollo program. In 1988, FMEA was adopted as an ISO 9000 standard. In the 1990s, the Automotive Industry Action Group (AIAG) and the American Society for Quality Control (ASQC) embraced FMEA. In response to FDA requirements, FMEA was also adopted by the medical device industry. More recently, the supply chain industry has begun to understand the value of FMEA.

FMEA differs somewhat from its cousin, FMECA (Failure Mode, Effects, and Criticality Analysis). FMEA and FMECA are both quantitative and use similar approaches and methodologies. But FMEA scores potential risks based on likelihood and impact using ratings, while FMECA scores potential risks based on more extensive probability and statistics.

Put simply, the goal of FMEA is to identify, evaluate, and prioritize possible failures, and then reduce or eliminate them. It’s not as complicated as it sounds. And properly applied, FMEA is highly effective for mitigating risks that can bring a DC to its knees.

Applying FMEA to a DC Operation

The practice of FMEA in a DC involves asking and answering these important questions about each major operation in the DC:

  • What are the steps in each operational process?
  • What could fail at each step (the failure mode)?
  • What could cause it to fail?
  • What is the likelihood that it will fail?
  • What would be the impact (the failure effects) on the business if it fails?
  • Which potential failures present the greatest risks?
  • What controls could prevent each potential failure or reduce its likelihood?
  • What controls could reduce or prevent the impact of each potential failure?
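
One way to keep the answers to these questions organized is a simple worksheet with one row per potential failure. The Python sketch below is illustrative only: the field names, the 1-to-10 rating scales, and the sample entry are assumptions, not part of any FMEA standard or of the method described here.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class FailureRecord:
    """One worksheet row: the answers to the questions above for one failure."""
    operation: str      # major DC operation being analyzed
    step: str           # step in the operational process
    failure_mode: str   # what could fail at this step
    cause: str          # what could cause it to fail
    effect: str         # impact on the business if it fails
    likelihood: int     # assumed scale: 1 (remote) to 10 (near certain)
    impact: int         # assumed scale: 1 (negligible) to 10 (severe)
    preventive_controls: List[str] = field(default_factory=list)
    preparedness_controls: List[str] = field(default_factory=list)

    def __post_init__(self):
        # Reject ratings that fall outside the assumed 1-to-10 scale.
        for name in ("likelihood", "impact"):
            value = getattr(self, name)
            if not 1 <= value <= 10:
                raise ValueError(f"{name} must be between 1 and 10, got {value}")


# Hypothetical entry, for illustration only.
record = FailureRecord(
    operation="Outbound shipping",
    step="Print and apply shipping label",
    failure_mode="Label printer stops printing",
    cause="Print head wears out",
    effect="Cartons cannot ship until the printer is repaired",
    likelihood=4,
    impact=6,
    preventive_controls=["Replace print heads on a schedule"],
    preparedness_controls=["Keep a spare printer on site"],
)
print(record.failure_mode, record.likelihood, record.impact)
```

Keeping the controls on the same row as the failure makes it easier to see which potential failures still have no preventive or preparedness control assigned.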

FMEA can be applied to an operation using either a “bottom up” or “top down” approach. The “bottom up” approach begins with an item, task, part, software code, person, or other detail and methodically rolls up to the subsystem to identify systemic results of failures. The “top down” approach begins with the subsystem and methodically drills down to the item, task, part, software code, person, or other detail to identify the discrete causes of failures.

Which approach is better for a DC? Either approach can be effective, but the “bottom up” approach is typically recommended for evaluating risks associated with individual entities, such as the design of a product. The “top down” approach is typically recommended for holistically evaluating risks associated with complex, interconnected operations, such as those in a DC.

Evaluating and Controlling Potential Operational Risks in a DC

Fundamentally, management can evaluate two distinct attributes of any potential failure in a DC. The first is the failure’s likelihood, which is based on the realistic possibility that the failure can occur. The likelihood measure reflects how probable it is that such a failure would occur, that it would cause harm, and that it would go undetected if it occurred. The second is the failure’s potential impact, which is based on the realistic consequences of the failure if it does occur. The impact measure reflects the failure’s effect on time (such as downtime), on other DC operations, and, most importantly, on the customers served by the DC.

Two types of risk controls go hand in hand with these two attributes: preventive controls and preparedness controls. Preventive controls are designed to reduce a failure’s likelihood, while preparedness controls are designed to reduce a failure’s potential impact. As an example, consider the risk of a house fire. Preventive controls for house fires would include cleaning the chimney, following safe practices when cooking, and limiting the number of electrical items plugged into each receptacle. Preparedness controls would include installing good smoke detectors, changing their batteries, and placing fire extinguishers in strategic locations around the home.

A risk priority number (RPN) should be assigned to each potential failure. The RPN is an objective, composite score calculated from the potential failure’s likelihood and impact. It is used for ranking and prioritizing possible failures. The RPN helps DC management avoid wasting time, money, and resources on less significant risks and focus their attention on higher-priority potential failures.
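
As a rough illustration of how an RPN can be used for ranking, the sketch below multiplies a likelihood rating by an impact rating. The multiplication, the 1-to-10 scales, and the sample failures are all assumptions made for illustration. The article does not prescribe a formula, and many FMEA worksheets instead use the product of three ratings (severity, occurrence, and detection).

```python
from typing import NamedTuple


class Failure(NamedTuple):
    name: str
    likelihood: int  # assumed scale: 1 (remote) to 10 (near certain)
    impact: int      # assumed scale: 1 (negligible) to 10 (severe)


def rpn(failure: Failure) -> int:
    """Composite score; a simple product of the two ratings is assumed."""
    return failure.likelihood * failure.impact


def rank_by_rpn(failures):
    """Return the failures sorted from highest to lowest RPN."""
    return sorted(failures, key=rpn, reverse=True)


# Hypothetical failures and ratings, for illustration only.
failures = [
    Failure("Conveyor sorter motor burns out", likelihood=3, impact=9),
    Failure("Label printer runs out of stock", likelihood=7, impact=2),
    Failure("Warehouse management system outage", likelihood=2, impact=10),
]

for f in rank_by_rpn(failures):
    print(f"{rpn(f):>3}  {f.name}")
```

With these made-up ratings, the frequent but minor label problem ranks last, even though it has the highest likelihood, because its impact is low.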

An Effective Operational Risk-Mitigation Program

When management is committed to evaluating and mitigating DC risks, they should establish an effective risk assessment and management program that follows these eight steps:

  1. Identify critical subsystems (processes, equipment, technologies, inventory, and people) within each DC operation.
  2. Apply FMEA comprehensively and objectively, following a “top down” approach.
  3. Identify potential failures within each subsystem, especially potential single points of failure.
  4. Assign an RPN score to each potential failure.
  5. Evaluate and prioritize all potential failures based on their relative RPN scores.
  6. Develop improved preventive and preparedness controls for the high-priority potential failures.
  7. Implement the improved controls in the DC operation.
  8. Repeat steps 1 through 7 periodically, as the operation evolves.

If the likelihood of a failure is high, then DC management should take the following steps:

  • Evaluate and eliminate causes of the failure.
  • Improve other DC processes that contribute to the causes of failure.
  • Install preventive controls that reduce or prevent failures.
  • Add redundancy to DC operations.
  • Develop and maintain standard operating procedures (SOPs) for preventing possible failures within the DC.

On the other hand, if the potential impact of a failure is high, then DC management should take the following steps:

  • Identify correlated, leading events that can serve as “warning flags.”
  • Add a step to an earlier event’s process to prevent the failure.
  • Use event management tools such as alarms and notifications to proactively alert users when conditions in the DC indicate a possible or imminent failure.
  • Train DC staff to recognize the “warning flags” and monitor event management tools, so they can intervene proactively.
  • Install preparedness controls that reduce or prevent the consequences of failures.
  • Design workarounds for DC operations.
  • Develop and maintain SOPs for dealing with possible failures within the DC, in the event they do occur.
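
The two cases above can be read as a simple decision rule. The sketch below is one hypothetical way to encode it. The cutoff of 7 on an assumed 1-to-10 scale and the function name are illustrative choices, not values given in this article.

```python
HIGH = 7  # assumed cutoff on a 1-to-10 rating scale


def recommended_controls(likelihood: int, impact: int, high: int = HIGH) -> list:
    """Map a failure's ratings to the types of controls worth developing."""
    controls = []
    if likelihood >= high:
        # High likelihood: reduce the chance that the failure occurs.
        controls.append("preventive")
    if impact >= high:
        # High impact: reduce the consequences if the failure does occur.
        controls.append("preparedness")
    return controls


print(recommended_controls(likelihood=8, impact=3))  # ['preventive']
print(recommended_controls(likelihood=2, impact=9))  # ['preparedness']
print(recommended_controls(likelihood=8, impact=9))  # ['preventive', 'preparedness']
print(recommended_controls(likelihood=2, impact=3))  # []
```

A failure that rates high on both measures calls for both sets of steps, while one that rates low on both can usually wait until higher-priority failures have been addressed.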

Typical Results

The outcomes of effective risk mitigation in DC operations can be highly valuable to the business and its customers. By using a standardized approach to mitigating risk based on FMEA and by developing comprehensive SOPs, the DC staff will develop a culture of proactive problem prevention rather than reactive problem solving. They will also maintain a living record of potential failures and associated controls, resulting in improved reliability, reduced operating costs, and reduced losses in the DC, as well as greater customer satisfaction.

Steve Hopper is Founder and Principal of Inviscid Consulting. He will be speaking on this topic on Wednesday, September 26 at the 2018 PARCEL Forum.