Disaster Planning

When Failure is Not an Option

Article Posted: August 26, 2011

It’s been a little over a year since one of the worst environmental disasters in U.S. history occurred in the Gulf of Mexico, killing eleven people and releasing millions of gallons of crude oil into the Gulf. The circumstances that led up to that fateful event are a study in how process and equipment failures can lead to tragic losses.

You’re probably thinking, “What does any of this have to do with managing my vivarium?” The answer is plenty! Many people either don’t realize it or take it for granted, but adequately housing and maintaining laboratory animals involves complex processes and very specialized equipment, where failures can be inconvenient at best and catastrophic at worst. This is especially true as technology and robotics become more prevalent in everyday vivarium operations.

So how do other businesses identify potential failures within their operations and mitigate the risks associated with things like equipment breakdowns, utility and supply chain disruptions, and facility issues? The answer is a methodology called “Process Failure Mode and Effects Analysis,” or PFMEA, and it has been commonly used in other industries for decades.

Background of PFMEA
PFMEA as a methodology was first developed by the U.S. Department of Defense in 1949 to assess the potential for battlefield failures and reduce war casualties. Later, NASA adopted the same methodology to mitigate the risks associated with manned space flight.

The first commercial application of PFMEA was by Ford Motor Company in the 1970s to eliminate design and manufacturing errors in automobile production. Since then, it’s been used in such diverse industries as healthcare, banking, aerospace, and transportation, to name a few. Today, PFMEAs are required by certain regulatory agencies for healthcare and by corporate mandate for such operations as automobile manufacturing.

Exactly what is a PFMEA and how is it used? Basically, it is an analytical approach to understanding how and why things can fail and what effect the various types, or “modes,” of failure will have on a process or operation. Most importantly, it is a way to evaluate the risk of a failure and decide what to do about it before it happens.

The approach is quite simple and follows the PFMEA form shown in Figure 1. It is important to note that a failure mode describes a way in which the process could fail to meet its intended requirements. It doesn’t necessarily mean that something is broken; it could be that a step in the process was skipped or not done properly. The potential effect is the direct result of a failure and is described in terms of how the failure would affect the system.

As you’d guess with most things born out of an engineering exercise, the PFMEA uses a relatively straightforward equation that yields a Risk Priority Number (RPN), which helps prioritize the failure modes that pose the greatest risk to the process being analyzed. The RPN formula is as follows:

RPN = Severity (SEV) x Occurrence (OCC) x Detection (DET)

  • The Severity (SEV) score assesses the seriousness of the effect(s) of the failure on the operation. Scoring ranges from 1 to 10, with a higher score indicating greater severity.
  • The Occurrence (OCC) score is the predicted frequency of the failure and represents the failures anticipated during normal operation. Scoring ranges from 1 to 10, with a higher score indicating a higher probability of occurrence.
  • The Detection (DET) score ranks the probability that the controls within the process will detect the non-conformance. Scoring ranges from 1 to 10, with a higher score indicating a lower likelihood of detection.

Eventually, as an organization uses the PFMEA process, it will begin to develop its own criteria for ranking severity, occurrence, and detection, specific to that organization or to a department within it. An RPN greater than 200 represents significant risk, and it is likely that some preventive action should be taken, typically by reducing the frequency of occurrence or improving the method of detection.
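To make the arithmetic concrete, the RPN calculation and the 200-point action threshold can be sketched in a few lines of Python. This is only an illustration, not a standard tool: the FailureMode class, the prioritize and needs_action helpers, the example failure modes, and every score below are hypothetical, and the sketch assumes each score is an integer on a 1 to 10 scale.

```python
from dataclasses import dataclass

# Threshold taken from the article; an organization may choose its own.
RPN_ACTION_THRESHOLD = 200


@dataclass
class FailureMode:
    """One row of a PFMEA form (hypothetical structure for illustration)."""
    description: str
    severity: int    # SEV: higher means a more serious effect
    occurrence: int  # OCC: higher means the failure is more likely
    detection: int   # DET: higher means the failure is LESS likely to be caught

    def __post_init__(self):
        # Assumption: every score is an integer from 1 to 10.
        for name in ("severity", "occurrence", "detection"):
            score = getattr(self, name)
            if not 1 <= score <= 10:
                raise ValueError(f"{name} must be between 1 and 10, got {score}")

    @property
    def rpn(self) -> int:
        # RPN = SEV x OCC x DET
        return self.severity * self.occurrence * self.detection


def prioritize(modes):
    """Return failure modes sorted from highest to lowest RPN."""
    return sorted(modes, key=lambda m: m.rpn, reverse=True)


def needs_action(mode, threshold=RPN_ACTION_THRESHOLD):
    """True when the RPN exceeds the action threshold."""
    return mode.rpn > threshold


# Made-up vivarium failure modes with made-up scores.
modes = [
    FailureMode("Automated watering line loses pressure", 7, 3, 4),
    FailureMode("Room HVAC fails overnight", 8, 4, 7),
    FailureMode("Cage wash cycle skipped", 5, 5, 3),
]

for mode in prioritize(modes):
    flag = "ACTION" if needs_action(mode) else "monitor"
    print(f"{mode.rpn:4d}  {flag:<8}{mode.description}")
```

In this made-up example, only the overnight HVAC failure (8 x 4 x 7 = 224) exceeds 200. Improving detection, for instance lowering DET from 7 to 3 with a hypothetical remote temperature alarm, would bring its RPN down to 96.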

Figure 1: PFMEA form
