Benefits of a Failure Mode Approach to Reliability-Centered Maintenance

The core of reliability-centered maintenance (RCM) analysis lies in the answer to seven questions:

  1. What are the functions and associated desired standards of performance of the asset in its present operating context (functions)?
  2. In what ways can the asset fail to fulfill its functions (functional failures)?
  3. What causes each functional failure (failure modes)?
  4. What happens when each failure occurs (failure effects)?
  5. In what way does each failure matter (failure consequences)?
  6. What should be done to predict or prevent each failure (proactive tasks and task intervals)?
  7. What should be done if a suitable proactive task cannot be found (default actions)?
NOTE: For question six, inspection techniques that adequately identify defects during normal operation are preferred over those that require downtime. Less invasive actions are preferred over more invasive ones.

This is one of the fundamental concepts of any well-defined maintenance strategy.

Question three introduces the term failure mode. A failure mode is the condition that causes a functional failure. Another way to think of it is simply:

Failure mode = part + what is wrong with the part + the reason

The "part" is a group of pieces that make up a component. Examples from the ISO standard include impeller, seal, shaft, bolt, nut, and bearing.

"What is wrong with the part" refers to the effect of the failure mechanism. Examples include failed, damaged, out of adjustment, abrasion, erosion, corrosion, fatigued, burnt, and broken.

The "reason" is the physical cause of the problem. This could be age, improper lubrication, misalignment, imbalance, or improper installation, among others.

For example:

One month, a misalignment condition is noted on a pump that is directly coupled to an electric motor. The pump was recently replaced due to wear on the impeller. The post-maintenance follow-up by the vibration analyst revealed that the alignment was performed improperly. The motor-pump combination is now running in a misaligned condition. The failure mode noted on the vibration analyst’s exception report would be Shaft – Misalignment – Improper Installation.

Just two months later, the misalignment condition having never been corrected, the vibration analyst now detects an outer race defect on the pump bearing. The failure mode noted on the exception report is now Bearing – Fatigued – Shaft Misalignment.
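The part + what is wrong + reason structure maps naturally onto a small record type. The following is a minimal sketch, not a format the source prescribes: the `FailureMode` class, its field names, and the `follows_from` helper are hypothetical names chosen for illustration, and the text match inside `follows_from` is only a rough heuristic.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FailureMode:
    """Failure mode = part + what is wrong with the part + the reason."""
    part: str
    condition: str  # what is wrong with the part
    reason: str     # physical cause of the problem

    def label(self) -> str:
        # Same "Part – Condition – Reason" form used on the exception report.
        return f"{self.part} – {self.condition} – {self.reason}"


def follows_from(later: FailureMode, earlier: FailureMode) -> bool:
    """Rough heuristic (an assumption): the later reason names the earlier defect."""
    reason = later.reason.lower()
    return earlier.part.lower() in reason and earlier.condition.lower() in reason


# The two findings from the pump example above.
first_report = FailureMode("Shaft", "Misalignment", "Improper Installation")
second_report = FailureMode("Bearing", "Fatigued", "Shaft Misalignment")

print(first_report.label())
print(second_report.label())
print(follows_from(second_report, first_report))
```

Recording failure modes in a consistent structure like this is one way to make the chain from the uncorrected misalignment to the bearing defect easy to trace during an RCFA.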

This example should help demonstrate how condition monitoring programs allow for the acceleration of root cause failure analysis (RCFA). Having the vibration analysis reports to review during the RCFA makes the process much faster because documented proof is readily available.

The situation above is a typical example of defects driving up maintenance labor and material costs while increasing unplanned (or even planned) downtime. Planned downtime increases because replacing the pump bearings takes significantly longer than properly aligning the shafts, which was the first defect noted.

This example also builds an excellent case for procedure-based maintenance and improved craft skills. Had the craftsmen aligned the shafts properly upon replacing the impeller, none of this would have occurred. This is also an excellent example of why the RCM analysis team should include a condition-based maintenance (CBM) specialist.

NOTE: The list of failure modes covered in an RCM analysis need not be an exhaustive list. It should only include the predominant failure modes that represent the failures that have previously occurred and the failures that are very likely to happen. This philosophy is decidedly different from the original system laid out by Nowlan and Heap, which John Moubray later formalized.

The biggest advantage of RCM is that the analysis team, and by extension the organization, begins to think in terms of failure modes. They realize that a typical maintenance program contains a myriad of non-value-added tasks that not only waste valuable craft time but also, through intrusive inspections, increase the chance of infant mortality problems such as improper reassembly or lubricant contamination.

The analysis team also gains a clear set of guidelines for determining which tasks are to remain a part of the maintenance strategy. Any task that is to remain in the maintenance program must meet at least one of the following criteria:

  1. Task prevents a failure mode from occurring.
    • These are the most powerful tasks. Anything that can be done to prevent a failure mode from occurring and not simultaneously create more risk is the best answer.
    • Examples include inspecting and calibrating a meter to prevent equipment malfunction and lubricating a motor bearing to prevent damage.
  2. Task detects failure modes once they have occurred.
    • These are the most prevalent. Remember, operating inspections are most preferred because they require no downtime. These are also known as failure-finding tasks.
    • One type of failure-finding task is an inspection to check for the functional failure of a hidden-function component, a failure that is not, or may not be, evident to the operating crew during normal operation. A classic example is the testing of an emergency stop (E-Stop) on a machine. This can only be tested by activating the E-Stop, which is not normally done during the operation of the machine.
  3. Task is statutory or regulatory in nature (i.e., required by federal or state agencies for environmental, health, and safety reasons).
  4. Task is administrative in nature.
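The four criteria above amount to a simple screening rule: a task stays only if it meets at least one of them. A minimal sketch of that rule follows. The `Criterion` and `PMTask` names are hypothetical, and the third task is an invented example of a task that meets none of the criteria.

```python
from dataclasses import dataclass
from enum import Enum


class Criterion(Enum):
    PREVENTS_FAILURE_MODE = "prevents a failure mode from occurring"
    DETECTS_FAILURE_MODE = "detects a failure mode once it has occurred"
    STATUTORY_OR_REGULATORY = "statutory or regulatory in nature"
    ADMINISTRATIVE = "administrative in nature"


@dataclass(frozen=True)
class PMTask:
    description: str
    criteria: frozenset = frozenset()  # the criteria this task meets, if any

    def is_value_added(self) -> bool:
        # A task remains in the program only if it meets at least one criterion.
        return len(self.criteria) > 0


tasks = [
    PMTask("Lubricate motor bearing",
           frozenset({Criterion.PREVENTS_FAILURE_MODE})),
    PMTask("Test emergency stop (E-Stop)",
           frozenset({Criterion.DETECTS_FAILURE_MODE})),
    # Hypothetical task that meets none of the four criteria.
    PMTask("Open gearbox cover and look inside"),
]

keep = [t.description for t in tasks if t.is_value_added()]
review = [t.description for t in tasks if not t.is_value_added()]
print("Keep:", keep)
print("Non-value-added:", review)
```

Deciding which criteria a task actually meets remains a judgment for the analysis team; the code only records and applies that judgment.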

This failure-mode style of thinking quickly separates the value-added from the non-value-added. Additionally, this type of analysis need not be limited to the creation of an equipment maintenance plan (EMP); it can also be applied to the redesign of an existing EMP.

Performing the preceding analysis on an already existing preventive maintenance (PM) strategy allows for the non-value-added tasks to be removed from the PM program and either deleted or reassigned to more appropriate personnel. It also calls out which PM tasks need to be kept and which ones need to be cleaned up in terms of wording and formatting to create a more quantitative, repeatable procedure. This exercise is called a preventive maintenance evaluation (PME).

A PME can be done in one of two ways. A sample PME can be done at the beginning of a reliability improvement initiative to build some momentum around the types of changes possible and start to define the size of the changes that could happen. This is typically performed on 200-300 PM tasks that are deemed to be representative of the entire PM program. The PM tasks should be selected from across 20-25 different equipment types in the plant and from a combination of monthly, quarterly, and annual PM tasks.
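One way to pull a representative sample is to group the PM library by equipment type and frequency and draw evenly from each group. The sketch below assumes a PM library available as (task ID, equipment type, frequency) tuples; the `select_sample` function, the generated library, and the sample size of 240 are illustrative assumptions, not a method the source specifies.

```python
import random
from collections import defaultdict


def select_sample(tasks, sample_size, seed=0):
    """Draw tasks evenly across (equipment type, frequency) groups."""
    rng = random.Random(seed)  # fixed seed so the selection is repeatable
    strata = defaultdict(list)
    for task in tasks:
        _, equipment_type, frequency = task
        strata[(equipment_type, frequency)].append(task)
    for group in strata.values():
        rng.shuffle(group)

    sample = []
    # Round-robin: take one task per group per pass until the sample is full
    # or every group is exhausted.
    while len(sample) < sample_size and any(strata.values()):
        for key in sorted(strata):
            if strata[key] and len(sample) < sample_size:
                sample.append(strata[key].pop())
    return sample


# Hypothetical library: 20 equipment types x 3 frequencies x 10 tasks each.
frequencies = ["monthly", "quarterly", "annual"]
library = [
    (f"PM-{e:02d}-{f[0].upper()}-{n:02d}", f"equipment-type-{e:02d}", f)
    for e in range(1, 21)
    for f in frequencies
    for n in range(10)
]

sample = select_sample(library, sample_size=240)
print(len(sample), "tasks selected from", len(library))
```

Drawing round-robin keeps any single equipment type or frequency from dominating the sample, which matters when the sample results are later extrapolated to the whole program.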

Secondly, a full PME can be done. This is typically performed on the entire PM library, towards the middle of a reliability improvement initiative. This is done to calculate precisely how many craft resources will be freed up and how many PM tasks need to be re-engineered into the proper format.
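The craft-hours calculation from a full PME can be sketched as a simple tally. The disposition labels, the hours, and the frequencies below are invented for illustration, and the sketch assumes that every task deleted or reassigned away from the maintenance crafts frees its full annual craft hours.

```python
from collections import Counter

# Dispositions that remove a task from the craft PM workload (an assumption).
FREES_CRAFT_TIME = {"non-value-added", "operator care", "lube route", "PdM"}


def summarize(results):
    """results: iterable of (disposition, hours per execution, executions per year)."""
    counts = Counter(disposition for disposition, _, _ in results)
    freed_hours = sum(
        hours * per_year
        for disposition, hours, per_year in results
        if disposition in FREES_CRAFT_TIME
    )
    return counts, freed_hours


# Hypothetical PME results for six tasks.
results = [
    ("keep", 2.0, 12),
    ("re-engineer", 1.5, 4),      # kept, but needs a quantitative, repeatable procedure
    ("non-value-added", 0.5, 12),
    ("operator care", 0.25, 52),
    ("lube route", 0.5, 12),
    ("PdM", 1.0, 4),
]

counts, freed_hours = summarize(results)
print("Tasks to re-engineer:", counts["re-engineer"])
print("Craft hours freed per year:", freed_hours)
```

Reassigned tasks still consume someone's time, so a fuller model would also add those hours to the operator, lube route, and PdM workloads.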

The organization does not have to implement the results of the PME right away. The output of the study can follow a staged implementation. For example:

  1. All tasks deemed non-value-added: These tasks can and should be deleted from the program immediately. This will not be detrimental to the equipment performance because the tasks, by definition, hold no value. This is an excellent time to begin reengineering the tasks that need to be made more quantitative and repeatable.
  2. All tasks deemed Reassign to operator care: These tasks do not require a skilled maintenance craft person to be successfully completed. These tasks should only be assigned to the individual operator(s) after the proper task procedure has been created and the operator has been task qualified for both the written procedure and the physical procedure. This step gets the operator(s) more intimately involved with the maintenance of the equipment and provides another line of defense against equipment failures.
  3. All tasks deemed Reassign to lube route: Lubrication tasks require a significant amount of training to be performed correctly. Contamination control and sound lubrication fundamentals are broad topics and should be accounted for in the design of the procedure. These tasks, like the operator care tasks, should only be assigned to the individual lube technician after the proper task procedure has been created and the lube technician has been task qualified for both the written procedure and the physical procedure.
  4. All tasks deemed Reassign to predictive maintenance (PdM): Parallel to the PME process, the PdM improvement process should be taking shape and it should be time to relieve the PM program of all tasks that were previously deemed reassigned to PdM. This should only be performed after the PdM program is up and running, and like all these steps, can be done department by department as the technicians are ready to increase their coverage.
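The preconditions in the four steps above can be expressed as a readiness check applied per department. This is a minimal sketch: the `Readiness` fields and the disposition strings are hypothetical labels for the conditions the steps describe.

```python
from dataclasses import dataclass


@dataclass
class Readiness:
    procedure_written: bool = False      # proper task procedure has been created
    person_task_qualified: bool = False  # operator or lube technician is task qualified
    pdm_program_running: bool = False    # PdM program is up and running


def ready_to_move(disposition: str, readiness: Readiness) -> bool:
    """Return True when a task with this disposition can leave the PM program."""
    if disposition == "non-value-added":
        return True  # deleted immediately; no precondition
    if disposition in ("operator care", "lube route"):
        return readiness.procedure_written and readiness.person_task_qualified
    if disposition == "PdM":
        return readiness.pdm_program_running
    raise ValueError(f"unknown disposition: {disposition}")


# Hypothetical department: procedures are written, qualification is still pending.
dept = Readiness(procedure_written=True)
for disposition in ("non-value-added", "operator care", "lube route", "PdM"):
    print(disposition, "->", ready_to_move(disposition, dept))
```

Keeping one `Readiness` record per department mirrors the department-by-department rollout described in the text.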

Remember, this will not be as simple as flipping a light switch. Process workflows must be changed, and people need to be trained on those changes. Different metrics may need to be created to measure implementation effectiveness and overall system efficiency. All four of these steps can be done department by department, as the reliability improvement team tackles tasks and gets everything ready for rollout.
