What is Preventive Maintenance?

Preventive Maintenance Defined

In the most theoretical sense, Preventive Maintenance (PM) is any maintenance activity that is performed to prevent a piece of equipment from failing.

Preventive maintenance activities may take several forms, but suffice it to say all forms of PM are intended to do one of two things:

  1. prevent the next occurrence of an equipment failure (only reengineering can prevent it permanently), or
  2. detect the presence of an impending failure.

Preventive maintenance has another characteristic: it happens at a very predictable frequency. This is because PMs should be established to address age-related or wear-out failure modes. Therefore, preventive maintenance is any maintenance activity that is performed at a fixed interval.

The interval is usually based on time, such as every so many days, weeks, months, or years. However, the interval can also be based on throughput (also known as meter-driven), such as every so many gallons of fuel burned, miles driven, or boxes produced. Either way, PM activities occur at fixed intervals, so any activity that is performed on demand is not a PM activity. These characteristics of PM have been misinterpreted over the years, resulting in preventive maintenance programs that look very different from one company to the next.
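The two kinds of fixed interval can be sketched in a few lines of Python. This is a minimal illustration, not a CMMS implementation: the `PMTask` structure, the function names, and the example tasks and intervals are all hypothetical.

```python
from dataclasses import dataclass
from datetime import date, timedelta
from typing import Optional


@dataclass
class PMTask:
    """A fixed-interval PM task (hypothetical structure for illustration)."""
    name: str
    interval_days: Optional[int] = None       # time-based interval
    interval_units: Optional[float] = None    # meter-based interval (miles, gallons, boxes)


def next_due_date(task: PMTask, last_done: date) -> date:
    """Time-driven PM: due a fixed number of days after the last completion."""
    if task.interval_days is None:
        raise ValueError("task has no time interval")
    return last_done + timedelta(days=task.interval_days)


def is_due_by_meter(task: PMTask, meter_at_last_pm: float, meter_now: float) -> bool:
    """Meter-driven PM: due once throughput since the last PM reaches the interval."""
    if task.interval_units is None:
        raise ValueError("task has no meter interval")
    return (meter_now - meter_at_last_pm) >= task.interval_units


lube = PMTask("Lubricate conveyor bearings", interval_days=30)
oil = PMTask("Change engine oil", interval_units=5000)   # assumed: every 5,000 miles

print(next_due_date(lube, date(2024, 1, 1)))   # 2024-01-31
print(is_due_by_meter(oil, 42000, 46500))      # False
print(is_due_by_meter(oil, 42000, 47000))      # True
```

In both cases the trigger is the interval itself, never the observed condition of the equipment, which is what separates a PM activity from on-demand work.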

What Can We Learn From the History of Preventive Maintenance?

The oldest version of a PM program entailed a PM person walking around looking at, listening to, and feeling equipment while it ran to determine whether the machine was working properly. Any discrepancy found during these walk-around inspections was fixed either on the fly or during a brief interruption of service, such as a shift change.

Some companies still subscribe to this style of maintenance. PM mechanics are assigned to constantly survey the machinery in order to detect issues that need corrective maintenance. Of course, the issue with this style of inspection is that the defect must be very late in its failure progression to be detected. This creates a scenario where production must be interrupted to fix the problem and the warehouse must keep a very large inventory of parts to deal with whatever may arise.

In the late 1960s, this style of maintenance was found to be too costly and to produce too much non-productive downtime, because no one knew what was going to break next or how long it would take to fix. This scenario gave way to a different PM execution model.

In the early 1970s, the concept of time-based replacement became popular. There was a belief that all failures were a function of time or throughput and were therefore very predictable. With this predictability, organizations could organize their preventive maintenance strategy to replace parts or components at fixed intervals known well in advance and thereby avoid the problem of not knowing when the next failure would occur. They believed that accurate record-keeping and simple statistics would solve their reliability problems.

Much to the chagrin of those who followed this philosophy, it did not work. Maintenance costs skyrocketed and system reliability went largely unchanged. The problem was a lack of understanding of the causes of the failures. When random failures occurred, the traditional statistical opinion was that not enough data had been collected yet. The companies thought that with enough data, no failure would be random; all would be predictable. While that may be so, the real problem was the type of data being collected. Simply tracking hours to failure is a lagging indicator and will never point to the nature of the problem. A system of 100% time-based replacement is not the answer. This realization paved the way for the breakthrough that would be made public in December of 1978.

The next style of PM that found some prevalence in the industrial manufacturing world was a hybrid of the previous styles. For years, maintenance people would plan a PM task to go to a piece of equipment, disassemble it, and fix whatever the technician deemed needed fixing. This is probably the most common style seen today. The monthly preventive maintenance schedule says things like “inspect the following components and repair as needed”. In essence, this is a blank check for the technician to apply whatever level of rigor the technician feels is required in a given situation.

Of course, this leads to a high degree of variation in the PM effort, as different technicians apply their own idea of how bad is bad enough to work on at that moment. It also means that the degree of inspection depends on how educated, task-qualified, and thorough the technician happens to be. This style of PM is not much better than the PM mechanic walking around and working on whatever has the most smoke rolling off it.

The business results for this method reflect that level of efficacy as well. What is interesting is that maintenance managers grow increasingly frustrated with their PM process because of the lack of performance improvement, even though piles of money are spent on the PM effort. This leads to action items such as doing more PM, doing it more often, concentrating on technician wrench time, and rewriting all the procedures to improve equipment reliability. None of these work, because none of them addresses the source of the problem.

The source of the problem is a lack of understanding about which failure modes are driven by time or cycles and which ones are random in nature.

Nowlan and Heap of United Airlines developed Reliability Centered Maintenance (RCM) as a failure mode-driven maintenance strategy. In the RCM system, every maintenance task is driven by a specific failure mode and has a specific strategy based on the impact of the failure and the type of failure mode. Failure modes may be random, or they may be wear-out modes that depend on time (see Figure 1).

Figure 1: RCM Failure Curves

Random failure modes require inspections, and the corrective maintenance work is performed based on the condition of the defect at the time of the inspection. Wear-out failure modes do not require inspections as frequently, because the failure propagation is more predictable.

A common question is why infant failures are considered random. To answer that question, we must consider the definition of random, which is literally “having no specific pattern.” While it is true that infant mortality failures happen after work has been performed on a machine, such as after initial start-up or after maintenance activities, they do not happen after every initial start-up or after every maintenance activity. Sometimes these infant failures follow start-up or maintenance and sometimes they do not. Hence, there is no specific pattern, which is why they are considered random.

This style of thinking about failure modes, defects, and strategies is perfect for preventive maintenance systems, and though this RCM report is over 30 years old, it remains the gold standard for reliability systems design for maintenance to this day.

Some organizations believe that the PM program should start with the original equipment manufacturer’s (OEM) exact recommendations. They believe that no one knows the assets and equipment better than the people who made them. Unfortunately, this is not always the case. The one thing the manufacturer often does not know is the operating context.

This is not the case of course for purpose-built equipment, but it is certainly the case for general equipment that can be applied in many applications. Figure 2 shows the results of a reliability analysis based on knowledge of the operating context as compared to the OEM recommendations. Note that some of the task intervals had to be changed and some new tasks had to be initiated. This is a surprise to some people who believe that the manufacturer’s recommendations should be followed to the letter.

Figure 2: Failure Mode Analysis of OEM Recommendations

Following the manufacturer’s recommendations to the letter may be appropriate for purpose-built machines, but it is not for general machinery. As such, the development of an equipment maintenance plan for a general-purpose machine in a specific application should consider the operating context and the operating environment.

The litmus test for whether a maintenance task remains in the equipment maintenance plan is based mostly on the RCM system and comes down to three questions:

  • Does the task prevent a failure mode?
  • Does the task detect the presence of a failure mode?
  • Is the task regulatory or statutory in nature?
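The three questions above can be sketched as a simple filter over a task list. This is a minimal illustration that assumes a task stays in the plan when the answer to at least one question is yes. The `CandidateTask` structure and the example tasks are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class CandidateTask:
    """A PM task with the answers to the three litmus-test questions."""
    description: str
    prevents_failure_mode: bool = False
    detects_failure_mode: bool = False
    regulatory_or_statutory: bool = False


def is_value_adding(task: CandidateTask) -> bool:
    """Keep the task if it passes at least one of the three tests (assumed OR logic)."""
    return (task.prevents_failure_mode
            or task.detects_failure_mode
            or task.regulatory_or_statutory)


plan = [
    CandidateTask("Replace drive belt every 12 months", prevents_failure_mode=True),
    CandidateTask("Vibration route on gearbox", detects_failure_mode=True),
    CandidateTask("Annual pressure vessel inspection", regulatory_or_statutory=True),
    CandidateTask("Wipe down guard covers"),
]

kept = [t.description for t in plan if is_value_adding(t)]
dropped = [t.description for t in plan if not is_value_adding(t)]
print(kept)
print(dropped)   # ['Wipe down guard covers']
```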

These tests help us determine whether a task is value-adding and whether it should remain a part of the maintenance strategy. All too often, non-value-adding (NVA) tasks creep into the program over time and become bigger and bigger problems. The PM program becomes bloated, and no matter how much bigger it becomes, it is no more effective. It soon becomes a burden to the organization instead of a program that solves problems.

The typical scenario that creates such a large program is that upon experiencing a failure, the organization, believing it can PM its way to reliability, immediately assigns more tasks to the PM, at a higher frequency, and with more people. Of course, this does not address the problem, for reasons we will discuss in a moment, and the organization continues to falter while spending even more on preventive maintenance. The flaw in this thinking is that most problems are random in nature, and a time-based replacement strategy is not effective at all in dealing with random problems. To understand this phenomenon, we must first define a few terms.

A failure mode is the local effect of a failure mechanism, according to the American Society for Testing and Materials (ASTM). An example might be simply “bent”, “broken”, or “leaking”. For the reliability engineer, this is not descriptive enough to identify the problem and solve it, so a slightly different definition of failure mode will be used for this discussion. A failure mode shall be described as the part, the problem, and the reason. Example: Bearing – Fatigued – Misalignment. This is read as: the bearing was fatigued due to misalignment. This description of the failure mode gives us all the information we need to apply an effective countermeasure against future failures. This definition will be used for the balance of this article.
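The part, problem, and reason convention maps naturally onto a small record type. The sketch below is only one way to hold that structure. The `FailureMode` class and its method names are hypothetical, and a plain hyphen is used as the separator in the label.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FailureMode:
    """A failure mode described as part, problem, and reason."""
    part: str
    problem: str
    reason: str

    def label(self) -> str:
        # Short form, as it might appear in a failure code list.
        return f"{self.part} - {self.problem} - {self.reason}"

    def sentence(self) -> str:
        # Long form, read as a plain sentence.
        return (f"The {self.part.lower()} was {self.problem.lower()} "
                f"due to {self.reason.lower()}.")


fm = FailureMode("Bearing", "Fatigued", "Misalignment")
print(fm.label())      # Bearing - Fatigued - Misalignment
print(fm.sentence())   # The bearing was fatigued due to misalignment.
```

Keeping the three fields separate, rather than storing a free-text description, makes it possible to group failure history by part, by problem, or by reason.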

Once we understand the failure modes, we can assign each one to one of the six failure curves found in Figure 1. Curves A, B, and C denote interval-based failures. These are the curves best suited to an interval-based reconditioning or replacement strategy. Curves D, E, and F are random failures, and an interval-based strategy will not work for them at all. In fact, it will cost significantly more money and will result in no higher availability. Below is a table for a typical motor with a dominant failure mode that is random. The table shows the difference between a time-based strategy applied to a random failure and an inspection-based strategy for the same failure mode. The calculations were made with Monte Carlo simulation software, using typical costs of failure and repair for an electric motor in a typical manufacturing facility.

Table: Monte Carlo Simulation Results of Task Interval Optimization

It should be obvious from the differences between the run-to-failure column and the interval-based strategy column that, for a random failure, replacing the component on a fixed schedule does nothing for the availability of the component and only raises the maintenance costs. The answer to random failures is an inspection-based strategy where the component is inspected at some regular frequency and the repair is effected based on the condition of the component, regardless of time. This is a very doable strategy within a Predictive Maintenance (PdM) program. All the inspections should produce this type of work.
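The effect described above can be reproduced with a small Monte Carlo sketch. It is not the simulation software or the motor data behind the table. It assumes an exponential lifetime (constant hazard rate) for the random failure mode and a Weibull lifetime with shape 3 for a wear-out mode, treats repairs as instantaneous so that the failure count stands in for availability, and uses hypothetical costs, intervals, and mean lives throughout.

```python
import random


def simulate(draw_life, horizon, replace_every=None, runs=2000, seed=1):
    """Average (failures, planned replacements) per run over the horizon.

    draw_life(rng) returns the time to failure of a freshly installed part.
    replace_every is the fixed replacement age, or None for run-to-failure.
    Repairs and replacements are treated as instantaneous (an assumption).
    """
    rng = random.Random(seed)
    failures = planned = 0
    for _ in range(runs):
        t = 0.0
        while True:
            life = draw_life(rng)
            if replace_every is not None and life > replace_every:
                t += replace_every      # part survived to its scheduled swap
                if t >= horizon:
                    break
                planned += 1
            else:
                t += life               # part failed first (or no schedule)
                if t >= horizon:
                    break
                failures += 1
    return failures / runs, planned / runs


def random_life(rng):
    # Constant hazard rate (exponential), mean life 1,000 h: a "random" failure mode.
    return rng.expovariate(1 / 1000)


def wear_out_life(rng):
    # Weibull with shape 3 (rising hazard), scale 1,120 h: a wear-out failure mode.
    return rng.weibullvariate(1120, 3)


FAILURE_COST, PLANNED_COST = 5000, 1500   # hypothetical costs per event
HORIZON = 20_000                          # hours simulated per run

results = {}
for mode, draw in [("random", random_life), ("wear-out", wear_out_life)]:
    for policy, interval in [("run-to-failure", None), ("replace every 500 h", 500)]:
        f, p = simulate(draw, HORIZON, interval)
        results[(mode, policy)] = (f, p)
        cost = f * FAILURE_COST + p * PLANNED_COST
        print(f"{mode:9s} {policy:20s} failures={f:5.1f} planned={p:5.1f} cost={cost:9.0f}")
```

Under these assumptions, the random failure mode should show roughly the same number of failures with or without scheduled replacement, so the planned replacements only add cost. The wear-out mode should show far fewer failures when parts are replaced on schedule, which is why interval-based tasks belong with the age-related curves only.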

ABOUT ALLIED RELIABILITY

Allied Reliability’s production and asset management experts are committed to optimizing equipment and processes. Our experts work with you for best outcomes. Understanding how critical asset failures impact the environment, production, financials, and safety enables us to deliver the right monitoring, analytics, decision making, and maintenance plans. We bring unique asset management content along with best practices, advanced tools, and proven methodologies to help customers move forward in their Digital Transformation journey to deliver enhanced performance.
