
Mean Time to Repair (MTTR)

Revision as of 17:02, 6 February 2021 by User (talk | contribs) (The LinkTitles extension automatically added links to existing pages (https://github.com/bovender/LinkTitles).)

Definition of Mean time to repair (MTTR)[1]

Mean time to repair (MTTR) is a basic measure of the maintainability of repairable items. It represents the average time required to repair a failed component or device. Expressed mathematically, it is the total corrective maintenance time for failures divided by the total number of corrective maintenance actions for failures during a given period of time. It generally does not include lead time for parts that are not readily available or other Administrative or Logistic Downtime (ALDT).

In fault-tolerant design, MTTR is usually considered to also include the time the fault is latent (the time from when the failure occurs until it is detected). If a latent fault goes undetected until an independent failure occurs, the system may not be able to recover.

MTTR is often part of a maintenance contract, where a system with an MTTR of 24 hours is generally more valuable than one with an MTTR of 7 days if the mean time between failures is equal, because its Operational Availability is higher. However, in the context of a maintenance contract, it is important to distinguish whether MTTR measures the mean time from the point at which the failure is first discovered until the point at which the equipment returns to operation (usually termed "mean time to recovery"), or only the elapsed time from the point where repairs actually begin until the point at which the equipment returns to operation (usually termed "mean time to repair"). For example, a system with a service contract guaranteeing a mean time to "repair" of 24 hours, but with additional part lead times, administrative delays, and technician transportation delays adding up to a mean of 6 days, would not be any more attractive than another system with a service contract guaranteeing a mean time to "recovery" of 7 days.
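The contract comparison above can be sketched in a few lines of Python. This is only an illustration: the availability function uses the common steady-state approximation (uptime divided by uptime plus downtime), and the 1,000-hour mean time between failures is a hypothetical value chosen for the example, not a figure from the sources cited here.

```python
def availability(mtbf_hours: float, mean_downtime_hours: float) -> float:
    """Steady-state availability approximation: uptime / (uptime + downtime)."""
    return mtbf_hours / (mtbf_hours + mean_downtime_hours)


# Hypothetical mean time between failures, assumed equal for both systems.
MTBF_HOURS = 1000.0

# System A: mean time to "repair" of 24 hours, plus a mean of 6 days of
# part lead times, administrative delays, and technician travel.
system_a_downtime = 24.0 + 6 * 24.0

# System B: mean time to "recovery" of 7 days, delays already included.
system_b_downtime = 7 * 24.0

print("Downtime per failure, system A (hours):", system_a_downtime)
print("Downtime per failure, system B (hours):", system_b_downtime)
print("Availability with 24 h downtime:", availability(MTBF_HOURS, 24.0))
print("Availability with 7 d downtime: ", availability(MTBF_HOURS, system_b_downtime))
```

Under these assumptions both contracts lead to the same downtime per failure, which is why the headline "24 hours" figure alone does not make system A more attractive.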



The Importance of Mean Time to Repair (MTTR)[2]

Because MTTR ostensibly measures how long business-critical systems are out of service, it’s a powerful predictor of the impact an IT incident will have on the organization’s bottom line. The higher an IT team’s MTTR, the greater the risk that the organization will experience significant downtime when IT incidents occur, potentially leading to business disruptions, customer dissatisfaction and loss of revenue.

Technological failures are inevitable. Understanding MTTR gives organizations an idea of how quickly and efficiently they can expect to respond to these failures and return business operations to normal. On the whole, lower MTTR ratings are a sign of a healthy computing environment and a positive IT function.


Mean Time to Repair (MTTR) Formula[3]

MTTR is calculated by dividing the total unplanned maintenance time spent on an asset by the total number of failures that asset experienced over a specific period. Mean time to repair is most commonly expressed in hours. The MTTR calculation assumes that:

  • Tasks are performed sequentially
  • Tasks are performed by appropriately trained personnel

MTTR = total unplanned maintenance time / total number of failures

For example, if you have spent 50 hours on unplanned maintenance for an asset that has broken down eight times over the course of a year, the mean time to repair would be 6.25 hours. What is considered world-class MTTR is dependent on several factors, like the type of asset, its criticality, and its age. However, a good rule of thumb is an MTTR of under five hours.
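The worked example above can be reproduced with a short Python sketch. The function name and its error handling are illustrative choices, not part of any cited source.

```python
def mean_time_to_repair(total_maintenance_hours: float, failure_count: int) -> float:
    """Total unplanned maintenance time divided by the number of failures."""
    if failure_count <= 0:
        raise ValueError("MTTR is undefined without at least one failure")
    return total_maintenance_hours / failure_count


# 50 hours of unplanned maintenance across eight breakdowns in one year.
mttr_hours = mean_time_to_repair(50, 8)
print(mttr_hours)  # 6.25
```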


Types of MTTR[4]

There are actually four different definitions of MTTR in use, which can make it hard to be sure which one is being measured and reported on.

  • Mean Time to Repair (MTTR) is the most commonly used variation of MTTR and measures the average time taken to repair a system, including diagnosis, repair and testing. This is more often used for technical or mechanical systems.
  • Mean Time to Recovery (or Mean Time to Restore) is similar to Mean Time to Repair; it represents the average time from system failure until the system is fully operational again. This metric may be used for online services or software.
  • Mean Time to Respond measures the time taken to repair a system, from the moment you are notified, until it is completely fixed. This does not include any time taken from the initial failure to when you are first alerted.
  • Mean Time to Resolve extends Mean Time to Repair, by including any time taken to reduce the chances of the failure happening again. This additional time may occur after the system is back online but is definitely key to maximizing customer satisfaction.

Creating a clear, documented definition of MTTR for your business will avoid any potential confusion.
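One way to document the definition is to state exactly which timestamps each variant spans. The following Python sketch does that for the four variants listed above. The field names and the start and end points chosen for each variant are assumptions made for illustration (in particular, "resolve" is measured here from failure to the end of follow-up work); your organization may draw the boundaries differently.

```python
from dataclasses import dataclass
from datetime import datetime
from statistics import mean


@dataclass
class Incident:
    # Hypothetical timestamps recorded for each incident.
    failed_at: datetime          # failure actually occurs
    alerted_at: datetime         # team is first notified
    repair_started_at: datetime  # diagnosis and repair work begins
    restored_at: datetime        # system is operational again
    resolved_at: datetime        # follow-up prevention work is finished


def _mean_hours(incidents, start_field, end_field):
    """Average elapsed hours between two timestamp fields."""
    return mean(
        (getattr(i, end_field) - getattr(i, start_field)).total_seconds() / 3600
        for i in incidents
    )


def mttr_variants(incidents):
    """Compute each MTTR variant under the assumed boundaries."""
    return {
        "repair": _mean_hours(incidents, "repair_started_at", "restored_at"),
        "recovery": _mean_hours(incidents, "failed_at", "restored_at"),
        "respond": _mean_hours(incidents, "alerted_at", "restored_at"),
        "resolve": _mean_hours(incidents, "failed_at", "resolved_at"),
    }


incidents = [
    Incident(datetime(2021, 2, 1, 8, 0), datetime(2021, 2, 1, 8, 30),
             datetime(2021, 2, 1, 9, 0), datetime(2021, 2, 1, 11, 0),
             datetime(2021, 2, 1, 15, 0)),
    Incident(datetime(2021, 2, 3, 10, 0), datetime(2021, 2, 3, 11, 0),
             datetime(2021, 2, 3, 12, 0), datetime(2021, 2, 3, 16, 0),
             datetime(2021, 2, 3, 16, 0)),
]

variants = mttr_variants(incidents)
for name, hours in variants.items():
    print(f"Mean time to {name}: {hours} hours")
```

The same two incidents yield four different "MTTR" figures, which illustrates why the definition needs to be written down before the metric is reported.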


How MTTR is Used and What it Means for Maintenance[5]

Mean time to repair is used as a baseline for increasing efficiency, finding ways to limit unplanned downtime, and boosting the bottom line. Because long repair times for mission-critical equipment mean product scrap, missed orders, and soured business relationships, MTTR helps organizations identify why maintenance may be taking longer than is ideal and make more informed decisions to fix the underlying causes.

Conducting an MTTR analysis can provide insight into the way your maintenance operation purchases equipment, schedules maintenance and completes tasks. Ultimately, MTTR helps your organization wipe out any inefficiencies that are causing lost production and the lost money that comes with that.

MTTR can be used for making repair or replace decisions on aging assets. If an asset takes longer to repair as it ages, it may be more economical to replace it. MTTR can also be used to inform the purchasing and design process by predicting lifecycle costs of new systems.
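A repair-or-replace review might start by checking whether an asset's MTTR is rising year over year. The sketch below is a minimal illustration; the repair log and its values are made up for the example, and a strictly increasing trend is only one possible signal among many.

```python
from collections import defaultdict


def mttr_by_year(repairs):
    """Group (year, repair_hours) records and return the MTTR for each year."""
    totals = defaultdict(lambda: [0.0, 0])  # year -> [total hours, repair count]
    for year, hours in repairs:
        totals[year][0] += hours
        totals[year][1] += 1
    return {year: total / count for year, (total, count) in sorted(totals.items())}


# Hypothetical repair log for a single aging asset.
repairs = [(2019, 4.0), (2019, 6.0), (2020, 7.0), (2020, 9.0), (2021, 12.0)]

trend = mttr_by_year(repairs)
values = list(trend.values())
# True when every year's MTTR is higher than the previous year's.
rising = all(earlier < later for earlier, later in zip(values, values[1:]))

print(trend)
print("MTTR rising every year:", rising)
```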

Tracking MTTR also helps to ensure your preventive maintenance program and PM tasks are as effective and efficient as possible. Although MTTR measures reactive maintenance, assets that take longer to repair may have PMs associated with them that aren’t working well. Mean time to repair is a gateway into the root cause of this problem and provides a path to a solution.

For example, if MTTR is increasing, it might be because PMs aren’t standardized, leading to more equipment failure. A work order may tell a technician to lubricate a part, but it might not tell them which lubricant to use or how much. Adding this information to a work order will ensure work is done quickly and accurately, leading to less downtime.


Steps to Improve MTTR[6]

MTTR is seen as a key performance indicator (KPI). Therefore, maintenance teams should always strive to improve it. The benefits of reducing MTTR are fairly obvious – less downtime means stable production, happy customers and reduced maintenance costs. So, what are some steps you can take to help improve your organization's MTTR? The best place to start is understanding the four stages of MTTR and taking steps to reduce each of them.

  • Identification - the period of time from when the failure occurs to when a technician becomes aware of the issue. Things like wireless sensors and alert systems are great ways to shorten the identification time period of MTTR.
  • Knowledge - the period of time after the failure has been identified but before repairs have started. Figuring out or diagnosing the problem is generally the most time-consuming part of MTTR.
  • Fix - the period of time it takes to actually fix the issue at hand. Reducing the time it takes to fix an issue can be accomplished by standardizing procedures to guide well-trained technicians who are tasked with solving the problem.
  • Verify - the period of time it takes to ensure the applied fix is actually working. A real-time monitoring system is a helpful tool to quickly gather data and reports to show the fix is working.
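To decide which stage to work on first, it can help to break recorded incident times down by stage. The following Python sketch assumes each incident is logged with the hours spent in each of the four stages above; the record format and the sample values are hypothetical.

```python
from statistics import mean

STAGES = ("identification", "knowledge", "fix", "verify")


def mean_stage_hours(incidents):
    """Average hours spent in each stage across all incidents."""
    return {stage: mean(incident[stage] for incident in incidents) for stage in STAGES}


# Hypothetical per-incident stage durations, in hours.
incidents = [
    {"identification": 0.5, "knowledge": 3.0, "fix": 1.5, "verify": 0.5},
    {"identification": 1.5, "knowledge": 2.0, "fix": 2.5, "verify": 0.5},
]

averages = mean_stage_hours(incidents)
slowest_stage = max(averages, key=averages.get)
total_hours = sum(averages.values())

print(averages)
print("Slowest stage:", slowest_stage)
print("Mean time across all four stages (hours):", total_hours)
```

In this made-up data the knowledge (diagnosis) stage dominates, so better diagnostic documentation would be a more promising investment than faster verification.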


The Limitations of Mean Time to Repair (MTTR)[7]

Mean time to repair is not always the same as the duration of the system outage itself. In some cases, repairs start within minutes of a product failure or system outage. In other cases, there is a lag between when the issue occurs, when it is detected, and when repairs begin.

This metric is most useful when tracking how quickly maintenance staff is able to repair an issue. It’s not meant to identify problems with your system alerts or pre-repair delays—both of which are also important factors when assessing the successes and failures of your incident management programs.


See Also

IT Operations (Information Technology Operations)
IT Operations Analytics (ITOA)
IT Operations Management (ITOM)
Performance Metrics
Metrics


References

  1. "What is Mean Time to Repair (MTTR)?", Wikipedia
  2. "Why is MTTR important?", Splunk
  3. "How to calculate MTTR", Fiix
  4. "The four types of MTTR", Next Service
  5. "How is MTTR used and What does MTTR mean for maintenance?", fiixsoftware.com
  6. "How to Improve MTTR", Reliable Plant
  7. "The limitations of Mean Time to Repair (MTTR)", Atlassian