A predictive maintenance example

A prediction doesn’t mean that something will happen! A prediction merely says something may happen. Obviously, the more accurate that prediction gets, the closer it comes to determining something will happen. Yet, we often misinterpret accuracy or confidence in a prediction; when something has 20% chance of failing or 90% chance of failing, we often mistake the result of the failure for the chance of failing. In both cases, when the failure occurs, the result is the same; it is only the frequency of this failure happening that changes.

What I describe above is one of the reasons why managers often fail to come up with a solid business case for predictive analytics. Numbers – and especially risk-based numbers – all to often scare off people when they’re really not that hard to understand. Obviously, the underlying predictive math is hard but the interpretation from a business point of view is much simpler than most people dare to appreciate. We’ll illustrate this with an example: Company A is in the business of sand. Could hardly be simpler than that. It’s business consists of unloading barges, transporting and sifting the sand internally and then loading it onto truck for delivery. To do this, they need cranes (to unload the ships), conveyor belts and more cranes (to load the trucks). Some of these items are more expensive (the ship-loading cranes) or static (the conveyor belts) than others (the truck loading cranes). In this case, this has led to a purchasing policy which has focused on getting the best cranes available for offloading the ships (tying down a ship because the crane is broken is very expensive), slightly less stringent on the conveyor belts (if it’s broken, at least the sand is on our yards and we can work around the failures with our mobile cranes) or downright hedged by buying overcapacity on the, cheaper, mobile cranes. This happens quite often: the insurance strategy changes with either the value of the assets as well as with their criticality to the operations. Please also note that criticality goes up with diminishing alternatives… A single asset is typically less critical (from an operational point of view) when part of a fleet of 100 than if it were alone to perform a specific task.

All these assets are subject to downtime; both planned and unplanned. We’ll focus on the unplanned downtime. When a fixed ship-loading crane fails, the ship either can’t be off-loaded any more or it has to be moved in reach of another such cranes (if that one’s available). Either way, the offloading is interrupted and the failure not only yields repair costs (time: diagnose, get the part, fix the problem – parts – people) but also delays the ship’s departure, which may result in additional direct charges or costs due to later bay availability for incoming ships. When a conveyor belt breaks down, there’s the choice of waiting for it to be repaired or for finding an alternative such as charging the sand on trucks and hauling it the processing plant. Both situations come at a high cost. Moreover, both the cranes and the conveyor may cause delays for the sifting plant, which is probably the most expensive asset on site whose utilisation must be maximised! For the truck loading cranes, the solution was to add one extra crane for every 10 in the fleet. That overcapacity should ensure ‘always on’ but comes at the cost of buying spare assets.

Let’s now mix in some numbers. Let’s say a ship-loading crane costs €5,000,000; a conveyor costs €500,000 and a mobile crane costs €250,000. The company has three ship docks with one crane each, 6 conveyors and a fleet of 20 mobile cranes, putting their total asset value at €22,000,000. If we take a conservative estimate that 6% of the ARV (Asset Replacement Value) is spent on maintenance, this installed base costs €1,320,000 to maintain every year. Let’s further assume that 50% of the interventions are planned and 50% are unplanned. We know that unplanned maintenance is 3-9 times more expensive than planned so for this example we’ll take the middle figure of 6x. We can now easily calculate the cost of planned and unplanned events by: €1,320,000 = 0.5x + 0.5*6x, where x is the total planned maintenance cost. Result: of the total maintenance cost, roughly €190,000 is spent on planned maintenance whereas a whopping €1,130,000 is due to unplanned downtime! If the number of maintenance events is 200, that means that one planned maintenance event costs €1,900 and one unplanned event costs €11,300. . Company A has done all it can to optimise the maintenance processes but can’t seem to drive down the costs further and therefore just decided this is part of doing business.

Meanwhile on the other part of town… Company B is a direct competitor of Company A. And for the sake of this example, we’ll even make it an exact copy of Company A but for one difference: it has embarked on a project to diminish the number of unplanned downtime events. They came to the same conclusion that for the 200 maintenance events, the best way to lower the costs was if they could magically transform unplanned maintenance into planned maintenance. They did some research and found that, well, they could – at least for some. Here’s the deal: if we can forecast a failure with enough lead time, we can prevent it from happening by planning maintenance on the asset (or component that is forecasted to fail) either when other maintenance is planned to happen or during times when the asset is not required for production. While the event is still happening, the prevent-fix being planned costs €1,900 as compared to a break-fix costing €11,300 – that’s a €9,400 difference per event!

The realisation that the difference between a break-fix and a prevent-fix was €9,400 per event allowed them to avoid the greatest pitfall of predictive maintenance. Any such project requiring a major shift in mindset is bound to face headwind. In predictive analytics, most of the pushback comes from people not understanding risk-based decision making or people not seeing the value associated with introducing the new approach. The first relates to the fact that many people still believe that predictions should be spot-on. Once they realise this is impossible, they often (choose to) ignore the fact that sensitivity can be tuned to increase precision albeit at a cost: higher precision means less coverage (if we want to get higher prediction confidence, we can get this but out of all failures, we’ll catch a smaller portion). “If you can predict all failures, then what’s the point?” is an often heard objection.

Company B did it’s homework though and concluded that they could live with the high enough prediction accuracy at a 20% catch rate. The accuracy at this (rather low) catch rate meant that for every 11 predictions, 10 actually prevented a failure and 1 was a false positive (these figures are made up for this example). Let’s look at the economics: a 20% catch rate means that of 100 unplanned downtimes, 20 could be prevented, which resulted in a saving of 20 x €9,400 = €188,000. However, the prediction accuracy also means that for catching these 20, they actually had to perform 22 planned activities; the 2 extra events costed 2 x €1,900 = €3,800. The resulting savings were therefore €188,000 – €3,800 = €184,200; savings of more than 16% on the total maintenance budget!


What’s more, there are fringe benefits: avoiding the unplanned downtime results in better planning, which ultimately results in higher availability with the same asset base. Stock-listed companies how important ROCE (Return On Capital Employed) is when investors compare opportunities but even private companies should beware: financial institutions use this kind of KPI’s to evaluate whether or not to allow for credit and at what rate (it plays a major role in determining a company’s risk profile). Another fringe benefit – and not a small one – is that on the fleet sizing for the mobile cranes (remember they took 10% extra machines just as a buffer for unplanned events), fleet size can be adjusted downward for the same production capacity because downtime during planned utilisation will be down by 20%. Even if they play it very safe and just downsize by one crane, that’s a €250,000 one-time saving plus an annual benefit of 6% on that: €15,000!

Company B is gradually improving flow by avoiding surprises; a 20% impact can’t go unnoticed and has a major effect on employee morale. They also did their homework very well and passed (part of) the reduced operational costs on to their clients. Meanwhile, at Company A, employees constantly feel like they’re running after the facts and can’t understand how Company B manages to undercut them on price and still throw a heck of a party to celebrate their great year!