NFF (No Fault Found) is an often-used KPI which may get a whole new meaning with the introduction of predictive maintenance. Let’s go back to the origin of this KPI. It gained attention some two decades ago as companies increasingly focused on customer satisfaction: people found out that many so-called ‘bad parts’ that were injected into the reverse supply chain tested perfectly and were therefore flagged NFF. There has been an ongoing struggle between the field organisation’s and the reverse supply chain’s goals. Field service did all it could to increase the number of interventions per engineer per day, even at the expense of removing too many parts, under the motto that time is more expensive than parts. Removed parts then get injected into the reverse supply chain, where they typically get sent to a hub to be checked and subsequently repaired. To the reverse logistics/repair organisation, NFF creates ‘unnecessary’ activity – parts needing to be checked without any demonstrable problem being uncovered. These parts then need to be re-qualified: documented, repackaged,… Therefore, NFF is really bad for the observed performance of that organisation.
Back to predictive analytics: whereas CBM (Condition Based Maintenance) will call for a maintenance activity based on the condition of the equipment/part – and therefore do so when there’s demonstrable cause for concern – predictive maintenance will ideally generate a warning with a longer lead time. Often before any signs of wear and tear become apparent! Provided certain conditions are met (see previous posts on criticality/accuracy/coverage/effort), parts will therefore be removed which technically will be NFF! Because the removal of these parts will prevent a much costlier event this is not a problem per se, but it will require rethinking not only internal and external processes but also KPIs! If we want adoption of these predictive approaches to maintenance and operations, KPIs need to be rethought to reflect the optimised nature of these actions. We can’t allow anybody to be penalised for applying the optimal approach!
Predictive (or by extension, prescriptive) maintenance has huge potential for cost savings and as we’ve seen before (see previous blog entries), these savings should be looked at from a holistic point of view. Some costs may actually go up in order to bring down overall costs. Introducing such methodologies therefore also demands a lot of attention to process changes and to how people’s performance is measured. The good news is that the introduction of predictive maintenance can be gradual; i.e. start with those areas that offer high confidence in the predictions and high return. Nothing helps adoption better than proven use cases!
As other domains such as procurement, supply chain, production planning, etc. get increasingly lean, attention focuses on the few remaining areas where large gains are expected from increasing efficiency. Fleet uptime or machine park uptime is thé focus area today. Indeed, investors increasingly look at asset utilisation to determine whether an operation is run efficiently or not. As we know, many mistakes have been made in the past by focusing on acquisition cost at the expense of quality. This has led to a lot of disruption with regard to equipment uptime which, in turn, undermines the lean initiatives mentioned above. So, what are the important factors determining uptime? We’ll look at the two most important ones:
– reducing the number of failures
– reducing time to repair (TTR)
Reducing the number of failures sounds pretty obvious: purchase better equipment and you’re set. Sure, but how do you know the equipment is better? Sometimes, it’s easily measurable; e.g. I’ve known a case where steel screws were replaced by titanium ones. Although the latter were maybe five times more expensive, their total cost on the machine may have been less than $1,000 whereas one failure caused by a steel screw cost $25,000. Taking an integrated business approach to purchasing saved a lot of money over the lifetime of the equipment. In other cases, the extra quality is hard to measure and one has to trust the supplier. This ‘trust’ can be captured by SLAs, warranty contracts or even a fully servicized approach (where the supplier gets paid if and when the equipment functions according to a preset standard).
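To make the screw example concrete, here is a minimal break-even sketch. It takes the figures above at face value (titanium about five times the price of steel, at most $1,000 of titanium screws per machine, $25,000 per failure); the steel cost is derived from that ratio and is therefore an assumption, not a figure from the actual case.

```python
# Figures taken from the screw example; treated as rough upper bounds.
TITANIUM_TOTAL = 1_000   # cost of titanium screws per machine ($)
PRICE_RATIO = 5          # titanium is roughly five times the steel price
FAILURE_COST = 25_000    # cost of one failure caused by a steel screw ($)

# Assumption: the steel screws would have cost one fifth of the titanium ones.
steel_total = TITANIUM_TOTAL / PRICE_RATIO
extra_spend = TITANIUM_TOTAL - steel_total

# Number of failures the better screws must avoid to pay for themselves.
break_even_failures = extra_spend / FAILURE_COST

print(f"Extra spend on titanium: ${extra_spend:,.0f}")
print(f"Failures to avoid to break even: {break_even_failures:.3f}")
```

Under these assumptions the upgrade pays for itself if it avoids even a small fraction of one failure over the lifetime of the machine.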
The number of failures can also be reduced by improving maintenance; pretty straightforward for run-of-the-mill things such as clogging oil filters, etc. One just sets a measurement by which a trigger is set off and performs the cleaning or replacement. This is what happens with your car; every 15,000 miles or so certain things get replaced, whatever their status. The low price of both the parts involved and the intervention allows for such an approach. Things become more complex when different schedules need to be executed on complex equipment: allowing all of the triggers to work independently (engine, landing gear, hydraulics, etc. on a plane for instance) may cause maintenance requirements almost every day. At least some of these need to be synchronised and ideally, the whole maintenance schedule should be optimised. Mind you, optimisation doesn’t necessarily mean a minimisation of the number of interventions! It should rather focus on minimising impact on operational requirements.
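The effect of independent triggers can be sketched in a few lines. The intervals below are purely hypothetical, and rounding every interval down to a common base is only one simple synchronisation heuristic, not a real schedule optimiser (which would weigh operational impact rather than just count visits).

```python
def maintenance_days(intervals, horizon):
    """Return the set of days on which at least one task falls due."""
    days = set()
    for interval in intervals.values():
        days.update(range(interval, horizon + 1, interval))
    return days


def align_down(intervals, base):
    """Round each interval down to a multiple of `base`, so no task runs late."""
    for name, interval in intervals.items():
        if interval < base:
            raise ValueError(f"{name}: interval {interval} is shorter than base {base}")
    return {name: (interval // base) * base for name, interval in intervals.items()}


def task_count(intervals, horizon):
    """Total number of individual task executions over the horizon."""
    return sum(horizon // interval for interval in intervals.values())


HORIZON = 360  # days
# Hypothetical maintenance intervals in days, for illustration only.
intervals = {"engine": 45, "landing gear": 32, "hydraulics": 20, "cabin": 14}

aligned_intervals = align_down(intervals, base=10)
independent = maintenance_days(intervals, HORIZON)
aligned = maintenance_days(aligned_intervals, HORIZON)

print("Days with an intervention, independent triggers:", len(independent))
print("Days with an intervention, aligned triggers:    ", len(aligned))
print("Task executions, independent:", task_count(intervals, HORIZON))
print("Task executions, aligned:    ", task_count(aligned_intervals, HORIZON))
```

In this toy setup, aligning the triggers reduces the number of days on which the asset has to be taken out of service, at the price of executing some tasks more often than strictly required. Which of the two matters more depends on the operational requirements.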
In order to further reduce the number of failures, wouldn’t it be great if we could prevent those events that occur less often? This involves predicting the event and prescribing an action in order to minimise its impact on production. This is exactly the focus of prescriptive maintenance; combining predictions (resulting from predictive analytics) with cause/effect/cost analysis to come up with the most appropriate course of action. Ideally, if maintenance is prescribed, it enters the same optimisation logic as described above. Remember, the goal is to optimise asset utilisation.
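As a rough illustration of combining a prediction with a cost analysis, the sketch below picks the action with the lowest expected cost. The action names, costs and probabilities are hypothetical, and a real prescriptive system would also account for timing and operational constraints.

```python
from dataclasses import dataclass


@dataclass
class Action:
    name: str
    cost: float                   # cost of carrying out the action
    residual_failure_prob: float  # failure probability that remains afterwards


def expected_cost(action, failure_cost):
    """Cost of the action plus the expected cost of a failure that still occurs."""
    return action.cost + action.residual_failure_prob * failure_cost


def prescribe(actions, failure_cost):
    """Return the action with the lowest expected cost."""
    return min(actions, key=lambda a: expected_cost(a, failure_cost))


FAILURE_COST = 10_000  # hypothetical cost of an unplanned failure

# Hypothetical options for an asset with a predicted 30% failure probability.
options = [
    Action("do nothing", cost=0, residual_failure_prob=0.30),
    Action("inspect", cost=500, residual_failure_prob=0.10),
    Action("replace at next planned stop", cost=2_000, residual_failure_prob=0.01),
]

best = prescribe(options, FAILURE_COST)
print("Prescribed action:", best.name)
```

With a much lower predicted failure probability, "do nothing" can come out as the cheapest option, which is exactly why the prediction and the cost analysis need to be looked at together.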
Reducing TTR is too often overlooked or just approached by process standardisation. However, many studies have shown that TTR is highly impacted by the time it takes to diagnose the problem and the time to get the technician/parts on site – especially in the case of moving equipment. Predictive analytics may help reduce both: the first, by providing the technician with a list of the systems/parts most at risk at any moment in time and the second by making sure the ‘risky’ parts are available. There’s nothing worse than having to set in motion an unprepared chain of actors (technical department, supplier, tier 1,…) for tracking down a hard to find part. This is even worse when the failing machine slows down or halts an entire production chain…
Poor ROA (Return On Assets) is often a trigger for takeovers because the buyer is confident they can easily improve the situation. It’s one of the telltale signs of a poorly run or suboptimal operation and has to be avoided at all cost. If your sights are not yet set on this domain, chances are other people’s are!
The Internet of Things (IoT) is at the peak of the Gartner Hype Cycle and there is no shortage of “answers to everything” being promised. Many executives are just now beginning to find their feet after the storm wave that was the transition from on-premise to cloud solutions and are now being faced with an even faster paced paradigm shift. The transformative tidal wave that is IoT is crashing through CEO, CTO, and CIO offices and they are frantically searching for something to float their IoT strategies on but often are just finding driftwood.
Dr. Timothy Chou and his latest book Precision: Principles, Practices, and Solutions for the Internet of Things is your shipwright. The framework presented by Dr. Chou cuts through the fog that surrounds IoT and provides a straightforward, jargon-free explanation of IoT and the power that can be harnessed. Dr. Chou then goes on to present a showcase of case studies that are real-life, profitable IoT solutions by a variety of traditional and hi-tech businesses.
One of the case studies Dr. Chou features is based on my work at New York Air Brake where we utilized instrumented and connected locomotives to create the world’s most advanced train control system that has saved the rail industry over a billion dollars in fuel, emissions, and other costs. It was this work that gave me a taste of the power IoT has and gave me the passion to want to make a bigger impact in the rail and transportation industries utilizing IoT data and thus join the Predikto family.
A prediction doesn’t mean that something will happen! A prediction merely says something may happen. Obviously, the more accurate that prediction gets, the closer it comes to determining something will happen. Yet, we often misinterpret accuracy or confidence in a prediction; when something has a 20% chance of failing or a 90% chance of failing, we often mistake the result of the failure for the chance of failing. In both cases, when the failure occurs, the result is the same; it is only the frequency of this failure happening that changes.
What I describe above is one of the reasons why managers often fail to come up with a solid business case for predictive analytics. Numbers – and especially risk-based numbers – all too often scare people off when they’re really not that hard to understand. Obviously, the underlying predictive math is hard but the interpretation from a business point of view is much simpler than most people dare to appreciate. We’ll illustrate this with an example: Company A is in the business of sand. Could hardly be simpler than that. Its business consists of unloading barges, transporting and sifting the sand internally and then loading it onto trucks for delivery. To do this, they need cranes (to unload the ships), conveyor belts and more cranes (to load the trucks). Some of these items are more expensive (the ship-loading cranes) or static (the conveyor belts) than others (the truck-loading cranes). In this case, this has led to a purchasing policy which has focused on getting the best cranes available for offloading the ships (tying down a ship because the crane is broken is very expensive), is slightly less stringent on the conveyor belts (if one is broken, at least the sand is on our yards and we can work around the failures with our mobile cranes) and is downright hedged by buying overcapacity on the cheaper mobile cranes. This happens quite often: the insurance strategy changes with both the value of the assets and their criticality to the operations. Please also note that criticality goes up with diminishing alternatives… A single asset is typically less critical (from an operational point of view) when part of a fleet of 100 than if it were alone to perform a specific task.
All these assets are subject to downtime; both planned and unplanned. We’ll focus on the unplanned downtime. When a fixed ship-loading crane fails, the ship either can’t be off-loaded any more or it has to be moved within reach of another such crane (if one is available). Either way, the offloading is interrupted and the failure not only yields repair costs (time: diagnose, get the part, fix the problem – parts – people) but also delays the ship’s departure, which may result in additional direct charges or costs due to later bay availability for incoming ships. When a conveyor belt breaks down, there’s the choice of waiting for it to be repaired or finding an alternative such as loading the sand onto trucks and hauling it to the processing plant. Both situations come at a high cost. Moreover, both the cranes and the conveyor may cause delays for the sifting plant, which is probably the most expensive asset on site and whose utilisation must be maximised! For the truck-loading cranes, the solution was to add one extra crane for every 10 in the fleet. That overcapacity should ensure ‘always on’ but comes at the cost of buying spare assets.
Let’s now mix in some numbers. Let’s say a ship-loading crane costs €5,000,000; a conveyor costs €500,000 and a mobile crane costs €250,000. The company has three ship docks with one crane each, 6 conveyors and a fleet of 20 mobile cranes, putting their total asset value at roughly €22,000,000. If we take a conservative estimate that 6% of the ARV (Asset Replacement Value) is spent on maintenance, this installed base costs €1,320,000 to maintain every year. Let’s further assume that 50% of the interventions are planned and 50% are unplanned. We know that unplanned maintenance is 3-9 times more expensive than planned so for this example we’ll take the middle figure of 6x. We can now easily calculate the cost of planned and unplanned events from: €1,320,000 = 0.5x + 0.5*6x, where x is what the whole maintenance workload would cost if every intervention were planned. Result: x is roughly €377,000, so of the total maintenance cost, roughly €190,000 (0.5x) is spent on planned maintenance whereas a whopping €1,130,000 (3x) is due to unplanned downtime! If the number of maintenance events is 200, split evenly between planned and unplanned, that means that one planned maintenance event costs about €1,900 and one unplanned event costs about €11,300. Company A has done all it can to optimise the maintenance processes but can’t seem to drive down the costs further and therefore just decided this is part of doing business.
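For readers who want to check the arithmetic, the split can be reproduced with a short sketch. It uses the total asset value as stated in the example (€22,000,000); note that the itemised assets as listed (3 cranes, 6 conveyors, 20 mobile cranes) would add up to €23,000,000, so the stated total is kept here to stay consistent with the figures that follow. The variable names are my own.

```python
ARV = 22_000_000          # total asset replacement value as stated in the example (EUR)
MAINT_RATE = 0.06         # share of ARV spent on maintenance per year
N_EVENTS = 200            # maintenance events per year
PLANNED_SHARE = 0.5       # half of the events are planned
UNPLANNED_MULTIPLIER = 6  # an unplanned event costs 6x a planned one

budget = ARV * MAINT_RATE
n_planned = N_EVENTS * PLANNED_SHARE
n_unplanned = N_EVENTS - n_planned

# budget = n_planned * c + n_unplanned * UNPLANNED_MULTIPLIER * c
# where c is the cost of a single planned event.
cost_planned_event = budget / (n_planned + UNPLANNED_MULTIPLIER * n_unplanned)
cost_unplanned_event = UNPLANNED_MULTIPLIER * cost_planned_event

planned_total = n_planned * cost_planned_event
unplanned_total = n_unplanned * cost_unplanned_event

print(f"Annual budget:        EUR {budget:,.0f}")
print(f"Planned total:        EUR {planned_total:,.0f}")
print(f"Unplanned total:      EUR {unplanned_total:,.0f}")
print(f"Per planned event:    EUR {cost_planned_event:,.0f}")
print(f"Per unplanned event:  EUR {cost_unplanned_event:,.0f}")
```

The unrounded results sit slightly off the rounded figures used in the text (about €1,886 and €11,314 per event), which is why the example speaks of roughly €1,900 and €11,300.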
Meanwhile on the other side of town… Company B is a direct competitor of Company A. And for the sake of this example, we’ll even make it an exact copy of Company A but for one difference: it has embarked on a project to reduce the number of unplanned downtime events. They too came to the conclusion that, for the 200 maintenance events, the best way to lower the costs would be to magically transform unplanned maintenance into planned maintenance. They did some research and found that, well, they could – at least for some. Here’s the deal: if we can forecast a failure with enough lead time, we can prevent it from happening by planning maintenance on the asset (or the component that is forecasted to fail) either when other maintenance is planned to happen or during times when the asset is not required for production. A maintenance event still takes place, but the prevent-fix, being planned, costs €1,900 as compared to a break-fix costing €11,300 – that’s a €9,400 difference per event!
The realisation that the difference between a break-fix and a prevent-fix was €9,400 per event allowed them to avoid the greatest pitfall of predictive maintenance. Any such project requiring a major shift in mindset is bound to face headwind. In predictive analytics, most of the pushback comes from people not understanding risk-based decision making or people not seeing the value associated with introducing the new approach. The first relates to the fact that many people still believe that predictions should be spot-on. Once they realise this is impossible, they often (choose to) ignore the fact that sensitivity can be tuned to increase precision, albeit at a cost: higher precision means less coverage (if we want higher prediction confidence, we can get it, but out of all failures we’ll catch a smaller portion). “If you can’t predict all failures, then what’s the point?” is an often-heard objection.
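The trade-off between precision and coverage can be illustrated with a toy example. The risk scores and outcomes below are entirely made up, and the threshold rule is a simplification of how a real predictive model would be tuned.

```python
def precision_and_coverage(scored, threshold):
    """Flag every asset whose risk score is at or above `threshold`.

    Returns (precision, coverage):
      precision = share of flagged assets that really failed
      coverage  = share of all failures that were flagged (the catch rate)
    """
    flagged = [failed for score, failed in scored if score >= threshold]
    total_failures = sum(failed for _, failed in scored)
    caught = sum(flagged)
    precision = caught / len(flagged) if flagged else 0.0
    coverage = caught / total_failures if total_failures else 0.0
    return precision, coverage


# Made-up (risk score, did it actually fail?) pairs, for illustration only.
history = [
    (0.95, True), (0.90, True), (0.85, True), (0.80, False),
    (0.70, True), (0.60, False), (0.55, True), (0.40, False),
    (0.35, True), (0.20, False), (0.10, False), (0.05, False),
]

for threshold in (0.85, 0.50, 0.30):
    precision, coverage = precision_and_coverage(history, threshold)
    print(f"threshold {threshold:.2f}: precision {precision:.0%}, coverage {coverage:.0%}")
```

On this toy data, a strict threshold flags only failures but catches just half of them, while a loose threshold catches every failure at the price of more false alarms. Where to set the threshold is a business decision, not a mathematical one.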
Company B did its homework though and concluded that they could live with a high enough prediction accuracy at a 20% catch rate. The accuracy at this (rather low) catch rate meant that for every 11 predictions, 10 actually prevented a failure and 1 was a false positive (these figures are made up for this example). Let’s look at the economics: a 20% catch rate means that of 100 unplanned downtimes, 20 could be prevented, which resulted in a saving of 20 x €9,400 = €188,000. However, the prediction accuracy also means that for catching these 20, they actually had to perform 22 planned activities; the 2 extra events cost 2 x €1,900 = €3,800. The resulting savings were therefore €188,000 – €3,800 = €184,200; savings of about 14% on the total maintenance budget, or more than 16% of the cost of unplanned maintenance!
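The economics above can be wrapped in a small function so that other catch rates and accuracies can be tried out. This is a sketch using the rounded per-event costs from the example; the function and parameter names are my own.

```python
def net_savings(unplanned_events, catch_rate, precision,
                cost_planned, cost_unplanned):
    """Net saving from turning predicted failures into planned maintenance.

    precision = share of predictions that actually prevent a failure.
    """
    prevented = round(unplanned_events * catch_rate)
    interventions = round(prevented / precision)  # includes false positives
    false_positives = interventions - prevented
    gross = prevented * (cost_unplanned - cost_planned)
    wasted = false_positives * cost_planned
    return gross - wasted


BUDGET = 1_320_000           # total annual maintenance budget (EUR)
UNPLANNED_TOTAL = 1_130_000  # rounded cost of unplanned maintenance (EUR)

savings = net_savings(
    unplanned_events=100,
    catch_rate=0.20,
    precision=10 / 11,       # 10 out of every 11 predictions are right
    cost_planned=1_900,
    cost_unplanned=11_300,
)

print(f"Net savings: EUR {savings:,}")
print(f"Share of total budget:     {savings / BUDGET:.1%}")
print(f"Share of unplanned costs:  {savings / UNPLANNED_TOTAL:.1%}")
```

Measured against the €1,320,000 total budget the net saving comes to roughly 14%; measured against the €1,130,000 spent on unplanned maintenance it is a little over 16%.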
What’s more, there are fringe benefits: avoiding the unplanned downtime results in better planning, which ultimately results in higher availability with the same asset base. Stock-listed companies know how important ROCE (Return On Capital Employed) is when investors compare opportunities but even private companies should beware: financial institutions use this kind of KPI to evaluate whether or not to extend credit and at what rate (it plays a major role in determining a company’s risk profile). Another fringe benefit – and not a small one – concerns the fleet sizing for the mobile cranes (remember they took 10% extra machines just as a buffer for unplanned events): fleet size can be adjusted downward for the same production capacity because downtime during planned utilisation will be down by 20%. Even if they play it very safe and just downsize by one crane, that’s a €250,000 one-time saving plus an annual benefit of 6% on that: €15,000!
Company B is gradually improving flow by avoiding surprises; a 20% impact can’t go unnoticed and has a major effect on employee morale. They also did their homework very well and passed (part of) the reduced operational costs on to their clients. Meanwhile, at Company A, employees constantly feel like they’re playing catch-up and can’t understand how Company B manages to undercut them on price and still throw a heck of a party to celebrate their great year!
Mountains of consulting dollars have been invested in business process optimisation, manufacturing process optimisation, supply chain optimisation, etc. Now’s the time to bring everything together and with all these processes optimised, the utilisation rate of our whole production apparatus becomes ever higher. When all goes well, this means more gets done per invested dollar, making the CFO and investors happy through better ROA (Return On Assets). However efficient, this increasing load on the machine park comes at a price: less wriggle room in case something unexpected happens. When companies had excess capacity in the past, this not only served to absorb demand variability; it also came in very handy when machines broke down by allowing the demand to be re-routed to other equipment.
There’s no more place to hide now, so there are a number of options one can consider in order to avoid major disruptions:
- increase preventive maintenance: this may or may not help. The law of diminishing returns applies, especially as preventive maintenance tends to focus on normal wear and tear and parts with a foreseeable degradation behaviour. A better approach is to improve predictive maintenance; don’t overdo it where there’s no benefit but try to identify additional quick wins. Your best suppliers will be a good source of information. Suppliers that can’t help; well, you can guess what I think of those.
- improve the production plan: too many companies still approach production planning purely reactively and lack optimisation capabilities. Machine changes, lots of stop and go, etc. all add to the fragility of the whole production apparatus (not to mention they typically – negatively – influence the output quality as well).
- improve flow: I’m still perplexed when I see the number of hiccups in production lines because ‘things bumped into each other’. Crossing flows of unfinished parts is still a major cause of disruption (and a major focus point for top performers such as Toyota). Ask most plant managers why machines are in a certain place and they either “don’t remember” or will say “that’s the place where they needed the machine first” or even “that was the only place we had left”. Way too rarely do plant layouts get re-considered. Again, the best-in-class do this as often as once a year!
- shift responsibilities: if you can’t (or won’t) be good at maintenance, then outsource it! Get a provider that can improve your maintenance and ideally can work towards your goal, which is usually not to have shinier machines but to get more and better output. If you really decide you don’t care about machine ownership at all, consider performance- or output-based contracts.
- get better machines: sounds trivial but current purchasing approaches often fail to capture the ‘equipment quality’ axis and forget to look at lifetime cost in light of output. Just two months ago I heard of a company buying excavators from a supplier because for every three machines, they got one for free. This was presented as an assurance that the operator would never run out of machine capacity. In this case, it had the adverse effect, as the buyer wondered why the supplier needed to throw in an extra machine if they claimed their machines were as reliable as the best.
- connect your machines: this is a very interesting step. Recognising that machines will eventually fail but at least making sure you get maximum visibility on what/where. Most of the time resolving equipment failures is spent… waiting! Waiting for the mechanic to arrive, waiting for the right part, etc.
- add predictive analytics: predictive analytics not only allows you to prevent failures from happening; building on the previous point, it also adds the why to the what/where. Determining why something failed or will fail is crucial in optimising production output. Well-implemented predictive analytics improves production continuity by avoiding unplanned incidents (through predictive maintenance) and also allows for more efficient (faster) and effective (resulting in better machine uptime) maintenance.
So which of these steps should we take? Frankly, all of them. Maybe not all at once and (hopefully) some of them may already have been implemented. Key is to have a plan. Where are we now, what are our current problems, what are we facing,…? Formulating the problem is half the solution. Then – and this may surprise some – work top down. Start with the end goal, your “ultimate production apparatus”, and work your way back to determine how to get there. All too often people start with the simplest steps without having looked at the end goal and after having taken two or three steps they find out they need to backtrack because they took the wrong turn earlier in the process.
At any step, whether it’s purchasing equipment, installing sensors or whatever, look at whether your supplier understands your goals and is capable of integrating into “the bigger plan”. The next efficiency frontier is APM: Asset Performance Management. Not individually, but from a holistic point of view. While individual APM metrics are interesting for determining rogue equipment, only the overall APM metrics matter for the bottom line; did we deliver on time, was the quality on par, at what cost,…