The Missing Link in Why You’re Not Getting Value From Your Data Science
by Robert Morris, Ph.D.
DECEMBER 28, 2016
Recently, Kalyan Veeramachaneni of MIT published an insightful article in the Harvard Business Review entitled “Why You’re Not Getting Value from Your Data Science.” The author argued that businesses struggle to see value from machine learning/data science solutions because most machine learning experts tend not to build and design models around business value. Rather, machine learning models are built around nuanced tuning and subtle, yet complex, performance enhancements. Further, experts tend to make broad assumptions about the data that will be used in such models (e.g., consistent and clean data sources). With these arguments, I couldn’t agree more.
WHY IS THERE A MISSING LINK?
At Predikto, I have overseen many deployments of our automated predictive analytics software within many Industrial IoT (IIoT) verticals, including the Transportation industry. In many cases, our initial presence at a customer is due in part to the limited short-term value gained from an internal (or consulting) human-driven data science effort whose focus was just what Kalyan described: the “model” rather than how to actually get business value from the results. Many companies aren’t seeing a return on their investment in human-driven data science.
There are many reasons why experts don’t bake business objectives into their analytics from the outset. Most stem from a disconnect between academic expertise, habit, and operations management (not to mention the immense diversity of focus areas within the machine learning world, which is a separate topic altogether). This disconnect is particularly relevant for large industrial businesses striving to cut costs by preventing unplanned operational downtime. Unfortunately, the bulk of the effort in deploying machine learning solutions geared toward business value, and one of the most difficult aspects of the process, is actually delivering and demonstrating value to customers.
WHAT IS THE MISSING LINK?
In the world of machine learning, over 80% of the work revolves around cleaning and preparing data for analysis, which comes before the sexy machine learning part (see this recent Forbes article for some survey results supporting this claim). The remainder (under 20%) involves tuning and validating the results of one or more machine learning models. Unfortunately, this calculation fails to account for the most important element of the process: extracting value from the model output.
In business, the goal is to gain value from predictive model accuracy (another subjective topic area worthy of its own dialog). We have found that this is the most difficult aspect of deploying predictive analytics for industrial equipment. In my experience, the breakdown of effort required from beginning (data prep) to end (demonstrating business value) is really more like:
40% Cleaning/Preparing the Data
10% Creating/Validating one or more well-performing machine learning models
50% Demonstrating Business Value by operationalizing the output of the model
That final 50% is rarely discussed in machine learning conversations (with the aforementioned exception). Veeramachaneni is right. It makes a lot of sense to keep models simple if you can, to cast a wide net to explore more problems, to avoid assuming you need all of the data, and to automate as much as you can. Predikto is doing all of these things. But again, this is only half the battle. Once you have each of the above elements tackled, you still have to:
Provide an outlet for near-real-time performance auditing. In our market (heavy industry), customers want proof that the models work with their historical data, with their “not so perfect” data today, and with their data in the future. The right solution provides fully transparent and consistent access to detailed auditing data from top to bottom: from what data are used, to how models are developed, to how the output is being used. This is not only about trust; it’s also about a continuous improvement process.
Provide an interface for users to tune output to fit operational needs and appetites. Tuning output (not the model) is everything. Users want to set their own threshold for each output and have the option to return to a previous setting on the fly, should operating conditions change. One person’s red alert is not the same as another’s, and this all may be different tomorrow.
Provide a means for taking action from the model output (i.e., the predictions). Users of our predictive output are fleet managers and maintenance technicians. Even with highly precise, high-coverage machine learning models, the first thing they all ask is, “What do I do with this information?” They need an easy-to-use, configurable interface that allows them to take a prediction notification, originating from a predicted probability, to business action in a single click. For us, that action is often the creation of an inspection work order in an effort to prevent a predicted equipment failure.
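To make the last two items concrete, here is a minimal Python sketch of user-tunable output thresholds with a one-step rollback, plus the conversion of a predicted probability into an inspection work order. It is an illustration only, not Predikto's implementation: the names AlertThresholds, WorkOrder, and to_work_order, the "inspect" action, and the example output name are all hypothetical.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class AlertThresholds:
    """Hypothetical per-output alert thresholds with a history for rollback."""
    current: Dict[str, float]
    history: List[Dict[str, float]] = field(default_factory=list)

    def set(self, output: str, value: float) -> None:
        """Change the threshold for one output, remembering the old settings."""
        if not 0.0 <= value <= 1.0:
            raise ValueError("threshold must be a probability in [0, 1]")
        self.history.append(dict(self.current))  # snapshot before the change
        self.current[output] = value

    def revert(self) -> None:
        """Return to the previous settings, if any were recorded."""
        if self.history:
            self.current = self.history.pop()


@dataclass
class WorkOrder:
    """Hypothetical business action created from a prediction."""
    asset_id: str
    output: str
    probability: float
    action: str = "inspect"


def to_work_order(asset_id: str, output: str, probability: float,
                  thresholds: AlertThresholds) -> Optional[WorkOrder]:
    """Create a work order only if the probability meets the user's threshold.

    The model itself is untouched; only the cut-off applied to its output
    changes when the user retunes.
    """
    limit = thresholds.current.get(output)
    if limit is None or probability < limit:
        return None
    return WorkOrder(asset_id, output, probability)
```

With a threshold of 0.8 on a hypothetical "bearing_failure" output, a predicted probability of 0.72 produces no work order. Lowering the threshold to 0.7 produces one, and calling revert() restores the earlier setting. The design choice worth noting is that the tuning lives entirely outside the model, which is what lets each user change it on the fly.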
Predikto has learned by doing, and iterating. We understand how to get value from machine learning output, and it’s been a big challenge. This understanding led us to create the Predikto Enterprise Platform®, Predikto MAX® [patent pending], and the Predikto Maintain® user interface. We scale across many potential use cases automatically (regardless of the type of equipment), we test countless model specifications on the fly, we give some control to the customer in terms of interfacing with the predictive output, and we provide an outlet for them to take action from their predictions and show value.
As to the missing 50% discussed above, we tackle it directly with Predikto Maintain® and we believe this is why our customers are seeing value from our software.
Robert Morris, Ph.D. is Co-founder and Chief Science/Technology Officer at Predikto, Inc. (and former Associate Professor at University of Texas at Dallas).