The Missing Link in Why You’re Not Getting Value From Your Data Science

by Robert Morris, Ph.D.

DECEMBER 28, 2016

Recently, Kalyan Veeramachaneni of MIT published an insightful article in the Harvard Business Review entitled “Why You’re Not Getting Value from Your Data Science.” The author argued that businesses struggle to see value from machine learning/data science solutions because most machine learning experts tend not to build and design models around business value. Rather, machine learning models are built around nuanced tuning and subtle, yet complex, performance enhancements. Further, experts tend to make broad assumptions about the data that will be used in such models (e.g., consistent and clean data sources). With these arguments, I couldn’t agree more.

 

WHY IS THERE A MISSING LINK?

At Predikto, I have overseen many deployments of our automated predictive analytics software across Industrial IoT (IIoT) verticals, including the transportation industry. In many cases, our initial presence at a customer stems in part from the limited short-term value gained from an internal (or consulting) human-driven data science effort where the focus had been on just what Kalyan mentioned: the “model,” rather than how to actually get business value from the results. Many companies aren’t seeing a return on their investment in human-driven data science.

There are many reasons why experts don’t cook business objectives into their analytics from the outset. This is largely due to a disjunction between academic expertise, habit, and operations management (not to mention the immense diversity of focus areas within the machine learning world, which is a separate topic altogether). This is particularly relevant for large industrial businesses striving to cut costs by preventing unplanned operational downtime. Unfortunately, the bulk of the effort in deploying machine learning solutions geared toward business value lies in one of the most difficult parts of the process: actually delivering and demonstrating that value to customers.

WHAT IS THE MISSING LINK?

In the world of machine learning, over 80% of the work revolves around cleaning and preparing data for analysis, which comes before the sexy machine learning part (see this recent Forbes article for survey results supporting this claim). The remaining 20% involves tuning and validating results from the machine learning model(s). Unfortunately, this calculation fails to account for the most important element of the process: extracting value from the model output.

In business, the goal is to gain value from predictive model accuracy (another subjective topic area worthy of its own dialog). We have found that this is the most difficult aspect of deploying predictive analytics for industrial equipment. In my experience, the breakdown of effort required from beginning (data prep) to end (demonstrating business value) is really more like:

40% Cleaning/Preparing the Data

10% Creating/validating a well-performing machine learning model(s)

50% Demonstrating Business Value by operationalizing the output of the model

The latter 50% is something that is rarely discussed in machine learning conversations (with the aforementioned exception). Veeramachaneni is right. It makes a lot of sense to keep models simple if you can, cast a wide net to explore more problems, avoid assuming you need all of the data, and automate as much as you can. Predikto is doing all of these things. But again, this is only half the battle. Once you have each of the above elements tackled, you still have to:

Provide an outlet for near-real-time performance auditing. In our market (heavy industry), customers want proof that the models work with their historical data, with their “not so perfect” data today, and with their data in the future. The right solution provides fully transparent and consistent access to detailed auditing data from top to bottom: from what data are used, to how models are developed, to how the output is being used. This is not only about trust; it’s also about a continuous improvement process.

Provide an interface for users to tune output to fit operational needs and appetites. Tuning output (not the model) is everything. Users want to set their own thresholds for each output and have the option to return to a previous setting on the fly should operating conditions change. One person’s red alert is not the same as another’s, and this all may be different tomorrow (see the sketch after this list).

Provide a means for taking action from the model output (i.e., the predictions). Users of our predictive output are fleet managers and maintenance technicians. Even with highly precise, high-coverage machine learning models, the first thing they all ask is, “What do I do with this information?” They need an easy-to-use, configurable interface that allows them to take a prediction notification, originating from a predicted probability, to business action in a single click. For us, it is often the creation of an inspection work order in an effort to prevent a predicted equipment failure.
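Below is a minimal, hypothetical sketch of this probability-to-action flow: model output is filtered through a user-configurable threshold, and anything above it becomes an inspection work order. The Prediction class, the threshold values, and the create_work_order stub are illustrative assumptions, not Predikto’s actual interface.

```python
# Illustrative sketch only: turn predicted probabilities into user-tunable
# alerts and actions. All names and thresholds here are hypothetical.
from dataclasses import dataclass

@dataclass
class Prediction:
    asset_id: str
    failure_probability: float  # output of an upstream model

def create_work_order(asset_id: str, reason: str) -> str:
    """Hypothetical hook into a maintenance system; here it just returns an ID."""
    return f"WO-{asset_id}-{reason}"

def triage(predictions, alert_threshold=0.8):
    """Apply a user-configurable threshold to the model output, not to the model.
    Operators can raise or lower alert_threshold on the fly as conditions change."""
    work_orders = []
    for p in predictions:
        if p.failure_probability >= alert_threshold:
            work_orders.append(create_work_order(p.asset_id, "predicted-failure-inspection"))
    return work_orders

# The same predictions, triaged under two different operator appetites.
preds = [Prediction("locomotive-17", 0.92), Prediction("locomotive-04", 0.65)]
print(triage(preds, alert_threshold=0.9))  # conservative setting: one work order
print(triage(preds, alert_threshold=0.6))  # aggressive setting: two work orders
```

The point of the sketch is that the model itself never changes; only the operator-facing threshold does, which is what lets one user’s red alert differ from another’s.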

Predikto has learned by doing, and iterating. We understand how to get value from machine learning output, and it’s been a big challenge. This understanding led us to create the Predikto Enterprise Platform®, Predikto MAX® [patent pending], and the Predikto Maintain® user interface. We scale across many potential use cases automatically (regardless of the type of equipment), we test countless model specifications on the fly, we give some control to the customer in terms of interfacing with the predictive output, and we provide an outlet for them to take action from their predictions and show value.

As to the missing 50% discussed above, we tackle it directly with Predikto Maintain® and we believe this is why our customers are seeing value from our software.


Robert Morris, Ph.D. is Co-founder and Chief Science/Technology Officer at Predikto, Inc. (and former Associate Professor at University of Texas at Dallas).

How Predikto hired thousands of data scientists

GE’s Jeff Immelt was recently interviewed on the predictive analytics investments and overall initiatives that have been ongoing over the last 5 years within his walls. The transcript, available here, is an excellent read on how a legacy company is attempting to transform itself for the digital future, leveraging vast amounts of sensor data to predict failure in large machinery. This marks a pivotal moment in GE’s history, where turning around a Titanic-size ship won’t be a trivial matter. The build-out began 5 years ago with a massive scaling of its data science and predictive analytics division.

Immelt seemed to drive one point home more than others in this interview: the mass hiring of Data Scientists (and ancillary staff) to accomplish the goal of building out the Predix division.

We have probably hired, since we started this, a couple thousand data scientists and people like that. That’s going to continue to grow and multiply. What we’ve found is we’ve got to hire new product managers, different kinds of commercial people. It’s going to be in the thousands.

We also hired thousands of Data Scientists (although we didn’t hire any “people like that”), so I figured I would shed some light on why and how we accomplished this.

The Need for Data Scientists

Data Scientists are the cornerstone of the machine learning world. Generally speaking, data scientists come from varied backgrounds: mechanical engineers, electrical engineers, and statisticians, to name a few. Their function within a predictive analytics organization is (putting it simply) to make sense of the data and select the features that influence the predictive models. Feature selection goes hand-in-hand with making sense of the data, in that the data scientist is analyzing large amounts of data, often with sophisticated software, to choose which sensor readings, external factors, and derivations/combinations of each truly impact whether some *thing* will fail or not. Data scientists are the tip of the spear in determining which features/readings/factors matter and which predictive/mathematical models should be trained and applied to forecast events and probabilities of failure.
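To make the feature selection idea concrete, here is a toy, self-contained illustration (not Predikto’s pipeline): candidate sensor features are scored against a failure label using mutual information, and the ranking surfaces the feature that actually drives the synthetic failures. The column names and data are invented for the example.

```python
# Toy feature-scoring example on synthetic data; column names are made up.
import numpy as np
import pandas as pd
from sklearn.feature_selection import mutual_info_classif

rng = np.random.default_rng(0)
n = 1_000
data = pd.DataFrame({
    "oil_temp": rng.normal(90, 10, n),
    "vibration_rms": rng.normal(0.5, 0.2, n),
    "ambient_temp": rng.normal(20, 5, n),
})
# Synthetic label: failures loosely driven by vibration, so the ranking is visible.
failed = (data["vibration_rms"] + rng.normal(0, 0.1, n) > 0.8).astype(int)

# Score each candidate feature by how much information it carries about failure.
scores = mutual_info_classif(data, failed, random_state=0)
ranking = pd.Series(scores, index=data.columns).sort_values(ascending=False)
print(ranking)  # vibration_rms should score highest on this synthetic data
```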

We faced the same crossroads as GE: data scientists are essential in getting things right, and you need a lot of them when analyzing machine data. We aren’t talking about a few terabytes of data here. No, you’re typically looking at hundreds of terabytes generated by a system in a month… every month… for years.

Scaling the Data Science Team

Big data calls for a big data science team, and to that end, we had to employ, as GE does, thousands of data scientists.

Unlike GE, our data scientists don’t have names or desks. They don’t require ancillary staff nor coffee to stay awake.

Our data scientists work 24 hours a day, 7 days a week, 365 days a year and never tire or complain. Larger dataset? Our data science team clones itself to meet the demand elastically.

Predikto has a unique approach to machine learning and data science. Our data scientists are tiny workers operating on multi-core computers in a distributed environment, acting as one. Just like machines automated many of the mundane human tasks during the industrial revolution, Predikto has automated machine learning and the mundane tasks once accomplished by humans. Our feature selection? Automated. Feature scoring? Automated. Training models? Automated.
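As a rough illustration of the “many parallel, tireless data scientists” idea, the sketch below evaluates a grid of candidate model configurations concurrently; adding cores (or machines) simply means exploring more configurations at once. It is an assumed, simplified stand-in, not the Predikto MAX engine.

```python
# Simplified sketch: automated model search run by parallel worker processes.
from concurrent.futures import ProcessPoolExecutor
from itertools import product

from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score

# Stand-in dataset; in practice this would be prepared sensor/event data.
X, y = make_classification(n_samples=2_000, n_features=20, random_state=0)

# Candidate configurations the automated "workers" will try.
grid = list(product([50, 100, 200], [3, 5, None]))  # (n_estimators, max_depth)

def evaluate(cfg):
    n_estimators, max_depth = cfg
    model = RandomForestClassifier(n_estimators=n_estimators, max_depth=max_depth,
                                   random_state=0, n_jobs=1)
    return cfg, cross_val_score(model, X, y, cv=3).mean()

if __name__ == "__main__":
    with ProcessPoolExecutor() as pool:  # scale workers with available cores
        results = list(pool.map(evaluate, grid))
    best_cfg, best_score = max(results, key=lambda r: r[1])
    print(f"best config {best_cfg} with CV accuracy {best_score:.3f}")
```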

I invite you to read the Immelt interview. It truly is a good read on one way to approach building a predictive analytics company. At Predikto, we chose a different path that we felt was innovative and scalable for our own growth plan.

Also a good read… Innovation Happens Elsewhere (http://dreamsongs.com/IHE/IHE-24.html#pgfId-955288)

ARC Guest Blog: Counting toilet flushes helps improve bullet train reliability

Greg Adams was a recent guest blogger on the ARC Advisory Group’s IIoT newsletter. We see a lot of data, and it is interesting how mundane and often-overlooked data can contain meaning. Read how counting toilet flushes is helping to increase the uptime and reliability of bullet trains: http://industrial-iot.com/2015/10/how-wc-flushes-relate-to-locomotive-reliability/

Metro Atlanta CEO writes a piece about Predictive Analytics


Metro Atlanta CEO has an article in their October newsletter covering Predictive Analytics and some of the interesting use cases Predikto has in Transportation.

Predikto, a leader in Predictive Analytics solutions for Transportation, has begun to deploy its machine learning / artificial intelligence software to help improve equipment reliability at global companies.

Click on the article to read about actual use cases and gain an understanding of this disruptive technology.

5 Reasons Why You Suck at Predictive Analytics


We see it every day… Customer X embarked on a grand in-house initiative to implement Big Data analytics. $X million and Y years later, they can predict nothing, aside from the fact that their consultants seem to get fatter with every passing month, and the engineering guys keep asking for more time and bigger budgets. It’s understandable… These days, cutting through all the noise related to big data analytics can be difficult. The bloggers, analysts, and consultants certainly make it sound easy, yet the road to successful predictive analytics implementations seems to be littered with the corpses of many a well-intentioned executive.

Many of our customers come to us after they have spun their wheels in the mud trying to implement big data projects on their own or with the help of the talking heads. Below is a list of what I believe to be the top reasons for failed projects, with buzzwords omitted:

  1. Data Science and Engineering alignment fail: Or… the fear that engineering will cannibalize data science. After all, “If I can automate Data Science, why do I need the Data Scientists?”, the reasoning goes. Aligning both camps is difficult in larger organizations, as turf wars will erupt. Analytics software should seek to include data scientists’ day-to-day activities rather than exclude them.
  2. Your data sucks: Nothing can save you here. If your vendor/manufacturer is providing shoddy data, you won’t be able to predict or analyze anything. Any consultant that tells you otherwise is selling fertilizer, not analytics. It is best to reach out to your data generator/vendor and work with them to fix the root of the problem.
  3. You hired IBM: Watson does well with Jeopardy questions, but sadly couldn’t even predict the most recent round of IBM layoffs.
  4. You build when you should buy: Predictive analytics is really hard, and chances are it’s not your core competency, so why are you bringing all of this in-house? The real short-term costs of implementing and maintaining custom software, data science groups, engineering groups, and infrastructure can easily eat away millions of dollars, and you’re placing really big bets on being able to hire the high-level talent to pull it off.
  5. Operations misalignment: Predictions are useless unless there is a someone or a some-thing to act on the results. It’s important to make operations a partner in the initiative from the outset. Increasing operational efficiency is the goal here, so really… operations is the customer. A tight feedback loop with ongoing implementation between both camps is a must.

And so that’s the gist of it – 5 bullets forced out of me at our marketing department’s insistence. As much as I enjoy mocking the hype-masters in the industry, these days I find myself extremely busy helping build a real startup, solving real-world problems, for the Fortune 500, for real dollars. 😉

Predikto: Making Waves in IoT!

The Internet of Things is generally defined as “Smart” + “Connected” + “Edge Devices” (planes, trains, automobiles, industrial and farming equipment, medical equipment, and consumer electronics).

Predikto focuses on putting the “smart” into managing smart connected devices, equipment and complex capital assets in order to forecast asset behavior/performance.

Industrial asset OEMs, operators and maintenance organizations are challenged by equipment performance degradation and failure as they impact uptime and efficiency. While reliability and condition-based solutions have been around for many years, predictive analytics (machine learning) is providing significant new capabilities to improve performance and profitability.

Approximately 2,000 hardware, software and business leaders attended the second annual O’Reilly Solid 2.0 IoT conference in San Francisco. Attendees were given the opportunity to vote on the startup they believed was making the most innovative impact in the field of industrial or consumer IoT. Of the 30 or so startups at the conference, Predikto was voted best startup by attendees for its telematics / IoT based predictive analytics, predictive maintenance and asset health management solutions.

https://www.youtube.com/watch?v=C0-cYgsT8yI&list=PL055Epbe6d5ZVlSYx7-1k72bm075HkVhq&index=21

This was great exposure for us at Predikto, and now we are up for 2 awards at the upcoming Solutions 2.0 Conference in early August.  We are going head to head against some big players in the industry in the categories of Asset Condition Management and Asset Management.  Mario Montag, Predikto CEO, will be presenting on the topic of Predictive Analytics in Asset Management.  This is another indication of the high demand for IoT products and solutions, the acceleration of Predikto within the Industrial Internet market and the large innovative technology community in Atlanta.

Mario Montag was quoted after the Solid Conference: “It is great to see validation from the market and conferences with regards to our Solution based predictive analytics technology and approach.  We are not a tool to enable customers to do more. We deliver results and bring to light full transparency on the ROI and impact we are having to solve real problems with asset reliability.”

We have also been getting some great traction with customers and partners.  We recently announced a partnership with New York Air Brake, subsidiary of the Knorr-Bremse Group in Germany, to incorporate Predikto’s auto-dynamic predictive analytics platform, MAX, into the company’s LEADER advanced train control technology solutions via its internet of things (IoT) initiative. See the full story here.

Needless to say, we are all very excited about the awards and recognition Predikto is receiving, and it legitimizes the need for a real solution in predictive analytics for the IIoT.

A Software Industry Veteran’s Take on Predictive Analytics

I’m about 4 months into the job here at Predikto as VP, Sales. The predictive analytics market is an exciting new market with, predictably (pun intended), its share of hype. Nevertheless, this is a key niche of the Industrial Internet of Things sector. I’d like to share some observations on what I’ve learned thus far.

We focus on asset-intensive industries, helping organizations leverage the terabytes of data they have accumulated to anticipate the likelihood of an adverse event, whether that is a battery on a transit bus about to fail, or indications that a fuel injector on a locomotive diesel engine, while still operating, is doing so at a less than desired level of performance. We predict these events in a time horizon that allows the customer to take action to rectify the issue before it creates a problem, in a way that minimizes disruptions to operations. Our technology is built on cutting-edge open source, leveraging Spark, Python, and Elasticsearch, hosted on AWS.
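One common way to frame this “predict within a time horizon” problem is sketched below, under assumed column names and an illustrative 14-day horizon: each observation of an asset is labeled with whether a failure occurs within the next N days, so a model trained on those labels flags issues with enough lead time to act. This is a generic framing, not a description of Predikto’s internal data model.

```python
# Hypothetical labeling sketch for horizon-based failure prediction.
import pandas as pd

HORIZON = pd.Timedelta(days=14)  # illustrative lead-time window

readings = pd.DataFrame({
    "asset_id": ["loco-1"] * 5,
    "date": pd.to_datetime(["2016-01-01", "2016-01-05", "2016-01-10",
                            "2016-01-20", "2016-02-01"]),
    "fuel_injector_pressure": [310, 305, 290, 280, 315],
})
failures = pd.DataFrame({
    "asset_id": ["loco-1"],
    "failure_date": pd.to_datetime(["2016-01-18"]),
})

def label_within_horizon(row):
    # 1 if this asset fails within HORIZON days after this reading, else 0.
    f = failures.loc[failures["asset_id"] == row["asset_id"], "failure_date"]
    return int(((f >= row["date"]) & (f <= row["date"] + HORIZON)).any())

readings["fails_within_horizon"] = readings.apply(label_within_horizon, axis=1)
print(readings)  # the Jan 5 and Jan 10 readings are labeled 1; the rest are 0
```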

The use cases we’re being asked to solve are fascinating and diverse. Some companies are contacting us as part of an initiative to transform their business model from selling capital assets to selling a service, an approach popularized by Rolls-Royce with their “power by the hour” jet engines, and similar to the software industry’s transition from selling perpetual licenses with maintenance contracts to selling Software as a Service (SaaS). In order to sell capital assets like construction equipment and industrial printing equipment this way, our customers will offer service level agreements, with Predikto in place to allow them to proactively deal with issues likely to degrade their service commitment. So while our tactical focus has been on helping clients maximize product “uptime,” the strategic driver is helping them transition to a new way of generating revenue while getting closer to their customers. It’s been gratifying to realize the impactful role our offering is playing in facilitating these transitions.

Other organizations are complex, asset-intensive businesses where an equipment failure can have a cascading effect on revenues and customer service. For example, in the work we are doing with railroads, we’ve learned there are a multitude of areas where sub-optimal performance of equipment, or outright failure, can have a significant impact. The North American railroad network set new records in 2014 for revenue ton-miles, a key efficiency metric; this was accomplished over a rail network that is highly congested. In this environment, a delay has huge ripple effects. Any number of factors can lead to a delay, ranging from a rockslide blocking a section of track, to a locomotive breaking down, to a wheel failure on a rail car, which can cause a derailment. On top of this, in order to operate safely and comply with government regulations, railroads have invested heavily in signaling and equipment-monitoring assets, as well as machinery to maintain the track and roadbeds, all of which must work reliably. Our ability to implement in weeks and generate actionable predictions regarding locomotive and rail car health, as well as to monitor other equipment and even the condition of the rails, is making a major difference in helping to facilitate efficient, safe rail operations.

 

Having a blast…more to come.

Kevin Baesler, VP of Sales