Why do we need so much data?

Most of us remember the double-slit experiment from our physics classes. For those who don’t, here’s a (very) short reminder: light is shone through two fine slits and the resulting interference pattern is proof of the dual nature of light.

If that experiment is done with an emitter that can send single particles of light and the receptor is a photo-sensitive surface, an interesting phenomenon occurs: ‘look at’ the receptor too soon (if it’s photo paper, developing and fixing it will do) and none of the “interesting” information appears – just a seemingly random trace of photons hitting the paper (as illustrated below).
[Figure: a seemingly random scatter of individual photon hits]
If, however, we wait long enough (until enough photons have hit the paper), a clear pattern emerges and some meaningful interpretations can be made (light behaves both like a particle and a wave).
[Figure: the interference pattern that emerges after many photons]

What’s more, the further away you are, the more obvious the pattern becomes (if every dot were a person and you were standing among them, chances are you wouldn’t recognise the pattern either) – until you are too far away to notice anything but a blob. So, not only does the amount of information matter, but also the conditions under which it is interpreted.

Another good example is the image produced by a digital camera: unless the resolution is fine enough, we can’t read the letters on a paper in the picture. Things are just too small, so we zoom in. However, if we keep zooming, at some point the image becomes too pixelated and we can no longer read the letters. Context matters! Modern chefs use this at our expense: they’ll present you with food that looks like one thing but tastes like something different (there’s the famous example of two jellies at The Fat Duck; the orange one tastes like beet while the red one tastes like orange). The lesson from this is that our brain – wired for pattern recognition – can easily be fooled!
What does all that have to do with predictive analytics? Well, quite simply:
– we need data, lots of it
– we need perspective
– we need context
– we need to be able to take a step back when things aren’t what we thought they’d be
Repeating the test in different circumstances, with different light sources, etc. will lead to different results, and eventually a formula can be found that allows us to predict the outcome of the test according to a number of variables. If at some point the variables change (e.g. an extra variable is introduced), one may (or may not) have to revise the formula to fit the new observations! The days when atoms were just protons, electrons and neutrons are long gone. With predictive analytics, too many efforts focus on coming up with an algorithm to predict the failure of a part; when the operating conditions change just a couple of months later, the predictions lose accuracy and, eventually, their reason for being. If we ever want to get autonomous cars on the road, we can’t rely on programmers to describe every possible eventuality a car may find itself in. Instead, we need to make sure that when something happens, the car is able to come up with a workable solution.
So it is with predictive analytics!
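In code terms, that means treating the prediction model as something to monitor and re-fit, not a formula carved in stone. Here is a minimal sketch (Python with scikit-learn, toy data and an arbitrary error threshold – all assumptions for illustration, not any particular product):

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_absolute_error

def maybe_retrain(model, recent_X, recent_y, tolerance=1.0):
    """Re-fit the model when its error on recent data drifts past a tolerance.

    A stand-in for proper drift detection; the threshold and window are assumptions.
    """
    error = mean_absolute_error(recent_y, model.predict(recent_X))
    if error > tolerance:
        # Operating conditions have changed: revise the 'formula' to fit the new observations.
        model = LinearRegression().fit(recent_X, recent_y)
    return model

# Toy usage: a model fitted under old conditions, then checked against new ones.
rng = np.random.default_rng(0)
X_old = rng.uniform(0, 10, (200, 1))
y_old = 2 * X_old[:, 0] + rng.normal(0, 0.1, 200)
X_new = rng.uniform(0, 10, (200, 1))
y_new = 5 * X_new[:, 0] + rng.normal(0, 0.1, 200)   # the underlying relationship has shifted

model = LinearRegression().fit(X_old, y_old)
model = maybe_retrain(model, X_new, y_new)          # error is now large, so the model is re-fit
```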

What do you mean, no spare?

Forecasting is only useful if the result is applicable… and to the point! I may have forecast the weather perfectly, but if I’m going to spend the whole day inside, it has no impact on me. It is the same with industrial applications. Current interest in predictive analytics results in plenty of pilot projects, most of which focus on how well the forecast performs. To the non-initiated this means: out of every 100 failures, did you catch 99? That’s where trouble starts:

  • let’s say I don’t catch 99 but just 30; does that mean I failed? Obviously, this depends on the cost of the failures avoided and the cost of avoiding them – catching 30 could prove to be very worthwhile.
  • now let’s say I do catch 99 but, to do so, I actually forecast 129. Do the 30 extra inspections/repairs void the value of catching the 99? Again, it depends. If the marginal cost of those 30 extra interventions exceeds the cost of the 99 failures, one may be better off not acting at all. However, we need to be very careful when evaluating costs and benefits, as we’ve rarely seen the right numbers being used (what is the full cost to the organisation, including customer aggravation, …?). A simple worked example follows below.
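To make the trade-off concrete, here is a minimal worked example in Python; the unit costs and counts are invented purely for illustration:

```python
# Hypothetical numbers, purely for illustration -- plug in your own.
failures_expected = 100        # failures that would occur without intervention
failures_caught = 99           # true positives the forecast flags in time
interventions = 129            # total inspections/repairs triggered by the forecast

cost_per_failure = 50_000      # full cost of an unplanned failure (downtime, customer aggravation, ...)
cost_per_intervention = 2_000  # marginal cost of one planned inspection/repair

# Value of acting = failures avoided minus the cost of every intervention,
# including the (interventions - failures_caught) that turned out to be unnecessary.
savings = failures_caught * cost_per_failure
spend = interventions * cost_per_intervention
net_benefit = savings - spend

print(f"Savings from avoided failures: {savings:,}")
print(f"Cost of all interventions:     {spend:,}")
print(f"Net benefit of acting:         {net_benefit:,}")
# With these numbers the 30 'extra' interventions are cheap insurance;
# flip the two unit costs and not acting at all becomes the better option.
```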

[Cartoon: a cycling team car loaded with spare bikes but no spare for the car itself]

In the case pictured above, the cost of not having a spare for the car is not huge; however, if one of the riders loses the race because he couldn’t get his spare bike in time, the derivative cost is very high!

There are two approaches to predictive analytics: big bang or step by step. The former is typically launched when a company is in so much trouble it just can’t afford to wait, and understands this may be the weapon to beat the competition. The step-by-step approach typically looks (or at least should look) at two criteria to define a first project: impact and feasibility – let’s find the 10 parts that cause me the most pain AND for which I have good data to implement a condition-based or predictive approach to maintenance.

That way, not only would the team have had a surplus of bike spares on board, but it would at least have had one for the car!

How Predikto hired thousands of data scientists

GE’s Jeff Immelt was recently interviewed about the predictive analytics investments and overall initiatives that have been ongoing over the last 5 years within his walls. The transcript, available here, is an excellent read on how a legacy company is attempting to transform itself for the digital future, leveraging vast amounts of sensor data to predict failure in large machinery. This marks a pivotal moment in GE’s history, where turning around a Titanic-size ship won’t be a trivial matter. The build-out began 5 years ago with a massive scaling of the data science and predictive analytics division.

Immelt seemed to drive one point home more than others in this interview: the mass hiring of Data Scientists (and ancillary staff) to accomplish the goal of building out the Predix division.

We have probably hired, since we started this, a couple thousand data scientists and people like that. That’s going to continue to grow and multiply. What we’ve found is we’ve got to hire new product managers, different kinds of commercial people. It’s going to be in the thousands.

We also hired thousands of Data Scientists (although we didn’t hire any “people like that”), so I figured I would shed some light on why and how we accomplished this.

The Need for Data Scientists

Data Scientists are the cornerstone of the machine learning world. Generally speaking, data scientists come from varied backgrounds: mechanical engineers, electrical engineers, and statisticians, to name a few. Their function within a predictive analytics organization is (putting it simply) to make sense of the data and select the features that influence the predictive models. Feature selection goes hand in hand with making sense of the data: the data scientist analyzes large amounts of data, often with sophisticated software, to choose which sensor readings, external factors, and derivations/combinations of each truly impact whether some *thing* will fail or not. Data scientists are the tip of the spear in determining which features/readings/factors matter and which predictive/mathematical models should be trained and applied to forecast events and probabilities of failure.
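As a rough illustration of what selecting influential features can look like in practice, here is a minimal sketch using scikit-learn’s random forest importances; the column names, labels and data are invented for the example and are not any vendor’s actual pipeline:

```python
import numpy as np
import pandas as pd
from sklearn.ensemble import RandomForestClassifier

# Invented sensor data: each row is one observation window for a part.
rng = np.random.default_rng(0)
data = pd.DataFrame({
    "bearing_temp_mean": rng.normal(70, 5, 1000),
    "vibration_rms": rng.normal(0.3, 0.05, 1000),
    "ambient_temp": rng.normal(20, 8, 1000),
    "hours_since_service": rng.uniform(0, 5000, 1000),
})
# Invented failure label, loosely driven by two of the columns.
failed = ((data["vibration_rms"] > 0.35) & (data["hours_since_service"] > 3000)).astype(int)

# Fit a simple model and rank features by how much they influence the prediction.
model = RandomForestClassifier(n_estimators=200, random_state=0)
model.fit(data, failed)

importance = pd.Series(model.feature_importances_, index=data.columns)
print(importance.sort_values(ascending=False))
# A data scientist (or an automated pipeline) keeps the readings that carry
# signal and drops or re-derives the rest.
```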

We faced the same crossroads as GE: data scientists are essential to getting things right, and you need a lot of them when analyzing machine data. We aren’t talking about a few terabytes of data here. No, you’re typically looking at hundreds of terabytes generated by a system in a month… every month… for years.

Scaling the Data Science Team

Big data beckons a big data science team, and to that end, we had to employ, as GE does, thousands of data scientists.

Unlike GE, our data scientists don’t have names or desks. They don’t require ancillary staff nor coffee to stay awake.

Our data scientists work 24 hours a day, 7 days a week, 365 days a year and never tire or complain. Larger dataset? Our data science team clones itself to meet the demand elastically.

Predikto has a unique approach to machine learning and data science. Our data scientists are tiny workers operating on multi-core computers in a distributed environment, acting as one. Just like machines automated many of the mundane human tasks during the industrial revolution, Predikto has automated machine learning and the mundane tasks once accomplished by humans. Our feature selection? Automated. Feature scoring? Automated. Training models? Automated.
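To give a flavour of what automated, elastic “data scientists” can mean in practice, here is a generic sketch of scoring candidate features in parallel across worker processes and training a model on the winners; it illustrates the idea only – it is not Predikto’s actual system, and every name and dataset in it is invented:

```python
from concurrent.futures import ProcessPoolExecutor  # a cluster scheduler would replace this at scale
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

# Invented dataset standing in for sensor-derived candidate features.
X, y = make_classification(n_samples=2000, n_features=30, n_informative=8, random_state=0)

def score_feature(i):
    """Score one candidate feature by how well it predicts the target on its own."""
    model = LogisticRegression(max_iter=1000)
    return i, cross_val_score(model, X[:, [i]], y, cv=3).mean()

if __name__ == "__main__":
    # The 'data science team' is just a pool of workers; add cores (or machines) to scale it out.
    with ProcessPoolExecutor() as pool:
        scores = dict(pool.map(score_feature, range(X.shape[1])))

    keep = sorted(scores, key=scores.get, reverse=True)[:8]  # keep the highest-scoring features
    final_model = LogisticRegression(max_iter=1000).fit(X[:, keep], y)
    print("Selected feature indices:", keep)
```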

I invite you to read the Immelt interview. It truly is a good read on one way to approach building a predictive analytics company. At Predikto, we chose a different path that we felt was innovative and scalable for our own growth plan.

Also a good read… Innovation Happens Elsewhere (http://dreamsongs.com/IHE/IHE-24.html#pgfId-955288)

Predikto among the 5 Atlanta Startups to Watch in 2015


Predikto, after its well-received presentation at Venture Atlanta 2014, is proud to announce that it has been included as one of the 5 Atlanta Startups to Watch in 2015.
The rest of the list comprises SalesLoft, SafelyStay, GroundFloor, and Pointivo.

You can read the full story here.

Congrats to all finalists.

About Venture Atlanta
Venture Atlanta, Georgia’s technology innovation event, is where the state’s best technology innovators meet the country’s top-tier investors. As the region’s largest investor showcase, Venture Atlanta connects local entrepreneurs with venture capitalists, bankers, angel investors and others who can help them raise the capital they need to grow their businesses. The annual non-profit event is a collaboration of three leading Georgia business organizations: Atlanta CEO Council, Metro Atlanta Chamber and the Technology Association of Georgia (TAG).
For more information, visit www.ventureatlanta.org.

 

5 Reasons Why You Suck at Predictive Analytics


We see it every day… Customer X embarked on a grand in-house initiative to implement Big Data analytics. $X million and Y years later, they can predict nothing, aside from the fact that their consultants seem to get fatter with every passing month, and the engineering guys keep asking for more time and bigger budgets. It’s understandable… These days, cutting through all the noise related to big data analytics can be difficult. The bloggers, analysts, and consultants certainly make it sound easy, yet the road to successful predictive analytics implementations seems to be littered with the corpses of many a well-intentioned executive.

Many of our customers come to us after they have spun their wheels in the mud trying to implement big data projects on their own or with the help of the talking heads. Below is a list of what I believe to be the top reasons for failed projects, with the buzzwords omitted:

  1. Data Science and Engineering alignment fail: Or… the fear that engineering will cannibalize data science. After all, “If I can automate data science, why do I need the data scientists?”, the reasoning goes. Aligning both camps is difficult in larger organizations, as turf wars will erupt. Analytics software should seek to include the data scientists’ day-to-day activities rather than exclude them.
  2. Your data sucks: Nothing can save you here. If your vendor/manufacturer is providing shoddy data, you won’t be able to predict or analyze anything. Any consultant who tells you otherwise is selling fertilizer, not analytics. It is best to reach out to your data generator/vendor and work with them to fix the root of the problem.
  3. You hired IBM: Watson does well with Jeopardy questions, but sadly couldn’t even predict the most recent round of IBM layoffs.
  4. You build when you should buy: Predictive analytics is really hard, and chances are that it’s not your core competency, so why are you bringing all of this in-house? The real short-term costs of implementing and maintaining custom software, data science groups, engineering groups, and infrastructure can easily eat away millions of dollars, and you’re placing really big bets on being able to hire the high-level talent to pull it off.
  5. Operations misalignment: Predictions are useless unless there is someone or something to act on the results. It’s important to make operations a partner in the initiative from the onset. Increasing operational efficiency is the goal here, so really… operations is the customer. A tight feedback loop with ongoing implementation between both camps is a must.

And so that’s the gist of it – 5 bullets forced out of me at our marketing department’s insistence. As much as I enjoy mocking the hype-masters in the industry, these days I find myself extremely busy helping build a real startup, solving real-world problems, for the Fortune 500, for real dollars. 😉

Should you buy a tool or a solution?

There is an old saying: “keep your hand on the plow”. It was quite interesting to read about how this phrase came to be, and how it applies to business today.

Most of us pay little attention to tractors in a field as we drive past. Yet it was not long ago that this task was done with a horse and plow. Now I know you are thinking: what the heck does that have to do with business? Plow the ground, plant the crop, tend the crop and then enjoy the fruits of your labor – sounds a lot like business to me. Back to plowing: keeping the rows straight and turning the soil to the right depth is critical. A successful farmer knows to apply constant pressure to the plow. Too little and the plow pulls out of the ground; too much and it goes too deep, stressing both horse and farmer. Another aspect, if you look closely at old pictures, is that you will see the reins of the horse tossed around the farmer’s neck. If the farmer looked right, the horse knew to pull right. So if the farmer was daydreaming and looking around, the result would be crooked rows. Crooked rows wasted space and made it hard to care for and harvest the crops. The farmer knew that to survive he had to keep a steady hand and look forward to get the best results.

Now we interpret this saying to mean don’t get distracted and don’t lose sight of the goal – in other words, focus! In predictive maintenance I see this ageless advice often ignored. Many offerings in this space are tools: legacy technology with an element of predictive analytics added as a “check box”. It makes a good story: “Here is a tool that gives you the ability to create your own rules and do your own predictive analytics with a pretty UI.” Customers get enamored with pretty graphs and slick rules configuration and, in doing so, lose sight of the goal. The whole purpose of predictive maintenance is to get your maintenance team to the right location, at the right time, with the part in hand, before something breaks. It is all about results – period.

Nobody disputes the point above, yet we still see some major companies focusing on the wrong thing when considering predictive analytics for maintenance. Why? Maintenance teams are conditioned to look for a tool rather than a solution.

The field of data science grew hand in hand with the explosion of cloud computing. Cloud computing completely turned the concept of buying hardware and operating systems upside down. Today, cloud-based SaaS (software as a service) has eliminated the need for deep expertise in technology and instead simply provides the answer to the business problem. Now businesses can buy complete turnkey solutions instead of buying a tool, buying the hardware to run the tool, paying their IT department to host the tool and hiring expertise in using and maintaining the tool.

There is a reason data science and machine learning have been the domain of PhDs and academia: advances are happening at a rapid pace and it takes deep expertise with these tools to get results. In fact, the most important aspect of successful machine learning is the process of feature selection. It is a rare customer who even understands that a feature is data in context, never mind the hundreds of machine learning algorithms available today. Yet they believe they should buy a tool that requires them to define their own features and algorithms. Sadly, these same companies relate stories of failed attempts at using data science to solve some problem. Instead of looking for a solution to a business problem, they looked for a tool. They took their “hand off the plow.” Savvy business leaders understand that buying tools and building expertise in a fast-changing technology space that is not the core of their business is not a good use of their resources. Others understand that business moves too fast and is too competitive to waste time and money buying tools and building expertise when they can buy the complete solution now. Successful companies know they need a solution, not another tool. They “keep their hand on the plow.”
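A short sketch of what “a feature is data in context” means in practice; the sensor log and column names below are invented purely for illustration:

```python
import pandas as pd

# Invented raw sensor log: a bare temperature reading means little on its own.
readings = pd.DataFrame({
    "asset_id": ["loco_42"] * 6,
    "hour": [0, 1, 2, 3, 4, 5],
    "bearing_temp": [71, 73, 72, 80, 88, 95],
    "ambient_temp": [18, 19, 21, 24, 26, 27],
})

# Context turns the raw reading into features: relative to ambient conditions,
# relative to the asset's own recent history, and relative to its trend.
readings["temp_above_ambient"] = readings["bearing_temp"] - readings["ambient_temp"]
readings["temp_rolling_mean_3h"] = readings["bearing_temp"].rolling(3, min_periods=1).mean()
readings["temp_delta_per_hour"] = readings["bearing_temp"].diff().fillna(0)

print(readings)
# 95 degrees on its own is just a number; 95 degrees, 68 above ambient and
# climbing 7 degrees an hour, is a feature a model can learn from.
```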



Predikto, Inc. Partners With New York Air Brake to Incorporate Predictive Analytics In NYAB’s LEADER Advanced Train Control Technology Solutions

ATLANTA, July 14, 2015 /PRNewswire/ — Predikto, Inc. today announced a new collaborative effort in which New York Air Brake will incorporate Predikto’s auto-dynamic predictive analytics platform, MAX, into the company’s LEADER advanced train control technology solutions via its internet of things (IoT) initiative.

New York Air Brake, a subsidiary of the Knorr-Bremse Group (Munich, Germany), an innovation leader and supplier in the rail industry since 1890, will integrate a new predictive analytics component to its Advanced Train Control Technology solution, LEADER (Locomotive Engineer Assist/Display & Event Recorder). The mutually developed solution will now leverage a suite of predictive analytics software applications engineered by Predikto, Inc. Predikto’s patent pending solution, called MAX, is an auto-dynamic machine learning engine that draws upon LEADER train data in addition to capturing data external to the train itself, such as weather and line of road conditions. MAX is a self-learning artificial-intelligence solution that adapts itself to rapid changes in context in near real-time in order to provide the most accurate forecasts possible across an array of use-cases.

“Integrating predictive analytics with the rich train information from LEADER will allow the railroads to utilize their data to proactively identify opportunities to improve operating efficiency and rail safety,” said Mario Montag, CEO of Predikto. “Partnering with a premier technology company in the rail industry, such as New York Air Brake, will allow Predikto’s award-winning platform to make a defining impact on the rail industry.”

The predictions provided by MAX will enable new and existing users to incorporate advanced data analytics to enhance the capabilities currently available through LEADER. Predikto’s MAX platform has already proven successful within the rail industry, forecasting failures and equipment health in rail assets ranging from bullet trains in Europe to wayside detection equipment in North America. This partnership will allow for the deployment of dynamic predictive capabilities, including a locomotive energy efficiency forecaster, a braking efficiency forecaster and track health forecasting. The LEADER/MAX solution is poised to revolutionize the rail industry by providing advanced insight to improve velocity and operating efficiency.

“You can have data without information, but you cannot have information without data.  Predikto’s MAX allows us to extract every bit of information and turn it into actionable insights that will improve visibility into operations, provide innovative solutions to improve safety, and provide clarity into the critical maintenance and performance indicators that impact the bottom line most,” states Greg Hrebek, Director of Engineering for New York Air Brake.  “The capability offered between us through this collaboration is unprecedented in the rail industry and will rapidly accelerate the value of the investment the railroads have made into locomotive onboard intelligence.”

About New York Air Brake

New York Air Brake, Inc., headquartered in Watertown, NY, has a long-standing history of innovation and technology in the rail industry ranging from providing advanced braking technology for trains to train control systems. New York Air Brake’s mission is to provide superior railroad brake and train control systems, products, and services with high quality and high value. For more information visit the New York Air Brake website at www.NYAB.com.

About Predikto, Inc.

Predikto, Inc., headquartered in Atlanta, GA, provides actionable solutions for the rail industry as well as industrial equipment and fleets using predictive analytics. Its proprietary data analysis and prediction engine is built on an auto-dynamic machine learning protocol that adapts to changing environments in near real time.  Predikto specializes in operationalizing predictions of key industrial events like asset failures and poor asset health to enhance a company’s overall performance.

The company is composed of engineers, developers, academics, and industry professionals. Predikto’s technology solution enables companies to achieve seamless operational functionality, efficiency and exponential return on their asset investment.

For more information, visit www.Predikto.com.

 

SOURCE Predikto, Inc.
