Predikto: Making Waves in IoT!

The Internet of Things is generally defined as “Smart” + “Connected” + “Edge Devices” (Planes, Trains, Automobiles, Industrial & Farming Equipment, Medical Equipment, and Consumer Electronics)

Predikto focuses on putting the “smart” into managing smart connected devices, equipment and complex capital assets in order to forecast asset behavior/performance.

Industrial asset OEMs, operators and maintenance organizations are challenged by equipment performance degradation and failure as they impact uptime and efficiency. While reliability and condition-based solutions have been around for many years, predictive analytics (machine learning) is providing significant new capabilities to improve performance and profitability.

Approximately 2,000 hardware, software and business leaders attended the second annual O’Reilly Solid 2.0 IoT conference in San Francisco. Attendees were given the opportunity to vote on the startup they believed was making the most innovative impact in the field of industrial or consumer IoT. Of the 30 or so startups at the conference, Predikto was voted best startup by attendees for its telematics / IoT based predictive analytics, predictive maintenance and asset health management solutions.

https://www.youtube.com/watch?v=C0-cYgsT8yI&list=PL055Epbe6d5ZVlSYx7-1k72bm075HkVhq&index=21

This was great exposure for us at Predikto, and now we are up for two awards at the upcoming Solutions 2.0 Conference in early August. We are going head-to-head against some big players in the industry in the categories of Asset Condition Management and Asset Management. Mario Montag, Predikto CEO, will be presenting on the topic of Predictive Analytics in Asset Management. This is another indication of the high demand for IoT products and solutions, of Predikto's acceleration within the Industrial Internet market, and of the large, innovative technology community in Atlanta.

Mario Montag was quoted after the Solid Conference: “It is great to see validation from the market and conferences with regards to our Solution based predictive analytics technology and approach.  We are not a tool to enable customers to do more. We deliver results and bring to light full transparency on the ROI and impact we are having to solve real problems with asset reliability.”

We have also been getting some great traction with customers and partners. We recently announced a partnership with New York Air Brake, a subsidiary of Germany's Knorr-Bremse Group, to incorporate Predikto's auto-dynamic predictive analytics platform, MAX, into the company's LEADER advanced train control technology solutions via its Internet of Things (IoT) initiative. See the full story here.

Needless to say, we are all very excited about the awards and recognition Predikto is receiving; it validates the need for a real predictive analytics solution for the IIoT.

A Software Industry Veteran’s Take on Predictive Analytics

I'm about four months into the job here at Predikto as VP, Sales. Predictive analytics is an exciting new market with, predictably (pun intended), its share of hype. Nevertheless, it is a key niche of the Industrial Internet of Things sector. I'd like to share some observations on what I've learned thus far.

We focus on asset-intensive industries, helping organizations leverage the terabytes of data they have accumulated to anticipate the likelihood of an adverse event, whether that is a battery on a transit bus about to fail or indications that a fuel injector on a locomotive diesel engine, while still operating, is performing at a less than desired level. We predict these events in a time horizon that allows the customer to rectify the issue before it creates a problem, in a way that minimizes disruption to operations. Our technology is cutting-edge open source, leveraging Spark, Python, and Elasticsearch, hosted on AWS.

The use cases we're being asked to solve are fascinating and diverse. Some companies are contacting us as part of an initiative to transform their business model from selling capital assets to selling a service, an approach popularized by Rolls-Royce with its "power by the hour" jet engine programs and similar to the software industry's transition from selling perpetual licenses with maintenance contracts to selling Software as a Service (SaaS). In order to sell capital assets like construction equipment and industrial printing equipment this way, our customers will offer service level agreements, with Predikto in place to allow them to proactively deal with issues likely to degrade their service commitment. So while our tactical focus has been on helping clients maximize product "uptime," the strategic driver is helping them transition to a new way of generating revenue while getting closer to their customers. It's been gratifying to realize the impactful role our offering is playing in facilitating these transitions.

Other organizations are complex, asset-intensive businesses where an equipment failure can have a cascading effect on revenue and customer service. For example, in our work with railroads we've learned there are a multitude of areas where sub-optimal performance of equipment, or outright failure, can have a significant impact. In 2014 the North American railroad network set new records for revenue ton-miles, a key efficiency metric, and it did so over a rail network that is highly congested. In this environment, a delay has huge ripple effects. Any number of factors can lead to a delay, from a rockslide blocking a section of track, to a locomotive breaking down, to a wheel failure on a rail car, which can cause a derailment. On top of this, in order to operate safely and comply with government regulations, railroads have invested heavily in signaling and equipment-monitoring assets, as well as machinery to maintain the track and roadbeds, all of which must work reliably. Our ability to implement in weeks and generate actionable predictions regarding locomotive and rail car health, as well as to monitor other equipment and even the condition of the rails, is making a major difference in helping to facilitate efficient, safe rail operations.

 

Having a blast…more to come.

Kevin Baesler, VP of Sales

Deploying Predictive Analytics (PdA) as an Operational Improvement Solution: A few things to consider

“…in data science…many decisions must be made and there’s a lot of room to be wrong…”

There are a good number of software companies out there that claim to have developed tools that can potentially deploy a PdA solution to enhance operational performance. Some of these packages appear to be okay, some claim to be really good, and others seem ambiguous beyond being a tool a data scientist might use to slice and dice data. What's missing from most that claim to be more than an overglorified calculator are actual use cases that demonstrate value. Without calling out any names, the one thing these offerings share is that they require services (i.e., consulting) on top of the software itself, a hidden cost, before they are operational. There is nothing inherently unique about any of these packages; all of the features they tout can be carried out via open-source software and some programming prowess, but herein lies the challenge.

Some so-called solutions bank on training potential users (i.e., servicing) for the long term. These packages differ in their look and feel and their operation/programming language, and most seem to require consulting, servicing, or a data science team. In each of these cases, a data scientist must choose a platform (or several), learn its language and/or interface, and then become an expert in the data at hand in order to be successful. In the real world, the problem lies in the fact that data tend to differ for each use case (oftentimes dramatically), and even after data sources have been ingested and modified so they are amenable to predictive analytics, many decisions must be made; there's a lot of room to be wrong and even more room to overlook.

“…a tall order for a human.”

Unfortunately, data scientists are by nature subjective (at least in the short term) and slow, while good data science must be objectively contextual and quick to deploy, since there are so many different ways to develop a solution. A good solution must be dynamic when there may be thousands of options. A good product will be objective, context driven, and able to capitalize on new information stemming from a rapidly changing environment. This is a tall order for a human. In fairness, data science is manual and tough (there's a tremendous amount of grunt work involved), and in a world of many "not wrong" paths, the optimal solution may not be quickly obtained, if at all. That said, a data science team might not be an ideal end-to-end solution when the goal is a long-term, auto-dynamic solution that is adaptive, can be deployed in a live environment rapidly, and can scale quickly across different use cases.

“…a good solution must be dynamic…”

End-to-end PdA platforms are available (Data Ingestion -> Data Cleansing/Modification -> Prediction -> Customer Interfacing). Predikto is one such platform, where the difference is auto-dynamic scalability that relieves much of the burden from a data science team. Predikto doesn't require a manual data science team to ingest and modify data for a potential predictive analytics solution. The platform takes care of most of the grunt work in a very sophisticated way while capitalizing on detail from domain experts, ultimately providing customers with what they want very rapidly (accurate predictions) at a fraction of the cost of a data science team, particularly when the goal is to deploy predictive analytics solutions across a range of problem areas. This context-based solution also automatically adapts to feedback from operations regarding the response to the predictions themselves.

Predikto Solution Utilizing Predictive Analytics

 

Skeptical? Let us show you what Auto-Dynamic Predictive Analytics is all about and how it can reduce downtime in your organization. And by the way, it works… [patents pending]

Predikto Enterprise Platform

The Scalability of Data Science: Part 3 – Reality Check  

You're an operations manager, or even a CEO. Let's say you have a big predictive analytics initiative and need to deploy several instances of your solution. What are your options? OK, go ahead and hire a team of five data scientists, each with a relatively modest salary of $100k (US) per year (a very conservative estimate). Now step back… You've just spent half a million dollars on staffing (plus the time for hiring, software procurement, etc.) for something that will develop slowly and, if it works at all, may not work well. Have you made a wise investment?

This is the reality that most companies entering the realm of IoT and predictive analytics will face. Why? Most predictive analytics solutions can't scale (i.e., they can't be rapidly applied across different problems). It's too time consuming and too expensive, and the value may be lost in a ramp-up attempt. A deployed predictive analytics solution must be scalable, fast, and affordable. A data scientist can be great (and many are), but they are limited in how far they can scale and by the subjectivity of their respective approaches to the solution. There are many ways to approach data analysis that are correct, but there's probably an alternative that is more valuable.

The next generation of predictive analytics solutions should be able to accomplish most, if not all, of the above automatically and rapidly, with decreasing involvement from humans, and should perform as well as or better than a good data science team; this is what Predikto has done (patents pending). We enable operations managers and data scientists by tackling the bulk of the grunt work.

I’m well aware that this may downplay the need for a large data science industry, but really, what’s an industry if it can’t scale? A fad perhaps. Data science is not just machine learning and some basic data manipulation skills. There’s much more to a deployed solution that will impact a customer’s bottom line. To make things worse, many of the key components of success are not things covered in textbooks or in an online course offering on data science.

It's one thing to build the "best" predictive analytics solution once (e.g., to win a Kaggle competition), but try repeating that process dozens of times in a matter of weeks for predictions of different sorts. If any of these solutions is not correct, it costs real dollars. Realistically scaling in an applied predictive analytics environment should scare the pants off any experienced data scientist who relies on manual development. Good data science is traditionally slow and manual. Does it have to be?

Rest assured, I'm not trying to undercut the value of a good data scientist; theirs is a needed trade. The issue is simply that data science is difficult to scale in a business setting.

The Scalability of Data Science: Part 2 – The Reality of Deployment

To put my previous post into perspective, let me give you a for instance… An organization wants to develop a deployed predictive analytics solution for an entire class of commuter trains. Let's be modest and go with 10 different instances from within the data (e.g., 1) predicting engine failure, 2) turbocharger pressure loss, 3) door malfunction, and so on). We'll focus on just one…

Data from dozens of assets (i.e., trains) are streaming in by the second or faster, and these data must be cleaned and aggregated with other data sources. It's a big deal to get just this far. Next you have to become an expert in the data and begin cleaning and developing context-based feature data from the raw source data. This is where art comes into play, and this part is difficult and time consuming for data scientists. Once a set of inputs has been established, then comes the easier part: applying appropriate statistical model(s) to predict something (e.g., event occurrence, time to failure, latent class, etc.), followed by validating and deploying the results. Oh yes, let's not forget the oft-unspoken reality of threshold settings for the customer (i.e., the costs of true positives vs. false positives, etc.). To this point, we're assuming that the solution has value, and it's important to keep in mind that a data science team has probably never seen this sort of data before.
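To make the threshold point concrete, here is a minimal sketch (not Predikto's method; it assumes NumPy arrays of held-out labels and model scores, and the per-event costs are hypothetical) of choosing a cutoff by weighing the cost of false alarms against the cost of missed failures:

import numpy as np

def pick_threshold(y_true, scores, cost_fp=50.0, cost_fn=5000.0):
    """Return the score cutoff that minimizes total expected cost."""
    thresholds = np.linspace(0.0, 1.0, 101)
    costs = []
    for t in thresholds:
        pred = scores >= t
        fp = np.sum(pred & (y_true == 0))    # false alarms: unnecessary maintenance
        fn = np.sum(~pred & (y_true == 1))   # misses: failures we did not flag
        costs.append(cost_fp * fp + cost_fn * fn)
    return thresholds[int(np.argmin(costs))]

# e.g., scores = model.predict_proba(X_holdout)[:, 1]   (model and X_holdout are assumed)
# cutoff = pick_threshold(y_holdout, scores)

The "right" cutoff depends entirely on what a false alarm and a missed failure cost the operator, which is exactly the kind of customer-specific detail that rarely shows up in textbook treatments.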

So on top of requiring computer programming skills, feature engineering prowess (which is an art), an understanding of statistics/machine learning, and communication skills good enough both to learn about the customer's data and to "sell" the solution, this must all be accomplished in a reasonable amount of time. We're talking about one instance to this point, remember? And we're still not deployed. Do you have expertise in deploying the solution for the customer? Now repeat this situation ten times and you're closer to reality. Your team may have just filled up the next 12 months with work, and the utility of the solution is still unknown.

Using the Spark Datasource API to access a Database


At Predikto, we're big fans of in-memory distributed processing for large datasets. Much of our processing occurs inside of Spark (speed + scale), and now with the recently released Datasource API with JDBC connectivity, integrating with any datasource got a lot easier. The Spark documentation covers the basics of the API and Dataframes. However, there is little information online about actually getting this feature to work.

TL;DR: Scroll to the bottom for the complete Gist.

In this example, I’ll cover PostgreSQL connectivity. Really, any JDBC-driver-supported datasource will work.

First, Spark needs to have the JDBC driver added to its classpath:

import os

# the JDBC driver jar must be on the classpath before the SparkContext is created
os.environ['SPARK_CLASSPATH'] = "/path/to/driver/postgresql-9.3-1103.jdbc41.jar"

Once loaded, create your SparkContext as usual:

from pyspark import SparkContext
from pyspark.sql import SQLContext, Row
 
sc = SparkContext("local[*]", '')
sqlctx = SQLContext(sc)

Now, we're ready to load data using the DataSource API. If we don't specify any criteria, the entire table is loaded into memory:

df = sqlctx.load(
  source="jdbc", 
  url="jdbc:postgresql://<host>/<db>?user=<user>&password=<password>",
  dbtable="<schema>.<table>")
  1. source: “jdbc” specifies that we will be using the JDBC DataSource API.
  2. url: The JDBC URL of the database to connect to (host, database, and credentials are placeholders here).
  3. dbtable: The JDBC table we will read from, or possibly a subquery (more about this below).

Using the above code, the ‘load’ call will execute a ‘SELECT * FROM <schema>.<table>’ immediately.

In some cases, we didn’t want an entire DB table loaded into memory, so it took a bit of digging to understand how the new API handles “where” clauses. They really act more like subqueries, where anything valid in a ‘FROM’ clause will work.

query = "(SELECT email_address as email from schema.users WHERE user_id<=1000) as <alias>"

df = sqlctx.load(
  source="jdbc", 
  url="jdbc:postgresql://<host>/<db>?user=<user>&password=<password>",
  dbtable=query)
  1. query: This query contains our ‘WHERE’ clause. Note that you must specify an alias for the subquery.

Given the example above, Spark will consume a list of email addresses from our user table, for all users with an id <= 1000. Once we have a DataFrame in hand, we can process the data using the API, converting it to an RDD or running SparkSQL queries over the data.
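For instance (a minimal sketch assuming the df and sqlctx created above; the temp table name and filter are purely illustrative):

# register the DataFrame as a temporary table so SparkSQL can query it
df.registerTempTable("user_emails")
gmail_df = sqlctx.sql("SELECT email FROM user_emails WHERE email LIKE '%@gmail.com'")

# or drop down to the underlying RDD for lower-level transformations
email_rdd = df.rdd.map(lambda row: row.email)
print(gmail_df.count(), email_rdd.count())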

Complete example as Gist:
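(The embedded Gist is not reproduced here; below is a consolidated sketch of the snippets above, with connection details left as placeholders.)

import os

# the JDBC driver jar must be on the classpath before the SparkContext is created
os.environ['SPARK_CLASSPATH'] = "/path/to/driver/postgresql-9.3-1103.jdbc41.jar"

from pyspark import SparkContext
from pyspark.sql import SQLContext

sc = SparkContext("local[*]", 'jdbc-example')
sqlctx = SQLContext(sc)

# load only a subset of the table via a subquery (note the required alias)
query = "(SELECT email_address as email from schema.users WHERE user_id<=1000) as <alias>"

df = sqlctx.load(
  source="jdbc",
  url="jdbc:postgresql://<host>/<db>?user=<user>&password=<password>",
  dbtable=query)

df.registerTempTable("user_emails")
print(sqlctx.sql("SELECT count(*) AS n FROM user_emails").collect())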

Deploying Predictive Analytics and Scalability: Part 1 – Patience is required, but can you afford it?


Having been involved in applied data science for over a decade, one of the most substantive shortcomings of analytics that I've seen, in terms of a viable business product, is scalability. Data management, feature engineering, and multivariate statistics/machine learning are conceptually challenging topics and take time to master. Even for a seasoned data scientist (or team, or whatever you call it) that can tackle the full-stack solution from data collection to predictive output/validation (i.e., end-to-end), the process is tedious and slow, and veterans are rare. Moreover, each new application, or instance, may require major re-development or starting over from scratch, even for different solutions stemming from the same data sources. In short, "good" data science is slow and arduous and won't scale without considerable time and investment. Tread with caution; this is as much art as it is science. Can you afford the time? Perhaps there's a better way.

Robert Morris, Ph.D. is Chief Science Officer and Co-founder of Predikto, Inc.