How Predikto hired thousands of data scientists

GE’s Jeff Immelt was recently interviewed about the predictive analytics investments and initiatives that have been under way at the company over the last 5 years. The transcript, available here, is an excellent read on how a legacy company is attempting to transform itself for the digital future, leveraging vast amounts of sensor data to predict failure in large machinery. This marks a pivotal moment in GE’s history, and turning around a Titanic-size ship won’t be a trivial matter. The build-out began 5 years ago with a massive scaling of its data science and predictive analytics division.

Immelt seemed to drive one point home more than any other in this interview: the mass hiring of Data Scientists (and ancillary staff) to accomplish the goal of building out the Predix division.

“We have probably hired, since we started this, a couple thousand data scientists and people like that. That’s going to continue to grow and multiply. What we’ve found is we’ve got to hire new product managers, different kinds of commercial people. It’s going to be in the thousands.”

We also hired thousands of Data Scientists (although we didn’t hire any “people like that”), so I figured I would shed some light on why and how we accomplished this.

The Need for Data Scientists

Data Scientists are the cornerstone of the machine learning world. Generally speaking, data scientists come from varied backgrounds: mechanical engineers, electrical engineers, and statisticians, to name a few. Their function within a predictive analytics organization is to (putting it simply) make sense of the data and select the features that influence the predictive models. Feature Selection goes hand-in-hand with making sense of the data, in that the Data Scientist analyzes large amounts of data, often with sophisticated software, to determine which sensor readings, external factors, and derivations / combinations of each truly impact whether some *thing* will fail or not. Data scientists are the tip of the spear in determining which features/readings/factors matter and which predictive/mathematical models should be trained and applied to forecast events and the probability of failures.
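To make the feature selection step concrete, here is a minimal sketch in Python. It is not our production code: the sensor names, the toy readings, and the helper functions `pearson` and `select_features` are all made up for illustration, and simple correlation ranking stands in for whatever scoring method a real team would choose. It scores each raw or derived feature by how strongly it tracks a failure label and keeps the strongest ones.

```python
from math import sqrt

def pearson(xs, ys):
    """Pearson correlation between two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return 0.0  # a constant column carries no signal
    return cov / sqrt(var_x * var_y)

def select_features(columns, failed, k):
    """Rank features by |correlation| with the failure label; keep the top k."""
    scores = {name: abs(pearson(values, failed)) for name, values in columns.items()}
    ranked = sorted(scores, key=scores.get, reverse=True)
    return ranked[:k], scores

# Toy, made-up readings for six machines (1 = failed, 0 = healthy).
failed = [0, 0, 0, 1, 1, 1]
columns = {
    "vibration": [0.1, 0.2, 0.15, 0.9, 0.8, 0.95],
    "temperature": [70, 72, 71, 85, 88, 90],
    "ambient_humidity": [40, 55, 45, 50, 42, 53],
}
# A derived feature: a combination of two raw sensor readings.
columns["vibration_x_temperature"] = [
    v * t for v, t in zip(columns["vibration"], columns["temperature"])
]

selected, scores = select_features(columns, failed, k=2)
print(selected)
```

In this toy data the humidity reading barely moves with the failure label, so it is ranked last and dropped. Real machine data is rarely this clean, which is why the scoring method itself becomes a choice worth careful thought.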

We faced the same crossroads as GE: data scientists are essential to getting things right, and you need a lot of them when analyzing machine data. We aren’t talking about a few terabytes of data here. No, you’re typically looking at hundreds of terabytes generated by a system in a month… every month… for years.

Scaling the Data Science Team

Big data beckons a big data science team, and to that end, we had to employ, as GE does, thousands of data scientists.

Unlike GE’s, our data scientists don’t have names or desks. They don’t require ancillary staff or coffee to stay awake.

Our data scientists work 24 hours a day, 7 days a week, 365 days a year and never tire or complain. Larger dataset? Our data science team clones itself to meet the demand elastically.

Predikto has a unique approach to machine learning and data science. Our data scientists are tiny workers operating on multi-core computers in a distributed environment, acting as one. Just as machines automated many mundane human tasks during the Industrial Revolution, Predikto has automated machine learning and the mundane tasks once performed by humans. Our feature selection? Automated. Feature scoring? Automated. Training models? Automated.
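For a rough feel of what "tiny workers acting as one" can look like, here is a hedged sketch in Python. It is an illustration under stated assumptions, not a description of Predikto's actual system: the data is invented, the "model" is a one-feature threshold rule, and a local thread pool stands in for a distributed cluster. Each worker takes one candidate feature, trains on the first half of the rows, scores itself on the held-out second half, and the best candidate wins without a human in the loop.

```python
from concurrent.futures import ThreadPoolExecutor

def train_stump(values, labels):
    """Find the threshold on one feature that best separates failures (label 1)."""
    best_threshold, best_accuracy = None, -1.0
    for threshold in sorted(set(values)):
        predictions = [1 if v >= threshold else 0 for v in values]
        accuracy = sum(p == y for p, y in zip(predictions, labels)) / len(labels)
        if accuracy > best_accuracy:
            best_threshold, best_accuracy = threshold, accuracy
    return best_threshold

def evaluate_candidate(task):
    """One worker: train on the first half of the rows, score on the second half."""
    name, values, labels = task
    half = len(values) // 2
    threshold = train_stump(values[:half], labels[:half])
    predictions = [1 if v >= threshold else 0 for v in values[half:]]
    held_out = labels[half:]
    accuracy = sum(p == y for p, y in zip(predictions, held_out)) / len(held_out)
    return name, threshold, accuracy

# Toy, made-up data: eight machine snapshots (1 = failed, 0 = healthy).
labels = [0, 1, 0, 1, 0, 1, 0, 1]
features = {
    "vibration": [0.1, 0.9, 0.2, 0.8, 0.15, 0.95, 0.25, 0.85],
    "ambient_humidity": [40, 50, 55, 42, 45, 53, 52, 41],
}
tasks = [(name, values, labels) for name, values in features.items()]

# The pool size is the "head count"; a bigger job simply gets more workers.
with ThreadPoolExecutor(max_workers=len(tasks)) as pool:
    results = list(pool.map(evaluate_candidate, tasks))

best_name, best_threshold, best_accuracy = max(results, key=lambda r: r[2])
print(best_name, best_threshold, best_accuracy)
```

The design point is that each candidate is independent, so the work splits cleanly across however many workers are available. Swapping the thread pool for processes or a cluster scheduler changes where the workers run, not the shape of the loop.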

I invite you to read the Immelt interview. It truly is a good read on one way to approach building a predictive analytics company. At Predikto, we chose a different path that we felt was innovative and scalable for our own growth plan.

Also a good read… Innovation Happens Elsewhere (http://dreamsongs.com/IHE/IHE-24.html#pgfId-955288)