The Scalability of Data Science: Part 3 – Reality Check  

You’re an operations manager, or even a CEO. Let’s say you have a big predictive analytics initiative and need to deploy several instances of your solution. What are your options? OK, go ahead and hire a team of 5 data scientists, each with a relatively modest salary of $100k (US) per year (this is a very conservative estimate). Now step back… You’ve just spent half a million dollars on staffing (plus the time for hiring, software procurement, etc.) for something that’s going to develop slowly and, if it works at all, may not work well. Have you made a wise investment?

This is the reality that most companies entering the realm of IoT and predictive analytics will face. Why? Most predictive analytics solutions can’t scale (i.e., can’t be rapidly applied across different problems). Scaling them is too time-consuming and too expensive, and the value may be lost in a ramp-up attempt. A deployed predictive analytics solution must be scalable, fast, and affordable. A data scientist can be great (and many are), but they’re limited by how far their work can scale and by the subjectivity of their individual approach to the solution. There are many correct ways to approach data analysis, but there’s probably an alternative that is more valuable.

The next generation of predictive analytics solutions should be able to accomplish most, if not all, of the above automatically and rapidly with decreasing involvement from humans, and should perform as well as or better than a good data science team; this is what Predikto has done (patents pending). We enable operations managers and data scientists by tackling the bulk of the grunt work.

I’m well aware that this may downplay the need for a large data science industry, but really, what’s an industry if it can’t scale? A fad perhaps. Data science is not just machine learning and some basic data manipulation skills. There’s much more to a deployed solution that will impact a customer’s bottom line. To make things worse, many of the key components of success are not things covered in textbooks or in an online course offering on data science.

It’s one thing to build the “best” predictive analytics solution (e.g., to win a Kaggle competition), but try repeating this process dozens of times in a matter of weeks for predictions of different sorts. If any of these solutions is not correct, it costs real dollars. Realistically, scaling in an applied predictive analytics environment should scare the pants off any experienced data scientist who relies on manual development. Good data science is traditionally slow and manual. Does it have to be?

Rest assured, I’m not trying to undercut the value of a good data scientist; this is a needed trade. The issue is simply that data science is difficult to scale in a business setting.