Why do we need so much data?

Most of us remember the double-slit experiment from our physics classes. For those who don’t, here’s a (very) short reminder: light is shone through two narrow slits, and the resulting pattern on a screen is evidence of the dual (particle and wave) nature of light.

If the experiment is done with an emitter that sends single particles of light (photons) one at a time, and the receptor is a photo-sensitive surface, an interesting phenomenon occurs. If we ‘look at’ the receptor too soon (with photo paper, developing and fixing it will do), none of the “interesting” information appears: there is just a seemingly random trace of photons hitting the paper (as illustrated below).
If, however, we wait long enough (until enough photons have hit the paper), a clear pattern emerges and meaningful conclusions can be drawn (light behaves both like a particle and like a wave).
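To make the “wait long enough” point concrete, here is a small Python sketch that simulates individual photon hits. It is an illustration, not part of the original experiment: the idealised two-slit (Fraunhofer) intensity formula, all parameter values (slit separation, slit width, wavelength, screen distance, 40 strips) and all function names are assumptions, in arbitrary units. Hits are drawn one by one by rejection sampling, and the total variation distance between the observed histogram and the ideal pattern measures how recognisable the pattern is.

```python
import math
import random

# Illustrative parameters (assumptions, arbitrary units).
SLIT_SEPARATION = 5.0
SLIT_WIDTH = 1.0
WAVELENGTH = 1.0
SCREEN_DISTANCE = 10.0
SCREEN_HALF_WIDTH = 10.0  # the screen spans [-10, 10]


def intensity(x: float) -> float:
    """Idealised two-slit (Fraunhofer) intensity at screen position x, scaled to [0, 1]."""
    alpha = math.pi * SLIT_SEPARATION * x / (WAVELENGTH * SCREEN_DISTANCE)
    beta = math.pi * SLIT_WIDTH * x / (WAVELENGTH * SCREEN_DISTANCE)
    envelope = 1.0 if beta == 0 else (math.sin(beta) / beta) ** 2
    return math.cos(alpha) ** 2 * envelope


def detect_photons(n: int, rng: random.Random) -> list[float]:
    """Return n hit positions, drawn one photon at a time by rejection sampling."""
    hits = []
    while len(hits) < n:
        x = rng.uniform(-SCREEN_HALF_WIDTH, SCREEN_HALF_WIDTH)
        if rng.random() < intensity(x):  # keep the hit with probability intensity(x)
            hits.append(x)
    return hits


def histogram(positions: list[float], bins: int) -> list[int]:
    """Count hits per equal-width strip of the screen."""
    width = 2 * SCREEN_HALF_WIDTH / bins
    counts = [0] * bins
    for x in positions:
        counts[min(int((x + SCREEN_HALF_WIDTH) / width), bins - 1)] += 1
    return counts


def ideal_shares(bins: int, steps: int = 200) -> list[float]:
    """Share of the light expected in each strip (midpoint-rule integration)."""
    width = 2 * SCREEN_HALF_WIDTH / bins
    h = width / steps
    masses = [
        sum(intensity(-SCREEN_HALF_WIDTH + i * width + (k + 0.5) * h) for k in range(steps)) * h
        for i in range(bins)
    ]
    total = sum(masses)
    return [m / total for m in masses]


def pattern_error(n: int, bins: int = 40, seed: int = 0) -> float:
    """Total variation distance between n observed hits and the ideal pattern."""
    counts = histogram(detect_photons(n, random.Random(seed)), bins)
    return 0.5 * sum(abs(c / n - p) for c, p in zip(counts, ideal_shares(bins)))


for n in (10, 100, 1_000, 10_000, 50_000):
    print(f"{n:>6} photons: distance from the ideal pattern = {pattern_error(n):.3f}")
```

With only a handful of photons the distance stays large (the trace looks random); as hits accumulate it shrinks, roughly in proportion to one over the square root of the number of photons, and the fringes become recognisable.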
What’s more, the further away you stand, the more obvious the pattern becomes (if every dot were a person and you were standing among them, chances are you wouldn’t recognise the pattern either), until you are too far away to notice anything but a blob. So not only the amount of information matters, but also the conditions under which it is interpreted.
Another good example is a picture taken with a digital camera: unless the resolution is fine enough, we can’t read the letters on a sheet of paper in the picture. Things just get too small, so we zoom in. But if we keep zooming, at some point the image becomes too pixelated and we can no longer read the letters either.
Context matters! Modern chefs use this at our expense: they’ll present you with food that looks like one thing but tastes like something different. There’s the famous example of two jellies at The Fat Duck: one is orange and the other is red, but the orange one tastes like beetroot while the red one tastes like orange. The lesson from this is that our brain, wired for pattern recognition, can easily be fooled!
What does all that have to do with predictive analytics? Well, quite simply:
– we need data, lots of it
– we need perspective
– we need context
– we need to be able to take a step back when things aren’t what we thought they’d be
Repeating the test in different circumstances (with different light sources, etc.) will lead to different results, and eventually a formula can be found that allows us to predict the outcome of the test from a number of variables. If at some point, however, the variables change (e.g. an extra variable is introduced), we may (or may not) have to revise the formula to fit the new observations! The days when atoms were just protons, electrons and neutrons are long gone.
With predictive analytics, too much effort goes into coming up with an algorithm that predicts the failure of a part. When, just a couple of months later, the operating conditions change, the predictions lose accuracy and, eventually, their reason for being. If we ever want autonomous cars on the road, we can’t rely on programmers to describe every possible situation a car may find itself in. Instead, we need to make sure that when something unexpected happens, the car is able to come up with a workable solution.
So it is with predictive analytics!