Help, my data’s all over the place!

One of the main problems with any corporate IT project is getting all the data into the system. Data is generated in a multitude of systems and stored in just as many places and formats. Some data gets duplicated and/or transformed before being stored in other systems. Even corporate IT departments rarely know what is stored, where and in what format. Sometimes data is stored and nobody seems to know what it means…

I visited a major bank recently and just for calculating their portfolio risks, they estimated some 7,000 spreadsheets were being used – I’d hate to be in the shoes of their risk officer (and I’d also be scared to be a shareholder)!

The traditional approach to gathering data for a specific use has been to set up queries that extract data from source systems in a pre-defined format. This approach is not only slow, it is also error-prone and inflexible: if the source data changes, the integration has to be adapted. Newer technologies, however, allow for the creation of data lakes (storage that does not impose a structure on the data up front). This is much faster to set up and retains full flexibility over all the data.

The first result of bringing together all (or as much as possible) of the data is visibility. In our case: how many delays, how many breakdowns, which parts are causing the most problems, etc. Making problems visible allows huge savings (a major European railroad reported that such a project saved them 8% on their maintenance costs) because it lets people make better decisions, avoids ‘flying blind’ and even prevents abuse (there’s no more hiding).
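To make the idea of visibility concrete, here is a minimal Python sketch. The event records, field names and part names are invented for illustration; real data would of course come from your own systems and look different.

```python
from collections import Counter

# Hypothetical maintenance events, as they might look once gathered in one place.
events = [
    {"type": "delay", "part": "door actuator"},
    {"type": "breakdown", "part": "brake unit"},
    {"type": "delay", "part": "door actuator"},
    {"type": "breakdown", "part": "door actuator"},
    {"type": "delay", "part": "pantograph"},
]

def summarise(events):
    """Count events per type and per part."""
    by_type = Counter(e["type"] for e in events)
    by_part = Counter(e["part"] for e in events)
    return by_type, by_part

by_type, by_part = summarise(events)
print(dict(by_type))           # how many delays, how many breakdowns
print(by_part.most_common(1))  # the part causing the most problems
```

Nothing here is sophisticated, and that is the point: once the data sits in one place, the first useful answers are a few lines of counting away.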

Once all the data is centrally accessible, projects such as predictive analytics can be launched with less effort. Provided the data centralization environment is up to the task, it should be flexible enough to allow for a stepped approach, i.e. bringing together all corporate data first (ERP, EAM, CRM, planning, …) and adding equipment sensor data at a later stage. Those familiar with the traditional approach know how painful this can be, as it would essentially require the definition of a new data structure. The big data approach avoids this.
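The stepped approach can be sketched in a few lines of Python. This is a toy illustration only, assuming an in-memory list as the "lake"; the `ingest` and `read` helpers, the source names and the record fields are all hypothetical and do not correspond to any particular product.

```python
import json

# A toy "lake": raw records kept exactly as the source systems produced them.
lake = []

def ingest(source, record):
    """Store a record as-is, tagged with the name of its source system."""
    lake.append(json.dumps({"source": source, "payload": record}))

def read(source, fields):
    """Apply structure at read time: pick only the fields this use case needs."""
    rows = []
    for raw in lake:
        item = json.loads(raw)
        if item["source"] == source:
            rows.append({f: item["payload"].get(f) for f in fields})
    return rows

# Step 1: corporate data only.
ingest("eam", {"asset": "loco-17", "work_order": "WO-1", "cost": 450})
ingest("erp", {"asset": "loco-17", "purchase": "brake pads"})

# Step 2, at a later stage: sensor data arrives with a different shape.
# No existing structure has to be redefined to accept it.
ingest("sensor", {"asset": "loco-17", "temp_c": 81.5})

eam_rows = read("eam", ["asset", "cost"])
sensor_rows = read("sensor", ["asset", "temp_c"])
print(eam_rows)
print(sensor_rows)
```

The design choice to notice is that structure is decided when the data is read, not when it is stored, which is why adding the sensor feed does not disturb the existing EAM and ERP readers.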

What’s the conclusion? Don’t be put off any longer by data integration pains; choosing the right technology and approach allows for rapid, performant, flexible and appropriate deployment of a data storage architecture which can be the foundation of your predictive analytics project.

Doing so will allow you to optimise:

  • visibility: before setting off on any data analytics project, make sure you have optimal visibility over existing data (volume, quality, etc.)
  • project efficiency: the stepped approach delivers value faster 
  • consistency: data centralisation lowers IT landscape dependencies
  • flexibility/adaptivity: the tiered approach allows for application adaptation to changing requirements without requiring re-integration
  • functionality: new functions can be developed and rolled out gradually while retaining a single (centralised) data source

Faster, incremental results with lower project risk lead to a higher NPV (Net Present Value), the only measure that should really matter!
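For readers who want to see why earlier results raise NPV, here is a small Python sketch using the standard definition (each cash flow divided by (1 + rate) raised to the number of periods elapsed). The discount rate and both cash-flow profiles are invented purely for illustration, and project risk is not modelled at all.

```python
def npv(rate, cash_flows):
    """Net present value; cash_flows[0] occurs now, cash_flows[t] after t periods."""
    return sum(cf / (1 + rate) ** t for t, cf in enumerate(cash_flows))

rate = 0.10  # assumed discount rate per period

# Two hypothetical projects with the same undiscounted total (+200).
big_bang = [-300, 0, 0, 250, 250]    # pay everything up front, benefits arrive late
stepped = [-100, -50, 50, 150, 150]  # spend gradually, benefits start earlier

print(round(npv(rate, big_bang), 2))
print(round(npv(rate, stepped), 2))
print(npv(rate, stepped) > npv(rate, big_bang))
```

Both profiles add up to the same amount on paper, yet the stepped one comes out ahead once the timing of the money is taken into account.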