Many organisations are at the point where they want to do a lot more with the data they generate, or that they can acquire from customers and suppliers. They might want to invest in industry ecosystems for mutual benefit, standardising and sharing with peers and partners. They may be looking to introduce automation with RPA tools, exploiting the possibilities of scale and efficiency that can be realised. Or they could be thinking of how they can train AI models to accelerate their growth or reduce their cost base. These are just some of the technologies that are tempting business leaders to invest right now.
Whilst the returns on investments in these areas can be phenomenal, the reality is that they require time, effort and, crucially, “good” data to work effectively. So, what is good data? The traditional view is that it has five key elements – accuracy, completeness, relevance, reliability, and timeliness. This is a very good list, but it is the relevance trait that I am considering today.
I am lucky enough to have worked in technology for quite a long time. Those who remember the days of big ERP, when systems were developed slowly, as monoliths, from enormous tomes detailing requirements and specifications, know why it took so long to change those systems. Data feeds were scarce, and outputs were likewise limited. If data went in, it went in via a keyboard operator or through primitive EDI. Changes to systems required slow, expensive feedback loops between “users” and “IT”, and these groups rarely interacted in meaningful ways.
I realise that for some people who might read this, that sounds horrendous – you work in an agile, collaborative way, deploying changes monthly, weekly, daily or even more often than that, to complex production systems. For others I am not describing the past, but something recognisable even now. But with agility comes temptation – today’s information technology landscape has a vast proliferation of data to deal with. We can ingest data from a seemingly unlimited range of sources, both internally generated and externally sourced. Where data processing and storage were once a bottleneck, they are now the least of our concerns. So we can be tempted to gather and keep data that adds to the sum of our knowledge in only the most marginal of ways. We are in danger of living on junk food data.
Living on junk food data is cheap and easy, but we are in danger of piling on excess weight and taking what is convenient rather than what is most suitable and sustainable. When we come to test ourselves with new applications that require quality data in order to produce good results, we can find that we are not in a fit state to work with them.
At IDC we are seeing more use of data catalogues for data quality, governance and self-service access. Where these exist and are combined with thoughtful KPIs for things like the number of data terms, definitions and growth, it is possible to provide a high-quality service to the organisation and to be better informed about the sustainability of your estate – if you like, the nutritional value of your data. Aligned with this approach is a continuing effort to improve data literacy within the organisation. The two go hand in hand: as you improve your data intelligence through better management you will improve data literacy and, vice versa, better data literacy contributes to more thoughtful data management.
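For readers who like to see the idea made concrete, here is a minimal sketch in Python of how catalogue KPIs of this kind might be calculated. Everything in it is an assumption made for illustration: the `CatalogueEntry` structure, its field names, the `catalogue_kpis` function and the sample glossary are hypothetical and are not taken from any particular catalogue product.

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional


@dataclass
class CatalogueEntry:
    """One business term in a (hypothetical) data catalogue."""
    term: str
    definition: Optional[str]  # None or blank means "not yet defined"
    added_on: date


def catalogue_kpis(entries, period_start):
    """Return simple KPIs: term count, share of defined terms, and growth."""
    total = len(entries)
    # A term counts as defined only if its definition is non-empty text.
    defined = sum(1 for e in entries if e.definition and e.definition.strip())
    # Terms added on or after period_start count as new for this period.
    new_terms = sum(1 for e in entries if e.added_on >= period_start)
    existing = total - new_terms
    return {
        "term_count": total,
        "defined_share": defined / total if total else 0.0,
        "new_terms": new_terms,
        # Growth relative to the terms that existed before the period;
        # None when there was nothing to grow from.
        "growth_rate": new_terms / existing if existing else None,
    }


# Illustrative sample data only.
sample = [
    CatalogueEntry("customer", "A party that has placed an order.", date(2023, 1, 10)),
    CatalogueEntry("supplier", "A party we purchase from.", date(2023, 2, 3)),
    CatalogueEntry("churn", None, date(2023, 3, 15)),
    CatalogueEntry("net revenue", "Revenue after returns.", date(2024, 1, 20)),
]

kpis = catalogue_kpis(sample, period_start=date(2024, 1, 1))
print(kpis)
```

A low share of defined terms alongside a high growth rate would be one possible sign that the estate is gaining weight faster than it is gaining nutritional value, although where to set such thresholds is a judgement for each organisation.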
Getting ready for the challenge of automation, artificial intelligence, advanced analytics and predictive tools means choosing your data sources carefully, trimming the fat where it exists, and maintaining as lean a data profile as is sensible. It might seem like a hard challenge now, but you will be thankful for it in future.