Is Your Data Strategy FAIR Enough?
A recent Harvard Business Review article, “What’s Your Data Strategy?”, confirms what a lot of us believe about the total cost of data management in business: about 80% of the data effort in your organisation, in people and time, is spent finding data and making it usable. If your data isn’t findable, accessible, interoperable and reusable, then you have to spend a lot of your organisation’s money to make it so. And guess what? A lot of this cost is avoidable if you have a data strategy based on delivering data that is Findable, Accessible, Interoperable and Reusable (FAIR).
The FAIR data principles, described in Nature in 2016, elegantly summarise the outcomes that every business should want for its data. If you stand up in a C-suite meeting and say, “I want to unblock our data science, marketing, sales and research teams by ensuring that our data is findable, accessible, interoperable and reusable,” nobody should disagree.
If you want your business to exploit its data, then a FAIR data strategy ensures that your data are shared in a way that enables and enhances reuse by both humans and machines. The moment a CIO or Chief Data Architect wakes up to this is the moment they should act, before anyone else finds out they haven’t been applying these principles.
To be fair, most CIOs and Data Architects have inherited the state they are in, which has come about through many causes. Silos of unconnected data exist because of political or technical decisions made in the past. New and unexpected data keep arriving because of advances in technology and science, commercial and regulatory change, and mergers and acquisitions. The result is a data landscape that looks more like a bazaar than a cathedral. Typical responses to this situation include some or all of the following:
Heroic efforts: throwing clever people at data engineering to make the ever-growing backlog of data usable by parts of the enterprise.
Optimistically searching for a technology or tool that can deal with the whole thing or part of it.
Ignoring the problem and continuing to create projects and programmes of work that will, independently of the rest of the enterprise, deliver further data silos with no thought for what will happen to the data after the project closes.
No matter how hard your organisation tries to deal with the flow and change of data, whether through heroic data engineering, blind optimism in a technical solution or simply ignoring it, it will never cope. It’s like having a boat holed below the waterline and thinking that bailing or ignorance will keep it afloat. In reality, the only way to fix it is to plug the hole so that it gets no worse, then bail out while continuing on your enterprise journey.
When you create a FAIR data strategy you have made a conscious business decision to plug the hole that is slowly causing you to dip lower in the water.
Your data strategy isn’t a vision of what an organisation wants to achieve. Rather, it describes how the enterprise will achieve that business vision, and the FAIR data principles help you test how you are going to achieve it. Although the acronym FAIR conveniently forms a familiar word, you have to recognise the strong interdependencies between its four parts. For example, it is difficult to improve the findability of data if no agent is allowed to access it in the first place, and the reusability of data is greatly affected by how interoperable it is. This means that when you build the roadmap to your business’s FAIR data future, each tool you buy or service you create to address one of these aspects needs to be scored against the other three as well.
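One way to picture scoring every candidate against all four principles is a small matrix. The sketch below is purely illustrative: the tool names, the -2 to +2 scale and the idea of summing scores are assumptions made for this example, not a prescribed method. It shows how a tool that strongly helps one principle can still rank below a more balanced option once its side effects on the others are counted.

```python
from dataclasses import dataclass
from typing import Dict, List

PRINCIPLES = ("findable", "accessible", "interoperable", "reusable")


@dataclass
class Candidate:
    """A tool or service under consideration for the FAIR roadmap (hypothetical)."""
    name: str
    # Judged impact per principle, from -2 (makes it worse) to +2 (makes it better);
    # principles left out are treated as 0 (no effect).
    scores: Dict[str, int]

    def total(self) -> int:
        # Net impact across all four principles, not only the one it targets
        return sum(self.scores.get(p, 0) for p in PRINCIPLES)

    def harms(self) -> List[str]:
        # Principles this candidate would set back, even if it helps another
        return [p for p in PRINCIPLES if self.scores.get(p, 0) < 0]


def rank(candidates: List[Candidate]) -> List[Candidate]:
    # Highest net impact first
    return sorted(candidates, key=lambda c: c.total(), reverse=True)


# Illustrative scores only
candidates = [
    Candidate("search portal", {"findable": 2, "accessible": -1}),
    Candidate("open query endpoint",
              {"findable": 1, "accessible": 1, "interoperable": 1, "reusable": 1}),
]

for c in rank(candidates):
    print(c.name, c.total(), c.harms())
# open query endpoint 4 []
# search portal 1 ['accessible']
```

A simple sum is only a starting point; some organisations may prefer to veto any candidate that harms a principle outright, which the harms() list makes easy to check.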
Let’s have a quick look at each of these principles:
Findable
Undoubtedly the primary principle: if you cannot locate your data, you’ll never be able to use it. Given large existing data collections and the constant download of data from external sources, it is key to think of finding data as a task that may have a human in the loop but will more likely be done by machine. That means datasets must be well described so they can be discovered by their content, and the meta-data used to describe them must be machine-interpretable and well indexed.
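To make “machine-interpretable and well indexed” concrete, here is a minimal sketch assuming a toy catalogue: each dataset has an identifier and a structured meta-data record, and an inverted index maps every descriptive term to the datasets that mention it. The record fields, identifiers and the simple word tokeniser are all hypothetical; a real catalogue would use a standard meta-data vocabulary and a proper search engine.

```python
import re
from collections import defaultdict
from typing import Dict, List, Set

# Hypothetical meta-data records: an identifier plus a structured description.
CATALOGUE: List[dict] = [
    {"id": "ds-001", "title": "Customer orders 2023",
     "keywords": ["sales", "orders", "customer"],
     "description": "Daily order lines by region"},
    {"id": "ds-002", "title": "Clinical trial site list",
     "keywords": ["research", "trials"],
     "description": "Sites and investigators by country"},
]


def tokens(text: str) -> Set[str]:
    # Lower-case word tokens; deliberately simple for illustration
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def build_index(records: List[dict]) -> Dict[str, Set[str]]:
    # Inverted index: term -> identifiers of datasets whose meta-data mention it
    index: Dict[str, Set[str]] = defaultdict(set)
    for rec in records:
        text = " ".join([rec["title"], rec["description"], *rec["keywords"]])
        for term in tokens(text):
            index[term].add(rec["id"])
    return index


def find(index: Dict[str, Set[str]], query: str) -> Set[str]:
    # Datasets whose meta-data contain every query term
    terms = tokens(query)
    if not terms:
        return set()
    return set.intersection(*(index.get(t, set()) for t in terms))


idx = build_index(CATALOGUE)
print(find(idx, "orders by region"))  # {'ds-001'}
```

The point is that a machine agent can find ds-001 from what it contains, without anyone knowing its file name or which silo holds it.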
Accessible
Once a user, agent or machine has found the required data, appropriate access needs to be granted. This may require authentication services, authorisation management and clear meta-data describing entitlement to use. Judging the suitability of a technical approach requires you to consider whether you want to work with proprietary or open access protocols, and how the meta-data will remain accessible even when the data themselves are not available.
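The rule that meta-data should stay accessible even when the data are not can be sketched as follows. This assumes a simple role-based model in which a boolean stands in for a real authentication service and a set of role names stands in for authorisation management; the class names, roles and response fields are hypothetical.

```python
from dataclasses import dataclass, field
from typing import FrozenSet


@dataclass
class Dataset:
    id: str
    metadata: dict   # always returned, so the dataset stays findable
    payload: object  # the data itself, returned only to entitled agents
    # Roles entitled to use the data (hypothetical authorisation model)
    entitled_roles: FrozenSet[str] = field(default_factory=frozenset)


@dataclass
class Agent:
    name: str
    roles: FrozenSet[str]
    authenticated: bool = False  # stand-in for a real authentication service


def request(dataset: Dataset, agent: Agent) -> dict:
    """Return meta-data to any agent; add the data only for authenticated, entitled agents."""
    response = {"id": dataset.id, "metadata": dataset.metadata, "access": "metadata-only"}
    if agent.authenticated and agent.roles & dataset.entitled_roles:
        response["data"] = dataset.payload
        response["access"] = "granted"
    return response


orders = Dataset("ds-001", {"title": "Customer orders 2023", "entitlement": "sales analysts"},
                 [120, 80, 45], frozenset({"sales-analyst"}))
analyst = Agent("alice", frozenset({"sales-analyst"}), authenticated=True)
crawler = Agent("catalogue-bot", frozenset(), authenticated=True)

print(request(orders, analyst)["access"])  # granted
print(request(orders, crawler)["access"])  # metadata-only
```

Even the crawler, which has no entitlement, learns that the dataset exists and who may use it, which keeps findability and accessibility working together rather than against each other.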
The enterprise must also decide its approach to data publishing (sharing): will it be via a content management system, APIs that support application development, or querying through endpoints? Each of these offers a different level of reuse. Publishing in a content management system will help humans who want to read documents, but it’s a bit of a dead end for a data scientist who wants to analyse the data to derive business insight.
Interoperable
Interoperability is a pivotal challenge. The data usually need to be integrated with other data, and they also need to interoperate with applications or workflows for analysis, storage and processing. Culturally, the organisation needs to understand that interoperability is different from standardisation. Until about a decade ago it might still have been credible to announce that your business was embarking on code and identifier standardisation as its route to data interoperability. To do so now, in the face of the daily downpour of data and meta-data across the web, would be folly. The key is to be able to describe well what the data are and what they represent. This requires a formal, accessible, shared knowledge representation and the use of widely used vocabularies that are themselves FAIR.
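The difference between interoperability and standardisation can be shown with a small sketch. Instead of forcing two systems to adopt the same country codes, each keeps its local codes and adds a mapping to a concept in a shared vocabulary; the data are then joined on the shared concept. The vocabulary URIs below use the reserved example.org domain as a stand-in for a real, widely used vocabulary, and the source names, codes and field names are hypothetical.

```python
from typing import Dict, List

# Stand-in for a shared, FAIR vocabulary (example.org is a placeholder domain)
SHARED = "https://vocab.example.org/country/"

# Each source keeps its own local codes; only a mapping to the shared concept is added.
MAPPINGS: Dict[str, Dict[str, str]] = {
    "crm": {"UK": SHARED + "GB", "Deutschland": SHARED + "DE"},
    "finance": {"GBR": SHARED + "GB", "DEU": SHARED + "DE"},
}


def annotate(source: str, records: List[dict], code_field: str) -> List[dict]:
    """Add the shared concept for each record's local code (None if unmapped)."""
    mapping = MAPPINGS[source]
    return [{**r, "concept": mapping.get(r[code_field])} for r in records]


def join_on_concept(left: List[dict], right: List[dict]) -> List[dict]:
    # Join on the shared concept, never on the incompatible local codes
    by_concept: Dict[str, List[dict]] = {}
    for r in right:
        if r["concept"] is not None:
            by_concept.setdefault(r["concept"], []).append(r)
    return [{**l, **r} for l in left for r in by_concept.get(l["concept"], [])]


crm = annotate("crm", [{"customer": "Acme", "country": "UK"},
                       {"customer": "Bauer", "country": "Deutschland"}], "country")
fin = annotate("finance", [{"revenue": 120, "ctry": "GBR"},
                           {"revenue": 80, "ctry": "FRA"}], "ctry")

for row in join_on_concept(crm, fin):
    print(row["customer"], row["revenue"])  # Acme 120
```

Neither system had to change its codes; the unmapped “FRA” record simply shows up as a gap to fill in the mapping, not as a reason to launch a standardisation programme.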
Reusable
We’ve arrived! The ultimate goal of a FAIR data strategy is to make the reuse of data normal and part of a sustainable way of working. Through data reuse, the enterprise can respond rapidly to market change, improve innovation and gain business insights quickly. Assuming that we’ve met the expectations for findability, accessibility and interoperability, what’s left to achieve? There are three key criteria which must be met: (i) Entitlement to use must be clear to the agent, human or machine, using the data. This needs to take into account any Personally Identifiable Information (PII) implications, commercial licenses and the different open data licenses that may apply. (ii) The data must be comprehensive enough for the purpose to which they are going to be applied. (iii) Provenance should be detailed enough to allow a domain expert to use the data. An expert will need to know the history of the data and its origin, so that enterprise decisions can be confidently made on analysis of the data.
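The three reuse criteria lend themselves to an automated pre-check before data are handed to an analyst or a pipeline. The sketch below is a hypothetical example: the record fields are invented for illustration, a list of required fields is only a crude proxy for “comprehensive enough for the purpose”, and a real provenance record would follow a richer model than a list of steps.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class ProvenanceStep:
    activity: str  # e.g. "extracted", "cleaned", "merged"
    agent: str     # the person or system that performed it
    when: str      # ISO 8601 date, kept as text for simplicity


@dataclass
class DatasetRecord:
    licence: Optional[str]   # identifier of the licence that applies, if known
    contains_pii: bool
    pii_cleared: bool        # PII implications reviewed and reuse approved
    fields: List[str]
    origin: Optional[str]
    history: List[ProvenanceStep] = field(default_factory=list)


def reuse_blockers(ds: DatasetRecord, required_fields: List[str]) -> List[str]:
    """Return the reasons a dataset is not yet reusable for a purpose; empty means OK."""
    problems: List[str] = []
    # (i) Entitlement: a licence must be stated and any PII must be cleared
    if not ds.licence:
        problems.append("no licence stated")
    if ds.contains_pii and not ds.pii_cleared:
        problems.append("PII not cleared for reuse")
    # (ii) Fitness for purpose: the fields this purpose needs must be present
    missing = [f for f in required_fields if f not in ds.fields]
    if missing:
        problems.append("missing fields: " + ", ".join(missing))
    # (iii) Provenance: origin and processing history must both be recorded
    if not ds.origin or not ds.history:
        problems.append("provenance incomplete")
    return problems


sales = DatasetRecord("CC-BY-4.0", False, False, ["region", "orders"], "order system",
                      [ProvenanceStep("extracted", "nightly-etl", "2024-01-05")])
print(reuse_blockers(sales, ["region", "orders"]))  # []
```

Returning a list of reasons, rather than a single yes or no, tells the data owner exactly what to fix before the dataset can be reused.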