Practical Data FAIRification. Part 1: Intro

Dmitrii Kober
10 min read · Aug 28, 2024


Preface

As a passionate data and analytics consultant, I dive deep into the dynamic worlds of various industries daily, supporting clients in creating data strategies and in adopting and sustaining the associated operating models. Why, you ask? Well, the motivations are as diverse and exciting as the businesses themselves! Let’s consider a few examples.

📚 Picture this: a booming business preparing for significant capability scaling, where seamless communication across departments becomes crucial. Enhanced data interoperability isn’t just nice to have; it’s a lifeline that supports cross-functional use cases, ensuring all units adhere to the organization’s business goals and strategy.

📚 Then there’s the thrilling world of M&A. Imagine two gambling companies uniting, each with its own rich data history of games, licenses, financial information, wager histories, statistics, player registers, and much more. The advisory contribution here is to help weave these disparate data strands together, creating a canvas that not only tells a new, unified story but does so with strategic foresight and appropriate governance. The goal is to synergize these resources so that they not only coexist but excel, powering the combined business ahead in its domain.

📚 Let’s not overlook the fascinating process of horizontal business integration. Here, companies such as franchises strive for alignment in their business, data, and technology management practices. The magic happens when we increase ROI, reduce redundancy, and lower costs, transforming data exchange across subsidiaries into a streamlined, cost-effective operation.

Navigating these varied business needs requires a deep understanding of numerous frameworks, approaches, and methodologies, as well as comprehensive research into feasibility and the cost-benefit balance. Each scenario demands a tailored approach: a specific plan that aligns with unique contexts and constraints.

Today, I would like to share one foundational methodology that, when applied correctly and wisely, results in well-curated data, which (practically!) opens the door to gaining greater business value. Ready to dive in? Let’s explore it!

Intention

As you might have guessed from the title, the methodology I would like to discuss is Data FAIRification: making organizational data follow the FAIR principles to the extent required by a collection of business-led tasks ("just enough FAIR"; we will cover this concept in later articles).

Through this enlightening series, I pursue the goal of sharing the practical guidelines and lessons learned on the topic, which I have gathered while working day and night with mid-sized companies and globally spread enterprises on their data management capability enhancement programs.

I sincerely hope this series will assist practitioners who are navigating the exciting journey of operating over vast amounts of complex data for the greater good. Stay tuned as we continue to explore how FAIR principles can be practically applied to turn raw data into well-organized, powerful (business) assets.

Business Impact-based Intro to FAIR

In this first article of the series, as with any great story, we need to set the stage before diving into the action. It’s crucial not only to understand what FAIR stands for but also to clear up any misconceptions about what it isn’t. So, let’s begin by shedding some light on the basics, creating a solid foundation upon which we can build further understanding.

🚨 To begin with, it’s essential to note that the FAIR data principles are not (at least not directly) about equitable or ethical data usage. FAIRification can certainly support establishing ethical data processing and utilization, but this is not the primary focus of the methodology.

It’s common among data practitioners to recognize that FAIR stands for Findability, Accessibility, Interoperability, and Reusability. However, I sometimes encounter people who mistakenly think that this sums up the entirety of the concept, merely attributing these broad characteristics (each open to many interpretations) to data, which is quite unfortunate, to be honest…

In reality, the domain of FAIR data is extensive and intricate. According to the pioneers of the topic, the framework is built upon 🚧 15 detailed principles 🚧, each elaborating on one of the four overarching ones. Moreover, the exploration of FAIR data continues to evolve as numerous companies and individual experts relentlessly delve into this field, crafting new processes, procedures, recommendations, tools, and even standards that enrich the understanding of these principles. This ongoing exploration not only deepens the practical application of FAIR but also continuously shapes the landscape of data management, making it an exciting area of growth and innovation.

While delving into the intricate details of each FAIR principle will be the subject of subsequent articles, where I’ll focus on their applicability and transformative impact, here I’d like to offer a summary that also shows how businesses encounter this realm. The aim, for now, is to give the reader a high-level understanding of the pathways each of the four characteristics should guide them through.

Picture source: CESSDA’s Data Management Expert Guide

Findability

In the data-driven landscapes of modern business (and science), the ability to locate relevant information quickly and efficiently stands as a gateway to success. The Findability principle ensures that data sets have globally unique, persistent, and resolvable identifiers (GUPRIs) and are easy to find and discover for both humans and machines.

The following real-life scenario illustrates why adhering to this principle is not just beneficial but essential.

💊 Pharmaceutical Research Case Study

A pharma company (the name is withheld), a client I had the pleasure to work with and one of the global leaders in drug development, faced challenges in efficiently accessing historical research and clinical trial data. Researchers often spent considerable amounts of time sifting through disjointed and poorly indexed data repositories in the hope of finding something useful. This inefficiency not only delayed drug development timelines but also increased overall R&D costs and reduced the potential for discovering new applications for existing compounds.

To address these challenges, the company undertook a strategic initiative to restructure their data management practices, prioritizing the FAIR principle of Findability. Key pillars included:

  • Unique and persistent identifiers: Each data set related to drug research or clinical trials was assigned a unique identifier. These identifiers were persistent, meaning they remained constant over time, irrespective of changes to the data set itself. This practice helped maintain a reliable reference that could be easily cited in research documentation and regulatory submissions.
  • Rich metadata framework: Together with the client we developed a comprehensive metadata schema and a corresponding operating model tailored to the needs, R&D processes, and work culture (!) of the organization. This included detailed descriptors for each data set, such as the compound studied, the nature of the study (clinical trial, in vitro, in vivo, etc.), related health conditions, experimental outcomes, data ownership information, research status update information, and more. This rich metadata made it easier for researchers to understand what each data set contained (discoverability) without needing to delve into the data itself in the first place.
  • Centralized data catalog: All research data was integrated into a centralized data catalog with advanced indexing and search capabilities. This allowed researchers to perform quick, targeted searches using specific parameters present in the metadata (findability); a minimal sketch of such a catalog entry and lookup follows this list.
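To make this a bit more concrete, here is a purely illustrative sketch of the idea (not the client’s actual system): every catalog entry carries a persistent, globally unique identifier plus rich descriptive metadata, and a naive search filters entries by those metadata fields alone, without touching the underlying data. All identifiers, URLs, and field names below are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class CatalogEntry:
    identifier: str    # persistent, globally unique ID (e.g. a DOI-like string)
    title: str
    compound: str      # compound studied
    study_type: str    # "clinical trial", "in vitro", "in vivo", ...
    conditions: list   # related health conditions
    owner: str         # data ownership information
    status: str        # research status
    location: str      # where the actual data set resolves to


catalog = [
    CatalogEntry(
        identifier="doi:10.9999/example.trial.0001",        # hypothetical identifier
        title="Phase II trial of compound X-123",
        compound="X-123",
        study_type="clinical trial",
        conditions=["hypertension"],
        owner="Cardio R&D",
        status="completed",
        location="https://data.example.org/trials/0001",    # hypothetical resolvable location
    ),
]


def find(entries, **criteria):
    """Return entries whose metadata matches all given criteria."""
    def matches(entry):
        for key, value in criteria.items():
            attr = getattr(entry, key)
            ok = value in attr if isinstance(attr, list) else attr == value
            if not ok:
                return False
        return True
    return [e for e in entries if matches(e)]


print(find(catalog, compound="X-123", study_type="clinical trial"))
```

The point here is not the code itself but the separation it illustrates: the search works entirely on metadata, and the persistent identifier keeps every reference stable even if the data set behind it is moved or revised.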

There were several important outcomes from these Findability-focused enhancements. To name one: according to the KPIs set up for the transformation, the organization increased its operational efficiency by up to 54% thanks to the significantly reduced time spent searching for relevant data; researchers also gained a much broader view of the scientific context, which improved research results, because all information became easily available.

Accessibility

Once found, data needs to be readily accessible. This means it should be obtainable with minimal barriers, ideally via standardized protocols that ensure data can be accessed, retrieved, and used with reasonable effort.
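As a small, hedged illustration of what "standardized protocols with minimal barriers" can mean in practice, the sketch below retrieves a data set over plain HTTPS with a standard bearer token. The URL and token are hypothetical placeholders, and the requests library is used here only because it is a common choice.

```python
import requests

DATASET_URL = "https://data.example.org/trials/0001"   # e.g. resolved from a catalog entry
ACCESS_TOKEN = "..."                                    # obtained via the organization's SSO / OAuth flow

response = requests.get(
    DATASET_URL,
    headers={
        "Authorization": f"Bearer {ACCESS_TOKEN}",  # standard token-based authorization
        "Accept": "application/json",               # ask for a well-known representation
    },
    timeout=30,
)
response.raise_for_status()   # fail loudly if access is denied or the resource is gone
dataset = response.json()
```

Nothing in this sketch is vendor-specific: the value comes from every system exposing its data through the same well-known protocol and authorization scheme, so that an authorized consumer needs exactly one access pattern.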

🏤 A Hospital and Clinics Network Case Study

This story outlines the transformative impact I have seen a hospital network gain by implementing and adopting the Accessibility principle of FAIR.

This network consisted of several hospitals and clinics that served a diverse population across multiple locations. The healthcare providers (HCPs) in this network were facing issues in accessing patient records and medical histories, especially when patients visited different facilities within the network (moreover, there were occasions of complex treatments that required specialized services offered only by particular entities). While in the majority of scenarios it was possible to get the required information by contacting the corresponding hospital or clinic systems, the disjointed access to diverse applications often led to a highly involved, cumbersome, and error-prone process of anamnesis collection. Even though SSO was enabled (so an HCP did not have to create and remember lots of accounts whose passwords expired at different cadences), the cherry on the cake was the fact that there was no centralized, common register of the network entities’ information systems and their entry points (portals or APIs), and there were cases of an entity exposing a mix of deprecated and new entry points, which could serve (as you may expect) different versions of data sets.

To address these challenges, together with the client we established a Data Management Improvement Program that contained several projects for enhancing the Accessibility of patient data across all the network facilities. The biggest one was dedicated to the implementation of a Unified Access Platform: a centralized data access system employing secure, standardized communication protocols to ensure that patient data could be accessed safely and efficiently by authorized personnel following common, governed procedures. This system included strong authentication and authorization measures to protect sensitive information while facilitating ease of use and a unified user experience.

Outlining some outcomes:

  • Quick and easy access to complete patient histories and records enabled HCPs to make more informed decisions faster, which contributed to enhanced patient care.
  • The efficiency and effectiveness gained translated into an increased number of patients served each day.
  • Comprehensive case consideration and improved examination time increased patient satisfaction rates, as evidenced by patient feedback and follow-up surveys.
  • By implementing a centralized access system with robust security protocols, the organization significantly reduced the risk of unauthorized data access and information leakage, also ensuring compliance with healthcare regulations.

Interoperability

Data should not only be accessible but also ready to be integrated with other data sets and applications. Interoperability involves using common formats, common or aligned vocabularies, and standards that allow data from diverse sources to work together seamlessly.
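To ground the idea, here is a minimal, hypothetical sketch (not any client’s real pipeline): two sources describe the same record with different field names, units, and local labels, and a mapping to a common schema and a shared vocabulary makes the records directly comparable. All field names, codes, and values are invented for illustration.

```python
# Local field names of each source mapped to one common schema.
FIELD_MAP = {
    "source_a": {"prod": "product", "qty_kg": "quantity_kg", "ctry": "country"},
    "source_b": {"Product": "product", "WeightLb": "quantity_lb", "Country": "country"},
}

# Local country labels mapped to one agreed vocabulary (ISO-style codes).
COUNTRY_VOCAB = {"Germany": "DE", "DEU": "DE", "United States": "US", "USA": "US"}

LB_TO_KG = 0.45359237


def harmonize(record, source):
    """Rename fields to the common schema, align units, and normalize vocabulary terms."""
    mapped = {FIELD_MAP[source][k]: v for k, v in record.items() if k in FIELD_MAP[source]}
    if "quantity_lb" in mapped:                        # align units to the common schema
        mapped["quantity_kg"] = round(mapped.pop("quantity_lb") * LB_TO_KG, 3)
    if "country" in mapped:                            # align terms to the shared vocabulary
        mapped["country"] = COUNTRY_VOCAB.get(mapped["country"], mapped["country"])
    return mapped


a = harmonize({"prod": "solvent-17", "qty_kg": 100, "ctry": "DEU"}, "source_a")
b = harmonize({"Product": "solvent-17", "WeightLb": 220.5, "Country": "Germany"}, "source_b")
assert a["country"] == b["country"] == "DE"            # the two records now speak the same language
```

The mapping tables are the interesting part: once formats and vocabularies are agreed and maintained as shared assets, any number of sources can be plugged in without pairwise translation logic.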

📈 A Global Investment Bank Case Study

One of my consulting engagements was for a global investment bank that operated in various segments, including derivatives trading. In the reality of derivatives and banking, data comes from multiple sources, including market feeds, internal transaction logs, client portfolios, regulatory reports, and many other parties’ systems. The bank struggled with integrating these disparate data types and data type versions (a common industry issue; take FIXML with all its extensions as an example), which led to various issues in transaction processing, multi-step deal handling (with frequent failures of clearing and allocation procedures), and so on. The lack of interoperability not only slowed down operations but also increased the rate of errors and compliance issues.

A group of business, data, and technology consultants was invited to help resolve these problems by initiating a comprehensive Data Management Transformation Program focusing, above all, on enhancing interoperability among the parties. One part of the solution was the establishment of a Data Integration Hub, aimed at (semi-)automating the processes of data harmonization and semantic interpretation alignment. This became possible after introducing a number of measures for transforming the information exchange infrastructure and for executing data pre-processing at the sources (for partnering organizations and the bank’s subsidiaries) or right after ingestion (for other parties’ information flows) so as to adhere to the FAIR Interoperability principle, e.g. synchronizing the information delivery channels and protocols, and associating provenance, classification, ownership, term ontology references, and other metadata with data sets.
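The following is a small, hypothetical sketch of the kind of pre-processing described above (it is not the bank’s actual hub): on ingestion, every incoming payload is wrapped with provenance, classification, ownership, and term-ontology references before it travels further. The system names, URIs, and classification value are placeholders.

```python
from datetime import datetime, timezone


def annotate(payload, source_system, owner, ontology_refs):
    """Wrap an ingested payload with the metadata the hub is expected to attach."""
    return {
        "data": payload,
        "metadata": {
            "provenance": {
                "source_system": source_system,
                "ingested_at": datetime.now(timezone.utc).isoformat(),
            },
            "classification": "confidential",   # illustrative default, pending review
            "ownership": owner,
            "ontology_refs": ontology_refs,     # links from local terms to shared definitions
        },
    }


message = annotate(
    payload={"trade_id": "T-42", "instrument": "X-FUT-2025"},
    source_system="market-feed-A",                                              # hypothetical system name
    owner="Derivatives Operations",
    ontology_refs={"instrument": "https://vocab.example.org/term/instrument"},  # hypothetical URI
)
print(message["metadata"]["provenance"]["source_system"])
```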

Highlighting some outcomes: the improved integration of market data with internal trading systems allowed the bank’s internal and external parties involved in the transaction handling process to access the required information in full, on time, and with far fewer errors, thus improving overall operational efficiency by 30% and reducing maintenance effort by approximately 800 hours per month within the US region alone.

Reusability

Lastly, data should be well-documented and richly described to facilitate its reuse. Clear and accessible metadata, including licensing and provenance information, ensures that the data can be effectively used for future research, analysis, or processing.
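As a minimal sketch of what this looks like in practice, consider the descriptor below: a reuse decision can be made from the metadata alone, by checking the license and the provenance, without ever opening the data set itself. The field names are illustrative rather than taken from a specific metadata standard, and all values are invented.

```python
descriptor = {
    "identifier": "doi:10.9999/example.dataset.17",               # hypothetical persistent ID
    "license": "internal-use-only",                                # conditions of use
    "provenance": {
        "produced_by": "EMEA logistics platform",                  # origin of the data
        "processing_steps": ["deduplicated", "unit-normalized"],   # how it was prepared
    },
    "documentation": "https://docs.example.org/datasets/17",       # hypothetical documentation link
}

ACCEPTABLE_LICENSES = {"internal-use-only", "partner-shared"}


def reusable(desc):
    """Consider a data set reusable only if its license is acceptable and its provenance is documented."""
    has_license = desc.get("license") in ACCEPTABLE_LICENSES
    has_provenance = bool(desc.get("provenance", {}).get("produced_by"))
    return has_license and has_provenance


print(reusable(descriptor))   # True: license and provenance are both in place
```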

Global Chemicals Distribution Company Case Study

This is a case from a leading chemicals distribution company that operated across multiple continents. The company’s operations involved complex logistical processes across different regions, where managing and reusing information effectively among the involved parties was becoming increasingly cumbersome due to inconsistent data formats, varying data delivery approaches, misaligned information semantics, a lack of comprehensive metadata, and even the different languages used in the contracts setting the mutual agreements for the course of material transportation. These challenges blocked the expected strategic business growth.

To address the data exchange challenges and leverage data as a reusable asset, the company launched a strategic initiative focusing on several data management aspects, including the enhancement of data Reusability. This initiative introduced (but was not limited to):

  • Rich metadata annotation: A structured approach to associating standardized metadata with data sets was adopted, covering details of data origin, quality, conditions of use, processing steps, and localization. This ensured that data could be properly and consistently interpreted, understood, and effectively used by anyone within the value chain(s).
  • Standardization of data formats: Standardizing data formats across parties within the supply chain enabled the company to integrate data smoothly from various logistics sources and ensured that data could be easily exchanged and reused across different companies, branches, and business units appropriately.
  • Introduction of data stewardship: The company introduced the new role of Data Steward, responsible for maintaining the quality and reusability of data and for ensuring that data sets correctly describe the business reality and adhere to the company’s business processes. Data stewards implemented policies and guidelines to ensure data was accurately cataloged, made easily accessible for future use and scale, and fit to the defined (usage) licenses. The latter, by the way, were particularly useful for secure collaboration with external partners such as local freight companies and customs agents, who were granted conditional access to relevant data through specific licensing agreements. These licenses ensured that data shared for operational purposes (like expediting customs clearance) could not be used for any other purpose, protecting the company’s business interests and intellectual property (a small sketch of such a purpose-bound check follows this list).
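The sketch below is a hypothetical illustration of that purpose-bound licensing idea: an external partner’s request is granted only if the licensing agreement attached to the data set permits the declared purpose. The license identifiers, data set names, and purposes are all invented.

```python
# License id -> purposes the licensing agreement permits (illustrative values).
LICENSES = {
    "freight-ops-2024": {"customs_clearance", "shipment_tracking"},
}

# Data set -> the license it is shared under.
DATASET_LICENSE = {"shipment-manifests": "freight-ops-2024"}


def access_allowed(dataset, purpose):
    """Grant access only when the data set's license permits the declared purpose."""
    license_id = DATASET_LICENSE.get(dataset)
    return purpose in LICENSES.get(license_id, set())


print(access_allowed("shipment-manifests", "customs_clearance"))    # True: covered by the agreement
print(access_allowed("shipment-manifests", "marketing_analysis"))   # False: outside the agreement
```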

With the above activities adopted and sustained, the company saw significant reductions in the costs associated with re-collecting, reproducing, ad-hoc protecting, and correcting data, not to mention a significant decrease in the operational losses (both financial and reputational) caused by logistics failures rooted in inconsistent data exchange.

Conclusion

The article turned out to be much longer than initially planned; however, I believe it is worth it, as the better the overall understanding of how the FAIR principles play out in real-life business cases, the more companies will recognize the great value they can gain by utilizing Data FAIRification.

This overview should prepare readers for delving deeper into the actionable aspects of FAIR. Moving forward, our journey will transition from theoretical groundwork to more hands-on strategies.

Any feedback and ideas, as always, are highly appreciated!
