Data Mesh Operating Model

Umesh Bhatt

If “Data Mesh” has been, or is being, sold and led by IT organizations as a strategy for managing your data better, it will fail. Read on…

Data mesh, while sounding like an architectural framework, is anything but. It is a set of principles that align and direct the workforce to all think of data as a valuable asset.

Data mesh is a framework for building data organizations. It is a construct that brings together all key players of data (producers, consumers, engineers, data management professionals, stewards and governors of data) into a singular focus: getting the most out of data.

It does so by organizing data (and people) into domains. Each domain is a self-directed, fully inclusive and integrated data value generation factory. Its principal purpose is to develop the products needed to fulfill the domain’s objectives, which are set forth by its creators (the management organization that all domains belong to).

Defining a “domain” is an art. When companies like Spotify are used as examples of a data mesh, it works because Spotify started its life as a data company. When companies like Netflix are mentioned, it works because Netflix realized early on, as soon as streaming video became a thing, that data would be its key and only appreciating asset in the future.

Traditional companies such as financial companies treat each business’s products as their own profit and loss centers (personal wealth management, collateral etc.). They can innovate better because they can “try” new things in smaller product groups and then scale them.

Pharmaceutical companies are not organized like financial companies. They are monolithic, and products (drugs) have a value chain that cuts across several divisions/departments (R&D, Commercial, Manufacturing etc.).

For traditional companies like Pharma to become data organizations, they will need to think bigger than data lakes, data lakehouses, data fabric etc. These IT architectural patterns will not solve the data problems. Traditional companies are organized around maximizing value from a business process, not data. Data helps these organizations make data-driven decisions and is used to validate their process outcomes. While IT frameworks do help reduce cycle time and inefficiencies, they will not eliminate them.

Recently, technologies such as OpenAI’s ChatGPT have shown us that they can perform tasks more efficiently and have the potential to cut out many business processes and decisions. However, traditional organizations such as Pharma are not organized around data to leverage the power of such technologies.

How do you get started?

The section below describes a new data-focused organization for a mid-sized pharmaceutical company.

The first step is to understand the term “domain”. A domain is the smallest organizational unit that has discrete authority over a small section of the organization. For example, an R&D division that has multiple functional groups, such as Research and Development, is too large to be called a domain. An example of a good domain unit would be a small molecule drug discovery group. This group looks at advancing drug discovery using small molecules through in-vitro and in-vivo processes. It is large enough to have specific high-impact outcomes, but small enough from a data perspective. That would be an ideal definition of a domain for a medium-sized pharmaceutical company.

One can go higher or lower than this, for example by splitting the group into in-vivo and in-vitro areas, or by limiting the domain to certain functional areas such as lead discovery (target ID and selection), compound screening or target validation. There is no one size fits all; the right size is the one that fits the company (the Goldilocks principle). In larger pharmaceutical companies, domains can be smaller because they may have more functions, processes, volumes etc. In small companies, domains can be larger.

It is, however, important to identify this first. It defines the boundary of the self-governance team responsible for the oversight and strategy for the use of data in that domain. This governance body/team (depicted in orange in the picture) defines the objectives, remit and use of data. They set the priorities for what types of information are suitable for operational analytics, the company’s analytical goals and machine learning use. They are, for all practical purposes, considered to be the data owners of all data produced and consumed in the domain, and they approve the data uses that become the data products (for that domain). They also set rules on the use of this data outside the domain. The team composition is almost entirely business leadership.

Operating model for an effective Data Ecosystem

The domain creates data products needed to meet the domain’s objectives.

Data products are produced, managed and serviced by a data team (depicted in light blue boxes inside the orange circle). The data team consists of data engineers, data developers, data analysts, data scientists and data stewards. This is a new organizational function for most companies. This is a set of team members from both Business and IT organizations fully dedicated to domain and data product development. Their responsibilities include data architecture, data ingestion, annotation and labeling, data transformations, data cataloging and publishing of data products. They work with other IT teams (data & platform teams, infrastructure teams etc.) to obtain the best infrastructure needed to host their data products. Data teams are supported by data engineering teams (green) in data FAIRification, semantic data linking, search, data capture, developing tools for building data products, software services etc.

The data products are enabled (support, change management and adoption) by the enablement team (yellow), whose core responsibilities include advocacy, training, adoption, maintenance of data products, cataloging, data stewardship, executing governance and access etc.

The domain uses platform teams (depicted in green), who rely on enterprise IT services (depicted in teal) for publishing their data products, storing their domain data, measuring data quality, and managing access to and from other domains. Enterprise IT also provides data infrastructure capabilities such as cloud-based data management, servers, compute etc.

Here are some examples of services provided by the enterprise IT teams:

  • Data products can be published in a managed marketplace with data contracts (data use and access)
  • Data products can be virtualized or physically implemented on managed data platforms
  • Data products can be supported by managed data management & operations (supplying tools, techniques and subject matter expertise on databases, cloud computing, compute etc.)
  • Data products can be certified by managed data quality (FAIR etc.)
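
To make the first service above more concrete, here is a minimal sketch of a marketplace that checks each access request against a data contract. Everything in it is an assumption made for illustration: the class names `DataContract` and `Marketplace`, the field names, and the example domain and product names are hypothetical and do not describe any specific product.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class DataContract:
    """Access and use terms for one published data product (illustrative fields)."""
    product: str
    owning_domain: str
    allowed_domains: frozenset[str]   # other domains granted access
    permitted_uses: frozenset[str]    # uses approved by the domain's governance team


class Marketplace:
    """In-memory stand-in for a managed data product marketplace."""

    def __init__(self) -> None:
        self._contracts: dict[str, DataContract] = {}

    def publish(self, contract: DataContract) -> None:
        """Register a product; each product name can be published only once."""
        if contract.product in self._contracts:
            raise ValueError(f"{contract.product} is already published")
        self._contracts[contract.product] = contract

    def is_allowed(self, product: str, domain: str, use: str) -> bool:
        """Check a request (who is asking, and for what use) against the contract."""
        contract = self._contracts.get(product)
        if contract is None:
            return False  # unpublished products are not accessible
        domain_ok = domain == contract.owning_domain or domain in contract.allowed_domains
        return domain_ok and use in contract.permitted_uses


# Hypothetical example: a discovery domain shares screening results with one other domain.
marketplace = Marketplace()
marketplace.publish(DataContract(
    product="compound_screening_results",
    owning_domain="small_molecule_discovery",
    allowed_domains=frozenset({"clinical_pharmacology"}),
    permitted_uses=frozenset({"operational_analytics", "machine_learning"}),
))
```

In this sketch a request succeeds only when both the requesting domain and the intended use are covered by the contract, which mirrors the “data use and access” pairing in the bullet above. A real marketplace would add versioning, audit trails and approval workflows.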

Lastly, the analytics team (depicted in yellow) consists of data scientists and data analysts who are responsible for aggregate and predictive analytics. They rely on the capabilities developed by the domain teams, using the tools those teams supply to find, access and use the data in their analytical environments. These analytical environments will include programming environments and tools to develop and manage machine learning models and execute them on various computing platforms.

The color coding of the picture loosely aligns with organizational structures typical of medium-sized Pharma. Green teams could be IT teams with functional expertise to service specific functional groups (R&D, manufacturing, commercial etc.), while yellow teams are mostly business functional groups. The orange and light blue teams are cross-functional, with a varied composition of both business and IT membership. This is a new organizational structure for most traditional organizations, where organizational boundaries separate the two functions. The teal team is mostly enterprise or corporate IT, focused on cost optimization and standardization of technology across the company.

What is a product?

A data product is a “product” of a data domain and product thinking. Let me explain. We know what a data domain is from earlier; for example, let’s assume small molecule drug discovery is a domain. Product thinking is the application of technologies and processes to make this domain’s data findable and usable via multiple modalities (APIs, Applications, ML Models etc.).

A data team produces data products. A data product satisfies one or more business objectives for the use of data from that domain. A technical implementation of a data product could be a data table + manifest combination.

A data table could be a super table format (e.g., one table per unique entity) or it can be an entire system or a database as long as it can comply with the rules defined for a product.

A manifest contains information such as access and use conditions, schemas, entities in the data table, and their business metadata (e.g., definitions of attributes, terminology PURLs, keywords that allow data tables to be findable).

The combination of the data table + manifest could be referred to as a FAIR dataset.
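
As a rough illustration of the data table + manifest combination, the sketch below pairs a small table with a manifest holding the kinds of information listed above. The class names, field names, example attributes and the `example.org` PURL are all hypothetical; the source does not prescribe a manifest format.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class Manifest:
    """Descriptive metadata that travels with a data table (illustrative fields)."""
    access_conditions: str            # who may access the data, and how
    use_conditions: str               # what the data may be used for
    schema: dict[str, str]            # attribute name -> data type
    entities: list[str]               # entities held in the data table
    definitions: dict[str, str]       # attribute name -> business definition
    terminology_purls: dict[str, str] = field(default_factory=dict)  # attribute -> PURL
    keywords: list[str] = field(default_factory=list)                # search terms


@dataclass
class DataProduct:
    """A data table paired with its manifest."""
    name: str
    domain: str
    rows: list[dict]
    manifest: Manifest

    def undocumented_attributes(self) -> list[str]:
        """Return schema attributes that have no business definition yet."""
        return sorted(a for a in self.manifest.schema if a not in self.manifest.definitions)

    def matches(self, term: str) -> bool:
        """Case-insensitive keyword lookup, a simple stand-in for findability."""
        return term.lower() in (k.lower() for k in self.manifest.keywords)


# Hypothetical product from a small molecule drug discovery domain.
product = DataProduct(
    name="compound_screening_results",
    domain="small_molecule_discovery",
    rows=[{"compound_id": "C-001", "ic50_nm": 12.5}],
    manifest=Manifest(
        access_conditions="R&D staff, read-only",
        use_conditions="operational analytics and machine learning",
        schema={"compound_id": "string", "ic50_nm": "float"},
        entities=["compound"],
        definitions={"compound_id": "Internal identifier of the screened compound"},
        terminology_purls={"compound_id": "https://example.org/purl/compound_id"},
        keywords=["screening", "small molecule"],
    ),
)
```

Here `product.undocumented_attributes()` flags `ic50_nm` as lacking a business definition, the kind of check a data steward might run before the product is published.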

A data table is considered to be “right sized” if you are able to establish a clear owner accountable for its product development and management, assign one or more uses to it, and secure a commitment to develop and sustain it until it reaches its target value.
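
The three “right sized” conditions can be written down as a simple checklist. This is only a sketch: the `ProductCandidate` class, its fields and the wording of the gap messages are assumptions made for illustration.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from typing import Optional


@dataclass
class ProductCandidate:
    """A data table being considered as a data product (illustrative fields)."""
    name: str
    owner: Optional[str] = None                    # person accountable for the product
    uses: list[str] = field(default_factory=list)  # assigned business uses
    sustain_commitment: bool = False               # commitment to develop and sustain it


def right_sizing_gaps(candidate: ProductCandidate) -> list[str]:
    """List which of the three conditions are still unmet; empty means right sized."""
    gaps = []
    if not candidate.owner:
        gaps.append("no accountable owner")
    if not candidate.uses:
        gaps.append("no assigned use")
    if not candidate.sustain_commitment:
        gaps.append("no commitment to develop and sustain")
    return gaps
```

A candidate with an owner, at least one use and a sustain commitment returns an empty list. Anything else returns the open gaps, which can point to a table that is too broad to own or too narrow to be worth sustaining.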

Companies such as Nexla and Nextdata are pioneering technology to manage data products as containers, in effect a data-product/domain management system. Such a system is a gap in the current modern technology architecture stack.

Where do you go from here?

A data ecosystem is a collection of capabilities (people, process, technology) of all the domains in an organization (or division — e.g., Therapeutic Area, Research or R&D).

From a technology perspective, one can identify a set of principles upon which to build technical capabilities to drive the domain needs. Such principles could include an API-first strategy (every application/system, whether built in-house or COTS, must have standardized APIs), a platform that allows domains to seamlessly move data across domains (self-service capabilities), and a governance capability that ensures human and machine compatibility for reusability and sets/enforces data contracts.
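
One way to picture the API-first principle is a single standard interface that every system, in-house or COTS, is wrapped behind. The sketch below uses Python’s `typing.Protocol` for this. The interface name `DataProductAPI`, its two methods and both example classes are hypothetical; note also that a runtime protocol check only confirms the methods exist, not that they behave correctly.

```python
from typing import Protocol, runtime_checkable


@runtime_checkable
class DataProductAPI(Protocol):
    """Hypothetical standard interface every system would expose."""

    def describe(self) -> dict:
        """Return metadata such as the product name and owning domain."""
        ...

    def read(self, limit: int) -> list:
        """Return up to `limit` records."""
        ...


class AssayDatabaseAdapter:
    """Wraps an imagined in-house assay database behind the standard interface."""

    def __init__(self, records: list) -> None:
        self._records = records

    def describe(self) -> dict:
        return {"name": "assay_results", "domain": "small_molecule_discovery"}

    def read(self, limit: int) -> list:
        return self._records[:limit]


class LegacyExport:
    """A system with no standard API; it only offers a file dump."""

    def dump_to_csv(self, path: str) -> None:
        raise NotImplementedError


def conforming(systems: list) -> list:
    """Keep only the systems that expose the standard interface."""
    return [s for s in systems if isinstance(s, DataProductAPI)]
```

A platform team could run a check like `conforming(...)` across its inventory to find systems that still need an adapter before their data can move across domains in a self-service way.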

Welcome to the new data organization.
