How to create a great data platform, one step at a time

A simple mental model for data platform managers to lead and manage high-ROI changes

Ella Pham
Data & Beyond
5 min read · Aug 27, 2024

“You can benefit from treating your data platform as a factory” (image by Google DeepMind)

Let’s start by pondering a question:

“If this year you can improve only one aspect of your data platform, what would that be?”

It is easy to have a long list of things you’d like to improve. Yet in reality, resource constraints always force us to prioritise and place our best bets. Prioritising is difficult, especially in an emerging, dynamic and complex domain like data platform operations. In this article, I offer a framework for answering that question systematically. It consists of 3 steps:

  1. Understand different performance attributes of a data platform
  2. Understand the key drivers in your situation
  3. Make a decision and move on

Let’s dive in!

Step 1: Understand different performance attributes of a data platform

Regardless of technologies, operating models or maturity levels, the performance of every data platform can be assessed along three main attributes:

  • Input
  • Processing
  • Consumption

In this model, a data platform resembles a factory: it takes certain inputs, processes them, then offers the output to users to consume. From a data-product manufacturing perspective, it is a specific kind of factory whose input, processing and consumption can each be measured in very specific ways:

Measurements of input:

  • Number of data sources integrated: covering a wide range of source systems such as internal applications, third-party sources and public sources.
  • Number of data types processed: structured, unstructured.

Measurements of processing:

  • Speed and timeliness of processing: data pipelines are run at sufficient speed and frequency, enabled by either batch or stream processing.
  • Quality of processing: errors are prevented, detected and managed, enabled by capabilities such as observability, CI/CD, testing and version control. The environment is secured, enabled by role-based access control (RBAC) and network security measures. Processing is cost-efficient, enabled by the configuration of storage and compute technologies.
  • Integrity of processed inventory: data are well integrated, showing consistent values for a record, enabled by master data management and reference data library.

Measurements of consumption:

  • Quality of output: end users can inspect the quality of data products, enabled by data quality metrics, or by references to data validation results and data owners.
  • Ease of consumption: data products are discoverable and understandable to business users, enabled by a data catalogue, data marketplace or data dictionary. Consumption tools such as BI, analytics and ML/AI tools are easy to use, which mostly depends on the choice of vendors.

Image by author

As you can see, when the measurements of the data platform’s attributes are clear, we can improve them, and therefore improve the data platform, by changing the specific components enabling those measurements. For example, your choices can be:

  • Increase the number of data sources integrated in the platform
  • Add a data observability tool to the platform
  • Embed a communication channel with data product owners in the consumption interfaces

Now that we know what choices are available, we can move on to the next step.

Step 2: Understand the key drivers in your situation

There are often three kinds of drivers for any organisational transformation initiative, big or small:

  • Business needs
  • Existing capabilities
  • Future trends and foreseeable changes

For business needs, ask yourself what the most common use cases for data are in your organisation. Is it analytics for cost efficiency, or real-time detection of fraud? Identify the most painful challenges that your data consumers are facing, and put a big star on them.

For existing capabilities, the goal is to identify whether you have enough fuel to reach the end of the change initiative. Depending on which attribute is the potential target of change (Input, Processing, Consumption), you will have to survey different capabilities (data domain expertise, engineering, analytics, product management, etc.). Look both in your team and across the organisation. Don’t limit yourself to the resources you can directly control.

Finally, be aware of future trends and foreseeable changes. I’d specifically consider those foreseeable changes within a 6-month horizon. Look for factors that may disrupt the current data landscape, both from business needs and existing capability perspectives. You can ask questions such as:

  • Will a specific need go away due to an organisational restructure? For example, if your company is selling off its real-estate division to focus on private equity only, that will change your business needs and impact your plan.
  • Will a capability be significantly enhanced soon due to the adoption of new technology? If a department is looking to migrate an on-prem database to the cloud, that would definitely impact your integration agenda.

Image by author

Step 3: Make a decision and move on

After considering the business needs, existing capabilities, future trends and foreseeable changes, a few candidate priorities will stand out. These are the ones that:

  • Support critical and significant business needs.
  • Can be supported by existing capabilities, or require a reasonable amount of investment with tangible ROI.
  • Will still stay relevant given the changing landscape, or will even become more relevant.

To make the final cut and pick your one and only candidate, simply narrow down to the one with the highest score across those three boxes. Also look to your stakeholders (CFO, CDO, department heads) for buy-in on your ideas, and choose the path of least resistance. Then make the decision and stick with your plan.
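
One way to make "highest score across the three boxes" concrete is a simple scoring sheet. The sketch below is only an illustration: the 1-to-5 scale, the equal weighting, the `Candidate` class and the scores themselves are assumptions made for this example, not part of the framework.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Candidate:
    """A candidate priority, scored 1 (weak) to 5 (strong) on each box."""
    name: str
    business_need: int      # supports critical and significant business needs
    capability_fit: int     # supportable by existing capabilities or a reasonable investment
    future_relevance: int   # stays relevant as the landscape changes

    def total(self) -> int:
        # Equal weighting is an assumption; add weights if one box matters more.
        return self.business_need + self.capability_fit + self.future_relevance


def pick_priority(candidates: list[Candidate]) -> Candidate:
    """Return the candidate with the highest total (the first one wins a tie)."""
    if not candidates:
        raise ValueError("need at least one candidate")
    return max(candidates, key=lambda c: c.total())


# Hypothetical scores for the three example choices listed in step 1.
candidates = [
    Candidate("Integrate more data sources", 3, 4, 2),
    Candidate("Add a data observability tool", 5, 3, 4),
    Candidate("Embed a channel to data product owners", 4, 4, 3),
]

best = pick_priority(candidates)
print(f"{best.name}: {best.total()} / 15")
```

Ties are resolved by list order here, which is where stakeholder buy-in can act as the tie-breaker in practice.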

Once the change has been implemented, remember to measure the impact and provide tangible numbers for your achievements. You can refer to the measurements of the data platform mentioned in step 1 for ideas on how to measure your success. After that, broadcast your results to your CDO and business stakeholders. You can use that as a platform to move on to the next items on your priority list.
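
Measuring the impact can be as simple as a before-and-after comparison on a few of the step 1 measurements. The snippet below is a minimal sketch; the metric names and all numbers are hypothetical placeholders, and `relative_change` is a helper defined here for illustration.

```python
def relative_change(before: float, after: float) -> float:
    """Change from `before` to `after`, expressed as a fraction of `before`."""
    if before == 0:
        raise ValueError("baseline must be non-zero")
    return (after - before) / before


# Hypothetical (baseline, post-change) values for two step 1 measurements.
metrics = {
    "data sources integrated": (12, 18),
    "median pipeline latency (minutes)": (60, 45),
}

report = {name: relative_change(b, a) for name, (b, a) in metrics.items()}

for name, change in report.items():
    # For latency, a negative change is the improvement.
    print(f"{name}: {change:+.0%}")
```

Whether an increase or a decrease counts as success depends on the measurement, so it helps to state the desired direction next to each number when you share the results.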

Besides articles like this one, I also write a newsletter, Data & Beyond Dispatch. It aims to help data leaders extract the most value from their data assets, covering wisdom, strategic principles and operational best practices from the data industry. Subscribe if you want to make more impact with data!

I write about how to use data to bring value to businesses. Check out my weekly newsletter: https://dataandbeyond.substack.com/