Data as a product at Oda

Xavier Gumara Rigol
Oda Product & Tech
11 min read · Mar 1, 2023

We’ve developed six principles at Oda that capture the heart, soul, and aspirations for how we think about value creation from data and insights. This article explains how we apply the “data as a product” principle in practice, and how we treat datasets, dashboards, and Machine Learning models as products.

Our six principles for how we create value 🪄 from data

What is data as a product?

Treating data as a product is one of the principles of the data mesh paradigm. It consists of applying product thinking to data assets — datasets, dashboards and Machine Learning models in our case — and making sure they fulfill a set of characteristics: being discoverable, secure, addressable, understandable, and trustworthy.

While the concept is not new, and data teams have been treating datasets as products since the days of Data Warehousing, data mesh has popularized it and elevated its importance because:

  • Zhamak Dehghani, the concept’s creator, came up with a short, simple, catchy name (data as a product) to refer to it.
  • The decentralized nature of the distributed data ownership model made the principle even more valuable and important. In a scenario where data is to be used seamlessly across domains, it must be exposed through well-defined interfaces, made discoverable and trustworthy, and so on.

Why data as a product?

At Oda, we strive to apply a product mentality to our datasets, dashboards and Machine Learning models to lower the cost of discovering, understanding, trusting, and ultimately creating value from data.

The principle is necessary because we have a decentralized and distributed approach to data ownership. This means the organization’s Data Engineers, Analysts and Scientists create any type of data asset in their own domain. In the illustration below, we show the principle in practice, which you can read more about in this previous article of our series: Distributed ownership of everything data at Oda.

The decentralized and distributed approach to data ownership, by Marianne Askheim

The rest of this article focuses on the processes and governance practices we have in place to treat the three types of data assets (datasets, dashboards and ML models) as products.

Treating datasets as data products

At Oda, when a team decides to share a dataset outside of the team, we consider it a product. This has two implications: first, each team is responsible for providing a certain service level so that others can build on top of its data products; and second, not every dataset is a data product.

In practice, we put the principle into effect by providing guidelines and tools to keep the operational model scalable and decision-making efficient. The next sections detail these guidelines.

Types of datasets as products

Based on the purpose the datasets serve, they’re grouped into different logical layers in our analytics database. Here’s a diagram of them, followed by explanations of the layers.

The Raw layer keeps raw data in the same format as it comes in from the source systems.

The Stage layer is a thin layer on top of Raw. Its purpose is to provide a more stable interface between the Raw tables and the rest of the platform, and the ability to apply lightweight transformations, like aligning on naming or mapping enumerated values. This way the Stage layer ensures interoperability between different domains while also being a (more) stable interface. There should be no business logic or operations like filtering or joining here. All models in the Stage layer are pretty much 1:1 with their respective source tables, which makes them native (or source-aligned) data products in data mesh terminology.
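
To make this concrete, here is a minimal sketch of what a Stage model could look like in dbt. The source, table, column names and status values (ecommerce, order_lines, and so on) are made up for illustration:

```sql
-- models/stage/stg_ecommerce__order_lines.sql (hypothetical names)
-- A thin, 1:1 mapping over a raw source table: renaming, casting
-- and mapping enumerated values only. No joins, filters or
-- business logic belong in the Stage layer.
select
    id      as order_line_id,
    order_id,
    product_id,
    qty     as quantity,
    -- map an enumerated value to a readable label
    case status
        when 1 then 'picked'
        when 2 then 'delivered'
        else 'unknown'
    end     as order_line_status,
    created as created_at
from {{ source('ecommerce', 'order_lines') }}
```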

The Model layer is where consumers query the data. A solid data model maps the many business domains and associated entities (users, orders, geographies, etc.) that serve as aggregate data products. These Model-layer data products require little post-processing to be used for reporting and dashboarding.

In the Serve layer, data is organized in fit-for-purpose or consumer-aligned datasets tailored to specific needs. A typical example is a dataset created to match the need of a specific report.

There can be datasets in between these layers that help split the pipeline into several steps. We call them Intermediate.
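
As a sketch of how the layers build on each other with dbt’s ref(), consider the following two hypothetical models. Data flows strictly from Stage to Model to Serve, and each layer only reads from the previous one:

```sql
-- models/model/fct_orders.sql (hypothetical): builds only on Stage data products
select
    o.order_id,
    o.user_id,
    o.ordered_at,
    sum(l.quantity) as total_items
from {{ ref('stg_ecommerce__orders') }} as o
join {{ ref('stg_ecommerce__order_lines') }} as l
    on l.order_id = o.order_id
group by o.order_id, o.user_id, o.ordered_at
```

```sql
-- models/serve/weekly_orders_report.sql (hypothetical): a fit-for-purpose
-- dataset built only on the Model layer, tailored to one specific report
select
    date_trunc('week', ordered_at) as order_week,
    count(distinct order_id)       as orders,
    sum(total_items)               as items
from {{ ref('fct_orders') }}
group by 1
```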

In order to make this layered architecture work, we standardized the use of these logical layers for all teams so that people have the same experience when querying different data products. Some rules we’ve agreed upon are listed below (with a sketch of how they could be encoded after the list):

  • Staging models should generally be data products (but teams can also choose to not define them as data products).
  • Datasets in the Model and Serve layers must generally also be data products. If great effort is put into modeling and building great structures and pipelines, others should benefit from that.
  • Intermediate models are not data products.
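
One possible way to encode these rules is to flag each model in its dbt schema file. This is only a sketch: the data_product meta key is our illustration of such a convention, not a dbt built-in, and all model names are hypothetical:

```yaml
# models/schema.yml (sketch)
version: 2

models:
  - name: stg_ecommerce__orders
    meta:
      data_product: true   # Staging: generally a data product
  - name: int_orders_enriched
    meta:
      data_product: false  # Intermediate: never a data product
  - name: fct_orders
    meta:
      data_product: true   # Model layer: generally a data product
```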

Everyone in the Data & Insight discipline is trained on these conventions during their first two weeks at Oda, and we have expert Data Engineers sitting in a central platform team who help out whenever needed.

Effort tiers

Another dimension we use to classify datasets is the “effort tier”. Depending on the criticality of the data product being built, we follow different practices.

Critical/important — best practices: For critical datasets (e.g. granular data for events, clicks, users, orders…) and important company-wide metrics, these best practices are applied (see the sketch after the list):

  • Pipelines and datasets are built for all layers in the stack: Staging, Model and Serve. Data should flow from left to right in this model, and each layer should only use data from the previous layer. There should not be “multiple sublayers” in one layer (e.g. pipelines in Stage building on other datasets in Stage).
  • A solid data model is required.
  • The data product should not break if other teams’ pipelines break. Therefore, pipelines should be built on data products in the Stage layer, to avoid being exposed to other pipelines breaking and to ensure fresh data is delivered at the right frequency.
  • All pipelines and fields should be properly documented. Example queries and how-to guides are encouraged.
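
In dbt terms, “properly documented” and “fresh data at the right frequency” can be expressed with schema descriptions, tests and source freshness rules. A minimal sketch, with hypothetical names and thresholds:

```yaml
version: 2

sources:
  - name: ecommerce                 # hypothetical source system
    loaded_at_field: _loaded_at
    freshness:
      warn_after: {count: 6, period: hour}
      error_after: {count: 24, period: hour}
    tables:
      - name: orders

models:
  - name: fct_orders
    description: "One row per order. See the how-to guide for example queries."
    columns:
      - name: order_id
        description: "Primary key of the order."
        tests:
          - unique
          - not_null
```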

Rest of the shared datasets — good practices: For the rest of the shared datasets, we apply good practices. These make datasets good enough to consume and build upon, and follow a set of lightweight standards that ensure conformity across teams without compromising heavily on speed and agility:

  • Create Staging datasets mirroring raw tables, and make them available as data products.
  • Use Intermediate datasets if and when needed.
  • Create a Serve dataset and consider making it available as a data product.

Non-shared datasets — minimum practices: For datasets we don’t intend to share with the rest of the organization, we follow minimum practices (sketched after the list):

  • Your pipeline can pull data from any source or layer.
  • They can be implemented as views to reduce implementation time.
  • Do transformations in the pipeline, not in the data visualization tool.
  • Explicitly mark your dataset as not a data product.
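
Put together, a non-shared dataset under minimum practices might look like this sketch (the model name and the data_product meta convention are, again, hypothetical):

```sql
-- models/team_x/team_x_orders_last_30_days.sql (hypothetical)
{{ config(
    materialized = 'view',           -- a view, to reduce implementation time
    meta = {'data_product': false}   -- explicitly marked as not a data product
) }}

-- Minimum practices allow pulling data from any source or layer,
-- but transformations still live in the pipeline, not in the BI tool.
select
    user_id,
    count(*) as orders_last_30_days
from {{ ref('fct_orders') }}
where ordered_at >= dateadd('day', -30, current_date)
group by user_id
```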

All of these governance rules and best practices have been documented and customized for our analytics stack, which includes Snowflake (for processing pipelines and accessing data), dbt (for dataset engineering) and Looker (to build and access dashboards).

Treating dashboards as data products

At Oda, we strive to treat important dashboards as data products too. To do so, we’ve documented specific guidelines for creators and viewers that make these data assets discoverable, understood and trusted.

In Looker, we have a folder structure that closely reflects the organization; dashboards and analyses inside those folders follow the same naming convention: [Quality Assurance status] Title — [Key stakeholder] — [Data Granularity].

Although we strive to allow and enable all employees and teams to create their own dashboards and analyses, we realize that the amount of available data is huge, and that the experience with data analyses and statistics varies across the organization. Thus, we set up a process for cross-functional teams to get Quality Assurance support from someone in Data & Insight. We use emojis in the title of the dashboard to reflect the QA status:

🔦= Ad-hoc analysis
⌛= Awaiting QA from Data & Insight
💻 = Quality approved by Data & Insight practitioner

Other emojis used in combination to mark dashboard status are:

✅ = Dashboards and analyses that are treated as products, are shared with other teams and follow best practices
🔧 = Dashboards and analyses that are team-specific (not data products) but do follow best practices
⚠️ = Content that does not follow best practices and should not be used if it can be avoided

Using this simple trick, data consumers can be confident that if a dashboard is marked with 💻✅, they can really trust the numbers and use the data to inform their decisions.

Another critical aspect of treating dashboards as products is curating the process of creating the dashboard itself, ensuring that it meets a user’s needs. At the moment, we are working on better defining this process, but we believe having it will:

  • Limit the total number of dashboards created and make sure unused dashboards are deleted. To provide some context, we recently deleted Looker content that had not been accessed in the previous 6 months: 27% of dashboards and 46% of analyses (“Looks”) were deleted.
  • Improve the relevance of existing dashboards. Less is more; each domain should have its own go-to dashboard for monitoring key metrics.
  • Increase knowledge about product thinking among data practitioners. This can be done by including UX methodology, such as user interviews and testing, when making dashboards a product.
  • Increase data literacy across the organization by also training data consumers on basic and intermediate dashboard creation aspects.

Treating Machine Learning models as data products

Data and algorithms are at the heart of our retail system and operational model at Oda. Our cross-functional teams own and operate different optimization and Machine Learning models solving different business and operational problems: demand forecasting, recommender systems, user segmentations, fulfillment optimization and more.

We follow the same ownership principle as with other types of data products (the team owning the business problem owns the data product) and we have a central platform team maintaining our Data Science Platform which contains the required capabilities to run and operate these models.

Accompanying the Data Science Platform, we have developed best practices and guidelines that allow Data Scientists to self-serve, both deploying and monitoring their models in production. More details on how we enable Data Scientists to work end-to-end can be found in this previous article: Empowering End-to-End Data Science at Oda. Additional descriptions of data science products and the governance model at Oda will be introduced in an upcoming blog post.

Who owns the data products?

Each data product (dataset, dashboard or Machine Learning model) is assigned to a cross-functional product team that is responsible for the data product lifecycle. We believe that placing ownership of data assets and products on cross-functional teams, with domain experts, product managers, UX and software engineers, will have a positive impact on applying product thinking to data. This should increase the likelihood that we solve the right problems in the right way more of the time.

Data product ownership changes when the organization changes, and it is difficult to say if we will ever solve the ownership problem fully, as there are always constraints on capacity and resources. Because of this, we take a pragmatic approach and work to have the necessary processes in place so that it is easy to change the owner of a data asset at any given time.

For some important domains (user behavior, users, orders,…) we also differentiate between core (aggregated) data products and downstream (fit-for-purpose) data products. At the time of writing, core data products are owned by our central Data & Insight Platform team, while downstream data products, those built on top of core datasets, are owned by domain teams.

One example that shows we have not yet fully resolved the ownership issue is that we are considering reducing the Data & Insight Platform team’s ownership of core data products over the next 3 to 6 months, for the following reasons:

  • An increase in the number of capabilities that our Data & Insight Platform team must manage makes core data products a second-class citizen in terms of priorities.
  • There is a cognitive difference between managing infrastructure capabilities and managing datasets (the fine line between Data Engineering and Analytics Engineering).
  • Ideally, we want teams in charge of datasets, especially core datasets, to sit close to the teams in charge of the operational systems that touch core data, allowing for better end-to-end, holistic ownership without treating data differently from any other development.
  • We cannot afford data downtime for core data products as they hold critical factual data about our business. They must be the highest priority for the team that is in charge of them.

We are still discussing the best next step here, but the tentative target owners of these data products are other domain teams or a newly created team; in either case, we will make sure the new owners have the capacity and the priority to work on this effort.

Final words

Thanks to treating data as a product, and to the distributed nature of data ownership, cross-functional teams can self-serve on low-complexity insights (counts, trends and descriptive analytics in general) without, in the majority of teams, needing to invest Data Engineers’ time.

This ability to self-serve, together with distributing Analysts and Scientists into domains close to the business problem, frees up capacity to experiment and to tackle more advanced insights that are very specific to the domain itself.

Once we enter this feedback loop, experienced teams are able to self-serve on more complex insights and invest more time in further exploration of user behavior and more experimentation.

It should be mentioned that Data Analysts and Data Scientists at Oda work end-to-end in our stack and are more technical than similar roles in other companies. If you want to read more about how we understand the different data roles at Oda, you can check the following articles about the role of Data Science, Data Analytics and Data Engineering.

If you are interested in reading more about the other principles that capture the heart, soul and aspirations for how we think about value creation from data and insights, you can check the first blog post in this series: The six principles for how we run Data & Insight at Oda.


Xavier Gumara Rigol
Passionate about data product management, distributed data ownership and experimentation. Engineering Manager at oda.com.