Distributed ownership of everything data at Oda

Stian Tokheim
Published in Oda Product & Tech
Nov 28, 2022 · 12 min read

In Oda, distributed data ownership, shared data governance is one of the six principles for how we create value from data. This principle has been key to our success in scaling Data & Insight from a one-pizza team to a sizeable discipline, and in pushing the boundaries of what is possible when data meets real-world problems in the online grocery space. In this post, we dig deep into what we mean by distributed data ownership, shared data governance, and how we have solved this in practice.

Our six principles for how we create value with data: Digging deep into distributed data ownership, shared data governance.

Data is a capability, not a function

Most of our Data Analysts, Data Scientists, and some Data Engineers work as part of cross-functional product teams with Software Engineers, Product Managers, UX Designers, and domain experts (like logistics, commercial and growth specialists) in different parts of our organization. Co-locating data skills and domain problems is only the first step — we also want to empower our teams to move autonomously and with speed to solve the problems at hand, and this is where distributed ownership plays an important part. Inspired by the data mesh concept, we have placed most of the responsibility for data on the different product teams, who take full ownership of “everything data” within their business domain. The teams are supported by central platform teams, which provide the platform and enablement services (infrastructure, tooling, guidelines, and training) that everyone needs to work efficiently with data.

In practice, this means that each team is responsible for the entire data value chain in their domain. This includes everything from data production and ingestion to data pipelines and products, as well as topics like data literacy and how we take action on insight. In Oda, data is a capability, not a function. We do not have a central data team that solves “all data problems”. This is up to each and every team.

Product teams are the new data teams

In the next sections, we will use the Delivery team as an example of how distributed ownership works. The team is part of our mission to provide the world’s most worry-free delivery experience, and we have Data Analysts working alongside Software Engineers, a Product Manager, Designers and Distribution specialists to make that happen. The team is responsible for things like vehicle management, route staffing, and returns from customers, and they build and operate the technology and applications that support these processes.

The responsibilities for a team like Delivery in the distributed ownership model can be summed up in six bullets:

  • Produce and expose data from applications
  • Make data easily available for themselves and others
  • Build and run data pipelines
  • Build and manage data products
  • Drive product development with data
  • Enable the teams and people they are supporting

Let us go through them one by one.

Produce and expose data from applications

The most important sources of data for the Delivery team are the applications that they build and run. Examples of this are the mobile app the drivers use when making their deliveries and the application the dispatch office uses for planning, monitoring, and assisting drivers on routes. By building and running their own data sources, the team has full control over what, how, and when data is generated. The Data Analysts will work together with the Software Engineers to make sure that the right data is stored in the right format in the source systems and that the relevant event data is tracked in the applications. This is a “shift left” on data for product teams: Data is part of every step of the design and build process instead of being an afterthought. Data quality issues are nipped in the bud instead of piling up at the bottom of the backlog.
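As a minimal sketch of what this “shift left” can look like in practice (the event and field names below are invented for illustration, not Oda’s actual schema), the contract for an event can be validated at the source, before the event is ever emitted:

```python
# Illustrative event contract, enforced in the application that produces
# the data. All names here are hypothetical examples, not Oda's schema.

REQUIRED_FIELDS = {"event_name", "occurred_at", "vehicle_id"}

def validate_event(event: dict) -> dict:
    """Reject events that break the agreed contract before they are emitted."""
    missing = REQUIRED_FIELDS - event.keys()
    if missing:
        raise ValueError(f"event is missing required fields: {sorted(missing)}")
    return event

delivery_completed = validate_event({
    "event_name": "delivery_completed",
    "occurred_at": "2022-11-28T10:15:00Z",
    "vehicle_id": "vehicle-42",
    "route_id": "route-7",
})
print(delivery_completed["event_name"])
```

Catching a malformed event here, at the source, is exactly the “nipped in the bud” behavior described above: the bad row never reaches the warehouse in the first place.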

Make data easily available for themselves and others

Each team is also responsible for making their data available and interoperable for other teams to use. This would not be possible without the platform teams supporting the product teams with shared infrastructure, tooling, and guidelines. In Oda, we use Fivetran for batch ingestion of transactional data and Snowplow for event data from web, apps, and server-side, and all data is landed in our data warehouse, Snowflake. In Snowflake, data is made available for other teams to query and build on, making it interoperable with data from other teams and domains. As an example, Data Analysts in Delivery are responsible for pulling in data produced in the vehicle management process and setting up regular snapshots of the datasets that we want to keep a historical record of. This can then be queried by any other team that might be interested in using data about vehicles.
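To illustrate what a regular snapshot gives you (in practice this happens in the warehouse, for example with dbt snapshots; the Python and field names below are just an illustration of the idea), each run appends the current state of a dataset together with a snapshot date, so history can be queried later:

```python
from datetime import date

# Illustrative snapshotting: append current state with a snapshot date.
# Field names are hypothetical, not Oda's actual vehicle data.

snapshots = []

def take_snapshot(rows, snapshot_date):
    """Append every current row, stamped with the snapshot date."""
    for row in rows:
        snapshots.append({**row, "snapshot_date": snapshot_date.isoformat()})

take_snapshot([{"vehicle_id": "v1", "status": "active"}], date(2022, 11, 1))
take_snapshot([{"vehicle_id": "v1", "status": "in_repair"}], date(2022, 11, 8))

# The history of v1 across snapshot dates is now queryable:
history = [(s["snapshot_date"], s["status"]) for s in snapshots if s["vehicle_id"] == "v1"]
print(history)
```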

To make sure data is interoperable throughout our insights stack, we follow shared guidelines for how to set names and data structures. This way, we ensure that data from different teams and domains can be used together in different logical layers in Snowflake and in the semantic (explore) layer in Looker.
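As a toy illustration of what such guidelines enable (the specific rules and domain names below are assumptions for the example, not our actual conventions), a naming check might require snake_case dataset names carrying a recognized domain prefix, so anyone can tell at a glance which team owns a dataset:

```python
import re

# Hypothetical naming-convention check: snake_case names with a known
# domain prefix. The rules and domains are illustrative assumptions.

NAME_PATTERN = re.compile(r"^[a-z][a-z0-9]*(_[a-z0-9]+)+$")
KNOWN_DOMAINS = {"delivery", "commercial", "growth"}

def check_dataset_name(name: str) -> bool:
    """Accept snake_case names whose first segment is a known domain."""
    if not NAME_PATTERN.match(name):
        return False
    domain = name.split("_", 1)[0]
    return domain in KNOWN_DOMAINS

print(check_dataset_name("delivery_vehicles"))  # snake_case, known domain
print(check_dataset_name("DeliveryVehicles"))   # wrong case, rejected
```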

Our insights architecture: Data is ingested from source systems using Fivetran and Snowplow, stored in Snowflake, transformed by dbt and exposed through Amplitude, Looker, notebooks, applications, and Growthbook.

Build and run data pipelines

Raw data is very rarely delivered in the right shape and context we need for analytical purposes. Thus, an important part of the job for our Data Analysts, Scientists and Engineers is to build data transformations and chain transformations into pipelines that are scheduled to run at regular intervals. We use dbt to transform data into star schema format and wide datasets that are used for business intelligence, ad hoc analysis, and input to machine learning models. Our platform teams make sure that each team has the tools, training, and support it needs to manage every aspect of its data pipelines. A few examples of things that all teams have access to:

  • Separate Slack channel where they are notified when something is off or broken in their pipelines.
  • Cost dashboard where they get an overview of their pipelines’ Snowflake credit spend and worst performing dbt jobs.
  • #data-platform-support Slack channel where they can reach out to data engineers for assistance with tasks like performance tuning.

Monitoring data pipelines: The platform teams provide the product teams with the infrastructure and tooling they need to be effective at building and running their own data pipelines.
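To make the transformation step concrete, here is a minimal sketch (field names invented for illustration, not our actual models) of the kind of reshaping a pipeline chains together: denormalized raw rows split into a star schema with a fact table and a dimension table:

```python
# Illustrative star-schema transform: raw delivery rows become a fact
# table (one row per delivery, keys and measures only) and a vehicle
# dimension (one row per vehicle). Names are hypothetical examples.

raw_deliveries = [
    {"delivery_id": 1, "vehicle_id": "v1", "vehicle_type": "van", "minutes": 4},
    {"delivery_id": 2, "vehicle_id": "v1", "vehicle_type": "van", "minutes": 6},
    {"delivery_id": 3, "vehicle_id": "v2", "vehicle_type": "bike", "minutes": 9},
]

def to_star_schema(rows):
    """Return (fact_deliveries, dim_vehicle) built from denormalized rows."""
    dim_vehicle = {}
    fact_deliveries = []
    for row in rows:
        # Dimension: deduplicated on the vehicle's natural key.
        dim_vehicle[row["vehicle_id"]] = {
            "vehicle_id": row["vehicle_id"],
            "vehicle_type": row["vehicle_type"],
        }
        # Fact: one row per delivery, keeping only keys and measures.
        fact_deliveries.append({
            "delivery_id": row["delivery_id"],
            "vehicle_id": row["vehicle_id"],
            "minutes": row["minutes"],
        })
    return fact_deliveries, list(dim_vehicle.values())

facts, vehicles = to_star_schema(raw_deliveries)
print(len(facts), len(vehicles))  # 3 deliveries, 2 distinct vehicles
```

In our actual pipelines this kind of reshaping is expressed as dbt models in SQL rather than Python, but the separation of facts and dimensions is the same.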

Build and manage data products

The Data Analysts in Delivery are in a unique position to understand how data can make an impact in the delivery domain, and they have the skills to build the data products that address the delivery area’s specific needs and opportunities. We will do a deep dive on data products in a follow-up article, but for now, let us say a data product could be anything from a data mart in Snowflake or a Looker explore to a machine learning model, and many things in between. The main point is that the team takes full responsibility for discovering, building, running, and managing the right data products, doing data product portfolio gardening, and making sure the data products are properly implemented and operationalized.

Drive product development with data

In any high-performing product team, data will be front and center when discovering, building, and managing great products. Having the right data and data products is only a small part of what it takes to operate at this level. Equally important is having the right competency, culture, frameworks, metrics, and way of working. This will mean different things for the different roles in each team:

  • The Product Manager will spend significant time reviewing and analyzing the team’s product metrics and their impact on business metrics. For a product manager in Delivery, route loading time and on-time deliveries are examples of metrics to track and understand. For teams working on the customer-facing parts of our product, metrics like click-through rates, conversion rates, scroll depth, and results from the latest experiments are more relevant. Because we use objectives and key results (OKRs) to align strategy with team execution, the Product Manager will also want to measure and analyze progress towards the key results in focus during an OKR period.
  • The Software Engineers will make sure that their applications are properly instrumented and will build tracking and feature flags into every part of the application and all new features. This enables the team to run experiments and gradual rollouts to understand when product changes are not as useful, usable, or effective as we thought they would be, and to minimize the impact of bugs and bad code. They will also keep a close eye on tech metrics like load time, downtime, and mean time to recover to make sure we always push quality code.
  • The UX Designer will be interested in combining learnings from their qualitative research with quantitative data on how our customers are actually behaving. They will set up and run experiments to make sure any assumptions are tested and validated, and they will dig deep into data on different customer segments.
  • The Data Analysts, Scientists and Engineers are mainly there to help facilitate this way of working. They will support the team by building useful data products, helping set up experiments and analyze the results, coaching and training on how to analyze data, and doing just about everything else the team needs to drive product development with data. To read more about the three different roles and what they typically do, check out our three roles in Data & Insight at Oda.
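The gradual rollouts mentioned above typically rest on deterministic, hash-based bucketing, so that a 10% rollout always targets the same users across sessions. A minimal sketch of the common technique (the flag name and helper below are illustrative, not our actual tooling):

```python
import hashlib

# Hash-based bucketing for gradual rollouts: each (flag, user) pair maps
# deterministically to a bucket 0-99, and a rollout at N percent enables
# the flag for buckets below N. Names are hypothetical examples.

def in_rollout(flag_name: str, user_id: str, rollout_percent: int) -> bool:
    digest = hashlib.sha256(f"{flag_name}:{user_id}".encode()).hexdigest()
    bucket = int(digest, 16) % 100
    return bucket < rollout_percent

# Roughly 10% of a user population ends up in a 10% rollout:
enabled = sum(in_rollout("new_route_view", f"user-{i}", 10) for i in range(1000))
print(f"{enabled} of 1000 users are in the 10% rollout")
```

Because the bucketing is deterministic, ramping the percentage up only ever adds users; nobody flips back and forth between variants.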

It is important to note that even though the Data Analysts, Scientists and Engineers are the “data professionals” on the team, distributed data ownership is a team responsibility and not something that only concerns parts of the team.

Cross-functional teams: People with different skill sets and backgrounds come together in cross-functional product teams to solve our most difficult problems.

Enable the teams and people they are supporting

Another of our principles is to value enablement over handovers. In our example, this means the Delivery team will work to drive data literacy, upskill their coworkers on data topics, tools, and methodology, and give them the mentoring and coaching they need to self-serve and solve most of their own day-to-day problems. Many of our product teams, including Delivery, are also responsible for supporting operational teams. Hence, the coworkers they enable are not only members of their own team, but also the people responsible for driver management, yard control, and so on.

To some extent, deficiencies in data literacy and competency can be compensated for by providing more refined data products: not everyone is able to build the dashboard they need, in which case a Data Analyst can build it for them. The challenge is to find a balance between who and how much to enable versus what and how much to build. Ideally, our Data Analysts, Scientists and Engineers spend most of their time on the high-leverage tasks that really require their full set of specialized skills and less on tasks that, with some enablement, could be performed by others. In any case, and whoever ends up building those dashboards, the team is responsible for making sure that the areas it supports have a well-defined, cohesive, and holistic dashboard structure. Ensuring that popular and business-critical content holds the appropriate standards in terms of usability, performance, stability, and freshness is also the team’s responsibility.

Enablement is also important when operationalizing machine learning models that our Data Scientists build. For our operations to make good staffing decisions based on results from our sales and demand forecasting models, they need a good understanding of the underlying mechanisms and the models’ inputs, assumptions, strengths, and weaknesses.

Enabling others: The Delivery team supports operational teams such as Delivery Site Management, Fleet Development, and local distribution operations.

To sum up our example, Delivery is responsible for every aspect of value creation from data in the delivery domain, and this extends well beyond running their own data pipelines. The same setup applies to every other product team in Oda and is, in essence, what we mean by distributed ownership.

Shared governance: Solving for cohesion and harmony

By distributing ownership of “everything data”, we empower all our teams to move autonomously and fast. But although we value freedom and autonomy for our teams, we also think it is important to align on some aspects of our data practice. Providing a holistic user experience in Looker, using the same names for the same data concepts, and using the same data modeling techniques and coding standards are all examples of things we need to solve across teams. To a certain extent, the teams are able to self-govern and coordinate, and there is also an element of intrinsic authority in a team being the clear owner of a data domain (Delivery gets to decide that vehicles are called “vehicles” and not “cars,” for example). The remaining issues are mostly addressed through platform services or as a community of data professionals in the Data & Insight discipline context.

Shared tooling and infrastructure

There are lots of good reasons to align on common tooling for similar jobs to be done. There is typically some overhead in procuring and managing tools, as they often require specialist skills to integrate, operate, and use. The marginal cost of adding more users or use cases to an existing tool is often lower than buying a new one, and common tooling also caters for internal mobility. In Oda, it is an important part of the mandate for our data platform teams to understand the common tooling needs across the organization, and to buy, build, integrate, and operate the tools that cover those needs.

Shared guidelines and best practices

Having guidelines and documented best practices on how to perform similar tasks across teams helps us keep the technical complexity down, improve interoperability and the user experience, and lower the barrier for internal mobility. Examples of this could be naming conventions, coding standards, practices for handling historical data, and standard color palettes to use in dashboards.

Enablement and training

By providing the fundamental training on our tooling and data concepts, we make sure that our tools, methods and best practices are well understood and used, and that we have a common understanding of our most important data concepts. We run regular Looker trainings, provide ad-hoc support, and facilitate communities of practice. As an example of the latter, the platform team responsible for providing experimentation tooling is also responsible for facilitating the experimentation community of practice, where people from all over the organization come together to learn about experimentation and agree on common practices.

Finally, it is worth highlighting the value of having a strong Data & Insight discipline where data professionals from different teams come together to learn, hack, collaborate, build relationships and have fun. By having a strong data community, it is easier to find common solutions to common problems, cross-pollinate ideas and practices, crowdsource different approaches to complex problems, and team up to solve problems spanning multiple areas. It also plays an important part in the professional development of many and in attracting and retaining talent.

Together with the five other principles, distributed data ownership, shared data governance plays a key role in how we operate and evolve our data practice in Oda and is, at its best, a very powerful approach to solving data at scale. At its core, it is about viewing data as a capability, not a function, and giving product teams great freedom (and with great freedom comes great responsibility).

If you liked this post, you should check out our Oda Product & Tech Medium blog for more. There, you can read how the Delivery team went from zero insight to predicting service time with a machine learning model and how we empower End-to-End Data Science in Oda with our Data Science Platform.
