Data Mesh @ Meli: Empowering data owners

Vanina Bertello
Published in Mercado Libre Tech · Apr 18, 2024

Picture yourself facing a major decision, a really big one, struggling to choose between several options. What is the first thing you’ll need? Bingo: data! High-quality, fresh and comprehensive data about the topic you are grappling with.

Now, imagine working for a large company where the data you need can only be delivered by a single department: a group of highly skilled data engineers who are also in high demand from many other areas of the company. To top it all off, those other areas also have their own critical decisions to make. Tough spot, right? This is where most of the challenges associated with centralized data production begin.

In large, complex and dynamic companies, this centralized approach to data production can no longer keep up with growing demand. A new strategy is needed to break free from this dependency and bottleneck. That’s where the concept of Data Mesh, introduced by Zhamak Dehghani in 2019, comes into play to bring some light at the end of the tunnel. She proposes treating data as a product and redefining the relationship between data producers and consumers. In this new paradigm, decentralized ownership is crucial.

In 2021, Mercado Libre (MELI) was ready for a change. We were determined to reinvent ourselves, transitioning our team from being mere providers to becoming supporters and enablers, because support is what empowers others.

If you want to learn more about our initial steps, I invite you to read (if you haven’t yet!) “Data Mesh@MELI: Building Highways for Thousands of Data Producers”, written by my colleague and teammate Ignacio Weinberg. You can’t miss it; he shares valuable insights into the context and origins.

So, in 2021, we began our transformation journey, which was both exciting and challenging. Five squads and fifteen tech developers participated in this project. We were highly motivated because we couldn’t find an existing data mesh implementation as vast and complex as ours… we were making history! Additionally, we took some liberties and decided not to go exactly by the book (as Ignacio so eloquently describes in his article).

A few months later, our platform was ready. But culture is the other key ingredient in this recipe, and Mercado Libre excels in this respect. With such a strong data-driven culture, the outcome was remarkable. Let me share some numbers to illustrate the impressive adoption of the initiative.

We launched in July 2022 with just four Domain teams, known as DMEs (Data Mesh Environments), and approached the end of 2023 with more than 90. Yes, you read that right: more than 90 Data Mesh Environments, each with its own separate infrastructure and a full range of capabilities to create data products.

We also scaled up from 52 data engineers in a centralized team to 1,800 data producers in a decentralized ecosystem… not bad, right?

When we embarked on this journey, our more-than-10-year-old data warehouse contained about 3,000 tables. Within just 18 months, our Domain teams created over 6,000 new ones, ingesting and processing all kinds of data from different sources and technologies. Data production has skyrocketed.

As you may imagine, achieving such significant decentralization requires a strong focus on governance and culture, which play a central role in this new strategy. Let me describe some of the key actions we have implemented:

1.- Creation of a Data Observability Team

This team focuses on building data observability solutions, providing DMEs with the metrics they need to efficiently manage the data product lifecycle. A complete set of KPIs is available within our platform, in a section we call “DME Metrics”. We are currently working on penalizing critical deviations in order to maintain control over the ecosystem.

A quick glance at our “DME Metrics” section
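To make the idea concrete, here is a minimal sketch (with invented metric names and thresholds, not the actual “DME Metrics” implementation) of rolling per-product lifecycle metrics up into a DME-level summary:

```python
from dataclasses import dataclass

# Hypothetical sketch: rolling per-product lifecycle metrics up into a
# DME-level summary. Metric names and thresholds are illustrative only.

@dataclass
class ProductMetrics:
    name: str
    freshness_ok: bool          # last load landed within its SLA window
    open_incidents: int
    days_since_last_query: int  # crude proxy for audience / relevance

def dme_summary(products: list[ProductMetrics]) -> dict:
    total = len(products)
    return {
        "products": total,
        "freshness_rate": sum(p.freshness_ok for p in products) / total,
        "open_incidents": sum(p.open_incidents for p in products),
        "stale_candidates": [p.name for p in products
                             if p.days_since_last_query > 90],
    }

print(dme_summary([
    ProductMetrics("orders_daily", True, 0, 3),
    ProductMetrics("legacy_report", False, 2, 120),
]))
# {'products': 2, 'freshness_rate': 0.5, 'open_incidents': 2,
#  'stale_candidates': ['legacy_report']}
```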

2.- Best practices, golden rules and learning path

Additionally, a multidisciplinary Governance Squad was formed to consolidate and summarize the best practices for producing data products. They have created a comprehensive set of golden rules and established an asynchronous learning path to spread this knowledge to every DME team member. Completing this training is mandatory to become a data producer.

3.- Golden Rules embedded in our Platform

To enforce governance, most of these golden rules have been embedded in our platform. A robust release process with more than 40 control points validates every golden rule before a data product can be published: a Poka-Yoke-style process.
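As a rough illustration of the concept rather than our actual release pipeline, a Poka-Yoke gate can be modeled as a set of small predicates over a data product’s metadata, with publishing blocked unless every control point passes. The rule names below are hypothetical:

```python
from typing import Callable

# Illustrative sketch of a Poka-Yoke style release gate: each control point
# is a small predicate over the product's metadata, and publishing is
# blocked unless every check passes. Rule names here are hypothetical.

DataProduct = dict  # simplified stand-in for the product's metadata

CONTROL_POINTS: dict[str, Callable[[DataProduct], bool]] = {
    "has_owner": lambda p: bool(p.get("owner")),
    "has_description": lambda p: len(p.get("description", "")) >= 20,
    "columns_documented": lambda p: all(c.get("description")
                                        for c in p.get("columns", [])),
    "snake_case_name": lambda p: p.get("name", "") == p.get("name", "").lower(),
}

def release_gate(product: DataProduct) -> list[str]:
    """Return the list of failed control points; empty means publishable."""
    return [name for name, check in CONTROL_POINTS.items()
            if not check(product)]

failures = release_gate({"name": "orders_daily", "owner": "payments-dme",
                         "description": "Daily snapshot of settled orders.",
                         "columns": [{"name": "order_id",
                                      "description": "Order primary key."}]})
print("blocked by:", failures) if failures else print("ready to publish")
```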

4.- Master Data

In order to maintain the integrity and consistency of our data model, we have defined a new concept called Master Data: assets so important within the data ecosystem that they need to be governed. We began with “Master Columns”, a specially curated set of columns in our data model, selected for their relevance. These columns are flagged in our data catalog and surface as suggestions whenever someone needs to reference these assets. This keeps the data modeling of business concepts consistent across the mesh in terms of naming, definition and datatypes. Today, more than 60% of new tables inherit at least one master column, and growing the set of master data assets is a continuous process.

An example of how our master data works: when a new column is defined, the platform checks whether it refers to a master column, then suggests and validates the curated metadata for it.
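A minimal sketch of that validation flow, with an invented two-entry registry standing in for our curated master columns, might look like this:

```python
# Hypothetical sketch of master-column validation. The registry below
# stands in for the curated set flagged in the data catalog.

MASTER_COLUMNS = {
    "customer_id": {"datatype": "BIGINT",
                    "description": "Unique customer identifier."},
    "site_id": {"datatype": "STRING",
                "description": "Marketplace site code (e.g. MLA, MLB)."},
}

def check_column(name: str, datatype: str) -> str:
    master = MASTER_COLUMNS.get(name)
    if master is None:
        return f"'{name}': no master column match; free definition allowed"
    if datatype != master["datatype"]:
        return (f"'{name}' is a master column: expected {master['datatype']}, "
                f"got {datatype}")
    return f"'{name}': inheriting master metadata -> {master['description']}"

print(check_column("site_id", "STRING"))   # inherits the curated metadata
print(check_column("customer_id", "INT"))  # flags the datatype mismatch
```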

5.- Data Observability Tool

The main concern of every data consumer is knowing how healthy the data is. To address this, we have implemented monitors for freshness, volume, and schema changes using the Monte Carlo tool. These monitors alert the respective Domain owners whenever an incident requires attention. Additionally, we have integrated these monitors with Data Catalog (our in-house tool for cataloging almost everything), allowing data consumers to easily identify data products with open incidents. On top of that, every domain member has the flexibility to develop all kinds of custom monitors for specific fields.

Monte Carlo monitor incidents embedded in our Data Catalog
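To show the shape of such a monitor, here is a generic freshness check. This is an illustration, not the Monte Carlo API; the cadence, grace period and alert hook are assumptions:

```python
from datetime import datetime, timedelta, timezone

# Illustrative freshness check, not the Monte Carlo API: a table breaches
# its freshness SLA when its latest load is older than the expected cadence
# plus a grace period. Thresholds and the alert hook are assumptions.

def check_freshness(table: str, last_loaded_at: datetime,
                    expected_every: timedelta,
                    grace: timedelta = timedelta(hours=1)) -> bool:
    age = datetime.now(timezone.utc) - last_loaded_at
    if age > expected_every + grace:
        # In a real setup this would open an incident and alert the DME owner.
        print(f"[INCIDENT] {table} is stale: last load was {age} ago")
        return False
    return True

check_freshness("orders_daily",
                datetime.now(timezone.utc) - timedelta(hours=30),
                expected_every=timedelta(days=1))
```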

6.- Data Catalog & Lineage

As you may know, data discoverability is a key point when it comes to decentralization. At Meli, we have built our own data catalog and lineage tool that seamlessly integrates with our Data Suite tools. Why did we do this? Because we believe in cataloging nearly everything, regardless of the underlying technology. Our tool allows us to track every relationship and apply our own visualization rules.

Lineage Section
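Conceptually, lineage boils down to a directed graph of artifacts in which impact analysis is a traversal over downstream edges. Here is a minimal sketch with invented artifact names:

```python
from collections import defaultdict

# Minimal sketch of a lineage graph: edges point from an upstream artifact
# to its downstream consumers, and impact analysis is a simple traversal.
# Artifact names are invented; the in-house tool tracks many more
# relationship types across technologies.

edges: dict[str, list[str]] = defaultdict(list)

def add_edge(upstream: str, downstream: str) -> None:
    edges[upstream].append(downstream)

def downstream_of(artifact: str) -> set[str]:
    """Everything that would be impacted if `artifact` changed or broke."""
    impacted, stack = set(), [artifact]
    while stack:
        for child in edges[stack.pop()]:
            if child not in impacted:
                impacted.add(child)
                stack.append(child)
    return impacted

add_edge("raw.orders", "dm.orders_daily")
add_edge("dm.orders_daily", "bi.sales_dashboard")
print(downstream_of("raw.orders"))  # {'dm.orders_daily', 'bi.sales_dashboard'}
```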

7.- Data Artifact Criticality

With decentralization, there is no longer a single owner who determines what is truly critical, so the key challenge is establishing homogeneous criteria for how critical a data product is. To address this, we have developed a machine learning model that classifies every artifact into five levels of criticality, helping us focus on the most relevant artifacts. For more details, I invite you to read the article “How Data Artifacts Criticality is Enhancing Data Governance at Meli”, written by my colleague Marielle Cortes de Souza Nunes. It showcases a very cool feature!

Moreover, the data artifact criticality level is consumed via API to control certain platform functionality. Depending on the criticality of an artifact, different actions may be taken, or certain actions may be restricted or blocked, for instance when deprecating an artifact.
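As a hypothetical illustration of that gating logic, where the endpoint, response payload, and policy are all invented, a deprecation request might be checked like this:

```python
import json
import urllib.request

# Hypothetical sketch: gating deprecation on the criticality level fetched
# from an internal API. The endpoint, payload, and the scale's direction
# (5 = most critical) are assumptions for illustration.

MAX_DEPRECATABLE_LEVEL = 3  # assume levels 4 and 5 require manual review

def get_criticality(artifact_id: str) -> int:
    # Invented endpoint; stands in for the internal criticality service.
    url = f"https://data-governance.internal/artifacts/{artifact_id}/criticality"
    with urllib.request.urlopen(url) as resp:
        return json.load(resp)["level"]

def request_deprecation(artifact_id: str) -> bool:
    level = get_criticality(artifact_id)
    if level > MAX_DEPRECATABLE_LEVEL:
        print(f"{artifact_id}: criticality {level}, deprecation blocked "
              f"pending owner review")
        return False
    print(f"{artifact_id}: criticality {level}, deprecation allowed")
    return True
```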

8.- DME Reputation level

We are currently working on classifying every DME with a reputation or maturity level indicator. This indicator will assess the quality and maturity of their data production practices. Factors such as uptime, service level, error rates, incident attention percentage, relevance, and audience are taken into account to generate this metric for each DME. Opportunities, learning path requirements, and penalties are derived from this classification.
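To sketch what such an indicator could look like, here is a weighted score over normalized factors. The factor list comes from above; the weights, scales, and tiers are purely illustrative:

```python
# Illustrative weighted reputation score per DME. Factor names follow the
# article; weights, normalization, and tier cutoffs are assumptions.

WEIGHTS = {"uptime": 0.25, "service_level": 0.25, "error_rate": 0.20,
           "incident_attention": 0.20, "audience": 0.10}

def reputation(factors: dict[str, float]) -> tuple[float, str]:
    """Each factor is normalized to [0, 1], where higher is better."""
    score = sum(WEIGHTS[k] * factors[k] for k in WEIGHTS)
    tier = "gold" if score >= 0.85 else "silver" if score >= 0.6 else "bronze"
    return round(score, 3), tier

print(reputation({"uptime": 0.99, "service_level": 0.95, "error_rate": 0.9,
                  "incident_attention": 0.8, "audience": 0.7}))
# (0.895, 'gold')
```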

9.- Data Mesh Customer Council

Periodically, we discuss important governance topics, prioritize new features, and listen to our heaviest and most mature data producers in order to create new features and capabilities that meet their needs. Our aim is to provide the best possible customer experience for data product production.

But… all is not rosy yet. Decentralization brings other complexities that must be addressed and managed. Our current focus is on preventing and identifying redundancy and data product duplication. To keep the ecosystem efficient and redundancy-free, we are developing new machine learning models that help us measure the distance between two data products and calculate a uniqueness score. Additionally, we are deeply committed to data quality, so many of the features to come will help us become stronger in this field.
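To illustrate just the shape of a uniqueness score (our actual models are ML-based and much richer), even a simple Jaccard similarity over column sets conveys the idea:

```python
# Toy illustration of redundancy detection: Jaccard similarity over column
# sets as a crude distance between two data products. Only the shape of a
# uniqueness score is shown; the real models are ML-based.

def jaccard(a: set[str], b: set[str]) -> float:
    return len(a & b) / len(a | b) if a | b else 0.0

def uniqueness(candidate: set[str], catalog: dict[str, set[str]]) -> float:
    """1.0 means no overlap with any existing product; 0.0 means a duplicate."""
    return 1.0 - max((jaccard(candidate, cols) for cols in catalog.values()),
                     default=0.0)

catalog = {"orders_daily": {"order_id", "site_id", "amount", "created_at"}}
print(uniqueness({"order_id", "site_id", "amount"}, catalog))  # 0.25: likely redundant
```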

This journey has been amazing so far. Many of our demanding business units’ backlogs have dropped to zero. They have become autonomous, self-sufficient data owners, and we, as a team, have transformed ourselves into data enablers and data production accelerators. Empowering business users is essential for pioneering companies and market leaders, and Mercado Libre certainly embodies this principle!

So… Stay tuned — this is just the beginning…

References:

Dehghani, Z. (2019). How to move beyond a monolithic data lake to a distributed data mesh. MartinFowler.com. https://martinfowler.com/articles/data-monolith-to-mesh.html


Vanina Bertello has a degree in computer science and has been leading data projects for more than 25 years.