Optimizing for Savings: Cleaning Your [Ware]house

Valeria Del Valle Hiramatsu
Mercado Libre Tech
9 min read · Mar 22, 2024


Have you noticed that the more time you spend at home, the more disorder seems to accumulate? As the day goes by, the kitchen starts to fill up with things to clean: the breakfast cup, the juice glass, the lunch dishes you said you'd wash later. Before you know it, the sink is full of dirty items, and when you want to use the kitchen you first have to spend a good amount of time cleaning everything and putting each thing back in its place. It becomes necessary, then, to maintain a minimum of order during the day, so that when night comes you can spend that time doing things you enjoy, like pursuing hobbies, practicing self-care, or simply relaxing.

Well, a data warehouse is not much different. Data models are dynamic creatures that change and transform with greater or lesser frequency, depending on the model. This presents a challenge for data teams that need to keep the data model consistent, coherent, and at the same time, flexible enough to meet the needs of both business and users. And, just like in your house, it is necessary to keep it clean and clutter-free; otherwise, disorder gets out of control, and then it will cost twice as much to put things back in their place.

This is why, in the Commerce Benefits Team here at Mercado Libre, we put on our gloves and set out to clean our warehouse. We believed, and later confirmed, that this refining process would lead not only to a more efficient model but also to cost reduction. Just as a house that isn't maintained costs time, effort, and money to clean, a disorganized, decentralized, and poorly maintained data warehouse is costly. We started by applying this methodology to our Coupons model and then extended it to the entire Benefits model, working on a total of 110 jobs and 114 tables over a period of 3 months.

While we were working toward this, Data Mesh arose: a new Infrastructure and Culture paradigm developed by Mercado Libre's IT team. This paradigm empowers each data team with a Domain, a self-managed environment. It helped us complete a vision in which the data warehouse would not only be efficient and optimal, but we would also gain full ownership of our data products. You can learn more about this paradigm by reading Ignacio Weinberg's article Data Mesh @ MELI: Building Highways for Thousands of Data Producers. Consequently, our task expanded beyond cleaning and streamlining the model to include migrating our tables and jobs to this new environment. To reach both goals, our strategy was to bring into our domain only those elements that had already gone through the cleaning and optimization process. Now, let's take a deep dive into it.

Measuring the disorder

Before starting to organize our [ware]house, we need to figure out where to begin. The kitchen, the bathroom, the bedroom? Which one is dirtiest? Which one is used most often, so that its disorder affects the most users? And, no less important, which one costs us the most?

To quantify this complexity, we analyzed the metrics of each component. This involved gathering detailed information for each job within our model, including execution times, error counts, the number of alerts generated, service level compliance, the number of tables each job updated, and how many users each one served. We further supplemented this with a cost analysis, assessing the dollar cost of each execution, as well as dependencies on other processes and their criticality level. This extensive information was primarily sourced from the Observability and Governance team at Mercado Libre, as well as from the metadata provided in each executed process.
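For teams whose warehouse runs on BigQuery, part of this picture can be approximated directly from the query metadata the engine already keeps. The sketch below only illustrates that idea; it is not the tooling we actually used (our metrics came mainly from the Observability and Governance team), and the region and lookback window are placeholder assumptions.

```python
# Sketch: approximate per-table activity metrics from BigQuery's INFORMATION_SCHEMA.
# The region ("region-us") and the 90-day window are illustrative assumptions.
from google.cloud import bigquery

client = bigquery.Client()

JOB_METRICS = """
SELECT
  destination_table.dataset_id                        AS dataset_id,
  destination_table.table_id                          AS table_id,
  COUNT(*)                                            AS executions,
  COUNTIF(error_result IS NOT NULL)                   AS failed_executions,
  AVG(TIMESTAMP_DIFF(end_time, start_time, SECOND))   AS avg_runtime_seconds,
  IFNULL(SUM(total_bytes_processed), 0) / POW(1024, 4) AS tib_processed,
  COUNT(DISTINCT user_email)                          AS distinct_users
FROM `region-us`.INFORMATION_SCHEMA.JOBS_BY_PROJECT
WHERE creation_time >= TIMESTAMP_SUB(CURRENT_TIMESTAMP(), INTERVAL 90 DAY)
  AND job_type = 'QUERY'
  AND destination_table.table_id IS NOT NULL
GROUP BY dataset_id, table_id
ORDER BY tib_processed DESC
"""

for row in client.query(JOB_METRICS).result():
    # Multiplying tib_processed by the on-demand price per TiB gives a rough dollar cost.
    print(f"{row.dataset_id}.{row.table_id}: {row.executions} runs, "
          f"{row.failed_executions} failures, {row.tib_processed:.2f} TiB processed")
```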

By gathering all this information, we were able to identify three main problems:

  • unused or outdated information
  • redundant data
  • processes with low performance

Let’s see how we solved them in order to have our [ware]house clean and tidy.

Taking out the trash

Photo by Pawel Czerwinski on Unsplash

The unused elements were the easiest and quickest problem to solve. It was the first issue to be tackled, and its effect was almost immediate, constituting what we call a ‘quick win’. We deprecated all tables that were no longer being queried, or whose consumption level over a considerable period of time did not justify their existence. Their information had lost relevance, or users had migrated to other sources, rendering them obsolete. These idle elements were within our model, taking up space, and incurring costs.
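If the warehouse lives in BigQuery, one way to surface candidates for deprecation is to cross the list of tables in a dataset with the tables actually referenced by queries over a given window. The snippet below is a minimal sketch of that idea, with hypothetical project and dataset names and an arbitrary 180-day window; dependencies and criticality still have to be reviewed by hand before anything is removed.

```python
# Sketch: list tables in a dataset that no query has referenced in the last 180 days.
# "my_project", "my_dataset", the region and the window are illustrative assumptions.
from google.cloud import bigquery

client = bigquery.Client()

IDLE_TABLES = """
SELECT t.table_name
FROM `my_project.my_dataset.INFORMATION_SCHEMA.TABLES` AS t
LEFT JOIN (
  SELECT DISTINCT ref.table_id AS table_name
  FROM `region-us`.INFORMATION_SCHEMA.JOBS_BY_PROJECT AS j,
       UNNEST(j.referenced_tables) AS ref
  WHERE j.creation_time >= TIMESTAMP_SUB(CURRENT_TIMESTAMP(), INTERVAL 180 DAY)
    AND ref.dataset_id = 'my_dataset'
) AS reads
ON t.table_name = reads.table_name
WHERE reads.table_name IS NULL
ORDER BY t.table_name
"""

for row in client.query(IDLE_TABLES).result():
    print("Deprecation candidate:", row.table_name)
```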

Another problem to address was outdated information. Due to sources that became obsolete without being removed or processes that no longer generated updates, the tasks executed to fill them no longer brought new data. As a result, we had jobs running in vain — tasks that took up space in the infrastructure, consumed resources, and, in some cases, triggered unnecessary alerts.

By eliminating or stopping the execution of these idle components, we freed up resources, achieved savings in storage and processing costs, and brought more clarity and efficiency to our model.

This action not only simplified our model but also had repercussions on other teams, where users who were unaware they were consuming outdated sources became informed. This triggered a chain of deprecations and/or migrations to the correct and updated sources, improving the data quality throughout the system.

Efficiency through Centralization

Would you put socks in the silverware drawer? The answer is no. That would confuse someone opening the drawer looking for a fork, and you wouldn't use socks to eat soup, either. Moreover, someone looking for socks wouldn't know they were there. This is why, in a tidy house, everything has its place, with a distinctive name, a fixed location, and a specific purpose.

Attempting to get everything in order in our [ware]house, we found:

  • Different processes querying the same source and bringing the same information, thus creating redundancy.
  • Same metrics with different calculations and residing in different tables, causing inconsistencies and confusion for users.
  • Tables with the same information but with different levels of granularity.
  • Tables with similar names, making it difficult to discern the difference between them or, conversely, with names that have no reference to the data they contain.

To address these inconsistencies, a more analytical approach was required. We had to review each source and each piece of logic, determine what information users actually needed, and collaborate with other teams. We succeeded in consolidating the information into central, unique tables that serve as the bases of the model.
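As a simplified illustration of what that consolidation can look like (table, column, and metric names here are hypothetical, not our actual model), two overlapping coupon tables maintained by separate jobs can be replaced by a single base table computed once, at the finest granularity users actually consume:

```python
# Sketch: one central base table replacing two redundant, slightly different ones.
# All names are hypothetical; the point is a single agreed definition of each metric.
from google.cloud import bigquery

client = bigquery.Client()

CONSOLIDATE = """
CREATE OR REPLACE TABLE `my_project.benefits.bt_coupon_usage_daily` AS
SELECT
  usage_date,
  site_id,
  coupon_id,
  COUNT(DISTINCT order_id) AS orders_with_coupon,    -- one definition, used everywhere
  SUM(discount_amount)     AS total_discount_amount
FROM `my_project.benefits.raw_coupon_usage`
GROUP BY usage_date, site_id, coupon_id
"""

client.query(CONSOLIDATE).result()  # the overlapping legacy jobs and tables can then be deprecated
```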

What did we gain? Once again, we saved resources by streamlining our processes. Instead of many processes performing the same tasks, we consolidated them into a few, each handling only the necessary work. And instead of duplicating information across different tables, we kept only one with the required granularity and the amount of information effectively used.

Additionally, we enhanced confidence in our data and reduced the model’s size, making maintenance easier. We decreased questions from confused users to the data team and provided greater clarity and understanding by eliminating unnecessary complexity. This way we managed to achieve a higher data quality.

Centralization allowed us to transition from a crowded schema, as seen in the image at the top, to a much cleaner and simpler one, as seen in the bottom image.

Efficiency through Optimization

One of the factors that drives up costs is the inefficient performance of certain processes. These procedures often take longer than necessary, making it harder for users to access the data they need, and they are frequently more complex than they need to be, which further adds to the expense. They also tend to have a higher rate of errors and failures, which worsens the problem.

To address this pain point, we had to adopt a different perspective, focusing on how data flows. Execution maps were highly useful for this purpose, aiding in the analysis of lengthy processes with multiple steps and pinpointing where bottlenecks were occurring.

Diagramming the flow also made it easier to identify common points and elements that could be reused within it. Another aspect we reviewed was the use of very voluminous temporary tables; performance improved when we switched to persistent tables and leveraged indexes, or partitions and clustering, depending on the database. Applying best practices consistently was another crucial point for gaining efficiency; after all, that's why they are called "best practices". All of these improvements led to a measurable optimization: in fact, this is where we achieved our greatest savings in resources, with 74% of the total attributed to process optimization.
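In BigQuery, for example, the voluminous-temporary-table pattern can often be replaced by a persistent table that is partitioned by date and clustered by the columns most queries filter on, so each run scans only the slices it needs. The following is a rough sketch with hypothetical names, not a drop-in replacement for any specific job:

```python
# Sketch: persist a heavy intermediate result as a partitioned, clustered table
# so downstream steps prune partitions instead of rescanning everything.
# Table, column, and partitioning choices are illustrative assumptions.
from google.cloud import bigquery

client = bigquery.Client()

OPTIMIZED_STAGE = """
CREATE OR REPLACE TABLE `my_project.benefits.stg_coupon_events`
PARTITION BY DATE(event_ts)
CLUSTER BY site_id, coupon_id
AS
SELECT event_ts, site_id, coupon_id, user_id, order_id, discount_amount
FROM `my_project.benefits.raw_coupon_events`
WHERE event_ts >= TIMESTAMP_SUB(CURRENT_TIMESTAMP(), INTERVAL 30 DAY)
"""

client.query(OPTIMIZED_STAGE).result()

# Downstream jobs then filter on the partition column, e.g.
#   WHERE DATE(event_ts) = CURRENT_DATE()
# which limits the bytes scanned (and therefore the cost) per execution.
```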

With process optimization, we attained the highest percentage of savings.

Cross-cutting Aspects

There are two cross-cutting aspects to the three solutions presented. One is documentation: we must record the analyses performed and the changes made to the model. This helps us share the acquired knowledge with people inside and outside the team, saves time in future implementations, and makes maintenance easier.

Communication with the user is another aspect that cuts across all solutions. It must be concise and timely. If we need an action to be taken, such as migrating a source or ceasing to consume a table, we must be specific about the action that needs to be carried out and provide a clear deadline for compliance.

Additionally, it is advisable to direct communication only to the users affected, avoiding general and mass communications that may result in loss of interest and ineffectiveness. Bombarding users with irrelevant messages can lead to a decrease in engagement.

There are two cross-cutting aspects to the whole process of cleansing: effective communication to the user and documentation of everything analyzed and modified.

In conclusion, was all this work worth it? What benefits did the cost and effort of carrying it out bring? We started this process with 110 jobs and 114 tables and ended with 70 jobs and 73 tables. In addition, our error rate decreased by 12%, and the monthly execution costs of our jobs in BigQuery dropped by 29%. We've improved our service level, which is now above 99.5%, and we've migrated 85% of our model to Data Mesh, fully optimized.

So the answer is yes, it’s been worth it. As you can see, cleaning your data warehouse is not just about maintaining a clutter-free environment. It is a strategic move that can result in significant cost savings for your organization. Just like when you clean your house and see it finally tidy, there’s a sense of satisfaction and a kind of mental peace; similarly, cleaning up the warehouse has had a positive impact on our day-to-day work.

We’ve gained time by knowing which source to consume; we no longer have to navigate the model, searching for the information we need. Now we know what information it contains and where to find it. We’ve gained confidence and quality in our data, leading to better insights that enhance decision-making. This, in turn, reduces missed opportunities and losses generated by inaccurate information.

And finally, and no less important, we’ve saved a considerable amount of money in the process. By eliminating sources of inefficiency, we have freed up resources that have become available for relevant processes. Monetizing this saving has made us realize the significant impact of a cleaning job. Now, we can enjoy our clean [ware]house and use our time to do fun things.

Photo by Anthony Tran on Unsplash
