What we do to sell lechugas (lettuce): a view from the data team

Daily work & Challenges

Pedro Díaz
Mercadona Tech
6 min read · Nov 2, 2022


We are about to open a few data engineering positions, and we wondered how much people know about what the data team does here at Mercadona Tech.

After asking some friends in the field, both inside and outside Mercadona, we quickly realized that we hadn’t said much about our work here. So here it is! This blog post is an effort to open up about our daily work and, in a way, let you get to know us.

Daily work

As we are a small team, we try not to draw a hard line on whether an initiative (our name for collaboration projects with other teams) belongs to the data scientists or the data engineers. We like that each of us can take on whatever is needed. This approach has some drawbacks, but we are pretty happy with the trade-offs: we prefer widening our scope over specializing.

Here is a short list of what we have been dealing with, so you can get a grasp of the kind of initiatives we take on:

  • Improve home page product recommendations. We learned a lot from this initiative. We worked closely with the PMs to define meaningful KPIs to guide our development. We double-checked that the infrastructure supporting the A/B testing (Firebase plus Amplitude) was working as expected, and we launched an A/A test to verify that the metrics we picked were stable enough (a minimal sketch of such a check follows this list). After fine-tuning the algorithm, we achieved a 20% relative improvement in conversion rate. As a side effect, we also discovered that heavy home page usage can mean our clients do not visit other lovely sections of our web page. So much work still to be done in this direction!
  • Automatic holiday assignment. As the business grows, we are opening more centers (a.k.a. beehives, or colmenas) and managing more people (+1,500 workers), so scheduling their holidays one year in advance becomes more and more complex. Our colleagues at Mercadona had already faced this problem for Mercadona stores, so we learned a lot from them. We adapted the problem to our use case and now have an optimization algorithm that considers business needs and worker preferences to propose an optimal assignment. The main difficulty was softening certain constraints so that the optimization converges (see the soft-constraint sketch after this list).
  • Diminish the number of manually routed orders. Each night, a routing algorithm (a ruin-and-recreate strategy, for the most curious readers) builds the routes for the next day. The algorithm could leave a few orders out, despite a high penalty in the optimization. We managed to break the routing problem down into separate delivery areas and obtained better solutions. This was not rocket science; we simply guaranteed a more exhaustive solution search, yielding optimal solutions for each delivery area (sketched below). With this change, we considerably reduced the number of non-routed orders.
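
Since A/A tests rarely show up in blog posts, here is roughly what that stability check can look like in code. This is a minimal sketch with made-up numbers (not our production setup), using a standard two-proportion z-test: in an A/A test both arms get the identical experience, so a consistently small p-value would point to assignment or tracking bugs rather than a real effect.

```python
from statsmodels.stats.proportion import proportions_ztest

# Hypothetical A/A results: both arms saw the same experience, so any
# "difference" in conversion should be pure noise.
conversions = [1843, 1879]   # converted sessions per arm (invented numbers)
sessions = [25_104, 25_312]  # total sessions per arm

stat, p_value = proportions_ztest(count=conversions, nobs=sessions)
print(f"z = {stat:.3f}, p = {p_value:.3f}")

# With identical experiences, p should usually be large; a consistently
# small p across repeated A/A runs hints at assignment or tracking bugs.
if p_value < 0.05:
    print("Suspicious: the metric moved without any real change.")
```

The constraint-softening trick from the holiday initiative can also be illustrated with a toy model. The sketch below uses Google’s OR-Tools CP-SAT purely for illustration (we are not claiming this is our solver, and all names and numbers are invented): the per-worker holiday budget stays hard, while the weekly coverage cap is softened with a slack variable that the objective penalizes heavily, so the model stays feasible even on popular weeks.

```python
from ortools.sat.python import cp_model

# Toy instance: 4 workers, 6 weeks, 2 holiday weeks each (invented numbers).
N_WORKERS, N_WEEKS, WEEKS_OFF, MAX_OFF_PER_WEEK = 4, 6, 2, 1
prefers = {(0, 1), (1, 1), (2, 4), (3, 4)}          # (worker, week) wishes

model = cp_model.CpModel()
off = {(w, t): model.NewBoolVar(f"off_{w}_{t}")
       for w in range(N_WORKERS) for t in range(N_WEEKS)}

for w in range(N_WORKERS):                          # hard: exact holiday budget
    model.Add(sum(off[w, t] for t in range(N_WEEKS)) == WEEKS_OFF)

slacks = []
for t in range(N_WEEKS):                            # soft: weekly coverage cap
    s = model.NewIntVar(0, N_WORKERS, f"slack_{t}")  # slack absorbs violations
    model.Add(sum(off[w, t] for w in range(N_WORKERS)) <= MAX_OFF_PER_WEEK + s)
    slacks.append(s)

granted = sum(off[w, t] for (w, t) in prefers)      # reward honored wishes
model.Maximize(10 * granted - 100 * sum(slacks))    # heavy penalty on slack

solver = cp_model.CpSolver()
if solver.Solve(model) in (cp_model.OPTIMAL, cp_model.FEASIBLE):
    for w in range(N_WORKERS):
        plan = [t for t in range(N_WEEKS) if solver.Value(off[w, t])]
        print(f"worker {w} off on weeks {plan}")
```

Finally, the delivery-area decomposition is conceptually as simple as the helper below, where `solve_routes` is a hypothetical stand-in for the existing ruin-and-recreate solver: splitting the nightly problem per area lets the search be far more exhaustive within each smaller solution space.

```python
from collections import defaultdict

def route_by_area(orders, solve_routes):
    """Split the nightly routing problem per delivery area (illustrative).

    `orders` is an iterable of dicts with at least an 'area' key;
    `solve_routes` is whatever routing solver is already in place.
    """
    by_area = defaultdict(list)
    for order in orders:
        by_area[order["area"]].append(order)
    # Solve each (smaller) sub-problem independently.
    return {area: solve_routes(batch) for area, batch in by_area.items()}
```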

Challenges

Just by reading the previous section, you may wonder if we live in a fairy tale where everything the data team touches turns to gold. Trust us, it doesn’t, so we would like to give you some context on our open challenges.

Demand forecasting

One typical question a data team faces is, “Can you predict demand?” We got that request too. This initiative turned out to be far more challenging than we had anticipated. It took us a few months, with high levels of frustration at times.

So what went wrong? Now that we can look at it with some perspective, it is relatively easy to do a post-mortem.

  • Not fully standardized metric layers. The target and features used in the model changed while we were testing new models, which led us to rerun the same experiment more than once.
  • Biased past data due to censoring. We realized that our historical values could be underestimated, as on some days we restricted demand (once a day is fully booked, clients cannot book a slot for it). Recognizing this issue led us to derive new metrics that alleviate the bias by inflating past historical values (see the sketch after this list).
  • Uncontrollable exogenous variables. We did not pick the best timing for demand forecasting: COVID and the fear of rationing of certain products did not help, as each month we observed patterns that were not present in the past data.
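
To make the censoring point more concrete, the correction can be sketched as follows. Column names and the exposure heuristic are invented for illustration; the idea is simply to scale up observed orders on days where booking closed early.

```python
import pandas as pd

def uncensor_demand(df: pd.DataFrame) -> pd.DataFrame:
    """Inflate observed orders on capacity-restricted days (illustrative).

    Assumes hypothetical columns: 'orders', 'was_capped' (bool), and
    'hours_bookable' (hours the day stayed open for booking, out of 24).
    """
    df = df.copy()
    exposure = df["hours_bookable"] / 24           # fraction of the day open
    corrected = (df["orders"] / exposure).round()  # naive uncensoring
    # Keep raw orders on unrestricted days; use the corrected value otherwise.
    df["demand_est"] = df["orders"].where(~df["was_capped"], corrected)
    return df
```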

We have now reached an equilibrium: we provide the business team with a monthly forecast that serves as a baseline and helps them draw up the final prediction used to adjust next month’s availability.
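
For flavor, a baseline in this spirit can be as simple as a seasonal average. The sketch below (again with hypothetical column names) computes mean orders per delivery area and weekday over a recent window, which the business team can then adjust by hand.

```python
import pandas as pd

def seasonal_baseline(daily: pd.DataFrame) -> pd.DataFrame:
    """Average orders per (area, weekday) over a recent window (illustrative).

    Expects hypothetical columns: 'date', 'area', 'orders'.
    """
    recent = daily[daily["date"] >= daily["date"].max() - pd.Timedelta(weeks=8)]
    return (recent.assign(weekday=recent["date"].dt.day_name())
                  .groupby(["area", "weekday"], as_index=False)["orders"]
                  .mean()
                  .rename(columns={"orders": "baseline_orders"}))
```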

These challenges happen, and even though they are frustrating, we are quite happy with the current results. We delivered an important tool to our colleagues on the business team that significantly decreased the time needed to draw up the forecast.

Data Quality

The jewel in the crown of all challenges. We learned that this is an endless game: it is not something we can deliver at a given time and forget about; it requires constant maintenance. It is the silent killer that slows down decision-making, and no one wants to talk about it.


In the past, we tackled this problem as an initiative that the data team had to solve on its own, so that the rest of the teams would not have to worry about it. This path has two significant implications. The first is that you need to scale up the data team with an army of data engineers. The second, probably the scariest, is that you generate knowledge silos across the organization: PMs developing metrics without any peer review, engineering (backend and frontend) not having a clue about what happens outside the operational DB, and the data team trying to maintain the ETL procedures for every team. Add to that an Agile, product-driven environment, where metrics are developed weekly and deployments happen daily, and keeping up with data quality becomes very challenging.

Now we have chosen another path; hopefully, the path of light. Knowledge and best practices should flourish within each team. Thus, whenever there is a new requirement, we involve engineering and PMs in equal parts so that everyone knows the implications. We constantly share context through public Slack channels and pair/mob programming sessions. We have already had some lovely feedback, with teams autonomously integrating changes in the transformation layer using tools like DBT. The trade-off of this path is that each team decides which data quality status they are comfortable with, which means we risk having teams where metrics are done with “paper and pen.” Let’s see :)

Tech Stack

In our tool belt, you will find the basics of a modern data team: Python, Jupyter notebooks, Pandas, SQL (our flavors are PostgreSQL and Google’s BigQuery), Kubernetes, and Docker (we primarily run everything in containers). Our dashboards are mostly built on Metabase and Grafana.

We are pretty aligned with what is called the Modern Data Stack, and among its recommended tools, we are investing heavily in the adoption of DBT for all data transformations, for example. Other pieces are still on the drawing board. So far, our biggest challenge with the data itself is the transformation part: we have quite a bit of legacy transformation software, so we are starting our modernization by expanding DBT. We would also like to lighten and strengthen our EL procedures by means of CDC and orchestrators.

This last part is what ignited the need to grow the team and start the search for some help on the Data Engineering side.

Did you like it? Check out our Data Engineer job opening in Valencia or Madrid. We would be more than happy to get to know you! Join us and be part of this fantastic journey!

