Why and when to build a Machine Learning Platform (part 2)

Enter the solution…

Mercado Libre Tech · Mar 15, 2021 · 9 min read

You will find Part 1 of this story here.

Whenever we speak of FDA at MercadoLibre, picture a framework created in house to support data mining and the development and deployment of Machine Learning models for, at present, more than 500 users, including analysts, data scientists, and machine learning and data engineers.

In this acronym, the “F” stands for Fury, a PaaS (Platform-as-a-Service) tool for building, deploying, monitoring, and managing services in a cloud-agnostic way, also brewed at MELI. Needless to say, it is where Fury Data Apps, a.k.a. FDA, is embedded.

Thus, FDA enables creating end-to-end ML apps in a scalable, secure and efficient way, delivering high value to our users. Because of this, every time we consider adding a new service to our ML platform we stick to our original objectives, agreed upon in that original interdisciplinary council of developers, cloud architects and data scientists (mentioned in Part 1) and still current today:

  1. Democratize Machine Learning at MercadoLibre: flatten the learning curve for non-experts, opening up the use of these tools to everyone.
  2. Promote best practices, create synergy across teams and prevent silos.
  3. Share knowledge: support our own library!
  4. Support data driven solutions at scale.
  5. Deliver fast access for everyone, gain agility and create impact.

These drivers guide every decision made when designing and building our platform.

Also, one of the first principles we follow is to see ML as a process in which we develop, build, deploy, operate and monitor, chained into a pipeline of steps that must be completed to deliver quality and production readiness.

In order to create this platform, our first task was a hard one: designing a cycle flexible enough to cover most ML projects’ use cases, for both developers and data scientists. The main idea of this pipeline was to provide a solution for each step in the development of our users’ projects. To standardize these solutions across the company, we set up some constraints to frame the work, while trying not to become so restrictive that we cut off any yet-unknown scenarios.

In an early approach, we agreed on this high-level abstraction of the ML process as a result of the shared vision across the teams applying Machine Learning at MercadoLibre. As we see it, ML is a practice that iterates among these steps, where each step requires specific infrastructure to be executed.

Here’s our blueprint for the ML pipeline:

Our Machine Learning pipeline

According to our perspective, every ML project begins in the Experimentation step. During this stage, the developer/data scientist (our “user”) gathers their datasets from multiple data sources (databases, APIs, files), explores them, forms an idea of which model to use and eventually develops an initial end-to-end solution.

Let’s explore each of these steps.

Experiment

+1500 active laboratories

Laboratories are safe personal workspaces, designed by data scientists for data scientists. Through Labs, we deliver a platform for data analysis and sampling with support for Python, with simple access to all available data sources and other analysis tools through an open source library repository, such as our own shared PyPI.

One of the coolest features of Labs is that if the code you created has struck gold, it is very intuitive to get it ready and push it to production (the truth is, on Fury grounds we are always 3 clicks away from production anyway).

Ultimately, a Lab offers a Jupyter Notebook ready to start coding and accessing data using any common data science toolkit. You need to make few or no sacrifices to switch from working locally to doing machine learning on a hosted FDA Lab, and you get a whole bunch of benefits to take advantage of:

  • Access to data sources comes built into our library, together with other super cool utilities to read and write different file types (see the sketch right after this list).
  • A GitHub repository created automatically with every new FDA, ready to trace your code in favor of reproducibility.
  • Unlimited computing and memory.
  • Access to examples, cookie cutters, and other amazing stuff all teams provide!
  • No need to know about or handle cloud infrastructure, whatever degree of complexity that implies at any stage.
The advantages of hosted Labs
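
To make the first point above more concrete, here is a minimal sketch of what data access from a Lab notebook could look like. The fda_data module, its helpers and the source alias are hypothetical placeholders for this example, not the actual FDA library.

```python
# Hypothetical sketch of data access from a Lab notebook.
# The fda_data module, its functions and the source alias are illustrative only.
from fda_data import read_table, write_parquet  # hypothetical helper library

# Pull a sample straight into a DataFrame, without handling credentials by hand.
orders = read_table(
    source="analytics_warehouse",  # pre-configured data source alias
    query="SELECT * FROM orders LIMIT 100000",
)

# Explore as you would in any local notebook.
print(orders.describe())

# Persist an intermediate artifact for later steps.
write_parquet(orders, path="experiments/orders_sample.parquet")
```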

Labs are safe because they are personal. When you create a Lab, the underlying infrastructure is built behind the scenes, right out of the box. Of course, to be able to do so, you’ll be using a personal token, so everything you do there will be attributed to you. Labs stay up for a preset time frame unless you extend their lifespan, so there is nothing to worry about if you forget to shut one down.

And to keep things simple, we have come up with a handful of preset flavors of {processing, RAM, storage} capacities that help you decide among the many combinations of infrastructure that might otherwise be overwhelming.

Just for the record, this palette also includes GPUs! GPU processing has started to make sense at MercadoLibre due to the enormous volume and variety of data eager to be processed, where images are no exception. So as you can see, we do take agility and democratization seriously.
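
Just as an illustration, the palette could look something like this; the flavor names and sizes below are made up for the example and are not the actual FDA presets.

```python
# Illustrative only: hypothetical Lab flavors, not the real FDA presets.
LAB_FLAVORS = {
    "small":  {"cpus": 2, "ram_gb": 8,  "storage_gb": 50},
    "medium": {"cpus": 4, "ram_gb": 16, "storage_gb": 100},
    "large":  {"cpus": 8, "ram_gb": 32, "storage_gb": 200},
    "gpu":    {"cpus": 8, "ram_gb": 64, "storage_gb": 200, "gpus": 1},
}

def pick_flavor(name: str) -> dict:
    """Return the resource spec behind a preset flavor name."""
    return LAB_FLAVORS[name]

print(pick_flavor("gpu"))
```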

ETL

+8000 ETL tasks launched per week

ETL (Extract, Transform, Load) is the first step in a machine learning pipeline. The term itself represents the three data engineering pillars required to obtain, prepare and present data to a model.

In the FDA environment, we call ETL the step that outputs an artifact (typically, a dataset). In short, each ETL is built from a piece of code (from GitHub), together with a user-defined identifier we call a version. This is what gives you reproducibility, allowing you to trace a result set back to its original execution and find its source code. This will become especially important in the next step, Train.
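
To illustrate the idea, here is a rough, self-contained sketch of an ETL step: versioned code in, one traceable artifact out. The function, file names and metadata format are invented for this example and are not FDA’s actual API.

```python
# Hypothetical sketch of an ETL step: versioned code in, one artifact out.
import json

import pandas as pd

def run_etl(version: str, output_path: str) -> str:
    """Extract, transform and persist a dataset, recording its lineage."""
    # Extract: a real ETL would hit databases, APIs or files here.
    raw = pd.DataFrame({"price": [10.0, 25.5, None], "sold": [3, 1, 7]})
    # Transform: clean the data and derive features.
    dataset = raw.dropna().assign(revenue=lambda d: d.price * d.sold)
    # Load: write the artifact that the Train step will later consume.
    dataset.to_parquet(output_path)
    # Lineage: tie the artifact back to the code version that produced it.
    with open(output_path + ".meta.json", "w") as f:
        json.dump({"step": "etl", "version": version, "artifact": output_path}, f)
    return output_path

artifact = run_etl(version="orders-etl-v1", output_path="orders_dataset.parquet")
```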

Train

+250 models trained per week

Model generation and monitoring is, perhaps, where data science and software development come closest. It is as important to trace the ETL process and its outputs as it is critical to save trained models and register their lineage. Later, by analyzing model performance metrics you will be able to monitor them, capture degradation and re-train as needed.

Similarly to the prior process, when creating a new model you need to provide an identifier/version and a code location. But also — and here’s the end game — you have to specify the ETL version attached to this model. By doing so, the output of the ETL step (the dataset, of course!) becomes available from the Training scope, without having to manually move data or deal with authorization schemas to achieve it!

So, the Training step focuses on associating an ETL version with a model. And this is really interesting: you do want to try out different models on the same dataset, so this step allows you to create many training pipeline runs associated with one and the same ETL version, in search of the best fit. The outcome of this process is a trained model which can be saved and stored to be served in the following step.
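
In spirit, a training run simply binds a model to an ETL version. The sketch below is illustrative only, reusing the dataset from the ETL sketch above and scikit-learn as the modeling library; it is not the actual FDA Training API.

```python
# Hypothetical sketch of the Train step: one ETL version, several candidate models.
import joblib
import pandas as pd
from sklearn.ensemble import RandomForestRegressor
from sklearn.linear_model import LinearRegression

ETL_VERSION = "orders-etl-v1"                        # dataset produced by the ETL step
dataset = pd.read_parquet("orders_dataset.parquet")  # the ETL artifact, already in scope
X, y = dataset[["price", "sold"]], dataset["revenue"]

# Several training runs against the same ETL version, in search of the best fit.
for name, model in [("linreg-v1", LinearRegression()),
                    ("rf-v1", RandomForestRegressor(n_estimators=50))]:
    model.fit(X, y)
    joblib.dump(model, f"{name}.joblib")             # trained model, ready to be served
    print(f"trained {name} against ETL {ETL_VERSION}, R2 = {model.score(X, y):.3f}")
```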

Predict

+50 apps serving predictions

Online serving

Deploying a model with online scoring capabilities at MercadoLibre is possible through a web service, accessible securely from any workstation, with dedicated metering, auditing and autoscaling, like any other Fury-based app.

The magic behind this is a service named “Osobuco”, a zero-configuration toolkit which offers minimal boilerplate to share, experiment with, deploy and serve trained models. Again, and back to the pipeline, this step has no specific output but enables your model to be served in a scope: in Fury terms, this means you get your /predict endpoint up and running, delivering every model as an API. This we call an “MPI”, as in “Model Programming Interface”: a REST API that communicates and grants access to the inferences produced by your model. To create an MPI identifier, you only need to point to your code on GitHub and select a pre-existing Training version (which, remember, is linked to an ETL version!).
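
Osobuco itself is internal, but conceptually an MPI is little more than a thin web layer around the trained model’s predict method. Here is a minimal illustration using Flask as a stand-in; it is not necessarily what Osobuco uses under the hood.

```python
# Minimal illustration of a /predict endpoint wrapping a trained model.
# Flask is used as a stand-in here; the real MPI plumbing is handled by Osobuco.
import joblib
from flask import Flask, jsonify, request

app = Flask(__name__)
model = joblib.load("rf-v1.joblib")  # the trained model saved in the Train step

@app.post("/predict")
def predict():
    payload = request.get_json()  # e.g. {"price": 25.5, "sold": 2}
    features = [[payload["price"], payload["sold"]]]
    return jsonify({"prediction": float(model.predict(features)[0])})

if __name__ == "__main__":
    app.run(port=8080)
```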

Batch predictions

As opposed to serving individual predictions through an MPI, we found there were many use cases that required offline batch scoring. Picture the simplest: given a preprocessed dataset (an ETL output!), you need to append a prediction to every row and leave the resulting scored data somewhere expected, for instance, waiting to be fed to a dashboard.

To address these cases, we provide a special module that allows you to specify which ETL version to take the input from and which Training version (model) to apply to it. The output is similar to that of an ETL, only this one is related to both a Training and an ETL version, again aiming at reproducibility.
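
Conceptually, the batch module just joins an ETL artifact with a trained model version. Here is a hedged, self-contained sketch reusing the files from the previous sketches; names and paths are illustrative, not the actual FDA module.

```python
# Hypothetical sketch of a batch prediction run: ETL artifact in, scored dataset out.
import joblib
import pandas as pd

ETL_VERSION = "orders-etl-v1"
TRAINING_VERSION = "rf-v1"

dataset = pd.read_parquet("orders_dataset.parquet")  # the ETL output
model = joblib.load(f"{TRAINING_VERSION}.joblib")    # the trained model

# Append a prediction to every row and leave the scored data somewhere expected,
# for instance to be picked up by a dashboard.
dataset["prediction"] = model.predict(dataset[["price", "sold"]])
dataset.to_parquet(f"scored_{ETL_VERSION}_{TRAINING_VERSION}.parquet")
```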

Automate

To crown the main ETL > Training > Predict schema, we’ve added this complementary feature that has made our scientists’ days even brighter.

Automation: not much to say that isn’t inferred from the name. To us, being able to trigger ETL and/or Training executions on a time schedule — whether to deliver a report, retrain a model or update data in any possible way — was reason enough to make it happen. So with automation, you set up a relation between steps and add a time-rule expression; each run is then recorded as an “execution” with a timestamp.
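
As a toy illustration of what such a definition might look like, here is a sketch with a cron-style time rule; the format is invented for this example and is not FDA’s actual configuration.

```python
# Illustrative only: a hypothetical automation definition chaining steps on a schedule.
automation = {
    "schedule": "0 6 * * 1",  # cron-style time rule: every Monday at 06:00
    "steps": [
        {"step": "etl", "version": "orders-etl-v1"},
        {"step": "train", "version": "rf-v1"},  # retrain on the freshly updated dataset
    ],
}
# Each triggered run would then be recorded as an "execution" with its timestamp.
print(automation)
```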

Monitor

25 apps monitored

To begin with, we’ll say that everyday DevOps-related microservices monitoring is solved by Fury (the PaaS). Fury provides mature tools to ensure uptime, service health and observability at the web-app level. But as one of the main areas of an MLOps infrastructure, monitoring machine learning models specifically demands a completely new set of features and tools.

Therefore, we have a dedicated team building cross-cutting capabilities in this scope, tightly integrated with FDA. The current services include:

  • Unobtrusive scalable collection of models’ inputs and outputs, for both online and offline (batch predictions) serving.
  • Interfaces for the collection of targets or ground-truth data, correlated to model outputs, to track the model’s performance and business impact.
  • Interfaces to persist general metadata or assets related to modeling or training assumptions and properties, to be used later during the monitoring process.
  • A custom low-level package to implement checks or tests on the data (a.k.a. Monitors), allowing the integration or use of other existing tools and libraries (see the sketch right after this list).
  • A service to periodically run the Monitors with a configurable alert system (emails or real-time alarms to on-duty teams).
  • Programmatic access to the collected data for analysis, experimentation and development.
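
To make the idea of a Monitor concrete, here is a minimal, hypothetical check over a batch of scored data; the class, threshold and alerting hook are invented for this example and are not the internal package.

```python
# Hypothetical sketch of a Monitor: a periodic check over collected model outputs.
import pandas as pd

class NullRateMonitor:
    """Alert when the share of null predictions exceeds a threshold."""

    def __init__(self, threshold: float = 0.05):
        self.threshold = threshold

    def run(self, scored: pd.DataFrame) -> bool:
        null_rate = scored["prediction"].isna().mean()
        healthy = null_rate <= self.threshold
        if not healthy:
            # In production this would page an on-duty team or send an email.
            print(f"ALERT: null prediction rate {null_rate:.1%} above {self.threshold:.0%}")
        return healthy

# A scheduler would run this periodically over freshly collected predictions.
scored = pd.read_parquet("scored_orders-etl-v1_rf-v1.parquet")
NullRateMonitor().run(scored)
```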

The roadmap of this framework includes adding higher-level ML monitoring capabilities such as automatic detection of data and concept drift, model stagnation, outliers and so on.

Current challenges

+500 active users representing +10% penetration over IT

With some time spent on the “slope of enlightenment” (as Gartner describes it in its Hype Cycle) and as we enter the “plateau of productivity”, we now face the challenge of finding the right balance between walking the natural path of maturity that many benchmarks suggest we should explore, and attending to our own users’ requests, which may drift away from market standards but are undoubtedly sponsored by need.

Right now our trade-off implies attending to both: while studying how to integrate a scalable machine learning pipeline feature into our platform, our team raises the bar by bringing software release process standards to machine learning projects. All of this takes place while supporting +600 data scientists working on +500 experiments and +30 productive services — on the move.

So with more challenges ahead, we embrace change relentlessly.

We invite you to follow us in our quest and keep it up for data science at scale!

You will find Part 1 of this story here.

Follow us for more tech posts from MELI!

Acknowledgments

  • All of the former and current FDA dev squad members, from the Machine Learning Technology, Cloud & Platform and Data teams at MercadoLibre, working together to make it happen!
  • Cecilia Sassone (editor).
