#3 — This is why you are not iterating your ML project fast enough
Stop delaying that deploy; take the risk and base your decisions on production metrics
This is story #3 of the series Flight checks for any (big) machine learning project.
Ok, at this point I suppose you already have clear KPIs and the right team.
The next typical mistake in machine learning projects that we should avoid is taking longer than necessary to get something in production.
And having something in production and having a machine learning model in production are two different things.
Let’s split this into 2 stages:
- To have something in production: this stage consists of solving the integration with the rest of the system; at this point the model is only a mock with some random output. This stage can be more or less complex and includes several things. If we are building a recommendation system, it means integrating the frontend with the backend, making sure it scales, turning the KPI definitions into a dashboard, and assembling all the machinery needed so that the AI can make its appearance on an already set stage.
- To have the first ML model in production: this means swapping the mock for a piece that is (hopefully) better than a random one. The random part may sound funny, but A/B testing against a random baseline is actually a good idea for some problems (a minimal sketch of such a mock follows this list). Stage 1 requires a sharp engineering and backend team; meanwhile, the data scientist (DS) can develop the first model.
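To make the mock idea concrete, here is a minimal sketch of what that placeholder could look like, assuming a recommendation use case; the function name, catalog, and interface are made up for illustration and would be whatever your system actually needs.

```python
import random

# Hypothetical mock recommender: it exposes the same interface the real model
# will have, but the output is just a random sample of the candidate catalog.
# It unblocks frontend/backend integration and can later serve as the
# "random" arm of an A/B test.
CATALOG = [f"item_{i}" for i in range(1000)]  # made-up candidate items

def recommend(user_id: str, k: int = 10) -> list[str]:
    """Return k pseudo-recommendations for user_id, chosen at random."""
    rng = random.Random(user_id)  # seeded per user so repeated calls are stable
    return rng.sample(CATALOG, k)

print(recommend("user_42", k=5))
```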
For stage 2, my advice is to try the simplest solution first. And when I say the simplest, I mean it. This includes:
- Use few variables and few data sources, picking the ones that are easiest to integrate into production; many variables and complex input data sources are usually a bad call.
- Do not waste time building a custom model; spending this part of the project tuning the hyperparameters of a DNN is usually a bad call too. Take advantage of an AutoML service or a very simple model.
- If you have billions of records available, sample a subset that is workable on a single instance or even a local notebook. Having to spin up a Spark cluster just to build the first model tends to be the wrong choice.
- If you come up with great ideas to iterate on later, write them down for future execution.
- A sharp-witted DS can produce a (really simple) baseline model in less than a week; a minimal sketch of what that could look like follows this list.
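As an illustration of how simple that first baseline can be, here is a sketch under assumed inputs: the file name, column names, and the 1% sample are placeholders, and a plain logistic regression stands in for whatever simple model or AutoML service you pick.

```python
import pandas as pd
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import train_test_split

# Hypothetical inputs: "events.csv" could be a dump of a much larger table.
df = pd.read_csv("events.csv")
df = df.sample(frac=0.01, random_state=0)  # a small sample that fits in a notebook

features = ["feature_a", "feature_b", "feature_c"]  # only a few easy-to-get variables
X_train, X_test, y_train, y_test = train_test_split(
    df[features], df["label"], test_size=0.2, random_state=0
)

baseline = LogisticRegression(max_iter=1000).fit(X_train, y_train)
print("Baseline AUC:", roc_auc_score(y_test, baseline.predict_proba(X_test)[:, 1]))
```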
Coming up quickly with a model helps in a number of ways:
- To gain information on whether the whole pipeline — as well as its full integration — is running smoothly.
- To understand if the implementation of KPIs measurement is correct.
- To have a base on which to stand and measure the marginal gain of the next iteration. This is important. The model can always be improved and the DS knows it; the right question is whether the marginal gain is enough to justify another iteration.
- Consider that each new data source has a cost to incorporate and maintain. Each new feature in the model is like a child you take responsibility for: for a child you provide food, housing, education, and so on; for a feature you ensure its correctness over time, watch its integration and its dependencies, and so on.
- Consider that making the model more complex by adding more data also increases the cost of maintaining and retraining it, on top of the general cost of infrastructure.
- To see quick results: happy sponsors and motivated teams.
- To reduce time-to-market: sometimes going out quickly with something 'not too good' is better than going out later with something 'good'. You have to see the real profit as the integral of the profit curve over time, and not just as the value of the curve on the final delivery date (the toy calculation below illustrates this).
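To make the profit-integral point concrete, here is a toy calculation with completely made-up numbers: a 'not too good' model shipped early can accumulate more total profit than a 'good' model shipped months later.

```python
import numpy as np

# Toy numbers: total profit is the area under the profit-per-week curve,
# not the profit rate at the final delivery date.
weeks = np.arange(52)

# Strategy A: ship a "not too good" model at week 4, earning 1.0 per week.
profit_a = np.where(weeks >= 4, 1.0, 0.0)

# Strategy B: ship a "good" model at week 30, earning 1.5 per week.
profit_b = np.where(weeks >= 30, 1.5, 0.0)

print("Total profit A:", profit_a.sum())  # 48.0
print("Total profit B:", profit_b.sum())  # 33.0
```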
Does this mean that once we have a first iteration, all the great improvement ideas are never going to be executed? Not at all!
It only means that we will be able to resort to more information in order to decide on the next best step. And iterate in that direction.
Without a quick baseline we are left standing only on assumptions and hunches: we can't do a marginal analysis of each improvement and, what's worse, we run the risk of costs and timelines skyrocketing in no time.
In short, the sooner we base our decisions on concrete measurements rather than on chains of hypothetical reasoning, the better.