Taking the leap: from Data Scientist to Machine Learning Engineer

Learn the difference between these two roles and their functions within the Data area

Published in etermax technology · 8 min read · Apr 12, 2022


By Mailen Gómez Mayol and Ezequiel Panzarasa, Machine Learning Engineers at etermax

Data Scientist vs. Machine Learning Engineer

Intro:
When comparing the two roles, it is difficult to generalize. The comparison depends on the size of the company, how long it has been doing Machine Learning (ML) (that is, its culture), the industry, and even the specific teams involved.
They both have some skills in common and others that are very different. These skills complement each other and come into play at different points in the data lifecycle.

Disclaimer:
Both are very broad terms. The most important factor is probably the size of the company and/or the team. In startups (or in small teams), it is very common for both roles to fall on the same person. By contrast, in large organizations these roles may be so specialized that there is almost no overlap in tasks, or even in skills.
It also depends on the business of the company: is Machine Learning at the core of its business, or is it more of an “extra” optimizer? The more specialized the organization is in ML, the higher its technical requirements will be, and so the greater the distinction between these two roles.
Finally, there is a cultural factor. The longer an organization has been developing and deploying machine learning models (or flows), the more optimized its workflow will be and the more defined these roles will be. In organizations that are just beginning to adopt these practices, or that are just beginning to build their own data teams, it is common for the roles not to be separated. As the true needs of the business are discovered, teams tend to specialize and the roles begin to diverge.

The bottom line is that the functions of these two roles are going to be more defined by the organization than by the title of the position.

The roles and the data life cycle:
The data exploitation cycle begins with Data Engineering. A Data Engineer is responsible for automating the process of data ingestion, validation, transformation and storage. In this article we are not going to delve into the tasks or skills of this role. What we need to know is that the Data Engineer makes sure that all the information we need is available in the correct tables (if the data is non-tabular, it will be in another format).

From here, we enter the field of Data Science. The Data Scientist is the one in charge of extracting value from data. Broadly speaking, the tasks of this role include:

● Acquiring specific knowledge of the business

● Data cleaning and exploration

● Generating insights from such exploration

● Suggesting actionables based on those insights

Actionables are usually tied to the development of some ML model, but we don’t want to reduce the scope of the role to this. For example, if a data scientist is analyzing feedback data from a workplace survey and finds a relationship between chair comfort and productivity, the likely actionable is to swap out the existing chairs for more comfortable ones. It wouldn’t be worth building an ML model that predicts how much more productive an employee is based on which chair they use.

Having said this, it is true that 90% of actionables will become ML models based on the insights obtained. It is important to highlight the specific business knowledge that the Data Scientist needs in order to generate these insights, since it is one of the main differences from the ML Engineer role.

For their part, an ML Engineer takes over once the work of the Data Scientist is finished. While an ML Engineer is involved in the development of the ML model, they are more focused on how that model is going to be deployed in a production environment. How does this model scale? How is it going to be deployed? How is it going to be trained? How often? These are some of the questions that an ML Engineer considers during the model development stage.

Looking for an analogy, it would be fair to say that the ML Engineer’s job is to get the model out of the lab and into the real world. This implies a greater need for knowledge related to infrastructure and coding skills more oriented towards software engineering than data cleaning and transformation. The ML Engineer adds value to the business by bringing the models that the Data Scientist creates to production. Therefore, their tasks are more related to:

● Automating processes (data query, model training)

● Infrastructure design to support the model

● Model Monitoring

● System maintenance

The ML Engineer designs and maintains an ML system, which has an ML model as its central axis.
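
As a rough illustration of the tasks listed above, the sketch below wraps a deliberately trivial model in a small "system" that automates training and monitors prediction error. Everything in it (the `MeanModel` class, the `MLSystem` wrapper, the `error_threshold` parameter and the sample numbers) is hypothetical and invented for this example; it is not a description of any real platform.

```python
from dataclasses import dataclass, field
from statistics import mean


class MeanModel:
    """Toy model: always predicts the mean of the training targets."""

    def __init__(self) -> None:
        self.value = 0.0

    def fit(self, targets: list[float]) -> None:
        self.value = mean(targets)

    def predict(self) -> float:
        return self.value


@dataclass
class MLSystem:
    """Hypothetical wrapper: the model is only one piece of the system."""

    model: MeanModel
    error_threshold: float  # assumed alerting threshold, chosen arbitrarily
    errors: list[float] = field(default_factory=list)

    def train(self, targets: list[float]) -> None:
        # Automated training step; retraining also resets the monitoring window.
        self.model.fit(targets)
        self.errors.clear()

    def predict_and_log(self, actual: float) -> float:
        # Serve a prediction and record its absolute error for monitoring.
        # (In a real system the true value usually arrives later.)
        prediction = self.model.predict()
        self.errors.append(abs(prediction - actual))
        return prediction

    def needs_retraining(self) -> bool:
        # Monitoring rule: flag the model when the mean error exceeds the threshold.
        return bool(self.errors) and mean(self.errors) > self.error_threshold


system = MLSystem(model=MeanModel(), error_threshold=2.0)
system.train([10.0, 12.0, 14.0])
system.predict_and_log(actual=12.5)   # prediction is close to the actual value
print(system.needs_retraining())
for actual in (20.0, 21.0, 22.0):     # the data shifts, so errors grow
    system.predict_and_log(actual=actual)
print(system.needs_retraining())
```

The point of the sketch is the shape rather than the model: training, serving and monitoring live around the model, and the monitoring rule is what decides when the automated training step should run again.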

Lately, process automation, monitoring and quality control have gained prominence thanks to a paradigm that has been growing in popularity, called MLOps (similar to what happened with DevOps in the past).

That’s the theory. But how does this work at etermax?

At etermax there is a Data Science team and a Machine Learning team, and the different profiles are assigned to projects depending on the type of skill required. The Data Science team focuses on the development of new models, while the Machine Learning team focuses on bringing machine learning models into production.

The Data Science team at etermax is focused on finding opportunities to optimize products and processes by exploiting large data sets. The tools used by a Data Scientist are very varied, ranging from simulations and statistical techniques to machine learning and deep learning models, among others. The objective of the area is to find insights in data, propose actions and measure the results. Some of the team’s projects include natural language processing, topic analysis and lifetime value (LTV) calculation.

On the other hand, the Machine Learning team at etermax is involved in the end-to-end development of models based on machine learning. The Machine Learning team works together with the AdTech team in order to optimize real-time procedures such as auctions between potential buyers of advertising space, ad delivery and user acquisition processes. The MLEs are responsible for developing the models, integrating them into production environments, and monitoring them. Unlike the problems addressed by the Data Science team, the processes to be optimized in AdTech are limited by time, so the models must be highly efficient.
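
To make the time constraint concrete, here is a minimal sketch of checking a model's measured latency against a budget and falling back to a default value when the model is too slow. The function `predict_within_budget`, the two toy models, the `bid_floor` feature and the budget values are all assumptions made up for illustration, not etermax's actual setup.

```python
import time
from typing import Callable


def predict_within_budget(
    predict: Callable[[dict], float],
    features: dict,
    budget_ms: float,
    fallback: float,
) -> tuple[float, float]:
    """Return (value, elapsed_ms); use `fallback` if `predict` was too slow.

    Note: the budget is checked after the call returns. This sketch does not
    interrupt a slow model; it only refuses to use a late answer.
    """
    start = time.perf_counter()
    value = predict(features)
    elapsed_ms = (time.perf_counter() - start) * 1000.0
    if elapsed_ms > budget_ms:
        return fallback, elapsed_ms
    return value, elapsed_ms


def fast_model(features: dict) -> float:
    # Hypothetical cheap model: a single multiplication.
    return 0.5 * features["bid_floor"]


def slow_model(features: dict) -> float:
    # Hypothetical expensive model, simulated with a 50 ms sleep.
    time.sleep(0.05)
    return 0.5 * features["bid_floor"]


request = {"bid_floor": 2.0}
print(predict_within_budget(fast_model, request, budget_ms=1000.0, fallback=0.0))
print(predict_within_budget(slow_model, request, budget_ms=10.0, fallback=0.0))
```

With a generous budget the fast model's prediction is used; with a 10 ms budget the slow model's answer arrives too late and the fallback is returned instead.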

If you are interested in seeing an example of the problems that the ML team solves at etermax, you can take a look at this article!

Also, if you’re interested in seeing a problem solved by the DS team, you can read this article.

Now: why did we, specifically, go from DS to ML?

Mailen Gómez Mayol:
I started my career as a scientist, and soon developed an interest in machine learning models. My first project was a deep learning model to improve images. After months of thinking about the model and its variables, cleaning data and training, it was time for the model to be used. To do this, I needed to leave the Jupyter notebook I used for development, write a script that did only what was necessary, and put together a series of instructions to train the model and make predictions. This process was one of the most interesting things about the project. At that moment I discovered that I was no longer going to write code just for myself, and that other people needed to be able to understand it. I started to learn about good coding practices and software engineering. That was when I knew that what I like most about data science is putting the models (or rather, their results) at the user’s fingertips. At etermax, as a Machine Learning Engineer, I am part of the Machine Learning team and I can do both things: develop models and also put them into production. My tasks include developing, productionizing and monitoring high-performance models to optimize processes within the AdTech team.

Ezequiel Panzarasa:
Personally, the decision to become a Machine Learning Engineer was linked to my interest in software engineering. Throughout my (still short) career, I’ve been lucky enough to be able to move through different roles and make a relatively informed decision about where I wanted to orient my future.

I took my first steps in the field at a start-up that provided data consulting services. I was the 8th employee, so the roles were not defined at all. The team functioned on a combination of a willingness to take on new tasks and a fake-it-till-you-make-it attitude. I started doing Data Engineering tasks, automating and modernizing the data pipelines of a bank. When this same bank needed to mine the data it now had available, my opportunity to take on Data Science assignments arose. So I did my first analysis and developed my first model. Then it was time to bring the model to production and maintain it, and with this came my first Machine Learning Engineering tasks. Of course, none of this was done following best practices or using the latest technologies (these were the things that took me the longest to learn). But the system worked! That was enough to earn me a small reputation within the team, and I became part of all the projects that involved DS or ML. That’s how I got more into that world, learning new things with each project, but without completely abandoning the Data Engineering tasks: help was always needed with a pipeline.

After a while I went to work in another start-up that wanted to develop a product based on Artificial Intelligence. I was very surprised by the difference in work approach between a service-oriented company and a product-oriented company. In this new role, I worked mainly as a Data Scientist. I learned a lot about the business and the specific industry. The project was ambitious and challenging, but the Software Engineering part was missing. I didn’t just want to get insights and develop models. I wanted to develop complete ML systems. Not only was I interested in a model having good predictions, but I was also interested in things like where that model was going to run, what the data flow was going to be like, how often it was going to be trained. I missed the engineering part.

That’s how I got to etermax. I learned about it through a job advertisement: “ML Engineer wanted”. Reading the job description, I knew it was what I was looking for. The ML Engineer role falls at the intersection between Data Science and Software Engineering. You do not need to be a specialist in either discipline, but you do need a solid command of both. I knew nothing about the AdTech industry, and had never worked on anything as big as etermax. Without a doubt, I had (and still have!) a lot to learn. I remember feeling a bit of impostor syndrome in my early days: I didn’t know many of the tools they worked with, and the infrastructure was too big to understand all of its components in detail in a few weeks. But little by little I took on different tasks and learned about the technologies involved. With time, I was repairing things when they broke, or introducing small features. I was putting together models and taking them to production, and creating data pipelines. Every task helped me grow until I reached a level of ownership that today allows me to be the point of reference for the platform that my team develops and maintains at etermax.
