Machine Learning Engineer: Making Your Dreams Come True

Published in Life at Telkomsel, Feb 8, 2021

“With great power comes great responsibility” — Spiderman

Artificial Intelligence is proving to be one of the most significant technological advancements across all industries in recent decades. It ranges from simple predictive modeling to more complex applications, such as chatbots or self-driving cars. It has proven useful not only in research: it has also been applied to help companies achieve their goals, from improving customer retention rates to deriving better insights from big data and even serving as automated customer care agents.

With the global machine learning market anticipated to grow from $1.4B in 2017 to $8.8B by 2022, according to a recent report by Research and Markets, industry demand for a stable artificial intelligence team has exploded. While artificial intelligence itself is still growing rapidly, the roles inside an AI team are also evolving. According to Gartner, one emerging role that an AI team needs is the Machine Learning Engineer.

So, what exactly does a machine learning engineer do? How does the role differ from that of a data scientist? How is it positioned inside an AI team? Here I share some of my experience working as a machine learning engineer in recent years.

What is a Machine Learning Engineer?

The rising need for machine learning engineers comes from the burden of moving AI projects from the development environment to production. AI model development, which includes data preprocessing, experiments, and model building, is done by data scientists. Getting those models into production, however, is far from trivial.

There are many things to consider when productionizing an AI pipeline, such as inference speed, coding framework, SLA, platform requirements, scalability, feedback loops, and more. As the research field keeps producing more complex AI models, this work will surely become more challenging in the future. Gartner estimates that, through 2023, 50% of IT leaders will struggle to move their AI projects past proof of concept (POC) to a production level of maturity. This is where the machine learning engineer comes in: to reduce that high failure rate.

“ML engineers need to ensure that AI platforms deliver against technical and business SLA requirements. ML engineers are expected to be the connecting fabric with data scientists from an IT perspective and ensure their ML models run well in production.”
- Arun Chandrasekaran, VP Analyst at Gartner

As Telkomsel evolves into a digital telco company, we have leveraged big data and artificial intelligence. In such circumstances, more and more exciting projects and mind-blowing ideas arise, ready to be executed. In our team, the IT Business Intelligence and Analytics Group, we are witnessing the change: machine learning is no longer used only to drive business decisions. Its usage has reached a level where it has a direct impact on customers.

Realizing the challenges and opportunities, we concluded that we need a robust and mature deployment strategy for every model we build. Delivering results within the business SLA, maintaining 24/7 system uptime, and making the most of our big data ecosystem are just some of the requirements for a strong AI foundation. At the same time, the business requirements for our projects are constantly being added to or changed. So, separating the roles of data scientist and machine learning engineer can be a huge benefit, not only for reliability but also for the overall team's agility. The data scientists can concentrate on building their dream models, while machine learning engineers make them real in a well-managed and carefully crafted production system.

A Glimpse of Life as a Machine Learning Engineer at Telkomsel

As machine learning engineers, we work closely with data scientists. We act as the connector between their research activities and the live production environment. We are given their best-performing model, and we then try to unleash its power so that others can benefit from it. Some people describe data scientists as the statistics and mathematics geeks, and machine learning engineers as their software development counterparts.

Our tasks vary between projects and teams, but in general, we do the following activities.

1. Code Enhancement and Fortification

Data scientists love Python and Jupyter Notebook. Both are very helpful, since Python offers a broad range of open-source libraries that help data scientists run their experiments. Hence, most of the time we receive Jupyter Notebook files as the research output.

We then translate some of the code inside the notebook into a production-ready script. Just like software developers, we enforce clean code, modularity, and efficient runtime. Often, we need to deal with concurrency, parallel processing, or distributed computing. Machine learning algorithms (especially deep learning) are well known for being resource-hungry, so we should treat them carefully. In rarer cases, we need to change the programming language, translating Python into Java, Scala, JavaScript, C++, or other languages. These practices come from software engineering best practices and are needed to ensure our system's speed, stability, and scalability.
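
To illustrate, here is a minimal sketch of what a notebook cell might look like after being turned into a modular script with simple parallel scoring. Everything in it is an assumption made for illustration: the Customer fields, the toy_model function (a stand-in for a real trained model), and the chunk and worker sizes are hypothetical and not taken from our actual projects.

```python
from concurrent.futures import ThreadPoolExecutor
from dataclasses import dataclass
from typing import Callable, Iterator, List, Sequence


@dataclass(frozen=True)
class Customer:
    """Hypothetical input record; the field names are illustrative only."""
    customer_id: str
    monthly_usage_gb: float
    tenure_months: int


def build_features(customer: Customer) -> List[float]:
    """Pure function: the same customer always yields the same features."""
    return [customer.monthly_usage_gb, float(customer.tenure_months)]


def toy_model(features: Sequence[float]) -> float:
    """Stand-in for a trained model; returns a score clipped to [0, 1]."""
    usage, tenure = features
    raw = 0.02 * usage + 0.01 * tenure
    return max(0.0, min(1.0, raw))


def chunked(items: Sequence[Customer], size: int) -> Iterator[Sequence[Customer]]:
    """Split the input into fixed-size chunks so each worker gets one unit of work."""
    for start in range(0, len(items), size):
        yield items[start:start + size]


def score_chunk(chunk: Sequence[Customer],
                model: Callable[[Sequence[float]], float]) -> List[float]:
    """Score one chunk sequentially."""
    return [model(build_features(customer)) for customer in chunk]


def score_all(customers: Sequence[Customer],
              model: Callable[[Sequence[float]], float] = toy_model,
              chunk_size: int = 2,
              workers: int = 4) -> List[float]:
    """Score all customers in parallel chunks, preserving the input order."""
    chunks = list(chunked(customers, chunk_size))
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # Executor.map returns results in the same order as the inputs.
        results = pool.map(lambda chunk: score_chunk(chunk, model), chunks)
        return [score for chunk_scores in results for score in chunk_scores]
```

Each function does one thing and can be unit-tested on its own, which is hard to do with a long notebook cell. Threads are used here only to keep the sketch short. For CPU-bound Python code, a process pool or a distributed engine would likely be a better fit.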

2. Design Architecture and Tools Selection

When we talk about the production system, we need to ensure the end-to-end pipeline runs as smoothly as possible. We need to connect the data sources (provided by data engineers), the machine learning model (provided by data scientists), and the end-users. So, a stable and ready-to-scale architecture is a must.

We should consider how the data is consumed from the data lake, where the consumed data will reside, how the machine learning model interacts with the data, and how the results are delivered to the end-user. The answers depend on the requirements: for example, whether we need to build a real-time scoring pipeline or a batch scoring pipeline, or whether we use deep learning or a traditional machine learning algorithm. Thus, we need to be comfortable designing and drawing our own architecture diagrams.
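
The difference between batch and real-time scoring can be sketched in a few lines. This is a simplified illustration under stated assumptions, not our production design: the record layout, the usage_gb threshold, and the function names run_batch and run_stream are all hypothetical. The point is that both modes can share the same scoring logic and differ only in how data arrives and how results leave.

```python
from typing import Any, Callable, Dict, Iterable, Iterator, List

Record = Dict[str, Any]
Model = Callable[[Record], float]


def toy_model(record: Record) -> float:
    """Stand-in for a trained model; the threshold is an arbitrary assumption."""
    return 1.0 if record["usage_gb"] > 50 else 0.0


def score_one(record: Record, model: Model) -> Record:
    """Shared scoring logic used by both pipelines."""
    return {"id": record["id"], "score": model(record)}


def run_batch(source: Iterable[Record], model: Model) -> List[Record]:
    """Batch mode: read the whole source, score it, and return the full
    result set (for example, to be written to a table in one go)."""
    return [score_one(record, model) for record in source]


def run_stream(events: Iterable[Record], model: Model) -> Iterator[Record]:
    """Real-time mode: score each event as it arrives and emit the result
    immediately, without waiting for the rest of the input."""
    for event in events:
        yield score_one(event, model)
```

In batch mode, the caller gets nothing until every record has been scored. In streaming mode, each result is available as soon as its event has been processed, which is what a low-latency use case would need.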

Distributed computing engines like Apache Spark and workflow management platforms like Apache Airflow and Kubeflow are just a few of the many tools ML engineers employ to build data pipelines. We also use frameworks and libraries from the machine learning ecosystem, such as scikit-learn, Kedro, TensorFlow, PyTorch, Keras, and others.

3. Integration with Other Systems

To deliver results to end-users, we need to integrate with other systems. Commonly this is done through API-based communication. In some cases, however, users may ask for database-to-database integration, or even just plain text files dumped onto their servers. Again, this depends heavily on the requirements and capabilities of the surrounding systems.

Before the integration phase, we need to agree on the API blueprint with the teams that own the surrounding systems: how they will send requests to our service, how we should respond, and what transactions-per-second (TPS) and latency are required. We also work closely with the Quality Assurance (QA) engineers to test the API after development. Let's not forget to also review the security of our API with the security team 😃.
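
As a rough idea of what such a blueprint might pin down, here is a minimal sketch of a request and response contract with a latency check, using only the Python standard library. The field names, the 200 ms latency budget, and the handle function are assumptions made for illustration. A real blueprint would be agreed with the consuming teams and served through a proper web framework.

```python
import json
import time
from dataclasses import asdict, dataclass
from typing import Callable

LATENCY_BUDGET_MS = 200.0  # assumed SLA figure, for illustration only


@dataclass
class ScoreRequest:
    request_id: str
    customer_id: str


@dataclass
class ScoreResponse:
    request_id: str
    customer_id: str
    score: float
    latency_ms: float
    within_sla: bool


def parse_request(body: str) -> ScoreRequest:
    """Validate the incoming JSON body against the agreed contract."""
    payload = json.loads(body)
    if not isinstance(payload, dict):
        raise ValueError("request body must be a JSON object")
    missing = [key for key in ("request_id", "customer_id") if key not in payload]
    if missing:
        raise ValueError("missing fields: " + ", ".join(missing))
    return ScoreRequest(str(payload["request_id"]), str(payload["customer_id"]))


def handle(body: str, model: Callable[[str], float]) -> str:
    """Parse the request, score it, and report the latency in the response."""
    started = time.perf_counter()
    request = parse_request(body)
    score = model(request.customer_id)
    latency_ms = (time.perf_counter() - started) * 1000.0
    response = ScoreResponse(
        request_id=request.request_id,
        customer_id=request.customer_id,
        score=score,
        latency_ms=latency_ms,
        within_sla=latency_ms <= LATENCY_BUDGET_MS,
    )
    return json.dumps(asdict(response))
```

Echoing the request_id back makes it easier for the calling system and the QA engineers to trace a single transaction, and the within_sla flag gives a simple signal to monitor against the agreed latency.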

TL;DR

As discussed above, a machine learning engineer is someone who may lack the in-depth scientific skills of a data scientist but has other in-demand skills, including programming, machine learning and deep learning frameworks, MLOps architecture, and data engineering. The role exists because machine learning models in production need to scale. It is one thing to run experiments, train a model on a specific batch of data, and end up with a highly accurate model. It is another to deploy a model that feeds on streaming data and transforms data read from multiple sources, at scale, for millions of users, within the business SLA.