What Do Teams Really Want From An MLOps Engineer?

Tom Parker
Published in Ubuntu AI
8 min read · Mar 5, 2024

Like any relatively new specialism within technology, MLOps Engineering has been, and will continue to be, an evolving and adapting role, particularly given the speed at which ML and AI are currently moving.

It’s an exciting place to be, but one with a lot of gray area in what teams want from an MLOps Engineer, making it a tricky place both to find a role and to grow a team. Like its sibling DevOps, it’s important to remember that there is no single definition of an MLOps Engineer: one team’s idea of the role, and the skills needed to perform it, can be very different from another’s, though they may share commonalities in the tasks performed and the technologies used to perform them.

So how do we provide a useful answer to what teams want from an MLOps Engineer? To start, let’s do what data scientists and engineers do well: dig into some live data and see what teams are hiring for in the real world!

What data are we using?

We’ve (painstakingly) manually reviewed 310 live MLOps positions, advertised across various platforms in Q4 2023. We charted common, and not so common, technical requirements, grouping them into the key areas you might expect to find in an MLOps position, whilst also looking for outlier or unusual skill sets.

These positions are real and currently, or very recently, live, from teams hiring across Europe and the United States. They are industry agnostic, and diverse in terms of company size and scale. They weren’t chosen specifically for this article, but were originally collated by a recruitment team looking to track the breadth of the MLOps market.

How did we group this?

We created eight groups based on core technologies used in this space, and an ‘outliers’ group:

  • Containers: Docker and Kubernetes
  • AWS
  • Azure
  • GCP
  • Python
  • Workflow: Kubeflow, MLflow, SageMaker
  • Frameworks: PyTorch, TensorFlow
  • Observability & Monitoring
  • Outliers: wider infrastructure, data, ML or specialist tools
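As a rough illustration of this grouping, here is a minimal Python sketch of how descriptions could be bucketed into the groups above and turned into percentages. Note this is purely illustrative: the keyword map, substring matching and sample descriptions are stand-ins, not our actual dataset or method (the review itself was manual).

```python
from collections import Counter

# Illustrative keyword map for the groups above (not the exact terms we matched on).
GROUPS = {
    "Containers": ["docker", "kubernetes"],
    "AWS": ["aws"],
    "Azure": ["azure"],
    "GCP": ["gcp", "google cloud"],
    "Python": ["python"],
    "Workflow": ["kubeflow", "mlflow", "sagemaker"],
    "Frameworks": ["pytorch", "tensorflow"],
    "Observability & Monitoring": ["observability", "monitoring"],
}

def tally(descriptions):
    """Count how many role descriptions mention each group at least once.

    Naive case-insensitive substring matching; a real pass would need to
    handle word boundaries and synonyms.
    """
    counts = Counter()
    for text in descriptions:
        lowered = text.lower()
        for group, keywords in GROUPS.items():
            if any(keyword in lowered for keyword in keywords):
                counts[group] += 1
    return counts

# Toy sample, standing in for the 310 real descriptions.
sample = [
    "Strong Python, Docker and Kubernetes in production on AWS",
    "MLOps engineer: MLflow pipelines, PyTorch models, GCP",
]
counts = tally(sample)
for group, n in counts.most_common():
    print(f"{group}: {n}/{len(sample)} ({100 * n / len(sample):.0f}%)")
```

A posting can count towards several groups at once, which is why the percentages discussed below do not sum to 100%.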

What questions are we asking of the data?

We approached the data with a very open mind, looking to build a picture of what teams currently want from an MLOps Engineer and to surface any patterns or common requirements that may be useful for those seeking a role, or hiring, in this space. Is there consensus on what an MLOps Engineer looks like technically? Are there outlier skills and technologies we should keep in mind?

What did we find?

Point one: hiring teams want you to have experience of A LOT of tools and technologies! You likely knew this, but the breadth is astonishing: drawing from the depths of DevOps and Infrastructure Engineering, getting deep into container and cloud tooling, being a killer Python dev, all the way through to utilizing not only broader data tooling knowledge but some highly specialized, often bleeding-edge machine learning technologies. In short, teams want you to be a one-person army, an engineering superhero, equipped with a seemingly unending breadth of skills and knowledge!

But don’t fear: many of the technical skills highlighted across the 310 role descriptions weren’t strict requirements, and we will come on to these when we discuss the wider outlier technologies. There were, however, three core skills that a large percentage of roles required, and these form the base of what teams currently want from an MLOps Engineer.

The Big Three

Containers

Coming in at the top of the requirements list, and highlighted in 40% of the total group, was knowledge of container tooling, specifically Docker and Kubernetes. Traditionally this has been the domain of DevOps, Reliability and Platform Engineers, but it has become a fundamental part of MLOps Engineering. A lack of knowledge here puts you at a significant disadvantage to those who have it, and it is likely to be a key development area for those moving into MLOps from ML or Data Engineering backgrounds, who may not have had the opportunity to work with container tooling in production.

The level of knowledge required varied, from simply having a baseline understanding of container tooling through to hands-on experience containerizing complex ML models in production. As we know, Kubernetes can also be more or less complex depending on the setup, whether through a cloud managed service or getting really hardcore and spinning up your own clusters on bare metal. This is team dependent, but the key takeaway is that a large proportion of teams currently hiring MLOps Engineers require some form of container knowledge.

Python

Hot on the heels of container knowledge was Python, with 37% of the total group highlighting it as a core skill. This makes total sense: it is already likely to be a core skill for most people moving to MLOps from data backgrounds, with the added benefit of being linked to the wider ecosystem of Python-based data tools and technologies.

Whilst container tooling pipped Python in the ‘most mentioned’ stakes, teams were looking for stronger Python knowledge than any other technology we saw. Whilst many were comfortable with exposure to containers and other technologies we will discuss, strong Python engineering came out as a foundational skill set for a large percentage of teams.

Where Python wasn’t required, in almost every case software engineering knowledge was, and we saw a variety of languages: C++, Scala, Kotlin, Rust and Golang. If you are looking to make the move to an MLOps position, or looking for a new team, Python is likely your number one skill, or at least a willingness to move in this direction from another language. If you are a die-hard fan of any of the wider languages discussed, there are fewer, but still some, options for you.

Cloud

When combining AWS, Azure and GCP, cloud technologies were actually the most requested skill sets from the group, but we felt it more useful to break this down individually and give a picture of what cloud providers teams are using and what knowledge they are seeking.

Given its general prevalence, it isn’t unexpected that AWS topped the list, with 36% of teams requesting some level of knowledge. Azure and GCP were way back, at 17% and 14% respectively. As with container tooling, teams were more flexible about the level of knowledge here than for other technologies, with exposure to operating ML pipelines in the cloud often good enough alongside strengths in other areas.

A key finding was that in many instances teams were cloud agnostic in their requirements: whilst usually discussing the cloud provider they used, they did not make it a hard requirement to have come from that environment. This also skews the data, and it is important to note that the percentages above include descriptions mentioning two or more of the providers. In many cases AWS was preferred, with Azure or GCP secondary, further reducing the real number of teams using the latter two as their main provider. The takeaway is that in the majority of cases some cloud knowledge is required, but many teams will show flexibility on the environment you’ve worked within.
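To make that skew concrete, here is a small, hypothetical Python sketch showing how postings that mention two or more providers inflate each provider's individual count relative to the de-duplicated number of cloud-requiring postings. The postings and the matching logic are made up purely for illustration.

```python
# Illustrative postings; the real dataset held the 310 reviewed descriptions.
postings = [
    "Experience with AWS required; Azure or GCP also considered",
    "Deep AWS knowledge, EKS and SageMaker",
    "GCP-based ML platform team",
    "No cloud requirement; on-prem Kubernetes",
]

providers = {"AWS": "aws", "Azure": "azure", "GCP": "gcp"}

# Per-provider counts: one posting can be counted under several providers.
per_provider = {
    name: sum(1 for p in postings if keyword in p.lower())
    for name, keyword in providers.items()
}

# De-duplicated count: postings mentioning at least one provider.
any_cloud = sum(
    1 for p in postings if any(kw in p.lower() for kw in providers.values())
)

print(per_provider)
print(sum(per_provider.values()), "provider mentions vs", any_cloud, "cloud postings")
```

With this toy data there are five provider mentions across only three cloud-requiring postings, which is exactly the kind of inflation the percentages above carry.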

Wider tooling and technologies

As mentioned above, the descriptions showed a huge array of tooling. Frameworks were required in many instances, with PyTorch and TensorFlow alone appearing in 17% of the descriptions. Kubeflow, MLflow and SageMaker were requirements in 12%. It is worth highlighting that whilst we were tracking these tools specifically, many others appeared, and the main takeaway is that, given the very nature of the role, you are likely to have exposure to some tooling in this area even where it isn’t a key requirement.

Observability and monitoring was an interesting case. We were tracking mentions of these terms as well as specific technologies, yet they appeared in only 4% of the descriptions. Whilst we’re sure teams are paying attention to data and model quality or data drift, it doesn’t appear many are seeking people with experience implementing tooling and processes around it.

From this point, the range of technologies mentioned gets really broad! From IaC, with Terraform topping the list, to experiment tracking using Hydra or Neptune; to data processing and event streaming, with teams most commonly aiming for experience of Kafka and Spark; through to very specialist knowledge of tools like NVIDIA Triton, TensorRT or Ray Serve.

On top of this, and harder to quantify, were teams looking for experience within certain environments, particularly working with large GPU clusters, HPC, deep learning inference models, LLMs or hardware accelerators. These tended to be very specific requirements for a very small number of positions, or bonus skills.

What did we miss?

Whilst we have purposely focused on technical requirements in attempting to answer what teams really want from an MLOps Engineer, we can’t leave without mentioning a factor that may be even more important: cultural fit.

Whilst reviewing the descriptions, we could have simply tracked cultural terms to give us an idea of who teams are seeking, not just what they are seeking. But cultural fit is far harder to quantify, particularly when using ‘buzzwords’. We tended to see cliché terms popping up again and again: collaborative, an ability to take ownership, an inquisitive nature.

To do this justice, we need far more detail about the individual teams, environments and cultures, and this is perhaps a brand new article for another time!

Conclusions

You’ve got to know a whole lot to be an MLOps Engineer! However, from our time spent reviewing hundreds of specifications and role descriptions, our view is that teams are in general appreciative of the fact that you can’t be an expert in every area we’ve discussed; there is a balance.

There are some fundamental skills that will stand you in good stead: Python development and the associated data tooling; container tooling, particularly if you’ve had the opportunity to containerize ML models in a production environment; and exposure to cloud environments, whichever provider that may be.

Outside of this, you will likely be exposed to a myriad of tools and technologies, with different levels of knowledge across them. This wider tooling is unlikely to be the defining factor in what teams really want from an MLOps Engineer.

It is also important to note that this will change: it is a snapshot of current needs, and as ML continues to accelerate through 2024 we will see changes in what MLOps looks like, just as we have with the likes of DevOps as it has matured over the years.

The review of this group, and our general feel for the market, indicates that what teams really want in an MLOps Engineer is a balance of tooling knowledge drawn from a range of technical areas: infrastructure, data and software engineering. The weighting depends on a team’s current needs and environment, and we hope that this snapshot of what the market looks like may assist you in a job search, when hiring for your team, or simply in guiding your next area of personal development.


Tom Parker · Ubuntu AI
I help global teams hire DevOps, SecOps & MLOps