Building MLOps at Roche: an interview with Christophe Chabbert & Oswaldo Gomez

Tom Parker · Published in Ubuntu AI · 21 min read · Jul 17, 2024

Building and growing MLOps platforms and teams is a continuing and evolving challenge for organisations looking to deploy ever more numerous and complex models into production. There are clear advantages in deciding to adopt MLOps within your organisation, but also pitfalls and lessons to be learnt.

I was lucky to spend time with Christophe Chabbert, AI & ML Product Line Manager, MLOps, and Oswaldo Gomez, Lead IT Expert MLOps Engineer, who have been instrumental in the growth of MLOps at Roche, and who were kind enough to discuss their journey, and share their thoughts during a recent interview.

Christophe and Oswaldo provide insights into:

● Their initial decision to build an MLOps platform

● The key points to consider when deploying models to production

● The biggest challenges they have faced

● Their recommendations for making models a success and serving machine learning at scale

● How they have managed a complex technical stack with ever-growing tooling and options.

Let’s get into the interview!

How did you come to the conclusion that you needed an MLOps platform in your organisation?

Christophe
So I think it actually started four or five years ago. And it started when we observed that there was a widespread challenge in the organisation. Indeed, we have a lot of scientist colleagues in drug discovery who want to use more machine learning and AI models in their daily work to make decisions.

Conversely, we also had data scientists who were building such ML and AI models and who were really keen on putting them in the hands of end users, the scientists that I just mentioned and who want to use models in their daily work. And what we observed repeatedly was that bridging that gap was extremely challenging, and a lot of the challenges were of an operational nature.

If they wanted to be successful, the data scientists who built models had to understand how to scale, how to make their workloads reliable, how to integrate their pipelines with the key data systems and sources, and how to integrate the output of the models in the right context so that their colleagues could use them. This was particularly important for end users who didn’t know how to call an API, for example.

All of these were very big hurdles for many, and a lot of scientists didn’t manage to overcome them. Ironically enough, the ones that did manage often ended up trapped in some sort of operational nightmare: the more models they put in production, the more work they had to do to maintain them. It was as if they were somehow punished for being successful.

This is where we realised, OK, we need to do something about that, we need to solve that problem. This is where it began — and this is also when the MLOps term got coined and it got defined a little bit more specifically over time. But for us, it really all started with the need to bridge that gap.

Oswaldo
Christophe was one of the visionaries, because it wasn’t obvious to create an MLOps team so early in the innovation cycle. Before I joined Roche, I was part of a company in the financial industry, and only a few teams in that entire organisation of 20,000 people were interested in operationalising AI. I was on a team that also saw the need, around five years ago. Our idea was to have a 50/50 split between scientists and engineers, but at some point I was the only MLOps engineer and the rest were data scientists. Even once we had this idea of building an MLOps team, it was difficult to find talented engineers with that blend of cloud computing skills and ML knowledge.

Around this time I was offered a job in Poland to come work for pRED MLOps where they had a full team of MLOps engineers, so I decided to move since I did not want to be alone in the journey of putting models in production.

So to me, the fact that Christophe and others recognised that we needed an MLOps team at a time when the term didn’t even exist (we started out calling it AI Ops) was visionary.

Christophe
I was not the only one, by the way, but indeed the term did not exist at that time.

Oswaldo
My point being, it wasn’t obvious. But I think that once you have crossed this bridge of putting models in production, you start to see the need for scalable, reproducible, and explainable models. As an example from a previous company, one of the legacy Python packages I inherited had 1,500 unit tests, and they were all failing because it was written in Python 2 and had never been updated in its many years of serving in production. It took a gigantic effort to upgrade, and we were spending more time fixing the tests than bringing any value to the company. In the end, we lost our internal customers and had to completely abandon these models and all the work put into them. This type of situation is what led us to look at a kind of DevOps for ML, so that we could iterate faster and provide more value to our internal customers. That is how it began.

What are the key elements to consider to successfully put a model in production?

Christophe
You’re probably about to get a long list!

I’m gonna start with a less technical perspective and more of a product and value aspect. I would say that for me, the first key aspect is to be sure that the model you’re building and deploying is going to solve an actual problem that is worth solving.

So you really need to have a good understanding of what the problem is, especially for us in the context of drug discovery, which is a very complicated process involving lots of multidisciplinary experts.

Therefore, the first thing that we really try to assess is whether we are solving a problem worth solving. To get a good feel for this, we ask: who is going to use that model? In which context? Is it going to have an impact on our value chain?

And the reason we ask this is that it is still not trivial to put a model in production in a reliable and scalable fashion. So for me, the value aspect is the first thing to confirm, because if it is not there, you are exposing yourself to a very high risk a few months down the road.

One other thing I do want to mention is the data, because without data you don’t go very far in ML and AI. It sounds obvious, but it’s still not always fully comprehended.

You want to make sure that you have a good understanding of the data landscape. Especially if you are in a large organisation like ours, this landscape can be very fragmented: data can be hosted in different systems and applications, some of which were built using a different paradigm to what you see now with cloud platforms. So you have to understand: is the data ready? Can we start working with it, ingesting it, and so on?

I think those are the two key elements.

Oswaldo

Great point of view from Christophe. In my case, I’m much more focused on the technical aspects and interested in trying to find the limits and thinking of what could break.

It’s very refreshing to take a step back and consider whether we should even put this model in production. You need to be an expert in the actual business use case; you can’t make this call if you aren’t. As an engineer, you need to be aligned with someone like Christophe, who has a unique blend of knowledge and skills to decide whether a model is worth it or not.

Once the decision to put the model in production is taken, this is where our MLOps team comes in. There are several things to consider, but one of the most important aspects is environment management. This sounds super obvious and trivial, but I can tell you that in large organisations, everybody does this differently.

I actually saw several posts and talks about this from Marvellous MLOps. Caring about environment management isn’t a glamorous topic, but it is very important.
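To make this concrete, here is a minimal sketch of one way a training run could record the exact package versions it used, so the same environment can be rebuilt when the model is containerised later. The file name and function are hypothetical and purely illustrative, not part of Roche's platform.

```python
# Illustrative sketch: snapshot the exact package versions used for a training run
# so the environment can be reproduced when the model is containerised later.
import json
from importlib.metadata import distributions


def snapshot_environment(path: str = "environment_lock.json") -> None:
    versions = {dist.metadata["Name"]: dist.version for dist in distributions()}
    with open(path, "w") as f:
        json.dump(versions, f, indent=2, sort_keys=True)


if __name__ == "__main__":
    snapshot_environment()  # store alongside the model artefact and training code
```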

Another aspect is containerization, since everything nowadays runs in Kubernetes. It is also non-trivial and must be done properly, with secure base images, multi-stage builds, security scanning, and so on.

Then you come to automation, which is where CI/CD comes into play by bringing code quality, readability and reproducibility. Testing plays a key part in ensuring that you follow best practices. You want clean code, and you must take a moment to make sure that nothing goes through a merge request without a detailed peer review.
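As an illustration of the kind of check a CI pipeline can run on every merge request, here is a small pytest-style sketch; `my_package.predict_solubility` is a hypothetical function standing in for a real model interface.

```python
# test_property_model.py: a minimal pytest-style check run by the CI pipeline.
# `my_package.predict_solubility` is hypothetical, used purely for illustration.
import numpy as np

from my_package import predict_solubility


def test_predictions_are_finite_and_complete():
    smiles = ["CCO", "c1ccccc1", "CC(=O)O"]  # ethanol, benzene, acetic acid
    preds = np.asarray(predict_solubility(smiles))
    assert len(preds) == len(smiles)   # one prediction per input molecule
    assert np.all(np.isfinite(preds))  # no NaNs or infinities slip past review
```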

I will add one more: ask yourself what load this model will have in production. Will it be called a thousand times per second, or once a month? With this knowledge, you can create an architecture around these specific needs.

Architectures look quite different if you have online inference or batch inference.

Christophe

One last one, Tom. Because we work in a large company, it is also important to remember the people aspect: the moment you put a model in production, you need so many different skill sets.

And especially in drug discovery, where you need very specialised profiles at different stages to really assess whether the model is well integrated into the workflows, you need to find the key people in the organisation who make this happen. You have to overcome silos so that everyone can work together. You need people like Oswaldo, with expertise in engineering and operations, but you also need people who are very strong with data and may have very specialised skills, such as computational chemistry. Altogether you need a very broad spectrum of expertise, and you really must bring all of it together; this is key in a large organisation.

What are the biggest challenges that you typically encounter when onboarding models onto your platform?

Oswaldo
I think it’s important to remember that data scientists aren’t necessarily software engineers, and all they want is to get their model to production.

It comes back to understanding whether assumptions have been made that may not be true. For example, if you are told that you need to handle payloads of millions of molecules, you start engineering around that assumption. But if it turns out to be only thousands of molecules instead, such wrong assumptions have a huge impact on your final architecture. As you can imagine, this is a very different situation, and we simply had to adapt. So it’s really important to have a feedback loop to continuously judge which metrics are important, and this is why MLOps is critical.

Additionally, it’s important to understand the ML code and its structure, and maybe propose some refactoring. Monitoring can also be challenging, particularly finding edge cases or detecting something that is not as it should be. Maybe contracts, like payload schemas, are not well defined at the beginning. An entire model can break if schemas are changed without being handled properly, so having this kind of contract is essential. And this comes back to what Christophe just mentioned: there are a lot of people involved in putting models in production, and it isn’t just MLOps.

So we need to have a kind of Team API, so that we can say: here is our contract with you, here is the observability dashboard, our documentation, and so on. In our case we are not building the models; we partner with many scientists, so clear communication channels, unit tests and boundaries are essential.
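One lightweight way to make such a payload contract explicit, sketched below with pydantic and hypothetical field names, is to validate requests at the boundary so that a silent schema change fails loudly instead of breaking the model downstream.

```python
# Sketch of an explicit payload contract (field names are hypothetical).
# Requests that do not match the agreed schema are rejected with a clear validation error.
from typing import List

from pydantic import BaseModel, ValidationError


class PredictionRequest(BaseModel):
    smiles: List[str]          # molecules to score
    version: str = "latest"    # which model version the client expects


class PredictionResponse(BaseModel):
    predictions: List[float]
    version: str


try:
    PredictionRequest(smiles="CCO")  # wrong type: a single string instead of a list
except ValidationError as err:
    print(err)  # the contract violation is caught at the boundary, not deep inside the model
```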

Christophe
And just to finish on that one, I want to emphasise the data aspect again. This can be a very big challenge, because others in large enterprise organisations are probably facing what we are facing: the data lives in various sources, not neatly organised in a beautiful, cloud-hosted warehouse. It could be data sitting in a couple of legacy systems that you need to integrate, and it all has to come together. This can take a lot of time.

What types of models are currently supported on your platform and how is this impacting drug discovery?

Christophe
I can start with this one. We are working with different data modalities such as text and images. We work with descriptors for chemicals, for proteins etc. So we really have a very wide scope.

We have models that for example will predict compound properties. For us it’s very important in drug discovery because you have a stage where you want to design compounds with properties of interest. These properties can be related to the efficacy of the compound, safety, solubility and many other things. As you can imagine, the more you can leverage your existing data and predict these properties in silico, the faster you can go and then you can also run much better and more targeted experiments. So we have a lot of those property predictors.

We have image classifiers.

We label text as well, such as abstracts from scientific publications, to send alerts to scientists who might be interested in specific topics.

We’re covering many different steps in drug discovery, from the very early stages of target identification and target validation all the way up to clinical lead selection.

We have single-task and multi-task models, and we cover multiple frameworks such as PyTorch, scikit-learn, and so on. We also have a couple of Transformers deployed now, so it’s quite diverse. I think this diversity is also inherent to drug discovery, because you really need a lot of very different areas of expertise and lots of different data types to come together to actually make a drug.

So I don’t know if Oswaldo wants to add anything. Maybe more on the technical front as well.

Oswaldo
I think that on the technical side, we have large models that give us different types of problems. Let’s say you have a model that fits inside a Docker image: you’re pretty much done, you basically deploy it via a REST API, leveraging serverless technologies like KServe on top of Istio and Kubernetes, and this gives us lots of flexibility for inference. But it’s sometimes a different problem if you need batch inference.
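As a rough idea of what that can look like, the sketch below wraps a scikit-learn style model in a KServe custom predictor using the KServe Python SDK. The class name, model path and payload shape (V1 protocol) are illustrative assumptions, not Roche's actual code.

```python
# Minimal sketch of a KServe custom predictor; names and paths are illustrative.
# The model artefact is baked into the Docker image and served over REST.
import joblib
from kserve import Model, ModelServer


class CompoundPropertyModel(Model):
    def __init__(self, name: str):
        super().__init__(name)
        self.model = None
        self.ready = False

    def load(self):
        self.model = joblib.load("/app/model.joblib")  # hypothetical path inside the image
        self.ready = True

    def predict(self, payload: dict, headers=None) -> dict:
        instances = payload["instances"]  # KServe V1 protocol request body
        return {"predictions": self.model.predict(instances).tolist()}


if __name__ == "__main__":
    model = CompoundPropertyModel("compound-property-model")
    model.load()
    ModelServer().start([model])
```

Deployed behind a KServe InferenceService, Knative and Istio then take care of routing and autoscaling of the endpoint.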

But sometimes we have a model that is very large and doesn’t fit nicely into a Docker image. In such a case, we can use Kubeflow, where we can have persistent volumes, and leverage the power of the Kubernetes architecture to store these models there. Then you can have a prediction pipeline that grabs the model from this storage: you mount the volume that’s already there, which makes the whole thing much simpler than downloading the model every time you want to make predictions.
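A minimal sketch of that pattern, assuming a TorchScript-exported model file sitting on a persistent volume that the pipeline mounts into the pod (the mount path and file name are hypothetical):

```python
# Batch-prediction step that reads a large model from an already-mounted persistent volume,
# avoiding a fresh download on every run. Mount path and file name are hypothetical.
import torch

MODEL_PATH = "/mnt/models/property_predictor.pt"  # PVC mounted by the pipeline definition


def load_model():
    # The volume is already attached to the pod, so there is no per-run download step.
    model = torch.jit.load(MODEL_PATH, map_location="cpu")
    model.eval()
    return model


def predict_batch(model, batch):
    with torch.no_grad():
        return model(batch)
```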

So from the technical point of view, it depends on the requirements of the model. Sometimes even unspecified requirements can be an issue. For example, when scientists train models on a shared HPC environment, the requirements might not be well registered, and it’s hard for us to properly recreate the model training step on our platform.

What are your recommendations for scientists getting started with AI and ML and who want to make their models a success?

Christophe
So my first recommendation would actually be non-technical and I will go back to what I mentioned earlier.

Make sure that the problem you’re solving with AI and ML is really a problem that exists and be very keen on confirming this early and often. Talk to people, understand who would be consuming your model. Do they like it? Is this something they are going to use every day?

Because the more information you can get about that, the more successful you will be. Indeed, you can have the best model in the world, but if it predicts something or generates outputs that nobody’s going to use, then this doesn’t make sense.

Look, I have a couple of others as well when it comes to MLOps, but I think for me that’s the start: make sure that you’re really solving a problem.

Oswaldo
One thing that I learned from Christophe is that you have to make sure that you have internal clients: someone who will actually consume those predictions.

People might be blinded by super smart team members, but what they need may not be grounded in evidence.

It’s basically what Christophe just said: you can have the best model in the world, but maybe nobody actually needs it, and that is not an easy situation.

I’ve also had an issue more than once where some technical work gets oversimplified. Some non-developer teams tend to think that building a UI is trivial, because what they do in their scientific realm is so much more complex than building UIs. So they might say ‘just put a UI on top of this’, but it’s actually incredibly difficult; people do this as their entire job, as UI/UX developers.

Of course, there are things like Streamlit that let you bridge the gap so that you can quickly write a couple of lines of Python, and then you have a UI that kind of works.
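For illustration, that "couple of lines of Python" kind of Streamlit app might look like the sketch below, where `call_model_endpoint` is a hypothetical helper that posts to a deployed model's REST API.

```python
# app.py: run with `streamlit run app.py`.
# `call_model_endpoint` is a hypothetical helper posting to a deployed model endpoint.
import streamlit as st

from my_package import call_model_endpoint

st.title("Compound property prediction (demo)")
smiles = st.text_input("Enter a SMILES string", value="CCO")

if st.button("Predict"):
    prediction = call_model_endpoint(smiles)
    st.write(f"Predicted property: {prediction}")
```

Handy for demos, but a long way from the design, state handling and accessibility work a production UI requires, which is the point being made here.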

But then some of these ideas have been presented to us, like “just completely transform this open source platform and adapt it to use our own models”. And these things take time, I’ve seen this in many companies and industries, not only here. They are rarely easy wins.

If I could talk to a scientist today and give them some advice, it would be to really invest in environment and package management. Then move into containerization, and finally migrate your code to an orchestrator where you automatically build your pipelines. Make sure that you’re not the only person in the world who can build these predictive models.

And of course, when it comes to MLOps, you have to monitor not just the code, but the data.

As an example, fraud detection models were breaking during the pandemic because all the models were observing strange credit card traffic: all of a sudden people were buying on Amazon at three in the morning, and everyone was having their cards blocked. The models were correctly trained, the hyperparameters were correct, and the people who wrote them were probably using very nice MacBook Pros, proper environment management, IDEs like PyCharm, and clean, unbiased data when they trained the models using the latest algorithms. But the data changed, because the world changed.

In some cases, the data changes quickly and you have to react. This is where the ML monitoring aspect is super important.
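A toy illustration of that kind of monitoring: compare a feature's live distribution against a reference sample from training using a two-sample Kolmogorov-Smirnov test. The threshold and synthetic data are placeholders; real drift monitoring is considerably richer than this.

```python
# Illustrative drift check: does the live data still look like the training data?
import numpy as np
from scipy.stats import ks_2samp


def feature_has_drifted(reference: np.ndarray, live: np.ndarray, alpha: float = 0.01) -> bool:
    statistic, p_value = ks_2samp(reference, live)
    return p_value < alpha  # small p-value: the distributions no longer match


rng = np.random.default_rng(42)
training_sample = rng.normal(loc=0.0, scale=1.0, size=5_000)
serving_sample = rng.normal(loc=0.8, scale=1.0, size=5_000)  # the world changed
print(feature_has_drifted(training_sample, serving_sample))   # True: time to react or retrain
```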

So I would ask scientists, you know, to have a coffee with an engineer early in the process and try to work out together what is important, so that I as an engineer can work with you to deploy the model well.

Christophe

I would even say, especially for people who really come from a scientific background, really reflect early on the fact that the Ops part in ML and AI is incredibly important. It’s probably as important as the experimentation that you’ve been doing in the first place.

And often you can create a fantastic model, but then you need to put it into real life, similar to the example Oswaldo gave with credit card fraud. In many instances, that’s what happens when you put a model into production. You need to care about this Ops part even if you’re not the one doing it yourself.

And as Oswaldo mentioned: monitoring usage patterns correctly, connecting with the people who have the skill sets you need if you’re in a large organisation, understanding how people use your model. Is it helping them? Is it performing well? All of these real-life aspects beyond the initial experimentation are incredibly important.

What strategies do you recommend for serving machine learning at scale?

Oswaldo
Firstly, I will say that I love this question. I think that serving ML at scale is all about not having to care so much about the infrastructure, by going serverless. There are a lot of technologies that allow you ‘not to care’, rather than having to predict future model demand in terms of the computing infrastructure you will need to serve model endpoints.

If you know you have very specific use cases, OK, maybe you can predict the ML model demand upfront.

For example, if you know you will only call this model a maximum of 10 times per minute and latency isn’t an issue, then just grab a big enough server that serves the endpoint for you. But that’s rarely the case.

Usually you will have spiky, unpredictable workloads. For me this is more common than anything else, and for that luckily you have a serverless paradigm.

It’s also possible to get serverless computing with open source tooling, using Kubeflow and Kubernetes and all these nice stacks where you have Istio. Indeed, if you deploy it correctly, you can have one endpoint that scales to hundreds of replicas when receiving high traffic and down to zero replicas when nobody is using the model. And this is the way we actually do it.

On our MLOps platform, we have hundreds of models that are called whenever a scientist in some part of Switzerland wakes up, you know, makes coffee and then wants to get some predictions to plan an experiment.

Usually when they do this, they might need predictions for 20 or 50 compounds at the same time. But sometimes larger batches are needed, across a lot of models at the same time. And the thing is, there is some intermediate IT infrastructure that processes and batches the requests when a scientist wants to know a certain property for large numbers of molecules, such as 10,000 or 1,000,000.

This internal tool batches the 1,000,000 requests and calls the existing endpoints. Basically, this is similar to a denial-of-service attack on our ML server, because it parallelises the prediction requests across molecules, and a traditional infrastructure would not be able to cope with the traffic.

And this is actually the reason why one of our main internal clients came to us: their server was breaking when their end users started making a lot of calls to the endpoint. The naive solution is to have hundreds of replicas per ML model, but this is wasteful and expensive. So we use the latest cloud-native serverless technologies to cope with unpredictable spikes of traffic at low cost and latency.

So how do you manage the technical stack, given the breadth of tools within MLOps?

Oswaldo
I’ve given a couple of talks about this, and I think Christophe and Muller (our architect) just gave a talk about it in Texas too.

So essentially, what we try to do in our team is minimise the cognitive load by not jumping into every single thing that becomes popular. We didn’t jump into parts of the architecture that we felt weren’t important at the time. We didn’t use the so-called latest and greatest tooling for its own sake when we already had something that was good enough for us. We didn’t want to grow our stack unnecessarily with things like MLflow (even though I love that tool).

Feature stores were huge a couple of years ago; we talked about them, thought about them, and then decided, OK, we don’t really need one at the moment.

So one approach we have is to be minimalistic, even to the point that for certain use cases we just have a single batch inference pipeline whose results are published to a Kafka topic.
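As a sketch of how minimal that pattern can be, the final step of such a batch-inference pipeline might publish its predictions roughly like this, using the kafka-python client; the broker address, topic name and record fields are hypothetical.

```python
# Tail end of a batch-inference pipeline: publish predictions to a Kafka topic for consumers.
# Broker address, topic name and record fields are hypothetical.
import json

from kafka import KafkaProducer

producer = KafkaProducer(
    bootstrap_servers="kafka:9092",
    value_serializer=lambda record: json.dumps(record).encode("utf-8"),
)


def publish_predictions(predictions):
    for record in predictions:  # e.g. {"compound_id": "C-123", "predicted_solubility": 0.42}
        producer.send("ml-predictions", value=record)
    producer.flush()  # make sure everything is delivered before the pipeline step exits
```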

And maybe this is not perfect, but we are not trying to get to the highest maturity level of Google in every single use case. We are trying to maximise the value that we can bring to the company by doing the minimal amount of work on our side that provides that value. Then we move into higher levels of maturity as we go along.

So that’s been the approach: being minimalist, because there is no technical silver bullet. It takes a lot of effort to ignore some hot trends, especially if you are very innovative and love new stuff like I do, but we have a budget in terms of cognitive load and we try to stay within it.

Christophe
To re-emphasise: we are working for a drug discovery organisation, so the value we have to deliver, and must deliver, is on the drug discovery pipeline. It’s really about keeping a healthy balance and choosing the right tools. We cannot just follow trends for the sake of following trends; we don’t have the bandwidth, and it’s also not what we should do.

So on one hand, we don’t want to miss the next big thing, because it is really important to stay up to date and follow the way the field evolves; on the other, we cannot pivot into new tech every two months. Oswaldo gave a very good example of how sometimes simpler things, or things that are good enough, deliver value.

And I think one of the things we mentioned in a talk last year was that sometimes less is more, especially in the early days. When we started we had nothing, so going for a full-blown, textbook kind of architecture, with lots of different models and data stores and things like that, would have been overwhelming for the organisation too.

Especially if you have a lot of scientists who are still learning about this and you’re in the middle of a change management process. So sometimes less is more. You really want to provide the key functionalities that people need now, help them upskill and understand what’s coming little by little, and then add more. That’s the way we’ve been doing it.

And that has served us well.

Oswaldo
I will add, because maybe it’s obvious to me but not to somebody who hasn’t worked with our team or who is reading this, that we also try to lean on anything that’s managed, but without going absolutely crazy. In the sense that we wanted everything managed, and we tried that at the beginning. We had a vendor for managed Kubeflow that just was not working; it gave us a lot of trouble. Instead, we actually decided to manage this part of the infrastructure ourselves. It’s not an obvious choice and it’s trial and error.

However, internally we have cloud teams that can provide support, so that we don’t have to manage everything in the infrastructure ourselves. There are teams that we partner with and they help us simplify our stack: they share their CloudFormation stack and we use it with hardly any changes.

We partner with these teams, they are basically the owners of our cloud landing zones, and we don’t have to worry about that aspect of the stack. At some point we were fewer than five engineers, and we could not have done anything if we had had to worry about everything in the infrastructure; this is where we started to think differently. It really reminded me of the Team Topologies book, which argues for more focused teams so that the cognitive load does not explode.

I gave a talk last year where I mentioned that we don’t want everything open source, but also not everything through a vendor. We have to keep this balance; what is best for us might not be best for others, but you need to find that balance. If you don’t have the bandwidth, because maybe you have only two engineers on your team, don’t go with Kubeflow; I’d suggest going for something else. You have to be honest with yourself about what you can manage and what you cannot. It’s a compromise you have to make.

Post Interview Summary

Christophe and Oswaldo covered a huge amount of ground during the interview, providing a number of insights and recommendations that will undoubtedly add value to your MLOps journey. To summarise my key takeaways:

● Putting models into production is not a trivial exercise; there must be a problem worth solving. First and foremost, ensure the model you are building solves a real problem, and take the time to really understand what that problem is.

● To do this, communication is key. It is vital to align business and technical thinking and teams, and to ensure decisions are grounded in evidence. As Oswaldo suggested, go grab a coffee, collaborate and overcome silos.

● Once you’ve made the decision to build a model and deploy it to production, the importance of the infrastructure challenges can’t be overstated. Again, communication is key: what infrastructure setup will this model require? What is the expected load on the model in question?

● Environment management may not be a sexy topic, but it is vital.

● There can be big benefits in using serverless technologies to serve ML at scale.

● Minimise cognitive load on the team. Utilise existing platforms to reduce your management workload. Balance open source or self-managed technologies with managed services depending on your need.

● Take a minimalistic approach to your stack. Whilst following trends and working with new technologies is great fun, it may not serve the overarching value you are aiming to add.

Finally a huge thank you to Christophe and Oswaldo for spending their valuable time with me and sharing their insights with the community.
