MedOps Study: What is it?
What is MedOps?
A group of researchers from the Cancer Screening Department of Lunit AI Research has gathered to explore one vast topic: “Medical Data Operations”, or “MedOps” for short 😱🤔. If you work in machine learning, and deep learning in particular, you have likely heard the overloaded term “MLOps” tossed around in conversations about ML products and AI in practice. Chances are, the participants in that conversation were also nodding enthusiastically and saying something along the lines of “yeah, MLOps is the way to go!”. But what exactly is MLOps? The term is used so ubiquitously and in such diverse contexts that its actual meaning seems to have faded away. So before we introduce the MedOps study in the Cancer Screening Department, it is only appropriate to define what MLOps is.
In an extensive arXiv paper [1], researchers from KIT and IBM give a thorough definition of the term MLOps:
MLOps (Machine Learning Operations) is a paradigm, including aspects like best practices, sets of concepts, as well as a development culture when it comes to the end-to-end conceptualization, implementation, monitoring, deployment, and scalability of machine learning products. Most of all, it is an engineering practice that leverages three contributing disciplines: machine learning, software engineering (especially DevOps), and data engineering. MLOps is aimed at productionizing machine learning systems by bridging the gap between development (Dev) and operations (Ops). Essentially, MLOps aims to facilitate the creation of machine learning products by leveraging these principles: CI/CD automation, workflow orchestration, reproducibility; versioning of data, model, and code; collaboration; continuous ML training and evaluation; ML metadata tracking and logging; continuous monitoring; and feedback loops.
In short, MLOps is an engineering paradigm for productionizing ML systems, with CI/CD automation, workflow orchestration and reproducibility as its core principles. It is a catch-all definition that covers essentially the entire ML ecosystem. As shown below, Matt Turck’s blog post on the 2021 MAD (ML, AI and Data) landscape [2] nicely visualizes just how vast this ecosystem is.
We ask ourselves: What is still missing in this eco-system? Where is the bottleneck? What is a problem worth solving? What is our ultimate goal as researchers of Lunit?
The ultimate goal of Lunit’s researchers is to contribute to our mission, “Conquer Cancer through AI”, by building the AI models that drive our products. The Cancer Screening Department’s AI researchers have been building best-in-class data-driven ML models for the early detection of cancer and other findings in medical images. We firmly believe that our next big breakthrough depends on how effectively we manage all ML research processes that involve large amounts of medical data. The key will be to automate the management of medical data for ML model development in a reliable and scalable manner. We have been spending a lot of time designing, engineering and implementing automated pipelines, because we believe they unlock exciting new research opportunities.
We launched the Medical Data Operations Study (MedOps Study) to achieve the following goals:
- Identify current and emerging best practices for enabling MedOps tailored to our needs.
- Propose Focus Groups* at the end of the study to embed MedOps in our ML productization pipeline and ultimately improve the AI models that drive our products (the INSIGHT lineup).
- Automate work related to processing large amounts of medical data to facilitate new research breakthroughs.
* In the AI Research department, we have a system called Focus Groups: a Focus Group is a small group of researchers with the common goal of executing a research project proposed bottom-up. It is similar to the concept of Guilds in Spotify’s agile team organization, but more oriented toward research and engineering projects. We will cover this in more depth in a separate blog post!
In a series of blog posts, we will share what we have discussed so far and our journey through the MedOps study. This first post covers one of the topics we discussed in the first month of the study: the principles of MLOps and how they apply to MedOps.
The MLOps Principles: the MedOps Perspective
Let’s dive into the nine principles of MLOps listed in [1] and think about how they apply to our domain. (We also drew on Alessandro Artoni’s blog post [3], which provides an excellent summary of [1].)
- P1: CI/CD Automation. Continuous integration and continuous delivery/deployment (CI/CD) improve team productivity by making code development, testing and deployment both faster and more robust. This principle applies regardless of the application domain.
- P2: Workflow Orchestration. Orchestration coordinates and automates the steps of the ML workflow pipeline (ingesting raw data, preprocessing it, training and testing a model, and deploying it). As the name of our study suggests, this principle is of key interest to us. In particular, raw data ingestion and preprocessing are major pain points when dealing with large-scale medical data, but they are also where we can innovate the most.
- P3: Reproducibility. The ability to reproduce a given experiment or a specific AI engine version is critically important, especially for medical AI software. Productizing medical AI software requires rigorous validation and regulatory approval processes, which often take anywhere from several months to over a year. Meanwhile, AI researchers do not stop developing new algorithms, nor should they, so the engine version under review must remain exactly reproducible even as newer versions are built. Systematic support for easily reproducing results generated years ago is therefore standard practice for us.
- P4: Versioning. Versioning of data, models and code is fundamental to enabling P3. We are particularly interested in a strategy and a system for versioning our data.
- P5: Collaboration. This covers technical collaboration, such as code repository management, as well as collaboration with stakeholders outside of research. We also view building pipelines (especially for processing medical data) as a way to collaborate more efficiently: better pipelines mean better collaboration. In addition, researchers in this field must think about how to communicate results effectively to domain experts such as radiologists. The better we share and present our models’ performance, the better we can leverage the expertise of the radiologists we have access to.
- P6: Continuous ML Training & Evaluation. The ability to train and evaluate ML models automatically and continuously is a key concept in any MLOps system. At the core of this principle is an easy-to-use Model Registry. We are especially interested in the evaluation of our models: see this nice blog post about model evaluation by Thijs Kooi, Lunit’s VP of the Cancer Screening Research Division.
- P7: ML Metadata Tracking and Logging. ML metadata tracking usually means keeping track of an ML model’s metadata, such as its evaluation metrics and the version of the deployed code. For our purposes, we are particularly interested in tracking the metadata of our data and in building a system to manage that metadata efficiently.
- P8: Continuous Monitoring. Monitoring model performance after deployment is key to detecting model and data drift, and continuous monitoring is one of the key components that drive evidence-based product design. It is also one of the most challenging and interesting topics, and one that requires innovation. Unlike many other ML-based products, medical AI products are deployed in environments where setting up continuous monitoring and feedback loops is often infeasible (most hospitals block access to the public internet and the cloud for privacy reasons). This is one of the areas where we think we can set ourselves apart from our competitors.
- P9: Feedback Loops. A system that ensures feedback loops helps improve the overall development cycle. At Lunit, we have been implementing such feedback loops at the data and annotation level. More specifically, the so-called Closing the Loop project aims to create a systematic flow of data from collection through ingestion, preprocessing, annotation, training, evaluation and deployment, and back to collection. This has been our core strategy for developing best-in-class AI algorithms for Lunit’s products.
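P3, P4 and P7 all come down to being able to say exactly which data a model was trained on. As a minimal illustration, and not a description of Lunit’s actual tooling, the sketch below shows one common approach, content-addressed dataset versioning, assuming a dataset is simply a directory of files (e.g. images plus a label file). The function names (`file_digest`, `build_manifest`, `dataset_version`) and the example file names are hypothetical.

```python
import hashlib
import json
import tempfile
from pathlib import Path


def file_digest(path: Path, chunk_size: int = 1 << 20) -> str:
    """Return the SHA-256 hex digest of a file, reading it in chunks
    so that large images never have to fit in memory at once."""
    h = hashlib.sha256()
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


def build_manifest(root: Path) -> dict[str, str]:
    """Map every file under `root` (as a relative POSIX path) to its content hash.

    The manifest is itself a small piece of metadata about the data (P7)
    that could be stored next to a trained model."""
    return {
        p.relative_to(root).as_posix(): file_digest(p)
        for p in sorted(root.rglob("*"))
        if p.is_file()
    }


def dataset_version(manifest: dict[str, str]) -> str:
    """Derive a short version ID from a manifest (P4).

    Keys are sorted before hashing, so the ID depends only on which files
    exist and what they contain: adding, removing, renaming or editing any
    file yields a different ID. Truncated to 12 hex characters for readability."""
    canonical = json.dumps(manifest, sort_keys=True).encode("utf-8")
    return hashlib.sha256(canonical).hexdigest()[:12]


def _demo() -> None:
    """Hypothetical example: relabeling a single case yields a new dataset version."""
    with tempfile.TemporaryDirectory() as d:
        root = Path(d)
        (root / "images").mkdir()
        (root / "images" / "case_001.dcm").write_bytes(b"placeholder pixel data")
        (root / "labels.csv").write_text("case_001,benign\n")
        before = dataset_version(build_manifest(root))

        (root / "labels.csv").write_text("case_001,malignant\n")
        after = dataset_version(build_manifest(root))
        print(before != after)  # True


if __name__ == "__main__":
    _demo()
```

Recording the manifest and its version ID alongside each trained model (for example, as an entry in a model registry) is one way to trace an AI engine version back to the exact data it was trained on. Dedicated data-versioning tools rely on similar content hashing and add storage and caching on top.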
Conclusion
Through the MedOps study, Lunit’s Cancer Screening researchers will investigate new strategies to scale up our current pipelines and ultimately contribute to developing best-in-class AI models for INSIGHT products. As the name of the study suggests, we are particularly interested in how to manage large amounts of medical data more effectively. We will share our findings from the MedOps study!
References
[1] Kreuzberger et al.: Machine Learning Operations (MLOps): Overview, Definition, and Architecture. arXiv 2022
[2] Matt Turck’s blog post: https://mattturck.com/data2021/
[3] Alessandro Artoni’s blog post: https://medium.com/geekculture/general-overview-of-machine-learning-operations-mlops-d76520c3c09f