MLOps in Practice — Machine Learning (ML) model deployment patterns (Part 1)
Machine Learning (ML) model serving and deployment is one of the most critical components of any solid ML solution architecture. This article focuses specifically on ML model deployment patterns.
In my previous article, MLOps in Practice — De-constructing an ML Solution Architecture into 10 components, we walked through the 10 critical components needed to build an end-to-end ML system. One of those 10 components is model serving and deployment.
Generally speaking, deploying an ML model to a production environment refers to the process of getting data ingested by a model so that it can compute predictions. This process requires three things: a model, an interpreter to execute it, and input data.
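To make this concrete, here is a minimal sketch of those three ingredients in Python, assuming a scikit-learn model serialized with joblib; the file names ("model.joblib", "new_records.csv") are illustrative, not from a specific project:

```python
# The three ingredients of deployment: a trained model artifact,
# an interpreter (the Python runtime with scikit-learn), and input data.
import joblib
import pandas as pd

model = joblib.load("model.joblib")            # the trained model
input_data = pd.read_csv("new_records.csv")    # data to be ingested
predictions = model.predict(input_data)        # compute predictions
print(predictions[:5])
```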
However, depending on:
- How the ML models will be consumed, and
- How quickly the predictions of the ML models need to be available,
there are different ways to deploy an ML model in a production environment. ML model deployment can be categorized into two major patterns:
- Online (real-time) model deployment — ML models are normally packaged either as REST API endpoints or as self-contained Docker images exposing REST API endpoints. With online model deployment, the trained ML model makes predictions in real time (see the sketch after this list)…
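As a rough illustration of the online pattern, the sketch below wraps a trained model behind a REST endpoint with FastAPI. This is one common way to build such an endpoint, not the only one; the model file name, route, and request schema are assumptions for the example:

```python
# A minimal online-serving sketch: a trained model behind a REST endpoint.
# Assumes a scikit-learn model serialized to "model.joblib"; the file name
# and the flat feature-vector schema are illustrative.
import joblib
from fastapi import FastAPI
from pydantic import BaseModel

app = FastAPI()
model = joblib.load("model.joblib")  # load the trained model once at startup

class PredictionRequest(BaseModel):
    features: list[float]  # one flat feature vector per request

@app.post("/predict")
def predict(request: PredictionRequest):
    # Score synchronously so the caller receives the prediction in real time.
    prediction = model.predict([request.features])
    return {"prediction": prediction.tolist()}
```

Such a service can be run locally with, for example, `uvicorn main:app` and called by POSTing a JSON body like `{"features": [1.0, 2.0, 3.0]}` to `/predict`; packaging the same application into a Docker image yields the self-contained variant mentioned above.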