
Different Architectures of Machine Learning Model Deployment!

This blog explains Model Deployment & the different architectures available for deploying a machine learning model, with use-cases & examples. It also examines whether, in reality, it is only the machine learning model that gets deployed or something more.


Machine Learning has evolved considerably over time, yet whenever Machine Learning fundamentals or pipelines are explained (at a university, in a blog, or in any other source), the picture conveyed is usually incomplete. The concepts are covered only up to model creation/training & testing. Very few places explain model deployment, and even for someone who is aware of model deployment, it is important to know the different ways of deploying a model along with their use-cases & architecture.

Considering the importance of model deployment & the ways of doing it, this blog will act as a guide to help you understand the topic.

With that said, let’s proceed with the actual content.

What is Model Deployment?

The term Model Deployment refers to the process of making a machine learning model available to its end-users/intended audience. The deployed model should be able to accept requests from that audience & return the result/output to them.

For example: Consider a very simple use-case, a machine learning model whose goal is to predict the salary of an employee based on some attributes/features. This model needs to be present on a company’s website so that a person applying for a job at that company can get a rough idea of the salary they will be offered. To fulfill this use case, the required steps of a machine learning pipeline are implemented first; then, once the model is ready, it has to be integrated with the company’s website so that it can serve requests from any visitor. This process of making the trained & tested model available at the desired location (the company’s website), so that it can serve the requests of the intended audience, is known as Model Deployment.

In reality, it is not only the model that is deployed; most of the machine learning pipeline is deployed along with it (not all of it, but most of the steps: Feature Engineering, Feature Selection, & Model Training). There are many reasons for this; a few are listed below:

1. The Feature Engineering performed during training has to be repeated on every incoming input, so this functionality needs to be deployed. When giving input, the user should face a simple interface & remain completely oblivious to the internal processing. A specific use case may require generating new columns, applying multiple transformations, etc., and these can only be handled by Feature Engineering.

2. Once Feature Engineering is done, there are many scenarios where only the dominant features need to be selected for the final model training, which in turn mitigates the “curse of dimensionality” (the problem caused by having too many features in the dataset).

3. Over time, the model becomes old/obsolete, so it needs to be updated regularly. Continuous training, continuous testing, & continuous deployment are therefore a must. This is the need that led to the birth of MLOps.
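
The first two points can be sketched in a few lines of Python. This is only an illustration for the salary use case: the field names, the education encoding, the selected features, and the coefficients are all made-up assumptions standing in for whatever a real training run would produce. The point it shows is that the deployed unit is the whole chain (Feature Engineering, then Feature Selection, then the model), not the model alone.

```python
import math

# Hypothetical raw input, as a job applicant might enter it on the website.
RAW_INPUT = {"years_experience": 4, "education": "masters", "city": "Pune"}

# Assumed encoding, fixed at training time.
EDUCATION_LEVELS = {"bachelors": 0, "masters": 1, "phd": 2}


def engineer_features(raw):
    """Feature Engineering: turn raw form fields into numeric columns."""
    years = float(raw["years_experience"])
    return {
        "years": years,
        "log_years": math.log1p(years),  # derived column
        "education": EDUCATION_LEVELS[raw["education"]],
        "city_length": len(raw["city"]),  # deliberately uninformative column
    }


# Assumed result of Feature Selection during training.
SELECTED = ("years", "log_years", "education")


def select_features(features):
    """Feature Selection: keep only the columns the model was trained on."""
    return [features[name] for name in SELECTED]


# Illustrative coefficients standing in for a trained linear model.
WEIGHTS = (3000.0, 5000.0, 8000.0)
INTERCEPT = 30000.0


def predict(vector):
    """The model itself: a plain weighted sum."""
    return INTERCEPT + sum(w * x for w, x in zip(WEIGHTS, vector))


def deployed_pipeline(raw):
    """What actually gets deployed: every step, chained in order."""
    return predict(select_features(engineer_features(raw)))


print(round(deployed_pipeline(RAW_INPUT), 2))
```

The user only ever supplies the raw form fields; everything between the form and the prediction travels with the model.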

Various Architectures of Machine Learning Model Deployment!

A Machine Learning Model Deployment Architecture describes how a machine learning model is deployed, i.e. the design pattern used to deploy it.

A model is always deployed together with some application, because a model is deployed to fulfill a use case, & the presentation of that use case (or at least the interface through which the model is reached) is built as an application. For example, the simplest model deployment is a web page that takes input from the user, passes that input to the model (through an API), & then returns the result to the user. Here, the application is that simple web page.

That being said, let’s understand the 4 different architectures of model deployment:

Embedded Architecture

Image Designed by Author!

In this architecture, the model is embedded in the application as a dependency: it is packaged within the final/consuming application at the application’s build time.

The application & the model each serve their own purpose from a single location.

The model used here is pre-trained & will have the capability to “predict on the fly”.

This architecture trades flexibility for simplicity. Since the model is deployed within the application, an update to either one affects both; in particular, whenever the model needs to be updated, the application has to be deployed again. Simplicity is favored over flexibility here.

This architecture can also be used on mobile devices inside mobile applications, for example, with “Core ML”. It can also be used directly in the browser, for example, with “TensorFlow.js”. It is widely used in Flask & Django applications as well.
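
A minimal sketch of the embedded pattern, under stated assumptions: the model artifact here is just a JSON string of made-up coefficients, & the class and field names are hypothetical. What it illustrates is that the model is loaded inside the application process & called as an ordinary function, with no network hop.

```python
import json

# Build time: the trained model is serialized and shipped inside the
# application package. The coefficients are illustrative, not real.
MODEL_ARTIFACT = json.dumps({"base": 30000.0, "per_year": 4000.0})


class EmbeddedModel:
    """Hypothetical stand-in for a pre-trained model loaded from an artifact."""

    def __init__(self, artifact):
        params = json.loads(artifact)
        self.base = params["base"]
        self.per_year = params["per_year"]

    def predict(self, years_experience):
        return self.base + self.per_year * years_experience


class Application:
    """The consuming application; the model lives in the same process."""

    def __init__(self, artifact):
        # Loaded once at startup. Changing the model means rebuilding
        # and redeploying this application.
        self._model = EmbeddedModel(artifact)

    def handle_request(self, form):
        years = float(form["years_experience"])
        return {"estimated_salary": self._model.predict(years)}


app = Application(MODEL_ARTIFACT)
print(app.handle_request({"years_experience": "5"}))
```

Because `MODEL_ARTIFACT` is part of the application build, replacing it is exactly the redeployment cost described in the trade-off.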

Dedicated Model API

Image Designed by Author!

In this architecture, the application & the model are deployed separately. Whenever the application needs the model, it calls it remotely, using a REST API call or some other remote call.

Here also the model used is pre-trained & will have the capability to “predict on the fly”.

In this architecture, the trade-off of the Embedded architecture is reversed: separate servers now host the API & the Application, which increases complexity (decreases simplicity) & increases flexibility. Each part (Model & Application) can be scaled independently to handle the traffic.
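
The separation can be sketched with Python’s standard library alone. Everything specific here is an assumption made for illustration: the `/predict` path, the JSON field names, & the toy salary formula. In practice the model service would run on its own server; here it runs in a background thread on a local port so the example is self-contained.

```python
import json
import threading
import urllib.request
from http.server import BaseHTTPRequestHandler, HTTPServer


def predict_salary(years_experience):
    """Illustrative stand-in for a pre-trained model."""
    return 30000.0 + 4000.0 * years_experience


class ModelHandler(BaseHTTPRequestHandler):
    """Model service: exposes the model behind POST /predict (assumed path)."""

    def do_POST(self):
        if self.path != "/predict":
            self.send_error(404)
            return
        length = int(self.headers.get("Content-Length", 0))
        payload = json.loads(self.rfile.read(length))
        estimate = predict_salary(float(payload["years_experience"]))
        body = json.dumps({"estimated_salary": estimate}).encode("utf-8")
        self.send_response(200)
        self.send_header("Content-Type", "application/json")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, format, *args):
        pass  # keep the example quiet


def call_model_api(base_url, years_experience):
    """Application side: a remote call replaces the in-process call."""
    request = urllib.request.Request(
        base_url + "/predict",
        data=json.dumps({"years_experience": years_experience}).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    # An empty ProxyHandler makes the local call ignore any system proxy.
    opener = urllib.request.build_opener(urllib.request.ProxyHandler({}))
    with opener.open(request, timeout=5) as response:
        return json.loads(response.read())


# Port 0 asks the OS for any free port; a real service would use a fixed address.
server = HTTPServer(("127.0.0.1", 0), ModelHandler)
threading.Thread(target=server.serve_forever, daemon=True).start()

result = call_model_api(f"http://127.0.0.1:{server.server_port}", 5)
print(result)

server.shutdown()
server.server_close()
```

The application only knows the URL & the request format, so the model service can be updated or scaled without touching the application.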

Model Published as Data

Image Designed by Author!

Here also the model used is pre-trained & will have the capability to “predict on the fly”.

In this architecture, the application also has streaming capability, which is why a streaming platform like Apache Kafka is used to hold/store the models that the applications consume. Different versions of the same model, or different models solving different use-cases, are published to an Apache Kafka Topic by the ML Engineers, & the applications subscribe to that Topic to consume the models at runtime as needed. Here also, the trade-off leans towards increased complexity (decreased simplicity) & increased flexibility of the complete system/architecture. This architecture is much more advanced & can solve many complex use-cases.
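
The publish/subscribe flow can be sketched without a running Kafka cluster. Note the loud assumption: `InMemoryTopic` below is a hypothetical, in-process stand-in for a Kafka Topic and is not the Kafka client API; the message format & coefficients are likewise made up. The sketch only shows the idea that a new model version reaches a running application as data, without a redeploy.

```python
import json


class InMemoryTopic:
    """Hypothetical stand-in for a Kafka Topic: an append-only message log."""

    def __init__(self):
        self._log = []

    def publish(self, message):
        self._log.append(message)

    def poll(self, offset):
        """Return every message at or after the given offset."""
        return self._log[offset:]


class Application:
    """Consumes model versions at runtime and swaps them in place."""

    def __init__(self, topic):
        self._topic = topic
        self._offset = 0  # position of the next unread message
        self._model = None

    def refresh_model(self):
        # Read any newly published models; the latest one wins.
        for message in self._topic.poll(self._offset):
            self._model = json.loads(message)
            self._offset += 1

    def handle_request(self, years_experience):
        self.refresh_model()
        if self._model is None:
            raise RuntimeError("no model has been published yet")
        estimate = self._model["base"] + self._model["per_year"] * years_experience
        return {"model_version": self._model["version"], "estimated_salary": estimate}


topic = InMemoryTopic()
app = Application(topic)

# The ML Engineer publishes version 1; the running application picks it up.
topic.publish(json.dumps({"version": 1, "base": 30000.0, "per_year": 4000.0}))
first = app.handle_request(5)

# Version 2 is published later; the same application instance switches over.
topic.publish(json.dumps({"version": 2, "base": 32000.0, "per_year": 4500.0}))
second = app.handle_request(5)

print(first)
print(second)
```

The same `app` object serves both requests, yet the second one is answered by the newer model version.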

Offline Predictions

Image Designed by Author!

Here, the model used is pre-trained, but it will not have the capability to “predict on the fly”.

This is the only architecture that doesn’t support “predict-on-the-fly”. Here also, the trade-off leans towards increased complexity (decreased simplicity) & increased flexibility of the complete system/architecture. The predictions are generated offline by applying whatever ML Pipeline steps are required, & they are then stored in a predictions Database. This Architecture is somewhat dated, but it is still very beneficial in some use-cases, for example, when there is a requirement to validate whether the predictions are correct. Since the predictions already exist before they are served, they can be verified first.
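
A minimal sketch of the offline pattern, assuming an in-memory SQLite database as the predictions Database; the table name, column names, employee IDs, & salary formula are all hypothetical. The batch job runs the model ahead of time, & the application only ever reads stored results.

```python
import sqlite3


def predict_salary(years_experience):
    """Illustrative stand-in for a pre-trained model."""
    return 30000.0 + 4000.0 * years_experience


def run_batch_job(connection, employees):
    """Offline step: score every known record and store the results."""
    connection.execute(
        "CREATE TABLE IF NOT EXISTS predictions ("
        "employee_id TEXT PRIMARY KEY, estimated_salary REAL)"
    )
    rows = [(employee_id, predict_salary(years)) for employee_id, years in employees]
    connection.executemany("INSERT OR REPLACE INTO predictions VALUES (?, ?)", rows)
    connection.commit()


def lookup_prediction(connection, employee_id):
    """Application step: read a stored prediction; the model is never called."""
    row = connection.execute(
        "SELECT estimated_salary FROM predictions WHERE employee_id = ?",
        (employee_id,),
    ).fetchone()
    return None if row is None else row[0]


connection = sqlite3.connect(":memory:")
run_batch_job(connection, [("e-001", 2), ("e-002", 5)])

print(lookup_prediction(connection, "e-002"))
print(lookup_prediction(connection, "e-999"))  # not scored by the batch job
```

A record that was not part of the batch has no prediction at all, which is the practical meaning of not being able to “predict on the fly”.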


A comparison of all the architectures on the most important factors is given in the table below:

Table generated by Author!

There is no single best architecture for machine learning model deployment. The choice depends mainly on the use-case, & also on other factors such as the capability to manage a complex system architecture.

Therefore, after weighing all of these factors, one architecture is selected to deploy the Machine Learning Model!

I hope this article explains the topic with all the relevant concepts in detail. Thank you so much for investing your time in reading my blog & boosting your knowledge. If you like my work, I request you to applaud this blog & follow me on Medium, GitHub, & LinkedIn for more content on multiple technologies and their integration!

Also, subscribe to me on Medium to get updates on all my blogs!


