Machine Learning System Design: Models-as-a-service

Architecture patterns for making models available as a service.

Vimarsh Karbhari
Acing AI
4 min read · Apr 21, 2020


Engineers strive to remove barriers that block innovation in all aspects of software engineering. Increasingly, teams are not just shipping technology products but products that combine software with data science models, or deploying a plethora of AI models on their own. In this article, we will look at the horizontal approaches to serving data science models from an architectural perspective.

DevOps emerged as agile software engineering matured around 2009. Today, as data science products mature, ML Ops is emerging as the counterpart to traditional DevOps.

Because ML Ops is not yet a mature, standardized practice, teams sometimes spend more time getting a model into production than they spent developing and training it.

Depending on their structure and dynamic, teams approach making these models available differently based on whether they lean towards data science or engineering.

Traditional Software Engineering heavy teams

If the team is heavy on traditional software engineering, making data science models available can look quite different. Some teams translate the Python model to Java and expose it as an API using Java web services built with Spring and Tomcat. Teams that stay in Python commonly use Django or Flask.
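As a rough illustration of the Flask route, here is a minimal sketch of wrapping a model in an HTTP endpoint. The `DummyModel` class, the `/predict` route, and the request shape are all illustrative stand-ins; in practice you would load a real trained model (e.g. with pickle or joblib) at startup.

```python
# Minimal Flask sketch of serving a model as an API.
# DummyModel stands in for a real trained model that you would
# typically deserialize from disk when the service starts.
from flask import Flask, jsonify, request


class DummyModel:
    """Placeholder model: predicts the sum of the input features."""

    def predict(self, rows):
        return [sum(row) for row in rows]


app = Flask(__name__)
model = DummyModel()  # in practice: model = joblib.load("model.pkl")


@app.route("/predict", methods=["POST"])
def predict():
    # Expect a JSON body like {"features": [5.1, 3.5, 1.4, 0.2]}
    features = request.get_json()["features"]
    return jsonify({"prediction": model.predict([features])[0]})


if __name__ == "__main__":
    app.run(host="0.0.0.0", port=5000)
```

The same shape works with Django (a view instead of a route); the point is that the model hides behind an ordinary web API that the rest of the engineering organization already knows how to consume.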

Distributed Software Engineering/Data Science centric teams

In this scenario, the teams usually run some container technology like Kubernetes, leveraged on their respective cloud platforms. This process does not have a one-size-fits-all approach; there are different architectural patterns to achieve the required outcomes.


Architectural Patterns

Standalone Model-as-a-Service

In this pattern, the model has little or no dependency on the existing application and is made available standalone. Often the model is dropped into a managed service such as Amazon Elasticsearch Service, with Logstash and Kibana on top of it providing the metrics for the service, since it is deployed on its own. Every time the model changes, it has to be redeployed to the Elasticsearch instance accordingly. This provides flexibility on one end, but can lead to issues as the service grows and starts spreading into the application itself.

Real-time Model-as-a-Service

In this pattern, the deployed model receives inputs and responds to them in real time. Applications that produce and consume real-time streaming data to make decisions usually follow this architectural pattern. Imagine a stock-trading model-as-a-service that makes split-second decisions based on the current value of a stock. Whenever the model is updated, the old model is still serving live requests, so the new version needs to be rolled out with a canary deployment technique.
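The canary idea can be sketched in a few lines: send a small, configurable fraction of live traffic to the new model while the stable one keeps serving the rest, then ramp up once the canary looks healthy. The class and parameter names below are illustrative, and the model objects are stand-ins for whatever serving interface you use.

```python
# Sketch of canary routing between two model versions. A small
# fraction of requests (canary_fraction) hits the new model; the
# stable model serves everything else.
import random


class CanaryRouter:
    def __init__(self, stable_model, canary_model, canary_fraction=0.05):
        self.stable = stable_model
        self.canary = canary_model
        self.canary_fraction = canary_fraction

    def predict(self, features):
        # Route roughly canary_fraction of traffic to the new model.
        if random.random() < self.canary_fraction:
            return self.canary.predict(features)
        return self.stable.predict(features)
```

In a real deployment this split usually lives in the load balancer or service mesh rather than in application code, but the logic is the same: the old model never stops serving while the new one is being validated.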

Immersive Model-as-a-Service

In this pattern, the model is immersed in the application itself and cannot be separated from it. Whenever a new version of the application is deployed, it carries a version of the model with it, and vice versa. Since they are intertwined, the Ops teams need custom deployment infrastructure to handle this pattern. A/B-tested models and composite models usually leverage this approach.

Technologies to achieve these architectural patterns

Whichever architectural pattern we use, some common tooling will help achieve economies of scale. DVC can be leveraged to maintain versioning of models and data.

MLeap provides a common serialization format for exporting and importing Spark, scikit-learn, and TensorFlow models. MLflow Models aims to provide a standard way to package models so they can be consumed by different downstream tools, depending on the pattern.

The application and the models can be deployed separately or together as Docker images, depending on the pattern.

For full ML workflows, each of the major cloud providers offers a managed platform: Google Cloud's AI Platform, Azure ML, and SageMaker on AWS. Each of these platforms provides monitoring and logging as well. Application-wide cloud monitoring post-deployment can be achieved with Wavefront, and logging infrastructure with Splunk or Datadog. Both are important, as we need data about how the models and the product are performing.

Recommendations

It is worth noting that, regardless of which pattern you decide to use, there is always an implicit contract between the model and its consumers. Since the ML Ops world is not yet standardized, no pattern or deployment standard can be considered a clear winner, so you will need to evaluate the right option for your team and product needs.

Subscribe to our Acing Data Science newsletter for more such content.

Thanks for reading! 😊 If you enjoyed it, see how many times you can hit 👏 in 5 seconds. It’s great cardio for your fingers AND will help other people see the story.
