Machine Learning Model Serving Framework

Frameworks and servers for model serving

Abonia Sojasingarayar
3 min read · Oct 11, 2023

Machine learning models, including LLMs, have become an integral part of many industries. As these models become more prevalent, the need to serve them in production environments has grown increasingly important. Serving a machine learning model means making it available for prediction or inference, whether that means returning real-time predictions to users or running batch predictions for offline use. This article provides an overview of various frameworks and servers used for serving machine learning models and their trade-offs.


Frameworks for Model Serving

  1. BentoML: A framework for building reliable, scalable, and cost-efficient AI applications. It comes with everything you need for model serving, application packaging, and production deployment (see the service sketch after this list).
  2. Jina: Build multimodal AI services via cloud native technologies. It provides features like Model Serving, Generative AI, Neural Search, and Cloud Native.
  3. Mosec: A machine learning model serving framework with dynamic batching and pipelined stages, providing an easy-to-use Python interface.
  4. TFServing: A flexible, high-performance serving system for machine learning models (see the REST client sketch after this list).
  5. TorchServe: Serve, optimize, and scale PyTorch models in production.
  6. Triton Inference Server (formerly TRTIS): Provides an optimized cloud and edge inferencing solution (see the client sketch after this list).
  7. Cortex: An open-source platform for deploying, managing, and scaling machine learning models. It supports deployment of all types of models.
  8. KFServing: Provides a Kubernetes Custom Resource Definition (CRD) for serving machine learning models on arbitrary frameworks. It has since been renamed KServe (item 16).
  9. Multi Model Server: A flexible and easy-to-use tool for serving deep learning models trained using any ML/DL framework.
  10. Xinference: Replace OpenAI GPT with another LLM in your app by changing a single line of code. Xinference gives you the freedom to use any LLM you need.
  11. Lanarky: FastAPI framework to build production-grade LLM applications.
  12. Langchain-serve: Serverless LLM apps in production with Jina AI Cloud.
  13. Seldon Core: An open-source platform for deploying, scaling, and managing machine learning models in Kubernetes.
  14. Ray Serve: A scalable and programmable serving framework built on top of Ray to help you scale your microservices and ML models in production (see the deployment sketch after this list).
  15. OpenSearch ML Commons: An experimental feature that allows you to serve custom models and use those models to make inferences.
  16. KServe: A Kubernetes-native platform to deploy and serve machine learning models.
  17. MLflow Model Serving: A flexible, high-performance serving layer for machine learning models built using the PyFunc format (see the save-and-serve sketch after this list).
  18. Kubeflow: A machine learning toolkit for Kubernetes that aims to make deployments of machine learning workflows on Kubernetes simple, portable, and scalable.
  19. Scikit-learn: A machine learning library for Python that supports model persistence, allowing you to save and load trained models (see the persistence sketch after this list).
  20. H2O: An open-source platform for data analysis, machine learning, and predictive modeling.
  21. FastAPI: A modern, high-performance web framework for building APIs with Python 3.6+ based on standard Python type hints (see the endpoint sketch after this list).
  22. Flask: A lightweight WSGI web application framework, designed to make getting started quick and easy while still scaling up to complex applications (see the endpoint sketch after this list).
  23. MLServer: An open-source inference server for machine learning models. It provides a unified API for serving models and supports multiple inference runtimes.
  24. Neptune: A platform for tracking, comparing, and sharing machine learning metadata. It can be used to manage the machine learning lifecycle, including tracking experiments, reproducibility, and deployment.
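
Quick Examples

To make a few of the options above more concrete, here are some minimal sketches. They are illustrative rather than production-ready: the model names, file paths, and ports they use are assumptions, not requirements of the tools.

First, BentoML. This sketch follows the BentoML 1.x runner style and assumes a scikit-learn model was previously saved under the (hypothetical) tag iris_clf, e.g. with bentoml.sklearn.save_model("iris_clf", clf).

```python
import bentoml
import numpy as np
from bentoml.io import NumpyNdarray

# Wrap the saved model in a runner so BentoML can batch and scale inference.
iris_clf_runner = bentoml.sklearn.get("iris_clf:latest").to_runner()

svc = bentoml.Service("iris_classifier", runners=[iris_clf_runner])

@svc.api(input=NumpyNdarray(), output=NumpyNdarray())
def classify(input_array: np.ndarray) -> np.ndarray:
    # Delegate the prediction to the runner.
    return iris_clf_runner.predict.run(input_array)
```

With this saved as service.py, running bentoml serve service.py:svc starts an HTTP server exposing the classify endpoint.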
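
TFServing itself is usually launched as a Docker container; clients then call its REST API. A sketch of a client call, assuming a model named my_model is already being served on the default REST port 8501:

```python
import json

import requests

# TensorFlow Serving exposes /v1/models/<name>:predict over REST.
url = "http://localhost:8501/v1/models/my_model:predict"
payload = {"instances": [[5.1, 3.5, 1.4, 0.2]]}

response = requests.post(url, data=json.dumps(payload))
response.raise_for_status()
print(response.json()["predictions"])
```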
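
For Triton Inference Server, a call through the official Python HTTP client looks roughly like this. The tensor names input__0 and output__0 are assumptions; the real names come from the deployed model's config.pbtxt.

```python
import numpy as np
import tritonclient.http as httpclient

# Connect to a running Triton server (HTTP defaults to port 8000).
client = httpclient.InferenceServerClient(url="localhost:8000")

data = np.random.rand(1, 4).astype(np.float32)
infer_input = httpclient.InferInput("input__0", list(data.shape), "FP32")
infer_input.set_data_from_numpy(data)

result = client.infer(model_name="my_model", inputs=[infer_input])
print(result.as_numpy("output__0"))
```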
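
Ray Serve turns a Python class into a scaled-out HTTP deployment. A sketch using the Ray Serve 2.x API, assuming a scikit-learn model saved at model.joblib:

```python
import joblib
from ray import serve
from starlette.requests import Request

@serve.deployment(num_replicas=2)
class SklearnModel:
    def __init__(self, model_path: str):
        # Each replica loads its own copy of the model.
        self.model = joblib.load(model_path)

    async def __call__(self, request: Request) -> dict:
        features = (await request.json())["features"]
        return {"prediction": self.model.predict([features]).tolist()}

# Bind constructor arguments and start serving over HTTP.
serve.run(SklearnModel.bind("model.joblib"))
```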
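
MLflow's save-and-serve flow is largely declarative: save the model in MLflow's format, then serve it from the command line. The directory name iris_model below is an assumption.

```python
import mlflow.sklearn
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression

X, y = load_iris(return_X_y=True)
clf = LogisticRegression(max_iter=200).fit(X, y)

# Persist the model in MLflow's format.
mlflow.sklearn.save_model(clf, "iris_model")

# It can then be served as a REST endpoint:
#   mlflow models serve -m iris_model -p 5000
# and queried with POST requests to /invocations.
```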
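
Scikit-learn's model persistence is the simplest building block of all: dump a trained estimator to disk and reload it in the serving process.

```python
import joblib
from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier

X, y = load_iris(return_X_y=True)
model = RandomForestClassifier().fit(X, y)

# Persist the trained model to disk...
joblib.dump(model, "model.joblib")

# ...and reload it later, e.g. at server startup.
restored = joblib.load("model.joblib")
print(restored.predict(X[:2]))
```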
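
FastAPI gives you typed request validation for free via Pydantic. A minimal prediction endpoint, reusing the model.joblib file from the previous sketch:

```python
import joblib
from typing import List

from fastapi import FastAPI
from pydantic import BaseModel

app = FastAPI()
model = joblib.load("model.joblib")  # path is an assumption

class PredictRequest(BaseModel):
    features: List[float]

@app.post("/predict")
def predict(req: PredictRequest) -> dict:
    prediction = model.predict([req.features])
    return {"prediction": prediction.tolist()}

# Run with: uvicorn main:app --port 8000
```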
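
The equivalent Flask endpoint shows why it remains a popular lightweight option:

```python
import joblib
from flask import Flask, jsonify, request

app = Flask(__name__)
model = joblib.load("model.joblib")  # path is an assumption

@app.route("/predict", methods=["POST"])
def predict():
    features = request.get_json()["features"]
    prediction = model.predict([features])
    return jsonify(prediction=prediction.tolist())

if __name__ == "__main__":
    app.run(port=5000)
```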

In conclusion, the field of machine learning is vast and continually evolving, and the tools available for serving models are just as diverse. Each tool has its strengths and weaknesses, from lightweight web frameworks you wire up yourself to full Kubernetes-native platforms, yet all play a crucial role in enabling the practical application of machine learning models in real-world scenarios. As developers, understanding these tools and their capabilities is key to effectively using machine learning in production environments.

Connect with me on Linkedin

Find me on Github

Visit my technical channel on Youtube

Support: Buy me a Coffee/Chai

