Understanding TensorFlow Serving
Architecture and best practices of TensorFlow Serving
TensorFlow Serving is a flexible, high-performance serving system for machine learning models, designed for production environments. It makes it easy to deploy new algorithms and experiments while keeping the same server architecture and APIs. TensorFlow Serving provides out-of-the-box integration with TensorFlow models, but can be easily extended to serve other types of models and data.
TensorFlow Serving handles model serving and version management, lets you serve models according to configurable policies, and allows you to load models from different sources. It is used internally at Google and by numerous organizations worldwide.
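To make the serving workflow concrete, here is a minimal sketch of how a client would query a deployed model over TF Serving's REST API. The model name, version number, and input values below are illustrative assumptions, not part of the original text; the endpoint pattern itself (`/v1/models/<name>/versions/<n>:predict` on port 8501) is TF Serving's standard REST predict route.

```python
import json

# Hypothetical model name and version used for illustration only.
MODEL_NAME = "my_model"
VERSION = 2

# TF Serving's REST predict endpoint follows the pattern:
#   http://<host>:8501/v1/models/<model_name>/versions/<version>:predict
url = f"http://localhost:8501/v1/models/{MODEL_NAME}/versions/{VERSION}:predict"

# The request body lists input instances in row format.
payload = json.dumps({"instances": [[1.0, 2.0, 5.0]]})

# Against a running server you would POST the payload, e.g. with `requests`:
#   response = requests.post(url, data=payload)
#   predictions = response.json()["predictions"]
print(url)
```

Because the version segment is optional, dropping `/versions/{VERSION}` from the URL asks the server for the latest version allowed by its version policy.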
TensorFlow Serving architecture
In the image below, you can see an overview of TF Serving architecture. This high-level architecture shows the important components that make up TF Serving.
From right to left in the image above, let’s start with the model source:
- The model source provides plugins and functionality to help you load models (called servables in TF Serving terms) from numerous locations (e.g., a GCS or AWS S3 bucket)…