This post is authored by Fidan Boylu Uz, Mathew Salvaris and Yan Zhang, Data Scientists at Microsoft.
Data scientists and AI developers typically focus more on training their models than on deploying them, creating a large gap between experimentation and production. As recent developments have made AI more accessible, moving from modeling to production quickly, in order to evaluate the real-world performance of models, has become an essential need. To that end, tools that provide a quick path to production have become quite popular among data scientists and engineers. In this blog post, we will look at two machine learning toolkits, Azure Machine Learning service (AML) and Kubeflow, and compare the two approaches for a computer vision scenario in which one would like to deploy a trained deep learning model for image classification. We hope this will help data scientists make a more informed decision for their next deployment problem.
The AML approach to this scenario has already been published by Microsoft as an AI reference architecture for real-time scoring of Python models on Azure in this repository. In this tutorial, we follow a similar pattern to show how to use Kubeflow to deploy deep learning models with TensorFlow Serving on Azure Kubernetes Service. Here is the architecture diagram, followed by explanations of its major components.
Kubeflow is a popular open source machine learning toolkit dedicated to making deployments of machine learning workflows, such as data preparation, model training and prediction serving, simple on Kubernetes. It provides easy-to-use parameterized templates that make it straightforward to customize Kubernetes manifests for a particular use case. It consists of many components for serving models built in multiple deep learning frameworks, such as the TensorFlow Serving component used in this tutorial.
TensorFlow Serving is a flexible, high-performance serving system for machine learning models, designed for production environments. It provides out-of-the-box integration with TensorFlow models but can be easily extended to serve other types of models and data.
Azure Kubernetes Service (AKS) simplifies the management of Kubernetes by enabling easy provisioning of clusters through tools like Azure CLI and by streamlining cluster maintenance with automated upgrades and scaling. Additionally, the ability to create GPU clusters allows AKS to be used for high-performance serving and auto-scaling of deep learning models.
This tutorial provides step-by-step instructions on using these powerful services to perform the following steps to serve a pre-trained CNN model on AKS using Kubeflow’s TensorFlow Serving component:
Develop Model Servable
In this first step, we load a pre-trained Inception model using the Keras library, pre-process the images into the required format and call the model to find the top predictions. Next, we save the model as a TensorFlow servable.
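As a rough sketch of this step (the export path, the TF 1.x-era APIs and the preprocessing details below are our illustrative assumptions, not code taken from the tutorial), the servable can be produced along these lines:

```python
# Sketch: export a pre-trained Keras InceptionV3 model as a TensorFlow
# servable. Assumes a TensorFlow 1.x environment, as in the original tutorial.
import numpy as np

def preprocess(image):
    """Scale uint8 RGB pixels to the [-1, 1] range Inception expects."""
    image = image.astype(np.float32)
    return image / 127.5 - 1.0

def export_servable(export_dir="./models/inception/1"):
    # Imports kept inside the function so the sketch can be read without
    # TensorFlow installed. The <model>/<version>/ directory layout is the
    # SavedModel layout TensorFlow Serving expects.
    import tensorflow as tf
    from tensorflow.keras.applications import InceptionV3

    model = InceptionV3(weights="imagenet")  # downloads pre-trained weights
    with tf.keras.backend.get_session() as sess:
        tf.saved_model.simple_save(
            sess, export_dir,
            inputs={"images": model.input},
            outputs={"scores": model.output})
```

Calling `export_servable()` writes the SavedModel under a versioned directory (`inception/1` here), which is the layout the serving steps below rely on.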
Serve Model Locally
In this step, we pull the TensorFlow Serving Docker image, start serving the exported model locally by opening the REST API port, and test the local service.
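A minimal sketch of testing the local service is shown below; the model name `inception`, the port and the payload shape are illustrative assumptions. TensorFlow Serving exposes its REST predict API at `/v1/models/<name>:predict` and expects a JSON body with an `instances` list:

```python
# Sketch: call the local TensorFlow Serving container over REST.
# Assumes the container was started with the REST port published, e.g.:
#   docker run -p 8501:8501 \
#     -v $(pwd)/models/inception:/models/inception \
#     -e MODEL_NAME=inception tensorflow/serving
import json
import urllib.request

def build_predict_request(instances, model_name="inception",
                          host="localhost", port=8501):
    """Build an HTTP request for TF Serving's REST predict API."""
    url = "http://{}:{}/v1/models/{}:predict".format(host, port, model_name)
    body = json.dumps({"instances": instances}).encode("utf-8")
    return urllib.request.Request(
        url, data=body, headers={"Content-Type": "application/json"})

def score(instances):
    # Issues the request; only works while the serving container is running.
    with urllib.request.urlopen(build_predict_request(instances)) as resp:
        return json.loads(resp.read())["predictions"]
```

With the container running, `score([preprocessed_image.tolist()])` returns the class scores for the submitted image.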
Create AKS Cluster and Mount Blobfuse
Next, we attach blobfuse to the AKS nodes as a virtual file system backed by Azure Blob Storage, using the blobfuse volume driver for Kubernetes. We then store the model servables in the attached blob storage. The TensorFlow Serving component will serve the model from this location.
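Copying the servables into the container that blobfuse mounts can be sketched as below. The account, container and blob prefix are illustrative, and the snippet assumes the 2019-era `azure-storage-blob` SDK that exposes `BlockBlobService`:

```python
# Sketch: upload a SavedModel directory to the blob container mounted via
# blobfuse, preserving the <model>/<version>/ layout TF Serving expects.
import os

def servable_blob_pairs(local_dir, blob_prefix):
    """Map every file under local_dir to a blob name under blob_prefix."""
    pairs = []
    for root, _, files in os.walk(local_dir):
        for name in files:
            path = os.path.join(root, name)
            rel = os.path.relpath(path, local_dir).replace(os.sep, "/")
            pairs.append((path, "{}/{}".format(blob_prefix, rel)))
    return pairs

def upload_servable(account, key, container, local_dir,
                    blob_prefix="inception/1"):
    # Requires: pip install "azure-storage-blob<2"
    from azure.storage.blob import BlockBlobService
    service = BlockBlobService(account_name=account, account_key=key)
    for path, blob_name in servable_blob_pairs(local_dir, blob_prefix):
        service.create_blob_from_path(container, blob_name, path)
```

Once uploaded, the servable appears under the blobfuse mount point on the AKS nodes, where the TensorFlow Serving component can load it.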
Serve Model with Kubeflow
In this step, we install Kubeflow’s common components along with the TensorFlow Serving component on AKS. We then provide the parameters of the deployment and use a modified template so that the deployment has access to blobfuse. Finally, we test the web service using the external IP from the load balancer. As a last step, in the final notebook of the tutorial, we delete the resource group in which AKS and the blob storage account were provisioned, freeing up the Azure resources.
Although the two approaches share many similarities such as using Docker containers and Kubernetes, they differ in the following aspects:
- The Kubeflow deployment is built around TensorFlow Serving and relies on the official TensorFlow Serving Docker image, along with the model servables, for web service creation. The AML deployment is more generic and is built around a Docker image created by the service from an Anaconda environment specification and a scoring script prepared by the user.
- The Kubeflow deployment requires that the model servables be saved in a storage account, from which the model is loaded at runtime. The AML service bakes the model into the image at build time, based on the model that is registered and saved in the AML workspace.
- The AML service hides the complexities by allowing the user to seamlessly deploy models without expertise in Kubernetes; in contrast, Kubeflow requires the user to be familiar with Kubernetes manifests.
Overall, the AML service manages to abstract away some of the more mundane complexities yet provides enough flexibility, which makes it quite compelling.
We hope you give both approaches a try and reach out to us with any comments or questions.