Deploying Deep-Learning Models to Kubernetes on IBM Cloud
The tools you need to turn ready-made model assets into a web service
In my previous blog post I outlined how you can download ready-to-use deep-learning models from the Model Asset Exchange (MAX) and run them locally in containers using Docker.
We’ve recently added a starter Kubernetes deployment configuration file to many model assets, providing you with a head start if you are planning on deploying the MAX Docker containers to a Kubernetes instance you are running locally or to a managed instance in the <pick-your-favorite> cloud.
The starter configuration file defines three Kubernetes objects: a pod, a deployment, and a service.
- A pod embodies a single instance of a ready-to-use MAX model, serving the model’s REST API endpoints (e.g. /model/predict). Each MAX pod is backed by a container that is based on a Docker image we’ve published on Docker Hub (e.g. codait/max-object-detector).
- A deployment describes the desired state for pods, such as the number of replicas to run. Out of the box MAX deployments are configured to run only a single instance of the specified MAX model. (You can increase the replica count as desired.)
- A service defines a logical set of pods and a policy by which these pods are accessed. The starter configuration file defines a service of type NodePort, exposing the model serving port externally on each Kubernetes worker node where a MAX model pod is running.
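Put together, a starter configuration along these lines might look as follows. This is an illustrative sketch, not the file shipped with any particular model: the apiVersion, labels, and exact field values may differ from the configuration in the model repository, which is authoritative. The object names, image, and serving port 5000 are taken from the MAX-Object-Detector examples later in this post.

```yaml
# Illustrative sketch of a MAX starter configuration.
# Consult the .yaml file in the model's repository for the exact contents.
apiVersion: apps/v1
kind: Deployment
metadata:
  name: max-object-detector
spec:
  replicas: 1                      # out of the box: a single instance
  selector:
    matchLabels:
      app: max-object-detector
  template:
    metadata:
      labels:
        app: max-object-detector
    spec:
      containers:
      - name: max-object-detector
        image: codait/max-object-detector   # image published on Docker Hub
        ports:
        - containerPort: 5000               # model serving port
---
apiVersion: v1
kind: Service
metadata:
  name: max-object-detector
spec:
  type: NodePort                   # expose the port on each worker node
  selector:
    app: max-object-detector
  ports:
  - port: 5000
    targetPort: 5000
```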
If you are planning to deploy a MAX model Docker image to a Kubernetes instance you are managing in your own environment, or to a managed cluster in a cloud other than IBM Cloud, follow the appropriate instructions for that environment. In the remainder of this blog post, I’ll describe how to perform the deployment to the Kubernetes service on IBM Cloud.
Deployment to the IBM Cloud
The Kubernetes service on IBM Cloud provides two cluster types: a free cluster (one worker pool with a single virtual-shared worker node with 2 cores, 4GB RAM, and 100GB SAN), which is great for exploration, and a fully customizable standard cluster (virtual-shared, virtual-dedicated, or bare metal) for the heavy lifting.
Set up a Kubernetes cluster
If you already have access to a cluster, skip to the section “Deploy the MAX model Docker image”; otherwise, create a free cluster using your IBM Cloud ID.
Note: Some MAX models (e.g., the audio classifier) require resources in excess of what is provided in the free cluster. Check the model asset’s GitHub repository README to see if this limitation applies to the model you are planning to deploy.
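On a suitably sized standard cluster, you can make a model’s resource needs explicit by adding resource requests and limits to the container spec in the deployment. The numbers below are illustrative placeholders, not measured requirements for any particular model; check the model’s README for its actual needs:

```yaml
# Illustrative resource settings for a resource-hungry model container.
# Adjust the values to the model's documented requirements.
resources:
  requests:
    memory: "4Gi"   # minimum the scheduler must reserve for the pod
    cpu: "2"
  limits:
    memory: "8Gi"   # hard cap; the container is evicted if it exceeds this
    cpu: "4"
```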
Cluster provisioning might take a couple of minutes. In the meantime, go ahead and install the command-line interfaces (CLIs) on your local machine.
Note: If you prefer, you can also use the IBM Cloud web console to deploy the MAX models without having to install anything on your machine.
Install the CLIs
Kubernetes clusters in the IBM Cloud are managed using the IBM Cloud CLI (ibmcloud) and the Kubernetes CLI (kubectl).
- Download and install the IBM Cloud CLI and IBM Cloud Kubernetes Service plug-in.
- Download and install the Kubernetes CLI.
You can find instructions on how to access your cluster in the Access tab when you open the cluster in your IBM Cloud dashboard:
After you’ve configured access to the Kubernetes cluster, verify that the worker node is ready.
Note: The examples below have been edited for brevity. Additional output (...) is displayed when you run the listed commands in a terminal window.
$ kubectl get nodes
NAME STATUS ...
x.x.x.x Ready ...
...
Note: To learn more about kubectl commands, use the embedded help (e.g. kubectl help for general help, or kubectl help get for help with the get command) or refer to the CLI reference.
You can now deploy the desired MAX model Docker image. If you decide to deploy a model other than MAX-Object-Detector, keep in mind that the file names, directory names, and object names shown in the examples below will be different; the deployment steps, however, are identical.
Deploy the MAX model Docker image
From the MAX website, choose the desired model and clone its GitHub repository. You can skip this step if you are not planning to customize the model’s starter Kubernetes configuration file:
$ git clone https://github.com/IBM/MAX-Object-Detector.git
...
$ cd MAX-Object-Detector
$ ls *.yaml
max-object-detector.yaml
Create the Kubernetes objects for the selected MAX model using the provided configuration file.
Note: If you did not clone the repository locally, specify the remote URL of the appropriate configuration file in the command below. For example, to create the objects for the MAX-Object-Detector, specify https://github.com/IBM/MAX-Object-Detector/blob/master/max-object-detector.yaml?raw=true as the parameter.
$ kubectl apply -f ./max-object-detector.yaml
service/max-object-detector created
deployment.extensions/max-object-detector created
...
Verify that the objects have been created and that the pod is running:
$ kubectl get pods
NAME READY STATUS ...
...
max-object-detector-8.. 1/1 Running ...
$ kubectl get services
NAME TYPE CLUSTER-IP EXTERNAL-IP PORT(S) ...
...
max-object-... NodePort 172.21.120.15 <none> 5000:n..n/TCP ...
$ kubectl get deployments
NAME DESIRED CURRENT UP-TO-DATE AVAILABLE ...
max-object-... 1 1 1 0 ...
Note: For large models it might take a couple of minutes until the container has been created and the pod is running.
Identify the public IP address(es) and port
The Kubernetes service exposes the model service’s port on the worker node (or nodes) where the pod is running.
To identify the public IP address(es) of the worker node(s), run the following commands, replacing mycluster in the second command with the cluster name returned by the first command:
$ ibmcloud cs clusters
Name ID ...
mycluster 094... ...
$ ibmcloud cs workers mycluster
ID Public IP Private IP ...
kube-... NNN.NNN.NNN.NNN nnn.nnn.nnn.nnn ...
Take note of the public IP address NNN.NNN.NNN.NNN.
Each MAX model service API endpoint is served through the NodePort. Display information about the service:
$ kubectl describe service max-object-detector
Name: max-object-detector
...
NodePort: <unset> ppppp/TCP
...
Locate the NodePort setting and capture the public port number ppppp. You now have all the information you need to access the deep-learning model service running on Kubernetes.
Access the deep-learning model service
You can quickly test the deployment by directing your browser to http://<NNN.NNN.NNN.NNN>:<ppppp>
to open the service’s Swagger specification and explore the displayed endpoints.
Cleaning up
To remove the MAX model Kubernetes objects run the following commands:
$ kubectl delete services max-object-detector
...
$ kubectl delete deployment max-object-detector
...
Wrapping up
As the name implies, the starter Kubernetes configuration files we’ve provided are only intended to get you going. You should, and likely will have to, customize them to meet your needs:
- The starter configuration only requests one running MAX pod replica, which might not be sufficient to handle your expected workloads.
- The starter configuration only exposes the MAX model service at the worker node’s IP address, which can change over time.
- The starter configuration does not set up load balancing across worker nodes if pods are distributed across multiple nodes to provide high availability.
You can use the LoadBalancer service type or an Ingress to resolve these limitations.
Note: The free Kubernetes cluster service in IBM Cloud does not support LoadBalancer services or Ingress. You will have to upgrade to a standard cluster and choose a worker node configuration (bare metal or virtual, the desired number of nodes, and the cores, RAM, and storage per worker node) to increase availability and meet your performance requirements.
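On a standard cluster, switching from NodePort to LoadBalancer can be as simple as changing the service type. The sketch below assumes the object name, selector label, and port match the starter configuration shown earlier; your model’s file may use different values:

```yaml
# Illustrative LoadBalancer service for a standard cluster.
# IBM Cloud provisions a stable external IP address for the service,
# so clients no longer depend on a worker node's changeable IP.
apiVersion: v1
kind: Service
metadata:
  name: max-object-detector
spec:
  type: LoadBalancer
  selector:
    app: max-object-detector
  ports:
  - port: 5000
    targetPort: 5000
```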