Deploying Deep-Learning Models to Kubernetes on IBM Cloud

The tools you need to turn ready-made model assets into a web service

Patrick Titzler
Aug 9, 2018 · 6 min read

In my previous blog post I outlined how you can download ready-to-use deep-learning models from the Model Asset Exchange (MAX) and run them locally in containers using Docker.

We’ve recently added a starter Kubernetes deployment configuration file to many model assets, providing you with a head start if you are planning on deploying the MAX Docker containers to a Kubernetes instance you are running locally or to a managed instance in the <pick-your-favorite> cloud.

Deploy a ready-to-use deep-learning model from MAX to Kubernetes using the provided starter configuration file

The starter configuration file defines multiple Kubernetes objects, namely a pod, a deployment, and a service (a minimal sketch follows the list):

  • A pod embodies a single instance of a ready-to-use MAX model, serving the model’s REST API endpoints (e.g. /model/predict). Each MAX pod is backed by a container that is based on a Docker image we’ve published on Docker Hub (e.g. codait/max-object-detector).
View of the codait/max-object-detector registry entry on Docker Hub (a cloud-based Docker image registry)
  • A deployment describes the desired state for pods, such as the number of replicas to run. Out of the box, MAX deployments are configured to run only a single instance of the specified MAX model. (You can increase the replica count as desired.)
  • A service defines a logical set of pods and a policy by which these pods are accessed. The starter configuration file defines a service of type NodePort, exposing the model serving port externally on each Kubernetes worker node where a MAX model pod is running.
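To make this concrete, here is a minimal sketch of what such a starter file might contain, using the Object Detector image from Docker Hub. The actual files shipped with each model may differ in details such as the API version, labels, and resource settings:

apiVersion: v1
kind: Service
metadata:
  name: max-object-detector
spec:
  type: NodePort            # expose the model-serving port on each worker node
  selector:
    app: max-object-detector
  ports:
  - port: 5000              # the port the MAX container listens on
    targetPort: 5000
---
apiVersion: apps/v1
kind: Deployment
metadata:
  name: max-object-detector
spec:
  replicas: 1               # starter files run a single pod; increase as needed
  selector:
    matchLabels:
      app: max-object-detector
  template:
    metadata:
      labels:
        app: max-object-detector
    spec:
      containers:
      - name: max-object-detector
        image: codait/max-object-detector   # image published on Docker Hub
        ports:
        - containerPort: 5000

Because the service type is NodePort, Kubernetes maps the container's port 5000 to a port on each worker node (by default one in the 30000–32767 range), which is the 5000:n..n/TCP mapping you'll see later in the kubectl output.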

If you are planning to deploy a MAX model Docker image to a Kubernetes instance you are managing in your own environment, or to a managed cluster in a cloud other than IBM Cloud, follow the appropriate instructions for that environment. In the remainder of this blog post, I’ll describe how to perform the deployment to the Kubernetes service on IBM Cloud.

Deployment to the IBM Cloud

The Kubernetes service on IBM Cloud provides two cluster types: a free cluster (one worker pool with a single virtual-shared worker node with 2 cores, 4 GB RAM, and 100 GB SAN), which is great for exploration, or a fully customizable standard cluster (virtual-shared, virtual-dedicated, or bare metal) for the heavy lifting.

Set up a Kubernetes cluster

If you already have access to a cluster, skip to the section “Deploy the MAX model Docker image”; otherwise create a free cluster using your IBM Cloud ID.
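If you already have the ibmcloud CLI installed (installation is covered in the next section), you can also provision a free cluster from the command line. A sketch, assuming the CLI syntax at the time of writing; mycluster is an example name:

$ ibmcloud cs cluster-create --name mycluster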

Cluster provisioning might take a couple of minutes. Go ahead and install the command line interfaces (CLIs) on your local machine in the meantime.

Install the CLIs

Kubernetes clusters in the IBM Cloud are managed using the IBM Cloud CLI (ibmcloud) and the Kubernetes CLI (kubectl).

You can find instructions on how to access your cluster when you open the cluster in your IBM Cloud dashboard.
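In practice the sequence looks something like the sketch below; the command names reflect the IBM Cloud CLI at the time of writing, and mycluster is a placeholder for your cluster name:

$ ibmcloud login                         # authenticate with your IBM Cloud ID
$ ibmcloud cs cluster-config mycluster   # fetch the cluster's kubeconfig file
$ export KUBECONFIG=<path printed by the previous command>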

After you’ve configured access to the Kubernetes cluster, verify that the worker node is ready:

$ kubectl get nodes
NAME      STATUS   ...
x.x.x.x   Ready    ...

You can now deploy the desired MAX model Docker image. If you decide to deploy a model other than the MAX Object Detector, keep in mind that the file names, directory names, and object names in the examples below will differ; the deployment steps, however, are identical.

Deploy the MAX model Docker image

From the MAX website, choose the desired model and clone its GitHub repository. (You can skip this step if you are not planning to customize the model’s starter Kubernetes configuration file.)

$ git clone https://github.com/IBM/MAX-Object-Detector.git
$ cd MAX-Object-Detector
$ ls *.yaml
max-object-detector.yaml

Create the Kubernetes objects for the selected MAX model using the provided configuration file.

$ kubectl apply -f ./max-object-detector.yaml
service/max-object-detector created
deployment.extensions/max-object-detector created

Verify that the objects have been created and that the pod is running:

$ kubectl get pods
max-object-detector-8.. 1/1 Running ...
$ kubectl get services
max-object-... NodePort <none> 5000:n..n/TCP
$ kubectl get deployments
max-object-... 1 1 1 0 ...

Identify the public IP address(es) and port

The Kubernetes service exposes the model service’s port on the worker node (or nodes) where the pod is running.

To identify the public IP address(es) of the worker node(s) run the following commands, replacing mycluster in the second command with the cluster name returned by the first command:

$ ibmcloud cs clusters
Name ID ...
mycluster 094... ...
$ ibmcloud cs workers mycluster
ID Public IP Private IP ...
kube-... NNN.NNN.NNN.NNN nnn.nnn.nnn.nnn ...

Take note of the public IP address NNN.NNN.NNN.NNN.

Each MAX model service API endpoint is served through the NodePort. Display information about the service:

$ kubectl describe service max-object-detector
Name: max-object-detector
NodePort: <unset> ppppp/TCP

Locate the NodePort setting and capture the public port number ppppp. You now have all the information you need to access the deep-learning model service running on Kubernetes.
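If you’d rather capture the port number in a script than read it off the describe output, a JSONPath query returns just the NodePort value (a convenience, not a required step):

$ kubectl get service max-object-detector \
    -o jsonpath='{.spec.ports[0].nodePort}'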

Access the deep-learning model service

You can quickly test the deployment by directing your browser to http://<NNN.NNN.NNN.NNN>:<ppppp> to open the service’s Swagger specification and explore the displayed endpoints.

The model service Swagger specification describes the API endpoints that your application(s) can invoke
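You can also invoke the prediction endpoint directly from the command line. The sketch below assumes the Object Detector, whose /model/predict endpoint accepts an image as a multipart form upload; test.jpg is a placeholder for an image file on your machine:

$ curl -X POST -F "image=@test.jpg" http://NNN.NNN.NNN.NNN:ppppp/model/predict

The response is a JSON document listing the detected objects, their probabilities, and their bounding boxes.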

Cleaning up

To remove the MAX model Kubernetes objects, run the following commands:

$ kubectl delete services max-object-detector
$ kubectl delete deployment max-object-detector

Wrapping up

As the name implies, the starter Kubernetes configuration files we’ve provided are only intended to get you going. You should, and likely will have to, customize them to meet your needs:

  • The starter configuration only requests one running MAX pod replica, which might not be sufficient to handle your expected workloads.
  • The starter configuration only exposes the MAX model service at the worker node’s IP address, which can change over time.
  • The starter configuration does not configure load balancing across worker nodes if pods are distributed across multiple nodes for high availability.

You can use a LoadBalancer service type or an Ingress resource to address these limitations.
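For instance, raising the replica count does not require editing the file at all, and switching the service type is a one-line change in the YAML. Both are sketched below; note that on IBM Cloud, LoadBalancer services require a standard (paid) cluster:

$ kubectl scale deployment max-object-detector --replicas=3

And in the service section of max-object-detector.yaml:

  type: LoadBalancer   # instead of NodePort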



Thanks to va barbosa


