Deploying Deep-Learning Models to Kubernetes on IBM Cloud
The tools you need to turn ready-made model assets into a web service
In my previous blog post I outlined how you can download ready-to-use deep-learning models from the Model Asset Exchange (MAX) and run them locally in containers using Docker.
We’ve recently added a starter Kubernetes deployment configuration file to many model assets, providing you with a head start if you are planning on deploying the MAX Docker containers to a Kubernetes instance you are running locally or to a managed instance in the <pick-your-favorite> cloud.
The starter configuration file defines three Kubernetes objects: a pod, a deployment, and a service.
- A pod embodies a single instance of a ready-to-use MAX model, serving the model’s REST API endpoints (e.g. /model/predict). Each MAX pod is backed by a container that is based on a Docker image we’ve published on Docker Hub (e.g. codait/max-object-detector).
- A deployment describes the desired state for pods, such as the number of replicas to run. Out of the box MAX deployments are configured to run only a single instance of the specified MAX model. (You can increase the replica count as desired.)
- A service defines a logical set of pods and a policy by which these pods are accessed. The starter configuration file defines a service of type NodePort, exposing the model serving port externally on each Kubernetes worker node where a MAX model pod is running.
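Put together, a starter configuration along these lines might look as follows. This is an illustrative sketch, not the file shipped with any particular model: the apiVersion, labels, and exact field values may differ from the configuration in the model repository, which is authoritative. The object names, image, and serving port 5000 are taken from the MAX-Object-Detector examples later in this post.

```yaml
# Illustrative sketch of a MAX starter configuration.
# Consult the .yaml file in the model's repository for the exact contents.
apiVersion: apps/v1
kind: Deployment
metadata:
  name: max-object-detector
spec:
  replicas: 1                      # out of the box: a single instance
  selector:
    matchLabels:
      app: max-object-detector
  template:
    metadata:
      labels:
        app: max-object-detector
    spec:
      containers:
      - name: max-object-detector
        image: codait/max-object-detector   # image published on Docker Hub
        ports:
        - containerPort: 5000               # model serving port
---
apiVersion: v1
kind: Service
metadata:
  name: max-object-detector
spec:
  type: NodePort                   # expose the port on each worker node
  selector:
    app: max-object-detector
  ports:
  - port: 5000
    targetPort: 5000
```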
If you are planning to deploy a MAX model Docker image to a Kubernetes instance you are managing in your own environment, or to a managed cluster in a cloud other than IBM Cloud, follow the appropriate instructions for that environment. In the remainder of this blog post, I’ll describe how to perform the deployment to the Kubernetes service on IBM Cloud.
Deployment to the IBM Cloud
The Kubernetes service on IBM Cloud provides two cluster types: a free cluster (one worker pool with a single virtual-shared worker node with 2 cores, 4GB RAM, and 100GB SAN), which is great for exploration, and a fully customizable standard cluster (virtual-shared, virtual-dedicated, or bare metal) for the heavy lifting.
Set up a Kubernetes cluster
If you already have access to a cluster, skip to the section “Deploy the MAX model Docker image”; otherwise, create a free cluster using your IBM Cloud ID.
Note: Some MAX models (e.g., the audio classifier) require resources in excess of what is provided in the free cluster. Check the model asset’s GitHub repository README to see if this limitation applies to the model you are planning to deploy.
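On a suitably sized standard cluster, you can make a model’s resource needs explicit by adding resource requests and limits to the container spec in the deployment. The numbers below are illustrative placeholders, not measured requirements for any particular model; check the model’s README for its actual needs:

```yaml
# Illustrative resource settings for a resource-hungry model container.
# Adjust the values to the model's documented requirements.
resources:
  requests:
    memory: "4Gi"   # minimum the scheduler must reserve for the pod
    cpu: "2"
  limits:
    memory: "8Gi"   # hard cap; the container is evicted if it exceeds this
    cpu: "4"
```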
Cluster provisioning might take a couple of minutes. In the meantime, go ahead and install the command-line interfaces (CLIs) on your local machine.
Note: If you prefer, you can also use the IBM Cloud web console to deploy the MAX models without having to install anything on your machine.
Install the CLIs
Kubernetes clusters in the IBM Cloud are managed using the IBM Cloud CLI (ibmcloud) and the Kubernetes CLI (kubectl).
- Download and install the IBM Cloud CLI and IBM Cloud Kubernetes Service plug-in.
- Download and install the Kubernetes CLI.
You can find instructions on how to access your cluster in the Access tab when you open the cluster in your IBM Cloud dashboard:
After you’ve configured access to the Kubernetes cluster, verify that the worker node is ready.
Note: The examples below have been edited for brevity. Additional output (...) is displayed when you run the listed commands in a terminal window.
$ kubectl get nodes
NAME STATUS ...
x.x.x.x Ready ...
...
Note: To learn more about kubectl commands, use the embedded help (e.g. kubectl help for general help, or kubectl help get for help with the get command) or refer to the CLI reference.
You can now deploy the desired MAX model Docker image. If you decide to deploy a model other than MAX-Object-Detector, keep in mind that the file names, directory names, and object names shown in the examples below will be different; the deployment steps, however, are identical.
Deploy the MAX model Docker image
From the MAX website, choose the desired model and clone its GitHub repository. You can skip this step if you are not planning to customize the model’s starter Kubernetes configuration file:
$ git clone https://github.com/IBM/MAX-Object-Detector.git
...
$ cd MAX-Object-Detector
$ ls *.yaml
max-object-detector.yaml
Create the Kubernetes objects for the selected MAX model using the provided configuration file.
Note: If you did not clone the repository locally, specify the remote URL of the appropriate configuration file in the command below. For example, to create the objects for the MAX-Object-Detector, specify https://github.com/IBM/MAX-Object-Detector/blob/master/max-object-detector.yaml?raw=true as the parameter.
$ kubectl apply -f ./max-object-detector.yaml
service/max-object-detector created
deployment.extensions/max-object-detector created
...
Verify that the objects have been created and that the pod is running:
$ kubectl get pods
NAME READY STATUS ...
...
max-object-detector-8.. 1/1 Running ...
$ kubectl get services
NAME TYPE CLUSTER-IP EXTERNAL-IP PORT(S) ...
...
max-object-... NodePort 172.21.120.15 <none> 5000:n..n/TCP ...
$ kubectl get deployments
NAME DESIRED CURRENT UP-TO-DATE AVAILABLE ...
max-object-... 1 1 1 0 ...
Note: For large models it might take a couple of minutes until the container has been created and the pod is running.
Identify the public IP address(es) and port
The Kubernetes service exposes the model service’s port on the worker node (or nodes) where the pod is running.
To identify the public IP address(es) of the worker node(s), run the following commands, replacing mycluster in the second command with the cluster name returned by the first command:
$ ibmcloud cs clusters
Name ID ...
mycluster 094... ...
$ ibmcloud cs workers mycluster
ID Public IP Private IP ...
kube-... NNN.NNN.NNN.NNN nnn.nnn.nnn.nnn ...
Take note of the public IP address NNN.NNN.NNN.NNN.
Each MAX model service API endpoint is served through the NodePort. Display information about the service:
$ kubectl describe service max-object-detector
Name: max-object-detector
...
NodePort: <unset> ppppp/TCP
...
Locate the NodePort setting and capture the public port number ppppp. You now have all the information you need to access the deep-learning model service running on Kubernetes.
Access the deep-learning model service
You can quickly test the deployment by directing your browser to http://<NNN.NNN.NNN.NNN>:<ppppp>
to open the service’s Swagger specification and explore the displayed endpoints.
Cleaning up
To remove the MAX model Kubernetes objects run the following commands:
$ kubectl delete services max-object-detector
...
$ kubectl delete deployment max-object-detector
...
Wrapping up
As the name implies, the starter Kubernetes configuration files we’ve provided are only intended to get you going. You should, and likely will have to, customize them to meet your needs:
- The starter configuration only requests one running MAX pod replica, which might not be sufficient to handle your expected workloads.
- The starter configuration only exposes the MAX model service at the worker node’s IP address, which can change over time.
- The starter configuration does not set up load balancing across worker nodes if pods are distributed across multiple nodes to provide high availability.
You can use the LoadBalancer service type or an Ingress to resolve these limitations.
Note: The free Kubernetes cluster service in IBM Cloud does not support LoadBalancer services or Ingress. You will have to upgrade to a standard cluster and choose a worker node configuration (bare metal or virtual, the desired number of nodes, and the cores, RAM, and storage per worker node) to increase availability and meet your performance requirements.
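On a standard cluster, switching from NodePort to LoadBalancer can be as simple as changing the service type. The sketch below assumes the object name, selector label, and port match the starter configuration shown earlier; your model’s file may use different values:

```yaml
# Illustrative LoadBalancer service for a standard cluster.
# IBM Cloud provisions a stable external IP address for the service,
# so clients no longer depend on a worker node's changeable IP.
apiVersion: v1
kind: Service
metadata:
  name: max-object-detector
spec:
  type: LoadBalancer
  selector:
    app: max-object-detector
  ports:
  - port: 5000
    targetPort: 5000
```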