Deploy ML models at the edge with MicroK8s, Seldon and Istio

Bartłomiej Poniecki-Klotz
Published in Ubuntu AI
9 min read · Mar 24, 2023

Edge computing is defined as solutions that move data processing to or near the point of data generation. This means that Machine Learning model inference results can be delivered to customers faster, creating a real-time inference experience. This makes the edge a perfect place for your models.

Consider Gartner’s prediction: “Around 10% of enterprise-generated data is created and processed outside a traditional centralized data centre or cloud. By 2025, Gartner predicts this figure will reach 75%”. In other words, they expect the majority of data processing to be performed at the edge. The figure is so high because of the ever-increasing amount of data being produced: it is simply not possible to send all of it to a central data centre for processing.

AI model deployment is popular for video processing at the edge
Photo by Alan J. Hendry on Unsplash

Additionally, there is an increasing number of use cases that require Machine Learning model deployment in micro-data centres or edge devices. Just to name a few:

  • Cameras with video processing using GPU-accelerated hardware
  • Customer-facing shopping assistants with chatbots and large language models
  • MTL implemented in the manufacturing plant using 5G networks and MEC

Despite its benefits and rising popularity, many risks are involved in hosting ML models at the edge. One of the major ones is security. In a recent article, you can learn more about mitigating security risks for Machine Learning models.

The full-blown MLOps stack requires a lot of resources and cannot fit into an edge site, right? Not quite: this blog post will show you how to enable the key capabilities required at edge sites, like serving models, without deploying the full MLOps stack.

Let’s build the Edge Site!

For this hands-on tutorial, we will use AWS EC2 to provide a VM for us. Let’s create a t2.medium instance with a public IP. This edge site requires around 2GB of memory and is ready to host multiple models; the more models you host, the more memory and CPU will be required. We will use Ubuntu 22.04 LTS as the OS and install most of the required tools using snaps. Snaps are distributed through channels and provide automated updates within a channel, so your environment will always be up to date. Handy, right?

Security group in AWS console

Also, remember to take care of the security groups because you will need access to the VM using port 22 for SSH and ports 30000–32767 for NodePorts. You should add them as ingress rules to the security group and attach this security group to the instance.
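
If you prefer the CLI to the console, a minimal sketch of the two ingress rules could look like this, assuming an existing security group with the placeholder id sg-0123456789abcdef0 (tighten the CIDR ranges for anything beyond a short-lived demo):

# Allow SSH access to the VM
$ aws ec2 authorize-security-group-ingress \
    --group-id sg-0123456789abcdef0 \
    --protocol tcp --port 22 --cidr 0.0.0.0/0
# Allow the Kubernetes NodePort range
$ aws ec2 authorize-security-group-ingress \
    --group-id sg-0123456789abcdef0 \
    --protocol tcp --port 30000-32767 --cidr 0.0.0.0/0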

All the code is available in the repository below.

Prepare the environment

The first step is to install MicroK8s, which is a lightweight Kubernetes cluster. Thanks to its small resource requirements, MicroK8s is ideal for edge deployments. MicroK8s conveniently extends the basic cluster with add-ons such as a storage class backed by a local drive, GPU support or a private registry. The full list of add-ons is available here: https://microk8s.io/docs/addons.

$ sudo snap install microk8s --channel 1.24/stable --classic
$ sudo usermod -a -G microk8s ubuntu
$ mkdir -p ~/.kube
$ sudo chown -f -R ubuntu ~/.kube
$ newgrp microk8s
$ microk8s enable hostpath-storage dns ingress

Before going to the next step, check that all the expected add-ons are enabled and the Pods are Running. Here is the expected result of microk8s status.

$ microk8s status
microk8s is running
high-availability: no
datastore master nodes: 127.0.0.1:19001
datastore standby nodes: none
addons:
enabled:
dns # (core) CoreDNS
ha-cluster # (core) Configure high availability on the current node
hostpath-storage # (core) Storage class; allocates storage from host directory
ingress # (core) Ingress controller for external access
storage # (core) Alias to hostpath-storage add-on, deprecated
disabled:
community # (core) The community addons repository
dashboard # (core) The Kubernetes dashboard
gpu # (core) Automatic enablement of Nvidia CUDA
helm # (core) Helm 2 - the package manager for Kubernetes
helm3 # (core) Helm 3 - Kubernetes package manager
host-access # (core) Allow Pods connecting to Host services smoothly
mayastor # (core) OpenEBS MayaStor
metallb # (core) Loadbalancer for your Kubernetes cluster
metrics-server # (core) K8s Metrics Server for API access to service metrics
prometheus # (core) Prometheus operator for monitoring and logging
rbac # (core) Role-Based Access Control for authorisation
registry # (core) Private image registry exposed on localhost:32000
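
To confirm the Pods are Running as well, a quick check like the one below should be enough (the exact Pod names will differ):

# List all Pods and wait until every Pod reports Ready
$ microk8s kubectl get pods -A
$ microk8s kubectl wait --for=condition=Ready pods --all --all-namespaces --timeout=300s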

The second step is to install Juju and bootstrap a controller. Juju is an automation tool that brings a model-driven approach to deploying applications across VMs and Kubernetes clusters. It supports deployments, integrations between applications and day-2 operations like backups, configuration changes and updates.

$ sudo snap install juju --classic
$ juju bootstrap microk8s micro
$ juju add-model kubeflow

You have a Kubernetes cluster ready and a Juju controller bootstrapped, so now it’s time to deploy the bundle. A bundle is a composable set of applications with defined interactions between them, called relations. The key aspect of Juju bundles is their composability: the bundle can be adjusted for each environment and use case. Here we deploy a bundle with only the essentials for edge deployment: Seldon Core and Istio. If we wanted to add authentication, we could add applications like dex and oidc-gatekeeper and relate them to Istio. The same bundle can be reused across multiple edge sites to provide a consistent environment for edge operations.
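
The exact bundle.yaml is available in the repository linked above. As a rough sketch of what it might contain, assuming the charm names and channels shown in the juju status output below (the options and relation endpoints are assumptions based on the Charmed Kubeflow Istio and Seldon charms, so prefer the file from the repository):

$ cat > bundle.yaml << END
bundle: kubernetes
name: edge-inference
applications:
  istio-pilot:
    charm: istio-pilot
    channel: 1.11/stable
    scale: 1
    trust: true
    options:
      default-gateway: kubeflow-gateway
  istio-gateway:
    charm: istio-gateway
    channel: 1.11/stable
    scale: 1
    trust: true
    options:
      kind: ingress
  seldon-controller-manager:
    charm: seldon-core
    channel: 1.14/stable
    scale: 1
    trust: true
relations:
- [istio-pilot:istio-pilot, istio-gateway:istio-pilot]
END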

$ juju deploy ./bundle.yaml

Wait for all applications to reach the active (workload) and idle (agent) status, and you are ready to deploy the Machine Learning model.

$ juju status
Model Controller Cloud/Region Version SLA Timestamp
kubeflow micro microk8s/localhost 2.9.42 unsupported 11:41:03Z

App Version Status Scale Charm Channel Rev Address Exposed Message
istio-gateway active 1 istio-gateway 1.11/stable 285 10.152.183.119 no
istio-pilot active 1 istio-pilot 1.11/stable 302 10.152.183.254 no
seldon-controller-manager res:oci-image@eb811b6 active 1 seldon-core 1.14/stable 92 10.152.183.133 no

Unit Workload Agent Address Ports Message
istio-gateway/0* active idle 10.1.121.233
istio-pilot/0* active idle 10.1.121.229
seldon-controller-manager/0* active idle 10.1.121.228 8080/TCP,4443/TCP

Check the services deployed in the “kubeflow” namespace and note down the port of the NodePort service. You will need it to build a URL to access the deployed models.

$ microk8s kubectl get svc -n kubeflow
NAME TYPE CLUSTER-IP EXTERNAL-IP PORT(S) AGE
modeloperator ClusterIP 10.152.183.248 <none> 17071/TCP 40h
istio-gateway ClusterIP 10.152.183.119 <none> 65535/TCP 39h
istio-gateway-endpoints ClusterIP None <none> <none> 39h
istio-pilot ClusterIP 10.152.183.254 <none> 65535/TCP 39h
istio-pilot-endpoints ClusterIP None <none> <none> 39h
istiod ClusterIP 10.152.183.156 <none> 15010/TCP,15012/TCP,443/TCP,15014/TCP 39h
istio-ingressgateway-workload NodePort 10.152.183.183 <none> 80:31788/TCP,443:32475/TCP 39h
seldon-controller-manager-operator ClusterIP 10.152.183.179 <none> 30666/TCP 39h
seldon-controller-manager ClusterIP 10.152.183.133 <none> 8080/TCP,4443/TCP 39h
seldon-webhook-service ClusterIP 10.152.183.223 <none> 4443/TCP 39h

Deploy the Machine Learning model

We will use Seldon Core to deploy our model at the edge. We could deploy the same model without Seldon Core, but is the Seldon Deployment worth the hassle?

Let’s look at a few features provided by Seldon Core that make our lives easier by doing a lot of the heavy lifting for us.

Unification of API contracts

Seldon Core provides a layer of abstraction over the HTTP and gRPC APIs. This means you can freely change the underlying Machine Learning framework without redefining the API contract, which lets you experiment with different frameworks to get the best results while providing a consistent API to the people consuming the endpoint.

Metrics endpoint

Seldon Core standard and custom metrics using Prometheus

Seldon Core provides a metrics endpoint for each of its deployments. It supports basic metrics, like the number of requests processed, as well as custom user-defined metrics. For how to define custom metrics, check the Seldon Core documentation.

Seldon custom metrics are a great way to define fine-grained monitoring metrics and to reuse the existing integration to visualise them with Prometheus and Grafana.
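
Once the example model below is deployed, you can check the metrics endpoint yourself. Here is a minimal sketch, assuming the Seldon Core default Prometheus path /prometheus on the orchestrator port 8000 (adjust if your version or configuration differs):

# Forward the Seldon deployment service port to localhost
$ microk8s kubectl port-forward svc/seldon-deployment-example-default 8000:8000 -n default
# In a second terminal, scrape the Prometheus-format metrics
$ curl -s http://localhost:8000/prometheus | grep seldon_api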

Integration with Istio

Seldon Core easily integrates with Istio to provide the network layer for model access. Istio is a key component for providing functionalities like A/B testing. Additionally, for our hands-on, it allows us to expose multiple models on the same NodePort without the need for a separate ingress controller like NGINX.

Istio allows us to integrate with oidc-gatekeeper and dex-auth components to provide authentication and integration with external LDAP and OIDC identity providers.

Multimodel deployments

Seldon Core Deployment with single and multiple models, transformers and combiners

Seldon Core is known for its graph-based deployments. This means that a Seldon Deployment can contain not just a single model but a set of transformers, models and combiners, so you can build advanced, self-contained multi-model deployments. All deployed models can also be integrated with Jaeger for tracing or with OpenSearch for auditability.
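
To illustrate the graph idea, here is a sketch of a two-step Seldon Deployment with a transformer in front of the model. The transformer image my-org/feature-scaler:0.1 is a hypothetical placeholder, so treat this as a shape to copy rather than something to apply as-is:

$ microk8s kubectl apply -f - << END
apiVersion: machinelearning.seldon.io/v1
kind: SeldonDeployment
metadata:
  name: iris-with-transformer
spec:
  predictors:
  - name: default
    replicas: 1
    componentSpecs:
    - spec:
        containers:
        - name: feature-scaler              # hypothetical pre-processing container
          image: my-org/feature-scaler:0.1
        - name: sklearn-iris-classifier
          image: seldonio/sklearn-iris:0.3
    graph:                                  # requests flow through the transformer, then the model
      name: feature-scaler
      type: TRANSFORMER
      endpoint:
        type: REST
      children:
      - name: sklearn-iris-classifier
        type: MODEL
        endpoint:
          type: REST
        children: []
END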

OpenAPI documentation for each endpoint

For each Seldon Deployment, three functions are available: “Predict” to run inference on the model, “Feedback” to push feedback or calculate metrics, and “Doc” to serve the documentation of your endpoint in OpenAPI/Swagger format.

Model API documentation in OpenAPI/Swagger format
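
Once the model is exposed through Istio (as we do below), these extra endpoints live next to the predictions path. A sketch, assuming the standard Seldon v0.1 REST paths and the <NodeIP>:<NodePort>/<prefix> base URL we build later in this post (the exact paths can differ between Seldon Core versions):

# OpenAPI/Swagger documentation for the deployment
$ curl -s http://<NodeIP>:<NodePort>/model/iris/api/v0.1/doc/
# Send feedback (a reward signal) for a previous prediction
$ curl -s http://<NodeIP>:<NodePort>/model/iris/api/v0.1/feedback \
    -H "Content-Type: application/json" \
    -d '{"request": {"data": {"ndarray": [[5.9, 4.0, 2.0, 1.0]]}}, "reward": 1.0}'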

You deploy models with Seldon Core by applying a YAML file with the Seldon Deployment definition, then let the Seldon Core operator do all the work.

$ microk8s kubectl apply -f - << END
apiVersion: machinelearning.seldon.io/v1
kind: SeldonDeployment
metadata:
  name: seldon-deployment-example
spec:
  name: sklearn-iris-deployment
  predictors:
  - componentSpecs:
    - spec:
        containers:
        - image: seldonio/sklearn-iris:0.3
          imagePullPolicy: IfNotPresent
          name: sklearn-iris-classifier
    graph:
      children: []
      endpoint:
        type: REST
      name: sklearn-iris-classifier
      type: MODEL
    name: default
    replicas: 1
END

The deployed model is not yet accessible from outside the cluster. We will reuse the Istio gateway to expose it under a prefix. The prefix allows us to expose multiple models using the same gateway, with each Virtual Service pointing to a different deployed model. Models are selected based on the DNS record of the Seldon Deployment service, which allows us to leverage the Blue/Green deployment mechanism of Kubernetes.

$ microk8s kubectl apply -f - << END
apiVersion: networking.istio.io/v1alpha3
kind: VirtualService
metadata:
  name: iris-server
  namespace: default
spec:
  gateways:
  - kubeflow/kubeflow-gateway
  hosts:
  - '*'
  http:
  - match:
    - uri:
        prefix: /model/iris/
    rewrite:
      uri: /
    route:
    - destination:
        host: seldon-deployment-example-default.default.svc.cluster.local
        port:
          number: 8000
END

Call the model

Before calling the model, we need to build the proper URL, which we will do in three steps. The first is to get the public IP of the VM used as the edge site. The second is to check the port used for the Istio gateway service; in the bundle, we chose to expose it as a NodePort. The third is the prefix of the Istio Virtual Service. If you want to learn more about ways to expose a Kubernetes application, check here.

Get the public IP for the EC2 instance.
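
You can read it from the EC2 console, or query the instance metadata service from inside the VM, as in the sketch below (if your instance enforces IMDSv2, you will need to request a session token first):

# From inside the EC2 instance: ask the instance metadata service for the public IP
$ curl -s http://169.254.169.254/latest/meta-data/public-ipv4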

Get the Istio Gateway port.

$ microk8s kubectl get svc istio-ingressgateway-workload -n kubeflow
NAME TYPE CLUSTER-IP EXTERNAL-IP PORT(S) AGE
istio-ingressgateway-workload NodePort 10.152.183.183 <none> 80:31788/TCP,443:32475/TCP 26m

The last part is the prefix for the Istio Virtual Service we used for exposing the Machine Learning Model.

# URL: http(s)://<NodeIP>:<NodePort>/<IstioVirtualServerPrefix>/api/v0.1/predictions
$ curl -s http://3.249.66.169:31788/model/iris/api/v0.1/predictions \
-H "Content-Type: application/json" \
-d '{"data":{"ndarray":[[5.964,4.006,2.081,1.031]]}}'

{"data":{"names":["t:0","t:1","t:2"],"ndarray":[[0.9548873249364059,0.04505474761562512,5.7927447968953825e-05]]},"meta":{"requestPath":{"sklearn-iris-classifier":"seldonio/sklearn-iris:0.3"}}}

Voila!

Summary

We built a fully working edge site capable of hosting multiple models without the resource overhead of functions that are unnecessary at the edge, like an experimentation environment: we deploy only the services needed for inference and leave experimentation to the central data centre. The deployed models are available to the outside world, they all share a standard API structure, and the frameworks used to implement them can be changed without changes to the API.

To host models at the edge, you still need somewhere to train them. Below is a complete guide on setting up an MLOps platform on AWS EKS using open-source tools.

The composability of the Juju bundle also makes it easy to extend the edge site’s capabilities, for example by adding authentication with dex and oidc-gatekeeper or monitoring with Prometheus and Grafana.

Keep on experimenting with open-source tools and share your results!

For more MLOps Hands-on guides, tutorials and code examples, follow me on Medium and contact me via social media.
