Deploy ML models at the edge with MicroK8s, Seldon and Istio

Bartłomiej Poniecki-Klotz
Published in Ubuntu AI
9 min read · Mar 24, 2023

Edge computing is defined as solutions that move data processing to or near the point of data generation. This means that Machine Learning model inference results can be delivered to customers faster, creating a real-time inference experience. This makes the edge a perfect place for your models.

Consider Gartner’s prediction: “Around 10% of enterprise-generated data is created and processed outside a traditional centralized data centre or cloud. By 2025, Gartner predicts this figure will reach 75%”. In other words, they expect the majority of data processing to be performed at the edge. The figure is so high because of the ever-increasing amount of data being produced: it is simply not possible to send all of it to a central data centre for processing.

AI model deployment is popular for video processing at the edge
Photo by Alan J. Hendry on Unsplash

Additionally, there is an increasing number of use cases that require Machine Learning model deployment in micro-data centres or edge devices. Just to name a few:

  • Cameras with video processing using GPU-accelerated hardware
  • Customer-facing shopping assistants with chatbots and large language models
  • MTL implemented in the manufacturing plant using 5G networks and MEC

Despite its benefits and rising popularity, many risks are involved in hosting ML models at the edge. One of the major ones is security. In a recent article, you can learn more about mitigating security risks for Machine Learning models.

The full-blown MLOps stack requires a lot of resources and cannot fit into an edge site, right? Not quite: this blog post will show you how to enable the key capabilities required at edge sites, like serving models, without deploying the full MLOps stack.

Let’s build the Edge Site!

For this hands-on tutorial, we will use AWS EC2 to provide a VM for us. Let’s create a t2.medium instance with a public IP. This edge site requires around 2GB of memory and is ready to host multiple models; the more models you host, the more memory and CPU will be required. We will use Ubuntu 22.04 LTS as the OS and install most of the required tools using snaps. Snaps are distributed through channels and provide automated updates within a channel, so your environment will always be up to date. Handy, right?

Security group in AWS console

Also, remember to take care of the security groups because you will need access to the VM using port 22 for SSH and ports 30000–32767 for NodePorts. You should add them as ingress rules to the security group and attach this security group to the instance.
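
If you prefer the CLI to the console, a minimal sketch of the two ingress rules could look like this, assuming an existing security group with the placeholder id sg-0123456789abcdef0 (tighten the CIDR ranges for anything beyond a short-lived demo):

# Allow SSH access to the VM
$ aws ec2 authorize-security-group-ingress \
    --group-id sg-0123456789abcdef0 \
    --protocol tcp --port 22 --cidr 0.0.0.0/0
# Allow the Kubernetes NodePort range
$ aws ec2 authorize-security-group-ingress \
    --group-id sg-0123456789abcdef0 \
    --protocol tcp --port 30000-32767 --cidr 0.0.0.0/0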

All the code is available in the repository below.

Prepare the environment

The first step is to install MicroK8s, which is a lightweight Kubernetes cluster. Thanks to its small resource requirements, MicroK8s is ideal for edge deployments. MicroK8s conveniently extends the basic cluster with add-ons such as a storage class backed by a local drive, GPU support or a private registry. The full list of add-ons is available here: https://microk8s.io/docs/addons.

$ sudo snap install microk8s --channel 1.24/stable --classic
$ sudo usermod -a -G microk8s ubuntu
$ mkdir -p ~/.kube
$ sudo chown -f -R ubuntu ~/.kube
$ newgrp microk8s
$ microk8s enable hostpath-storage dns ingress

Before going to the next step, check that all the expected add-ons are enabled and the Pods are Running. Here is the expected result of microk8s status.

$ microk8s status
microk8s is running
high-availability: no
datastore master nodes: 127.0.0.1:19001
datastore standby nodes: none
addons:
enabled:
dns # (core) CoreDNS
ha-cluster # (core) Configure high availability on the current node
hostpath-storage # (core) Storage class; allocates storage from host directory
ingress # (core) Ingress controller for external access
storage # (core) Alias to hostpath-storage add-on, deprecated
disabled:
community # (core) The community addons repository
dashboard # (core) The Kubernetes dashboard
gpu # (core) Automatic enablement of Nvidia CUDA
helm # (core) Helm 2 - the package manager for Kubernetes
helm3 # (core) Helm 3 - Kubernetes package manager
host-access # (core) Allow Pods connecting to Host services smoothly
mayastor # (core) OpenEBS MayaStor
metallb # (core) Loadbalancer for your Kubernetes cluster
metrics-server # (core) K8s Metrics Server for API access to service metrics
prometheus # (core) Prometheus operator for monitoring and logging
rbac # (core) Role-Based Access Control for authorisation
registry # (core) Private image registry exposed on localhost:32000
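
To confirm the Pods are Running as well, a quick check like the one below should be enough (the exact Pod names will differ):

# List all Pods and wait until every Pod reports Ready
$ microk8s kubectl get pods -A
$ microk8s kubectl wait --for=condition=Ready pods --all --all-namespaces --timeout=300s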

The second step is to install Juju and bootstrap a controller. Juju is an automation tool that brings a model-driven approach to deploying applications across VMs and Kubernetes clusters. It supports deployments, integrations between applications and day-2 operations like backups, configuration changes and updates.

$ sudo snap install juju --classic
$ juju bootstrap microk8s micro
$ juju add-model kubeflow

You have a Kubernetes cluster ready and a Juju controller bootstrapped, so now it’s time to deploy the bundle. A bundle is a composable set of applications with defined interactions between them, called relations. The key aspect of Juju bundles is their composability: the bundle can be adjusted for each environment and use case. Here we deploy a bundle with only the essentials for edge deployment: Seldon Core and Istio. If we wanted to add authentication, we could add applications like dex and oidc-gatekeeper and relate them to Istio. The same bundle can be reused across multiple edge sites to provide a consistent environment for edge operations.
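
The exact bundle.yaml is available in the repository linked above. As a rough sketch of what it might contain, assuming the charm names and channels shown in the juju status output below (the options and relation endpoints are assumptions based on the Charmed Kubeflow Istio and Seldon charms, so prefer the file from the repository):

$ cat > bundle.yaml << END
bundle: kubernetes
name: edge-inference
applications:
  istio-pilot:
    charm: istio-pilot
    channel: 1.11/stable
    scale: 1
    trust: true
    options:
      default-gateway: kubeflow-gateway
  istio-gateway:
    charm: istio-gateway
    channel: 1.11/stable
    scale: 1
    trust: true
    options:
      kind: ingress
  seldon-controller-manager:
    charm: seldon-core
    channel: 1.14/stable
    scale: 1
    trust: true
relations:
- [istio-pilot:istio-pilot, istio-gateway:istio-pilot]
END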

$ juju deploy ./bundle.yaml

Wait for all applications to reach the active (workload) and idle (agent) status, and you are ready to deploy the Machine Learning model.

$ juju status
Model Controller Cloud/Region Version SLA Timestamp
kubeflow micro microk8s/localhost 2.9.42 unsupported 11:41:03Z

App Version Status Scale Charm Channel Rev Address Exposed Message
istio-gateway active 1 istio-gateway 1.11/stable 285 10.152.183.119 no
istio-pilot active 1 istio-pilot 1.11/stable 302 10.152.183.254 no
seldon-controller-manager res:oci-image@eb811b6 active 1 seldon-core 1.14/stable 92 10.152.183.133 no

Unit Workload Agent Address Ports Message
istio-gateway/0* active idle 10.1.121.233
istio-pilot/0* active idle 10.1.121.229
seldon-controller-manager/0* active idle 10.1.121.228 8080/TCP,4443/TCP

Check the services deployed in the “kubeflow” namespace and note down the port of the NodePort service. You will need it to build a URL to access the deployed models.

$ microk8s kubectl get svc -n kubeflow
NAME TYPE CLUSTER-IP EXTERNAL-IP PORT(S) AGE
modeloperator ClusterIP 10.152.183.248 <none> 17071/TCP 40h
istio-gateway ClusterIP 10.152.183.119 <none> 65535/TCP 39h
istio-gateway-endpoints ClusterIP None <none> <none> 39h
istio-pilot ClusterIP 10.152.183.254 <none> 65535/TCP 39h
istio-pilot-endpoints ClusterIP None <none> <none> 39h
istiod ClusterIP 10.152.183.156 <none> 15010/TCP,15012/TCP,443/TCP,15014/TCP 39h
istio-ingressgateway-workload NodePort 10.152.183.183 <none> 80:31788/TCP,443:32475/TCP 39h
seldon-controller-manager-operator ClusterIP 10.152.183.179 <none> 30666/TCP 39h
seldon-controller-manager ClusterIP 10.152.183.133 <none> 8080/TCP,4443/TCP 39h
seldon-webhook-service ClusterIP 10.152.183.223 <none> 4443/TCP 39h

Deploy the Machine Learning model

We will use Seldon Core to deploy our model at the edge. We could deploy the same model without Seldon Core, but is the Seldon Deployment worth the hassle?

Let’s look at a few features provided by Seldon Core that make our lives easier by doing a lot of the heavy lifting for us.

Unification of API contracts

Seldon Core provides a layer of abstraction over the HTTP and gRPC APIs. This means you can freely change the underlying Machine Learning framework without redefining the API contract, which lets you experiment with different frameworks to get the best results while providing a consistent API to the people consuming the endpoint.

Metrics endpoint

Seldon Core standard and custom metrics using Prometheus

Seldon Core provides a metrics endpoint for each of its deployments. It supports basic metrics, like the number of requests processed, as well as custom user-defined metrics. For how to define custom metrics, check the Seldon Core documentation.

Seldon custom metrics are a great way to define fine-grained monitoring metrics and to reuse the existing integration to visualise them with Prometheus and Grafana.
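
Once the example model below is deployed, you can check the metrics endpoint yourself. Here is a minimal sketch, assuming the Seldon Core default Prometheus path /prometheus on the orchestrator port 8000 (adjust if your version or configuration differs):

# Forward the Seldon deployment service port to localhost
$ microk8s kubectl port-forward svc/seldon-deployment-example-default 8000:8000 -n default
# In a second terminal, scrape the Prometheus-format metrics
$ curl -s http://localhost:8000/prometheus | grep seldon_api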

Integration with Istio

Seldon Core easily integrates with Istio to provide the network layer for model access. Istio is a key component for providing functionalities like A/B testing. Additionally, for our hands-on, it allows us to expose multiple models on the same NodePort without the need for a separate ingress controller like NGINX.

Istio allows us to integrate with oidc-gatekeeper and dex-auth components to provide authentication and integration with external LDAP and OIDC identity providers.

Multimodel deployments

Seldon Core Deployment with single and multiple models, transformers and combiners

Seldon Core is known for its graph-based deployments. This means that a Seldon Deployment can contain not just a single model but a set of transformers, models and combiners, so you can build advanced, self-contained multi-model deployments. All deployed models can also be integrated with Jaeger for tracing or with OpenSearch for auditability.
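
To illustrate the graph idea, here is a sketch of a two-step Seldon Deployment with a transformer in front of the model. The transformer image my-org/feature-scaler:0.1 is a hypothetical placeholder, so treat this as a shape to copy rather than something to apply as-is:

$ microk8s kubectl apply -f - << END
apiVersion: machinelearning.seldon.io/v1
kind: SeldonDeployment
metadata:
  name: iris-with-transformer
spec:
  predictors:
  - name: default
    replicas: 1
    componentSpecs:
    - spec:
        containers:
        - name: feature-scaler              # hypothetical pre-processing container
          image: my-org/feature-scaler:0.1
        - name: sklearn-iris-classifier
          image: seldonio/sklearn-iris:0.3
    graph:                                  # requests flow through the transformer, then the model
      name: feature-scaler
      type: TRANSFORMER
      endpoint:
        type: REST
      children:
      - name: sklearn-iris-classifier
        type: MODEL
        endpoint:
          type: REST
        children: []
END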

OpenAPI documentation for each endpoint

For each Seldon Deployment, three functions are available: “Predict” to run inference on the model, “Feedback” to push feedback or calculate metrics, and “Doc” to serve the documentation of your endpoint in OpenAPI/Swagger format.

Model API documentation in OpenAPI/Swagger format
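
Once the model is exposed through Istio (as we do below), these extra endpoints live next to the predictions path. A sketch, assuming the standard Seldon v0.1 REST paths and the <NodeIP>:<NodePort>/<prefix> base URL we build later in this post (the exact paths can differ between Seldon Core versions):

# OpenAPI/Swagger documentation for the deployment
$ curl -s http://<NodeIP>:<NodePort>/model/iris/api/v0.1/doc/
# Send feedback (a reward signal) for a previous prediction
$ curl -s http://<NodeIP>:<NodePort>/model/iris/api/v0.1/feedback \
    -H "Content-Type: application/json" \
    -d '{"request": {"data": {"ndarray": [[5.9, 4.0, 2.0, 1.0]]}}, "reward": 1.0}'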

You deploy models with Seldon Core by applying a YAML file with the Seldon Deployment definition, then let the Seldon Core operator do all the work.

$ microk8s kubectl apply -f - << END
apiVersion: machinelearning.seldon.io/v1
kind: SeldonDeployment
metadata:
  name: seldon-deployment-example
spec:
  name: sklearn-iris-deployment
  predictors:
  - componentSpecs:
    - spec:
        containers:
        - image: seldonio/sklearn-iris:0.3
          imagePullPolicy: IfNotPresent
          name: sklearn-iris-classifier
    graph:
      children: []
      endpoint:
        type: REST
      name: sklearn-iris-classifier
      type: MODEL
    name: default
    replicas: 1
END

The deployed model is not yet accessible from outside the cluster. We will reuse the Istio gateway to expose it under a prefix. The prefix allows us to expose multiple models using the same gateway, with each Virtual Service pointing to a different deployed model. Models are selected based on the DNS record of the Seldon Deployment service, which allows us to leverage the Blue/Green deployment mechanism of Kubernetes.

$ microk8s kubectl apply -f - << END
apiVersion: networking.istio.io/v1alpha3
kind: VirtualService
metadata:
  name: iris-server
  namespace: default
spec:
  gateways:
  - kubeflow/kubeflow-gateway
  hosts:
  - '*'
  http:
  - match:
    - uri:
        prefix: /model/iris/
    rewrite:
      uri: /
    route:
    - destination:
        host: seldon-deployment-example-default.default.svc.cluster.local
        port:
          number: 8000
END

Call the model

Before calling the model, we need to build the proper URL, which we will do in three steps. The first is to get the public IP of the VM used as the edge site. The second is to check the port used for the Istio gateway service; in the bundle, we chose to expose it as a NodePort. The third is the prefix of the Istio Virtual Service. If you want to learn more about ways to expose a Kubernetes application, check here.

Get the public IP for the EC2 instance.
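
You can read it from the EC2 console, or query the instance metadata service from inside the VM, as in the sketch below (if your instance enforces IMDSv2, you will need to request a session token first):

# From inside the EC2 instance: ask the instance metadata service for the public IP
$ curl -s http://169.254.169.254/latest/meta-data/public-ipv4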

Get the Istio Gateway port.

$ microk8s kubectl get svc istio-ingressgateway-workload -n kubeflow
NAME TYPE CLUSTER-IP EXTERNAL-IP PORT(S) AGE
istio-ingressgateway-workload NodePort 10.152.183.183 <none> 80:31788/TCP,443:32475/TCP 26m

The last part is the prefix for the Istio Virtual Service we used for exposing the Machine Learning Model.

# URL: http(s)://<NodeIP>:<NodePort>/<IstioVirtualServerPrefix>/api/v0.1/predictions
$ curl -s http://3.249.66.169:31788/model/iris/api/v0.1/predictions \
-H "Content-Type: application/json" \
-d '{"data":{"ndarray":[[5.964,4.006,2.081,1.031]]}}'

{"data":{"names":["t:0","t:1","t:2"],"ndarray":[[0.9548873249364059,0.04505474761562512,5.7927447968953825e-05]]},"meta":{"requestPath":{"sklearn-iris-classifier":"seldonio/sklearn-iris:0.3"}}}

Voila!

Summary

We built a fully working edge site capable of hosting multiple models without the resource overhead of functions that are unnecessary at the edge, like an experimentation environment: we deploy only the services needed for inference and leave experimentation to the central data centre. The deployed models are available to the outside world, they all share a standard API structure, and the frameworks used to implement them can be changed without changes to the API.

To host models at the edge, you still need somewhere to train them. Below is a complete guide on setting up an MLOps platform on AWS EKS using open-source tools.

The composability of the Juju bundle also makes it easy to extend the edge site’s capabilities, for example by adding authentication with dex and oidc-gatekeeper or monitoring with Prometheus and Grafana.

Keep on experimenting with open-source tools and share your results!

For more MLOps Hands-on guides, tutorials and code examples, follow me on Medium and contact me via social media.
