Taming the Edges — Part 2

Alexander Koerner
AI+ Enterprise Engineering
12 min read · Dec 17, 2020
Enjoying the natural “Edges” and the scenic view down to the river Elbe in Saxon Switzerland (Germany)

How to extend AI towards EI (Edge Intelligence) by combining an open-source-based edge application manager, data virtualization technologies and some clever edge analytics capabilities from IBM.

In part 1 of my "Taming the Edges" article series I covered the "how to" of the operational aspects of edge computing. Now let's have a look at one of the currently most popular edge computing use cases: AI on the edge, or "EI" (Edge Intelligence), as I like to call it.

Before I continue, let me quickly clarify the term 'edge node', which I use throughout this article. An edge node is a compute unit which typically resides outside a datacenter or a cloud, very close to the source of the data being analyzed. Depending on the use case it can be a very small system with limited compute power, storage and memory resources, often called an edge device or an edge gateway. But it could also be a much more powerful unit which is capable of running, e.g., a Kubernetes cluster. Such edge nodes are typically called 'edge clusters' or 'edge clouds'.

From AI to EI

EI becomes especially handy if you need to analyze high-volume data with low-latency requirements in non-continuously connected network environments. Typical use cases are, e.g., high-frequency IoT time-series sensor data or, quite popular these days, analyzing data from video or audio streams.

Imagine you would like to create an application to provide your construction workers with a safer work environment. One aspect of such an application might be to utilize video-streaming-based obstacle detection in a worker's safety zone, in combination with video-based enforcement of worker protection gear (e.g., are hard hats and safety vests correctly worn while working?).

Such an application in a non-Edge-Intelligence environment might require high volumes of video data to be sent into either a public or a private cloud for inferencing, which in turn can lead to high response times (high latency).

A higher response time could make a real difference for your worker: either he/she gets an obstacle warning just in time to prevent an accident, or he/she will be hit and injured by that obstacle. Another serious issue could arise if the network connectivity between the monitoring camera(s) and the cloud-based application is unreliable or fails completely. Such problems wouldn't be surprising in construction environments.

Now imagine you had edge node hardware with enough compute power to do the required ML model inferencing on the video streaming data, without the need to send those data streams into a cloud for processing.

Luckily, such edge node hardware has made dramatic improvements over the last few years, both in compute power and in price/performance. These days it's not even uncommon to get access to GPU-powered edge nodes for less than $60!

Assuming that you are now able to run your AI app locally on your edge node, and that your construction workers are happy because the accident rate has dropped dramatically thanks to that app, even on remote construction sites with less-than-stellar network connectivity: how will you continuously improve and update the ML model on your edge node(s)?

Even if your edge node is powerful enough to do the local ML model inferencing, it might not be powerful enough to compute an updated model locally.

So, let's have a look at different strategies for meeting that requirement of improving an ML model:

  • In addition to processing the raw data locally on the edge node, you could also send the same data, in parallel or asynchronously, to the central cloud for ML model optimization. That approach would still provide the required low-latency response times on-site but would require the high-volume data transfers into the cloud you likely wanted to avoid in the first place.
  • You could implement some means of providing feedback on how the ML model scored locally on the edge node, store that feedback locally and send it back to the cloud. That way you would reduce the traffic back to the cloud, while the solution would still benefit from the consolidated feedback data of all edge nodes and from the ML compute power in the cloud (see the small sketch after this list).
  • Depending on the local data caching and storage capabilities of your edge nodes, you could decide to keep the raw data (unlikely) or some aggregated raw data (more likely) on your edge node. In combination with some clever data federation or data virtualization techniques, your cloud-based ML optimization algorithms could then access the necessary data across all edge nodes on demand. If your data virtualization technology is able to apply some mesh network principles across neighboring edge node clusters to pre-aggregate the raw data locally, then this approach would also dramatically reduce the network traffic back into the cloud. IBM's Data Virtualization technology would be an example of such an approach.
  • Digging further into mesh networking principles, it might be a very interesting approach to combine neighboring edge nodes not just into data pre-aggregation clusters, but also into ML model compute clusters whenever their CPU and GPU utilization is not maxed out. That approach would allow the re-optimization of ML models without the requirement of being continuously connected to a central cloud. The mesh-cluster-optimized ML models could still be sent back to the supporting cloud and either merged with other ML models from other mesh clusters or simply re-distributed across the other edge nodes.
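
To make the feedback idea from the second bullet a bit more tangible, here is a minimal, purely illustrative Python sketch (not part of any IBM or Open Horizon product): the edge app records lightweight feedback about each local inference and periodically ships a small aggregate to the cloud instead of the raw video frames. The endpoint URL, node id and field names are my own assumptions for this sketch.

# Minimal feedback-logging sketch (illustrative only, standard library only).
import json
import time
import urllib.request

FEEDBACK_URL = "https://example.com/ml-feedback"   # hypothetical cloud endpoint
UPLOAD_EVERY_N = 100                               # ship an aggregate every N inferences
feedback_buffer = []                               # local, in-memory feedback store

def record_feedback(label, confidence, confirmed):
    """Store one inference result plus the worker's (or a rule's) verdict locally."""
    feedback_buffer.append({"ts": time.time(), "label": label,
                            "confidence": confidence, "confirmed": confirmed})

def maybe_upload():
    """Send a compact aggregate to the cloud; the raw data never leaves the edge node."""
    if len(feedback_buffer) < UPLOAD_EVERY_N:
        return
    aggregate = {
        "node": "edge-node-01",   # hypothetical node id
        "count": len(feedback_buffer),
        "false_alarms": sum(1 for f in feedback_buffer if not f["confirmed"]),
        "avg_confidence": sum(f["confidence"] for f in feedback_buffer) / len(feedback_buffer),
    }
    req = urllib.request.Request(FEEDBACK_URL,
                                 data=json.dumps(aggregate).encode(),
                                 headers={"Content-Type": "application/json"})
    urllib.request.urlopen(req, timeout=10)        # best effort; retry logic omitted
    feedback_buffer.clear()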

Since we have now discussed some approaches for improving an EI ML model, let's look at how to update an ML model on your edge nodes while trying to minimize any downtime of your EI app on those nodes.

Updating ML models on an edge node

Assuming that your EI apps have been containerized, and assuming that you are using an edge application orchestrator like IBM's Edge Application Manager to deploy and maintain those apps, how do you keep the ML models on your edge nodes up to date?

The worst-case scenario would be if your ML models are tightly coupled with your containerized app in a way that every modification of your ML model requires a rebuild of your app container. If your app code is considerably larger than your ML model, you end up unnecessarily refreshing the complete app/ML model container on all affected edge nodes.

A slightly better approach could be to separate your ML models from your app(s) and encapsulate them into their own containers. That, in turn, means that you need to rebuild your ML container(s) each time you have an updated ML model, and then redeploy those updated containers, just like your app container, via your edge application manager. You still face a certain overhead, since you need to build a container around your ML model.

What if you could treat your ML models as files (or sets of files) which you could simply deploy to your edge nodes without any additional container overhead? That approach has been implemented by the Linux Foundation Edge's Open Horizon edge manager and its commercial sibling, the IBM Edge Application Manager (IEAM).

In the following sections I am using references to the Open Horizon components as synonyms for the equivalent IEAM components.

So how does that work?

The Open Horizon edge manager and the related Open Horizon edge agent have implemented a so-called model management system (MMS). Three key components make up that MMS:

  • The Cloud Sync Service (CSS) resides on the Open Horizon edge manager hub and uses a MongoDB database to store objects (e.g., ML models). It also maintains the status of all edge nodes.
  • The Edge Sync Service (ESS) is part of the Open Horizon edge agent. It provides two major functions: it polls the CSS for updates on any objects which are relevant for the edge node the agent runs on, and it provides a local REST API so that any edge application on that node can easily interact with the ESS (see the sketch after this list).
  • Objects, which are described by their metadata and their content, are then used to exchange the actual data/files between the management hub and the edge nodes.
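
To illustrate the ESS side, here is a small, hypothetical Python polling loop as it could run inside a containerized edge service. It assumes the HZN_ESS_* environment variables which the Open Horizon agent injects into service containers and the object endpoints of the open-source edge-sync-service; please treat the exact variable names, endpoint paths and the 'my-ml-model' object type as assumptions of this sketch and verify them against the Open Horizon documentation for your agent version.

# Illustrative ESS polling sketch (not the official sample code).
import json
import os
import time

import requests  # third-party HTTP client

OBJECT_TYPE = "my-ml-model"   # hypothetical object type

# The Open Horizon agent is assumed to inject these connection details.
base = "https://{}:{}".format(os.environ["HZN_ESS_API_ADDRESS"],
                              os.environ["HZN_ESS_API_PORT"])
with open(os.environ["HZN_ESS_AUTH"]) as f:
    creds = json.load(f)                       # expected to contain an id and a token
auth = (creds["id"], creds["token"])
cert = os.environ["HZN_ESS_CERT"]              # certificate for the local ESS endpoint

while True:
    # Ask the ESS for objects of our type which we have not acknowledged yet.
    r = requests.get("{}/api/v1/objects/{}".format(base, OBJECT_TYPE),
                     auth=auth, verify=cert)
    for obj in (r.json() if r.status_code == 200 else []):
        object_id = obj["objectID"]
        # Download the object's content (e.g., the updated ML model file) ...
        data = requests.get("{}/api/v1/objects/{}/{}/data".format(base, OBJECT_TYPE, object_id),
                            auth=auth, verify=cert)
        with open(object_id, "wb") as out:
            out.write(data.content)
        # ... and tell the ESS that we received it, so it is not delivered again.
        requests.put("{}/api/v1/objects/{}/{}/received".format(base, OBJECT_TYPE, object_id),
                     auth=auth, verify=cert)
    time.sleep(30)                             # poll interval; tune for your use case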

Since those objects can be anything in Open Horizon, not just ML models, maybe the MMS should sometimes be renamed into an Object Management System (OMS)? 😉

Anyway, let's have a brief look at how to create, publish and consume an object by utilizing Open Horizon's and IEAM's MMS…

Step 1: Create a metadata template JSON document for your object by executing the Open Horizon agent CLI command

hzn mms object new > myobject_meta.json

Here is what such a metadata JSON template looks like:

{"objectID": "",   /* Required: A unique identifier of the object. */"objectType": "", /* Required: The type of the object. */"destinationOrgID": "$HZN_ORG_ID", /* Required: The organization ID 
of the object (an object belongs
to exactly one organization). */
"destinationID": "", /* The node id (without org prefix) where the
object should be placed. */
/* If omitted the object is sent to all nodes
with the same destinationType. */
/* Delete this field when you are using
destinationPolicy. */
"destinationType": "", /* The pattern in use by nodes that should
receive this object. */
/* If omitted (and if destinationsList is
omitted too) the object is broadcast to all
known nodes. */
/* Delete this field when you are using
policy. */
"destinationsList": null, /* The list of destinations as an array
of pattern:nodeId pairs that should
receive this object. */
/* If provided, destinationType and
destinationID must be omitted. */
/* Delete this field when you are using
policy. */
"destinationPolicy": { /* The policy specification that should be
used to distribute this object. */
/* Delete these fields if the target node is
using a pattern. */
"properties": [ /* A list of policy properties that
describe the object. */
{
"name": "",
"value": null,
"type": ""
/* Valid types are string, bool, int, float,
list of string (comma separated), version. */
/* Type can be omitted if the type is
discernable from the value, e.g. unquoted
true is boolean. */
}
]
,
"constraints": [ /* A list of constraint expressions of the
form <property name> <operator>
<property value>, separated by boolean
operators AND (&&) or OR (||). */
""
]
,
"services": [ /* The service(s) that will use this object. */
{
"orgID": ""
, /* The org of the service. */
"serviceName": "", /* The name of the service. */
"arch": "", /* Set to '*' to indcate services of any
hardware architecture. */
"version": "" /* A version range. */
}
]
}
,
"expiration": "", /* A timestamp/date indicating when the
object expires (it is automatically deleted).
The timestamp should be provided in RFC3339
format. */
"version": "", /* Arbitrary string value. The value is not
semantically interpreted. The Model Management
System does not keep multiple version of an
object. */
"description": "", /* An arbitrary description. */"activationTime": "" /* A timestamp/date as to when this object
should automatically be activated. The
timestamp should be provided in RFC3339
format. */
}

Let’s have a look at a simplified example to make it easier for the reader to follow.
I am borrowing my example from the Open Horizon demo ‘helloMMS’ edge service.

In that demo the metadata JSON file looks like:

{
"objectID": "config.json",
"objectType": "$HZN_DEVICE_ID.hello-mms",
"destinationOrgID": "$HZN_ORG_ID",
"destinationID": "$HZN_DEVICE_ID",
"destinationType": "pattern-${SERVICE_NAME}-$ARCH"
}

Step 2: Here is the content of the actual object (file) config.json which should eventually be synchronized with the edge node:

{
"HW_WHO": "IBM Cloud Engagement Hub"
}

Step 3: Before we send our config.json file to the edge node, let's make sure that the ibm.hello-mms service runs on that node by executing the following command on your edge node:

hzn register -p IBM/pattern-ibm.hello-mms-$(hzn architecture) -s ibm.hello-mms --serviceorg IBM

Step 4: Now let’s publish the config.json file to the MMS so that the ibm.hello-mms service can eventually access it through the Open Horizon agent’s Edge Sync Service (make sure that the environment variables $HZN_DEVICE_ID and $HZN_ORG_ID have been correctly set):

hzn mms object publish -m object.json -f config.json

Eventually your helloMMS service will pick up the new config.json file through the ESS and will write a different message into the service’s message log.

In my YouTube video below I am using a slightly modified version of the helloMMS demo to demonstrate how easy it is to send updated objects to a running edge application through the IEAM’s MMS.

Simple demo of the IBM Edge Application Manager’s MMS with a Raspberry Pi Zero as an edge device

Step 5 (optional): To list the published MMS object, one could execute the following Open Horizon agent CLI command:

hzn mms object list -t $HZN_DEVICE_ID.hello-mms -i config.json -d

Pulling everything together using IBM Cloud Pak for Data’s edge analytics capabilities

In the first section of this article, I introduced the concept of moving AI to the edges and discussed some approaches on how to do it.

The second part covered one technical solution for updating ML models on the edge in more detail by utilizing LF Edge Open Horizon’s model management system.

In the third and last part of my article I am introducing one way of pulling everything together so that one can create a seamless cloud-to-edge AI development and deployment environment which can be implemented on any cloud.

As an add-on to the edge application management environment provided by the Open Horizon technology, I would like to introduce IBM Cloud Pak for Data's edge analytics capabilities, which come with all the necessary containerized edge services to easily execute, on edge nodes, ML models that have been computed in the cloud. To complement the EI environment on an edge node, those Cloud Pak for Data "Edge Analytics" components also provide optional streaming edge analytics and an optional local multi-model edge database.

In addition to the integrated and containerized edge analytics, the Cloud Pak also comes with some shared edge services, a communication layer and APIs on the edge node provided by the ‘edge bedrock’ layer.

The edge analytics components mentioned above, although containerized and packed with functionality, don't need a powerful Kubernetes cluster on the edge to operate. They have been designed to work even on small edge devices with only a Docker runtime, on low-powered ARM or Intel CPUs with a limited amount of memory and storage. Of course, they will also scale up to more powerful edge clouds if required.

The Cloud Pak for Data counterpart in the cloud supports the data virtualization capabilities for data stored locally on an edge node, which I mentioned in the first section of this article.

Summary and a very demanding EI use case to watch out for

Today's edge computing technologies provide a very promising foundation for simplifying AI on the edge: low-cost edge node hardware with on-board GPU support that provides powerful enough local ML inferencing acceleration, freedom of choice for data placement and data analytics (far edge, near edge and/or cloud), plus some smart open-source-based edge application and ML model management solutions.

Let me conclude with a very interesting, but also a very demanding EI use case, the Mayflower Autonomous Ship.

The historic Mayflower and the Mayflower Autonomous Ship

The Mayflower, named after the historic ship which transported a group of English pilgrims to the New World in 1620, is a scientific research vessel which will set sail in spring 2021 to cross the Atlantic without any human intervention. The Mayflower will be controlled by a so-called 'AI Captain'. That AI Captain is basically built on top of a set of autonomous edge nodes which control the ship's engines, navigation systems and energy supply, and which continuously monitor the ship's surroundings for any obstacles (e.g., other ships, debris, whales, etc.). The containerized apps and the ML models on the ship's edge nodes will be managed and supervised by IBM's open-source-based Edge Application Manager.

Since the network connectivity (via satellite) will typically be quite limited while at sea, the edge nodes will operate primarily in a fully autonomous mode.

The Mayflower’s dashboard

If you (like me) have recently followed the Vendée Globe, a single-handed yacht race around the world, then you can definitely imagine what kind of challenges the Mayflower might face during her Atlantic crossing in April 2021. Just thinking about the term 'single-handed': does an AI-powered ship actually also fall into such a solo category? 🤔

So, what will be Your next innovative EI use case?
