Seven Ways to Stub Your Toes on The Edge

Technical problems in managing applications in Kubernetes clusters in edge computing

Mike Spreitzer · 8 min read · Mar 3, 2023

[Illustration: folks in a central control room facing seven challenges in managing their clusters at the edge. Image by Braulio Dumba.]

I have been learning about edge computing and have started working on part of the problem, focusing on how to support edge computing with Kubernetes clusters. I think Kubernetes API machinery is a good foundation for creating reliable distributed systems, because it embodies state-based management and has a simple pattern for writing high-quality controllers. Kubernetes clusters are a popular way to host various workloads. Applying those clusters in the edge computing milieu brings some challenges. Others have come to the same conclusion and taken swipes at this passel of problems, usually by breaking out of the fundamental Kubernetes paradigms in one or more ways. I think we do not need so much to diverge from the fundamentals as to use them judiciously and with perhaps a few generalizations. The less divergent an edge multi-cluster platform is, the more of the large existing ecosystem around Kubernetes it can employ. In this story I summarize what I see as key problems for an edge multi-cluster platform to solve. You can find broader summaries in many places, such as in the Kubernetes IoT Edge Working Group.
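
To make the claim about state-based management concrete, here is a minimal sketch of the reconcile pattern behind Kubernetes controllers. The helper functions are hypothetical placeholders (real controllers are typically written in Go against the Kubernetes API machinery and are driven by watch events, not a single call):

```python
# Minimal sketch of the reconcile pattern behind Kubernetes controllers.
# The three helpers are hypothetical placeholders, not a real client library.

def get_desired_state(name):
    """Read the spec of the object being managed (e.g., from the API server)."""
    return {"replicas": 3}

def get_actual_state(name):
    """Observe the world (e.g., count the Pods that currently exist)."""
    return {"replicas": 2}

def act_to_converge(desired, actual):
    """Take a step that moves the actual state toward the desired state."""
    print(f"converging {actual} -> {desired}")

def reconcile(name):
    desired = get_desired_state(name)
    actual = get_actual_state(name)
    if actual != desired:
        act_to_converge(desired, actual)

# A real controller runs reconcile() on watch events plus periodic resyncs;
# calling it once here just shows the shape of the pattern.
reconcile("my-object")
```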

I should be clear about the scope addressed here. Edge computing has many aspects, and I am focusing on just part of the problem. One aspect is that there is typically a wide range of “machine” or “device” types involved. For practical reasons I am limiting my focus to systems where the smallest thing of concern can run a Kubernetes cluster; management of smaller things can be part of the job of the workload in those clusters. There are Kubernetes distributions that can run on a single machine of modest capacity.

A platform provider has to cope with the fact that the platform user generally decides on a physical hierarchy, determined both by technical considerations like those discussed here and by many other considerations. The term “location” often gets used for the lowest sort of vertex in a geographical hierarchy. For example, this might be a retail store or an automobile. A location can hold one or more edge clusters. Within a location, communication is relatively good and stable. Failures may be relatively correlated. A location might not have fixed geocentric coordinates. There are generally Kubernetes clusters at higher layers in the hierarchy too.

Another important aspect is that information technology is often organized into layers of concern. A common division is between “infrastructure” and “workload”. Here we might expect the infrastructure layer to be responsible for creating or on-boarding the Kubernetes clusters, while the workload layer is more concerned with their configuration/workload. In this story I focus on the problems in the workload layer.

I should also be explicit about “policy” vs. “workload”. People often think about security/compliance policy separately from workload. And yet in the Kubernetes milieu, policy is often technically just like workload: it is defined by API objects, implemented by controllers, and its output goes into API objects. Examples include Kyverno and Open Cluster Management. An edge multi-cluster platform should support this sort of workload along with the others.

The following sections briefly describe the key problems in supporting edge computing with Kubernetes clusters.

Resource-Constrained Clusters

An edge cluster may have to run on a relatively small machine. Distributions such as k3s, k0s, and MicroShift aim to be runnable on a single machine of modest capacity. For example, see my colleague Alexei’s work on Raspberry Pis. We would not want a platform intended to support edge multi-cluster to have higher resource requirements for the edge clusters than necessary.

Cluster-Scoped API Objects

In some Kubernetes-based frameworks, a workload has to be something that exists within one Kubernetes API namespace. That is too restrictive for edge computing in general. Many workloads involve API objects that are scoped to the whole cluster rather than to a namespace. For example, many workloads have custom resources and thus CustomResourceDefinition objects (which are cluster-scoped). For another example, many workloads have their own controllers and cluster-scoped RBAC objects that authorize those controllers to work on certain kinds of object regardless of namespace.
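
To illustrate why a one-namespace packaging comes up short, here is a rough sketch that partitions a workload's manifests by scope. The manifests and the partial kind list are invented for the example; a real tool would consult API discovery rather than hard-code kinds:

```python
# Illustrative sketch: partition a workload's manifests into namespaced and
# cluster-scoped objects. The kind list is partial and the manifests are
# hypothetical; a real implementation would consult API discovery instead.
CLUSTER_SCOPED_KINDS = {
    "CustomResourceDefinition",  # defines new kinds for the whole cluster
    "ClusterRole",               # RBAC not confined to one namespace
    "ClusterRoleBinding",
    "Namespace",
}

def partition(manifests):
    namespaced, cluster_scoped = [], []
    for m in manifests:
        (cluster_scoped if m["kind"] in CLUSTER_SCOPED_KINDS else namespaced).append(m)
    return namespaced, cluster_scoped

workload = [
    {"kind": "Deployment", "metadata": {"name": "agent", "namespace": "shop"}},
    {"kind": "CustomResourceDefinition", "metadata": {"name": "widgets.example.com"}},
    {"kind": "ClusterRoleBinding", "metadata": {"name": "agent-can-read-widgets"}},
]
namespaced, cluster_scoped = partition(workload)
# cluster_scoped is non-empty, so this workload does not fit in one namespace.
print(len(namespaced), len(cluster_scoped))
```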

General Hierarchy

I touched on this at the outset. Let us consider some examples. A telco operating a cellular network has — even ignoring the handsets — IT arrayed on a hierarchy with several layers and a large number of leaves covering a large geographic area. A retail or manufacturing enterprise may have a central engineering office and many shops/warehouses/plants at various places, perhaps with some regional level of organization.

Self-Sufficiency of Each Location

The connectivity between an edge location and the rest of the system may not be consistently good. It may be sometimes good and sometimes absent. It may be sometimes good and sometimes poor. It may always be mediocre. For this reason and/or other exogenous reasons, each edge location may need to be self-sufficient: able to do its local job without communicating more broadly, for a use-case-specific amount of time that might be quite long. It may also be necessary to keep non-local communication bandwidth requirements modest.
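
One hedged sketch of what self-sufficiency implies for the machinery, assuming a pull-based agent at each location. EdgeAgent, fetch_from_center, and apply_locally are invented names, not any particular platform's API:

```python
# Hedged sketch of self-sufficiency via a pull-based agent at each location.
# EdgeAgent, fetch_from_center, and apply_locally are invented names.
class EdgeAgent:
    def __init__(self, fetch_from_center, apply_locally):
        self.fetch_from_center = fetch_from_center   # may raise when the link is down
        self.apply_locally = apply_locally
        self.cached_desired_state = None             # last successfully fetched state

    def sync_once(self):
        try:
            self.cached_desired_state = self.fetch_from_center()
        except ConnectionError:
            pass   # center unreachable: fall back to the cached desired state
        if self.cached_desired_state is not None:
            self.apply_locally(self.cached_desired_state)   # keep reconciling locally

def flaky_center():
    raise ConnectionError("WAN link down")

agent = EdgeAgent(fetch_from_center=flaky_center,
                  apply_locally=lambda state: print("applying", state))
agent.cached_desired_state = {"app": "point-of-sale", "version": "1.4.2"}
agent.sync_once()   # applies the cached state even though the center is unreachable
```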

Complex Arrangement of Roles and Responsibilities

In general, edge computing is not DevOps, nor even development + SRE. For example, consider that manufacturing enterprise. It may have a central engineering team with some supply chain — a terrible term for what is really a web of dependencies among independent organizations developing, selling, and supporting potentially quite complex products and services. While the central engineering team may qualify and integrate products and services as well as develop their own, there may also be some level of management at each plant — installing and configuring for the local context, batching updates into local maintenance windows, and so on. Consider also the plight of the developers of Log4j; they have no knowledge of (let alone authority over) everywhere it is in use and cannot apply updates whenever and wherever they are needed.

One-to-many Relationship Between Workload Description and Running Copies

In the ordinary Kubernetes setting, a user of Kubernetes deals with one cluster that both (a) holds the API objects that define and report on the workload and (b) runs the one and only copy of that workload. In edge multi-cluster there are multiple edge clusters, but the users want to describe their workloads just once and get independent copies instantiated in the right edge clusters. This affects desired state (“spec”, in Kubernetes jargon) and reported state (“status”) asymmetrically. While it may make sense to describe desired state just once, each reported state is inherently specific to the edge cluster from which it arises. A user wants to see at the root of the hierarchy a summary or distillation of the reported state from all the edge clusters, and the defined datatypes for Kubernetes API objects have status sections that are not right for this job. The sort of things you can do with aggregation in SQL are an example of desirable summarization. Another example would be a list, perhaps of a capped size, of edge clusters that are reporting a state that needs attention. Users would also like to get complete copies of the reported state into the center for convenient processing; however, that is infeasible in some use cases (e.g., when the data volume or throughput is too high).
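
As a hedged illustration of the kind of summarization meant here (the input shape and field names are invented for the example), an aggregation at the center might look like:

```python
# Hedged illustration of summarizing per-cluster reported state at the center.
# The input shape and field names are invented for the example.
def summarize(statuses, attention_cap=10):
    needs_attention = [name for name, s in statuses.items() if not s["ready"]]
    return {
        "clusters": len(statuses),
        "clustersReady": sum(1 for s in statuses.values() if s["ready"]),
        "availableReplicasTotal": sum(s["availableReplicas"] for s in statuses.values()),
        # a capped list of clusters whose state needs attention
        "needsAttention": needs_attention[:attention_cap],
        "needsAttentionTruncated": len(needs_attention) > attention_cap,
    }

reported = {
    "store-0001": {"availableReplicas": 3, "ready": True},
    "store-0002": {"availableReplicas": 1, "ready": False},
}
print(summarize(reported))
```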

This one-to-many relationship may also affect desired state. In some use cases it is necessary to customize the desired state for each location. Another effect is that there may need to be imprecision in the specification. For example, consider a workload that has a Deployment object whose replica count is controlled (independently in each edge cluster) by horizontal pod autoscaling (HPA). The right number of replicas for each copy does not come from the center or from customization; it comes from each local HPA.
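
A rough sketch of what that imprecision could look like in practice, assuming the center strips locally-owned fields such as spec.replicas before propagating. The prepare_for_edge function and its overrides mechanism are invented purely for illustration, not any platform's real API:

```python
# Illustrative sketch: before propagating a Deployment description to an edge
# cluster, drop fields owned by local controllers (here spec.replicas, which
# the local HPA will manage) and apply any per-location overrides.
import copy

def prepare_for_edge(deployment, overrides=()):
    out = copy.deepcopy(deployment)
    out.get("spec", {}).pop("replicas", None)   # leave the replica count to the local HPA
    for keys, value in overrides:               # hypothetical per-location customization
        obj = out
        for k in keys[:-1]:
            obj = obj.setdefault(k, {})
        obj[keys[-1]] = value
    return out

central = {
    "kind": "Deployment",
    "metadata": {"name": "point-of-sale"},
    "spec": {"replicas": 3, "template": {"spec": {"containers": []}}},
}
edge_copy = prepare_for_edge(central, overrides=[(["metadata", "labels", "region"], "emea")])
print(edge_copy["spec"])   # no "replicas" key: each cluster's HPA decides that
```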

This one-to-many relationship also introduces, in some use cases, the need to return information (e.g., summarized reported state) from associated objects. These are objects that do not appear in the central workload description but are created at each edge location as a consequence of running the workload there. One example would be the ReplicaSet and Pod objects created at each edge cluster that runs a Deployment object from the center. Another would be a compliance report object that is created at each edge cluster as a consequence of a compliance policy object from the center.

The fact that the relationship is one-to-many introduces the need for a new API, through which the users can say what goes where. In plain Kubernetes the where is implicit — it is the same cluster that holds the description of the what. This new API often goes by names like “placement” and “scheduling”. And it may include a timing component. The user may want to exert some temporal/spatial control over the rollout of a new workload or an update to an existing one. Supporting canary testing or a blue/green deployment pattern is an example of this. As another example, consider the cellular telco that wants to maintain coverage of a given area as an update is rolled out over the antennae covering that area.
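
A minimal sketch of the idea, assuming placement is expressed as a label-style predicate over clusters and rollout as simple waves. The cluster records, selector format, and wave logic are invented for illustration and are not the API of any particular multi-cluster platform:

```python
# Hedged sketch of placement plus rollout control, with invented data shapes.
def matches(selector, labels):
    return all(labels.get(k) == v for k, v in selector.items())

def place(selector, clusters):
    """Return the names of the clusters a workload should go to."""
    return [c["name"] for c in clusters if matches(selector, c["labels"])]

def rollout_waves(destinations, wave_size):
    """Yield destinations in waves, e.g., to keep cellular coverage during an update."""
    for i in range(0, len(destinations), wave_size):
        yield destinations[i:i + wave_size]

clusters = [
    {"name": "antenna-017", "labels": {"tier": "ran", "region": "east"}},
    {"name": "antenna-018", "labels": {"tier": "ran", "region": "east"}},
    {"name": "store-0042",  "labels": {"tier": "retail", "region": "east"}},
]
destinations = place({"tier": "ran"}, clusters)
for wave in rollout_waves(destinations, wave_size=1):
    print("updating wave:", wave)   # update, wait for health signals, then continue
```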

Additionally, the removal of the implicit equation of workload description with workload hosting introduces the need for a place to put a workload description without that description modifying the behavior of that place. In a regular Kubernetes cluster, some of the objects that describe the workload modify the behavior of the API server that holds the workload description. Examples include: RBAC objects, which add authorizations; admission control plugins, which inject checks or modifiers for write operations; APIService objects, which can cause the server to delegate some requests to custom external servers; CustomResourceDefinition objects, which add kinds of objects that can be stored in the server; and namespaces, which scope some of the objects stored by the server. Some platforms solve this problem by introducing container types that can hold workload descriptions. This transmogrifies the problem: now the clients have to deal with something that does not act like a regular API server. To be sure, there are some established ways of doing this — e.g., Helm charts, GitOps — so it is desirable to play nice with those rather than re-invent them.
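
To make the trade-off concrete, here is a hypothetical illustration of the container-type approach. The WorkloadBundle kind and its API group are invented; only the ClusterRole inside follows a real Kubernetes schema:

```python
# Hypothetical illustration of the "container type" approach. WorkloadBundle
# and its API group are invented for this sketch.
cluster_role = {
    "apiVersion": "rbac.authorization.k8s.io/v1",
    "kind": "ClusterRole",
    "metadata": {"name": "widget-controller"},
    "rules": [{"apiGroups": ["example.com"],
               "resources": ["widgets"],
               "verbs": ["get", "list", "watch"]}],
}

bundle = {
    "apiVersion": "bundles.example.com/v1alpha1",   # invented group/version
    "kind": "WorkloadBundle",                        # invented container kind
    "metadata": {"name": "widget-workload"},
    "spec": {"manifests": [cluster_role]},           # stored as inert data
}

# Storing `bundle` grants no authorizations on the server that holds it, which
# is the point. The cost: clients can no longer GET, LIST, or watch the
# ClusterRole as a ClusterRole, so the server stops acting like a regular
# Kubernetes API server for this workload description.
```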

Large Number of Edge Clusters

Some edge computing use cases are small in scale, and some can be quite large. Consider the number of antennae of a cellular telco, or the number of cars that a given manufacturer has on the road.

When the number of edge clusters is very large, many simple and obvious approaches to the above issues become infeasible. For example, asking the users to say where a workload should go by providing a list of edge clusters is very user-unfriendly. Much more practical would be for the user to provide a predicate that distinguishes the desired from the undesired destinations. As a corollary, control over the rollout would need to use correspondingly succinct expressions. Similarly, it would be very unfriendly to ask the users to maintain a bespoke customization recipe for each (workload, edge cluster) pair. Much more practical would be for the users to express the desired customization for a given workload via rules or patterns that reference named properties of the edge cluster.
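
A hedged sketch of rule-based customization, assuming each edge cluster exposes a few named properties. The property names and template syntax are invented for illustration:

```python
# Illustrative sketch of rule-based customization: one template for the whole
# fleet, specialized per cluster from that cluster's named properties.
import string

TEMPLATE = {
    "kind": "ConfigMap",
    "metadata": {"name": "store-config"},
    "data": {"endpoint": "https://${region}.example.internal",
             "storeId": "${store_id}"},
}

def customize(template, properties):
    def subst(value):
        if isinstance(value, str):
            return string.Template(value).substitute(properties)
        if isinstance(value, dict):
            return {k: subst(v) for k, v in value.items()}
        return value
    return subst(template)

# One rule covers thousands of clusters; no per-(workload, cluster) recipe needed.
print(customize(TEMPLATE, {"region": "east", "store_id": "0042"}))
```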

A large number of edge clusters is often a motivation for having intermediate layers in the hierarchy and putting some computation there. Partial summarization is an example of such computation.
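
Continuing the invented summary shape from the earlier sketch, a partial-summarization step at an intermediate layer could be an associative merge, so the same code can run per location, per region, and at the center:

```python
# Sketch of partial summarization at an intermediate layer, using the invented
# summary shape from the earlier sketch. The merge is associative.
def merge(summaries, attention_cap=10):
    merged = {"clusters": 0, "clustersReady": 0, "availableReplicasTotal": 0,
              "needsAttention": [], "needsAttentionTruncated": False}
    for s in summaries:
        merged["clusters"] += s["clusters"]
        merged["clustersReady"] += s["clustersReady"]
        merged["availableReplicasTotal"] += s["availableReplicasTotal"]
        merged["needsAttention"] += s["needsAttention"]
        merged["needsAttentionTruncated"] |= s["needsAttentionTruncated"]
    if len(merged["needsAttention"]) > attention_cap:
        merged["needsAttention"] = merged["needsAttention"][:attention_cap]
        merged["needsAttentionTruncated"] = True
    return merged

regional = merge([
    {"clusters": 40, "clustersReady": 39, "availableReplicasTotal": 120,
     "needsAttention": ["store-0002"], "needsAttentionTruncated": False},
    {"clusters": 35, "clustersReady": 35, "availableReplicasTotal": 105,
     "needsAttention": [], "needsAttentionTruncated": False},
])
print(regional)
```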

Onward!

I and others are starting to address the edge multi-cluster problem. We began in the context of kcp because it provides some useful things to build on, and we will be generalizing to be independent of kcp. We call the project kubestellar. It includes a GitHub repo. Also see our introductory/overview blog post. My colleague Paolo wrote about how kcp’s “transparent multi-cluster” is related but does not solve the edge-mc problem. This is an open-source project. Come join us!

