Project Eirini: Choose the best fitting Container Orchestrator

Julian Skupnjak
May 20, 2018

Recently, there have been lots of efforts to integrate Cloud Foundry (CF) and Kubernetes (Kube) on various levels. They range from sharing a service catalog between the two to attempts to run CF on top of Kube. One effort that our team has been focusing on is making the container management system (container orchestrator) swappable. At the CF Summit in Boston we gave a presentation with a prototypical live demo where we pushed a CF application to a Kube cluster. If you missed the presentation, you can watch the recording here.
The project that has now been kicked off to make this a reality is called “Eirini” (formerly “Cube”), and it is essentially about giving CF the ability to use Kube as its container orchestrator. This does not mean that Kube will replace the current orchestrator, Diego; rather, it gives customers the option to choose which orchestrator they want to use. This raises the question of where the orchestrators differ and when to use Kube instead of Diego. In this blog post I want to address exactly these questions by giving the reader more insight into both orchestrators, Diego and Kube.

Container orchestrators can differ in several aspects: architecture, control plane components, scheduling algorithm, features, etc. Comparing Kube and Diego from a high-level perspective, they don’t really differ; the basis for both orchestrators is the same. Looking at them in more detail, there are differences, and there are situations where you might prefer one over the other. But before exploring the differences, let’s see what the orchestrators have in common.

Architecture

The first similarity between Kubernetes and Diego (and many other orchestrators) is that both implement a monolithic scheduling architecture. The scheduler is the part of a container orchestrator responsible for assigning containers to worker nodes. In a monolithic architecture, a single scheduler and the resource manager live in the same process, and all workloads are served by that single scheduler. However, Kubernetes and Diego implement an advanced monolithic scheduling architecture. Unlike in standard monolithic architectures, the state of the cluster is saved in a centralised store. Moreover, control plane components, such as the scheduler, are decoupled and live in separate processes. This approach is derived from another scheduling architecture called “shared-state”. A shared-state scheduler also has a centralised cluster state, which is exposed to all control plane components. This allows the components to write to the store concurrently. Occasional conflicts are handled by optimistic concurrency control. In contrast to a shared-state architecture, however, the shared state in Kube’s and Diego’s advanced monolithic architecture can be accessed exclusively through a central API.

The reason for a common API is to achieve consistency and reduce complexity in the overall architecture. It prevents individual components from adopting idiosyncratic APIs and conventions (such as file locations). The separation of the control plane components has the advantage that not every change is funnelled through a single centralised master, which makes it possible to apply changes to the cluster state dynamically.
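
To make the optimistic concurrency idea concrete, here is a minimal toy model in Python. It is only a sketch of the general pattern (Kubernetes implements it with resource versions on API objects); none of the names below are actual Kube or Diego APIs.

```python
# Toy model of a centralised cluster store with optimistic concurrency
# control. Names are illustrative, not real Kube or Diego APIs.

class ConflictError(Exception):
    """Raised when a writer's snapshot of an object is stale."""

class ClusterStore:
    def __init__(self):
        self._objects = {}  # key -> (version, value)

    def get(self, key):
        """Return (version, value) so the caller can write back later."""
        return self._objects.get(key, (0, None))

    def put(self, key, value, expected_version):
        """Compare-and-swap: only write if nobody changed the object
        since the caller read it; otherwise the caller must retry."""
        current_version, _ = self._objects.get(key, (0, None))
        if current_version != expected_version:
            raise ConflictError(f"{key}: stale version {expected_version}")
        self._objects[key] = (current_version + 1, value)

# A control plane component (e.g. a scheduler) reads, decides, writes:
store = ClusterStore()
version, pod = store.get("pod/my-app")
store.put("pod/my-app", {"node": "worker-1"}, expected_version=version)
```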

Algorithm

The fundamental problem that container schedulers try to solve is the bin-packing problem. It describes the problem of packing a set of weighted/sized items (containers) into bins (workers) without exceeding the capacity of the bins (worker capacity), such that the total number of bins used is minimal. There are many variants of this problem. For instance, in the basic bin-packing problem the number of bins is unbounded: whenever the existing bins run out of capacity, the algorithm can open a new bin. For container schedulers, however, the number of workers is usually fixed.

Algorithms that solve the bin-packing problem are bin-packing algorithms or, as I call them, “X-Fit” algorithms: “X-Fit” because there are different types of packing algorithms, including best-fit, next-fit, worst-fit, etc. The type of the algorithm specifies the rule by which the algorithm selects the bin to schedule the workload to. This is where Kube and Diego have their next similarity: as a basis, both use a worst-fit fashioned algorithm to assign containers to a specific worker in a cluster. With a worst-fit algorithm, containers are scheduled on the workers with the least workload (or the most available/free resources). Formally described, the algorithm works something like this:

  • Find the bin (worker) with the most remaining capacity that accommodates the object (container) to be placed
  • Place the object (container) in this bin (worker)

Note that the rule of a worst-fit algorithm has the effect of spreading workload across a cluster. As a simple example, consider three apps with different instance counts scheduled over three bins, as in the sketch below.
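
The following Python sketch shows the core of such a worst-fit selection, under the simplifying assumption that capacity is a single number per worker. It is illustrative only, not the actual Kube or Diego implementation.

```python
# Minimal worst-fit scheduler sketch: each container goes to the
# worker with the most free capacity that can still accommodate it.

def worst_fit(container_size, workers):
    """Return the worker with the most free capacity that fits the
    container, or None if no worker can accommodate it."""
    candidates = [w for w in workers if w["free"] >= container_size]
    if not candidates:
        return None
    return max(candidates, key=lambda w: w["free"])

# Three apps with different instance counts spread over three
# equally sized workers (the example from the text):
workers = [{"name": f"worker-{i}", "free": 10} for i in range(3)]
instances = ["app-a"] * 3 + ["app-b"] * 2 + ["app-c"]
for app in instances:
    worker = worst_fit(container_size=2, workers=workers)
    worker["free"] -= 2
    print(app, "->", worker["name"])
```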

The worst-fit algorithm can only be considered the basis of a more complex algorithm. In truth, both schedulers apply further rules when selecting a worker. Two general rules are:

  • Balance the workload of the cluster by prioritising workers that help to balance out resource usage.
  • Spread multiple instances of the same app across the cluster, such that the number of instances of the same app on a single worker is minimal.

These rules ensure the reliability of the apps running on the cluster. The sketch below extends the worst-fit example with such a spreading rule.
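
Here is one hedged way such a rule can layer on top of worst-fit (again illustrative, not either scheduler’s real scoring code): keep the preference for free capacity, but penalise workers that already run instances of the same app.

```python
# Illustrative scoring on top of worst-fit: prefer free capacity,
# penalise co-locating instances of the same app so they spread out.

SAME_APP_PENALTY = 100  # assumed weight; real schedulers tune this

def score(worker, app, container_size):
    if worker["free"] < container_size:
        return None  # worker cannot accommodate the container
    same_app = worker["apps"].count(app)
    return worker["free"] - SAME_APP_PENALTY * same_app

def schedule(app, container_size, workers):
    scored = [(score(w, app, container_size), w) for w in workers]
    scored = [(s, w) for s, w in scored if s is not None]
    if not scored:
        return None
    _, best = max(scored, key=lambda pair: pair[0])
    best["apps"].append(app)
    best["free"] -= container_size
    return best

# Three instances of the same app land on three different workers:
workers = [{"name": f"worker-{i}", "free": 10, "apps": []} for i in range(3)]
for _ in range(3):
    print(schedule("app-a", 2, workers)["name"])
```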

Features

Kube and Diego have one feature in common: isolation segments (Diego) and node selectors (Kube). These allow the user to place deployments on specific hosts. The idea is that hosts are marked with a specific tag or label, and workloads matching the tag or label are placed on these hosts. In Kubernetes, nodes (workers) and workloads can be marked with one or more labels, and the scheduling algorithm assigns workloads to nodes based on matching labels. In Diego, by contrast, an admin of a CF deployment prepares the deployment by adding placement tags to cells (workers) and creates isolation segments based on these tags. The isolation segments can then be associated with one or more CF organisations or spaces, and all workloads coming from those organisations or spaces are placed on the tagged cells. Kube’s approach gives the developer more freedom and granularity in scheduling workloads to the cluster, and at this point the two schedulers start to differ.
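
For a taste of how the Kube side looks in practice, here is a minimal manifest sketch; the label key/value, pod name, and image are made up for illustration:

```yaml
# Label a node, then constrain a pod to labelled nodes:
#   kubectl label nodes worker-1 disktype=ssd
apiVersion: v1
kind: Pod
metadata:
  name: my-app                             # name is illustrative
spec:
  nodeSelector:
    disktype: ssd                          # only nodes carrying this label
  containers:
    - name: my-app
      image: my-registry/my-app:latest     # image is illustrative
```

On the Diego side, the equivalent setup is done once by the platform admin, roughly via cf create-isolation-segment and cf enable-org-isolation in the CF CLI (exact commands may vary by CLI version).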

Why use Kube as a backend for CF?

Now we have more insight into both schedulers, and they really do have many similarities. So why swap the schedulers at all?

Cloud Foundry is an excellent PaaS with benefits including auto-scaling, application portability, health management, and many others. It is a platform optimised for fast development and deployment, and applications built on top of CF should follow the twelve-factor app methodology. For example, CF allows HTTP requests only on ports 80 and 443, and it is not designed to serve stateful applications.

Where CF is opinionated (not in a bad sense), Kubernetes is not. It runs basically everything that can be packaged into a container. This can be especially interesting to customers who want to push applications to the cloud that are not cutting-edge twelve-factor apps.

Beyond this fundamental difference, whether you have a running CF, a running Kube, or both, there are more reasons to use CF and Kube together and get the best of both worlds.

Advanced Scheduling Features: Kube and Diego make good basic decisions on how and where to place workloads across a cluster. But in specific scenarios a user may want to deploy their workloads differently and have more control over the cluster:

  • isolated,
  • co-located (e.g. when two services frequently interact with each other),
  • on hosts with specific hardware (e.g. hosts with SSDs),
  • and more...

At the end of the day, the customer knows best how and where to schedule their workloads.

The Kube scheduler is a policy-rich scheduler with many advanced scheduling features that satisfy such requirements. I already mentioned the Node Selector feature, with which you can schedule workloads (in Kube, usually pods) to specific workers (nodes). Kubernetes also provides a generalised version of node selectors: Affinity/Anti-Affinity (for simplicity I will just call them “Affinity”). It extends the types of constraints you can express when scheduling pods. There are two different types of Affinity, illustrated in the sketch after the list:

  • Node Affinity: constrain which nodes a pod can run on, based on node labels.
  • Pod Affinity: constrain which nodes a pod can run on, based on the labels of pods already running on a node (this allows the user, for example, to co-locate workloads).
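
A hedged manifest sketch of both affinity types (all label keys and values are made up; the structure follows the Kubernetes affinity API):

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: web
spec:
  affinity:
    nodeAffinity:                  # constrain by node labels
      requiredDuringSchedulingIgnoredDuringExecution:
        nodeSelectorTerms:
          - matchExpressions:
              - key: disktype
                operator: In
                values: ["ssd"]
    podAffinity:                   # constrain by labels of pods on a node
      requiredDuringSchedulingIgnoredDuringExecution:
        - labelSelector:
            matchExpressions:
              - key: app
                operator: In
                values: ["cache"]
          topologyKey: kubernetes.io/hostname   # co-locate on the same node
  containers:
    - name: web
      image: nginx
```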

An additional, related feature in Kubernetes is Taints and Tolerations. Affinity constraints attract a pod to one node or a set of nodes, whereas Taints and Tolerations do the opposite and repel one or a set of pods. The mechanism works similarly to labels, but here nodes are “tainted” and repel pods that do not “tolerate” the taints.
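
Here is a small sketch of the mechanics; the taint key/value and pod details are invented for illustration:

```yaml
# Taint a node so it repels pods that do not tolerate the taint:
#   kubectl taint nodes worker-1 dedicated=experiments:NoSchedule
apiVersion: v1
kind: Pod
metadata:
  name: experiment-runner          # name is illustrative
spec:
  tolerations:                     # this pod tolerates the taint above
    - key: dedicated
      operator: Equal
      value: experiments
      effect: NoSchedule
  containers:
    - name: experiment-runner
      image: busybox
      command: ["sleep", "3600"]
```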

And this is not everything. On top of all these features, Kubernetes gives users the possibility to bring their own scheduler. All these features give the user lots of scheduling control, and with Eirini they will (or could) be enabled on CF. However, if you don’t require such advanced scheduling features and you basically don’t care where your workload ends up, you will be fine with Diego as a backend; it also does a great job at scheduling workloads.
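
As a taste of the bring-your-own-scheduler option, a pod simply names the scheduler that should place it; “my-scheduler” below is a placeholder for whatever custom scheduler you deploy:

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: custom-scheduled-pod
spec:
  schedulerName: my-scheduler      # placeholder for your own scheduler
  containers:
    - name: app
      image: nginx
```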

Reduced Footprint: If you have both systems running, CF and Kube, you have redundant components: (control plane + worker nodes) × 2. In such scenarios there is an obvious reason to use both systems together: reduce the footprint of CF by removing the existing Diego backend and replacing it with your running Kubernetes.

User Experience (UX): Now let’s think of it the other way around: not CF → Kube, but CF ← Kube. Kubernetes is a great, but complex, container orchestrator. Its UX compared to CF, however, is simply not good. You need to write code, wrap it into a Docker image, and push it to some Docker registry. This process alone requires knowledge about Docker. Once an app is ready to be deployed, it also requires knowledge about Kube and kubectl (the command line interface of Kubernetes). With CF, on the other hand, users push code instead of containers, which makes a developer’s life much easier. No knowledge of containers, Docker, or registries is required. You just develop your app and run cf push my-app. That’s it! Your app is running. With Eirini, CF can be added on top of Kubernetes as a UX layer.
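
To make the contrast concrete, here is roughly what the two workflows look like side by side (the image name and registry are made up):

```sh
# Typical Kube deployment flow:
docker build -t registry.example.com/my-app:1.0 .
docker push registry.example.com/my-app:1.0
kubectl create deployment my-app --image=registry.example.com/my-app:1.0

# The CF flow, from the root of your source code:
cf push my-app
```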

Conclusion

Even if using CF and Kube together requires maintaining two systems, the benefits are impressive and there is no reason to consider them separate systems. It adds customer value in both directions, whether you use CF with Kubernetes or the other way around.

Project Eirini is currently under active development and is now officially a CF incubator project. We already have an end-to-end solution where you can cf push an app to a Kube backend (as already mentioned, you can watch a demonstration on YouTube), but many more features and updates will come in the following months.

If you liked the read and are now more interested in project Eirini or other cloud-related topics, say hello on Twitter :)

Thanks for reading!

Further Reading

I can recommend reading Simon Moser’s blog post “Cloud Foundry and Kubernetes: A brief history of CF/K8s”. Furthermore, there were a lot of discussions around CF and Kube at the CF Summit in Boston. A really interesting panel discussion on this topic, including representatives of SUSE, SAP, Google, Microsoft, Pivotal, and IBM, can be watched here. Another two interesting blog posts are this one from techcrunch.com and one from Daniel Jones, who summarised his thoughts from CF Summit in Boston and KubeCon in Copenhagen.
