Node Feature Discovery or: How I Stopped Worrying About Heterogeneity and Love Kubernetes

In this post, we (Balaji Subramaniam and Connor Doyle) describe node feature discovery, a software project that enables hardware feature and configuration discovery in the Kubernetes container orchestration system. It is also the first Kubernetes incubator project. By reading this post, you will get a good sense of how we enhanced job placement in a heterogeneous Kubernetes cluster using the node feature discovery project. Along the way, you will get an introduction to the Kubernetes incubator process. Happy reading!

Observe: heterogeneous actors sharing a limited resource (image)

What is Node Feature Discovery?

Node feature discovery is a software project that detects the available hardware features and configuration on a Kubernetes node and advertises them as node labels.

How Existing Kubernetes Clusters Benefit from Node Feature Discovery

Node feature discovery detects hardware features from the following sources:

  • cpuid, for x86 CPU capabilities (e.g., AVX, AES-NI).
  • The Intel P-State driver, for power-management configuration such as turbo boost.
  • Intel Resource Director Technology (RDT), for cache and memory-bandwidth monitoring and allocation capabilities.

The discovered features are then advertised as node labels. These labels encode several pieces of information:

  • A “namespace” to denote the vendor/provider (e.g., node.alpha.intel.com).
  • The version of this discovery code that wrote the label, according to git describe --tags --dirty --always.
  • The source for each label (e.g. cpuid).
  • The name of the discovered feature as it appears in the underlying source (e.g., AESNI from cpuid).

Following are some examples of published labels.
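Combining the pieces above, published labels might look like the following (illustrative values; the version string and feature set vary by node and release):

```yaml
# Example node labels as published by node feature discovery.
# The version segment (v0.1.0) and feature names are illustrative.
"node.alpha.intel.com/v0.1.0-cpuid-AESNI": "true"
"node.alpha.intel.com/v0.1.0-cpuid-AVX": "true"
"node.alpha.intel.com/v0.1.0-pstate-turbo": "true"
```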

We developed the node feature discovery project to facilitate job placement in a heterogeneous cluster using an existing Kubernetes mechanism: a node carrying a particular label can be targeted with the node affinity feature in Kubernetes.

Heterogeneous Machines and Performance in Data Centers

Data centers are likely to consist of machines with a variety of platforms and configurations. Heterogeneity arises as machines evolve over time and newer machines with different platforms are deployed alongside older ones. These machines are also configured differently. Even a relatively homogeneous environment shows this effect: a roughly 12,000-node cluster at Google contains three different platform types (e.g., processor package models from different micro-architectural generations and vendors) in ten configurations [1, 2].

In Kubernetes, machine resources are assigned to applications based on their CPU and memory requests. Heterogeneity in machine platform and configuration is not taken into consideration. Ignoring this heterogeneity can lead to unpredictable, and in general poorer, application performance.

Let’s consider an example feature, turbo boost: a hardware feature that allows dynamic overclocking of CPUs, which can yield a performance benefit. But using turbo boost naively can be detrimental to performance for some applications [3]. The ability to target nodes with or without turbo boost, depending on the application, is useful in such scenarios.

By using node feature discovery, machines with a specific platform and configuration can be targeted in a Kubernetes cluster.

Using Node Feature Discovery

Node feature discovery is set up to deploy as a job in a Kubernetes cluster. You can use this script and the following Kubernetes job template to detect and advertise the hardware features in your cluster.

Job Template
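The job template is roughly the following sketch; the image name, version tag, and Downward API wiring here are illustrative and may differ from the template shipped with the project:

```yaml
# Sketch of a discovery Job (illustrative image name and version).
apiVersion: batch/v1
kind: Job
metadata:
  name: node-feature-discovery
spec:
  template:
    metadata:
      labels:
        app: node-feature-discovery
    spec:
      restartPolicy: Never
      hostNetwork: true
      containers:
        - name: node-feature-discovery
          image: quay.io/kubernetes_incubator/node-feature-discovery:v0.1.0
          env:
            # Pass the pod's own node name so the discovery binary
            # knows which Node object to label.
            - name: NODE_NAME
              valueFrom:
                fieldRef:
                  fieldPath: spec.nodeName
```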

After the features are advertised, a node with a specific feature can be targeted using node selectors. The following example pod template shows how to target a node with turbo boost enabled.
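A pod template along these lines would do it; the label key (in particular the version segment) and container image are placeholders to be replaced with the values your deployment actually publishes:

```yaml
# Sketch of a pod pinned to turbo-boost-enabled nodes.
apiVersion: v1
kind: Pod
metadata:
  name: turbo-workload
spec:
  containers:
    - name: app
      image: my-app:latest   # placeholder image
  nodeSelector:
    # Illustrative label key; substitute the label published
    # by your node feature discovery deployment.
    "node.alpha.intel.com/v0.1.0-pstate-turbo": "true"
```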

Potential Performance Benefits from Node Feature Discovery

To show the potential performance benefit of the node feature discovery project, we ran an experiment on three identical Kubernetes nodes. Continuing our previous example, we wanted to demonstrate how to target nodes with turbo boost using node feature discovery. For demonstration purposes, we intentionally disabled turbo boost on two of the three nodes.

Our experiment involved running the same application ten times with and without node feature discovery. We used the Ferret benchmark from the PARSEC benchmark suite [4] as our application. The benchmark implements an image similarity search and, being CPU intensive, is expected to benefit from turbo boost [5, 6].

Without node feature discovery, two-thirds of the application instances run on nodes without turbo boost and are consequently less performant. With feature discovery, we can target the node with turbo boost and gain performance.

The figure below shows box plots illustrating the variability in normalized execution time across ten application instances run with and without node feature discovery. Execution times are normalized to the best-performing run, and the change in normalized execution time is shown (0 represents the best-performing run). Under this experimental setup, node feature discovery yields a significant performance improvement. Moreover, it also reduces the performance variability between application instances.

While our example illustrates the benefits of using node feature discovery with turbo boost, it can be used to gain performance improvement and predictability for other applications by targeting nodes with other features and configurations in a heterogeneous Kubernetes cluster. For example, many scientific and machine learning applications can benefit from targeting nodes with the AVX instruction set [7, 8], and many web services can take advantage of the AES-NI instruction set [9]. Moreover, complex user requirements can be expressed by targeting nodes with multiple features and a combination of configurations.

Contributing to Node Feature Discovery

We would love to accept contributions from the community. Please provide feedback and enhancement ideas by submitting an issue to the node feature discovery project. Because the project is part of the Kubernetes incubator, you need to go through a few easy steps before we can accept your patches. For more details on how to contribute, see our description.

For starters, the sources of discovered features internally implement a common interface. In the future, additional feature sources (e.g., network and storage features) can be added by implementing the same interface.
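As a rough sketch of that extension point (the exact method set in the project may differ, and the storage source and label-rendering helper below are hypothetical):

```go
package main

import "fmt"

// FeatureSource is a sketch of the per-source discovery interface.
type FeatureSource interface {
	Name() string                // source name, e.g. "cpuid"
	Discover() ([]string, error) // feature names found on this node
}

// storageSource is a hypothetical future source for storage features.
type storageSource struct{}

func (storageSource) Name() string { return "storage" }
func (storageSource) Discover() ([]string, error) {
	return []string{"nvme"}, nil
}

// labelsFor renders discovered features as node labels using the
// naming scheme described earlier in this post.
func labelsFor(src FeatureSource, version string) ([]string, error) {
	features, err := src.Discover()
	if err != nil {
		return nil, err
	}
	labels := make([]string, 0, len(features))
	for _, f := range features {
		labels = append(labels,
			fmt.Sprintf("node.alpha.intel.com/%s-%s-%s=true", version, src.Name(), f))
	}
	return labels, nil
}

func main() {
	labels, err := labelsFor(storageSource{}, "v0.1.0")
	if err != nil {
		panic(err)
	}
	for _, l := range labels {
		fmt.Println(l)
	}
}
```

Registering a new source would then amount to implementing Name and Discover and adding the source to the list the discovery binary iterates over.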

You can also get in touch with us on the SIG-Node and SIG-Scheduling Kubernetes Slack channels or by emailing the dev, SIG-Node, or SIG-Scheduling mailing lists. We will be happy to hear from you!

About Kubernetes Incubator

The Kubernetes incubator is the gateway to becoming a full Kubernetes community project. All new projects enter the Kubernetes community through the incubation process. Entering incubation requires writing a proposal, finding a champion, gaining the acceptance of the relevant special interest groups (SIGs), and finally getting the approval of a sponsor. For more details on the Kubernetes incubator process, see this description.

Acknowledgements

We thank Brandon Philips, Brian Grant, Sarah Novotny, David Oppenheimer, Dawn Chen, Derek Carr, Vishnu Kannan and Tim St. Clair for reviewing our proposal for this project, providing feedback, and setting up the incubation process.

References

[1] https://github.com/google/cluster-data

[2] http://www.pdl.cmu.edu/PDL-FTP/CloudComputing/googletrace-socc2012.pdf

[3] http://csl.stanford.edu/~christos/publications/2014.autoturbo.hpca.pdf

[4] http://parsec.cs.princeton.edu/

[5] http://parsec.cs.princeton.edu/publications/bienia08characterization.pdf

[6] http://parsec.cs.princeton.edu/publications/bienia08comparison.pdf

[7] https://software.intel.com/en-us/intel-mkl

[8] https://software.intel.com/en-us/blogs/daal

[9] https://software.intel.com/en-us/articles/intel-aes-ni-performance-enhancements-hytrust-datacontrol-case-study
