Kubernetes on ARM: a case study

Published in KrakenSystems, May 6, 2019

At KrakenSystems we work with various IoT devices. They are our main infrastructure for collecting data and sending it on to downstream aggregation pipelines. For now, they are implemented as BeagleBone Black boards: an armv7l hard-float CPU (AM335x 1GHz ARM® Cortex-A8) and only 512MB of RAM. In this blog post, we cover the use case and rationale for running Kubernetes on such underpowered devices.

Those devices run simple services: reading Modbus registers, speaking the XBee protocol, or attaching to OBD (On-Board Diagnostics for vehicles), then parsing the data, serializing it in the protobuf format and sending it on the message bus.

Design criteria & implementation

Deployment format

We want to deploy software as an immutable binary or container. Given the arcane C++ build process at KrakenSystems and the plethora of shared library dependencies, a container setup makes the most sense for this use case. A static binary is also a viable alternative, but that would require refactoring our current C++ build system, a collection of bash/Makefile scripts that runs for about 15 minutes from scratch and a couple of minutes on CI after caching.

Another solution was deploying services on bare metal. In this legacy setup there was a dedicated shared library folder per service, and we used `LD_LIBRARY_PATH` trickery for shared library version management, which defeats the purpose of having shared libraries in the first place. Yet given the current state of the build system, producing a static binary was too costly in developer time.
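As a rough illustration of that legacy approach, the sketch below builds the environment for one service so that its own library folder is searched first. The `/opt/services/<name>/lib` layout and the function name are assumptions made for the example, not the actual KrakenSystems setup.

```python
import os
import posixpath


def service_env(service, base_dir="/opt/services", parent_env=None):
    """Return an environment where the service's private lib folder wins.

    The directory layout (<base_dir>/<service>/lib) is hypothetical.
    """
    env = dict(os.environ if parent_env is None else parent_env)
    lib_dir = posixpath.join(base_dir, service, "lib")
    existing = env.get("LD_LIBRARY_PATH", "")
    # The dynamic loader searches LD_LIBRARY_PATH entries left to right,
    # so the per-service folder goes first.
    env["LD_LIBRARY_PATH"] = lib_dir + (":" + existing if existing else "")
    return env
```

A launcher could then pass the result to `subprocess.run([...], env=service_env("modbus"))`, where `modbus` is a made-up service name.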

Kubernetes, with its container management, fits our use case perfectly. Nomad or plain old docker/cri-o/rkt would also satisfy this design criterion. Static binaries with systemd would be a satisfactory choice too, if they were simple to produce in the present state of the codebase.

Monitoring

Node and service liveness monitoring is critical. We need an agent running on the node that sends an “I’m alive” signal to some system, together with a mature alerting pipeline. Consul is one solution. Kubernetes has this out of the box, and together with Prometheus alerting rules it seemed like a natural fit. We already use Prometheus/Grafana/Alertmanager throughout our infrastructure, which made this option more appealing.

Additionally, liveness and readiness health checks aren’t particularly useful for the edge devices, since a crashing process already signals the issue. The services are not server components that need to accept client connections.

Nevertheless, we plan to introduce liveness checks on the services in the future, as a failsafe for the case where a service stops sending data on the message bus, which is its main purpose.
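One way such a failsafe could look is sketched below. It assumes the service touches a heartbeat file after every successful publish, and an exec-style liveness probe checks the file's age. Exec probes treat exit code 0 as healthy. The heartbeat file, the threshold and the function names are all hypothetical.

```python
import os
import time


def seconds_since_last_publish(heartbeat_path, now=None):
    """Age of the heartbeat file, which the service is assumed to touch
    after every successful publish to the message bus."""
    now = time.time() if now is None else now
    return now - os.path.getmtime(heartbeat_path)


def probe_exit_code(heartbeat_path, max_silence_s=300.0, now=None):
    """Exit code for an exec-style liveness probe: 0 healthy, 1 unhealthy."""
    try:
        age = seconds_since_last_publish(heartbeat_path, now)
    except OSError:
        return 1  # no heartbeat file yet: treat the service as not alive
    return 0 if age <= max_silence_s else 1
```

A probe script would end with `sys.exit(probe_exit_code(path))`. The threshold needs to be longer than the longest legitimate quiet period of the service.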

The remaining Ascalia infrastructure runs on Kubernetes, so it made sense to reuse the same tools and setup for our edge devices. Fewer moving parts are always better and lead to operational simplicity, even though Kubernetes itself is not simple to operate.

Updates

The edge devices aren’t static islands forever resting in the Pacific Ocean. The code changes often, and the configuration even more frequently.

The services are designed for simplicity. Their configuration is saved as a YAML file that is watched for changes with `inotify`. Thus any update mechanism can be added in the future as a sidecar, while development complexity stays in check. Furthermore, it’s easier to debug.
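The services themselves are C++, so the following Python sketch only illustrates the pattern: block on `inotify` until the config file is rewritten, then reload. It calls the Linux `inotify_init` and `inotify_add_watch` functions through `ctypes`. The class name and file name are made up for the example.

```python
import ctypes
import os
import struct

IN_CLOSE_WRITE = 0x00000008  # a file opened for writing was closed
IN_MOVED_TO = 0x00000080     # a file was renamed into the watched directory
_EVENT_HEADER = struct.Struct("iIII")  # wd, mask, cookie, name length


class ConfigWatcher:
    """Block until a given file inside a directory is rewritten (Linux only)."""

    def __init__(self, directory, filename):
        self._libc = ctypes.CDLL(None, use_errno=True)
        self._libc.inotify_add_watch.argtypes = [
            ctypes.c_int, ctypes.c_char_p, ctypes.c_uint32]
        self.path = os.path.join(directory, filename)
        self._filename = filename
        self._fd = self._libc.inotify_init()
        if self._fd < 0:
            raise OSError(ctypes.get_errno(), "inotify_init failed")
        # Watch the directory rather than the file, so that atomic
        # "write temp file, then rename" updates are seen as well.
        wd = self._libc.inotify_add_watch(
            self._fd, os.fsencode(directory), IN_CLOSE_WRITE | IN_MOVED_TO)
        if wd < 0:
            err = ctypes.get_errno()
            os.close(self._fd)
            raise OSError(err, "inotify_add_watch failed")

    def wait_for_change(self):
        """Return the config path once it has been written or replaced."""
        while True:
            data = os.read(self._fd, 4096)
            offset = 0
            while offset < len(data):
                _wd, _mask, _cookie, name_len = _EVENT_HEADER.unpack_from(data, offset)
                start = offset + _EVENT_HEADER.size
                name = data[start:start + name_len].rstrip(b"\0")
                offset = start + name_len
                if os.fsdecode(name) == self._filename:
                    return self.path

    def close(self):
        os.close(self._fd)
```

After `wait_for_change()` returns, the service would re-read and re-parse the YAML file with whatever YAML library it uses, then apply the new settings.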

Per-device configuration is stored in an RDBMS, Postgres in this instance. Having hundreds or thousands of edge devices polling the RDBMS for simple key/value pairs wouldn’t end well. Furthermore, there’s no push-style notification from the RDBMS on key updates. Thus we need an additional layer in between.

We’re reusing the Kubernetes API server and its backing key/value store, etcd. We’ve defined each edge device as a CRD (custom resource definition) object that carries rich, domain-specific information. Kubernetes also serves as primitive inventory management, supplementing the real Django backend used by operations (i.e. I don’t care what Django does as long as it updates the right REST endpoints in the Kubernetes API).
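To make "the right REST endpoints" concrete: namespaced custom objects live under `/apis/<group>/<version>/namespaces/<namespace>/<plural>/<name>` on the API server. The sketch below builds such a path and a matching object body. The `iot.example.com` group, the `EdgeDevice` kind and the spec fields are placeholders, not the real schema.

```python
def custom_object_path(group, version, namespace, plural, name=None):
    """REST path of a namespaced custom resource (collection or single item)."""
    path = f"/apis/{group}/{version}/namespaces/{namespace}/{plural}"
    return f"{path}/{name}" if name else path


def edge_device(name, namespace, spec):
    """Body of a hypothetical EdgeDevice custom object."""
    return {
        "apiVersion": "iot.example.com/v1",  # placeholder group/version
        "kind": "EdgeDevice",                # placeholder kind
        "metadata": {"name": name, "namespace": namespace},
        "spec": spec,
    }
```

A backend such as Django would send this body to the collection path to create a device, or to the item path to update it.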

In the future, edge services may watch the backing key/value store directly, whether it’s the Kubernetes API server, etcd, Consul, Riak, Redis, or any other common key/value implementation.

Finally, we require asynchronous updates. A device could be offline at the time an update is applied. This rules out all configuration management solutions that are not agent-based. Ansible, our favorite configuration management tool for its simplicity and power, is used only for the initial setup, not for the update procedure (service updates, that is).

Wireguard VPN setup

Since we’re using WireGuard as our VPN, we need to keep the client/server IP and public key lists in sync asynchronously. This entails an additional agent on the edge device that you have to monitor, track and keep alive.

We also need to store the public keys of offline devices and to inspect those keys and settings easily. Kubernetes CRDs are a natural fit for this role. We reuse the etcd backing store, get nice RBAC on those objects, and we’ve defined custom printer columns for easier VPN node management.
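For readers unfamiliar with printer columns, here is a minimal sketch of a CRD that exposes a peer's address, endpoint and public key in `kubectl get` output. It uses the current `apiextensions.k8s.io/v1` layout, which may differ from what was in use when this was written. The group, kind and field names are invented for the example and are not the actual wg-cni schema.

```python
import json


def wireguard_peer_crd(group="vpn.example.com"):
    """Build a hypothetical WireguardPeer CRD with custom printer columns."""
    plural = "wireguardpeers"
    string_field = {"type": "string"}
    return {
        "apiVersion": "apiextensions.k8s.io/v1",
        "kind": "CustomResourceDefinition",
        # CRD names must have the form <plural>.<group>.
        "metadata": {"name": f"{plural}.{group}"},
        "spec": {
            "group": group,
            "scope": "Namespaced",
            "names": {"plural": plural, "singular": "wireguardpeer",
                      "kind": "WireguardPeer"},
            "versions": [{
                "name": "v1",
                "served": True,
                "storage": True,
                "schema": {"openAPIV3Schema": {
                    "type": "object",
                    "properties": {"spec": {
                        "type": "object",
                        "properties": {"address": string_field,
                                       "endpoint": string_field,
                                       "publicKey": string_field},
                    }},
                }},
                # Extra columns shown by `kubectl get wireguardpeers`.
                "additionalPrinterColumns": [
                    {"name": "Address", "type": "string", "jsonPath": ".spec.address"},
                    {"name": "Endpoint", "type": "string", "jsonPath": ".spec.endpoint"},
                    {"name": "PublicKey", "type": "string", "jsonPath": ".spec.publicKey"},
                ],
            }],
        },
    }


if __name__ == "__main__":
    # kubectl accepts JSON manifests as well as YAML.
    print(json.dumps(wireguard_peer_crd(), indent=2))
```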

We used our own open-source tools for this.

Long story short, we bootstrapped the WireGuard VPN with the wg-cni Ansible role. This also installed a WireGuard-based CNI for use in our Kubernetes cluster.

The wg-cni role created our custom CRD manifests representing the clients and servers in the WireGuard VPN topology.

After applying the manifests, we started the WireGuard operator daemonset, which keeps the nodes in sync with further additions and removals.
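The core of such a sync loop can be sketched as a diff between the desired peers (from the CRD objects) and the peers currently configured on the node. This is an illustration of the idea, not the operator's actual code. The function names and the flat `public key -> allowed IPs` mapping are assumptions.

```python
def diff_peers(desired, current):
    """Compare two {public_key: allowed_ips} mappings.

    Returns (to_add, to_remove): peers that are new or changed, and the
    sorted public keys that should no longer be configured.
    """
    to_add = {key: ips for key, ips in desired.items() if current.get(key) != ips}
    to_remove = sorted(key for key in current if key not in desired)
    return to_add, to_remove


def render_peer(public_key, allowed_ips, endpoint=None):
    """Render one [Peer] section in WireGuard's configuration file format."""
    lines = ["[Peer]", f"PublicKey = {public_key}", f"AllowedIPs = {allowed_ips}"]
    if endpoint:
        lines.append(f"Endpoint = {endpoint}")
    return "\n".join(lines)
```

Because the desired state lives in the API server, a node that was offline during a change catches up by running the same diff when it reconnects.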

Initial deployment

It wasn’t without issues. We used kubespray as a mature Kubernetes deployment solution. It’s the only complete solution for bare-metal deployment. Since it is Ansible-based, we’re familiar with it and can easily extend it if necessary… and it was necessary.

We encountered a myriad of problems:

Missing support on ARM
Default pause image not supporting ARM
Missing cpuset (a kernel update to 4.19 LTS solved it)
Running out of disk space a few times
Flannel missing multi-arch support in kubespray (before we transitioned to the WireGuard CNI for good)
…

Most of these are tracked in issues and PRs.

After a successful deployment with the default container runtime, Docker, it was time for a basic performance analysis.

Initial performance analysis

Basic checklist

eMMC is mounted without atime (`noatime`)
Using armhf binaries: `readelf -A $(which kubelet) | grep Tag_ABI_VFP_args`
Checking the CPU frequency governor: `cat /sys/devices/system/cpu/cpu0/cpufreq/scaling_governor`
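The first two checks are easy to script. The sketch below parses `/proc/mounts`-style text for the `noatime` option and looks for the `Tag_ABI_VFP_args` attribute in `readelf -A` output, mirroring the `grep` above. The function names and the sample input in the usage note are made up.

```python
def mount_options(mounts_text, mount_point):
    """Return the option list for a mount point from /proc/mounts-style text.

    Each line is: device mount_point fstype options dump pass. If a mount
    point appears more than once, the last entry wins.
    """
    options = None
    for line in mounts_text.splitlines():
        fields = line.split()
        if len(fields) >= 4 and fields[1] == mount_point:
            options = fields[3].split(",")
    return options


def is_noatime(mounts_text, mount_point="/"):
    """True if the given mount point is mounted with noatime."""
    options = mount_options(mounts_text, mount_point)
    return options is not None and "noatime" in options


def has_vfp_args_tag(readelf_output):
    """True if `readelf -A` output lists the Tag_ABI_VFP_args attribute."""
    return any(line.strip().startswith("Tag_ABI_VFP_args")
               for line in readelf_output.splitlines())
```

On a device this could be called as `is_noatime(open("/proc/mounts").read())`.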

USE

Kubernetes uses about 30% CPU without doing any meaningful work.

There is no significant network pressure. Speedtest-cli shows 30 Mbit/s download and upload speeds, which is more than sufficient for our use case.

In summary, there’s high CPU usage with low disk, memory and network usage.

Performance analysis

Stracing the kubelet shows that about 66% of the time is spent in locks.

Profiling with pprof and tracing profiles gave more useful information. From there we concluded:

25% of the time is spent in housekeeping
Changing `--housekeeping-interval=10m` from the default 10s reduces that cost
Increasing the node update period didn’t considerably affect CPU usage

This housekeeping is mostly for container metrics. We don’t need them every 10s; once in a while is perfectly fine for our use case.

There are no big issues with Go’s GC or scheduler in the kubelet process, so we haven’t analyzed this further.

We observe a 30+% branch misprediction rate in the kubelet process. Further analysis showed that this is system-wide: this cheap ARM processor has horrible branch prediction.
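The misprediction rate is simply branch misses divided by branches. The post does not say which tool produced the number, so the sketch below assumes counters collected with `perf stat -x, -e branches,branch-misses`, whose CSV rows start with the counter value, the unit and the event name. The sample figures in the usage note are invented to land on 31%.

```python
def parse_perf_csv(text):
    """Map event name -> count from `perf stat -x,` style CSV output."""
    counts = {}
    for line in text.splitlines():
        fields = line.split(",")
        if len(fields) < 3 or not fields[0].strip().isdigit():
            continue  # skip blank lines, comments and "<not counted>" rows
        counts[fields[2]] = int(fields[0])
    return counts


def branch_miss_rate(counts):
    """Fraction of branches that were mispredicted."""
    return counts["branch-misses"] / counts["branches"]
```

For example, 310000 misses out of 1000000 branches give a rate of 0.31.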

Improvements

We performed the following improvements:

Nixed Docker and replaced it with a CRI plugin, concretely containerd
Increased the housekeeping interval from 10s to 10m
Threw away flannel for the WireGuard CNI (that is, mostly native routing)

In the steady state, we now have ~15% CPU usage overhead in exchange for the monitoring benefits. That is still quite a bit, though livable. Maybe cri-o would have lower overhead, though containerd is pretty slim too. We’ll investigate how we can optimize the kubelet for even lower resource consumption by turning off unneeded features.

Summary

To summarize everything: is running Kubernetes on edge devices a sane choice? Maybe.

For us, so far so good: everything works, with considerable though livable overhead.

Even installing only the Prometheus node_exporter, for example, spikes the CPU on every scrape and slows everything to a crawl for a few hundred milliseconds.

This hardware is quite underpowered, and its bad branch prediction makes any software running on it perform worse than on comparable armv8 or x86_64 hardware.

In the future we’ll try to optimize things even further, hopefully reducing the kubelet CPU overhead to a more reasonable percentage. We’ve tried Rancher’s k3s without a big difference (actually worse performance, since we couldn’t change the housekeeping interval).

There’s also the KubeEdge project, which looks promising for Kubernetes on IoT.

References

Originally published at https://krakensystems.co/blog/2019/kubernets-on-arm