Go Experience Report: Generics in Kubernetes

At one point recently, Kubernetes was the largest open source Go codebase in existence. It is still massive, and that’s a good thing because Kubernetes is the future of cloud computing.

Kubernetes is a huge Go codebase. It’s used, stretched, and even abused the language in all sorts of ways. Credit Ashley McNamara and https://github.com/ashleymcnamara/gophers for this image. Credit Renee French for the original gopher concept and design.

I’ve probably read about 30% of the core codebase. I’ve also extensively used the k8s.io/kubernetes package as a Go SDK for talking to the Kubernetes API.

I have enough experience in that particular area to narrow down this experience report to a single type: runtime.Object.

The Kubernetes “Type System”

runtime.Object permeates Kubernetes and related codebases. Just a few examples from that same runtime package:

runtime.Object is the cornerstone of the Kubernetes type system. And when I say “type system,” I really mean that the codebase has a sort of internal “type registry.”

The Kubernetes Type Registry

If you have an implementation of a runtime.Object and you want to do something with it, you have to register it first.

Kubernetes has an internal “type registry,” and it uses the registry to identify the concrete types that it gets passed. Only then will it operate on the type.

If you squint, you can see a faint glimmer of generics here!

Unfortunately, the type registration mechanism isn’t obvious. Generally, it’s done with side-effecting imports (from defaults_test.go):

_ "k8s.io/kubernetes/pkg/apis/apps/install"

That install package “registers” a new group, version, and kind (GVK) with one of the Kubernetes internal type registries. A GVK is a unique identifier for a Kubernetes type.
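
To make the mechanism concrete, here is a minimal sketch of a registry keyed by GVK. It is not the real Kubernetes code: GVK, Object, Deployment, Register, and New are all hypothetical stand-ins, and the init function only imitates what a side-effecting install import does.

```go
package main

import (
	"fmt"
	"reflect"
)

// GVK is a hypothetical stand-in for a group/version/kind identifier.
type GVK struct {
	Group, Version, Kind string
}

// Object is a hypothetical stand-in for runtime.Object.
type Object interface {
	GetKind() string
}

// Deployment is a toy type that the registry should know about.
type Deployment struct {
	Replicas int
}

func (d *Deployment) GetKind() string { return "Deployment" }

// registry maps each GVK to the concrete Go type behind it.
var registry = map[GVK]reflect.Type{}

// Register records the concrete type of obj under gvk.
// It assumes obj is a pointer to a struct.
func Register(gvk GVK, obj Object) {
	registry[gvk] = reflect.TypeOf(obj).Elem()
}

// New builds a fresh object for gvk, or fails if nothing was registered.
func New(gvk GVK) (Object, error) {
	t, ok := registry[gvk]
	if !ok {
		return nil, fmt.Errorf("no type registered for %v", gvk)
	}
	return reflect.New(t).Interface().(Object), nil
}

// init plays the role of the side-effecting "install" import:
// registration happens just because this package was loaded.
func init() {
	Register(GVK{Group: "apps", Version: "v1", Kind: "Deployment"}, &Deployment{})
}

func main() {
	obj, err := New(GVK{Group: "apps", Version: "v1", Kind: "Deployment"})
	fmt.Println(obj.GetKind(), err)

	// Nothing registered this GVK, so the lookup fails at runtime.
	_, err = New(GVK{Group: "apps", Version: "v1", Kind: "StatefulSet"})
	fmt.Println(err)
}
```

Both lookups compile equally well. Whether they succeed depends entirely on which registrations ran before the call.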

If you try to decode, convert, or otherwise operate on a runtime.Object, you’ll get an error unless its GVK has already been installed.

In short, Kubernetes has built a type naming & identification scheme and a type registry to store each GVK. It took me a long time to identify (pun intended!) this whole thing in the codebase; after I did, I was amazed and impressed.

The core team replaced a missing compile-time language feature (generics) with their own home-built runtime system. And given the tools at their disposal, they did a pretty good job.

Why This Isn’t Great

There are a few basic issues with this approach:

Additionally, since there’s a runtime-defined type registry, anyone writing Kubernetes core code needs to add manual type assertions and registry lookups anywhere that deals with runtime.Objects.

That extra code, in turn, means more tests in the core, and so forth.
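
A small sketch of what those manual type assertions look like in practice. The Object interface, the Pod and Service types, and the Fetch and PodName helpers are all hypothetical and only illustrate the pattern; none of them are taken from the Kubernetes codebase.

```go
package main

import (
	"errors"
	"fmt"
)

// Object is a hypothetical stand-in for runtime.Object.
type Object interface {
	GetKind() string
}

type Pod struct{ Name string }
type Service struct{ Port int }

func (p *Pod) GetKind() string     { return "Pod" }
func (s *Service) GetKind() string { return "Service" }

// Fetch returns only the Object interface, so the compiler no longer
// knows which concrete type the caller is holding.
func Fetch(kind string) (Object, error) {
	switch kind {
	case "Pod":
		return &Pod{Name: "web-0"}, nil
	case "Service":
		return &Service{Port: 80}, nil
	}
	return nil, errors.New("unknown kind: " + kind)
}

// PodName has to recover the concrete type with an assertion,
// and that assertion can only fail at runtime.
func PodName(obj Object) (string, error) {
	pod, ok := obj.(*Pod)
	if !ok {
		return "", fmt.Errorf("expected *Pod, got %T", obj)
	}
	return pod.Name, nil
}

func main() {
	obj, _ := Fetch("Service")
	// This compiles without complaint; the mistake only shows up when it runs.
	_, err := PodName(obj)
	fmt.Println(err)
}
```

Every function that accepts an Object and needs a specific type ends up with a check like the one in PodName, plus a test for the failure branch.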

When This Has Gone Wrong

You could replace this global registry with generics. I’ll resist the urge to invent yet another generics syntax for Go, but trust me, it’d be beautiful! Needless to say, if we didn’t have this runtime “installation” thing, we’d end up with less core code and less confusion for callers.
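
For a rough idea of the shape this could take, here is a sketch using the type parameters that later shipped in Go 1.18. Decode and Pod are hypothetical names, and plain encoding/json stands in for the Kubernetes serialization machinery, so treat it as an illustration of the idea and not as a proposal for the real API.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// Pod is a toy type used only for this example.
type Pod struct {
	Name string `json:"name"`
}

// Decode is a hypothetical helper. The caller names the concrete type T
// at compile time, so there is no registry lookup behind the call.
func Decode[T any](data []byte) (*T, error) {
	var out T
	if err := json.Unmarshal(data, &out); err != nil {
		return nil, err
	}
	return &out, nil
}

func main() {
	pod, err := Decode[Pod]([]byte(`{"name":"web-0"}`))
	if err != nil {
		fmt.Println("decode failed:", err)
		return
	}
	// pod is a *Pod here, so no type assertion is needed.
	fmt.Println(pod.Name) // prints "web-0"
}
```

This only covers the case where the caller already knows which type it wants. Decoding a payload whose kind is only discovered by reading it would still need some kind of runtime lookup.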

I have personally spent hours debugging applications that use the Kubernetes Go client SDK; particularly apps that interact with dynamic Kubernetes types like Custom Resource Definitions and generated Aggregated API types.

Generally, these applications crash with error messages like “no kind is registered for the type …” (see the source for notRegisteredErr for details).

While the error messages could be improved, the root cause is that the type system is built at runtime, so type checks — aside from the obvious interface implementation checks — must be done at runtime.

But What About Generated Code?

The question that transcends time!

I think generated code is super valuable in boilerplate situations like RPC clients and server stubs.

This isn’t a boilerplate situation. The code we’re talking about is at the core of Kubernetes.

But generated code could technically fix the issue, so let’s discuss that for a second. We could generate a concrete struct for each and every Kubernetes type.

Looking at the Kubernetes resource reference for v1.9, that means structs for at least 30 Kubernetes resources (by the way there are plenty more — I just counted the stable v1 ones). It also would make sense to remove a lot of the generalized functions and interfaces that do copies, conversions, etc…

And instead of that generalized code, we’d have as many as 30 × 29 = 870 pairwise conversion functions, 30 copy functions, and so on.

Way more code and less generalization. Sigh…

The resulting codebase would be compile-time type safe, but way bigger, and mostly generated. It would also be much harder to be productive in that kind of codebase.

In Conclusion

In my opinion, generics would really make this situation better. In most places where runtime.Object is used (particularly where it’s returned), type information is lost, and Kubernetes has an elaborate system to make up for it.

Type registries are novel, and kind of amazing to me, but they shouldn’t have to exist here.

I’m so looking forward to the day that they don’t have to!

Written by

Gopher, containerizer, and Kubernetes-er
