An analysis of the Kubernetes codebase

The Kubernetes community is gathering in Seattle this week for the biggest KubeCon + CloudNativeCon ever. With an estimated 8,000 attendees and a very impressive list of sponsors, it seems obvious that the Kubernetes project has moved beyond the hype to widespread enterprise adoption.

To confirm this assumption and identify emerging trends, my colleague Francesc Campoy and I decided to use source{d} Engine to retrieve and analyze all the Kubernetes git repositories through SQL queries.

The results of the analysis are captured in the slides and commentary below:

Commit nature and velocity show signs of maturity

While the number of lines of code continues to grow towards the 2 million mark, the commit velocity has been decreasing since March 2018, which suggests that the project has reached a higher level of maturity and stability.

# of commits per month in the github.com/kubernetes organization
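As a rough illustration of the kind of query behind a commits-per-month chart, here is a minimal sketch using Python's built-in sqlite3 module. The `commits` table, its column names and the sample rows are assumptions made for this example (loosely inspired by how a git history might be exposed as a SQL table); they are not the actual queries or data used in the analysis, and the SQL dialect of source{d} Engine may differ from SQLite's.

```python
import sqlite3

# Hypothetical sample rows: (repository_id, commit_hash, commit_author_when).
# These values are invented for illustration only.
SAMPLE_COMMITS = [
    ("kubernetes", "a1", "2018-03-02 10:00:00"),
    ("kubernetes", "a2", "2018-03-15 12:30:00"),
    ("test-infra", "b1", "2018-03-20 09:00:00"),
    ("kubernetes", "a3", "2018-04-01 08:00:00"),
]


def commits_per_month(rows):
    """Return a list of (YYYY-MM, commit count) tuples, ordered by month."""
    conn = sqlite3.connect(":memory:")
    try:
        # Assumed schema: one row per commit.
        conn.execute(
            "CREATE TABLE commits ("
            "repository_id TEXT, commit_hash TEXT, commit_author_when TEXT)"
        )
        conn.executemany("INSERT INTO commits VALUES (?, ?, ?)", rows)
        # Bucket each commit by the year and month of its author date.
        query = """
            SELECT strftime('%Y-%m', commit_author_when) AS month,
                   COUNT(*) AS n
            FROM commits
            GROUP BY month
            ORDER BY month
        """
        return conn.execute(query).fetchall()
    finally:
        conn.close()


if __name__ == "__main__":
    for month, n in commits_per_month(SAMPLE_COMMITS):
        print(month, n)
```

Grouping by a formatted date string keeps the query short; adding `repository_id` to the `SELECT` and `GROUP BY` clauses would give a per-repository breakdown like the one in the next charts.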

Most of the contributions are now directed to upgrades and tools for Kubernetes testing (test-infra) as well as the cluster federation, Machine Learning / HPC workload management and AWS ALB Ingress controller Special Interest Groups.

# of commits per year and repository in the github.com/kubernetes organization
# of commits per year and repository in the github.com/kubernetes-sigs and github.com/kubernetes-incubator organizations

Languages and API evolution confirm maturity and intricacy

The number of API endpoints exported in the Kubernetes codebase is stabilizing at around 16,000, which points to both maturity and complexity. The decrease between some releases (during 2017) might reveal a lack of backward compatibility.

Number of public APIs over time

At its outset in 2014, the Kubernetes project used 15 languages, a number that quickly increased to 35 by the beginning of 2017. Given that Kubernetes came from Google, it’s not surprising to see that Go is by far the dominant language, followed by Python, YAML and Markdown. The analysis shows that other languages such as Gradle and Lua have been dropped, while some others like Assembly, SQL and Java have made a comeback.

Popularity of programming languages in the Kubernetes codebase over time

Contributions point to a healthy open source project and community

Even though Google is the main contributor to Kubernetes by number of commits, individuals (those with @gmail.com and @github.com emails) account for a similar number. The exact number of organizations contributing is harder to measure, but the analysis shows that people from more than 600 different email domains have contributed, including competing cloud providers such as Red Hat, Huawei and Microsoft.

Number of contributions per email domain and most common email domains (size ~ log of #)
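Counting contributions per email domain can be sketched in a few lines of Python. The helper name `domain_counts` and the sample addresses are hypothetical and only illustrate the idea; the original analysis was run as SQL queries, and its exact normalization rules are not known here.

```python
from collections import Counter


def domain_counts(emails):
    """Count author emails per domain, ignoring malformed addresses."""
    counts = Counter()
    for email in emails:
        # Split on the last "@"; sep is empty when there is no "@" at all.
        _, sep, domain = email.rpartition("@")
        if sep and domain:
            # Lowercase so "Gmail.com" and "gmail.com" land in one bucket.
            counts[domain.lower()] += 1
    return counts


if __name__ == "__main__":
    # Invented sample addresses, one per commit.
    sample = ["a@google.com", "b@Gmail.com", "c@gmail.com", "broken"]
    for domain, n in domain_counts(sample).most_common():
        print(domain, n)
```

A domain is only a proxy for an organization: personal addresses hide the employer, and one company may use several domains, which is why the number of contributing organizations is harder to pin down than the number of domains.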

A big thank you to Kris Nova and Joe Beda for reviewing this analysis and providing some feedback.

Companies interested in getting their own code base analyzed can request an analysis here.

Learn more about source{d} Engine: