Proposing an enhancement to Kubernetes: The story of a KEP

Kubernetes is a big project — it has a wide range of users and a large number of people working on it at any given time. For simple bug-fixes and small changes, issues/pull requests are great. However, when trying to convey a much larger change, where a lot of different people may need to be involved, you need something more. This is where Kubernetes Enhancement Proposals (KEPs) come in. In this blog post I’ll talk about my experience, as a relative community outsider, trying to contribute one.

Background

Sidecars are a relatively common design pattern when running applications in pods.

We hit problems working on our Vault sidecar container when attempting to use it as part of a Job: when the primary container process completed, the sidecar wouldn’t recognise it should exit and the Job would never complete. It turned out this was a fairly generic problem. Imagine you have a Job with two containers: one is doing a task (e.g. a database migration) and the other is doing something to assist (e.g. a MySQL proxy). When the container doing the task has completed, the other container doesn’t know to exit, so the Job will never finish.
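
To make the problem concrete, here is a minimal Python sketch of one generic way a sidecar could learn that it should exit: the task container writes a marker file to a volume shared by both containers, and the sidecar polls for it. This is an illustration under assumptions, not necessarily how any real sidecar handles it; the function name `wait_for_marker`, the marker path and the polling interval are all hypothetical.

```python
import os
import time


def wait_for_marker(path, poll_seconds=1.0, timeout_seconds=None):
    """Block until the file at `path` exists.

    Returns True once the marker is seen, or False if `timeout_seconds`
    elapses first. With no timeout, it waits indefinitely.
    The marker-file convention is an assumption made for this sketch.
    """
    deadline = None
    if timeout_seconds is not None:
        deadline = time.monotonic() + timeout_seconds

    while not os.path.exists(path):
        if deadline is not None and time.monotonic() >= deadline:
            return False
        time.sleep(poll_seconds)
    return True
```

A sidecar’s entrypoint could call something like `wait_for_marker("/shared/done")` in a background thread and shut itself down once it returns, while the task container would finish by creating that file. It works, but every sidecar has to implement the convention separately.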

There was a longstanding Kubernetes issue for situations like this, but no progress had been made on it. I built a workaround into our Vault sidecar container but thought it would be much nicer to have a solution that would work for all sidecars. I attended KubeCon Copenhagen, went to the SIG-Apps session (they seemed like the people to talk to) and asked if they had any plans to address it. Unfortunately they said no, but they’d be happy for me to have a go at solving it. I asked how I should go about starting this and they told me to raise a KEP, so I did.

Making the KEP

I decided to fill out the KEP template with a straw man proposal, submit it, capture some feedback and hope that this would start a conversation around the idea.

The pull request was opened on May 14th 2018. I got a few comments but I wasn’t really sure how to progress it further. After a while I reached out to SIG-Apps and they suggested I present the proposal at one of their weekly meetings. I was a bit nervous about presenting my idea to a group of experts, but they were all very welcoming. I talked about my proposal, what I was trying to achieve and why. The general consensus was that it was worth doing. However, something like this would require Kubelet-level changes, so I would have to get another SIG, SIG-Node, to agree to it as well.

I reached out to SIG-Node to talk at one of their weekly meetings. However, this didn’t go so well and I felt a little bit discouraged after talking to them. The discussion went on for far too long, and there was a lot of confusion about what exactly was being proposed and whether the initial problem I was trying to solve was worth the effort. I think a lot of this was fair, but I didn’t feel like we really got anywhere by having the discussion. I think my poor communication of the idea (I was a bit nervous), combined with their being less familiar with the problem to start with, led to a somewhat unproductive session. SIG-Node also came at the problem from a very different perspective than SIG-Apps: the former focus on how to run workloads on the nodes themselves, while the latter are more interested in how a user can express their workloads. In the end, though, this different take on the problem was useful and gave me a better idea of how to progress the proposal, so I went back to the drawing board and suggested a number of ways forward.

Evolving the KEP

Activity on my KEP started to languish and it didn’t look like the proposal was going anywhere. It was at this point that I realised that increasing the scope of the proposal was actually the way to get it moving again. This realisation was prompted by someone from Istio commenting. They were having issues with the istio-proxy sidecar: they wanted it to start up before other containers and exit after them. Now this wasn’t exactly the same as what I had been trying to solve but it had definite overlap. I was quite worried about feature creep but as someone pointed out, sometimes feature creep is a good thing if it allows you to solve multiple problems with one solution. I think this was a pretty valuable lesson and something I tried to keep in mind going forward. There were occasions where people would suggest expanding the scope in other ways but I felt they didn’t actually complement the current proposal and were instead just solving a different problem with another solution. Learning when to push back against ideas and when to incorporate them to strengthen the proposal was a skill I had to develop.

After expanding the scope, the discussion gathered momentum, a few people got quite heavily involved and we got to a point where the proposal was relatively solid. The main outstanding issue was how we implement it in the API: there are countless ways you could do this, all of which would work, so how do you know what the right way is? We had defined the problem, we had defined the solution, we just couldn’t decide the mechanism for actually using the solution. At the time of writing this is where we’re at: the API implementation is still the thing we’re struggling to decide on, but hopefully we’ll make some progress on this soon.

Conclusion

Proposing a KEP has been an interesting learning experience. Some things I took away from it are:

  • Knowing when to incorporate someone’s suggestion and when to politely reject it is not easy.
  • Sometimes you just have to pester people on Slack (politely) if you want to get something done.
  • Defining the problem and coming up with a solution isn’t always the hard part. The hard part is working out how to implement the solution.
  • It often just takes a few people who are also really interested in seeing it through to get a proposal moving, instead of a lot of people who just have a passing interest.

Some feedback for the KEP process:

  • Multi-SIG KEPs are hard to coordinate. Talking to one SIG in its meeting and then going to another SIG’s meeting to talk about the same thing was not as productive as it could’ve been. You end up playing messenger between them if they disagree. I feel it would’ve been easier if there were some kind of KEP-related call where we could’ve all discussed it together.
  • It can be quite unclear who has the final say on things — it often felt like it was down to me as the author to make the final decision.

I shall continue to try to progress this proposal. Feel free to follow along or help out over at Kubernetes/enhancements. I’d also like to give a big thanks to all the people who commented on the PR and helped to get it this far. The Kubernetes community really is amazing and I could tell that everyone who got involved just wanted what’s best for all users.


We’re looking for more people to join uSwitch’s Cloud Infrastructure team, where we do stuff like this, create new tools and run Kubernetes clusters. Our careers page has more information on becoming a Platform Engineer at uSwitch.