Reflections on a trip to KubeCon + CloudNativeCon 2019

Alex Moss
John Lewis Partnership Software Engineering
May 31, 2019
Photo by Enes on Unsplash

I was lucky enough to make the trip over to Barcelona last week for KubeCon + CloudNativeCon 2019. We are big users of Kubernetes here at JL&P, and as we increasingly push the boundaries of what it can do out of the box, it was very useful to learn more about what others are doing within this ecosystem. At the very least, we would hopefully pick up some tricks — or at least share some challenges! — and see what is coming down the pipeline that might help us out.

Any Stand-out Themes?

One personal theme I reflect on, in contrast to some of the “big vendor” conferences I’ve been to in the past, is how nice it is to actually hear from real customers and users of this technology. Sure, I went to a few talks from the big tech companies, but even then they are presenting very experience-focused content; their speakers are focusing on how they’re engaging with the community as a whole, and so on. Not at all sales-pitch-y. This is, I think, pretty brilliant — it meant the conference overall felt more valuable.

It’s clear that the community around these technologies, with Kubernetes at its foundation, is vibrant. This is fantastic for the future of these sorts of services, and it’s also pretty clear to me that this is, at least in part, because the people behind it are working very hard to promote this attitude.

From a technology perspective, people really are doing the hybrid & multi-cloud thing. I personally see this as something you should only do if you absolutely have to (if lock-in scares you, perhaps think about your exit strategy rather than missing out on the benefits of backing just one provider?). That said, I appreciate that for some it isn’t an option — compliance, for example. So if we did have to do this at some point in the future, at least other folks will have trodden down the path a bit.

The expo hall — before it got even busier!

As I work with — and am hopefully seen as someone who helps set the direction for (!) — a set of “platform” teams, it was very reassuring to hear from several other end-users of this tech in the enterprise, all approaching things in more or less the same way that we have chosen to.

This was definitely a community of platform builders using these tools as the raw blocks to build even better platforms, tailored for their own users (software developers).

Many of these organisations also seem to have started from a place similar to our own (“I need to go faster”, “I need to put more powerful tools in the hands of our engineers”) and they continue to face the same sorts of challenges we do (“how do I choose which option to go with?”, “how much do I take on / get others to do / leave to product engineers?”). We are clearly not alone in the questions we’re facing.

And of course, the onus is then on us to continue solving the next challenge, and the challenge after that. We should also be sure to stay on top of the best ways to build platforms, so this doesn’t become stale and ineffective — a trap we’ve fallen into in the past.

This Kubernetes Thing Might Catch On

It was really nice to hear so much positivity about Kubernetes — the project that started this whole thing and which turned 5 years old at roughly this time. There were a few nods to this around the conference centre.

A tribute to Kubernetes being 5 years old … with donuts!

During one of the keynotes, Janet Kuo did a very nice segment on the Kubernetes project in CNCF itself. I liked her timeline narrative:

  • 2003 — Borg was created. No, not the Star Trek one (which first appeared, staggeringly enough, in May 1989!); this is Google’s collective instead
  • 2006 — Linux cgroups arrived
  • 2008 — Linux adopted the “container” naming convention
  • 2009 — Omega — the next-gen Borg at Google, which heavily influences Kubernetes design
  • 2013 — Docker. Hooray for Docker, it’s the best
  • 2013 — Project 7 within Google (open sourcing container orchestrator — it comes from Seven of Nine from Star Trek of course — the Friendly Borg)
  • 2014 — Kubernetes announced at DockerCon. There was much rejoicing
  • 2015 — v1.0 of Kubernetes, CNCF formed, the first KubeCon at the end of the year
  • 2016 — Kubernetes SIGs came into being, KubeCon EU, industry adoption, in Production, at scale. This thing is a big deal
  • 2017 — CRDs (Custom Resource Definitions) introduced, so you can build your own Kubernetes resources on top (see the sketch just after this list). Kubernetes becomes a platform for building platforms (if Kelsey Hightower had a pound for every time that was mentioned at this conference …). Cloud adoption drives Kubernetes to become a de facto standard, and many new native APIs are introduced
  • 2018 — Graduated from incubation in the CNCF. This is seen as a “stable in Production” thing
  • Today — one of the highest-velocity open source projects. It is number 2 for pull requests on GitHub (behind … Linux) and number 4 for issues and authors (out of tens of millions)
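
To make that 2017 point a little more concrete, here is a minimal sketch of a CustomResourceDefinition. Everything about it (the example.com group, the Widget kind) is invented for illustration, and it uses the apiextensions.k8s.io/v1beta1 API that was current around Kubernetes 1.14.

```yaml
# A hypothetical CRD teaching the cluster a new "Widget" resource type.
# The group and names are made up purely for illustration.
apiVersion: apiextensions.k8s.io/v1beta1
kind: CustomResourceDefinition
metadata:
  name: widgets.example.com   # must be <plural>.<group>
spec:
  group: example.com
  scope: Namespaced
  names:
    plural: widgets
    singular: widget
    kind: Widget
  versions:
    - name: v1
      served: true    # exposed through the API server
      storage: true   # the version persisted in etcd
```

Once that is applied, kubectl get widgets behaves like any built-in resource, and a controller watching those objects is what turns the “platform for building platforms” idea into something real.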

Kubernetes is now at v1.14, its most stable and mature release yet. You can even run it on Windows nodes — now that is crazy 😉

It was also great to be reminded of the Kubernetes Comic Books — I didn’t realise there were two of them!

Right … that’s all well and good, but what about some actual technologies of interest?

Service Meshes

Service meshes have been of interest to me since I first learnt about the concept in more detail at Google’s Next conference last year. Up until now I’d always assumed that Istio would be our answer to that — we run on Google Kubernetes Engine after all, and Google are the main contributor to Istio — but I used the opportunity at KubeCon to get a more rounded view on this topic.

Firstly, there was the introduction of SMI (Service Mesh Interface). The announcement was brief, but welcome — having these things start to coalesce around a standard can only be a good thing.
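
To give a flavour of what SMI standardises, here is a sketch of one of its resources, a TrafficSplit, which shifts a proportion of traffic between versions of a service regardless of which mesh is doing the routing underneath. The service names and weights below are invented, and the spec was only at v1alpha1 at the time, so treat this as illustrative rather than definitive.

```yaml
# Hypothetical SMI TrafficSplit: send roughly 10% of "checkout" traffic
# to a v2 backend and the rest to v1. All names here are made up.
apiVersion: split.smi-spec.io/v1alpha1
kind: TrafficSplit
metadata:
  name: checkout-canary
spec:
  service: checkout        # the root service that clients call
  backends:
    - service: checkout-v1
      weight: 900m         # v1alpha1 expresses weights as quantities
    - service: checkout-v2
      weight: 100m
```

The appeal is that the same resource should work whether Istio, Linkerd or something else entirely is providing the mesh underneath.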

I also listened to 1&1 talk about some of the challenges they’d experienced with Istio, which mirrored my own (I’ve had some real fun and games trying to get the GKE Addon to play nicely with PodSecurityPolicy — I think we’ll wait for the Operator!).

Finally, I took in a talk on an alternative mesh technology, Linkerd, which I liked the sound of and am tempted to try. That said, my suspicion is that we’ll still end up on Istio in the end, in large part because of its richer feature set and because our concerns will most likely have been addressed by the time we get round to implementing it.

If you’re interested, in my opinion there’s a really nicely put together comparison piece here.

Observability

I also deliberately took in a number of talks on observability tools — in large part because it’s an area of focus for my team at the moment.

Photo by Ran Berkovich on Unsplash

We actually have three different angles on this:

1. Prometheus at Scale

We have an emerging scaling problem with Prometheus. At the moment our Promethei are relatively small, but they are still by far the biggest pods we run on the newer version of our platform, and they are occasionally perturbed by changes in our tenants’ usage. Their standalone nature feels wasteful and limits us to holding data for only (I think) 45 days. Teams want to keep it for longer, which is not unreasonable in my view — we need to do better.

I was therefore particularly interested to learn more about M3, Cortex and a little about Thanos as open source solutions to this particular challenge. Whatever we end up picking will probably end up making an interesting blog post in its own right!
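
For what it’s worth, Cortex and M3 both ingest metrics through Prometheus’s remote_write feature (Thanos takes a slightly different, sidecar-based approach), so each Prometheus keeps a modest local window and streams samples off to a shared long-term store. A rough sketch of the prometheus.yml fragment involved, with an entirely made-up endpoint:

```yaml
# Hypothetical prometheus.yml fragment: forward samples to a long-term
# store such as Cortex or M3. The URL below is invented for illustration.
remote_write:
  - url: http://metrics-store.monitoring.svc.cluster.local/api/prom/push
    queue_config:
      max_samples_per_send: 1000   # batch size; tune to suit the backend
```

Local retention stays governed by Prometheus’s --storage.tsdb.retention.time flag, with anything longer-lived becoming the remote store’s problem rather than each pod’s.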

2. The Logging Problem

Logs are useful for debugging problems. They’re also, in my view, a trap for metrics to fall into. We have been doing a really good job lately of avoiding that trap, but that doesn’t mean we can continue to get away with providing our engineers with sub-optimal tooling for log analysis.

We already know about the Elastic Stack, but I went to a talk on Loki just to see what that was about and I liked what I heard — “built for engineers to solve problems” in particular — but I think it’s just too early days for us to gamble on it. Probably. Maybe. We will see 😄

It was also interesting to observe how crowded the booths for Elastic, Loki, Logz.io and DataDog were. Or maybe folks just wanted the stickers … 😉

3. Distributed Tracing

We are pretty close to needing to provide some sort of distributed tracing tool, in my view. We are doing microservice-y things and we are at the point now where those microservices are a little less customer → microservice → legacy and a bit more customer → microservice → another microservice → oh and maybe another microservice too → eventually legacy.

So the news that the two main open source approaches to instrumenting that — OpenTracing and OpenCensus — are merging (in a backwards-compatible way, as OpenTelemetry) is very pleasing indeed.

I’m hoping we’ll have an excuse to write more about this topic soon once we’ve gotten more hands-on with some of the tooling in this area. It looks very powerful.

A nod to the wonderful TV Show the IT Crowd

Wrap Up

And that wasn’t all either — I went to some other really interesting specialist tracks, including a couple of sessions on Open Policy Agent & Gatekeeper (which I think we should use), Multi-Cluster Ingress, dynamic pod auto-scaling, a Custom Deployment Controller, plus some weird and wonderful Kubernetes failure scenarios.

All in all, it was a really fantastic conference with loads and loads of technical depth across an incredibly diverse platform ecosystem, supported by an active and enthusiastic community.

Party time at KubeCon!

More Detail on Some of These Topics

If you made it this far into this article, good job! If you’re interested in a bit of a deeper look at some of these areas, I have written a few more words on my own blog.


Alex Moss
John Lewis Partnership Software Engineering

Engineering Lead for the John Lewis & Partners Digital Cloud Platform