You need SRE skills to thrive in a serverless world — Kelsey Hightower

Now that you have a system like Kubernetes, or even Lambda for that matter — who watches the watcher?

Forrest Brazeal
A Cloud Guru
Jan 24, 2018


Welcome to “Serverless Superheroes”!
In this space, I chat with the toolmakers, innovators, and developers who are navigating the brave new world of “serverless” cloud applications.

For this edition, I chatted with Kelsey Hightower, a Developer Advocate for the Google Cloud Platform and co-author of “Kubernetes: Up and Running”. The following interview has been edited and condensed for clarity.

Forrest Brazeal: Kelsey, you’re a leading expert on Kubernetes, Google’s open source container orchestration system. How did you get involved with Kubernetes, and what got you excited about it initially?

Kelsey Hightower: A little background may help people understand. I’ve had the job of system administrator in finance, startups, and web hosting; I’ve even worked in a few datacenters. I know the pain of managing infrastructure, and I’ve been through all the hype cycles — “Write once, run anywhere”, virtualization, configuration management, all these things — way before containers.

From sysadmin I transitioned to developer, so I also know what it’s like trying to get your code out the door and dealing with the friction between developers and people in operations.

Fast forward a few years, I’m working at a container company called CoreOS, where I watched Google launch Kubernetes at DockerCon 2014. They had a press release and a GitHub repo, but you couldn’t really use the project because there wasn’t any documentation.

So I rolled up my sleeves and dug through the code base, and I wrote the very first “here’s how you install Kubernetes on your laptop and play with it” post. The post went to number one on Hacker News — the Google press release was at number two — and let’s just say CoreOS found its place in the Kubernetes community.

But CoreOS was not quite committed to Kubernetes at that time, so I started contributing from home. At night I was putting in bug fixes, refactoring tests, and organizing code. I was a contributor to Kubernetes before I was on stage talking about Kubernetes.

I became excited about Kubernetes because I knew the problems from my past that it solves. Like, wow. If I’d had this ten years ago, life would have been better. So I obviously knew the potential it had to ease people’s jobs right now. Not in the future — right now.

So how exactly does Kubernetes make people’s lives easier right now?

It’s the consistency. Kubernetes, if you personify it, is doing the things that the very best system administrator would do. If I give you a piece of code, it becomes very easy to deploy it to the environment it needs to be in.

Especially three or four years ago, before we had volumes and all this networking stuff, at the core it was very obvious where Kubernetes’ value was: deploying and managing containers.

You have some containers and a bunch of machines. Using Kubernetes you could declare, with a very small snippet of configuration, “Run this container”, and then just watch it run. If you destroyed a machine, Kubernetes would “move” the container onto a different machine. That alone is more advanced than what most people are doing, even today.
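
To make that concrete, here is a minimal sketch of the kind of declaration Kelsey is describing, written with the official Kubernetes Python client. The deployment name, image, and replica count are illustrative, and the snippet assumes the client library is installed and a working kubeconfig is available.

```python
# A hypothetical "run this container" declaration using the Kubernetes Python
# client (pip install kubernetes). Names like "hello-web" are made up.
from kubernetes import client, config

config.load_kube_config()  # read cluster credentials from ~/.kube/config

container = client.V1Container(name="hello-web", image="nginx")
template = client.V1PodTemplateSpec(
    metadata=client.V1ObjectMeta(labels={"app": "hello-web"}),
    spec=client.V1PodSpec(containers=[container]),
)
spec = client.V1DeploymentSpec(
    replicas=2,  # keep two copies running at all times
    selector=client.V1LabelSelector(match_labels={"app": "hello-web"}),
    template=template,
)
deployment = client.V1Deployment(
    api_version="apps/v1",
    kind="Deployment",
    metadata=client.V1ObjectMeta(name="hello-web"),
    spec=spec,
)

# Hand the declaration to the cluster; from here on, Kubernetes owns the "how".
client.AppsV1Api().create_namespaced_deployment(namespace="default", body=deployment)
```

If a node running one of those replicas disappears, the Deployment controller notices and starts a replacement elsewhere, which is the “move” Kelsey describes.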

So now, three or four years later, Kubernetes is eating the container world. But there’s also been a parallel emergence of “serverless” technologies, functions as a service, which address some of the same infrastructure management problems you just mentioned. What’s your take on FaaS as a concept?

The first time I saw the “FaaS” idea was back in the CGI days. You write some PHP code, put it behind Apache, and Apache calls it when an HTTP request comes in. The limitations back then: no scaling, no concept of a cloud, and no clean API to do this for every language.

Now we’re seeing Amazon take another crack at this with Lambda. They say: “Now that we have the cloud, and some idea of what this elastic compute environment can do, we can take your source code and run it with the rest of the cloud stuff — authentication, databases, and API gateways.”

Another difference between Lambda and CGI scripts is the cost model. Do you see per-invocation billing as a game-changer?

I can understand the cost model of “pay per invocation”. I see that almost as an on-ramp to the bigger parts of the cloud.

I personally look at FaaS as a programming environment for the cloud. As a cloud provider I have events, I have message queues, all these services — that’s the value I provide. I’m not going to charge you for the SDK to leverage those services. And that’s why I don’t want to create a huge barrier to entry in terms of cost on the functions.

You can actually run functions on top of a Kubernetes cluster with Kubeless. Does that strike you as the best of both worlds?

If you have a VM-based environment, you need to do a lot of work before you can run an app. You need configuration management and all this other stuff before you can deploy.

Kubernetes raises the bar with better abstractions — you just give it a container, declare how it should run, and you’re good to go. But Kubernetes is still missing some critical workflows, like the ones provided by serverless platforms. If you want those workflows, you can layer them on top of Kubernetes by installing something like Kubeless.

Now you have the ability to upload your snippet of code, also known as a function, and run it. Remember, all these FaaS offerings are going to build a container under the covers. But with Kubernetes, it’s open source; you have full visibility and control.
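
As a rough illustration of that upload-a-snippet workflow, here is the shape of a function the Kubeless Python runtime expects, as I understand it from its documentation. The file name, handler name, and deploy command are illustrative, not a prescription.

```python
# hello.py -- a snippet-as-function, Kubeless-style. The (event, context)
# signature follows the Kubeless Python runtime; everything else is made up.
def hello(event, context):
    # Kubeless delivers the trigger payload under event["data"].
    data = event.get("data") or "world"
    return "Hello, {}!".format(data)

# Deployed with something along the lines of:
#   kubeless function deploy hello --runtime python3.6 \
#     --from-file hello.py --handler hello.hello
# Under the covers, Kubeless builds a container for the snippet and runs it as
# ordinary Kubernetes objects you can inspect with kubectl.
```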

FaaS provides a nice abstraction for certain use cases, but it’s not for everyone. Kubernetes represents the highest level of abstraction that most people can understand and leverage today, especially coming from VMs. And then serverless is a step above that.

If you have a Kubernetes cluster, and we’ve seen strong adoption of Kubernetes over the years, wouldn’t it be nice to have a Lambda-like workflow that runs on top of it? All the necessary abstractions are there, which puts Kubernetes in position to become the foundation people use to create new workflows. Even serverless ones.

Long term, does Kubernetes continue to be an abstraction people are comfortable with, or will more people shift up the stack and build at the FaaS level?

People should use the right abstraction for what they’re doing. For example, when I write a Google Assistant integration, the only thing I want to do is run a snippet of code that responds to the user’s query. I can run that on Google Cloud Functions. I don’t need a container, I don’t need data volumes or storage, I just want to run it as a function. It’s the perfect abstraction for that use case.
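
For a sense of what that snippet can look like, here is a hypothetical HTTP-triggered Cloud Function in Python. In GCP’s Python runtime the function receives a Flask request object; the field names below follow the Dialogflow webhook format, while the function name and reply text are made up.

```python
import json

def assistant_webhook(request):
    """Answer a user's query: no container, no volumes, just this snippet."""
    payload = request.get_json(silent=True) or {}
    query = payload.get("queryResult", {}).get("queryText", "")
    reply = "You said: {}".format(query) if query else "I didn't catch that."
    # An HTTP function can return (body, status, headers), Flask-style.
    return (
        json.dumps({"fulfillmentText": reply}),
        200,
        {"Content-Type": "application/json"},
    )
```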

Now, if I want to do some machine learning, playing with TensorFlow, then I want to give my code a whole container that can mount a data volume, use a GPU, and that’s not necessarily the sweet spot for serverless platforms today. So I think what people should do going forward is use the highest level of abstraction that will work.

We’re also getting farther away from the set of skills that is traditionally associated with DevOps. When I interviewed Simon Wardley for this series, he talked about DevOps being “the new legacy”. Does that rhyme with your experience?

I agree with Simon. I think what happens in technology is, any time we learn a discipline, we give it a name. “I’m a system administrator. I do DevOps. I do SRE”. And all we’re really saying is: at this point in time, we’re going to checkpoint our discipline and give it a name.

What’s supposed to happen is, after we give a name to a discipline, it should roll into the technology. There’s no reason you should be doing DevOps for 40 years. Once we get the practice right, it should turn into technology.

Kubernetes was born from DevOps. Even though it stems from Google, it’s not necessarily a clone of what we do internally at Google; it actually plays well with what’s going on in the Docker community and with what people coming from Puppet and Chef are trying to achieve.

We’re saying: if you use configuration management for deploying applications, Kubernetes makes life easier by taking what we’ve learned as a collective community and making those lessons the defaults. We don’t need to “program” the best practices anymore.

If a node fails, wearing your DevOps hat, what would you do? You would automate the service to fail over to another machine. That’s built into Kubernetes. We’ve also learned through DevOps practices that you should have things like centralized logs and monitoring.

You know what, Kubernetes has that out of the box. So we’ve rolled a lot of these DevOps practices right into the technology. And when I talk about “we”, I’m talking about all the contributors to the Kubernetes project.

If a lot of these classic DevOps disciplines have been automated now, what’s the next, higher-order set of skills that people should be mastering?

Site reliability engineering. Now that you have a system like Kubernetes, or even Lambda for that matter — who watches the watcher? Even if the cloud provider is doing everything, I need to double check — is my latency where my customers need it to be? The provider’s going to do the best they can to give me a great service, but if my customers don’t agree, then I have a problem.

Maybe I’m using the wrong library, or making database calls inefficiently. This is where your SRE team adds a lot of value. Who cares about deployments? That’s a done deal. But once it’s deployed, how do I tune what my customers experience?

And this is where we can free ourselves up and get to the items in our backlog. Every DevOps person has a backlog. “We’re gonna get centralized monitoring one day. We’re gonna get CI/CD one day.” Well, now you can start knocking that stuff out.

What do you see the Google Cloud Platform doing with serverless and/or Kubernetes that really excites you?

We believe we’ve been doing serverless for a long time. App Engine definitely represents what even AWS has been saying since the last re:Invent — serverless is not just FaaS. We want to give people the experience of pain-free usage, and we believe that App Engine has been doing that for almost 8 years now. Or BigQuery — it’s fully managed, you just run your queries and decide how fast you want it to go.

On the compute side we have Cloud Functions, a FaaS offering we are extending to a broader developer community by adding support for more programming languages and features that let people focus on code, not infrastructure.

We also have end-to-end solutions that people don’t necessarily know about: we’ve integrated Cloud Functions into Dialogflow. If you want to build your own Alexa skills or Actions for the Google Assistant, you can use Dialogflow. When building those skills or actions, all the ML training is done in the browser, leveraging GCP in the background.

That’s the idea and the value of serverless. You’re not provisioning infrastructure, but you’re still doing the compute. So we’re already taking this idea of serverless and bringing it really close to the value line.

Some people reading this article will be feeling a little discouraged right about now. They may be thinking: “Kubernetes and serverless are so cool, but I’m not getting to use any of this stuff in my daily job. Is it too late for me to get myself and my organization on board?” Can you offer any encouragement or advice for these folks?

I would say: “You already have the necessary skills to start adopting these new technologies.” Those skills came from the blood, sweat, and tears of managing the platforms that came before Kubernetes.

You know your organization better than anyone else, so you’ll always be valuable no matter how the technology landscape changes. So celebrate that, take pride in that — nobody can take that away from you, not serverless, not the cloud provider.

But you have to take action and show value. And this is where I think people make mistakes. They go in and want to have a Kubernetes conversation, or a serverless conversation. What they should do is ask: “What does my team or organization want to do?”

Maybe you have a developer complaining about having to spin up infrastructure before they can get something done. Well, tell them: “Try this. Check in your code and watch it deploy.” If they see value in that, they’ll get behind you and support that experience. And then one day you can tell them it’s Kubernetes, or Lambda, or something else.

We have to get more business savvy and talk about why we want to build a different thing, rather than walking around with a Kubernetes shirt, like: “Hey, I went to a conference and I think we should do Kubernetes now.”

Slow down! You know exactly where the business problems are. Solve the problem, and then talk about Kubernetes.


Forrest Brazeal
A Cloud Guru

AWS Serverless Hero. Cloud architect @Trek10inc, words and cartoons @acloudguru. Previously cloud infrastructure @Infor. Opinions here are mine.