As I take stock of work I’ve been doing in the DevOps realm in the last 4 years or so, I would like to write down some of the skills I think would be good to have for a DevOps practitioner in A.D. 2019. I am referring here to technology-related skills. I’ll discuss ‘softer’ skills related to culture, collaboration, empathy, etc. — skills which are actually anything but ‘soft’ — in a different post.
I’ll group the various technologies by categories, in a bottom-up approach, with the top categories building off of the base offered by the bottom categories.
Maybe it used to be the case years and years ago that a DevOps practitioner (aka sysadmin in a long-gone jargon) didn’t need to know how to code. This is no longer the case.
I recommend being proficient with at least one solid programming language, and also with bash scripting. I personally started my programming career by using C and C++, so I have a soft spot for ‘systems’ languages. Today I like Golang a lot, and I also keep reading great things about Rust. My go-to language though for DevOps tasks is Python. I also do a fair amount of bash scripting. I don’t like JS/node but I use it at $WORK. One language I’d like to learn at some point is Clojure.
Cloud services and APIs
It should come as no surprise that in 2019 you need to know very well the services offered by at least one of the Big Three cloud providers: AWS, Microsoft Azure, or Google Cloud.
I was just telling a co-worker that today the equivalent of “nobody got fired for using IBM” is probably “nobody got fired for using AWS”. I’d start there if I were deciding which cloud services to learn. Although I am not a big believer in certifications, I found it very useful to study for the “AWS Certified SysOps Administrator - Associate” certification. I learned a lot, especially around networking and VPCs.
Once you get a feel for the services by using the Web UI, try to achieve the same things by exercising the APIs — using either the CLI, or the SDK in conjunction with your favorite programming language.
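As a minimal sketch of the SDK route in Python: the function below walks the response of the EC2 `describe_instances` call and collects instance IDs. The client is passed in as a parameter, so the example runs without AWS credentials against `FakeEC2Client`, a hypothetical stub that only mimics the shape of the real response. With boto3 installed and credentials configured, you would pass `boto3.client("ec2")` instead.

```python
from typing import Any, List


def list_instance_ids(ec2_client: Any) -> List[str]:
    """Collect the instance IDs returned by one describe_instances call.

    `ec2_client` is expected to behave like boto3.client("ec2"). Large
    accounts may need pagination, which this sketch leaves out.
    """
    instance_ids: List[str] = []
    response = ec2_client.describe_instances()
    # The API groups instances into "reservations".
    for reservation in response.get("Reservations", []):
        for instance in reservation.get("Instances", []):
            instance_ids.append(instance["InstanceId"])
    return instance_ids


class FakeEC2Client:
    """Hypothetical stand-in that mimics the shape of the real response."""

    def describe_instances(self) -> dict:
        return {
            "Reservations": [
                {"Instances": [{"InstanceId": "i-0001"}, {"InstanceId": "i-0002"}]},
                {"Instances": [{"InstanceId": "i-0003"}]},
            ]
        }


# Swap FakeEC2Client() for boto3.client("ec2") to query a real account.
print(list_instance_ids(FakeEC2Client()))
```

Passing the client in rather than creating it inside the function is a design choice that keeps the logic easy to test without touching the network.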
Infrastructure as Code
Combine the first category above (code) with the second one (cloud APIs) and you get Infrastructure as Code (IaC). It’s true that IaC was around before the full-blown advent of cloud computing: Puppet and Chef were the pioneers there, followed shortly by Ansible and SaltStack. I feel that Ansible is still being used today to some extent, especially since it’s now under the umbrella of Red Hat, but it seems to me that the other ones fell by the wayside.
In any case, the new kids on the block here (and Terraform isn’t quite new) are tools such as Terraform and Pulumi, which create “immutable infrastructure”, i.e. they keep a record of the state of the resources created on the cloud provider’s side and reconcile the real infrastructure against that state every time they run. See this great article by Yevgeniy Brikman discussing this and other differences between Terraform and old-school IaC tools: https://blog.gruntwork.io/why-we-use-terraform-and-not-chef-puppet-ansible-saltstack-or-cloudformation-7989dad2865c
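To make the idea of state tracking concrete, here is a toy sketch in Python. It is not how Terraform or Pulumi are implemented; it only illustrates the general pattern of comparing a desired configuration with a recorded state and deriving the actions needed. The `plan` function and the resource names are hypothetical.

```python
from typing import Dict, List, Tuple

Resources = Dict[str, Dict[str, str]]


def plan(desired: Resources, recorded: Resources) -> List[Tuple[str, str]]:
    """Compare desired resources with recorded state and list the actions needed."""
    actions: List[Tuple[str, str]] = []
    for name, config in desired.items():
        if name not in recorded:
            actions.append(("create", name))
        elif recorded[name] != config:
            actions.append(("update", name))
    # Anything recorded but no longer desired should go away.
    for name in recorded:
        if name not in desired:
            actions.append(("delete", name))
    return actions


desired = {
    "web": {"type": "instance", "size": "t3.small"},
    "db": {"type": "database", "size": "db.t3.medium"},
}
recorded = {
    "web": {"type": "instance", "size": "t3.micro"},
    "cache": {"type": "cache", "size": "cache.t3.micro"},
}

for action, name in plan(desired, recorded):
    print(action, name)
```

With the sample data above, the plan is to update `web`, create `db`, and delete `cache`.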
I am particularly intrigued by Pulumi and I want to play with it some more, since it lets you use ‘real’ programming languages and integrates natively with Kubernetes as well.
Another AWS-specific tool that needs mentioning here is of course CloudFormation. I have mixed feelings about it. On one hand it can be easier to deal with than complex Terraform setups, but on the other hand it is ugly to deal with.
Out of AWS also comes a promising tool called the Cloud Development Kit or CDK, which integrates as Pulumi does with ‘real’ programming languages, but lacks the immutable infrastructure features of Pulumi and Terraform.
You can’t get away from containers these days. At first I was highly suspicious of Docker and I felt that I was taking a step backwards by learning the bash-like syntax of Dockerfiles, especially after using ‘real’ configuration management tools like Ansible or Chef. But of course I got used to Dockerfiles and docker-compose files, then I started using the enormous ecosystem of tools around Docker, and pretty soon I realized there is no looking back. The aha moment for me was realizing the great benefits of containers as a software packaging tool. Hide all complexities of installing the pre-requisites for your software in a Dockerfile, then have a one-line README for the usage of your software: docker run mydockerimage.
With great power comes great…pain, in that you soon realize you have to orchestrate all these containers you are running. Enter Kubernetes. It used to be the case that you felt obliged to mention Mesos and Docker Swarm in the same breath, but that ship sailed a long time ago, and nowadays Kubernetes is the clear winner of the gold medal in the container orchestrator race. K8S definitely has a steep learning curve, but it’s worth your time to become as much of a K8S expert as you can as a DevOps practitioner in 2019. Of course, bare-bones Kubernetes is not sufficient, and you also need to learn a tool like Helm to package your Kubernetes services. Turtles all the way up, mostly composed of YAML.
Continuous Integration / Continuous Delivery (with its job interview question companion Continuous Deployment) is definitely the buzzword du jour. I for one am a big believer in this process. It eliminates the ‘works on my machine’ syndrome and it offers a repeatable way of testing and deploying your software. It used to be the case that CI/CD was actually a hodgepodge of shell scripts or dubious-quality plugins that you had to install in Jenkins. Nowadays, the concept of CI/CD pipelines has transformed all that into a more orderly Infrastructure as Code approach, if you are generous enough to call YAML code. (I would like to take a second here and complain violently about YAML. There is always a thing that turns into the bane of our existence as either Developers or Operations or DevOps, and I feel that YAML is it these days.)
Tied into CI/CD is the concept of GitOps, which I am becoming more familiar with these days. It has Git as the source of truth, and it uses PRs as triggers for deployments. Worth studying.
As an aside in this CI/CD section, I want to mention tools for managing environment variables and secrets. They probably merit a category in and of themselves. They come up all the time, especially in microservice architectures based on the 12-factor principles. Worth keeping them in mind as you work through the layers of your infrastructure.
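One common way to apply the 12-factor idea of configuration in the environment is to read all settings in one place at startup and fail fast when a required one is missing. A minimal Python sketch follows; the variable names (`DATABASE_URL`, `LOG_LEVEL`, `REQUEST_TIMEOUT_SECONDS`) and the `Settings` class are assumptions chosen for illustration, not a standard.

```python
import os
from dataclasses import dataclass
from typing import Mapping


@dataclass(frozen=True)
class Settings:
    database_url: str
    log_level: str
    request_timeout_seconds: float


def load_settings(env: Mapping[str, str] = os.environ) -> Settings:
    """Build Settings from environment variables, failing fast on missing ones."""
    try:
        database_url = env["DATABASE_URL"]  # required, no default
    except KeyError:
        raise RuntimeError("DATABASE_URL must be set") from None
    return Settings(
        database_url=database_url,
        log_level=env.get("LOG_LEVEL", "INFO"),
        request_timeout_seconds=float(env.get("REQUEST_TIMEOUT_SECONDS", "5")),
    )


# A plain dict stands in for the real environment here.
settings = load_settings({"DATABASE_URL": "postgres://localhost/app", "LOG_LEVEL": "DEBUG"})
print(settings.log_level, settings.request_timeout_seconds)
```

Secrets such as database passwords would typically be injected into the environment by a secrets manager rather than committed alongside the code.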
So there you have it, the 4 Cs of DevOps: Code, Cloud, Containers and CI/CD. If you stop here, I feel that you are in decent shape. But do continue on for more goodies.
Monitoring, logging, tracing
This is the DevOps equivalent of testing. If it’s not monitored, it’s not in production. The same goes for logging. Tracing / observability is the hot new thing here, which if nothing else generates tons of fun heated discussions on Twitter. In my mind, Prometheus has won the race in terms of modern monitoring tools, and Grafana has also won in terms of dashboarding tools. For logging, we have the venerable ELK stack, with L potentially swapped out these days for other technologies. For distributed tracing, I’ve had good success with Jaeger.
I just stumbled today on the acronym MLA (Monitoring, Logging, Alerting) in an interview with a Product Manager at Canonical. I guess I need to add alerting to the list. It is a necessary evil, because nobody wants to be woken up by a PagerDuty/OpsGenie/VictorOps/YourFavoriteAlertingVendor alert in the middle of the night — but on the other hand your job may depend on getting and reacting properly to those alerts.
I mentioned testing both in the CI/CD section and in the monitoring section, but it warrants its own category. It should come as no surprise that the code in ‘infrastructure as code’ needs its own tests. There are many tools that can be used to unit test your Terraform code, or your Kubernetes manifests, or your Helm chart configuration files.
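As a rough illustration of what a unit test for a Kubernetes manifest can look like, the Python sketch below checks that every container in a Deployment declares resource limits. The manifest is written as a Python dict to keep the example self-contained; in practice it would be loaded from YAML. The function name and the policy itself are hypothetical, and dedicated tools exist for this kind of check.

```python
from typing import Any, Dict, List


def containers_missing_limits(deployment: Dict[str, Any]) -> List[str]:
    """Return the names of containers that do not declare resource limits."""
    containers = deployment["spec"]["template"]["spec"]["containers"]
    return [c["name"] for c in containers if not c.get("resources", {}).get("limits")]


DEPLOYMENT = {
    "apiVersion": "apps/v1",
    "kind": "Deployment",
    "metadata": {"name": "web"},
    "spec": {
        "template": {
            "spec": {
                "containers": [
                    {
                        "name": "app",
                        "image": "example/app:1.0",
                        "resources": {"limits": {"cpu": "500m", "memory": "256Mi"}},
                    },
                    {"name": "sidecar", "image": "example/sidecar:1.0"},
                ]
            }
        }
    },
}

# The sidecar has no limits, so a CI check built on this would flag it.
print(containers_missing_limits(DEPLOYMENT))
```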
There are also many other types of testing: smoke, integration, acceptance, load testing. Most of these can be automated and run out of your CI/CD pipelines. Monitoring in itself can serve as a smoke test for your deployments (think ‘canary’ releases). The more advanced among us (myself not included) query the metrics generated from their monitoring tools and use them as health checks for their deployments, so that they can roll back the deployments if the metrics show that the system is not healthy.
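The metrics-as-health-check idea can be sketched in a few lines of Python. This assumes you already have error-rate samples collected after a deployment (for example, queried from your monitoring system); the function name, the baseline, and the tolerance value are all hypothetical.

```python
from statistics import mean
from typing import Sequence


def should_roll_back(
    error_rates: Sequence[float], baseline: float, tolerance: float = 0.02
) -> bool:
    """Decide on a rollback from post-deploy error-rate samples.

    Returns True when the average error rate exceeds the baseline by more
    than `tolerance`, or when there are no samples at all.
    """
    if not error_rates:
        # No data after a deploy is treated as unhealthy in this sketch.
        return True
    return mean(error_rates) > baseline + tolerance


healthy_samples = [0.010, 0.012, 0.009]
unhealthy_samples = [0.05, 0.08, 0.06]

print(should_roll_back(healthy_samples, baseline=0.01))    # False
print(should_roll_back(unhealthy_samples, baseline=0.01))  # True
```

Treating missing data as a failure is a deliberate choice here: a deployment that stops emitting metrics is rarely a healthy one.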
I think this covers the bases so far in terms of a solid set of DevOps skills. But wait, there’s more!
Serverless is another buzzword du jour, often used in association with FaaS (Functions as a Service). There is real promise in this technology, though. AWS Lambda was the pioneer there, but all the major cloud providers have serverless offerings these days. An interesting hybrid between pure FaaS and containers is AWS Fargate. You get orchestration for your containers, but also the serverless aspect by not having to manage your own Kubernetes or ECS cluster.
There are many frameworks that offer easier and saner deployments of your Lambda functions. AWS SAM, Serverless (confusingly named), Zappa, Chalice, etc.
A big caveat with using FaaS is that there usually are a lot of limitations in place. If your task takes more than a small-ish number of minutes, you may be out of luck, and it may make more sense to use either a Kubernetes Job or a Fargate task.
Streaming/event driven platforms
Kafka is the name of the game here. More and more modern architectures have Kafka as one of their elements. The more adventurous of those have Kafka as their core element, acting as the source of truth for their data. The database becomes an afterthought in that case. It pays to learn Kafka.
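The "log as source of truth" idea can be illustrated with a toy Python model. This is not the Kafka API; `EventLog` and `build_balances` are hypothetical names, and the sketch only shows the principle that current state can be rebuilt at any time by replaying an append-only log from the beginning.

```python
from typing import Dict, List, Tuple

Event = Tuple[str, int]  # (account, amount delta)


class EventLog:
    """A toy append-only log; each event gets a sequential offset."""

    def __init__(self) -> None:
        self._events: List[Event] = []

    def append(self, event: Event) -> int:
        self._events.append(event)
        return len(self._events) - 1  # offset of the new event

    def read_from(self, offset: int) -> List[Event]:
        return self._events[offset:]


def build_balances(log: EventLog) -> Dict[str, int]:
    """Derive current balances by replaying every event from offset 0."""
    balances: Dict[str, int] = {}
    for account, delta in log.read_from(0):
        balances[account] = balances.get(account, 0) + delta
    return balances


log = EventLog()
log.append(("alice", 100))
log.append(("bob", 50))
log.append(("alice", -30))

print(build_balances(log))
```

In this model the dictionary of balances plays the role of the database: a derived view that can be thrown away and recomputed from the log.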
Of course you can’t talk about DevOps without reminiscing about that stomach-churning moment when you were dropping tables left and right in the staging DB, only to figure out that it was actually production.
Good ol’ RDBMSes will be around for a very long time, so you need to know SQL at least in a rudimentary way.
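For a sense of what "rudimentary" SQL covers, here is a small self-contained Python example using the standard library's `sqlite3` module with an in-memory database. The `deployments` table and its contents are made up for illustration.

```python
import sqlite3
from typing import List, Tuple


def build_demo_db() -> sqlite3.Connection:
    """Create an in-memory database with a small, made-up deployments table."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE deployments ("
        "id INTEGER PRIMARY KEY, service TEXT NOT NULL, status TEXT NOT NULL)"
    )
    conn.executemany(
        "INSERT INTO deployments (service, status) VALUES (?, ?)",
        [
            ("api", "ok"),
            ("api", "failed"),
            ("web", "failed"),
            ("web", "failed"),
            ("worker", "ok"),
        ],
    )
    return conn


def failed_deploys_by_service(conn: sqlite3.Connection) -> List[Tuple[str, int]]:
    """Count failed deployments per service, using a bound parameter for the filter."""
    cursor = conn.execute(
        "SELECT service, COUNT(*) FROM deployments "
        "WHERE status = ? GROUP BY service ORDER BY service",
        ("failed",),
    )
    return cursor.fetchall()


conn = build_demo_db()
print(failed_deploys_by_service(conn))
conn.close()
```

Filtering, grouping, and counting like this covers a large share of the day-to-day queries you are likely to run against an operational database.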
The buzzword du jour in 2008–2009 was NoSQL. I think we are past that stage. But Redis and MongoDB continue to be very popular, so it’s probably worth getting familiar with them. I want to shed a tear here for Riak, which was my favorite.
This is about it. Before I leave you, here is a visual representation of the modern DevOps skill stack.