What Does Peeling An Egg Have To Do With DevOps?
Learn These Guiding Principles To Help You Level-Up Your DevOps Chops
A few days ago I was peeling an egg, like an idiot. My girlfriend, Angelly, was like — “why are you peeling the egg like that?”
She picked up another hard-boiled egg, smacked it on the cutting board, and then rolled it while pressing down firmly. The egg’s shell was off in about 3 seconds total.
Here I was chipping away tiny piece by tiny piece.
My point is: Working hard is overrated.
The best solutions are the simple, effective solutions.
Is it hard to take rust off of a screw? Probably, right? Unless you know that Coca-Cola removes rust from screws.
Now, is it hard? No. All you need to do is drop the screw in a cup of Coke and wait.
If you do not know the simple techniques, you cannot use them. You get experimentation instead of implementation. Exploration instead of replication.
When a technique in software engineering is used over and over again because it makes the problem you are solving simpler, it becomes a pattern, or “best practice”.
Sometimes, they have complex, scary-sounding names — like Command Query Responsibility Segregation (CQRS), or Event Sourcing (ES) — but these patterns exist to solve problems. Specifically, problems found in building sane, distributed systems.
When we look at the coding context as a whole there are more universal patterns: “Keep it Simple, Stupid” (KISS), and “Don’t Repeat Yourself” (DRY), for example.
I want to talk about these same sorts of patterns and principles as applied to DevOps. DevOps is often represented as the promised land where the birds are singing and the sun is shining. Without the right techniques, though, it will be a pointed spiky hell, like my tiny eggshell shards.
On top of the universal patterns like KISS and DRY, there are other “principles” that I’ve found building DevOps Systems that aren’t quite patterns, yet. These principles can help you move from picking away eggshells piece by piece to having them off in no time flat. They are, in no specific order:
- Don’t repeat others.
- Empower developers to be as productive as possible.
- There is no production.
- Put everything in the cluster and back up the whole thing.
- VPNs are not simple, and there are simple solutions to solve the same things.
- Codify and Automate Everything.
Lesson 1 — Don’t repeat others
If you can buy it off the shelf or there is an easy to use open source version of a tool you need, use it.
Don’t reinvent the wheel. Buy the wheel.
Did you know you can use the same mail server that craigslist uses? Did you know you can use it for free? Need a mail server? Use that! Don’t build it, build around that!
I love awesome-self-hosted for finding tools, and the Helm Hub for finding charts that install them.
Although Helm is a pretty new tool for finding and sharing software, a v3 website is now up: https://v3.helm.sh/
Helm is really great for getting wheels.
Let me take you back to when I started programming. The first “real” language I learned was QBasic in 9th grade. I’d been making websites with HTML and CSS for a few years already. Back then, the internet was new. While people could create packages, they weren’t really sharing them around like we are nowadays. My standard mechanism for storage was a floppy disk — which I wish I could find so I could extract my 11th-grade brick break game I made in Java Swing…
When I programmed, I used standard libraries, baby. That was all there really was. That I knew of at least. I was like 13. I’m sure some of you more experienced Java Pros (Hi Vlad and Nick!) were killin’ it. ;)
Not today. Hell, today, you can get entire functioning UI components all bundled up nicely and ready to go. You can get a library for dealing elegantly and easily with dates with a single command.
Wouldn’t it be great if there were something that allowed you to use and install entire functioning subsystems? Need a database? install database
Good news! There is! You can now also get entire functioning subsystems all bundled up and ready to go, using Helm.
It’s the same concept we’ve embraced via NPM and Gradle, but the packages it manages are known as Charts.
A chart maps containers to run in different ways inside of Kubernetes.
Using a chart is kinda like buying a wheel. Except the wheels are containers mapped to run inside of Kubernetes.
The cool part about Charts is that you can easily package up your own services to run easily in any Kubernetes cluster, and even share it with others if you’d like.
This means that you could describe entire environments in a codified manner:
dependencies:
  - name: backup
    repository: http://jenkins-x-chartmuseum:8080
    version: 0.0.2
  - name: monitor
    repository: http://jenkins-x-chartmuseum:8080
    version: 0.0.3
  - name: marketing-site
    repository: http://jenkins-x-chartmuseum:8080
    version: 1.1.10
  - name: denormalizer-service
    repository: http://jenkins-x-chartmuseum:8080
    version: 1.0.0
  - name: mongo
    repository: https://kubernetes-charts.storage.googleapis.com/
    version: 1.0.0
Need to update the marketing site to 1.2.0? Just change it, and commit. Which brings us to Lesson Two.
Lesson 2 — Empower developers to be as productive as possible
So there I was, sitting at my desk, face in my code, tracking down a bug. Users had been complaining about this for weeks and I found some free time and decided to be a hero.
Badda bing, badda boom! Found! Fixed!
I shouted over my half-height cubicle in the sun-lit room — “I fixed the bug!”
Next Tuesday, when the release went out, users were gonna love it! That is, after all of us got together in a room and pushed the boulder of the release up and over the top of another small mountain… err, if we were able to push it over, that was… if nothing went wrong along the way…
Okay, next Tuesday, if the release went out — users were gonna love it!
This was my experience deploying when I was a junior software engineer at my first programming job out of college.
Things have come a long way since then.
Now, I practice trunk-based development and deploy many times a day. When I make pull requests, a little bot posts a comment with an ephemeral pull request preview environment after a successful build and passing tests.
In today’s environment, you don’t need to shout over the half-height cubicle.
The more you can empower engineers to control the parts of the infrastructure that they need, the simpler your job as a DevOps engineer becomes.
In lesson one, we saw it was possible to update a production environment by literally changing a number and committing.
In the ideal world, where we are also packaging all of our applications into Charts, we are also empowering each engineer on the team to modify how that service runs in production. A chart is literally a mapping of containers to Kubernetes, and for that purpose, it exposes things like resource requirements.
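As a rough sketch of what that can look like, an environment chart's values file could override a service's resource settings like this. I'm assuming here that the `denormalizer-service` chart from the dependency list exposes a `resources` key, which is a common convention but not something every chart does, and the numbers are placeholders.

```yaml
# values.yaml in the environment chart (sketch; keys and numbers are assumptions).
# Values nested under a dependency's name are passed down to that subchart.
denormalizer-service:
  resources:
    requests:
      cpu: 100m        # what the scheduler reserves for the container
      memory: 128Mi
    limits:
      cpu: 500m        # the container is throttled above this
      memory: 256Mi    # the container is killed if it exceeds this
```

Changing one of those numbers and committing is all an engineer would need to do to adjust how the service runs.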
Seeing as I’m not great at guessing how much memory or CPU a new service will need (and I’m guessing you aren’t either), I also like to deploy monitoring and alerting with rules that notify me and my team via Slack when those settings need adjusting. The alerts guide you to the correct settings once the service is deployed. I used to spend hours at a time running Prometheus queries and adjusting, just like I used to spend too much time peeling my eggs. Now, I’ve learned to do it the smart way.
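To give a feel for it, here is a minimal sketch of one such rule in Prometheus's rule file format. It assumes your Prometheus scrapes the kubelet's cAdvisor metrics (the two `container_*` metrics below), and the alert name, threshold, and duration are placeholders. Delivery to Slack would be handled by Alertmanager and isn't shown.

```yaml
# Prometheus alerting rule file (sketch; names and thresholds are placeholders).
groups:
  - name: resource-tuning
    rules:
      - alert: ContainerMemoryNearLimit
        # Working-set memory as a fraction of the configured limit.
        # The inner "> 0" drops containers that have no memory limit set.
        expr: |
          container_memory_working_set_bytes{container!=""}
            / (container_spec_memory_limit_bytes{container!=""} > 0)
            > 0.9
        for: 15m
        labels:
          severity: warning
        annotations:
          summary: "{{ $labels.container }} in {{ $labels.namespace }} is above 90% of its memory limit"
```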
I can already hear you saying: “That sounds complicated.” Nope. Just install the chart.
Anything that can be automated or simplified should be. For example, what if simply deploying a service with a label that said “expose: true” would map it to a DNS path? This is where operators come in. Operators are a more advanced Kubernetes tool, and something worth understanding, but let’s not get lost in those details right now.
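From the developer's side, that could be as small as one label on a Service. This is only a sketch: the `expose` label and the operator that would watch for it are hypothetical, and the service name and ports are placeholders.

```yaml
# A Service carrying the hypothetical "expose" label.
# An operator (not shown) would watch for this label and set up the DNS route.
apiVersion: v1
kind: Service
metadata:
  name: marketing-site
  labels:
    expose: "true"   # quoted: label values must be strings, not YAML booleans
spec:
  selector:
    app: marketing-site
  ports:
    - port: 80
      targetPort: 8080
```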
And that brings us to Lesson Three.
Lesson 3 — There is no production
This was one of those epiphany moments for me. It took me looking at things from a new perspective, so try to follow me for a minute.
For over a decade, I’d thought there were only a handful of environments. At its simplest, there was a staging environment and a production environment. First, you deployed one thing to staging and tested it, then deployed the next thing and tested that, and so on. Once everything was deployed together (“integrated”), we repeated the whole process for production.
This is how I did things for years and thought nothing of it. It’s the way it was.
This is another one of those peeling-eggshells-piece-by-piece moments.
A chart is a dependency graph. Use the chart to represent the environment, and then the process of deploying to production is just deploying a single chart!
If each team or project or bounded context had its own chart, you could have many production environments that allowed you to group and update sets of services together in a single transaction.
So instead of thinking of production as this big, all-encompassing environment, realize there are actually many little productions.
Big changes are scary and dangerous. By keeping many small production environments you isolate changes and make the rest of the system more resilient.
Also, because everything is just in Charts all the way down, every setting is exposed and tweakable in a uniform manner. Meaning, things like running a less powerful Kafka cluster in staging where it’s not needed is just a simple config change.
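For example, a staging override might look something like this. It's a sketch that assumes the environment has a `kafka` dependency whose chart exposes `replicaCount` and `resources`; key names vary between charts, so check the chart's own values first.

```yaml
# values-staging.yaml (sketch; the key names are assumptions about the kafka chart)
kafka:
  replicaCount: 1      # a single broker is plenty for staging
  resources:
    requests:
      cpu: 250m
      memory: 512Mi
```

You would then pass the file when deploying the environment chart, for instance with Helm's `-f values-staging.yaml` option.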
Lesson 4 — Put everything in the cluster and back up the whole thing
Ok, so we can run things easily in production, but what about databases? What about Kafka? Is that safe?
Well, if you’ve been paying attention, you’ll remember I mentioned databases as a thing that can be packaged into a chart.
In Kubernetes, there is now an API specifically for running stateful applications like databases in highly available setups, called StatefulSets.
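A chart normally generates this for you, but here is a minimal sketch of what a StatefulSet looks like underneath. The names, image tag, and storage size are placeholders, and a real highly available MongoDB would also need replica-set configuration that isn't shown.

```yaml
apiVersion: apps/v1
kind: StatefulSet
metadata:
  name: mongo
spec:
  serviceName: mongo           # headless Service giving each pod a stable DNS name (not shown)
  replicas: 3
  selector:
    matchLabels:
      app: mongo
  template:
    metadata:
      labels:
        app: mongo
    spec:
      containers:
        - name: mongo
          image: mongo:4.2     # placeholder image tag
          ports:
            - containerPort: 27017
          volumeMounts:
            - name: data
              mountPath: /data/db
  volumeClaimTemplates:        # one PersistentVolumeClaim per pod, kept across restarts
    - metadata:
        name: data
      spec:
        accessModes: ["ReadWriteOnce"]
        resources:
          requests:
            storage: 10Gi
```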
Pretty much the entire point of Kubernetes is to run containers reliably.
Pair that with Velero, a tool that backs up an entire Kubernetes cluster and can be installed via a Helm chart, and you can back up the entire state of a cluster, including all of the attached volumes (like the ones created by a StatefulSet), and restore everything with a single command. Backups are also easy to configure on a schedule.
With backups and restores just a single command away, coupled with managed Kubernetes, spinning up an entirely new cluster and restoring it to your backed-up version can now be done in just two commands. Three if you want a brand new backup first.
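A scheduled backup can itself be codified. Here is a sketch using Velero's `Schedule` resource; it assumes Velero is installed in the `velero` namespace with a backup storage location already configured, and the name, cron expression, and retention period are placeholders.

```yaml
apiVersion: velero.io/v1
kind: Schedule
metadata:
  name: daily-cluster-backup
  namespace: velero
spec:
  schedule: "0 3 * * *"        # cron syntax: every day at 03:00
  template:
    includedNamespaces:
      - "*"                    # back up every namespace
    snapshotVolumes: true      # include persistent volume snapshots
    ttl: 720h0m0s              # keep each backup for 30 days
```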
Instead of thinking of just servers as cattle, entire clusters start to become disposable.
Is staging being annoying? Throw it away. Just a simple backup, restore, and DNS change away.
Lesson 5 — VPNs are not simple, and there are simple solutions to solve the same things
Have you ever enjoyed using a VPN?
I mean, honestly.
Has anyone?
Google tried. Then, a couple of years back, they announced that the company would no longer use them. I guess they didn’t enjoy using them, either.
Instead, they rely on trustless networks, more commonly called zero-trust networks.
Instead of a master key that lets you access all the networks, a key is placed in front of each service via a Single Sign On screen. Need to access the monitoring service? Just log in with your authorized company credentials.
A small shift is all that’s required. Instead of using a VPN for authentication, use a proxy.
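One way this can look in a cluster is an Ingress that sends every request through an SSO proxy first. The sketch below assumes the ingress-nginx controller and an OAuth2-style proxy already served under `/oauth2` on the same host (its own Ingress isn't shown); the hostname and service name are hypothetical.

```yaml
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: monitoring
  annotations:
    # ingress-nginx asks this URL whether the request is authenticated...
    nginx.ingress.kubernetes.io/auth-url: "https://$host/oauth2/auth"
    # ...and redirects to the sign-in page when it is not.
    nginx.ingress.kubernetes.io/auth-signin: "https://$host/oauth2/start?rd=$escaped_request_uri"
spec:
  ingressClassName: nginx
  rules:
    - host: monitoring.example.com   # hypothetical hostname
      http:
        paths:
          - path: /
            pathType: Prefix
            backend:
              service:
                name: monitoring
                port:
                  number: 80
```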
A VPN also places the Kubernetes management URL on a private network, which an SSO gateway does not. However, cloud providers offer alternative authentication mechanisms for it. On Google Cloud and AWS that is IAM authentication, and you can also whitelist IP addresses.
If you can maintain a lot less infrastructure with the same benefits, do it. Be like Google: ditch VPNs for trustless networks.
Lesson 6 — Codify and Automate Everything
Something takes a bit more effort to codify? I don’t care! It will save hours, days, or even weeks of effort later on. Codify it.
It’s the only reliably repeatable way to recreate infrastructure.
If every setting can be modified by changing and committing something to git, then your technology organization as a whole is essentially declarative. Any dev with git access can easily maintain any system.
To do so, use lots of little repositories, then tie them together later. Monorepos lead to people taking shortcuts and a dependency on an artificially important structure.
My friend Matt and I make a tool for doing this in development called meta. Helm is a tool for doing this in production: everything is just a dependency graph!
Conclusion
Don’t peel an egg piece by piece. Smack and roll.
Interested in having me automate your DevOps pains away? I’ve got the smack and roll on lock.
Check out my website to learn more about me and my agency, Unbounded, or to schedule a free 15-minute discovery call. I’m also available as an advisor on AdvisoryCloud.
Just getting started with DevOps? Check out My Journey to Achieving DevOps Bliss, Without Useless AWS Certifications on HackerNoon and find a free 21-day email course at the bottom!