DevOps: The Many Facets of Infrastructure

Elliot Graebert
Apr 30, 2022


Photo by Ray Harrington on Unsplash

By facets, I’m referring to the many different services, processes, and considerations that go into running a single service thoroughly. Cloud providers make it simple to deploy a server, but the long-term story around host management is far more complex. Failing to understand the big picture and future direction of these facets can put availability, security, and velocity at risk. I believe that writing these facets out in list and tabular form makes them understandable and conquerable.

The goal of this blog post is to capture this exhaustive list of software infrastructure facets in order to help people identify gaps in their knowledge or infrastructure and make improvements. If you are in any field that is DevOps/SRE/Security/Developer adjacent, there is content in this post that is meant for you.

I’ve also written several deep dive posts that go into specific facets in much greater detail. Check out:

Abbreviated List of Facets

The facets are ordered from basic building blocks to late-stage processes.

Also Available as Google Sheet

Deployment Pipeline: store code, configure, and deploy

  • Source Control — SCM, User Permissions, Repo Settings
  • Host Configuration — Base Image, Baketime Config/Packages, Host Benchmarking, Bake Framework/CI, Runtime Config/Packages, Secret Management, DNS, NTP, Service Discovery, SDLC
  • Infrastructure — Specification/Configuration, Execution (CI), SDLC

Availability Architecture: once you can deploy, what do you deploy

  • Redundancy — HA and Scaling Design
  • Monitoring — Metric/Log Generation, Metric/Log Visualization, Workflow Validation, Alerting, Delivery
  • Recovery — Auto Remediation, Backups, Disaster Recovery

Security Architecture: once you have infrastructure, time to secure it

  • Secure Access — Environment, SSH, Web Access
  • Encryption — At-Rest, In-Flight
  • IAAA — Environment, SSH, Application
  • Logging — Hosting Provider, DNS, Host (System), Application
  • Telemetry — HIDS, NIDS
  • Vulnerabilities — Scanning and Remediation across Hosting Provider, Hosts, Containers, Web Endpoints, and Code

Operations: once it’s secured, how do you sustainably support it

  • Documentation — Ops Guides, User Guides, and Compliance Controls
  • Support Operations — On-Call Rotation Process, Support Ticketing & Tasking, Metrics/SLIs, Feedback, Cost Analysis
  • Compliance Processes — Disaster Recovery Testing, Access Control Review, Red-Team/Tabletop Exercises, Onboarding, Offboarding

Some people’s initial reaction to this list is that these facets are only needed for advanced DevOps teams, and that only a subset of them is needed at smaller companies. I argue that considering all of these facets is the minimum bar for infrastructure, but each company’s implementation will differ based on its operational maturity. Or put another way: the more you have to lose, the more robust your implementation should be.

How does one use the list above/table below?

As a reference to root out gaps in your knowledge

Copy the Google Sheet and fill it out for your organization. A seasoned DevOps engineer should be able to enumerate all the implementations of these facets without difficulty. Can you? It’s critical knowledge for redesigning and expanding infrastructure.

As a tool for identifying weak points in the infrastructure and prioritizing fixes

Gather your team and fill out the sheet together. Ask the team to rate their maturity or thoroughness for each facet. Use this information to determine the weakest facets and consider prioritizing projects to address them.
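The rating exercise described above can be tallied with a few lines of code. The following is a minimal sketch, assuming each team member rates every facet on a scale of 1 (absent) to 5 (mature). The scale, the function name `weakest_facets`, and the sample ratings are illustrative assumptions and are not part of the sheet itself.

```python
from statistics import mean


def weakest_facets(ratings, count=3):
    """Return the `count` facets with the lowest average rating.

    `ratings` maps a facet name to the list of scores given by team
    members. The 1-5 scale is an assumption made for this sketch.
    """
    # Skip facets nobody rated so mean() never sees an empty list.
    averages = {
        facet: mean(scores) for facet, scores in ratings.items() if scores
    }
    # Sort ascending by average so the least mature facets come first.
    ranked = sorted(averages.items(), key=lambda item: item[1])
    return ranked[:count]


# Hypothetical ratings from three team members.
team_ratings = {
    "Source Control": [5, 4, 5],
    "Monitoring": [3, 2, 3],
    "Recovery": [1, 2, 2],
    "Encryption": [4, 4, 3],
}

for facet, average in weakest_facets(team_ratings, count=2):
    print(f"{facet}: {average:.1f}")
```

An average hides disagreement, so a facet where ratings vary widely between team members may deserve a conversation even if its mean looks acceptable.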

New to the organization? This table is a great way to quantify organizational maturity and structure. By asking probing questions, you can start to see how teams interact in practice (as opposed to what the org chart claims). Any engineer who can easily walk through the table is going to be a key engineer for major initiatives.

As a reference table for your team’s documentation

Make the sheet a centerpiece of your onboarding documentation. Use it as a helpful way to give new hires answers to the multitude of questions they will have.

As a framework to compare different architectures

I initially created this table to compare several production architectures that were in use at my company. I filled out the technologies and processes used in each and assigned values to represent similarity. Afterwards, I had a concrete metric I could take to leadership in order to get support for breaking down barriers and killing duplicative tooling.
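One simple way to turn such a comparison into a number is to count the facets on which two architectures use the same technology. The sketch below is an assumption about how this could be done, not the scoring I actually used. The `similarity` function, the architecture names, and the tool assignments are all hypothetical sample data.

```python
from itertools import combinations


def similarity(arch_a, arch_b):
    """Fraction of facets for which two architectures use the same tool.

    Each architecture is a dict mapping a facet name to the technology
    used for it. A facet missing from either side counts as a mismatch.
    """
    facets = set(arch_a) | set(arch_b)
    if not facets:
        return 0.0
    matches = sum(
        1
        for facet in facets
        if facet in arch_a and facet in arch_b and arch_a[facet] == arch_b[facet]
    )
    return matches / len(facets)


# Hypothetical architectures and tool choices, for illustration only.
architectures = {
    "platform-a": {
        "Source Control": "GitHub",
        "Host Configuration": "Ansible",
        "Infrastructure": "Terraform",
        "Monitoring": "Prometheus",
    },
    "platform-b": {
        "Source Control": "GitHub",
        "Host Configuration": "Chef",
        "Infrastructure": "Terraform",
        "Monitoring": "Datadog",
    },
    "platform-c": {
        "Source Control": "GitLab",
        "Host Configuration": "Chef",
        "Infrastructure": "CloudFormation",
        "Monitoring": "Datadog",
    },
}

# Compare every pair of architectures once.
for name_a, name_b in combinations(architectures, 2):
    score = similarity(architectures[name_a], architectures[name_b])
    print(f"{name_a} vs {name_b}: {score:.0%}")
```

Exact string matching is deliberately crude. Two teams running different versions or configurations of the same tool would score as identical here, so a hand-assigned value per facet may capture the real overlap better.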

The Many Facets of Infrastructure

Medium’s limitations make it too difficult to render this table here. Instead, I’ve provided the full table as a free-to-share Google Sheet. Use the link below to explore each facet in depth.

You don’t need to read every cell in the table below. Skim the first two columns until a topic area catches your interest, then read from there.

A snippet from the Google Sheets, as an example

So what’s next?

My goal is to expand upon this list by doing a set of deeper dives across these topics. The order I work on these will be entirely based on feedback I get from the readers. In fact, if you’ve gotten this far in the article, then I’m very excited to hear your feedback.

If there’s a topic you want to see a deep dive on, or if there’s something you think I’ve missed, let me know.

Many thanks to my editing team: Mary, Ashir, Nick, and Emily


Elliot Graebert

Director of Engineering at Skydio, Ex-Palantir, Infrastructure and Security Nerd, Gamer, Dad