DevOps For the Non-Initiated

Alex Floyd Marshall
Published in Secret Handshakes · Mar 11, 2021 · 14 min read
"devops in a box" by psd is licensed under CC BY 2.0
“devops in a box” by psd is licensed under CC BY 2.0

DevOps is a term you may hear bandied about by tech folks at your organization. It’s a buzzword, for sure, and like any buzzword it can be easily misused and misunderstood. The term itself represents a fusion of two concepts — Development and Operations — which have traditionally been separate “realms” within the IT landscape. But the reason for this fusion is the more important part to understand, and by putting the concept of DevOps into context, we’re going to help you communicate better with your techy colleagues.

Henry Ford vs. Thomas Edison

To frame this discussion, and put it onto slightly more familiar terrain, I’m going to refer to two major titans of industry: Henry Ford and Thomas Edison.

"Ford Assembly Line, 1953" by aldenjewell is licensed under CC BY 2.0
“Ford Assembly Line, 1953” by aldenjewell is licensed under CC BY 2.0

Henry Ford, the founder of the Ford Motor Company, is the best-known architect of the “assembly line” method of mass production. We’re all familiar with the images of partly constructed cars rolling down the line, with workers adding new components at each station. The process was repetitive but also precise, and it allowed Ford to rapidly produce a massive fleet of identical cars. Now, here’s the thing: before you get to that production line, before the cars start being assembled, you need a full architecture or design blueprint for those cars. You need to know what every piece looks like and where every screw fits. Then you can lay out all the stations of the assembly line to that exact specification and let the factory roll. As a corollary: when you change the design, even for a seemingly small change but especially for a big one, you have to reconfigure the factory and retrain your workers. As you might imagine, that’s a big undertaking and a slow-moving process, which explains why cars are often rolled out on “model year” production cycles. It’s also why the engineering process for designing a car (and the assembly line that goes with it) is a multi-step process, with lots of opportunities for review, revision, and quality control.

"Thomas Edison Lab" by milan.boers is licensed under CC BY 2.0
“Thomas Edison Lab” by milan.boers is licensed under CC BY 2.0

Now, let’s contrast that to Thomas Edison. Edison is principally known as an inventor. How many of us have heard, or used, his quote about how he didn’t fail, he found a million ways NOT to build a light bulb? That quote illustrates a totally different approach to “production”: rapid experimentation with lots of small changes trying to arrive at the best result. Compared to the slowly turning ship that is an assembly line production factory, an inventor’s workshop is quick on its feet. It’s what we call, to use another contemporary buzzword, “agile.”

Early Software Production

"Waterfall" by Thomas Strosse is licensed under CC BY-SA 2.0
“Waterfall” by Thomas Strosse is licensed under CC BY-SA 2.0

The early days of software production had a lot in common with an assembly line factory. “Software” first existed on physical media: whether it was punch cards fed into a machine, floppy disks, or CD-ROMs, there was literally a physical production component to the software distribution process. And just as in a car factory, changing that process could be slow and complicated. There may not have been as many steps on the assembly line, but every release came with a significant cost in terms of the time and money required to produce and distribute the finished product. There was, as with cars, a built-in incentive to slow things down, make sure the “final” product was really ready for prime time, and then mass-produce identical copies. Similar to the process of engineering a car, this incentive promotes a workflow involving multiple behind-the-scenes steps, including reviews, revisions, and quality testing, to be sure that the final “design” going into production is sufficiently finished. This process has become known as the “waterfall” method, the image being of a waterfall with multiple stages that spill over into one another on the way down.

How the Web Changed the Process

The biggest change in software production came with the emergence of the web as the dominant means of distributing software, because it enables entirely digital distribution, ditching all of the physical media like CDs or floppy disks. Once you are distributing your software digitally, the cost of sending out a new release becomes minimal. You no longer need to worry about rearchitecting your distribution and delivery process; you just “publish” the code and voilà! Of course, it’s a tad more complicated than that, as you’ll see, but compared to maintaining a full physical infrastructure for production and distribution, it’s practically free to send updates or new releases. This gives rise to one buzzword we’ve already mentioned: “agile.” Agile software development takes advantage of this feature of digital distribution to follow a pattern much more like what we saw in the inventor’s workshop: rapid experimentation involving small changes to continuously improve the product. Instead of releasing a large number of changes at infrequent intervals, like a car model year or the traditional “waterfall” method of software production, we can now release lots of small changes in quick succession, letting us do two important things from a business standpoint. First, agile development can respond to bugs and errors quickly and get fixes out the door fast. Second, agile software produces a lot of data from its small, experimental releases that can be leveraged to make the product better.

Let’s adopt an example to illustrate how this works. To use something most people are probably familiar with, let’s talk about the web interface for Gmail. Now, there’s a lot going on behind the scenes that makes Gmail work, but we’re going to leave aside most of that complexity and just focus on the app you see when you log in to check your mail. This app lives entirely in your web browser: you access it by logging in, not by installing anything on your computer. This sort of “web app” is the idealized version of the agile workflow. Other types of apps, like mobile apps you install on your phone or apps you download and install on your computer, can get close to this thanks to digital distribution. They don’t need to worry about the physical infrastructure of distributing their software, so they can push out updates much more rapidly than “traditional” software could. However, their users still need to install the updates they send out, and those users would probably get annoyed if they were prompted to do this 5–6 times per day or week, depending on the development team’s speed of pushing out changes. So most installed applications will, while being mostly agile, still have some sort of final “hold” leading to a consolidated release of a bunch of changes at once so that they don’t drive their users too crazy (or get too far ahead of the ones who don’t download updates right away).

On the other hand, our Gmail web app (and other web apps like it) doesn’t require the user to install anything. Updates happen automatically: the next time you log in or open up a new Gmail tab, you’ll be on the latest version, no further action required on your part. In fact, you probably have no way of knowing what version of Gmail you are on, because “versions” for web apps like Gmail are entirely in the background: they are something the internal development team works with but users don’t actually see. Users only see whatever version is currently open in their browser. This means that the software team working on the Gmail web app can literally publish changes as often as they like. They could publish 20 new versions an hour and it would be fine, because all that’s required for a user to get the newest version is to open up Gmail in a new tab.

Because the Gmail creators can deliver these changes so seamlessly to users, they can be sure that their updates are out there being used by a lot of people at once. In fact, because this is a web app, they are the ones directly serving it to those users (it’s not actually installed on the user’s computer in other words, it’s living on Google’s servers). So the Gmail team even has a bird’s eye view of exactly how many users have any given version open at any given moment and which users those are. This also means they can spot issues quickly and precisely and they can collect a lot of information about how new features they are experimenting with are being used or received. And both of those things let the Gmail team continue to develop the app in a way that “continuously improves” based on real world data from real users. This all points to a crucial underlying principle: agile, at its core, isn’t just about speed, it’s about improving business value by being more responsive to customer needs.

CI/CD

So that’s the agile dream: publish changes to your software as fast as you can make them, gather data about how those changes “performed” in the real world with your real customers, and then use that data to plan the next round of changes. The faster that loop runs, the better your product, according to the agile philosophy. And for web apps like Gmail, the only limitation on the speed of that loop is how fast your developers can type.

"Computer Code Fabric" by colleengreene is licensed under CC BY-NC-SA 2.0
“Computer Code Fabric” by colleengreene is licensed under CC BY-NC-SA 2.0

Well: almost. Because if you go too fast, things often break. And while some amount of “breaking” is actually fine because it provides valuable learning/feedback (remember Edison talking about all the ways he learned not to make a light bulb), there’s also some breaking we’d really like to avoid if we could. For example: if you are not a programmer, computer code probably looks like some sort of alien language, with lots of swirly braces and weird syntax and even weirder punctuation. While all of that means something to programmers, it’s also easy, even for experienced programmers, to make silly mistakes (like forgetting one of those braces or spelling something wrong). Publishing those kinds of mistakes isn’t really a great opportunity for valuable learning. It may reinforce that individual programmer’s knowledge of how their language works, but it doesn’t give valuable feedback on the product design from a business perspective. So we’d really like it if those sorts of silly human errors didn’t get published, because they create a lot of messy “noise” in the data we’re actually trying to interpret to improve our product. Similarly, if we have a bunch of people working on a software project, we’d really like it if they weren’t working at cross purposes with one another and publishing changes that undermined each other’s work.

For problems like these, a set of tooling has emerged called “Continuous Integration, Continuous Delivery” (CI/CD). In a nutshell, these tools run automatic tests on the software’s code to check for things like spelling/syntax errors and other problems we’d really rather not have in the versions that get released because they would just clutter up the data. They can also perform other testing we’d like to do before a release, like some sort of “performance test” to verify that new code we’re testing hasn’t accidentally created a major slowdown in the app (we don’t really need customers to tell us that was a turn in the wrong direction, do we?). After all these tests pass, these tools can trigger automated workflows, up to and including releasing or “deploying” the new code (or staging all the new code before it’s manually released in a more consolidated package, like we see with installed apps). We can also get really fancy with this sort of thing: maybe we release to a small subset of our users first, evaluate how things are going, and then release for everyone. Maybe we split traffic between an old version and our new version and do a side-by-side comparison for a certain period of time. There are lots of possibilities, but they all embody the same principle: enabling teams to follow the agile philosophy — small, fast releases evaluated for data to improve the product — without introducing unnecessary errors or problems that cloud up that data with noise we could have avoided.
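To make the “gate” idea concrete, here is a toy sketch in Python. It is purely illustrative (the function names are invented, and real CI/CD tools are far more capable): each check must pass before the “deploy” step is allowed to run.

```python
# A toy CI/CD "gate": checks run in order, and a failure anywhere
# blocks the release before it ever reaches users.

def syntax_check(source: str) -> bool:
    """Catch 'silly mistakes' like a missing brace before anything ships."""
    try:
        compile(source, "<candidate release>", "exec")
        return True
    except SyntaxError:
        return False

def run_pipeline(source: str, tests: list) -> str:
    """Run every stage in order; stop at the first failure."""
    if not syntax_check(source):
        return "blocked: syntax error"
    for test in tests:
        if not test(source):
            return "blocked: failing test"
    return "deployed"  # in a real pipeline, this step triggers the release

# A release with a typo never reaches users:
print(run_pipeline("def greet(:", []))        # blocked: syntax error
print(run_pipeline("def greet(): pass", []))  # deployed
```

A real pipeline would run a full test suite and trigger an actual deployment, but the shape is the same: failures stop the release before any user, or any of our experiment data, ever sees them.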

The Next Step: The Cloud and DevOps

So far, we have seen how digital distribution, by dramatically reducing the cost of deployments, makes possible a more “agile” methodology for software development in which rapid experimentation is used to continuously improve products by evaluating the data gathered from those experiments. The next step in this evolution is the arrival of “the cloud.” I’ll explain the cloud in more detail in another post, but for our purposes, here’s a basic understanding. In a traditional software deployment, even for a “web app,” the computers or servers running the software have to be set up manually. That’s a labor-intensive process: less so than setting up a whole production line, but labor-intensive all the same. And it creates a division within traditional technology teams between “developers” — those writing the code — and “operators” — those managing the computers/servers the code runs on.

"clouds" by lonnypaul is licensed under CC BY-NC-ND 2.0
“clouds” by lonnypaul is licensed under CC BY-NC-ND 2.0

The cloud changes this. In a cloud deployment, computers/servers are deployed “virtually,” which is to say that instead of needing a team to physically install and manage them, you use software to “provision” them through another service (such as a major cloud provider like Amazon, Microsoft, or Google). This substantially reduces the cost (at least in terms of time and barrier to entry) of rolling out new applications and services, or of expanding their geographic reach to new parts of the world. Instead of (literally) requiring heavy lifting and expensive equipment acquisitions, an engineer can now simply type a command from their workstation and have new “virtual equipment” available. Agile becomes even more agile, in other words.

This is the context into which “DevOps” fits. DevOps is a tearing down of that division between Developers and Operators because the underlying “equipment” we are operating is virtual, not physical, and so we can “program” it the same way we program the software itself. This is a concept known as “infrastructure as code”: programming this virtual equipment to match the needs of our applications. To do this programming of infrastructure, a whole host of “DevOps” tools have emerged (among the most famous: Ansible, Chef, Puppet, Terraform). These tools can further be linked into the CI/CD tools we discussed earlier to manage the entire agile process, letting us create “test environments” on the fly, run those tests, prep software for deployment, and actually run the deployments onto the provisioned cloud resources.
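As a rough illustration of the “infrastructure as code” idea, here is a hypothetical sketch in Python (this is not how Ansible or Terraform actually work internally, just the core concept): the infrastructure is written down as data, and a program makes reality match that description.

```python
# "Infrastructure as code" in miniature: the desired fleet is plain data,
# and reconcile() provisions whatever is missing to match it.

DESIRED = {
    "web-1": {"size": "small", "region": "us-east"},
    "web-2": {"size": "small", "region": "us-east"},
}

def reconcile(desired: dict, running: dict) -> dict:
    """Bring 'running' in line with 'desired', server by server."""
    for name, spec in desired.items():
        if running.get(name) != spec:
            running[name] = dict(spec)  # stand-in for a real provisioning call
    return running

# Running the same spec twice yields the identical fleet, every time:
fleet = reconcile(DESIRED, {})
fleet = reconcile(DESIRED, fleet)
print(fleet == DESIRED)  # True
```

The design choice worth noticing is that the spec, not a human operator, is the source of truth: re-running it is safe, and changing the infrastructure means editing (and reviewing, and testing) the spec like any other code.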

DevOps also introduces several other significant business benefits. For one thing, it reduces the need for businesses to have dedicated “operations” teams. Large businesses will still want some “ops” people, but they probably need fewer of them if they have adopted a DevOps approach. And smaller businesses that are doing everything using the cloud may not need anyone dedicated specifically to ops so long as their team is well versed in DevOps tooling.

Second, DevOps brings us full circle on the assembly line process by creating a “virtual” assembly line. Remember, when we were talking about the difference between Ford and Edison, we said that Ford’s assembly line was repetitive and precise while Edison’s lab was quick on its feet and moved from experiment to experiment. From the perspective of software distribution, the part we want to be “experimental” is the application itself. We want to try new features, testing them out and getting real, live data on them. But the “infrastructure” that application lives on — the servers — isn’t really an “experiment”; we’d like it to just work exactly the way we expect it to. In fact, we’d really like provisioning those servers to be “repetitive and precise” like an assembly line. And that’s what DevOps lets us do: we define exactly what we want that underlying infrastructure to look like, and it ensures that we get the exact same thing every time. Now we can deploy new, experimental versions of our apps as often as we like, and we don’t need to worry about whether the data is telling us something about our application or the server it’s running on, because that server is always exactly the same, like a Model T rolling out of Ford’s factory.

Finally, this is also a boon for security, because it reduces the room for human error and shortcut-taking around the underbelly of our application (its infrastructure). We can build the security parameters we want into that infrastructure from the beginning and know that every time we spin up a new server it will have those same security parameters set. In a “manual operations” setting, this wasn’t a guarantee. A human setting up a new server might forget to do something. Or might decide it’s not that important, really, and they’d rather get to lunch a few minutes earlier. Or make a mistake while doing it and render what was supposed to be a secure firewall inert (or make it into a brick wall that blocks everything). DevOps reduces all those opportunities for mistakes and makes sure our infrastructure is secured (to our requirements) the same way every time.
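Here is a minimal sketch of that “baked in” idea, with hypothetical names in Python: a mandatory security baseline is merged into every server spec last, so it cannot be forgotten or overridden on the way out the door.

```python
# Every provisioned server gets the security baseline, no matter what
# the application's spec says (or forgets to say).

SECURITY_BASELINE = {
    "firewall": ["deny-all-inbound", "allow-https"],
    "disk_encryption": True,
}

def provision(spec: dict) -> dict:
    """Return the final server config: the app's needs, then the baseline."""
    server = dict(spec)                # the app-specific part
    server.update(SECURITY_BASELINE)   # the part nobody can skip
    return server

# Even a spec that says nothing about security comes out secured:
server = provision({"size": "small"})
print(server["disk_encryption"])  # True
```

Applying the baseline last is the point: an engineer in a hurry can leave security out of their spec, or even try to weaken it, and the assembly line still stamps the required settings onto every server.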

Buzzword vs. Philosophy

To summarize, DevOps is an approach and related set of tooling that enables three things: Infrastructure as Code (the cloud model of provisioning “virtual equipment” for our software to run on), complete reproducibility of that infrastructure (the assembly line model), and even better manifestations of the “agile” process of “continuous improvement” through small, fast experiments, while baking in the testing (using tools like CI/CD) and security we want. That’s what we mean when we talk about DevOps: putting the infrastructure our software runs on into an assembly line we can “program” just like software, so that when we are “moving fast” in our agile experiments, we can be confident the things we are “breaking” are things we actually cared about testing, not the stuff we want to assume is working.

Sometimes, however, DevOps gets adopted as a buzzword that means something else. It might be shorthand simply for “making things faster” in the software development process. There’s some truth to that, as we’ve seen, but it’s not really a complete understanding of what DevOps is trying to do. Or it might mean “cutting out operations people because we don’t need them anymore.” As we’ve seen, DevOps can reduce the need for dedicated ops teams (especially on smaller, fully cloud-oriented teams), but just slashing an ops budget is not DevOps. Or it might mean “giving developers total control over the infrastructure.” Being able to program infrastructure in a DevOps fashion should give developers more control/influence over how that infrastructure is structured. But we also need to make sure we’re doing that in a way that is actually maintaining the reproducibility and the stability/security of that infrastructure, or else our programmers are going to introduce a whole host of new problems that have to be solved.

Ultimately, DevOps, like the agile philosophy it both extends and enables, is about improving business value. Business value may have some relationship to speed (in an agile framework, faster “experiment loops” let you improve your product faster, which may be a major competitive advantage). It may also be related to budgeting and department sizes or the division of responsibilities between teams/roles. But if the tail is wagging the dog and decisions around “DevOps” aren’t being made with a real eye toward how they improve the overall business plan, it’s not really DevOps, it’s buzzword envy. Hopefully this post helps you understand better how DevOps can contribute to overall business success, and with that understanding you’ll be better equipped to talk with your colleagues and make informed decisions.


Lead Cyber Security Engineer at Raft, a new breed of government tech consultancy. Member of the CNCF Security TAG. Freelance writer and occasional blogger.