Creating reproducible snowflake servers

An introduction to Software Defined Infrastructure

Thomas Kraus
8 min read · Sep 4, 2017


I used to automate in order to create many identical servers. Now I automate to provision the exact same server again and again. Here’s what I learned on my journey towards using Software Defined Infrastructure.

Back in 2004 I co-founded a small company in software development and web hosting. For the web hosting, we started out with a few servers in a shared data center; in a shared 19" rack even. When it was time to purchase a 4th machine, we decided we should have our own 19" rack. We announced a maintenance window for the few websites and services we were running, and moved the machines to a dedicated rack nearby, still in a shared suite.

As our hosting business grew a little over time, we made some redundancy and connectivity improvements. For example, we added serial port console access to our servers to recover from firewall-related incidents, and we added some basic fail-over capabilities. Most of the configuration activities (operating system, networking, middleware, monitoring, backups) were performed manually, so all machines eventually became unique snowflake servers. Only our customer-related operations were largely automated, i.e. scripted in Bash.

I used to prepare any new servers as much as possible at home before installing them in the data center. Some installation tasks were scripted, others were fully manual. After sliding a machine into the 19" rack, I “only” needed to fine-tune its networking configuration via the console for 20 minutes or so.

How are things different now?

Some years later, as a full-time IT consultant for several large, corporate organizations, I noticed the purchasing of new hardware components was still done ad hoc (project driven), and the machines would get installed and configured in a semi-automated fashion, e.g. using disk imaging or scripted software installations. In contrast to my own way of working in the past, most of the software setup was performed remotely by administrators after the bare metal was in place.

Nowadays, for many administrators, bare metal is not a concern at all anymore. IT infrastructures, i.e. the compositions of software, hardware, and network resources that run software applications, have gone through quite an evolution over the last few decades. Not just from a hardware and performance perspective, but especially from a management (i.e. system administration) perspective.

Both the lead time and the number of mistakes involved in, for instance, setting up new servers or modifying the configuration of network equipment have decreased greatly due to automation. At the same time, the traceability and testability of your infrastructure changes have improved enormously.

Hold on, I don’t need all my servers to be identical…

Right, they shouldn’t all be the same, but they shouldn’t become unique due to configuration drift either. The classical, largely manual, approach to managing servers and other infrastructure components has certain disadvantages, including:

  • Lack of automation, which impedes scaling
  • Poor testing abilities
  • Poor change control

To deal with these disadvantages, the classical infrastructure setup activities have become more and more automated. As an intermediate step, some organizations (my own company included) used virtualization technology for some years. Virtualization technology, e.g. VMware, Xen, or Microsoft Virtual Server, helped solve some resource utilization challenges, but still left the management of these virtual resources a rather manual task.

Software Defined Infrastructure to the rescue

Nowadays, many organizations use a more software-defined approach. The hardware itself (CPUs, hard disks, switches, and routers), still necessary of course, is now considered more of a commodity, as the overhead that comes with purchasing and managing a special piece of hardware is no longer worth it. As a bonus, this decoupling of physical hardware resources from logical, or virtual, resources allows you to rent the virtual resources from a specialized (public) cloud provider instead of owning the hardware… and its lifecycle.

Software Defined Infrastructure (a.k.a. Infrastructure as Code) is a way of approaching network and system administration tasks from a software development perspective. Renting the resources instead of owning them (often referred to as Infrastructure as a Service, or IaaS), allows you to further optimize the activities that are more closely related to your business.

Benefits of Software Defined Infrastructure include time savings and improved reliability (reproducibility, traceability) due to, for instance, version control, code re-use, and automated testing. This enables system administrators to quickly define and launch (“provision”) new servers of various sizes. Provisioning is done either manually through web interfaces, or in an automated fashion by custom code or third-party tools that connect to the APIs of the infrastructure provider. Example (open source) provisioning tools include Vagrant, Terraform, and Ansible.
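As an illustration, here is what such a definition might look like in Terraform. This is only a sketch: the provider, region, resource names, and AMI ID are hypothetical placeholders.

```hcl
# Provider and region are assumptions for this sketch.
provider "aws" {
  region = "eu-west-1"
}

# A single virtual server; the AMI ID below is a placeholder.
resource "aws_instance" "web" {
  ami           = "ami-12345678"
  instance_type = "t2.micro"

  tags = {
    Name = "web-server"
  }
}
```

Running `terraform apply` against this file asks the provider’s API to create the server; running it again changes nothing as long as the real server still matches the description.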

When the logical resources have been created and connected, something useful needs to happen with them, i.e. software applications need to be installed and configured. This is typically done by so-called configuration management tools. Popular open source examples include Puppet and Chef.
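To give a flavor of what configuration management code looks like, a minimal Puppet manifest could declare a web server’s desired state like this (the module path for the config file is hypothetical):

```puppet
# Ensure the nginx package is present.
package { 'nginx':
  ensure => installed,
}

# Manage the configuration file; re-apply it if it drifts,
# and restart the service whenever it changes.
file { '/etc/nginx/nginx.conf':
  ensure  => file,
  source  => 'puppet:///modules/nginx/nginx.conf',
  require => Package['nginx'],
  notify  => Service['nginx'],
}

# Keep the service running and enabled at boot.
service { 'nginx':
  ensure  => running,
  enable  => true,
  require => Package['nginx'],
}
```

The point is that you declare the desired end state rather than the steps to get there; the tool figures out what, if anything, needs to change on each run.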

What does a Software Defined Infrastructure look like?

Although implementations, tools, platforms and providers may vary, let’s use the picture above as a shared conceptual view on what an SDI can look like. Note that the technologies mentioned are examples.

With the software-defined approach, your entire compute, storage, and networking infrastructure, plus the software applications running on top, is expressed as code and configuration. Stored in a (typically version-controlled) repository, your infrastructure becomes testable with automated tooling. Executing this code multiple times (with some variation in parameter values) can quickly provision and configure several environments (let’s call them Test, Acceptance, and Production) that only differ in terms of the parameter variance you specified. For example: less CPU/memory/storage in Test, different IP ranges and firewall settings, different server login authorizations, but otherwise identical.
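Sketched in Terraform, that parameter variance can be expressed as input variables (the names and values here are illustrative, not a real setup):

```hcl
# Knobs that differ per environment.
variable "environment" {
  type = string
}

variable "instance_type" {
  type    = string
  default = "t2.micro"  # small machines unless overridden
}

resource "aws_instance" "app" {
  ami           = "ami-12345678"     # placeholder
  instance_type = var.instance_type  # e.g. m5.large in Production

  tags = {
    Name        = "app-${var.environment}"
    Environment = var.environment
  }
}
```

The same code then builds each environment from its own variable file, e.g. `terraform apply -var-file=test.tfvars` versus `-var-file=production.tfvars`.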

Great, but aren’t virtual machines as a platform a thing of the previous decade?

For some businesses this is the right level of abstraction due to process or tooling requirements, but of course the possibilities don’t stop here. Not only can you manage virtual servers with storage and networking in an automated way, it is in fact possible to raise the abstraction level a bit further.

In recent years, additional services have emerged that all leverage the benefits of flexible infrastructure maintained as a software project. Platform as a Service (PaaS) allows for deploying software, configuration, or content at an abstraction level higher than the operating system. PaaS examples include AWS Elastic Beanstalk, Heroku, or the Facebook App Development platform. Software as a Service (SaaS), in turn, is about making complete, fully functional software packages available to customers as a service. SaaS examples include Google Docs, Salesforce CRM, or Dropbox. You may have heard the terms Functions as a Service or Backend as a Service as well; more on those in a bit.

Another example of increasing the level of abstraction is the rise of container technology, which was made easily accessible to many system administrators and even developers by the open source project (and company) Docker, in 2013. Although not the sole implementation of container technology, Docker allows you to package a software system with all its run-time dependencies in a so-called Docker image. This Docker image can be deployed (that is, a container with running processes gets instantiated) on a server or a platform prepared specifically for running containers. This server or platform does not need to have any prerequisites (e.g. software libraries, tools, middleware) installed, since containers are, well, self-contained.
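As an example of such packaging, a minimal Dockerfile might bundle a small Python web service with its run-time dependencies (the application file names here are hypothetical):

```dockerfile
# Start from a base image that already contains Python.
FROM python:3.6-slim

WORKDIR /app

# Install the application's dependencies into the image.
COPY requirements.txt .
RUN pip install -r requirements.txt

# Add the application code itself.
COPY app.py .

# The command the container runs when instantiated.
CMD ["python", "app.py"]
```

Building it with `docker build -t myapp .` produces the image; `docker run -p 8080:8080 myapp` instantiates a container from it on any Docker-capable host. Persistent data would typically live outside the container, e.g. mounted in via `docker run -v /host/data:/app/data myapp`.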

Furthermore, containers provide isolation (at process and filesystem level) within a shared operating system. This has clear benefits: whereas before you could run, say, 5 virtual servers on a physical server, now you can run dozens of containers on a single server. It has become irrelevant on which server they are actually running, because regardless of where a container gets instantiated it will run just fine and does not interfere with other containers. Obviously, when containers manage persistent data or state, this needs to be handled properly across the container’s lifecycle. That is, when a container gets renewed or deployed onto a different server, the data or state should remain accessible.

And I hear we can do computing even without servers nowadays?

An even more recent development, starting in 2015, is the notion of serverless computing, a.k.a. Functions as a Service. Sadly, we cannot get rid of all servers and data centers just yet, as computations and data storage still need to take place somewhere. The term refers to, again, raising the abstraction level, allowing software developers and IT administrators or DevOps teams to be responsible for, and focus on, an even smaller part of the technology stack: there are no individual servers or containers to manage anymore. Although not suitable for every type of software system, it’s definitely an interesting next step, both for software development and for IT infrastructure management.

The setup consists of small functions that get invoked by a trigger (e.g. an HTTP request). They perform some computation or lookup in a database or remote service, and return some result. A typical use case for a function is a chat bot implementation: it gets an event (a message in your favorite chat application), it looks up or computes some result, and posts this back as a chat message. The functionality may be generic, e.g. current traffic information for your route home, or the local weather forecast. It may also cover specific business functionality, e.g. compiling a report on demand with information from a database.
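A function in this style is just a small, stateless handler. Sketched in Python (the event/handler shape loosely follows AWS Lambda conventions, and the hard-coded weather table is a stand-in for a call to a real weather service):

```python
import json

# Hypothetical lookup table standing in for a remote weather API.
FORECASTS = {
    "amsterdam": "cloudy, 18°C",
    "utrecht": "sunny, 21°C",
}

def handler(event, context=None):
    """Triggered by e.g. an HTTP request through an API gateway.

    Reads a city from the event, looks up a result, and returns it.
    The function itself keeps no state between invocations.
    """
    city = event.get("city", "").lower()
    forecast = FORECASTS.get(city)
    if forecast is None:
        return {"statusCode": 404,
                "body": json.dumps({"error": "unknown city"})}
    return {"statusCode": 200,
            "body": json.dumps({"city": city, "forecast": forecast})}
```

The platform takes care of everything below the handler: it spins the function up on a trigger, scales it with the request volume, and tears it down afterwards.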

For mobile apps that require processing or persistent state, (M)BaaS, short for (Mobile) Backend as a Service, is a popular way to implement lightweight backends. Although functions themselves don’t maintain state and only run briefly, they can communicate with persistent storage to store or lookup some data, provide social media integration, and facilitate push notifications. New versions of the backend are instantaneously available to all mobile app users after deployment, which can be an advantage over having to wait until all users have updated the app on their mobile devices.

All major cloud providers offer Functions as a Service: AWS calls it Lambda, Microsoft calls it Azure Functions, and Google offers this as Cloud Functions. Beware of the pricing models, though. For small numbers of requests it’s either free or very cheap. When scaling up it can get expensive, and the total cost of ownership calculation becomes complex: you typically need an API gateway (paid per request), the function itself, some persistent storage, and perhaps a message queue; on the other hand, you save the effort of managing resources at the machine or container level. Some more thoughts on cost comparison, specific to AWS, can be found here.

Want to learn more?

For more details, I can recommend the O’Reilly book “Infrastructure as Code” by Kief Morris, head of ThoughtWorks’ European practice for Continuous Delivery and DevOps. The quick guide to choosing infrastructure tools by the same author is also very informative.

The website SDxCentral (Software Defined Everything) contains news, research and white-papers on various infrastructure topics.

A fun and interesting introduction to serverless computing was presented by James Thomas at the Codemotion Amsterdam 2017 conference, find slides here and code examples here.

Along with the increase of abstraction and automation, security considerations have become more important. In a follow-up article, we’ll see how to address and improve your SDI security.

Thanks to Evelyn van Kelle and various other colleagues at the Software Improvement Group for providing valuable input and feedback.