Notes on managing a home development environment

I recently weighed in on a discussion about public cloud spending for personal projects. While my cloud costs are still substantial, I offset a lot of them by investing in resources at home: trivial services, projects early enough in development that cloud costs are unnecessary, and anything whose development is actually slowed down by not running locally all live there instead.

With that in mind, I decided to consolidate material from a few different posts and detail how I manage my home setup.

Hardware

This consists of two physical servers: one with substantial resources that works as a hypervisor, and another that sits exposed to the upstream network hardware and provides core services like internal DNS, acts as the center point for my configuration management system, and so on.

The services host is lean: it runs most services (the exception being the salt-master process) in Docker. For example, dnsmasq and Consul run as containerized services on this host. Basically, it hosts anything used by the entire environment, or anything that needs to be available during VM provisioning (such as a service, described later, that returns information about a new VM and kicks off a countdown to coordinate with configuration management on this host).

The hypervisor has 32 GB of memory, a 120 GB SSD, and a 12 TB btrfs storage volume used for VM storage. This runs a flat QEMU/KVM installation. Managing provisioning will be covered in another section.

Network

My router has only two DHCP reservations, and they are for the above two hosts. The reason for this (rather than relying only on dynamic addressing, or on static addressing) is that VMs may live for a few minutes or for months, and many of them need DNS entries, so name resolution is likewise handled dynamically. Like most people, I use a standard consumer router, and because I'm not a network engineer, my inclination to spend time configuring routers and switches at home is pretty limited. The simplest way to accomplish what I need is to rely on DHCP, with reservations for addresses that should not change rather than static networking, and NAT rules for services that need to be exposed outside of my home network (very, very limited; I recommend, for example, securing an exposed SSH service with 2FA, if such a service must be exposed and a VPN server is not a viable alternative for you).

I do this by running Consul as a dnsmasq backend (both on the services host), so updated hostnames for a client IP are picked up as changes happen on the network (a reboot, a new provision, and so on). How this works is covered ahead; for now, just understand that DNS on the network is handled this way.
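To give a concrete idea of the glue, a dnsmasq rule along these lines forwards lookups for the internal domain to the local Consul agent's DNS listener. Treat this as a sketch: the address and port depend on how the Consul container maps its DNS interface (Consul's default DNS port is 8600, while the containers shown later map it to 53).

# /etc/dnsmasq.d/10-consul (sketch)
# Send queries for the internal domain to the Consul agent's DNS listener.
server=/gourmet.yoga/127.0.0.1#8600
# Consul service records also resolve under the .consul pseudo-domain.
server=/consul/127.0.0.1#8600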

As for the host networking itself, the services host also acts as a jumphost into the network; SSH is reachable via NAT to this host, so other services can be forwarded through it in lieu of resources for a typical VPN. Once inside the network, the hypervisor (and the VMs, which are assigned LAN IPs) is accessible. Networking on the hypervisor is just a standard bridged interface.
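In practice that amounts to an SSH client config along these lines (hostnames here are illustrative, and ProxyJump requires OpenSSH 7.3 or newer; older clients can use ProxyCommand with -W instead):

# ~/.ssh/config (sketch; hostnames are illustrative)
Host jump
    HostName services.boulder.gourmet.yoga

# Hop through the services host to reach anything internal.
Host *.boulder.gourmet.yoga !services.boulder.gourmet.yoga
    ProxyJump jump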

Certain VM groups (a Kubernetes cluster running in VMs on this hypervisor, for example) have additional network overlays with their own requirements, but they are still subject to the normal considerations, like external load balancers, that might also run on the hypervisor.

Provisioning & Configuration Management

Because most of the VMs I build use one of a handful of OSes, I keep template disk images and a template XML file. This assumes you have manually created one VM image for each template you want, for example an Ubuntu template disk image stored as ubuntu.img.
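If you are starting from scratch, one way to produce such a base image is a throwaway install with virt-install, then shutting the VM down and keeping its disk as the template. This is only a rough sketch; the ISO path, sizing, and OS variant are placeholders:

virt-install \
  --name ubuntu-template \
  --memory 2048 --vcpus 2 \
  --disk path=/var/lib/libvirt/images/ubuntu.img,size=20,format=raw \
  --cdrom /path/to/ubuntu-server.iso \
  --os-variant ubuntu16.04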

I can provision a new host from these files using something like:

NEW_HOSTNAME=whatever-service2.boulder.gourmet.yoga; NEW_OS=ubuntu; sed -e "s|NEW_HOSTNAME|$NEW_HOSTNAME|" -e "s|NEW_OS|$NEW_OS|" $NEW_OS-template.xml > ~/$NEW_HOSTNAME-$(date +%F).xml

to create a template like:

<domain type='kvm' id='5'>
  <name>whatever-service2.boulder.gourmet.yoga</name>
  ...
  <devices>
    <emulator>/usr/bin/kvm-spice</emulator>
    <disk type='file' device='disk'>
      <driver name='qemu' type='raw'/>
      <source file='/media/Media1/libvirt/images/whatever-service2.boulder.gourmet.yoga.img'/>
      <target dev='vda' bus='virtio'/>
      <alias name='virtio-disk0'/>
      <address type='pci' domain='0x0000' bus='0x00' slot='0x04' function='0x0'/>
    </disk>

where (in the template) the placeholder values have been substituted in to produce a new XML definition, which can then just be defined normally:

virsh define ~/$NEW_HOSTNAME-$(date +%F).xml ; \
virsh start $NEW_HOSTNAME ; \
virsh autostart $NEW_HOSTNAME

But before you can do the above, you need to prepare the image, especially if you plan to run multiple VMs from it, to avoid things like IP conflicts (because things like a MAC address, or host keys, will otherwise be persistently configured on the host). Run something like this to create a new disk and prepare it:

cp /var/lib/libvirt/images/$NEW_OS.img /var/lib/libvirt/images/$NEW_HOSTNAME.img && \
virt-sysprep -a /var/lib/libvirt/images/$NEW_HOSTNAME.img

Normally, this would just create a new VM from a cloned base image, but because these services are more dynamic, a script baked into the image (or one you SSH into the new VM to run) does a few things on startup, like starting the Consul client container (to set up DNS) and setting up the salt-minion process and configuration:

#!/bin/bash
# Query the IP-echo service on the services host once and reuse the result.
NODE_IP=$(curl -s ip.boulder.gourmet.yoga)
salt-setup () {
  # Double quotes so $(hostname) is expanded before sed sees the expression.
  sed -i "s/HOSTNAME/$(hostname)/g" /etc/salt/minion && \
  service salt-minion restart && \
  salt-call grains.setval role '["docker","base"]'
}
consul-setup () {
  docker run --restart=unless-stopped -d -h "$(hostname)" --name "$(hostname)" -v /mnt:/data \
    -p "$NODE_IP":8300:8300 -p "$NODE_IP":8301:8301 -p "$NODE_IP":8301:8301/udp \
    -p "$NODE_IP":8302:8302 -p "$NODE_IP":8302:8302/udp \
    -p "$NODE_IP":8400:8400 -p "$NODE_IP":8500:8500 -p 172.17.0.1:53:53/udp \
    progrium/consul -domain gourmet.yoga -dc=boulder -server -advertise "$NODE_IP" -join 10.0.1.13
}
salt-setup
consul-setup

So, in the second function above, a service on the services host (ip.boulder.gourmet.yoga) is called to confirm the VM's network IP; that same request kicks off a timer script on the services host that allows new Salt keys to be added and accepted automatically for a short window (not especially secure, and I recommend not using this strategy elsewhere; you can accept keys manually if you use SaltStack, or use an authenticated method of doing this). The function then sets up the Consul client and connects it to the primary Consul instance on the services host.
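The key-acceptance side of that amounts to something like the sketch below; the window length is illustrative, and it simply bulk-accepts any pending minion keys for a short period after the IP-echo service is hit:

#!/bin/bash
# Sketch: accept pending Salt minion keys for a short window after a
# provision request hits the IP-echo service.
WINDOW=120
END=$(( $(date +%s) + WINDOW ))
while [ "$(date +%s)" -lt "$END" ]; do
  # -A accepts all pending keys, -y skips the confirmation prompt.
  salt-key -A -y >/dev/null 2>&1
  sleep 10
done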

The salt-setup function just updates the Salt minion config (preinstalled on the template image, as is Docker) so the hostname is used as the id field, which identifies the minion to the Salt master.
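For reference, the relevant piece of the pre-baked minion config looks something like this before substitution (the master address here is illustrative):

# /etc/salt/minion on the template image (sketch)
id: HOSTNAME
master: salt.boulder.gourmet.yoga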

The role grain being set determines which states get applied to this host. All VMs of this template type require the base state (management for things like my default set of users, default packages, etc.), and because of how I manage DNS, the docker role (mapping to a Docker state as well) is applied too, so those are the only role grains I set on provision.
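Mapping those grains to states happens in the top file; a sketch of what that looks like, with state names following the roles above:

# /srv/salt/top.sls (sketch)
base:
  'role:base':
    - match: grain
    - base
  'role:docker':
    - match: grain
    - docker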

Consul is a fairly lightweight package, so if, for example, I were to drop broad use of Docker across my environment, updating the base Salt state to pull the binary instead of a Docker container would be a trivially simple change. This is likely the more useful implementation if you do not use Docker broadly: something like this in your consul.sls state:

extract_consul:
  archive.extracted:
    - name: /usr/bin
    - source: https://releases.hashicorp.com/consul/0.9.3/consul_0.9.3_linux_amd64.zip
    # The zip contains a single 'consul' binary, so this lands it at
    # /usr/bin/consul; pin a source_hash instead of skip_verify if you can.
    - skip_verify: True
    - enforce_toplevel: False

Whereas if you do use Docker heavily, keeping your processes in containers probably appeals to you on some level. In this scenario, the Docker restart policy replaces something like an upstart service or a supervisord-managed script for the Consul binary, which is the primary function it serves here; plus, working with a local-network Docker registry makes this a quick way to manage the state of various services.
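For completeness, the container-based equivalent in Salt could look something like the sketch below, using the docker_container state available in recent Salt releases; the arguments mirror the docker run in the startup script, minus the port mappings:

# Sketch: manage the Consul container via Salt instead of an ad hoc docker run
consul_container:
  docker_container.running:
    - name: consul
    - image: progrium/consul
    - restart_policy: unless-stopped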

Scaling and some Final Thoughts

The obvious reasons one might do this are the same reasons such workloads have moved to public clouds, but enough automation and abstraction around the processes above (for example, wrapping the provisioning steps in a script, or using one of the many language bindings for libvirt) makes it easy to do, and you get the benefits of a robust local environment for a build and release pipeline, or simply a way to move clutter off a laptop without losing the benefits of local development.
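As an example of that kind of wrapper, the steps from earlier could be tied together in a single script along these lines; paths and filenames follow the examples above, and it is a sketch rather than a finished tool:

#!/bin/bash
# Sketch: provision.sh <new-hostname> <os-template>, tying together the
# clone, sysprep, template substitution, and virsh steps shown earlier.
set -euo pipefail
NEW_HOSTNAME="$1"; NEW_OS="$2"
IMG_DIR=/var/lib/libvirt/images
XML="$HOME/$NEW_HOSTNAME-$(date +%F).xml"

cp "$IMG_DIR/$NEW_OS.img" "$IMG_DIR/$NEW_HOSTNAME.img"
virt-sysprep -a "$IMG_DIR/$NEW_HOSTNAME.img"

sed -e "s|NEW_HOSTNAME|$NEW_HOSTNAME|" -e "s|NEW_OS|$NEW_OS|" \
  "$NEW_OS-template.xml" > "$XML"

virsh define "$XML"
virsh start "$NEW_HOSTNAME"
virsh autostart "$NEW_HOSTNAME"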

Scaling such an environment has the obvious constraints of running any of this at home: power isn't cheap, computers are often loud, and hardware eventually fails. The cost versus risk question is basically this: do you benefit from these processes and from having such an environment be local? If so, then the cost of maintaining the hardware is likely a better investment than the cost in time, productivity, and maybe frustration (especially if you use a platform whose tools are considered a professional skill in and of themselves). People like kernel developers come to mind, where cost-effective hardware isn't always available and builds on a shared provider make building in a VM not ideal.

If you don't benefit, then it's likely not worth the headache; that's the problem public cloud platforms were intended to solve, after all: abstracting away infrastructure management so developers can work from their most comfortable level in the stack, which is arguably the goal of any technologist. That said, if you work at that lower level, odds are an opinionated platform isn't going to cut it, because you routinely won't find value in abstracting infrastructure into APIs and declarative DSLs the way another developer might. The point of all this is that there's more than one way to approach your productivity challenges, and investing in a setup like this is relatively low maintenance and can be made reasonably secure and reliable for development.
