Keeping servers healthy

Greg Harvey · Published in Code Enigma
Jun 14, 2019 · 5 min read

Part I — Building your server

I previously wrote about how Code Enigma does cloud management. Now it’s time to talk about the servers within that cloud infrastructure, and how we manage security, software, monitoring and so on.

We charge a fixed monthly fee for managing servers and some people might think that fee is a little high. We all have home computers, right? We do updates and install antivirus software ourselves, how hard can it be? This is the ‘problem’ — most people have no idea what proper server management looks like.

So let’s look at what we do.

Setting up

Firstly, because of our ISO certifications we have a bunch of administrative steps to take: appointing a build team, creating a checklist for the infrastructure we’re configuring, gathering client and application information and ensuring contracts are in place. This lays the foundation for a quality-assured, professional delivery.

When we come to building the actual servers, the first thing we hit is a system called cloud-init, which is now more or less the standard way of piping commands into a newly created cloud server to automate some of the provisioning. If we’re setting up a dedicated server, or something on premises, we’ll probably have to do this manually, but for AWS machines we can run a set of commands to start the ball rolling.

These commands do a bunch of tasks: making sure the software is up to date from the get-go (the images we use will often have sat on the shelf for a while, so the pre-installed packages aren’t the latest), installing software we need (Puppet, more on that shortly, and the ‘awscli’ package for talking back to AWS via their API) and housekeeping stuff like setting the hostname. Once the commands have run we have an up-to-date skeleton server.
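
For a flavour of what those first-boot commands can look like, here’s a minimal user-data sketch, assuming a Debian-family image — the package names and hostname are placeholders, not our actual bootstrap:

```bash
#!/bin/bash
# Minimal cloud-init user-data sketch, assuming a Debian-family image.
# Package names and the hostname below are placeholders.
set -euo pipefail

# Bring pre-installed packages up to date (images often sit on the shelf).
apt-get update && apt-get -y upgrade

# Install configuration management and the AWS command line tools.
apt-get -y install puppet awscli

# Housekeeping: give the machine its proper name.
hostnamectl set-hostname client-web-01.example.com
```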

Next, we operate a secure VPN that connects client servers to our monitoring and management systems, and we need to configure that VPN for the new server: give it an IP address and prepare the routing. Once we’ve done that, we can package up the VPN configuration the new server will need and copy it down to our workstation for later use.
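
Purely by way of illustration, in an OpenVPN-style setup (every name, IP and path below is a placeholder, not our actual tooling) that might look something like this:

```bash
# Illustrative sketch only: all names, IPs and paths are placeholders.
# On the VPN server, pin a fixed tunnel address for the new machine
# using a client-config-dir entry:
echo "ifconfig-push 10.8.0.42 255.255.255.0" | \
  sudo tee /etc/openvpn/ccd/client-web-01

# Bundle the client configuration and keys the new server will need:
tar czf client-web-01-vpn.tar.gz client-web-01.conf ca.crt \
  client-web-01.crt client-web-01.key

# Then, from our workstation, copy the bundle down for later use:
scp vpn.example.com:client-web-01-vpn.tar.gz .
```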

Then we start on configuration management. I mentioned Puppet a moment ago, and I mentioned Terraform in the previous article; well, Puppet is like Terraform (which stores infrastructure configuration in code) but for the software on servers instead. We have a bunch of core services that need to be running on all equipment we manage, and we don’t want people to be able to tinker with them (at least, not permanently), and Puppet helps us prevent that. Every half an hour it checks the key configuration files it’s tasked with managing against those held in our central Puppet catalogue, and if they’ve been altered it automatically puts back the original version. It also guards against people uninstalling packages we need: if a package is missing when Puppet runs, it will reinstall it if it can.
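
To give an idea of the moving parts, here’s a minimal sketch of bootstrapping a Puppet agent against a central server; the hostname is a placeholder, and 1800 seconds is Puppet’s default half-hourly run interval:

```bash
# Minimal sketch: point the Puppet agent at a central server and let it
# enforce configuration on a schedule. The hostname is a placeholder.
sudo puppet config set server puppet.example.com --section agent
sudo puppet config set runinterval 1800 --section agent  # 1800s = half an hour

# One-off run against the central catalogue to check everything applies...
sudo puppet agent --test

# ...then enable the agent daemon so drift is corrected automatically.
sudo systemctl enable --now puppet
```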

So we configure and start up this Puppet system, and once it’s running it manages:

  • Two factor authentication software
  • Directory server connections (for user management)
  • Secure Shell basic configuration
  • Backbone VPN settings
  • A small suite of basic tools (like vim, bzip2, iptraf, zip, mtr and so on)
  • ClamAV and its settings
  • Some software repositories we need (our own, Sury, sometimes DotDeb)
  • Backup software
  • Firewall software
  • Monitoring software
  • Intrusion Prevention System (IPS)
  • Mail server configuration (particularly important if you don’t want the server to send any mail!)
  • rkhunter (a piece of software for checking for rootkits / unexpected changes to key executables on the system); and
  • Itself (inception, anyone?)

There’s quite a lot of stuff right there; you can already see there’s a fair bit of bang for your server management buck! We’ll come back to all of that when we look at a server ‘in service’.

Having got Puppet up and running, we have to do some manual configuration of the IPS (it’s complicated and hard to automate, so we do it by hand), and then we can move on to the last, all-important bits, which are…

Backups! We do these two different ways. One is a disk snapshot (AWS calls this an ‘EBS snapshot’), a point-in-time picture of the disk stored on nearby backup storage, which can be used to rapidly put a broken disk back in the state it was in when the snapshot was taken. This is the quickest way to recover from a disaster, but we don’t like having all our eggs in one basket, so having configured snapshots for this server we move on to set up our Duplicity backups.
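
To give a flavour, with the ‘awscli’ package we installed at build time, taking a snapshot is a one-liner (the volume ID and description below are made up):

```bash
# Sketch: take a point-in-time snapshot of a volume via the AWS API.
# The volume ID and description are made up.
aws ec2 create-snapshot \
  --volume-id vol-0123456789abcdef0 \
  --description "client-web-01 data volume snapshot"
```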

The Duplicity software encrypts the data on your disk using GPG and ships it off site: if you’re on AWS we send it to Rackspace, and if you’re on any other provider we send it to AWS. The cost of the storage taken up by these backups is included in the management fee.
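
Stripped to its essentials, a Duplicity run looks something like this; the key ID, source path and target bucket are placeholders, and the real jobs add retention and verification options on top:

```bash
# Sketch: GPG-encrypt the contents of /var/www and ship them off site.
# The key ID, source path and target bucket are all placeholders.
duplicity --encrypt-key ABCD1234 /var/www \
  s3://s3.eu-west-1.amazonaws.com/example-backups/client-web-01
```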

At this point, barring a few minor additional steps and special-case tasks, the server is more or less ready to go. We then copy a series of Ansible playbook templates we created into the client’s Git repository, where we store their configuration, alter them as necessary and run them to automatically configure the software the client has ordered on the server. And that’s it, the server is ready.
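
Applying one of those playbooks is then a single command (the inventory path, playbook name and host pattern here are illustrative):

```bash
# Sketch: apply the client's playbook to the new server. The inventory
# path, playbook name and host pattern are illustrative.
ansible-playbook -i inventories/production site.yml --limit client-web-01
```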

But wait! Because we take quality seriously and have a quality management system, there’s just one last thing. Another engineer must go over the new server, check it against their own list and approve it before we hand it over.

Interlude: config management

Why Ansible, and why not Puppet? We used to use Puppet for everything; however, there are certain parts of the server where it becomes detrimental to be overly controlling. You want developers to be able to change certain configurations to improve the performance of their applications without having to ask a systems administrator; you want clients to be able to manage their own firewall access rules if they wish, decide which LDAP groups may access their machines, and so on.

The Ansible approach works well for this because, although our core functions are set in Puppet and the same for everyone, we can copy our template Ansible playbooks for each client and then both we and the client can tweak them as necessary. It’s not truly ‘configuration in code’ but rather a reset button for your server. If you mess it up, we can run your playbook and Ansible will put everything back how it was. If you made a change and didn’t record it in your playbook, that’s on you (you lost it), but you had the flexibility to make it, and next time you’ll probably remember to save it in Ansible.
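
A handy side effect of this approach is that Ansible’s built-in check mode lets you preview what the ‘reset button’ would change before you press it:

```bash
# Sketch: dry-run the playbook to see what has drifted since the last
# run, without changing anything, using check and diff modes.
ansible-playbook -i inventories/production site.yml --check --diff
```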

In Part Two we discuss what happens once the server is up and running. If you missed the article I wrote about how we manage cloud services, you can find it here.


Greg Harvey
Code Enigma

Co-founder and director at @codeenigma, European #Drupal specialists. Responsible for #devops, #hosting and #infosec. Tweets on tech, France and wine.