Designing Test Automation

Kev Jackson · Published in THG Tech Blog · Jul 4, 2023 · 8 min read

As part of the development for the Soteria security research project, we needed to build a regression test platform that was independent of the rest of THG’s networking infrastructure.

For almost every other project, it is easy enough to use elastically-scaling VMs or Kubernetes pods running on the same hardware and networking infrastructure as production workloads. When simulating and analysing DDoS attacks, however, the option of shared infrastructure was off the table from the start, for fairly obvious reasons!

A Network Lab

The first requirement for the project was to have a separate environment, with distinct physical hardware, where such destructive testing could take place safely. However, this environment needed to match (at a design level) the sort of setup that would be commonplace in one of THG’s production data centres. From a hardware point of view, this meant “requisitioning” certain servers, NICs, switches etc. and gently asking the DC operations teams to configure them in separate cabinets to match something like the following diagram.

Initial design for DDoS network testing lab

With a facsimile of a production data centre available for our use, we needed to design and build the automation for producing the attack traffic, measuring the impact of our in-house mitigation software on the traffic and capturing the data from each run.

Automation

We had already homed in on using TRex as our DDoS traffic generator, as described previously by Robin Cowley.

The next step was to design a system which could:

  1. Build a version of our mitigation software (with various options/flags switched on/off)
  2. Deploy this build to the lab
  3. Perform an attack
  4. Record the results automatically.

We are in the midst of researching the best way to build an attack mitigation platform, including fundamental decisions such as which hardware should run the packet analysis and fingerprinting algorithms (host CPU in user space, host CPU with eBPF in kernel space, smartNIC, FPGA, etc.).

To cover these variations, we defined a matrix of the switches/build-time flags that needed to be set to create the different builds we wanted to test. This matrix is stored as a CSV file and is used as input to drive the generation of each of the variations we want to test.
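As a rough illustration of this approach, the sketch below reads such a matrix with Python’s csv module and turns each row into a set of build flags. The column names and flag values are invented for the example; they are not the actual Soteria build options.

```python
import csv

def load_build_matrix(path: str) -> list[dict[str, str]]:
    """Read the CSV matrix; each row describes one build variant of the mitigation software."""
    with open(path, newline="") as f:
        return list(csv.DictReader(f))

def flags_for(row: dict[str, str]) -> list[str]:
    """Translate one CSV row into build-time flags (column names here are hypothetical)."""
    flags = []
    if row.get("ebpf") == "on":
        flags.append("--enable-ebpf")
    if row.get("smartnic") == "on":
        flags.append("--enable-smartnic")
    return flags

if __name__ == "__main__":
    # e.g. build_matrix.csv:
    #   variant,ebpf,smartnic
    #   baseline,off,off
    #   kernel-space,on,off
    for row in load_build_matrix("build_matrix.csv"):
        print(row.get("variant", "unnamed"), flags_for(row))
```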

We also need the automation of these tests to allow us to spot performance regressions when a change in the code introduces additional latency, so each test run needs to be clearly identifiable.

As we are using GitHub and GitHub Actions for our usual source control and CI builds, it made sense to hook into this platform as the top-level orchestrator for the automation. The decision was made early on that the team would like daily regression test data, to ensure that we stay continuously focused on system performance. To that end we implemented a scheduled job in GitHub Actions that triggers the rest of the automation.

Due to the nature of the testing (utilising our own lab hardware), we had to use a self-hosted runner. This is a small application that runs on our hardware and executes the jobs defined in GitHub Actions. The main use case for this is deploying applications inside a corporate firewall, even though the automation definitions live in GitHub.

GitHub supplies a self-hosted runner package; however, this particular software is built with .NET/Mono and has specific operating system and CPU architecture requirements. Given that our research encompasses running our system on Morello, an extended version of aarch64 that wasn’t supported by the official GitHub software, we needed an alternative.

Luckily this was a solved problem, as the other researchers working on CheriBSD had hit a similar issue and had developed a drop-in replacement for the official software. Interestingly, this replacement is easier to deploy than the official package as it doesn’t rely on .NET (there is real value in simple deployments for system administration, or “Ops”). This also made it easier to run on Graviton with Amazon Linux 2, where we had struggled to install the official package due to its lack of Arm support at the time.

For simple build/lint/test/package actions, it is usually easier (and much more efficient) to rely on the GitHub-hosted runners. However, when you are working with specific versions of the Linux kernel headers (which we are), you need much more control over the exact version of the OS than you get by relying on the usual:

runs-on: [ubuntu-latest]

Our expanding list of self-hosted runners

Configuring these self-hosted GitHub runners is simple enough, and over time we have set up a variety to support the different OS platforms (FreeBSD, CheriBSD, Linux) and CPU architectures (aarch64, x86_64/AMD64 and, finally, Morello) that we need our code to compile and run on.

Aside: Morello hardware arrives!

Unboxing the Morello system!

We’ve only just received the actual hardware from Arm, so until this point we had been using cross-compilation and the cheribuild toolchain while we waited for access to our Morello system.

Morello desktop system
CheriBSD running on Morello arm64

Back to the GitHub workflow. It has two main steps (ignoring checking the code out of source control). The first step compiles the code with the normal debug flag (to ensure that the code is in a buildable state before proceeding any further), then it configures some needed environment variables before triggering the Python automation.

The second step takes the results generated from the Python automation and commits them to a separate results repository so that they can be stored safely and shared easily.

This means that the real meat of the automation is contained in the Python code…

Python Automation

Python automation design

The GitHub workflow acts as an overall orchestrator for the process (as it has easy access to the source code and required secrets). However the bulk of the work is performed by an automated test system designed by Abbey.

The main testing server executes the Python scripts when triggered by the GitHub action. The dependencies are already installed via pip by the GitHub CI workflow, giving the Python script a valid execution environment.

The first thing that happens is an initial setup stage where the automation logs into each of the VMs (or bare metal machines) using Paramiko, to execute clients or install different versions of the software. The ssh key used to connect to each of the machines is stored as a secret in GitHub and passed to the automation, which writes the value of the secret out to the standard ssh key directory (and sets the permissions of the file and directory correctly).
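A minimal sketch of that setup step, assuming the secret arrives as an environment variable, is shown below; the variable name, key filename, host and username are placeholders rather than the project’s real values.

```python
import os
import stat
from pathlib import Path

import paramiko

def write_ssh_key_from_secret(env_var: str = "LAB_SSH_KEY") -> Path:
    """Write the private key passed in via a GitHub secret to ~/.ssh with safe permissions."""
    ssh_dir = Path.home() / ".ssh"
    ssh_dir.mkdir(mode=0o700, exist_ok=True)
    key_path = ssh_dir / "id_ed25519"
    key_path.write_text(os.environ[env_var])
    key_path.chmod(stat.S_IRUSR | stat.S_IWUSR)  # 0600, as ssh expects
    return key_path

def connect(host: str, user: str, key_path: Path) -> paramiko.SSHClient:
    """Open an SSH connection to one of the lab VMs or bare metal machines."""
    client = paramiko.SSHClient()
    client.set_missing_host_key_policy(paramiko.AutoAddPolicy())
    client.connect(hostname=host, username=user, key_filename=str(key_path))
    return client
```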

The automation then runs through a supplied configuration file (in CSV format) that defines the mix of flags and options for each test.

This whole process involves compiling a custom version of the software with the selected flags enabled, copying the tar.gz file over to the machine, installing it, then running the actual test on the Traffic Generator VM.
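One iteration of that loop, reusing a Paramiko connection like the one sketched above, might look roughly like this; the tarball name, install script and remote paths are assumptions for the example.

```python
import paramiko

def deploy_and_test(client: paramiko.SSHClient, tarball: str, remote_dir: str = "/tmp") -> None:
    """Copy a freshly-built tar.gz to the target machine, install it and kick off a test run."""
    remote_path = f"{remote_dir}/{tarball.rsplit('/', 1)[-1]}"

    # Copy the build artefact over SFTP.
    sftp = client.open_sftp()
    sftp.put(tarball, remote_path)
    sftp.close()

    # Unpack, install and run (command names are illustrative).
    for cmd in (f"tar -xzf {remote_path} -C {remote_dir}",
                f"sudo {remote_dir}/install.sh",
                f"{remote_dir}/run_test.sh"):
        _stdin, stdout, stderr = client.exec_command(cmd)
        if stdout.channel.recv_exit_status() != 0:
            raise RuntimeError(f"{cmd!r} failed: {stderr.read().decode()}")
```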

The traffic generator is a VM hosting the TRex code, with an attack script that creates a TCP SYN flood attack.

We use this to perform two distinct “attacks”: the first uses a 16-byte payload, the second uses larger 14.6 KB payloads. Between each type of attack there is a pause to allow the metrics to return to a baseline before the next attack starts; this provides cleaner (and more obvious) results.
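The sketch below shows roughly what such a stateless TRex script can look like using TRex’s Python API; the addresses, rates, duration and pause length are illustrative, and this is not the team’s actual attack script.

```python
# Rough sketch of a TRex stateless SYN flood profile (all values are illustrative).
import time
from trex_stl_lib.api import *  # STLClient, STLStream, STLPktBuilder, STLTXCont and scapy layers

def syn_flood_stream(payload_size: int, pps: int) -> STLStream:
    """Continuous stream of TCP SYN packets carrying a fixed-size dummy payload."""
    pkt = Ether() / IP(src="16.0.0.1", dst="48.0.0.1") / TCP(dport=80, flags="S") / (b"x" * payload_size)
    return STLStream(packet=STLPktBuilder(pkt=pkt), mode=STLTXCont(pps=pps))

def run_attacks(server: str = "127.0.0.1", port: int = 0) -> None:
    client = STLClient(server=server)
    client.connect()
    try:
        for payload in (16, 14_600):  # small payload first, then large payloads (needs jumbo frames)
            client.reset(ports=[port])
            client.add_streams(syn_flood_stream(payload, pps=100_000), ports=[port])
            client.start(ports=[port], duration=60)
            client.wait_on_traffic(ports=[port])
            time.sleep(120)  # pause so the metrics can return to a baseline before the next attack
    finally:
        client.disconnect()
```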

The “Packet Forwarding Engine” (running our bespoke DDoS attack mitigation module) is a bare metal machine, to allow low-level access to the NICs (and the possibility of FPGA access in the future). Our goal here is to reduce the complexity of the device running our software while we’re in the research stages. In the future we will also test with the software running on a VM.

Finally, the automation records the beginning and end of each run as annotations on our Grafana dashboards.

Each annotation contains the exact settings used for that run
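For illustration, writing such an annotation through Grafana’s HTTP API could look like the sketch below; the Grafana URL, token variable and tags are assumptions, not the real configuration.

```python
import os
import requests

GRAFANA_URL = os.environ.get("GRAFANA_URL", "http://grafana.lab.internal:3000")  # placeholder address
API_TOKEN = os.environ["GRAFANA_API_TOKEN"]  # placeholder secret name

def annotate_run(start_ms: int, end_ms: int, settings: dict[str, str]) -> None:
    """Record one test run as a region annotation so it appears on the dashboards."""
    payload = {
        "time": start_ms,
        "timeEnd": end_ms,
        "tags": ["ddos-regression"] + [f"{k}={v}" for k, v in settings.items()],
        "text": "Automated DDoS regression run",
    }
    resp = requests.post(
        f"{GRAFANA_URL}/api/annotations",
        json=payload,
        headers={"Authorization": f"Bearer {API_TOKEN}"},
        timeout=10,
    )
    resp.raise_for_status()
```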

We are using Prometheus’ node_exporter to gather low-level machine metrics on the Packet Forwarding Engine (PFE) machine. However, as the team started to dig into the performance of the software, we realised that we needed more data than node_exporter exposes out of the box.

Abbey provided the additional fixes we needed to the node_exporter project.
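As a hypothetical example, the collected node_exporter series for a run window can also be pulled programmatically through Prometheus’s query_range HTTP API (the Prometheus address and metric below are placeholders):

```python
import requests

PROM_URL = "http://prometheus.lab.internal:9090"  # placeholder address

def metric_over_run(query: str, start: float, end: float, step: str = "15s") -> list:
    """Fetch a metric series covering one test run via Prometheus's range-query API."""
    resp = requests.get(
        f"{PROM_URL}/api/v1/query_range",
        params={"query": query, "start": start, "end": end, "step": step},
        timeout=10,
    )
    resp.raise_for_status()
    return resp.json()["data"]["result"]

# e.g. metric_over_run('rate(node_network_receive_packets_total{device="eth0"}[1m])', t0, t1)
```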

Results?

The goal of all this work is to give the team a repeatable process that tells us when a change to the code has an impact (positive or negative) on packet-processing performance in the Packet Forwarding Engine.

To gain any value from this process we must run these tests regularly and store the results from each test run for analysis and inspection when needed. To this end, after the Python automation has completed (running all the tests), the push results step in the GitHub action takes over. This step clones our results repo, compares the state on disk to check if there are any new results (handling the case where the tests failed to complete) and finally commits the new results to the repo.
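In the real workflow this is a GitHub Actions step, but the logic is roughly equivalent to the Python sketch below; the repository URL handling, file extension and directory layout are assumptions for the example.

```python
import subprocess
from pathlib import Path

def push_new_results(results_repo_url: str, new_results: Path, clone_dir: Path = Path("results")) -> None:
    """Clone the results repo, copy across any results not already stored, then commit and push."""
    subprocess.run(["git", "clone", "--depth", "1", results_repo_url, str(clone_dir)], check=True)

    copied = []
    for result in sorted(new_results.glob("*.csv")):
        target = clone_dir / result.name
        if not target.exists():  # only genuinely new results (covers runs that failed to complete)
            target.write_bytes(result.read_bytes())
            copied.append(result.name)

    if not copied:
        return
    subprocess.run(["git", "-C", str(clone_dir), "add", "."], check=True)
    subprocess.run(["git", "-C", str(clone_dir), "commit", "-m", f"Add results: {', '.join(copied)}"], check=True)
    subprocess.run(["git", "-C", str(clone_dir), "push"], check=True)
```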

The status of our automated regression tests (for today at least!)


Principal Software Engineer @ THG, We’re recruiting — thg.com/careers