Implementing a Load Testing Proof Of Concept at Kooth using K6

Adrian Castro
Published in Building Kooth
Feb 7, 2024 · 7 min read

Although compliance requirements can sometimes feel like a checkbox exercise, they often present great opportunities to improve our systems and practices that can otherwise be difficult to pitch against critical business requirements. In this case, they gave us the exciting opportunity to set up a better load testing strategy.

Load Testing

Load testing is a vital aspect of software development in which varying user demands are simulated to assess how a system performs under different conditions. By putting the application under realistic and elevated loads, it helps to identify weaknesses, pre-emptively uncover performance issues and possible failures, and highlight areas for improvement, supporting a smooth user experience even during peak demand and ultimately enhancing the overall reliability of the software.

There’s a variety of load tests that can be performed (see image below); however, for this project we decided to focus on “average-load” and “stress” tests. Average-load tests, as the name suggests, assess the performance of the system under normal conditions. Stress tests, on the other hand, assess how the system performs at its limit, simulating loads beyond the anticipated average.

Image: Graph showing 6 different types of load tests, depending on how the number of virtual users (VUs) varies over time. Source: https://k6.io/docs/test-types/load-test-types/

Our Goal

While we had run load tests at Kooth before, they were quite limited, and we often faced challenges integrating with a third party. We wanted to use this opportunity to explore new tools we could use to build a new load testing strategy from the ground up that meets strict compliance requirements and that we can continue to extend in the future.

However, given we had very limited resources to dedicate to this project, we wanted to set some clear goals for an initial proof of concept:

  • Investigate and select a suitable tool;
  • Focus on testing both peak and typical loads only (stress & average load tests);
  • Focus on API testing, particularly endpoints that put heavy loads on our databases;
  • Keep user journeys simple, prioritising key endpoints; and
  • Produce a clear data output that can be easily used for reporting to monitor changes over time. It must show how error rates ramp up under additional load.

Image: K6 by Grafana Labs logo (source: https://k6.io/)

Why K6?

Once we had clear objectives, the next step was deciding on a tool to use for running load tests. After some investigation and discussion we finally landed on K6 for a number of reasons:

Javascript & Developer Experience

A key selling point of K6 for us was that all load tests are written in JavaScript. This aligns perfectly with our stack at Kooth, which primarily consists of TypeScript (React & Node.js), meaning any of our engineers can contribute with minimal onboarding. Writing tests in JavaScript also lets us store all load testing scripts in code, which is much harder with tools built around their own dedicated syntax or UI, such as JMeter. We also felt K6 was better geared towards our team structure, where we don’t have an explicit QA role and testing is everyone’s responsibility, carried out by full-stack engineers. Finally, K6’s popularity and large community support were another key advantage.
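
To illustrate, a minimal K6 test is just ordinary JavaScript. Here is a sketch, using a placeholder endpoint:

```javascript
import http from 'k6/http';
import { check, sleep } from 'k6';

export const options = {
  vus: 10,          // 10 concurrent virtual users
  duration: '30s',  // run for 30 seconds
};

export default function () {
  // Placeholder URL; in practice this would be one of our API endpoints.
  const res = http.get('https://test.example.com/api/health');
  check(res, { 'status is 200': (r) => r.status === 200 });
  sleep(1); // brief pause to pace each virtual user
}
```

Anyone comfortable with JavaScript can read and extend a script like this without learning a bespoke DSL.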

Flexibility

K6 provides extensive flexibility in the types of tests that can be conducted, allowing us to define precise changes to the system load over time, separated into ‘stages’. It enables the implementation of all six load test variations depicted in the graph above. Being script-based and using JavaScript gives us greater flexibility when writing our tests. This can be extremely helpful when navigating authentication hurdles or executing intricate test scenarios.
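
For example, an average-load profile can be described declaratively through stages, and raising the targets beyond the anticipated average turns the same script into a stress test. A sketch, with placeholder numbers:

```javascript
import http from 'k6/http';

// Average-load profile: ramp up to typical traffic, hold, then ramp down.
// Raising the targets beyond the anticipated average turns this into a
// stress test; the numbers here are placeholders, not our real loads.
export const options = {
  stages: [
    { duration: '2m', target: 50 }, // ramp up to 50 virtual users
    { duration: '5m', target: 50 }, // hold steady at typical load
    { duration: '1m', target: 0 },  // ramp down
  ],
};

export default function () {
  http.get('https://test.example.com/api/health'); // placeholder endpoint
}
```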

Additionally, K6 provides open-source extensions that enhance its functionality, enabling features such as dashboards, integration with Kubernetes (K8s), and the ability to send output data to specific services like Prometheus.
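
As a rough sketch of how these pieces are wired up (the exact commands and flag names vary between k6 versions):

```bash
# Build a custom k6 binary that bundles an extension, using Grafana's xk6 tool.
xk6 build --with github.com/grafana/xk6-dashboard@latest

# Stream results to Prometheus via remote write, a built-in output in recent k6 releases.
k6 run --out experimental-prometheus-rw load-test.js
```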

Finally, it also offers ‘Hybrid’ testing, which adds browser-based front-end testing. However, we discovered that this feature is still in its early development stages and lacks maturity, with the syntax for querying elements on the page proving problematic and requiring workarounds. We will continue to monitor this feature closely and hope to adopt it in the future, as consolidating both our front-end and back-end tests into a single tool would streamline our testing process.

Cloud compatibility & Grafana

Although K6 can very easily be set up to run on a local machine, which made a working proof of concept a feasible objective for this project, it can also be ported to cloud environments. K6 being owned by Grafana also raised our hopes that we would be able to integrate our load test results into our existing Grafana monitoring dashboards.

What to test?

We decided to do a deep dive into our existing data using Honeycomb, an observability platform that gives us detailed insight into our APIs. This helped us determine which endpoints we wanted to hit in our tests and what load to put the system under.

We explored a variety of metrics we could use to identify the endpoints to initially focus our efforts on, the key ones being:

  • Count of API calls per endpoint
  • Average duration of API calls per endpoint
  • Sum duration of API calls per endpoint

After investigating these metrics, we settled on “sum duration of API calls per endpoint” to identify the endpoints to test, since it accounts for both how frequently an endpoint is hit and how slow it is, showing which endpoints consume the most resources. This mattered because we were focusing on database performance, which typically has the biggest impact on overall performance: the most-hit endpoints aren’t necessarily the ones that put the most load on the database if they are very efficient, and the slowest endpoints might not be called very often.

We looked at these metrics over the spread of a week in order to get a fair representation of our load across different days of the week. We also made the assumption that a longer-running API call is often due to longer-running queries in the database; this is something we would like to validate in future iterations, but it served as an initial guide for building tests. Finally, we noted the typical and peak loads on these endpoints so we could replicate them in our K6 tests.
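
For illustration, this is roughly the shape of such a query against Honeycomb’s Query API; the column and field names below are assumptions that depend on how our services are instrumented:

```json
{
  "time_range": 604800,
  "calculations": [{ "op": "SUM", "column": "duration_ms" }],
  "breakdowns": ["http.route"],
  "orders": [{ "op": "SUM", "column": "duration_ms", "order": "descending" }]
}
```

Here time_range is expressed in seconds, covering the week-long window described above.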

Visualising load tests

By default, K6 outputs its data into a JSON file as well as displaying a summary directly in the terminal. Although these outputs are highly detailed, we wanted a more visual and intuitive way of interpreting our load test results. For this we decided to use the extension xk6-dashboard, which gave us a graphical representation of our load tests in real time as well as a dashboard as an output.
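
Running a test with the dashboard enabled looks roughly like the following; the output name and default port depend on the version of the extension:

```bash
# Custom k6 binary built with xk6-dashboard; this serves a live dashboard
# in the browser while the test runs and still writes the raw JSON output.
./k6 run --out dashboard --out json=results.json load-test.js
```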

Image: xk6-dashboard showing a stress test as throughput is being ramped up in real time.

Some of the key metrics we focused on include (see the thresholds sketch after this list):

  • Throughput over time — Helps us monitor the load we are applying to the system over time, and can help us identify the type of load test being conducted.
  • Request duration over time — As the load increases, we aim for request duration not to increase significantly, unless we are carrying out a breakpoint test.
  • Error rate over time — This can alert us to issues and, when combined with “throughput over time”, can tell us at what loads our system starts to encounter issues.

Image: xk6-dashboard showing an increase in error rates as the system reaches its breakpoint.
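
K6 can also encode limits on these metrics as pass/fail criteria using thresholds, so a run fails automatically when they are breached. A minimal sketch, with placeholder limits and endpoint:

```javascript
import http from 'k6/http';

export const options = {
  stages: [{ duration: '5m', target: 100 }], // placeholder load profile
  thresholds: {
    // Fail the run if the 95th-percentile request duration exceeds 500ms...
    http_req_duration: ['p(95)<500'],
    // ...or if more than 1% of requests fail.
    http_req_failed: ['rate<0.01'],
  },
};

export default function () {
  http.get('https://test.example.com/api/health'); // placeholder endpoint
}
```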

Further areas of improvement

While we were able to successfully produce load test results, there are a number of areas that we’d like to explore in further iterations.

Scaling & load testing environments

In order to make the most out of our load test results, we would like to run our load tests in the cloud, automatically at regular intervals. This will allow us to constantly capture and monitor the performance of our systems as we make changes.

It would also be beneficial to have a dedicated load testing environment that we can temporarily scale to match the resources we have available in our live environment, to make our tests as accurate as possible. Once the load tests are done, we can automatically downsize our systems in the load testing environment to minimise costs. While we don’t want to run load tests against a live environment to avoid polluting the database with test data, we do want to run tests that match our real average & peak loads.

Data visualisation & storage

The dashboards in xk6-dashboard offer a detailed view of an individual load test. However, they are not as useful when running a large number of load tests for different user journeys or to visualise how our load test results vary over time. We need to explore better ways to aggregate this data in the future and present it in an intuitive way. We also aim to integrate our load test output with our existing Grafana dashboards, in order to keep our performance monitoring in a centralised place. This will also require decisions around which load tests to capture and how frequently we should run them.

Extend to more complex user journeys

Finally, we will also need to consider how to extend these load tests to more complex user journeys. Some areas of our system require more complex authentication methods for accurate testing. We will also need to consider data clean-up, which we were able to avoid this time by focusing mostly on testing GET requests rather than POST requests, meaning our load tests didn’t write any data to our systems.
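
One pattern we could build on is K6’s setup() lifecycle function, which runs once before the test begins and can hand a session token to every virtual user. A sketch, with hypothetical endpoints and credentials:

```javascript
import http from 'k6/http';

// setup() runs once before the load test starts; its return value is
// passed to every iteration of the default function.
export function setup() {
  // Hypothetical login endpoint and credentials.
  const res = http.post('https://test.example.com/api/login', {
    username: 'load-test-user',
    password: 'not-a-real-password',
  });
  return { token: res.json('token') };
}

export default function (data) {
  // Authenticated GET request; no data is written to the system.
  http.get('https://test.example.com/api/protected', {
    headers: { Authorization: `Bearer ${data.token}` },
  });
}
```

Centralising authentication like this should let us extend coverage to protected journeys without complicating each individual test.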
