Using load testing to give us confidence making changes

Vanita Barrett-Smith
Which? Product Delivery
6 min read · Jun 26, 2024
A screengrab of Apache JMeter

Hello, I’m Vanita, a Senior Product Engineer at Which? and I currently work in the Content Discovery squad. We help guide users to the Which? digital content and services that will make their lives easier.

Recently, our squad needed to make a change to turn off caching for a certain part of the Which? website. The infrastructure behind this service had recently been boosted and, based on the dashboards, was performing well. It was tempting to ‘just do it’: turn the cache off and see what happened. However, previous performance testing showed that this service used to struggle without a cache, especially with the volume of requests we see in production. We needed more data to give us confidence in making this change.

What is Load Testing and Why Does it Matter?

We’ve all seen websites go down when there’s a big product launch or a new announcement. If a site can’t handle an increase in traffic, users may see slow loading pages or error pages, leading to a poor user experience and potential loss of business.

Load testing is like giving your website a stress test. It helps you assess how your website performs when lots of people are using it simultaneously and at varying levels of traffic. It also helps you identify performance bottlenecks and potential points of failure. By finding these issues under simulated load, you can make changes that give you more confidence in your website’s ability to handle sudden peaks in traffic. As in our scenario, it can also give you the confidence to make certain changes.

Getting Started with Load Testing

A load testing tool is a software application designed to simulate real-world user activity and assess how a web application performs under varying levels of load. At Which? we currently use Apache JMeter for load testing, but there are many different tools out there, including LoadRunner and Gatling. Regardless of the tool you use, there are some common steps to getting started with load testing.

1. Define Your Test Plan Scenario

Outline what you want to test during the load test. Determine which aspects of your website you want to assess and which user actions you want to simulate. This forms the basis of your test plan.

In this scenario, we wanted to check the impact of turning off the cache for an API service. Therefore, our test plan involved making requests to the correct API route and ensuring we received a 200 HTTP response, even under 2 or 3x the daily average traffic.
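
Our actual test plan was built in JMeter rather than in code, but to make the scenario concrete, here’s a rough Python sketch of the same check. The URL is a placeholder, not our real route, so treat this purely as an illustration of what the scenario boils down to.

```python
# A rough sketch of the scenario above: hit the API route and check for a
# 200 response. The host and path are placeholders, not our real endpoint;
# the real test plan expressed the same check in JMeter.
import requests

API_URL = "https://test.example.com/api/content"  # placeholder URL

def check_endpoint() -> bool:
    """Make a single request and confirm we get a 200 back."""
    response = requests.get(API_URL, timeout=5)
    return response.status_code == 200

if __name__ == "__main__":
    assert check_endpoint(), "Endpoint did not return a 200 response"
    print("Scenario check passed: endpoint returned 200")
```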

2. Define Your Test Plan Load Levels

Decide on the number of virtual users or requests you want to simulate and how quickly you want them to ramp up. It can be difficult to know where to start without any existing data, so try looking at dashboards or logs to gauge what “normal” levels of traffic look like for your service or application. Also identify “peaks”: for example, if you expect more visitors over Christmas, look at the maximum number of visitors your site received during that period.

For example, from the graph below we can see that traffic follows a regular pattern: it dips overnight and then peaks during the middle of the day at about 3x the overnight traffic. So we can start our load test at the lowest average number of requests and increase it up to the maximum and beyond, to test how the service performs once traffic climbs higher than we normally see.

A graph showing the number of requests to an endpoint over time. The graph is very spiky, showing regular spikes throughout the day
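
To make that staged approach concrete, here’s a rough sketch of how the load levels might be written down before translating them into your tool’s settings (in JMeter, these map onto Thread Group and timer settings). The request rates below are made-up placeholders, not our real traffic figures.

```python
# Illustrative load stages for a test plan. The request rates are made-up
# placeholders; in practice we took the overnight low and midday peak from
# our dashboards, then pushed past the peak.
LOAD_STAGES = [
    {"name": "overnight low", "requests_per_sec": 50,  "ramp_up_secs": 60},
    {"name": "daily average", "requests_per_sec": 100, "ramp_up_secs": 60},
    {"name": "midday peak",   "requests_per_sec": 150, "ramp_up_secs": 60},
    {"name": "2x peak",       "requests_per_sec": 300, "ramp_up_secs": 120},
    {"name": "3x peak",       "requests_per_sec": 450, "ramp_up_secs": 120},
]

for stage in LOAD_STAGES:
    print(f"{stage['name']}: {stage['requests_per_sec']} req/s, "
          f"ramping up over {stage['ramp_up_secs']}s")
```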

3. Set Up Your Test Environment

Ensure your test environment mirrors your production environment as closely as possible. This means using similar hardware, software configurations, and network settings. Aligning your test environment with production helps ensure more accurate test results.

In this test, we made sure that our test environment had the same CloudFront cache set-up as production. We also increased the number of ECS tasks running to match production.
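
As an illustration of that last point, here’s a minimal boto3 sketch of matching a test service’s ECS task count to production. The cluster and service names are placeholders, and this isn’t necessarily how we did it in practice.

```python
# A minimal sketch (placeholder cluster/service names) of scaling a test
# environment's ECS service to run the same number of tasks as production.
import boto3

ecs = boto3.client("ecs")

def match_task_count(prod_cluster: str, test_cluster: str, service: str) -> None:
    """Set the test service's desired task count to whatever production runs."""
    prod = ecs.describe_services(cluster=prod_cluster, services=[service])
    desired = prod["services"][0]["desiredCount"]
    ecs.update_service(cluster=test_cluster, service=service, desiredCount=desired)
    print(f"Scaled {service} in {test_cluster} to {desired} tasks")

if __name__ == "__main__":
    # Placeholder names, not our real infrastructure
    match_task_count("prod-cluster", "test-cluster", "content-api")
```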

4. Run the Test(s)

Execute the load test and monitor your website’s performance under simulated conditions. You may want to run the test multiple times, with varying load levels, to assess how your service performs under light, medium and heavy traffic. Pay attention to metrics such as response times, throughput, and error rates to identify any performance bottlenecks or issues.
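
Whichever tool you use, the raw results need turning into those metrics. As a rough illustration, here’s a short Python sketch that computes throughput, error rate and a p95 response time from a JMeter results file, assuming the default CSV output (which includes a header row with timeStamp, elapsed and success columns). The file name is a placeholder.

```python
# A rough sketch of pulling headline metrics out of a JMeter .jtl results
# file in its default CSV form. The file name is a placeholder.
import csv

def summarise(results_file: str) -> None:
    elapsed_ms, timestamps, successes = [], [], 0
    with open(results_file, newline="") as f:
        for row in csv.DictReader(f):
            elapsed_ms.append(int(row["elapsed"]))
            timestamps.append(int(row["timeStamp"]))
            if row["success"] == "true":
                successes += 1

    total = len(elapsed_ms)
    duration_secs = (max(timestamps) - min(timestamps)) / 1000 or 1
    elapsed_ms.sort()
    p95 = elapsed_ms[int(0.95 * (total - 1))]

    print(f"requests:     {total}")
    print(f"throughput:   {total / duration_secs:.1f} req/s")
    print(f"error rate:   {100 * (total - successes) / total:.2f}%")
    print(f"p95 response: {p95} ms")

if __name__ == "__main__":
    summarise("results.jtl")  # placeholder path
```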

We wanted to test the impact of a specific change, so we ran the test twice for each load level: once with the cache as normal and once with it removed. We kept all other parameters the same so that any changes in the results could be attributed solely to the cache removal.

During the first test we observed long response times even when the system was under relatively normal load. This seemed unusual and, digging into it further, we realised that the slowness was due to a dependent service being called by our API. This service hadn’t been scaled to match production levels and was therefore acting as a bottleneck. Once we’d bumped up the infrastructure behind this dependent service, we reran the test.

Findings

This is a summary of our load test results:

The results show that with the cache removed, we do hit a point at which the service starts to show signs that it’s struggling (particularly in the third and fourth tests), but even then we’re still getting a response within 2 seconds. CPU usage goes over 50%, but this isn’t a big concern for us as the service is set to auto-scale. The main takeaway is that the service only starts to struggle at a very high request rate, much higher than we have ever seen; under normal load, it performs fine even with no caching.

Although this testing did delay the change we wanted to make, it gave us valuable insights that made us far more confident in making it. As a result, we were able to turn off the cache on the 16th of April without any downtime or negative impact for users. The graphs below show the point at which we did this: there’s a notable decrease in cache hit ratio (the ratio includes other services that still have a cache, which is why it doesn’t fall to 0) and a very slight increase in CPU. More importantly, this wasn’t a surprise to us, as our load testing had already given us an indication of the impact of the change.

A graph showing CPU utilization for a service — about halfway along the CPU utilization increases very slightly
A graph showing cache hit rate for a service. About halfway through, the cache hit rate drops significantly

Load testing is an essential practice for ensuring your website can handle the demands of real-world usage. By following these steps and using the right tools, you can gain valuable insights into your website’s performance and make informed decisions to optimise scalability and reliability. So, take the time to conduct load tests regularly and keep your website running smoothly for your users.

Which? is the UK’s consumer champion, here to make life simpler, fairer and safer for everyone. As an organisation we’re not for profit and all for making consumers more powerful. Read more about Which? on our website.

If you’re interested in championing consumers with us, you can check out our open job vacancies on our careers site.
