Intentional denial of service — Load Testing

Emanuel Velho
8 min read · Jan 12, 2018


At Unbabel, we find ourselves with the power to change the world. As incredible as it sounds, we must first ensure that we don’t end up destroying it in the process. Like Peter Parker’s uncle once said, “with great power comes great responsibility”.

Every day we create and enhance systems and applications that must not only work with everything that came before, but also become more performant and robust overall. While unit test coverage, in-depth end-to-end system integration tests and exploratory tests ensure that our changes are functionally correct, we also need to know that we're on the right track towards a scalable solution, and not just building a different version of what came before.

The path to perfection is long and never-ending, but to better understand where we are along it, we use Load Testing. A non-functional test technique, it allows us to assess how an application behaves under a high volume of transactions, as if all our users started making requests to the system at the same time. This way we can trigger errors in a controlled environment and validate whether our applications can handle random peaks of high usage, and whether they degrade gracefully or succumb to the load and crash not so gracefully.

How do we do it

At Unbabel, we start by identifying which services may constitute bottlenecks in our pipeline; either because of their extreme complexity, or because they are known to be resource incinerators (CPU, RAM, Disk IO, etc.).

When we identify a service that falls into one of these categories, we start with some hypotheses for analysing it. Let's take a hypothetical (and simplified) makeover of a service that receives two sets of data as input. The makeover itself isn't important here, so we will skip over those details. This service must return a response in less than 300 ms, and we estimate that it should be able to satisfy 400 requests per second.

In terms of tools, we chose JMeter, an open-source tool from the Apache Software Foundation, which we use to load, measure and analyse the performance of our services.

We start by creating a test plan that will be used to validate our hypothesis and later to compare results.

On JMeter:

1. Add a Thread Group to our test plan
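The exact Thread Group values aren't critical here, but as a rough, illustrative sketch, a configuration along these lines would produce the 100K samples analysed later (100 threads looping 1,000 times each; the ramp-up value is an assumption):

  Number of Threads (users): 100
  Ramp-up period (seconds):  10
  Loop Count:                1000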

2. Add an HTTP Request sampler to our Thread Group and configure it to perform requests against our service
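As a minimal sketch of the sampler's body (the field names are hypothetical; what matters is that the values come from JMeter variables), the JSON sent to the service could look like this:

  {
    "set1": "${set1}",
    "set2": "${set2}"
  }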

Did you notice that we added ${set1} and ${set2} to our input request? This is how we make our test plan data-driven. Let's create a CSV file in the same folder as our test plan, where we set different values for each test scenario.

This file has three columns: set1 and set2, which are sent as input, and result, which is used to assert the request's response.
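For illustration (the values below are made up), the file could look like this:

  set1,set2,result
  valueA1,valueA2,expectedA
  valueB1,valueB2,expectedB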

3. Add a CSV Data Set Config element to allow us to use the CSV file

4. Also add a JSON Path Assertion and set it to assert the output's value as below
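As a sketch, and assuming the service returns its answer in a top-level result field (an assumption, since the real response format isn't shown here), the assertion could be configured roughly like this, so each scenario is validated against its own expected output from the CSV:

  JSON Path:       $.result
  Expected Value:  ${result}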

5. We must also add another config element, the HTTP Header Manager, to our test plan, so we can set our request's Content-Type header to application/json

6. Let's add a View Results Tree listener so we can check each request's result, then hit the run button to confirm that everything is working:

7. Finally, we add the remaining listeners that we'll use to analyse the results:

Test Plan Listeners

Please note that these listeners come from the "Graphs Generator Listener" plugin, so you have to install it either through JMeter's Plugins Manager or by downloading it from the plugin page, unzipping it and dropping it into JMeter's "lib/ext" folder (e.g. "jmeter/#version/libexec/lib/ext/").

Now we are ready to run the test!
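As a side note, if you prefer to keep long runs outside the GUI, the same plan can be executed from the command line in non-GUI mode and the results file loaded into the listeners afterwards; a typical invocation (file names are illustrative) looks like this:

  jmeter -n -t load-test.jmx -l results.jtl

Here -n runs JMeter in non-GUI mode, -t points to the test plan and -l tells JMeter where to write the samples.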

We start by analysing the results with the Summary Result listener:

Summary Result

We ran 100K samples with no errors and an average response time of 347 ms; the standard deviation of the response time is around 23 ms.

In JMeter, throughput is the ratio of requests to elapsed time, and in this report we can see that the average throughput is 270 rps (requests/transactions per second), meaning the 100K samples took just over six minutes to complete.

Although this already hints that our service doesn't satisfy the performance requirements, we should check our Over Time results so we can better understand what's wrong and find out what to improve.

Transactions Per Second Over Time

In the Transactions Per Second Over Time listener, we can see that the throughput is not evenly distributed. Despite being able to reach the 300 rps mark, we were not able to keep it steady, which means we are not able to process the required throughput.

Response Times Over Time

As the name indicates, this represents the time spent connecting, processing and receiving each request, over time. We can see that response times are consistently above 300 ms, which means we are also not complying with the requirements.

Response Times vs Threads

This listener is used to understand the effort that each additional concurrent user puts on our application, and to check when performance starts degrading. We may accept that a higher number of concurrent threads/users leads to a small increase in response time, but ideally it should be steady, with almost no oscillation. In our results we swing between 320 ms and 365 ms, which is acceptable but a further sign that we need to make our service faster.

Connect Times Over Time

This graph illustrates the time elapsed until we connect to the server, including the SSL handshake, per request over time. Connect Time can significantly impact the response time, but it also depends on external factors such as our internet connection.

As we can see, it takes at most 25 ms to connect to the server, so it looks like we are not facing any problems on this front.

Response Codes per Second

As we didn't have any errors in our requests, this graph is the same as the Transactions per Second one, but Response Codes per Second is useful for assessing whether our system can recover easily from errors. If we observe that after HTTP error codes such as 400s the number of 200s decreases, we should take a look at how we recover from errors.

The Refactor

After analysing our code, we found that we had made it overly complex, and we were able to simplify the service and make the code more efficient.

After ensuring that the service still behaved as expected functionally, we needed to run the load tests again to check whether our refactor paid off.

So let’s check the results:

This is our summary report after the changes:

Now the average throughput is 450 rps and the average response time hits 200 ms, with no errors at all.

So we’re compliant with the requirements! Aw yeah!

Still, let's not take anything for granted and look a bit further.

Transactions per second

This is the current behaviour of our service under load. We managed to perform the same 100K requests in 3 minutes and 40 seconds, which works out to roughly 455 requests per second and matches the summary report.

As this is an academic example, we didn't stretch out the period under stress, but in real life we should. The easiest way to do it is to go to the test plan's Thread Group and either set the loop to run continuously for a longer period of time, or simply set a higher loop count.
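As a sketch of the first option (the duration below is illustrative), that means changing two fields in the Thread Group used above:

  Loop Count:          Infinite
  Duration (seconds):  1800    (with the scheduler enabled, this keeps the load running for 30 minutes)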

Response Times Over Time

With the exception of 3 outliers, we managed to keep our response times well under 300 ms. So this is also a check on our list.

Response times that continually increase over time would be a strong sign that our service has a bottleneck that needs to be fixed.

Response Times vs Threads

We've enabled our server to handle 100 concurrent users in under 300 ms. Still, we could try to find out at how many concurrent users we start having a hard time. To do this, we would increase the number of threads and the ramp-up period in our Thread Group configuration.

Connect Times Over Time

Connect times were also as expected, close to 0 with a few acceptable outliers.

If in your tests you get high Connect Times relative to the total Response Time, try pinging other servers. If the ping time is also significant, it may indicate that your internet connection is affecting your Connect Time, so you should check your connection as well.

Response Codes per Second

As expected, since there were no errors, we ended up again with a graph similar to the Transactions per Second one. The less colourful this graph is, the happier we are.

To sum up:

We covered some of the most common aspects of load tests:

  • Assessing non-functional requirements such as minimum requests per second, peak usage and average response time with Transactions Per Second, Response Times vs Threads and Response Times Over Time;
  • Looking for bottlenecks in our services by analysing Response Times vs Threads;
  • Determining whether our tests are being affected by a poor internet connection with Connect Times Over Time;
  • And finally, checking whether our system can handle errors and recover from them with Response Codes per Second.

Emanuel Velho
QA Engineer @ Unbabel
