Load testing and improving the performance of my HTTP Server

Codeidoscope
15 min read · Feb 28, 2019


The last few tickets I had to tackle as part of my HTTP Server project were to do with load testing and performance improvements. While they proved to be my least enjoyable tickets and turned out to be an endless source of frustration, I’m hoping this post will force me to take a better look at what I did not understand, and will serve as a primer for the next time I have to tackle a similar issue (please, please, please, let it be never…).

What on earth is load testing?

Wikipedia puts it more succinctly than I ever could, so let me copy/paste the definition it gives:

Load testing is the process of putting demand on a system and measuring its response.

Just the same way pieces of furniture from Ikea are repeatedly punched by machines to test how well they cope with repeated use, load testing a server involves sending it a bazillion requests and seeing what it says.

What exactly are we interested in when we test the performance of a server?

There are several things that we can measure for and prioritise depending on our goals, such as scalability, reliability, or resource usage (thanks, Wikipedia). There are also different types of testing that can be conducted, such as stress testing and isolation testing, but I’m going to focus on load testing in this article.

In my case, I was to load-test my server so that ultimately, I could find a way to ensure memory usage would not increase during a performance run, regardless of the stress I was putting my server under. I suspect another reason why I was given this task was to shed more light on the architecture of my codebase, as it is often the reason behind poor performance.

Let’s look at the tickets I was set, and delve deeper into how they could help me improve my server:

If you’ve read this ticket and thought “what fresh hell is all this?”, join the club. ApacheBench (ab) is a tool to measure the performance of HTTP servers, and I’ll talk about it later when I tackle tools. If you’re on a recent version of Mac OS, ab comes prepackaged with it, so you do not need to install it (there, hopefully I saved you the 20 minutes I spent trying to work out how to install it when I already had it).

Simulating concurrent requests just means that the tool will pretend to be a crowd of users all trying to use my server at the same time (50,000 requests in total, 100 of them at any given moment). The goal is for all these users to be able to use my server without any issue, so they should all see their page appear in front of them, which a 200 response should guarantee.

This highlighted the fact that my server was not equipped to deal with this many requests at once, and the aim of the ticket was therefore to go back to my codebase, identify how I could improve that, and implement it. I will be discussing my solutions later in this post.

The first time I saw this ticket, I thought I’d understand it after tackling the one before it. Then I did tackle the previous ticket, and found myself still a little baffled by what I was supposed to do, because I wasn’t sure how I was meant to use the 1GB file. Eventually, it turned out I merely needed to send a request for my large file (http://localhost:8080/verylargefile.pdf) and find a tool that would show me a graph of memory usage as it was happening. I was told to use VisualVM, which I will also discuss later.

However, this is where it got complicated, because while I understood what I was meant to achieve (ensuring that a large number of requests for a large file would succeed), figuring out how to do so proved tricky, as I couldn’t see how changing my code impacted the performance of my server.

Here, all I needed was a graph that showed a straight horizontal line when I ran my performance run. Easy, right? Well, kind of. Most of the groundwork ended up being completed during the previous ticket, and this ticket only required some final touches that would smooth that graph over.

Ultimately, the last two tickets forced me to look at how I was writing my files to the OutputStream and to figure out a way to improve it so that I could successfully serve large numbers of concurrent requests.

What does it mean to improve performance?

Performance is usually measured according to a number of “key performance indicators”, of which the following are examples:

  • Availability checks for outages. If the user cannot use the application because it is not responding, you probably want to optimise your code for that, as it could be costly to a company that conducts its business online to be unavailable to its customers.
  • Response time checks how long it takes an application to respond to a user request. Slow pages make for frustrating user experiences, and may also lead users to stop using the service if the problem is frequent.

These two indicators represent service-oriented indicators, as they visibly impact application users. In contrast, the following two indicators are efficiency-oriented indicators, and measure how the application uses the hosting infrastructure:

  • Throughput is the rate at which application-oriented events occur, for example the number of requests made to a web page in a given amount of time. How many use cases can be handled simultaneously during a given time period, or how fast these use cases are handled fall under throughput-related concerns.
  • Utilisation is the percentage of the theoretical capacity of a resource that is being used, such as the amount of memory used by a server being hit with a large number of requests.

It is these efficiency-oriented indicators that I was tasked with focusing on as part of my project, and they are the ones I will concentrate on throughout the rest of this post.

In order to understand better how performance can be improved via efficiency-oriented indicators, let’s look at what poor throughput and utilisation mean and how they can be identified.

A reduced throughput can often be indicative of “the capacity limitations of an application, notably in the web or application server tier”. This means that as the number of requests increases, your application has less and less capacity to handle them, and may end up crashing. You can observe this by using a graph tool to follow the throughput results (i.e. the requests being sent to the server).

Utilisation, interestingly, goes the other way. While we might want as much throughput as possible, or at least an increased throughput, it’s usually preferable to have an application use less of the capacity of a resource, as it may reduce costs, and is also likely to ensure a better service when user numbers increase. Steadily decreasing free space (it will show on your graph as an upward curve) can indicate a memory leak, which will need to be identified and sorted to improve utilisation.

What tools can I use to measure performance?

As I was working on these tickets, I used the following three tools to measure performance:

  • ApacheBench — a CLI tool which measures the performance of HTTP servers
  • Apache JMeter — an application designed for load testing and measuring performance
  • VisualVM — a tool providing a visual interface to observe applications while they are running on the Java Virtual Machine.

JMeter

I will not dig very deep into JMeter, which I only used when I faced issues with ApacheBench. I ended up running a test plan with it (I found this post and this post to be quite useful for setting it up), which allowed me to see at what point my requests would time out, a level of granularity ab wouldn’t give me, as it would only tell me that a timeout had occurred.
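
For the record, once a test plan has been saved from the GUI, JMeter can also replay it headlessly from the command line. Something like the following should work (the file names here are just examples):

$ jmeter -n -t load-test-plan.jmx -l results.jtl

Here -n runs JMeter in non-GUI mode, -t points at the saved test plan and -l writes the results to a file you can inspect afterwards.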

ApacheBench

Now let’s have a closer look at the ab command in my very first ticket (ish — I changed the port): ab -n 50000 -c 100 http://localhost:8080/

  • ab invokes the programme
  • -n denotes the number of requests to perform (50,000 in this case)
  • -c denotes the concurrency, or number of multiple requests to make at a time. Here, I should have 100 requests running at a time, until I get to a total of 50,000 requests.
  • http://localhost:8080/ is the page I am testing my server with.

This is what a successful result looks like:

$ ab -n 50000 -c 100 http://localhost:8080/
This is ApacheBench, Version 2.3 <$Revision: 1826891 $>
Copyright 1996 Adam Twiss, Zeus Technology Ltd, http://www.zeustech.net/
Licensed to The Apache Software Foundation, http://www.apache.org/
Benchmarking localhost (be patient)
Completed 5000 requests
Completed 10000 requests
Completed 15000 requests
Completed 20000 requests
Completed 25000 requests
Completed 30000 requests
Completed 35000 requests
Completed 40000 requests
Completed 45000 requests
Completed 50000 requests
Finished 50000 requests
Server Software:
Server Hostname: localhost
Server Port: 8080
Document Path: /
Document Length: 0 bytes
Concurrency Level: 100
Time taken for tests: 107.157 seconds
Complete requests: 50000
Failed requests: 0
Total transferred: 45500000 bytes
HTML transferred: 0 bytes
Requests per second: 466.61 [#/sec] (mean)
Time per request: 214.314 [ms] (mean)
Time per request: 2.143 [ms] (mean, across all concurrent requests)
Transfer rate: 414.66 [Kbytes/sec] received
Connection Times (ms)
min mean[+/-sd] median max
Connect: 0 2 175.5 0 19583
Processing: 0 0 0.2 0 21
Waiting: 0 0 0.2 0 20
Total: 0 2 175.6 0 19587
Percentage of the requests served within a certain time (ms)
50% 0
66% 0
75% 0
80% 0
90% 0
95% 0
98% 1
99% 1
100% 19587 (longest request)

You will notice that it took a while to complete, but that the Failed requests field shows a lovely little 0. Interestingly, I have not been able to reproduce this result consistently, mostly because it was a fluke: not on my server’s part, but because Mac OS and ApacheBench don’t play very nicely together. Since the “default ephemeral port range on osx is 49152–65535, which is only 16,383 ports”, my performance run usually yields the following result:

$ ab -n 50000 -c 100 http://localhost:8080/
This is ApacheBench, Version 2.3 <$Revision: 1826891 $>
Copyright 1996 Adam Twiss, Zeus Technology Ltd, http://www.zeustech.net/
Licensed to The Apache Software Foundation, http://www.apache.org/
Benchmarking localhost (be patient)
Completed 5000 requests
Completed 10000 requests
Completed 15000 requests
apr_socket_recv: Operation timed out (60)
Total of 16505 requests completed

I am unsure why it ever worked in the first place, although I’ve also had good results using the -k (Use HTTP KeepAlive feature) and -r (Don’t exit on socket receive errors.) flags.
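
With those two flags added, the run looks like this:

$ ab -n 50000 -c 100 -k -r http://localhost:8080/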

So how can I interpret the results ab has given me? The first block progresses as the test is running, indicating the milestones completed and errors or success encountered:

Completed 5000 requests
Completed 10000 requests
Completed 15000 requests
Completed 20000 requests
Completed 25000 requests
Completed 30000 requests
Completed 35000 requests
Completed 40000 requests
Completed 45000 requests
Completed 50000 requests
Finished 50000 requests

The next block gives you details of the server and the document. In this case, I called localhost:8080 with a path of /, which corresponds to a directory, hence the Document Length of 0.

Server Software:
Server Hostname: localhost
Server Port: 8080
Document Path: /
Document Length: 0 bytes

Calling the path /textfile.txt results in the following:

Document Path:          /textfile.txt
Document Length: 903 bytes

The block coming next recaps the details of your test: the concurrency level (which I’d set to 100), the number of requests to send (set to 50,000), the total time the test took (107.157 seconds), etc. The four fields following HTML transferred offer some statistics about the performance run: the mean request rate, the mean time per request, and the transfer rate:

Concurrency Level:      100
Time taken for tests: 107.157 seconds
Complete requests: 50000
Failed requests: 0
Total transferred: 45500000 bytes
HTML transferred: 0 bytes
Requests per second: 466.61 [#/sec] (mean)
Time per request: 214.314 [ms] (mean)
Time per request: 2.143 [ms] (mean, across all concurrent requests)
Transfer rate: 414.66 [Kbytes/sec] received
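
As a sanity check, these figures hang together: Requests per second is just 50,000 requests divided by 107.157 seconds, which gives 466.61; the mean Time per request is the concurrency level divided by that rate (100 / 466.61 ≈ 0.214 s, i.e. 214.314 ms); and dividing that by the 100 concurrent requests gives the 2.143 ms figure. Likewise, 45,500,000 bytes transferred over 107.157 seconds works out at roughly 414.66 kilobytes per second.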

Up next are the connection times. This part confused me a fair amount, mostly because I wasn’t sure what was good and what was not. Connect and Waiting are the amount of time it took to establish the connection and get the first bits of a response. Processing corresponds to the server response time, i.e. the time it took for the server to process the request and send a reply. Finally, the Total time is the sum of the Connect and Processing times. If you would like to see a timeline for it, I recommend reading this StackOverflow answer.

Connection Times (ms)
min mean[+/-sd] median max
Connect: 0 2 175.5 0 19583
Processing: 0 0 0.2 0 21
Waiting: 0 0 0.2 0 20
Total: 0 2 175.6 0 19587

The thing that really baffled me here, however, is the number 19583, the maximum Connect time. Looking at the mean, it is also considerably higher for Connect than it is for the other rows. If, like me, you know next to nothing about stats, here’s the primer for dummies: sd stands for standard deviation, a quantity that shows by how much the members of a group differ from the mean value of the group, and it is used to determine how accurate the testing is. In this case, you can see the deviation from the mean is quite large, indicating something wasn’t quite right.

I am not sure why that was the case (I suspect it had to do with the fact that it took a while to send 50,000 requests), but potential solutions include re-running the test just after rebooting the server, or “warming up” caches by running the test a few times and discarding those results, as suggested in this StackOverflow response.

Finally, the last block shows you the percentage of requests served within a given timeframe, which will allow you to formulate a response-time distribution graph, should you so desire:

Percentage of the requests served within a certain time (ms)
50% 0
66% 0
75% 0
80% 0
90% 0
95% 0
98% 1
99% 1
100% 19587 (longest request)

VisualVM

As I mentioned earlier, VisualVM is a tool that will allow you to represent the processes of your Java application in graph form and in real time, and end up with something looking like this:

In order to use it, all that’s needed is to install it, open it and select the Java process you would like to observe. Selecting the Monitor option will give you access to the real-time graphs. I have found this guide to be very helpful in understanding how VisualVM works.
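
For what it’s worth, if you’re on JDK 8 or earlier, VisualVM is bundled with the JDK and can be opened by running jvisualvm in a terminal; newer JDKs no longer ship it, so it has to be downloaded separately from https://visualvm.github.io/.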

  • The top-left corner is the CPU graph, which shows the application’s usage of your machine’s resources as a percentage.
  • The bottom-left corner is the overview of the number of Classes loaded on the virtual machine.
  • The top-right corner is the heap graph, which shows Java memory usage in real time. Heap memory is used by all parts of an application, and stores all newly-created objects, which are globally accessible. If you remember my very last ticket, I needed to provide a graph showing that the memory usage did not increase during the performance run (which you can see above).
    You can see that the heap memory line isn’t completely flat. This is because memory isn’t freed as soon as it stops being used. Rather, Java waits until there are enough unused bits of memory and cleans them all up at once, which results in slight dips and peaks.
  • The bottom-right corner shows the Threads graph, which displays the number of threads living on the VM (I’ll let you do your own research, as I did not have to focus on that).

Beyond observing my server during the tests, I have not needed to use VisualVM’s other capabilities, and therefore cannot comment on its other possible uses, so I will instead move on to the most interesting part of this post.

How I improved the performance of my HTTP Server

Threads

I made incremental changes to my codebase as I followed the tickets I was given. The first one forced me to update my ServerRunner to implement threads. This change meant that I was able to deal with concurrency more easily, as my users would not be queued one after the other anymore.

As my mentor put it, think of it as an airport security line. If you only have one, but you have a lot of users, it will clog up quickly and people will get restless. Open more lines, however, and you can redirect the flow of users to another line, meaning you can now deal with more users at once. This is exactly what implementing threads did.

This is what my class looked like originally:
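
Roughly speaking, it had the following shape (this is a minimal sketch with placeholder class and helper names, not my actual code):

import java.io.IOException;
import java.net.ServerSocket;
import java.net.Socket;

// Sketch of a single-threaded accept loop: every request is handled on the
// same thread, one after the other.
public class ServerRunner {
    private final ServerSocket serverSocket;
    private boolean running = true;

    public ServerRunner(int portNumber) throws IOException {
        this.serverSocket = new ServerSocket(portNumber);
    }

    public void startServer() {
        while (running) {
            try {
                Socket socket = serverSocket.accept();
                if (socket != null) {
                    // Read the input, parse it, send a response, then close.
                    handleConnection(socket);
                }
            } catch (IOException e) {
                e.printStackTrace();
            }
        }
    }

    private void handleConnection(Socket socket) throws IOException {
        socket.close(); // placeholder for the real request/response handling
    }
}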

You can see that in my startServer method, I use a while loop containing a try/catch that includes an if statement. This is fine if I’m dealing with small numbers of users, but as soon as I needed to serve more requests, it was clear I would need to use threads, which I eventually did:
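
The threaded version, again as a rough sketch with placeholder names rather than my real class, looks something like this:

import java.io.IOException;
import java.net.ServerSocket;
import java.net.Socket;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

// Sketch of the threaded version: each accepted socket is wrapped in a
// Runnable and handed to a cached thread pool.
public class ThreadedServerRunner {
    private final ServerSocket serverSocket;
    // Creates new threads as needed and reuses idle ones, so the pool
    // scales with the number of incoming requests.
    private final ExecutorService executor = Executors.newCachedThreadPool();
    private boolean running = true;

    public ThreadedServerRunner(int portNumber) throws IOException {
        this.serverSocket = new ServerSocket(portNumber);
    }

    public void startServer() throws IOException {
        while (running) {
            Socket socket = serverSocket.accept();
            // Each connection handler runs on its own pooled thread.
            executor.submit(new ConnectionHandler(socket));
        }
    }

    private static class ConnectionHandler implements Runnable {
        private final Socket socket;

        ConnectionHandler(Socket socket) {
            this.socket = socket;
        }

        @Override
        public void run() {
            try {
                // Placeholder for: get the input, parse it, send a response.
                socket.close();
            } catch (IOException e) {
                e.printStackTrace();
            }
        }
    }
}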

I’ve shrunk my startServer method drastically in order to use an executor. A Java executor takes a Runnable object that implements a single run method, into which I’ve extracted my try/catch statement.

You will notice that my executor is Executors.newCachedThreadPool(), which allows me to create new threads as needed while reusing previously created threads when they become available. That way, my pool of threads is automatically scaled to the number of requests my server receives.

Now, every time my server receives a request, it creates a socket connection and calls run on that connection, which triggers getting an input, parsing that input and sending a response.

Backlog

The second thing that helped me was to set a backlog on my ServerSocket. Yeah, I know “wat?” — let’s look more closely:

Previously, I created my ServerSocket this way:

new ServerSocket(portNumber);

It took a port number and off it went, into the wild. Now, I’ve added one more argument when creating a new ServerSocket:

new ServerSocket(portNumber, 500);

That 500 is the backlog. By default, the backlog is set to 50, which isn’t very much. Increasing that number raises the maximum number of incoming connections the operating system will queue while they wait to be accepted (it does not, however, increase the number of connections that can be handled at once). That way, it becomes easier to queue incoming requests (assuming each request is a new client), and used in conjunction with other refactors such as adding threads, it should help your server handle more requests.

File handling

My last refactor was also my most important one. It touched more parts of my codebase, as I had to change the type of my Body object. Where I was previously using a byte[] type, I needed to change it to a FileInputStream type. As I was previously calling Files.readAllBytes(file) to read my bytes, the entire file was added to the heap memory space (remember the heap from VisualVM? It’s the same guy) every time the method was called (so every time a request was made).

I have since moved to using FileInputStream to allow me to stream the file, instead of loading it into memory every time. By streaming the file, I only add a defined number of bytes to my heap memory (1024 bytes) at one time. My process will consume the first 1024 bytes of my file, then get rid of those bytes, and load the next 1024 bytes, and do so until it has consumed the whole file.
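
As a rough illustration (the class and method names here are placeholders, not my actual code), streaming a body to the response in 1,024-byte chunks looks something like this:

import java.io.FileInputStream;
import java.io.IOException;
import java.io.OutputStream;

// Copies a file to the response OutputStream 1,024 bytes at a time, so only
// one buffer's worth of the file sits on the heap per request.
public class BodyWriter {
    private static final int BUFFER_SIZE = 1024;

    public void writeBody(FileInputStream body, OutputStream out) throws IOException {
        byte[] buffer = new byte[BUFFER_SIZE];
        int bytesRead;
        while ((bytesRead = body.read(buffer)) != -1) {
            out.write(buffer, 0, bytesRead);
        }
        out.flush();
    }
}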

That way, when I need to serve a 1GB PDF file to 100 clients, instead of using 100GB of heap memory (100 * 1GB), I only use 0.0001024GB (100 * 1024 bytes) at one time. Cool, right?!

And this concludes my short foray into performance testing and improvements. These three tickets have exposed me to some interesting aspects of architecting code that I hadn’t had to consider before, but also to the vast and utterly confusing world of performance testing. Ultimately, what I struggled with the most was understanding what my performance tests were telling me about my code, and translating these issues into refactors that would make a noticeable impact on future tests.

When running tests and looking at the graphs and ApacheBench results, it was not very clear to me what was going wrong (is this number being high a good thing? A bad thing? Is the graph line too high? Too squiggly? Are the errors in my server important? Do I need to mitigate them?), and it’s clear it will take a few more tries before I feel confident in my abilities to test an application’s performance!
