Writing an Impressively Fast HTTP Load Testing Tool in Golang for fun and learning!

Glen Keane
7 min read · Feb 10, 2020


Photo by Cris Ovalle on Unsplash

Recently I took some time to reinvent the wheel to understand Golang on a deeper level. To do this, I rewrote software I previously worked on and helped to maintain: Autocannon. The repo for this experiment is available here.

Autocannon is a fast HTTP/S 1.1 benchmarking tool written with Node.js, intended to be familiar to those who have already used wrk or wrk2. It is a cross-platform tool that can be run via the command line or through a programmable interface. It is simple to use, understand and extend.

Hitting all of these features for a personal investigation in my spare time would be impossible, so I focused on the following key goals:

  • Speed; I primarily wanted to compare speed between Node.js and Golang.
  • Simple to Use (Command Line); I have never really needed the programmable interface of Autocannon, so to keep it simple, I decided to build this for command line usage only.
  • Reduced Feature Set; there is no need for HTTPS support out of the box in a simple thought experiment, so that entire functionality can be scrapped, and many of the features around controlling the flow of requests in Autocannon aren’t needed either: we just want to run as fast as possible.
  • Cross-Platform; I wanted to investigate Golang’s ability to deliver cross-platform tools.

How Autocannon became fast.


If you’re interested in Node.js, you may want to read the following paragraphs to understand how this speed is achieved, otherwise, feel free to skip to the next section.

As previously mentioned, Autocannon is fast. 50K+ requests per second fast. While it is built on Node.js, it has been observed to be faster than wrk, a benchmarking tool built with C. The reason for this is a number of deliberate implementation decisions made in Autocannon for the sake of optimisation. Autocannon itself is built on a custom HTTP/S client, which is primarily where these optimisations live. It is believed that the only route to optimising Autocannon further is to work within the underlying networking stack, inside the OS.

One of the key optimisations in Autocannon is that everything is built within the JavaScript domain, meaning there are no native dependencies. While native dependencies can be very powerful for heavy compute tasks, they come with a non-negligible overhead whenever the runtime must cross from JavaScript land into native dependency land. This overhead can stack up significantly in an application that makes many calls between JavaScript and native code. Additionally, working only within the JavaScript domain enables the code paths to be optimised by the JS engine's optimising compiler. This StackOverflow answer by a V8 developer covers this quite well. When Autocannon was created, there were two primary places where native dependencies could be a slowdown due to the number of calls crossing from the JavaScript to the native domain: the HTTP client/parser, and the library used for tracking the histogram of benchmarked values.

The HTTP parser was an issue because for every request made, the response needed to be transferred to the native dependency, parsed, and the parsed values then returned to JavaScript land. The HTTP parser that Node.js uses in the native ‘http’ library is a dependency created specifically for that use case. While it is pretty powerful, it was avoided for the reason mentioned above, which meant writing a custom HTTP client on top of a JS-based HTTP parser: http-parser-js, a library created specifically for this kind of use case.

The histogram tracking library was not a major issue, but it was still a consideration due to the number of calls made to it for tracking benchmarked data. Initially, Autocannon was built on top of native-hdr-histogram, which exposed the C library's bindings to Node.js. Some time later, the HDR Histogram community built and released a TypeScript-based version that exposes only JavaScript code, so Autocannon was quickly migrated to HdrHistogramJS.

Another key optimisation in Autocannon is its use of HTTP pipelining. This enables the reuse of connections for multiple concurrent requests, so dropping and reopening connections is not the issue it could otherwise be. Think HTTP keep-alive, but for multiple requests and responses in flight at once.

Building this with Golang.


Because I understood Autocannon's optimisations, I knew what would be needed to get this Golang version up to a comparable speed.

The primary way this speed could be achieved is by using an HTTP client that supports pipelining. In Autocannon, we had to build this client ourselves because nothing out of the box enabled it the way we wanted. After some googling for a Golang library for this purpose, I quickly discovered github.com/valyala/fasthttp, which supports HTTP pipelining out of the box. This simplified my implementation significantly, as this client did most of the heavy lifting. As a note, however, this library can only give a rough approximation of the throughput, versus Autocannon’s true representation. This is a limitation in how response sizes are calculated. See this issue for details.

The other key component in this tool is the actual tracking of the benchmarked data. In Autocannon, we used HDR Histogram for this, and with some quick searching, I discovered a Golang port: github.com/codahale/hdrhistogram. However, while trying to use it I discovered a minor issue in comparison to the C or JS versions, and because the repo is archived on GitHub, I needed to fork it for a tiny update here: github.com/glentiki/hdrhistogram.

Sidenote: as a responsible open-source contributor, I’ve reached out to the repo owner on Twitter to ask for it to be un-archived, and I’ll happily maintain it if possible :)

Once I had these bits figured out, gluing them together and making the tool user-friendly in Golang was pretty easy. However, I had some struggles finding useful command-line formatting tools, such as ASCII table writers. Eventually, after trying a few, I found that github.com/olekukonko/tablewriter best matched the Autocannon experience.

Building for multiple platforms and doing releases comparable to Node.js was a slightly more intense endeavour and required some investigation, which I’ve written about here. TL;DR: to build a simple release pipeline, I made use of GitHub Actions, publishing the code I had pushed under a git tag. This code was built using xgo for multiple platforms and published using a custom action I created for the purpose: glentiki/xbin-release-action.
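As a rough illustration only (the real pipeline uses xgo and xbin-release-action, whose configuration isn't shown here), a tag-triggered cross-build workflow might look something like the following sketch. Go's toolchain cross-compiles by setting the GOOS/GOARCH environment variables; xgo wraps this, adding C toolchains for cgo-enabled builds.

```yaml
# Hypothetical, simplified sketch of a tag-triggered release workflow.
name: release
on:
  push:
    tags:
      - 'v*'
jobs:
  build:
    runs-on: ubuntu-latest
    strategy:
      matrix:
        goos: [linux, darwin, windows]
    steps:
      - uses: actions/checkout@v2
      - uses: actions/setup-go@v2
      # Cross-compile by overriding GOOS; GOARCH works the same way.
      - run: GOOS=${{ matrix.goos }} GOARCH=amd64 go build -o autocannon-go-${{ matrix.goos }}
```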

Overall, re-implementing the key functionality from Autocannon took me 2–3 evenings, with a lot of that time just spent googling to find the right tools for the job. The code was the easy part.

In the end, I got the key functionality implemented in roughly 250 lines of code (LoC), of which around 150 are purely command-line input/output formatting. While I can’t stress enough that this is an unfair comparison, and I understand LoC is a troublesome metric, I believe that ~100 LoC of core logic for a Golang tool that minimally replicates what took many hundreds more LoC in Autocannon is impressive in many ways. It shows that the utility of the language, the ecosystem of tooling available and the overall experience are very positive and noteworthy.

Conclusion and comparing output and benchmark speeds.

The following screenshot shows the output from running Autocannon-go, and then Autocannon against a simple “Hello World” Fastify server.

In the above image, you can see that the Autocannon-go version makes roughly 59K requests per second, while Autocannon makes roughly 56K requests per second.

Both of these are impressive numbers, but again, it strikes me as noteworthy that I could achieve all of this with the goals that I set out with only 250 LoC and a few dependencies.

I would also like to note that part of this personal work was done as validation, to see whether it is fair to compare a Node.js program to a compiled (Golang) one. I believe that, yes, the comparison is fair and valid, and that Node.js performs quite well within it, too. I also want to give a quick shoutout to Matteo Collina for building a strong base and mentoring me in building out the awesome Node.js tool that is Autocannon.

To finish, I will say that I have found my recent endeavours with Golang quite positive, and it is something I plan on pursuing further. There is also some great tooling available out of the box and from the module ecosystem, and as someone who appreciates all of this in Node.js, I can appreciate having it available to me in Golang.

If you found this interesting, I normally keep my experiments documented on my Twitter, so give me a follow if you’re into this sort of thing. The repo for this experiment is also available here if you want to check out the code, and if you think it’s worth it, a star on GitHub would be appreciated 🚀✨
