rakyll/hey Load Testing, enhanced with HDR

Ahmet Soormally
Apr 16, 2020


I’ve played with quite a lot of the load testing tools out there; they all have their pros, cons and frustrations. Despite all the power, features and capabilities on offer, I always find myself going back to the rather excellent rakyll/hey load testing tool.

It’s simple, efficient, performant and powerful, and was designed as a more modern alternative to Apache’s ab. Why ever look elsewhere, one may ask? Well, the answer for me is that the summary output, whilst very cool, is a little too simplistic for my use case.

To illustrate, I will run a quick test against a mock upstream I created, go-bench-suite: https://github.com/asoorm/go-bench-suite

docker run --rm -itd --name bench -p 8000:8000 mangomm/go-bench-suite ./go-bench-suite upstream

Now let’s run hey against it:

hey http://localhost:8000/json/valid

Summary:
Total: 0.0402 secs
Slowest: 0.0247 secs
Fastest: 0.0020 secs
Average: 0.0089 secs
Requests/sec: 4981.1345
Total data: 13765 bytes
Size/request: 68 bytes
Response time histogram:
0.002 [1] |
0.004 [26] |■■■■■■■■■■■
0.007 [97] |■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■
0.009 [26] |■■■■■■■■■■■
0.011 [0] |
0.013 [0] |
0.016 [0] |
0.018 [1] |
0.020 [28] |■■■■■■■■■■■■
0.022 [13] |■■■■■
0.025 [8] |■■■
Latency distribution:
10% in 0.0038 secs
25% in 0.0047 secs
50% in 0.0060 secs
75% in 0.0178 secs
90% in 0.0202 secs
95% in 0.0212 secs
99% in 0.0235 secs
Details (average, fastest, slowest):
DNS+dialup: 0.0012 secs, 0.0020 secs, 0.0247 secs
DNS-lookup: 0.0006 secs, 0.0000 secs, 0.0031 secs
req write: 0.0000 secs, 0.0000 secs, 0.0002 secs
resp wait: 0.0076 secs, 0.0019 secs, 0.0185 secs
resp read: 0.0001 secs, 0.0000 secs, 0.0006 secs
Status code distribution:
[200] 200 responses

A few things which bother me:

  1. Why are there around 50 responses that took close to 20ms, when the vast majority of responses were sub-10ms? Just for the record, 20ms isn’t really slow, but it is a little worrying when the vast majority of calls are super fast.
  2. The granularity of the percentile latencies just isn’t enough, stopping at P99. P99 means 1 in 100 requests; when you are receiving 5k rps (pretty normal these days), that means every second 50 requests will experience a slow response time. Now imagine that a single web page load generates requests for 100 supporting assets, that some of these are XHR requests querying some API, and that you have a service mesh in place, proxying through a scary maze of microservices. What I’m trying to say is that this can equate to potentially every single user getting a less than acceptable experience; the back-of-the-envelope sketch below makes the arithmetic concrete.
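
Here is that arithmetic as a quick Go program (the 100-asset page load is the hypothetical scenario from point 2 above, not a measurement):

package main

import (
	"fmt"
	"math"
)

func main() {
	const rps = 5000.0
	const p = 0.99 // P99: 99% of requests are faster than the reported value

	// 1% of 5,000 requests per second is 50 slow requests every second.
	fmt.Printf("slow requests per second: %.0f\n", rps*(1-p))

	// Chance that a page load touching n requests hits at least one
	// P99-tail response: 1 - 0.99^n.
	for _, n := range []float64{1, 10, 100} {
		fmt.Printf("page with %3.0f requests: %.1f%% chance of a tail hit\n",
			n, 100*(1-math.Pow(p, n)))
	}
}

A page load touching 100 requests has roughly a 63% chance of hitting at least one P99-tail response, and every extra proxy hop in the mesh only multiplies that.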

This inspired me to dig a little deeper into what I could do with hey to get more meaningful output, something that would let me visualise what is going on at a finer granularity.

I found that you can output the raw metrics from hey as a CSV dump:

hey -o csv http://localhost:8000/json/valid
response-time,DNS+dialup,DNS,Request-write,Response-delay,Response-read,status-code,offset
0.0171,0.0067,0.0039,0.0002,0.0095,0.0006,200,0.0041
0.0170,0.0066,0.0047,0.0000,0.0099,0.0005,200,0.0042
0.0170,0.0058,0.0044,0.0000,0.0107,0.0005,200,0.0042
0.0197,0.0092,0.0045,0.0000,0.0094,0.0006,200,0.0016
0.0169,0.0065,0.0045,0.0002,0.0097,0.0005,200,0.0043
0.0198,0.0088,0.0053,0.0000,0.0100,0.0005,200,0.0014
0.0171,0.0058,0.0043,0.0000,0.0105,0.0007,200,0.0043
0.0172,0.0062,0.0044,0.0000,0.0104,0.0006,200,0.0041
0.0198,0.0083,0.0045,0.0000,0.0104,0.0007,200,0.0015
--- SNIP ---

Now I have far too much detail. The raw data is almost undecipherable, and on its own I have no idea how to process this information.

Introducing Hey-HDR: https://github.com/asoorm/hey-hdr

Hey-HDR is a very simple Extract-Transform-Load script (pretty quickly hacked together) which lets you pipe the output of hey into it; it re-calculates the histogram buckets at a granularity of up to five nines (P99.999), and it also generates some nice graphs to help you visualise that raw data. Let’s run that test again, this time with hey-hdr.

hey -o csv http://localhost:8000/json/valid | go run hey-hdr.go -out example
Count: 200
Max: 29.1ms
Mean: 12.904ms
P50: 10.4ms
P95: 26.2ms
P99: 27.9ms
P999: 29.1ms
P9999: 29.1ms
P99999: 29.1ms

So now we can see more clearly where performance degradation occurs.
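
For the curious, the core of such a script is tiny. Below is a minimal sketch of the same idea (not the actual hey-hdr source), assuming the github.com/HdrHistogram/hdrhistogram-go package: it reads hey’s CSV from stdin, records each response time into an HDR histogram, and prints the percentiles.

package main

import (
	"encoding/csv"
	"fmt"
	"log"
	"os"
	"strconv"

	hdrhistogram "github.com/HdrHistogram/hdrhistogram-go"
)

// A minimal sketch of the hey-hdr idea, not the actual source.
func main() {
	// Track values from 1µs up to 60s, at 3 significant figures.
	hist := hdrhistogram.New(1, 60_000_000, 3)

	records, err := csv.NewReader(os.Stdin).ReadAll()
	if err != nil {
		log.Fatal(err)
	}

	// Column 0 is response-time in seconds.
	for _, rec := range records {
		secs, err := strconv.ParseFloat(rec[0], 64)
		if err != nil {
			continue // skip the header row
		}
		// Record in microseconds to keep integer precision.
		if err := hist.RecordValue(int64(secs * 1e6)); err != nil {
			log.Fatal(err)
		}
	}

	fmt.Printf("Count: %d\n", hist.TotalCount())
	fmt.Printf("Max: %.1fms\n", float64(hist.Max())/1000)
	fmt.Printf("Mean: %.3fms\n", hist.Mean()/1000)
	for _, q := range []float64{50, 95, 99, 99.9, 99.99, 99.999} {
		fmt.Printf("P%v: %.1fms\n", q, float64(hist.ValueAtQuantile(q))/1000)
	}
}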

We also get some files:

cat example.hdr.csv
Value(ms) Percentile TotalCount 1/(1-Percentile)
0.000 0.000000 0 1.00
3.900 0.100000 20 1.11
4.700 0.200000 40 1.25
6.800 0.300000 60 1.43
9.600 0.400000 80 1.67
12.500 0.500000 100 2.00
13.800 0.550000 110 2.22
--- SNIP ---
28.800 0.999995 200 200000.00
28.800 0.999996 200 250000.00
28.800 0.999997 200 333333.33
28.800 0.999998 200 500000.00
28.800 0.999999 200 1000000.00
28.800 1.000000 200 10000000.00

Let’s load example.hdr.csv into the HDR Histogram Plot tool. The 1/(1-Percentile) column becomes the logarithmic x-axis, so each additional “nine” of percentiles gets equal horizontal space, making the tail far easier to inspect.

HDR Histogram Plot

But let’s assume that we did some work to speed up our response time; how do we compare the improvement visually? Simply by running the tests again and loading both .hdr.csv files into the histogram plotter.
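
For example, reusing the command from above (before and after are just illustrative output names):

hey -o csv http://localhost:8000/json/valid | go run hey-hdr.go -out before
# ...apply your performance fix, then run again...
hey -o csv http://localhost:8000/json/valid | go run hey-hdr.go -out after

Then load before.hdr.csv and after.hdr.csv into the plotter together.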

Histogram comparison example

And finally, we also get a PNG scatter diagram output.

This scatter diagram gives us a clue as to why those 50 requests took so long to respond. Maybe it has to do with the 3-way TCP handshake? hey defaults to 50 concurrent workers, so each worker’s first request pays the connection setup cost, which would neatly explain exactly 50 slow responses. Once connections are established and re-used, rather than closed, performance is noticeably better.
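
One way to test that theory is hey’s -disable-keepalive flag, which prevents re-use of TCP connections between requests; forcing a fresh handshake on every request should drag the whole distribution towards the slow tail:

hey -disable-keepalive -o csv http://localhost:8000/json/valid | go run hey-hdr.go -out no-keepalive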

Hope you found this useful. I would welcome PRs and enhancements to hey-hdr.
