rakyll/hey Load Testing, enhanced with HDR

Ahmet Soormally
Apr 16, 2020


I’ve played with quite a lot of the load testing tools out there; they all have their pros, cons and frustrations. Despite all the power, features and capabilities on offer, I always find myself going back to the rather excellent rakyll/hey load testing tool.

It’s simple, efficient, performant and powerful, and was designed as a more modern alternative to Apache’s ab. Why ever look elsewhere, one may ask? Well, the answer for me is that the summary output, whilst very cool, is a little too simplistic for my use case.

To illustrate, I will run a quick test against a mock upstream I created, go-bench-suite: https://github.com/asoorm/go-bench-suite

docker run --rm -itd --name bench -p 8000:8000 mangomm/go-bench-suite ./go-bench-suite upstream

Now let’s run hey against it:

hey http://localhost:8000/json/valid

Summary:
Total: 0.0402 secs
Slowest: 0.0247 secs
Fastest: 0.0020 secs
Average: 0.0089 secs
Requests/sec: 4981.1345
Total data: 13765 bytes
Size/request: 68 bytes
Response time histogram:
0.002 [1] |
0.004 [26] |■■■■■■■■■■■
0.007 [97] |■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■
0.009 [26] |■■■■■■■■■■■
0.011 [0] |
0.013 [0] |
0.016 [0] |
0.018 [1] |
0.020 [28] |■■■■■■■■■■■■
0.022 [13] |■■■■■
0.025 [8] |■■■
Latency distribution:
10% in 0.0038 secs
25% in 0.0047 secs
50% in 0.0060 secs
75% in 0.0178 secs
90% in 0.0202 secs
95% in 0.0212 secs
99% in 0.0235 secs
Details (average, fastest, slowest):
DNS+dialup: 0.0012 secs, 0.0020 secs, 0.0247 secs
DNS-lookup: 0.0006 secs, 0.0000 secs, 0.0031 secs
req write: 0.0000 secs, 0.0000 secs, 0.0002 secs
resp wait: 0.0076 secs, 0.0019 secs, 0.0185 secs
resp read: 0.0001 secs, 0.0000 secs, 0.0006 secs
Status code distribution:
[200] 200 responses

A few things which bother me:

  1. Why are there around 50 responses that took close to 20ms, when the vast majority of responses were sub-10ms? Just for the record, 20ms isn’t really slow, but it is a little worrying when the vast majority of calls are super fast.
  2. The granularity of the percentile latencies just isn’t enough, stopping at P99. P99 means 1 in 100 requests; when you are receiving 5k rps (pretty normal these days), that means every second 50 requests will experience a slow response time. Now imagine that a single web page load generates requests for 100 supporting assets, that some of these are XHR requests querying some API, and that you have a service mesh in place, proxying through a scary maze of microservices. What I’m trying to say is that this can equate to potentially every single user getting a less than acceptable experience; the back-of-the-envelope sketch below makes the arithmetic concrete.
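
Here is that arithmetic as a quick Go program (the 100-asset page load is the hypothetical scenario from point 2 above, not a measurement):

package main

import (
	"fmt"
	"math"
)

func main() {
	const rps = 5000.0
	const p = 0.99 // P99: 99% of requests are faster than the reported value

	// 1% of 5,000 requests per second is 50 slow requests every second.
	fmt.Printf("slow requests per second: %.0f\n", rps*(1-p))

	// Chance that a page load touching n requests hits at least one
	// P99-tail response: 1 - 0.99^n.
	for _, n := range []float64{1, 10, 100} {
		fmt.Printf("page with %3.0f requests: %.1f%% chance of a tail hit\n",
			n, 100*(1-math.Pow(p, n)))
	}
}

A page load touching 100 requests has roughly a 63% chance of hitting at least one P99-tail response, and every extra proxy hop in the mesh only multiplies that.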

This inspired me to dig a little deeper into what I could do with hey to get more meaningful output, something that would let me visualise what is going on at a finer granularity.

I found that you can output the raw metrics from hey as a CSV dump:

hey -o csv http://localhost:8000/json/valid
response-time,DNS+dialup,DNS,Request-write,Response-delay,Response-read,status-code,offset
0.0171,0.0067,0.0039,0.0002,0.0095,0.0006,200,0.0041
0.0170,0.0066,0.0047,0.0000,0.0099,0.0005,200,0.0042
0.0170,0.0058,0.0044,0.0000,0.0107,0.0005,200,0.0042
0.0197,0.0092,0.0045,0.0000,0.0094,0.0006,200,0.0016
0.0169,0.0065,0.0045,0.0002,0.0097,0.0005,200,0.0043
0.0198,0.0088,0.0053,0.0000,0.0100,0.0005,200,0.0014
0.0171,0.0058,0.0043,0.0000,0.0105,0.0007,200,0.0043
0.0172,0.0062,0.0044,0.0000,0.0104,0.0006,200,0.0041
0.0198,0.0083,0.0045,0.0000,0.0104,0.0007,200,0.0015
--- SNIP ---

Now I have far too much detail. The raw data is almost undecipherable, and on its own I have no idea how to process this information.

Introducing Hey-HDR: https://github.com/asoorm/hey-hdr

Hey-HDR is a very simple Extract-Transform-Load script (pretty quickly hacked together) which lets you pipe the output of hey into it; it re-calculates the histogram buckets at a granularity of up to five nines (P99.999), and it also generates some nice graphs to help you visualise that raw data. Let’s run that test again, this time with hey-hdr.

hey -o csv http://localhost:8000/json/valid | go run hey-hdr.go -out example
Count: 200
Max: 29.1ms
Mean: 12.904ms
P50: 10.4ms
P95: 26.2ms
P99: 27.9ms
P999: 29.1ms
P9999: 29.1ms
P99999: 29.1ms

So now we can see more clearly where performance degradation occurs.
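
For the curious, the core of such a script is tiny. Below is a minimal sketch of the same idea (not the actual hey-hdr source), assuming the github.com/HdrHistogram/hdrhistogram-go package: it reads hey’s CSV from stdin, records each response time into an HDR histogram, and prints the percentiles.

package main

import (
	"encoding/csv"
	"fmt"
	"log"
	"os"
	"strconv"

	hdrhistogram "github.com/HdrHistogram/hdrhistogram-go"
)

// A minimal sketch of the hey-hdr idea, not the actual source.
func main() {
	// Track values from 1µs up to 60s, at 3 significant figures.
	hist := hdrhistogram.New(1, 60_000_000, 3)

	records, err := csv.NewReader(os.Stdin).ReadAll()
	if err != nil {
		log.Fatal(err)
	}

	// Column 0 is response-time in seconds.
	for _, rec := range records {
		secs, err := strconv.ParseFloat(rec[0], 64)
		if err != nil {
			continue // skip the header row
		}
		// Record in microseconds to keep integer precision.
		if err := hist.RecordValue(int64(secs * 1e6)); err != nil {
			log.Fatal(err)
		}
	}

	fmt.Printf("Count: %d\n", hist.TotalCount())
	fmt.Printf("Max: %.1fms\n", float64(hist.Max())/1000)
	fmt.Printf("Mean: %.3fms\n", hist.Mean()/1000)
	for _, q := range []float64{50, 95, 99, 99.9, 99.99, 99.999} {
		fmt.Printf("P%v: %.1fms\n", q, float64(hist.ValueAtQuantile(q))/1000)
	}
}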

We also get some files:

cat example.hdr.csv
Value(ms) Percentile TotalCount 1/(1-Percentile)
0.000 0.000000 0 1.00
3.900 0.100000 20 1.11
4.700 0.200000 40 1.25
6.800 0.300000 60 1.43
9.600 0.400000 80 1.67
12.500 0.500000 100 2.00
13.800 0.550000 110 2.22
--- SNIP ---
28.800 0.999995 200 200000.00
28.800 0.999996 200 250000.00
28.800 0.999997 200 333333.33
28.800 0.999998 200 500000.00
28.800 0.999999 200 1000000.00
28.800 1.000000 200 10000000.00

Let’s load example.hdr.csv into the HDR Histogram Plot tool. The 1/(1-Percentile) column becomes the logarithmic x-axis, so each additional “nine” of percentiles gets equal horizontal space, making the tail far easier to inspect.

HDR Histogram Plot

But let’s assume that we did some work to speed up our response time; how do we compare the improvement visually? Simply by running the tests again and loading both .hdr.csv files into the histogram plotter.
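
For example, reusing the command from above (before and after are just illustrative output names):

hey -o csv http://localhost:8000/json/valid | go run hey-hdr.go -out before
# ...apply your performance fix, then run again...
hey -o csv http://localhost:8000/json/valid | go run hey-hdr.go -out after

Then load before.hdr.csv and after.hdr.csv into the plotter together.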

Histogram comparison example

And finally, we also get a PNG scatter diagram output.

This scatter diagram gives us a clue as to why those 50 requests took so long to respond. Maybe it has to do with the 3-way TCP handshake? hey defaults to 50 concurrent workers, so each worker’s first request pays the connection setup cost, which would neatly explain exactly 50 slow responses. Once connections are established and re-used, rather than closed, performance is noticeably better.
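
One way to test that theory is hey’s -disable-keepalive flag, which prevents re-use of TCP connections between requests; forcing a fresh handshake on every request should drag the whole distribution towards the slow tail:

hey -disable-keepalive -o csv http://localhost:8000/json/valid | go run hey-hdr.go -out no-keepalive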

Hope you found this useful. I would welcome PRs and enhancements to hey-hdr.
