K6 benchmark observations

George Shuklin
Published in OpsOps
May 20, 2024

Grafana k6 is an extremely complex program (though next-gen compared to ab/wrk). I have only just started to dive into it, and all I have so far is a set of observations about its performance limits.

The key thing with k6 is not to overload it. Otherwise, its metrics start to reflect k6's own problems instead of the behavior of the system under test.

My preliminary limits for high-intensity testing:

  • Number of VUs: about 4x the hyper-threaded (HT) thread count of the CPUs (or 8x the number of physical cores)
  • Rate limit: about 50k for an old dual-CPU system (2x Xeon E5-2630 v3)
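As a rough worked example of the VU rule of thumb, here is a minimal Python sketch. The function name suggested_vus is made up for illustration. The figure of 32 hardware threads assumes that each Xeon E5-2630 v3 has 8 cores and 16 threads, so the dual-CPU box exposes 16 cores and 32 threads.

```python
import os


def suggested_vus(hw_threads, vus_per_thread=4):
    """Rule of thumb from the list above: about 4 VUs per hardware thread."""
    return hw_threads * vus_per_thread


# Assumption: a dual Xeon E5-2630 v3 box exposes 32 hardware threads.
print(suggested_vus(32))  # 128

# The same rule expressed per physical core (8x of 16 cores).
print(suggested_vus(16, vus_per_thread=8))  # 128

# For the machine this script runs on; os.cpu_count() reports logical CPUs
# and may return None, hence the fallback to 1.
print(suggested_vus(os.cpu_count() or 1))
```

Both forms of the rule agree for a hyper-threaded CPU, because such a CPU has two hardware threads per core.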

It is possible to push the limit a bit higher (50k -> 60-70k), but latency starts to creep up, and it creeps up only because of k6, so it's no longer a good benchmark. You need many servers running k6 to create a good load. (I have just ordered 40 bare-metal servers to test one Cilium cluster of two moderately beefy servers!)
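One possible way to pick the point where a single generator stops being trustworthy is to step the rate up against an idle SUT and keep the highest rate whose latency stays close to the low-rate baseline. The sketch below assumes that approach. The function name, the 1.5x tolerance and the sample numbers are all hypothetical and are not measurements from this post.

```python
def highest_clean_rate(samples, tolerance=1.5):
    """Return the highest rate whose latency stays within `tolerance` times
    the latency seen at the lowest tested rate.

    samples: iterable of (rate, latency_ms) pairs from single-generator runs.
    """
    ordered = sorted(samples)
    baseline = ordered[0][1]
    best = ordered[0][0]
    for rate, latency_ms in ordered:
        if latency_ms > baseline * tolerance:
            break  # from here on the generator itself is adding latency
        best = rate
    return best


# Made-up numbers purely for illustration.
runs = [(10_000, 1.0), (30_000, 1.1), (50_000, 1.2), (60_000, 2.5), (70_000, 6.0)]
print(highest_clean_rate(runs))  # 50000
```

This only makes sense while the SUT is clearly underloaded, so that any extra latency can be blamed on the generator.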

How to get those numbers?

I thought about an algorithm for doing it, and here is what I came up with:

  1. Establish SUT (system under test) superiority over the generator: the generator is on the verge of overload, while the SUT is relaxed and underloaded.
  2. Fine-tune the parameters to get the highest sustained, stable performance at low latency from a single generator.
  3. Establish the generators' superiority over the SUT by running as many generator servers as needed.
  4. Reduce the load on each generator a bit (10–20%) to give it some slack.

The result is that every generator is working hard but is not overloaded, whilst the SUT can be pushed to its limit.
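Steps 3 and 4 come down to simple arithmetic, sketched below in Python. The function name plan_generators and the total target of 1,000,000 are hypothetical. The 50k per-generator ceiling and the 20% slack come from the numbers above, and rates are in whatever unit the 50k figure uses.

```python
def plan_generators(target_rate, generator_max_rate, slack_percent=20):
    """Return (generator_count, rate_per_generator).

    Each generator runs `slack_percent` below its measured clean maximum
    (step 4), and enough generators are added to still reach `target_rate`
    in total (step 3). Integer math avoids float rounding surprises.
    """
    per_generator = generator_max_rate * (100 - slack_percent) // 100
    count = -(-target_rate // per_generator)  # ceiling division
    return count, per_generator


# Hypothetical target of 1,000,000 with generators that stay clean up to 50k.
count, rate = plan_generators(1_000_000, 50_000, slack_percent=20)
print(count, rate)  # 25 40000
```

With less slack each generator carries more load and fewer of them are needed, at the cost of running closer to the point where k6 starts to distort the latency numbers.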


I work at Servers.com. Most of my stories are about Ansible, Ceph, Python, OpenStack and Linux. My hobby is Rust.