Drill down your infrastructure with Benchdrill

Tasty Tomato
alter-way-innovation
Aug 25, 2017

A benchmarking tool for distributed systems

Since we can’t automatically generate bug-free software, we have to test the software we write. We also have to keep an eye on our applications’ performance. Benchmarks are a handy way to achieve this goal. Although there are several amazing tools out there that fill this need, we ran into a use case that called for a new application: Benchdrill.

The main reason to build this tool was to be able to easily put load on the hosts of a Docker Swarm. Indeed, Alter Way is developing a large and complex project which involves a metrology stack. The issue for us was that we lacked a way to test the correctness of this stack, as well as a way to observe the load it generates while collecting its metrics. To achieve these goals, we needed a tool with these features:

  • be usable with a simple CLI (no GUI);
  • load the CPU, disk I/O, and network of several hosts in the Docker Swarm, all of that in parallel;
  • get these jobs done quickly, without too much overhead (which ruled out creating one container per job, since container creation weighs on the system).

We quickly settled on Machinery, since its architecture is well suited to our use case. It allows tasks to be sent through a queue to a message broker, which then distributes them to the workers. When the workers finish their jobs, they send their outputs to a result backend, which displays them in the terminal afterwards. A number of message brokers are available; we chose Redis since it can also be used as the result backend, which makes things simpler.
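For a concrete picture of how this wiring looks, here is a minimal Go sketch of a Machinery client that uses Redis as both the broker and the result backend, then sends a single task and waits for its output. The addresses, queue name, and task name are illustrative, not Benchdrill’s actual configuration.

    package main

    import (
        "fmt"
        "time"

        machinery "github.com/RichardKnop/machinery/v1"
        "github.com/RichardKnop/machinery/v1/config"
        "github.com/RichardKnop/machinery/v1/tasks"
    )

    func main() {
        cnf := &config.Config{
            Broker:        "redis://redis:6379", // tasks are queued here...
            ResultBackend: "redis://redis:6379", // ...and results are stored here too
            DefaultQueue:  "benchmarks",         // hypothetical queue name
        }
        server, err := machinery.NewServer(cnf)
        if err != nil {
            panic(err)
        }

        // Ask whichever worker picks the task up to run a benchmark command.
        signature := &tasks.Signature{
            Name: "run_command", // hypothetical task name
            Args: []tasks.Arg{{Type: "string", Value: "sysbench cpu run"}},
        }
        asyncResult, err := server.SendTask(signature)
        if err != nil {
            panic(err)
        }

        // Poll the result backend until the worker has reported back.
        results, err := asyncResult.Get(500 * time.Millisecond)
        if err != nil {
            panic(err)
        }
        for _, r := range results {
            fmt.Println(r.Interface())
        }
    }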

So we have this nice tool; how do we use it with Docker now? Simply by containerising the workers (no need to do that for Redis, since it has an official image on the Docker Hub). After writing the Dockerfile, the next question arose: how do we distribute the workers across the Swarm? With a Compose file defining one service for the workers and another for Redis. This way, the number of workers is determined by the scale option of the docker service command. Want 100 workers? Type docker service scale benchdrill_worker=100, easy peasy!
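To make that concrete, a stripped-down Compose file along these lines is enough to describe the two services; the image and service names here are placeholders, not the actual Benchdrill stack file.

    version: "3"

    services:
      redis:
        image: redis:alpine        # official Redis image, acts as broker and result backend

      worker:
        image: benchdrill/worker   # hypothetical name for the containerised worker image
        deploy:
          replicas: 1              # raised later with `docker service scale`

Deploy it with docker stack deploy -c docker-compose.yml benchdrill, then scale the worker service up or down as needed.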

Now that we have workers distributed in the Swarm, ready to receive their tasks and execute them diligently, what tasks will we actually send them? Putting load on a machine’s resources is not an easy task, but thankfully there are already awesome benchmark tools that handle this complex work. The two we have integrated into the workers so far are Sysbench and Filebench. Sysbench comes with multiple built-in benchmarks, but its strong point is that it is scriptable with Lua. It is recognised as a good tool for benchmarking databases, but this comes with the downside that you don’t have complete control over what is going on in the system you are testing. For example, if you want to test disk I/O, you have to take into account all the caching involved in modern systems. For that specific task, Filebench is the benchmark tool of choice: it has a lot of options to define precisely what you want to do when testing I/O. Of course, this comes with higher complexity, as can be seen with WML, the language used to describe the tests you want to run.
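To give an idea of what WML looks like, here is a small workload in the spirit of the Filebench quick-start documentation: five 16 KB files are created up front, then two processes of three threads each open, read, and close them for 60 seconds. Treat the exact attribute syntax as indicative; the file would be run with filebench -f.

    define fileset name="testF",entries=5,filesize=16k,prealloc,path="/tmp"

    define process name="readerP",instances=2 {
      thread name="readerT",memsize=10m,instances=3 {
        flowop openfile name="openOP",filesetname="testF"
        flowop readwholefile name="readOP",filesetname="testF"
        flowop closefile name="closeOP"
      }
    }

    run 60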

To integrate these two benchmarks, the idea was simple: write a task that wraps the command you would normally run with Sysbench or Filebench, and transmit it to the workers. Although this worked well with Sysbench, that wasn’t the case with Filebench. Since last year, its developers have dropped the ability to write a test directly on the command line. They explained that it was too error-prone for new users as well as experienced ones, and not really ergonomic either. So you can only pass a file to the Filebench CLI, which reads and executes it. The workaround for this issue is to pass a file to the Benchdrill CLI, which reads it and transmits its content as a string to the workers. Each worker writes it back to a local file, then passes that file to Filebench as an argument. This is handled by a task distinct from the previous one. Of course, you can also use this task with Sysbench if you prefer to pass a file.
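To illustrate how that second task can be wired to the workers, here is a rough Go sketch: the workload arrives as a string, is written to a temporary file, and that file is handed to the Filebench binary. The function, task, and queue names are hypothetical, not Benchdrill’s actual implementation.

    package main

    import (
        "io/ioutil"
        "os"
        "os/exec"

        machinery "github.com/RichardKnop/machinery/v1"
        "github.com/RichardKnop/machinery/v1/config"
    )

    // RunFilebenchWorkload writes the workload it received to a local file,
    // runs Filebench against it, and returns the benchmark's output.
    func RunFilebenchWorkload(workload string) (string, error) {
        tmp, err := ioutil.TempFile("", "benchdrill")
        if err != nil {
            return "", err
        }
        defer os.Remove(tmp.Name())

        if _, err := tmp.WriteString(workload); err != nil {
            tmp.Close()
            return "", err
        }
        tmp.Close()

        // Equivalent to running "filebench -f <tempfile>" by hand.
        out, err := exec.Command("filebench", "-f", tmp.Name()).CombinedOutput()
        return string(out), err
    }

    func main() {
        cnf := &config.Config{
            Broker:        "redis://redis:6379",
            ResultBackend: "redis://redis:6379",
            DefaultQueue:  "benchmarks",
        }
        server, err := machinery.NewServer(cnf)
        if err != nil {
            panic(err)
        }

        // Register the task under the name senders refer to, then start
        // consuming tasks from the queue.
        if err := server.RegisterTask("run_filebench_workload", RunFilebenchWorkload); err != nil {
            panic(err)
        }
        worker := server.NewWorker("benchdrill_worker", 1)
        if err := worker.Launch(); err != nil {
            panic(err)
        }
    }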

The next step was to deploy the Benchdrill services in a Swarm that also hosts the metrology stack, then look at the results of the load generated with Benchdrill in a Grafana dashboard. We used an infrastructure of four machines, each with 32 GB of RAM, an 8-core CPU, and 512 GB of storage. One of these CPUs has a clock rate of 2.66 GHz, while the other three run at 3.00 GHz.

The dashboard shows the total CPU time consumed by each Docker service, information retrieved from InfluxDB. In this case, the purple curve shows the load triggered by 20 workers executing 100 Sysbench CPU built-in tests in parallel. The other curves depict the consumption of Redis and of the metrology stack. We observe that despite the increase in CPU time used by the workers, the other services show no change in their behaviour. This is exactly what we want, because it means the load of the metrology stack does not depend on the activity of the services it monitors.

Now, the next step of the big project is the development of the orchestration stack. Benchdrill will be helpful here, because it can be used to load some nodes of the cluster heavily while leaving others idle. We can then observe how the orchestration stack handles this situation, e.g. whether it balances the overall load of the cluster.

Found it intriguing? Benchdrill is free software, so don’t hesitate to contribute on the GitHub page, or open an issue to ask a question or report a bug! We’ll happily collaborate with you!
