Load-testing Apache Cassandra: Parsing Latency & Throughput Results (Part 1 of 3)

By Mark Price

Linode Cube · Jul 25, 2017


Welcome to the first in a short series of posts describing how to configure and load test an instance of Apache’s popular, open-source, distributed database, Cassandra. For those who have not come across it before, Cassandra is a horizontally-scalable NoSQL store, designed to be run in large, fault-tolerant clusters.

In this installment, I will work through the process of deploying a single instance of the Cassandra server, and then run a load test against it. I will then present and analyze the results of the load test to determine what sort of performance can be expected from an out-of-the-box install.

Later posts will explore how to use the state-of-the-art Linux performance monitoring tools to observe the server instance as the load test runs.

This guide is not concerned with how to tune or configure a Cassandra instance; that is a much broader topic, and one that is already well covered elsewhere.

Performance testing

Performance testing server-based software systems intended for mission-critical use is a vital part of any project’s lifecycle.

Before making software products live and available, a developer must understand a system’s limits. This knowledge allows the developer/product manager to engage in capacity planning to make sure that the infrastructure can cope with expected usage scenarios, and also to give an indication of when it may be necessary to invest in more infrastructure to handle unexpected demand.

When testing a server-based software system, the tester is generally interested in two metrics: throughput and latency. These are intrinsically linked, as we shall see.

Throughput simply describes the number of operations per unit time; for example, the maximum throughput of a system might be 10,000 requests per second.

Latency is more nuanced and can depend on the use-case. It is most commonly expressed in terms of percentiles. A system’s latency profile might be described as “99% of requests complete within 200 milliseconds”, but it is usually expressed at several percentiles simultaneously, as in: a 99%ile of 200ms, a 99.99%ile of 400ms, and a mean latency of 60ms.
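To make the percentile notation concrete, here is a rough sketch of how a 99th percentile could be pulled from raw measurements. It assumes a file named latencies.txt containing one per-request latency in milliseconds per line; this is purely illustrative and not part of the Cassandra tooling:

# Approximate (nearest-rank) 99th percentile of the values in latencies.txt.
sort -n latencies.txt | awk '{ a[NR] = $1 } END { i = int(NR * 0.99); if (i < 1) i = 1; print a[i], "ms" }'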

Typically, a latency requirement will be tied to a certain target throughput. For example, a system may be described as meeting the following criteria:

The system can support 5,000 writes per second, with a 99%ile latency of 200ms.

This combination of the two metrics is required because a system’s response latency will tend to worsen as the workload throughput approaches the system’s maximum capacity.

Using this framework for expressing a system’s operating capacity, I am able to describe what sort of performance can be expected from a single, untuned Cassandra node running on commodity hardware.

I will not start with any specific goals for this load test, since it is a single-node setup and I don’t know what the underlying hardware can support. I will first discover a ceiling for write throughput on a single node, then take a look at what request rates can be supported for a given latency profile.

Setting up the client and server

As previously stated, this is not a tuning guide, so let’s go ahead and install a stock Cassandra instance running on OpenJDK.

Server start-up script:

This script will download, unpack and start a Cassandra instance on your server.
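The script itself is not reproduced here, but a minimal sketch of what it does might look like the following. The Cassandra version, download location, and configuration edits are assumptions, so adjust them for your environment:

#!/bin/bash
# Minimal server start-up sketch. Assumes OpenJDK 8 is already installed
# (e.g. via apt-get install openjdk-8-jre-headless on Debian/Ubuntu).
CASSANDRA_VERSION=3.11.0   # assumed version; any recent 3.x release should behave similarly
PRIVATE_IP=$(hostname -I | awk '{ print $1 }')   # assumed way to discover the private address
wget "https://archive.apache.org/dist/cassandra/${CASSANDRA_VERSION}/apache-cassandra-${CASSANDRA_VERSION}-bin.tar.gz"
tar xzf "apache-cassandra-${CASSANDRA_VERSION}-bin.tar.gz"
cd "apache-cassandra-${CASSANDRA_VERSION}"
# Bind the node to the private address so the stress client can connect to it.
sed -i "s/^listen_address:.*/listen_address: ${PRIVATE_IP}/" conf/cassandra.yaml
sed -i "s/^rpc_address:.*/rpc_address: ${PRIVATE_IP}/" conf/cassandra.yaml
./bin/cassandra -f   # -f keeps Cassandra in the foreground; drop it to daemonize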

Take a note of the server’s private IP address (this is the address that the stress-client will use to connect to the server).

Client start-up script:

This script will download and start the stress-client, instructing it to perform a write throughput test against the server (replace $SERVER_IP with your server’s IP address).
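Again, the exact script is not shown here; a minimal sketch, assuming the same Cassandra tarball is unpacked on the client machine, could be:

#!/bin/bash
# Minimal stress-client sketch. SERVER_IP must be set to the server's private
# IP; the version and paths mirror the server sketch above and are assumptions.
CASSANDRA_VERSION=3.11.0
wget "https://archive.apache.org/dist/cassandra/${CASSANDRA_VERSION}/apache-cassandra-${CASSANDRA_VERSION}-bin.tar.gz"
tar xzf "apache-cassandra-${CASSANDRA_VERSION}-bin.tar.gz"
# With no explicit -rate option, cassandra-stress repeats the write test with
# an increasing number of writer threads, run after run.
"./apache-cassandra-${CASSANDRA_VERSION}/tools/bin/cassandra-stress" write -node "${SERVER_IP}" | tee stress.log

Piping the output through tee keeps a copy of each run’s summary, which will come in handy when parsing the results later.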

By default, the client starts with a small number of writer threads, sending write requests to the server until the mean request latency stabilizes. Once stable, the client pauses briefly, then repeats the run with a larger number of write threads. This process is repeated until the number of write threads becomes quite high (913 in the test I ran).

Initial results

At the end of each run, the Cassandra-stress tool reports the throughput achieved and the latency at various percentiles. The first run is performed using 4 write threads:

Running with 4 threadCount
Op rate : 4,119 op/s [WRITE: 4,119 op/s]
Latency mean : 0.9 ms [WRITE: 0.9 ms]
Latency median : 0.8 ms [WRITE: 0.8 ms]
Latency 95th percentile : 1.5 ms [WRITE: 1.5 ms]
Latency 99th percentile : 3.3 ms [WRITE: 3.3 ms]
Latency 99.9th percentile : 8.8 ms [WRITE: 8.8 ms]
Latency max : 87.8 ms [WRITE: 87.8 ms]

All the way up to 913 write threads:

Running with 913 threadCount
Op rate : 74,949 op/s [WRITE: 74,949 op/s]
Latency mean : 12.0 ms [WRITE: 12.0 ms]
Latency median : 8.6 ms [WRITE: 8.6 ms]
Latency 95th percentile : 27.6 ms [WRITE: 27.6 ms]
Latency 99th percentile : 91.1 ms [WRITE: 91.1 ms]
Latency 99.9th percentile : 136.6 ms [WRITE: 136.6 ms]
Latency max : 235.9 ms [WRITE: 235.9 ms]

Plotting the results on a chart makes for easier analysis, so let’s look at a couple of different renderings of these data.
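If the stress output was captured to a file (stress.log in the client sketch above), a quick grep/awk pass can turn the summary lines into CSV for whatever charting tool you prefer. This is a sketch; the field positions match the output format shown above:

# Produce one CSV row per run: thread count, op rate, 99th percentile latency (ms).
echo "threads,op_rate,p99_ms" > results.csv
grep -E 'Running with|Op rate|Latency 99th percentile' stress.log \
  | awk '/Running with/            { printf "%s,", $3 }
         /Op rate/                 { gsub(",", "", $4); printf "%s,", $4 }
         /Latency 99th percentile/ { print $5 }' >> results.csv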

Write-throughput vs thread count

Predictably enough, as the number of writer threads increases, so too (initially) does the achieved throughput.

However, the write throughput eventually tails off and approaches a ceiling. This could be because the client cannot generate enough requests to saturate the server, or because the server cannot service a higher request rate. I will explore these possibilities in greater depth in a subsequent post.

Latency vs write-throughput

This chart shows that the mean latency remains relatively stable (and quick, at less than 15ms) as the throughput increases, but the outliers at the 99th and 99.9th percentiles suffer as the server has to process more requests.

This is a fairly standard result when pushing a system (whether that is the server or the load generator) to its limits.

Wrapping up

In this post, I have covered how to set up and load-test a stock install of the Apache Cassandra database. Cassandra comes with a bundled load-testing tool that makes this kind of investigation much easier.

When exploring what sort of performance a system can deliver, it is useful to phrase your requirements (or findings) in terms of SLAs, such as the request latency that a client can expect at a certain level of throughput.

Visualizing the results from this battery of stress tests is a quick way to grasp how a system under test will respond as load increases, which in turn can help with capacity planning.

From this simple testing process, I can state that a single-node Cassandra instance on the provisioned server VM can sustain the following latency profile for a synthetic write-heavy workload:

30,000 writes/sec, mean latency 1.7ms, 99%ile 5.5ms, 99.9%ile 25.2ms

70,000 writes/sec, mean latency 8.3ms, 99%ile 86ms, 99.9%ile 133.1ms

In the next post, I will take a look at what is happening on the server, and gain some insight into where the system limits lie.

About the blogger: Mark Price is principal software consultant at Aitu Software, specializing in performance testing, system tuning, and low-latency development. He is a regular presence at renowned IT conferences, like QCon London 2017, and his most recent talk can be seen in this LinkedIn Slideshare. You can find more of his insight and follow his system performance investigations on Twitter (@epickrram) and at his blog, Technical Itch.

Please feel free to share below any comments, questions or insights about your experience with Apache Cassandra and system performance. And if you found this blog useful, consider sharing it through social media.
