A Beginner’s Guide to Benchmarking with NoSQLBench
Author: David Jones-Gilardi
Welcome to Part 3 of a six-part series on Apache Cassandra®. In the last post, we discussed advanced data modeling. In this post, we show you how to benchmark and stress test your Cassandra database and other NoSQL databases with NoSQLBench, DataStax’s open-source benchmarking tool.
Operators and developers benchmark and stress test their data models regularly. With mission-critical infrastructure like Apache Cassandra®, benchmarking is standard practice, especially before going to production.
There are several benchmarking tools on the market, but most of them require esoteric coding knowledge. DataStax’s open-source benchmarking platform, NoSQLBench, is simple to use while providing sophisticated benchmarking for Cassandra and other NoSQL databases. Testing your data with NoSQLBench provides results within minutes.
In this post, you’ll get hands-on experience with benchmarking and stress testing Cassandra using NoSQLBench. Rather than going in-depth, our tutorial will scratch the surface and cover:
- Understanding parameters and key metrics for benchmarking
- Learning how cycles, bindings and statements work together
- Experimenting with stdout
- Scaling up a test and customizing your own scenarios
- Packaging a performance test with named scenarios
What is NoSQLBench?
NoSQLBench is an open-source, pluggable testing tool for the NoSQL ecosystem. It was designed primarily to test Cassandra, but you can also use it with other NoSQL technologies such as Apache Kafka, MongoDB, and DataStax’s Astra DB.
NoSQLBench evolved from DataStax’s internal testing tool, dsbench, which made a quantum leap in testing abilities for the Cassandra community. Engineers and customers were using it for performance testing, data model design, sizing, and deployment of new clusters.
After we completed dsbench, we decided to turn it into an open-source project, NoSQLBench, so the NoSQL community can improve it further. NoSQLBench now integrates with different kinds of workloads and protocols, including CQL support.
If you’d like to contribute to the project in any way, including mentoring and code reviews, let us know on Twitter or in the comments.
Why should you benchmark your data models?
As consumers spend more time online, companies are handling massive traffic on their sites every day. Benchmarking helps ensure your data models can handle unexpected spikes without breaking down.
Testing your data model before going into production is an operational imperative. You can understand how your system behaves and performs as it scales, before you release it into the world.
Highly experienced data operators learn to gauge signals and metrics without performing a benchmark. For everyone else, however, testing your data models is imperative, especially if you are operationally conservative.
NoSQLBench with Cassandra exercises
The exercises take roughly two hours to complete with preconfigured scenarios on Katacoda, resources on GitHub, and step-by-step instructions in this YouTube video. We handled the Cassandra backend so you can focus entirely on NoSQLBench.
In the hands-on exercises, you will:
- Run your first NoSQLBench benchmark against Cassandra with pre-packaged data models
- Investigate NoSQLBench metrics from Cassandra using Grafana and Prometheus
- Create your custom workloads to benchmark with NoSQLBench
Exercise 1: Executing NoSQLBench commands against Cassandra
The first exercise shows you how to execute NoSQLBench commands against Cassandra using pre-packaged data models. You have the option to follow this exercise manually using GitHub or to run this Katacoda scenario.
If you’re doing this manually, set up Docker through this 0-setup readme first. Docker is open-source virtualization software that automates the deployment of software applications inside containers. Once Docker is running, start a Cassandra database and download NoSQLBench as a jar file.
We also strongly recommend creating a dedicated directory for NoSQLBench and downloading the jar file there, because these exercises generate many files.
Here’s what you need to do for this exercise:
- Execute an initial run of your NoSQLBench commands
- Create a test schema
- Write initial ramp-up data
- Perform a benchmark test
- Analyze the results
Exercise 2: Investigate NoSQLBench metrics from Cassandra with Grafana
In this exercise, you’ll view and analyze the metrics in charts and graphs on Grafana, an open-source analytics and interactive visualization application. Again, you can run this Katacoda scenario or get the code for this section on GitHub. Follow along with the instructions in this video.
Here’s what you need to do for this exercise:
- Export metrics to Grafana
- Launch Grafana
- View various metrics
- Launch Prometheus and view metrics in more detail
Analyzing Grafana Metrics
Once you’ve completed the above steps, we can take a closer look at four types of metrics on Grafana and what they represent.
Figure 1. Four types of metrics on Grafana.
- Ops and successful ops: This graph indicates the operation rate (reads/writes per second) of the benchmark. There should be no discrepancies between ops and success metrics.
- Error Counts: If there are discrepancies between ops and success metrics, you will see errors here. Ideally, there shouldn’t be any errors.
- Service Time distribution: Service time measures how quickly your NoSQL database responds to requests coming in from the client’s application server.
- Op tries distribution: This graph shows how many tries or retries it took to execute your operations. A high number of retries suggests that your database may be overloaded and warrants immediate investigation.
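To make the relationship between these panels concrete, here is a minimal Python sketch. The counter values and the names `BenchmarkSnapshot`, `error_rate`, and `mean_tries` are hypothetical illustrations, not NoSQLBench metric names or an API. In practice you would read the equivalent numbers off Grafana or Prometheus.

```python
from dataclasses import dataclass


@dataclass
class BenchmarkSnapshot:
    """Hypothetical counters read off a dashboard at one point in time."""
    ops: int             # operations attempted
    successful_ops: int  # operations that completed without error
    total_tries: int     # attempts made, including retries


def error_rate(s: BenchmarkSnapshot) -> float:
    """Fraction of operations that did not succeed (0.0 is the ideal)."""
    if s.ops == 0:
        return 0.0
    return (s.ops - s.successful_ops) / s.ops


def mean_tries(s: BenchmarkSnapshot) -> float:
    """Average attempts per operation; 1.0 means nothing was retried."""
    if s.ops == 0:
        return 0.0
    return s.total_tries / s.ops


# Made-up sample values for illustration only.
healthy = BenchmarkSnapshot(ops=100_000, successful_ops=100_000, total_tries=100_000)
strained = BenchmarkSnapshot(ops=100_000, successful_ops=98_000, total_tries=125_000)

for name, snap in (("healthy", healthy), ("strained", strained)):
    print(f"{name}: error rate {error_rate(snap):.2%}, mean tries {mean_tries(snap):.2f}")
```

In the made-up "strained" case, ops and successful ops diverge by 2% and each operation needs 1.25 attempts on average, which is the kind of pattern the error and op tries panels would surface.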
Exercise 3: Customizing workloads in NoSQLBench
In the previous exercises, you used pre-packaged workloads to run the NoSQLBench commands. Now, we’ll show you how to create custom workloads for your own application or database. This section is optional but recommended.
Here’s what you need to do for this exercise:
- List workloads and named scenarios
- Copy workloads
- Build your own workload
- Combine everything into a single workload file
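When you build your own workload, it helps to picture how cycles, bindings, and statements fit together: each cycle is a number, bindings turn that number into field values deterministically, and those values fill in a statement template. The Python sketch below only illustrates that idea, printing the rendered statements much as a dry run to stdout would. The table, the column names, and the binding functions are hypothetical, and this is not NoSQLBench's workload syntax or implementation.

```python
import hashlib

# Hypothetical statement template with named placeholders.
STATEMENT = "INSERT INTO demo.readings (sensor_id, reading) VALUES ({sensor_id}, {reading});"


def _stable_hash(cycle: int, salt: str) -> int:
    """Deterministically map a cycle number to a large pseudo-random integer."""
    digest = hashlib.sha256(f"{salt}:{cycle}".encode()).digest()
    return int.from_bytes(digest[:8], "big")


# Hypothetical bindings: each one is a pure function of the cycle number.
BINDINGS = {
    "sensor_id": lambda cycle: _stable_hash(cycle, "sensor") % 1000,
    "reading": lambda cycle: round(_stable_hash(cycle, "reading") % 10_000 / 100, 2),
}


def render(cycle: int) -> str:
    """Apply every binding to the cycle and fill in the statement template."""
    values = {name: fn(cycle) for name, fn in BINDINGS.items()}
    return STATEMENT.format(**values)


# Running cycles 0..4 yields five statements; re-running yields the same five.
for cycle in range(5):
    print(render(cycle))
```

Because the bindings depend only on the cycle number, the same range of cycles always produces the same data, which is what makes a benchmark run repeatable.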
You’re already halfway through our Cassandra series! If you want to keep learning, head over to Part 4 where we dive into Storage-Attached Indexes. Then, finish off strong with Part 5 and Part 6 where we show you how to migrate your SQL applications to NoSQL and give you a hands-on exercise.
Our newly released Definitive Cassandra Guide is also a useful resource for learning what you need to know about Cassandra. You can also reach out to me personally through LinkedIn or comment under this post if you have any questions!
Resources
- DataStax NoSQLBench
- YouTube Tutorial: Benchmark your NoSQL Database
- Astra DB: Multi-cloud DBaaS built on Apache Cassandra
- NoSQLBench Learning Series for Apache Cassandra by DataStax
- NoSQL Bench Workshop Online GitHub
- DataStax Academy
- DataStax Community
- Definitive Cassandra Guide by DataStax
- Apache Pulsar Performance Testing with NoSQLBench