A Beginner’s Guide to Benchmarking with NoSQLBench

Welcome to Part 3 of a six-part series on Apache Cassandra®. In the last post, we discussed advanced data modeling. In this post, we show you how to benchmark and stress test your Cassandra database and other NoSQL databases with DataStax’s open-source benchmarking tool — NoSQLBench.

Operators and developers benchmark and stress test their data models regularly. With mission-critical infrastructure like Apache Cassandra®, benchmarking is standard practice, especially before going to production.

There are several benchmarking tools on the market, but most of them require esoteric coding knowledge. DataStax’s open-source benchmarking platform, NoSQLBench, is simple to use while providing sophisticated benchmarking for Cassandra and other NoSQL databases. Testing your data with NoSQLBench provides results within minutes.

In this post, you’ll get hands-on experience with benchmarking and stress testing Cassandra using NoSQLBench. Rather than going in-depth, this tutorial introduces the essentials:

  1. Understanding parameters and key metrics for benchmarking
  2. Learning how cycles, bindings and statements work together
  3. Experimenting with stdout
  4. Scaling up a test and customizing your own scenarios
  5. Packaging a performance test with named scenarios
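To make the second and third items more concrete, here is a minimal Python sketch of the general idea: each cycle is a number, bindings turn that number into field values deterministically, and a statement template is filled in with those values. Printing the rendered statements instead of sending them to a database is roughly what experimenting with stdout gives you. This is a toy model under our own assumptions, not NoSQLBench code; the binding names, the table `demo.readings`, and the helper functions are all hypothetical.

```python
import hashlib
from typing import Callable, Dict, List


def hashed(cycle: int, salt: str) -> int:
    """Stable pseudo-random integer derived from a cycle number and a salt."""
    digest = hashlib.sha256(f"{salt}:{cycle}".encode()).digest()
    return int.from_bytes(digest[:8], "big")


# Hypothetical bindings: each one maps a cycle number to a field value.
# The same cycle always produces the same value, so runs are repeatable.
BINDINGS: Dict[str, Callable[[int], object]] = {
    "sensor_id": lambda cycle: cycle % 100,  # cycles wrap over 100 sensors
    "reading": lambda cycle: hashed(cycle, "reading") % 1000 / 10.0,
}

# Hypothetical statement template; placeholders are named after the bindings.
STATEMENT = (
    "INSERT INTO demo.readings (sensor_id, reading) "
    "VALUES ({sensor_id}, {reading});"
)


def render(cycle: int) -> str:
    """Apply every binding to the cycle number and fill in the template."""
    values = {name: fn(cycle) for name, fn in BINDINGS.items()}
    return STATEMENT.format(**values)


def run_stdout(start: int, end: int) -> List[str]:
    """Render one statement per cycle in [start, end) without a database."""
    return [render(cycle) for cycle in range(start, end)]


if __name__ == "__main__":
    for line in run_stdout(0, 5):
        print(line)
```

Because every value is a pure function of the cycle number, rerunning the same cycle range reproduces the same statements, which is what makes a dry run like this useful for checking a workload before pointing it at a real cluster.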

What is NoSQLBench?

NoSQLBench is an open-source, pluggable testing tool for the NoSQL ecosystem. Although it was designed primarily to test Cassandra, you can also use it with other NoSQL technologies such as Apache Kafka, MongoDB, and DataStax’s Astra DB.

NoSQLBench evolved from DataStax’s internal testing tool, dsbench, which made a quantum leap in testing abilities for the Cassandra community. Engineers and customers were using it for performance testing, data model design, sizing, and deployment of new clusters.

After we completed dsbench, we decided to turn it into an open-source project, NoSQLBench, so the NoSQL community can improve it further. NoSQLBench now integrates with different kinds of workloads and protocols, including CQL support.

If you’d like to contribute to the project in any way, including mentoring and code reviews, let us know on Twitter or in the comments.

Why should you benchmark your data models?

As consumers spend more time online, companies are handling massive traffic on their sites every day. Benchmarking helps ensure your data models can handle unexpected spikes without breaking down.

Testing your data model before going into production is an operational imperative. You can understand how your system behaves and performs as it scales, before you release it into the world.

Highly experienced data operators learn to gauge signals and metrics without performing a benchmark. For everyone else, however, testing your data models is imperative, especially for those who are operationally conservative.

NoSQLBench with Cassandra exercises

The exercises take roughly two hours to complete with preconfigured scenarios on Katacoda, resources on GitHub, and step-by-step instructions in this YouTube video. We handled the Cassandra backend so you can focus entirely on NoSQLBench.

In the hands-on exercises, you will:

  1. Run your first NoSQLBench benchmark against Cassandra with pre-packaged data models
  2. Investigate NoSQLBench metrics from Cassandra using Grafana and Prometheus
  3. Create your custom workloads to benchmark with NoSQLBench

Exercise 1: Executing NoSQLBench commands against Cassandra

The first exercise shows you how to execute NoSQLBench commands against Cassandra using pre-packaged data models. You have the option to follow this exercise manually using GitHub or to run this Katacoda scenario.

If you’re doing this manually, set up Docker through this 0-setup readme first. Docker is open-source virtualization software that automates the deployment of software applications inside containers. Once Docker is running, start a Cassandra database and download NoSQLBench as a jar file.

We also strongly recommend that you create a dedicated directory for NoSQLBench and download the jar file there, as we will generate many files during these exercises.
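If you start Cassandra yourself, it can take a while before the container accepts connections. The Python sketch below is one way to wait until a TCP port is reachable before launching a benchmark. It assumes Cassandra is exposed on its default native transport port, 9042, on localhost; the function name `wait_for_port` and its parameters are our own. An open port only shows that something is listening, not that the database is fully ready to serve queries.

```python
import socket
import time


def wait_for_port(host: str, port: int,
                  timeout_s: float = 60.0, interval_s: float = 1.0) -> bool:
    """Return True once a TCP connection to host:port succeeds,
    or False if no connection succeeds before the deadline passes."""
    deadline = time.monotonic() + timeout_s
    while True:
        try:
            # A successful connect means a process is listening on the port.
            with socket.create_connection((host, port), timeout=interval_s):
                return True
        except OSError:
            # Refused or timed out: give up at the deadline, else retry.
            if time.monotonic() >= deadline:
                return False
            time.sleep(interval_s)


# Example usage (assumes a local Cassandra container publishing port 9042):
#   if wait_for_port("127.0.0.1", 9042, timeout_s=120.0):
#       print("port is open, you can try connecting now")
```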

Follow along with our video tutorial or this GitHub link to complete the exercises below:

Exercise 2: Investigate NoSQLBench metrics from Cassandra with Grafana

In this exercise, you’ll view and analyze the metrics in charts and graphs on Grafana, an open-source analytics and interactive visualization application. Again, you can run this Katacoda scenario or get the code for this section on GitHub. Follow along with the instructions in this video.

Here’s what you need to do for this exercise:

Analyzing Grafana Metrics

Once you’ve completed the above steps, we can take a closer look at four types of metrics on Grafana and what they represent.

Figure 1. Four types of metrics on Grafana.

  1. Ops and successful ops: This graph indicates the operation rate (reads/writes per second) of the benchmark. There should be no discrepancies between ops and success metrics.
  2. Error Counts: If there are discrepancies between ops and success metrics, you will see errors here. Ideally, there shouldn’t be any errors.
  3. Service Time distribution: Service time measures how quickly your NoSQL database responds to requests coming in from the client’s application server.
  4. Op tries distribution: This graph shows how many tries or retries it took to execute your operation. A high number of retries suggests that your database may be overloaded, which warrants immediate investigation.
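To show how these four metrics relate to one another, the Python sketch below computes each of them from a small list of operation records. The records are synthetic values made up for illustration, not output from NoSQLBench or Grafana, and the field layout, the `summarize` helper, and the two-second window are assumptions of ours.

```python
import math
from collections import Counter

# Synthetic op records, invented for illustration only:
# (service_time_ms, tries, succeeded)
OPS = [
    (2.1, 1, True), (3.4, 1, True), (2.8, 1, True), (15.0, 2, True),
    (4.2, 1, True), (2.5, 1, True), (40.0, 3, False), (3.1, 1, True),
    (2.9, 1, True), (5.5, 1, True),
]
WINDOW_SECONDS = 2.0  # assumed wall-clock span covered by the sample


def percentile(sorted_values, pct):
    """Nearest-rank percentile of an already sorted list."""
    rank = max(1, math.ceil(pct / 100.0 * len(sorted_values)))
    return sorted_values[rank - 1]


def summarize(ops, window_seconds):
    """Derive the four dashboard-style metrics from raw op records."""
    times = sorted(t for t, _, _ in ops)
    successes = sum(1 for _, _, ok in ops if ok)
    return {
        "ops_per_sec": len(ops) / window_seconds,          # all attempted ops
        "success_per_sec": successes / window_seconds,     # ops that succeeded
        "errors": len(ops) - successes,                    # the discrepancy
        "service_time_p50_ms": percentile(times, 50),
        "service_time_p99_ms": percentile(times, 99),
        "tries_histogram": dict(Counter(tries for _, tries, _ in ops)),
    }


if __name__ == "__main__":
    for name, value in summarize(OPS, WINDOW_SECONDS).items():
        print(f"{name}: {value}")
```

In this sample the single failed operation is also the slowest and the one with the most tries, which is the kind of pattern worth looking for when the ops and success curves diverge.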

Exercise 3: Customizing workloads in NoSQLBench

In the previous exercises, you used pre-packaged workloads to run the NoSQLBench commands. Now, we’ll guide you to create custom workloads for your own application or database. This section is optional but recommended.

To get started, run this Katacoda scenario or use the code on GitHub. Find detailed instructions in this video and complete these steps:

Conclusion

You’re already halfway through our Cassandra series! If you want to keep learning, head over to Part 4 where we dive into Storage-Attached Indexes. Then, finish off strong with Part 5 and Part 6 where we show you how to migrate your SQL applications to NoSQL and give you a hands-on exercise.

Our newly released Definitive Cassandra Guide will also be incredibly useful to learn everything that you need to know about Cassandra. You can also reach out to me personally through LinkedIn or comment under this post if you have any questions!

Follow the DataStax Tech Blog for more developer stories. Check out our YouTube channel for tutorials and DataStax Developers on Twitter for the latest news about our developer community.

Resources

  1. DataStax NoSQLBench
  2. YouTube Tutorial: Benchmark your NoSQL Database
  3. Astra DB: Multi-cloud DBaaS built on Apache Cassandra
  4. NoSQLBench Learning Series for Apache Cassandra by DataStax
  5. NoSQL Bench Workshop Online GitHub
  6. DataStax Academy
  7. DataStax Community
  8. Definitive Cassandra Guide by DataStax
  9. Apache Pulsar Performance Testing with NoSQLBench
