Leveraging Virtual Tables in Apache Cassandra 4.0

Author: Aaron Ploetz

Thanks to CASSANDRA-7622, Apache Cassandra 4.0 has a new feature called “Virtual Tables.” In this post, we discuss virtual tables, their implementation in Apache Cassandra 4.0, and how you can use them to improve your cluster’s observability.

Virtual tables are essentially table interfaces that sit on top of an API, instead of on data stored on disk. They can be queried with CQL much like regular tables, but they don't support all of the functionality that "normal" tables do. Virtual tables in Cassandra cannot be written to, altered, dropped, or truncated, nor can they support secondary indexes, functions, or materialized views.

Also, virtual tables can only exist in a virtual keyspace. Virtual keyspaces are special keyspaces managed by Cassandra itself. They are not replicated, and their contents are specific to the local node.

Apache Cassandra 4.0 installs two virtual keyspaces:

  • system_virtual_schema: contains the schema data for all virtual keyspaces and tables (including itself)
  • system_views: the keyspace that holds the actual virtual tables
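
You can see these keyspaces for yourself by querying the schema keyspace. This is a minimal sketch that assumes a cqlsh session connected to a Cassandra 4.0 node. On a stock install it should return the two keyspaces listed above.

```cql
-- List the virtual keyspaces registered on this node.
-- system_virtual_schema describes itself as well as system_views.
SELECT keyspace_name FROM system_virtual_schema.keyspaces;
```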

So what can you do with them?

In their current iteration, virtual tables greatly simplify the tooling needed to manage and monitor Apache Cassandra. Metrics have traditionally been accessed through Java Management Extensions (JMX), but many of them are now also exposed through virtual tables.

Here’s a list of the Virtual Tables included in the system_views keyspace:

Table 1. Virtual Tables in the system_views keyspace.
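
Because system_virtual_schema holds the schema for every virtual table, you can also generate this list from any 4.0 node. The sketch below assumes a cqlsh session, and it assumes that system_virtual_schema.tables exposes keyspace_name, table_name, and comment columns, mirroring system_schema.tables.

```cql
-- List every virtual table in the system_views keyspace,
-- along with the short description stored in its comment column.
SELECT table_name, comment
FROM system_virtual_schema.tables
WHERE keyspace_name = 'system_views';
```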

Let’s go through some quick examples.

Settings

Have you ever been troubleshooting a cluster, posted in an online discussion forum, and been asked, "What's your compaction_throughput_mb_per_sec?" or "How many memtable_flush_writers are you using?"

Before, you had to check the cassandra.yaml file on that node, assuming you had access to it. Now, you can simply query the settings virtual table.
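
Here is a minimal sketch of that lookup, assuming a cqlsh session on the node in question. The settings table has one row per configuration parameter. Each row is keyed by the parameter's name as it appears in the 4.0 cassandra.yaml, and the current value is in the value column.

```cql
-- Look up two configuration parameters on the node this session is connected to.
SELECT name, value
FROM system_views.settings
WHERE name IN ('compaction_throughput_mb_per_sec', 'memtable_flush_writers');
```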

It's never been easier to check a node's configuration parameters in Cassandra!

System Properties

Relevant system properties are also exposed through a virtual table, system_properties. This is useful when your enterprise security team identifies a vulnerability and needs details about a node's Java Runtime Environment (JRE).
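
For example, a query along these lines returns basic JRE details for the node. This sketch assumes that java.version, java.vm.name, and java.home are among the properties the table exposes. A name that isn't exposed simply returns no row.

```cql
-- Fetch JRE details for the local node in a single query.
SELECT name, value
FROM system_views.system_properties
WHERE name IN ('java.version', 'java.vm.name', 'java.home');
```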

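Note that because virtual tables are local to the node, using the CQL IN operator on them won't create a performance issue. On a regular table, a multi-partition IN query can make the coordinator fetch partitions from many replicas. Here, every row is read from the node you are already connected to.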

Similarly, if you're working on a new cluster and want to know where the logs are being written, you can query the same table.
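
Here is a sketch of that query. It assumes the node was started with Cassandra's standard launch scripts, which pass the log directory to the JVM as the cassandra.logdir system property.

```cql
-- Where is this node writing its logs?
SELECT value
FROM system_views.system_properties
WHERE name = 'cassandra.logdir';
```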

This way, you can quickly view Cassandra's relevant system properties. Previously, the only window into this information was running commands over an SSH (Secure Shell) connection.

Node performance metrics

Before Apache Cassandra 4.0, you could only access performance metrics through the JMX interface. With virtual tables, you can use CQL to see how an individual node in your cluster is performing.

Let's say that you want to know more about the read pattern for a table that tracks "Nerd Holidays." That data is exposed in the rows_per_read virtual table, which you can query by keyspace and table name.
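
The query might look like the sketch below. The names nerd_stuff and nerd_holidays are hypothetical placeholders for your own keyspace and table. The result summarizes, for this node only, how many rows each read against that table returned.

```cql
-- Rows-per-read statistics for one table, as seen by this node.
-- nerd_stuff and nerd_holidays are hypothetical names; substitute your own.
SELECT *
FROM system_views.rows_per_read
WHERE keyspace_name = 'nerd_stuff'
  AND table_name = 'nerd_holidays';
```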

Or maybe you'd like to see the key cache statistics for this node.
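
The caches table holds one row per cache. This sketch assumes the key cache's row is named 'keys'. If it differs in your version, drop the WHERE clause to see every cache on the node.

```cql
-- Key cache statistics (capacity, size, hit counts, and so on) for this node.
SELECT *
FROM system_views.caches
WHERE name = 'keys';
```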

This provides an easy way to view performance metrics with Cassandra. It’s a tremendous improvement over having to directly interact with the JVM using an additional tool, like JConsole or JMXTerm.

Figure 1. Interacting with Apache Cassandra via JMXTerm. Does anyone actually miss having to do this?

Summary

Virtual tables are a welcome addition to Cassandra. Being able to read configuration settings, system properties, and metrics with ordinary CQL queries in cqlsh is a huge time saver. With this approach, you can greatly simplify access to valuable observability data.

In closing, here are the key takeaways:

  • Virtual Tables are a quick way to view data about a single node
  • Configuration properties can be verified without access to the cassandra.yaml file
  • System properties relevant to Cassandra can be verified without an SSH connection
  • Many metrics can now be queried without using JMX


Resources

  1. Virtual Tables | Apache Cassandra Documentation
  2. CASSANDRA-7622 — Implement virtual tables
  3. Arithmetic Operators in Apache Cassandra 4.0
  4. Deploy Apache Cassandra 4.0 on Kubernetes and AWS
  5. 3 Things You Should Know About Data Consistency, Distribution, and Replication with Apache Cassandra
