Leveraging Virtual Tables in Apache Cassandra 4.0
Author: Aaron Ploetz
Thanks to CASSANDRA-7622, Apache Cassandra 4.0 has a new feature called “Virtual Tables.” In this post, we discuss virtual tables, their implementation in Apache Cassandra 4.0, and how you can use them to improve your cluster’s observability.
Virtual tables are essentially table interfaces that sit on top of an API, instead of data stored on disk. They can be interacted with at the database level, but don’t support all the functionality that “normal” tables do. Virtual Tables in Cassandra cannot be written to, altered, dropped or truncated, nor can they support additional indexes, functions, or materialized views.
Also, virtual tables can only exist in a virtual keyspace. Virtual Keyspaces are special keyspaces managed by Cassandra. They’re not replicated and are specific only to the local node.
Apache Cassandra 4.0 installs two virtual keyspaces:
- system_virtual_schema: contains the schema data for all virtual tables and keyspaces (including itself)
- system_views: the lone keyspace of general-purpose virtual tables, which expose the node's settings, properties, and metrics
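To see what these keyspaces contain on your own node, one option is to query the virtual schema directly. This is a minimal sketch, assuming a cqlsh session against a Cassandra 4.0 node and that system_virtual_schema.tables exposes keyspace_name and table_name columns:

```cql
-- List every virtual table defined in the system_views keyspace.
-- Assumes Cassandra 4.0 column names; adjust if your version differs.
SELECT keyspace_name, table_name
FROM system_virtual_schema.tables
WHERE keyspace_name = 'system_views';
```

Because the result reflects only the node you are connected to, the list may differ between nodes running different versions.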
So what can you do with them?
In their current iteration, virtual tables greatly simplify the tooling needed to manage and monitor Apache Cassandra. Metrics have always been accessed through Java Management Extensions (JMX), but many of them are now also exposed through virtual tables.
Here’s a list of the Virtual Tables included in the
Let’s go through some quick examples.
Have you ever been troubleshooting something for a cluster online and posted in a discussion forum only to be asked, “what’s your compaction_throughput_mb_per_sec?” Or, “how many memtable_flush_writers are you using?”
Before, you would have to check the cassandra.yaml file on that node, assuming you had access to it. Now, you can simply query it:
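A query along these lines should work from cqlsh. It is a sketch that assumes the configuration is exposed through the system_views.settings virtual table, keyed by name, and that the two parameter names below match the ones used by your version:

```cql
-- Look up two cassandra.yaml settings on the node you are connected to.
-- Setting names are assumed to match the Cassandra 4.0 yaml keys.
SELECT name, value
FROM system_views.settings
WHERE name IN ('compaction_throughput_mb_per_sec', 'memtable_flush_writers');
```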
As you can see, it’s never been easier to query the configuration parameters with Cassandra!
Relevant system properties are also exposed via a virtual table. This can be useful when working with your enterprise security team, when they identify security vulnerabilities and need to know things about a node’s Java Runtime Environment (JRE):
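For instance, the following sketch asks the node for a couple of JRE details. It assumes the properties live in the system_views.system_properties virtual table and that java.version and java.vm.name are among the property names your node exposes:

```cql
-- Report the Java version and JVM name for the local node.
-- The property names are assumptions; run SELECT * to see what is available.
SELECT name, value
FROM system_views.system_properties
WHERE name IN ('java.version', 'java.vm.name');
```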
Note that because virtual tables are specific to the local node, using the CQL IN operator here won’t create a performance issue.
Similarly, if you’re working on a new cluster and want to know where the logs are being written, you can use:
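One way to do that, sketched here under the assumption that the node was started with the cassandra.logdir system property set (as the stock startup scripts typically do):

```cql
-- Find the directory this node writes its logs to.
-- Returns no rows if cassandra.logdir was not set at startup.
SELECT name, value
FROM system_views.system_properties
WHERE name = 'cassandra.logdir';
```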
This way, you can quickly view Cassandra’s relevant system properties. Previously, the only window to this information was through commands run over an SSH (Secure Shell) connection.
Node performance metrics
Before Apache Cassandra 4.0, you could only access performance metrics through the JMX interface. But with Virtual Tables, you can see how the specific nodes in your cluster are performing.
Let’s say that you wanted to know more about the read pattern for a table that tracks “Nerd Holidays.” That data is stored in the rows_per_read table, and you can query it by keyspace and table name:
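A minimal sketch of such a query follows. The keyspace name my_keyspace and the table name nerd_holidays are hypothetical placeholders, so substitute your own:

```cql
-- Read-size histogram for one table on the local node.
-- 'my_keyspace' and 'nerd_holidays' are placeholder names.
SELECT *
FROM system_views.rows_per_read
WHERE keyspace_name = 'my_keyspace'
  AND table_name = 'nerd_holidays';
```

The values returned describe only reads served by the node you are connected to, so repeat the query on other nodes for a cluster-wide picture.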
Or maybe you’d like to see the key cache statistics for this node:
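The sketch below assumes the cache statistics are exposed through the system_views.caches virtual table and that the key cache is listed there under the name keys:

```cql
-- Key cache statistics for the local node.
-- The cache name 'keys' is an assumption; omit the WHERE clause to list all caches.
SELECT *
FROM system_views.caches
WHERE name = 'keys';
```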
Virtual Tables are a welcome addition to Cassandra. Being able to access configuration properties, system variables, and metrics programmatically with cqlsh is a huge time saver. With this approach, you can greatly simplify access to valuable observability data.
In closing, here are the key takeaways:
- Virtual Tables are a quick way to view data about a single node
- Configuration properties can be verified without access to the cassandra.yaml file
- System properties relevant to Cassandra can be verified without an SSH connection
- Many metrics can now be queried without using JMX