Arithmetic Operators in Apache Cassandra 4.0

Author: Aaron Ploetz

DataStax
Building Real-World, Real-Time AI

--

Developers and DBAs now have a set of new arithmetic operators in Cassandra 4.0 making it easier to learn and perform different types of calculations. This post will introduce you to a few of these new operators with simple examples of how they work.

For years while helping folks on Stack Overflow, I’ve answered questions about arithmetic operators in Cassandra Query Language (CQL) with “Sorry, but that is not supported.” Unfortunately, that statement ends up prefacing far too many Cassandra answers. In the past, these types of operations had to be applied on the application side. But the release of Apache Cassandra 4.0 helps to round off some of the sharp edges from Cassandra’s learning curve with the implementation of CASSANDRA-11935, which added support for arithmetic operators to CQL.

Note: All examples below work with both Apache Cassandra 4.0 and DataStax Astra DB.

Simple Operations

So how could we make use of CASSANDRA-11935? Let’s start with some simple operations on a table for storing baseball statistics. Here is a simple query for at-bats, hits, walks, and times hit-by-the-pitch for the player named “Avery.”

> SELECT name, atbats, hits, walks, hitbypitch
FROM baseball_stats
WHERE team='Terminators' AND name='Avery';
name | atbats | hits | walks | hitbypitch
-------+--------+--------------|------------
Avery | 48 | 28 | 17 | 2

So how many times has Avery been on base? To figure this out, we need to perform addition on the number of hits, walks, and the number of times he was hit by a pitch. We can perform this operation within our query like this:

> SELECT name, atbats, hits, walks, hitbypitch,
hits + walks + hitbypitch as "times on base"
FROM baseball_stats
WHERE team='Terminators' AND name='Avery';
name | atbats | hits | walks | hitbypitch | times on base
-------+--------+------+-------+------------+---------------
Avery | 48 | 28 | 17 | 2 | 47

How about showing a batting average (the percentage of time a batter safely reaches base as a result of a hit) for Avery? Typically, a player’s batting average is computed by dividing hits by the number of at bats. Our query can handle that as well:

> SELECT name, atbats, hits, hits / atbats as "avg"
FROM baseball_stats
WHERE team='Terminators' AND name='Avery';
name | atbats | hits | avg
-------+-------+------+-----
Avery | 48 | 28 | 0

Wait, why is avg coming out as zero? What happened? The answer is that because the hits and atbats columns are defined as INTegers, integer arithmetic happened. This essentially means that only whole numbers are considered for operations, and any fractions or remainders are ignored. In our case, 48 cannot go into 28 even one time, so a zero is returned.

To solve this problem, we will use CQL’s CAST function to change each of our values to FLOATs (floating-point numbers). Casting the column values as FLOATs will allow the division operator to consider decimal values.

> SELECT name, atbats, hits,
CAST(hits as FLOAT) / CAST(atbats as FLOAT) as "avg"
FROM baseball_stats
WHERE team='Terminators' AND name='Avery';
name | atbats | hits | avg
-------+--------+------+----------
Avery | 48 | 28 | 0.583333

With the values now appropriately CASTed, we can see that Avery batted 0.583 in 48 at-bats.

Date Arithmetic

Where this feature really shines is when you apply it to time series or event data models.

Let’s say we wanted to query on a simple table holding data on “nerd holidays.” If we wanted to see when “Pi Day” was coming up (probably the most-famous nerd holiday), we could run a range query for all of the nerd holidays in the month of March 2021:

> SELECT event_date, name
FROM nerd_holidays
WHERE year_bucket=2021
AND event_date >= '2021-03-01'
AND event_date < '2021-04-01';
event_date | name
------------+----------------
2021-03-25 | Fall of Sauron
2021-03-14 | Pi Day

Now, here’s where the real magic comes into play. Let’s say that we wanted to see upcoming nerd holidays in the next 30 days. That would look something like this:

> SELECT event_date, name
FROM nerd_holidays WHERE year_bucket=2021
AND event_date >= toDate(now())
AND event_date < toDate(now()) + 30d;
event_date | name
------------+------------------------
2021-10-02 | Ada Lovelace Day
2021-09-22 | Hobbit Day
2021-09-19 | Talk Like a Pirate Day

Additionally, we could adjust the duration, to see nerd holidays for the next three months:

> SELECT event_date, name
FROM nerd_holidays
WHERE year_bucket=2021
AND event_date >= toDate(now())
AND event_date < toDate(now()) + 3mo;
event_date | name
------------+------------------------
2021-11-30 | Computer Security Day
2021-11-23 | Fibbonacci Day
2021-11-23 | Tardis Day
2021-10-21 | Marty McFly Day
2021-10-02 | Ada Lovelace Day
2021-09-22 | Hobbit Day
2021-09-19 | Talk Like a Pirate Day

Similarly, we can also subtract durations. This query will return nerd holidays that occur during the last two months of 2021:

> SELECT event_date, name
FROM nerd_holidays
WHERE year_bucket=2021
AND event_date >= '2022-01-01' - 2mo
AND event_date < '2022-01-01';
event_date | name
------------+-----------------------
2021-12-21 | 2112 Day
2021-11-30 | Computer Security Day
2021-11-23 | Fibbonacci Day
2021-11-23 | Tardis Day

The validity of different duration types is dependent on the precision of the underlying time type. For example, durations of minutes (m) and seconds (s) are valid durations for the timestamp type, but not for the date type. As a reference, a listing of commonly-used durations for both Cassandra and Astra DB is shown in the table below:

Key takeaways

  • Common arithmetic operations (*, /, +, -, %) can now be applied in both the SELECT and WHERE clauses of a CQL statement.
  • The use of multiple arithmetic operators will follow the standard order of operations (parenthesis, multiply, divide, add, subtract).
  • It may be necessary to use the CAST function to enforce the desired level of precision.
  • Duration abbreviations can be used to filter column values of time-based data types.
  • Durations are only valid for types that allow their level of precision.

Developers and Database Administrators (DBAs) using Apache Cassandra have long been without some of the niceties of Structured Query Language (SQL). The use of arithmetic operators is a welcome addition to CQL. It offers both the convenience to apply them at query time as well as the reduction of application code required to process numeric and time-series data sets.

Follow the DataStax Tech Blog for more developer stories. Check out our YouTube channel for tutorials and here for DataStax Developers on Twitter for the latest news about our developer community.

Resources

  1. DataStax Astra DB
  2. Cassandra Issues: Add support for arithmetic operators
  3. Cassandra Query Language (CQL)
  4. Apache Cassandra
  5. New Features in Apache Cassandra 4.0=
  6. Countdown to Pi Day

--

--

DataStax
Building Real-World, Real-Time AI

DataStax provides the real-time vector data tools that generative AI apps need, with seamless integration with developers' stacks of choice.