A series on Cassandra: Part 3: Advanced features

Flavian Alexandru
Published in Outworkers
4 min read · Jan 15, 2019

Clustering columns and range queries

Clustering columns are a simple way of telling Cassandra or DSE how to keep rows ordered within a partition. What’s the main advantage? On every insert, Cassandra puts the record in its right place according to the ordering criteria you defined through your clustering columns, so the data is already stored in sorted order.

When you later query those columns for data, the great benefit you get is that you no longer have to worry about sorting your data. This can give you very fast queries, and it is particularly powerful for time-series data.

The CQL modelling is done like this:

CREATE TABLE EventsByUser(
  user_id uuid,
  id uuid,
  time timestamp,
  PRIMARY KEY((user_id), id, time)
) WITH CLUSTERING ORDER BY (id ASC, time DESC);

Here we store all events generated by a user, sorted by the clustering columns, with time in descending order. This lets us get a list of the last N events that took place without needing to re-sort, using range queries. To model the same in phantom, have a look at the code below. We are intentionally using JodaTime, although a plain old Java Date would do just fine.

import com.outworkers.phantom.dsl._

case class EventsByUser(
  user: UUID,
  event: UUID,
  time: DateTime
)

abstract class EventsByUsers extends Table[EventsByUsers, EventsByUser] {
  object user_id extends UUIDColumn with PartitionKey
  object event extends UUIDColumn with ClusteringOrder with Ascending
  object time extends DateTimeColumn with ClusteringOrder with Descending
}

One trick of the trade is that you need to explicitly specify a clustering order for every clustering column in the primary key. You don’t need to do this for the partition key.
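To make the effect of several clustering columns concrete, here is a minimal in-memory sketch in plain Scala. It is not phantom code and not how Cassandra is implemented; the Row type is hypothetical, with an Int standing in for the event uuid and a Long for the timestamp. It only shows that clustering columns are compared left to right, in the order they are declared.

```scala
object ClusteringOrderSketch {
  // Hypothetical, simplified row: an Int stands in for the event uuid
  // and a Long (epoch millis) for the timestamp.
  final case class Row(eventId: Int, time: Long)

  // Compare clustering columns left to right:
  // eventId ascending first, then time descending to break ties.
  val clusteringOrder: Ordering[Row] =
    Ordering
      .Tuple2(Ordering.Int, Ordering.Long.reverse)
      .on((r: Row) => (r.eventId, r.time))

  // The rows of a single partition, in the order they would be laid out.
  def layout(rows: Seq[Row]): Seq[Row] = rows.sorted(clusteringOrder)
}
```

In this layout the rows for a given time window sit next to each other only within one eventId, which is why the position of each column in the clustering key matters for range queries.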

Now you can run range queries:

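object EventsByUsers extends EventsByUsers with DbProvider {
  // Fetch the events of one user whose time falls within [start, end].
  // Note: without ALLOW FILTERING, Cassandra only accepts a range on time
  // when the clustering columns declared before it (here, event) are
  // restricted by equality.
  def getSlice(
    user: UUID,
    start: DateTime,
    end: DateTime
  ): Future[Seq[EventsByUser]] = {
    db.eventsByUser.select.where(_.user_id eqs user)
      .and(_.time gte start)
      .and(_.time lte end)
      .fetch()
  }
}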
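The reason a range query needs no sorting can be pictured with a small in-memory model. This is a sketch in plain Scala under a simplifying assumption: time is the only clustering column and is declared as descending. The names Partition, partitionOf and getSlice are hypothetical, and a TreeMap stands in for the sorted storage of one partition.

```scala
import scala.collection.immutable.TreeMap

object RangeSliceSketch {
  // One partition: event payloads keyed by time (epoch millis).
  type Partition = TreeMap[Long, String]

  // Mirrors a single clustering column declared as DESC.
  val descending: Ordering[Long] = Ordering.Long.reverse

  def partitionOf(events: (Long, String)*): Partition =
    TreeMap(events: _*)(descending)

  // Inclusive slice [start, end] in time. Nothing is sorted here: we seek
  // to the first key at or before `end` and walk forward until we pass
  // `start`, so results come back newest first.
  def getSlice(p: Partition, start: Long, end: Long): List[String] =
    p.iteratorFrom(end)
      .takeWhile { case (time, _) => time >= start }
      .map { case (_, payload) => payload }
      .toList
}
```

For example, a partition built from events at times 10, 20, 30 and 40 returns the events at 30 and 20, in that order, for the slice from 20 to 30.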

Counter Columns

Counter columns are another powerful and easy-to-use feature built into CQL and Cassandra. They offer a simple way of doing distributed atomic counts directly in the database, while you rarely have to worry about the consistency of the data across your cluster. So how do counters work?

Let’s have a look at a “real-world” example:

CREATE TABLE TestTable(
  id uuid,
  count counter,
  PRIMARY KEY (id)
);

This is a very simple and straightforward way of counting things in a distributed fashion, and it is just as easy to replicate in Scala using the phantom DSL.

import scala.concurrent.Future
import com.outworkers.phantom.dsl._

case class TestRecord(id: UUID, count: Long)

abstract class TestTable extends Table[TestTable, TestRecord] {
  object id extends UUIDColumn with PartitionKey
  object count extends CounterColumn
}

What you now have is the ability to count whatever your primary key uniquely identifies, meaning you can count anything you want, just as if you had an AtomicInteger at hand, except this time it’s trivial to share it across all your machines and applications.

And you can simply query for it using Phantom:

object TestTable extends TestTable with DbProvider {
  // Read the current value of the counter for the given id, if any.
  def getCount(id: UUID): Future[Option[Long]] = {
    db.testTable.select(_.count).where(_.id eqs id).one()
  }
}
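The AtomicInteger analogy can be sketched in plain Scala as one atomic counter per primary key. This is only an in-memory illustration of the semantics: Cassandra counters are replicated across nodes and implemented very differently, and the names CounterSketch, increment and getCount are hypothetical.

```scala
import java.util.UUID
import java.util.concurrent.ConcurrentHashMap
import java.util.concurrent.atomic.AtomicLong

object CounterSketch {
  // Hypothetical stand-in for a counter table: one AtomicLong per primary key.
  private val counts = new ConcurrentHashMap[UUID, AtomicLong]()

  // Similar in spirit to: UPDATE ... SET count = count + delta WHERE id = ...
  // A counter that does not exist yet is created on the first increment.
  def increment(id: UUID, delta: Long = 1L): Unit = {
    counts.computeIfAbsent(id, _ => new AtomicLong(0L)).addAndGet(delta)
    ()
  }

  // None if the counter was never incremented, mirroring the Option
  // returned by the phantom query above.
  def getCount(id: UUID): Option[Long] =
    Option(counts.get(id)).map(_.get())
}
```

Several threads can call increment on the same id concurrently and no update is lost, which is the property the database gives you across machines.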

Static Columns

Static columns are an easy way of sharing a single value across all rows in the same partition. This is particularly useful when you want to implement a one-to-many relationship and one of the fields in the “many” part is shared.

A very simple example would be grouping team members by the id of a team. What you get is a one-to-many relationship between the team’s id and each of the individual team members, but the id of the team’s coach (the couch_id column in the schema) would stay the same.

Let’s try and model this in CQL:

CREATE TABLE TeamMembersByTeam(
  team_id uuid,
  team_member_id uuid,
  couch_id uuid static,
  PRIMARY KEY(team_id, team_member_id)
);

Now if you haven’t missed our introduction to this series on Cassandra, you probably know that this is how to define a one-to-many relationship in CQL. The catch is that the ID of the team’s coach is now shared by all members of the team, meaning that if you update it for one of the team members it is also updated for all the others.

This is also quite nice and easy to model in phantom:

import com.outworkers.phantom.dsl._

case class TeamMember(team: UUID, id: UUID, couch: UUID)

abstract class TeamMembers extends Table[TeamMembers, TeamMember] {
  object team_id extends UUIDColumn with PartitionKey
  object team_member_id extends UUIDColumn with PrimaryKey
  object couch_id extends UUIDColumn with StaticColumn
}
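The sharing behaviour of a static column can be illustrated with a small in-memory model in plain Scala. This is a sketch, not phantom code: TeamPartition is a hypothetical type, Ints stand in for uuids, and coachId mirrors the couch_id column. The static value is stored once per partition and every row read from that partition sees it.

```scala
object StaticColumnSketch {
  // Hypothetical stand-in for one partition of TeamMembersByTeam.
  // The static value lives on the partition, not on each member row.
  final case class TeamPartition(coachId: Option[Int], memberIds: Set[Int]) {
    def addMember(id: Int): TeamPartition = copy(memberIds = memberIds + id)

    // Updating the static value changes it for every row at once.
    def withCoach(id: Int): TeamPartition = copy(coachId = Some(id))

    // Rows as a reader would see them: (team member id, shared coach id).
    def rows: List[(Int, Option[Int])] =
      memberIds.toList.sorted.map(memberId => (memberId, coachId))
  }
}
```

Setting the coach once and then reading the rows shows the same coach id next to every team member, and a later update is visible on all of them.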

If you enjoyed this article, follow us on Twitter and stay tuned for more: @outworkers_uk. Outworkers is an elite marketplace for Scala engineers with a unique out-staffing model. If you’re looking for high-level Scala expertise to transform your business and applications, give us a call and we will give you an incredible definition of engineering!

Want to learn more?

As official Datastax partners, Outworkers offers a comprehensive range of professional training services for Apache Cassandra and Datastax Enterprise, taking your engineering team from Cassandra newbies to full-blown productivity in record time. Our example-driven courses are the weapon of choice for companies of any size, and if you happen to be a Scala user, we will also throw in a professional training session on using phantom in your company. All of our face-to-face training courses come with free ongoing access to our online training material.

For enquiries and bookings, please contact us by email at office@outworkers.com.
