Optimising Counters in Apache Cassandra

Apache Cassandra is widely used at eBay Kleinanzeigen. This blog post describes how we optimised the performance of the counter type in Cassandra.
by Swen Fuhrmann, Senior Software Engineer

Photo by Petri Heiskanen on Unsplash

eBay Kleinanzeigen is a classifieds platform in Germany where users can list and advertise their items. On the listing page of an item is a counter showing how many people have already visited this page. We have a service which counts the visits and provides the total visit count for each listing page. This service uses the counter feature of Cassandra.

Example: Listing with counter

Although the service runs quite smoothly, after a while we saw an increase in latency when querying the counter value. Since Cassandra scales very well, the obvious fix was to simply add more nodes. But wait! Before going ahead with that solution, we wanted to understand the actual problem first.

A first look at the graphs for the nodes showed that the disk reads were saturating. The reason was quite simple: we run Cassandra in a private cloud where the disk IO is limited, and increasing the disk IO limit was not an option for the cloud team. Adding more nodes to the cluster might have helped because the traffic would be distributed over more nodes. On the other hand, a bigger cluster is harder to operate, and costs are higher.

So, we tried to understand how Cassandra works and whether we could fine-tune something here. First, let's take a look at our data model. It's a simple counter table:

CREATE TABLE visits_by_item_id (
    item_id bigint PRIMARY KEY,
    visits_count counter
);

Whenever a user visits the page of an item for the first time, the counter is incremented:

UPDATE visits_by_item_id SET visits_count = visits_count + 1 WHERE item_id = <item-id>;

Furthermore, the current visit counter for that page is fetched with this select statement:

SELECT visits_count FROM visits_by_item_id WHERE item_id = <item-id>;

There are a lot of read requests, but the amount of data loaded from disk should be quite small. Basically, it's only a 64-bit signed integer: 8 bytes plus some overhead for metadata.

Next, we need to take a closer look at how Cassandra stores the data on disk. When we describe the table in the command line client cqlsh, we get this:

DESC TABLE visits_by_item_id;

CREATE TABLE visits_by_item_id (
    item_id bigint PRIMARY KEY,
    visits_count counter
) WITH …
    AND compression = {'chunk_length_in_kb': '64', 'class': 'org.apache.cassandra.io.compress.LZ4Compressor'}

Besides a lot of other settings, compression is activated by default. Several partitions are compressed together into a chunk and stored on disk. The size of a chunk is controlled by the parameter chunk_length_in_kb. The default in Cassandra version 3.x is 64 KB; in version 4.x it is 16 KB.

We are still on version 3.x. That means in our case, for each read request, 64 KB must be loaded from disk and decompressed in memory before the 8 bytes of the queried counter value can be read. Because of this, we read a lot more data from disk than we actually need!

As a first step, we reduced the chunk_length_in_kb setting to 4 KB:

ALTER TABLE visits_by_item_id WITH compression = {'chunk_length_in_kb': '4', 'class': 'org.apache.cassandra.io.compress.LZ4Compressor'};

Because the first results looked promising, we reduced it even further to 2 KB after several days.
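The statement is the same as before, just with a smaller chunk length:

ALTER TABLE visits_by_item_id WITH compression = {'chunk_length_in_kb': '2', 'class': 'org.apache.cassandra.io.compress.LZ4Compressor'};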

Be careful when doing these changes on a production system!

NOTE: Reducing the chunk size will only pay off when the read-ahead of the operating system has also been reduced. See here.
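For illustration only (the device name and values are assumptions, not taken from our setup): on Linux, the read-ahead of a block device can be inspected and lowered with blockdev. The value is given in 512-byte sectors, so 8 sectors correspond to 4 KB:

# show the current read-ahead in 512-byte sectors (assumed device name)
blockdev --getra /dev/sdb

# lower the read-ahead to 8 sectors (4 KB) so a small chunk read stays small
blockdev --setra 8 /dev/sdb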

The change will not take effect immediately because the new settings only apply to newly written SSTables. Since SSTables are immutable, the new setting is applied to existing data only when those SSTables are compacted. It's possible to force a rewrite of the SSTables with nodetool, see here.
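One way to trigger such a rewrite (the keyspace name below is a placeholder) is nodetool upgradesstables; its --include-all-sstables flag also rewrites SSTables that are already on the current format version, so the new compression settings get applied everywhere:

# force a rewrite of all SSTables so the new compression settings are applied
nodetool upgradesstables --include-all-sstables <keyspace> visits_by_item_id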

However, nothing comes for free, so what's the price of reducing the chunk length? First, compression becomes less effective because smaller chunks contain less redundancy. That was acceptable for us.
To find the right chunk for a partition, Cassandra maintains an index in an off-heap memory area. Smaller chunks mean more chunks per node, which results in more memory consumption for the chunk index:

The memory usage increased after making the chunks smaller. However, compared to the several GB of overall memory on the machine, the increase of roughly 40 MB is negligible in our case.
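The off-heap memory used for this compression metadata can be inspected per table with nodetool tablestats; the keyspace name here is just a placeholder:

# look for "Compression metadata off heap memory used" in the output
nodetool tablestats <keyspace>.visits_by_item_id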

By reducing the chunk size, we were able to reduce the number of bytes read from disk, but the number of IO operations was still quite high.

What else can we do?

Cassandra has a dedicated cache for the counter datatype. The size of this cache can be configured in the cassandra.yaml file with the setting counter_cache_size_in_mb. The default size is 50 MB. We increased the cache to 480 MB. That change reduced the IO operations as well as the disk reads once again by a lot:
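The change itself is a single line in cassandra.yaml (the value below is the one we ended up with). After restarting the node, the effectiveness of the cache can be checked with nodetool info, whose output contains a Counter Cache line with size and hit rate:

# cassandra.yaml
counter_cache_size_in_mb: 480

# check counter cache size and hit rate after the node is back up
nodetool info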

Conclusion

Before throwing more nodes at a cluster, first try to understand the bottlenecks. The default configuration values in Cassandra are often not optimal, because the right values depend on the use case. Finding good values can save resources and money.

For our counter use case, we found two settings which had a huge impact on performance.
- Reducing the chunk size. Counter tables usually have a quite small partition size, so a reduced chunk size reduces the amount of data to read from disk.
- Increasing the counter cache. That reduced the IO operations and the data to read.

With these two optimisations we were able to reduce the data we have to read from the disk by ~90%.

The IO operations were reduced by ~80%.
