How Do We (Plan to) Monitor Our Cassandra Cluster

Alvin · prismapp · Nov 2, 2016

Note: this is based on what I have read about how Cassandra works and on my own understanding of it, and I doubt that understanding. Please correct any misunderstandings.

This post is part of a series of development stories about how we plan to monitor our Cassandra cluster. We split the series into 3 phases, reflecting the main questions to answer when planning Cassandra cluster monitoring:

Phase 1: What do we want to monitor?
This phase is mainly about development planning and setting objective goals.

Phase 2: How do we monitor the Cassandra cluster?
In this phase, we consider how to achieve the goals set in phase 1. What metrics does the Cassandra cluster provide? Can those metrics fulfill the objective goals we set? How can we collect those metrics? Which metrics do we need to collect? Which tools do we want to use?

We also need to consider which strategy we want to use to collect the metrics, what the trade-offs are, and whether we can actually carry out that strategy.

There are 2 main strategies in monitoring (a minimal sketch of both styles follows below):
1. Push: the Cassandra cluster reports its metrics to the monitoring system
2. Pull: the monitoring system pulls metrics from the Cassandra cluster
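To make the difference concrete, here is a minimal sketch of both styles in Python. Everything in it is an assumption for illustration: a hypothetical Graphite/carbon endpoint for push, and a Jolokia (JMX-over-HTTP) agent attached to the Cassandra JVM for pull, with the standard JVM memory MBean as the sample metric.

```python
import socket
import time

import requests

# --- Push: a node-side agent sends a sample to the monitoring system ---
def push_to_graphite(name, value, host="graphite.internal", port=2003):
    """Send one sample using the Graphite plaintext protocol: 'name value ts'."""
    line = "%s %s %d\n" % (name, value, int(time.time()))
    sock = socket.create_connection((host, port), timeout=5)
    try:
        sock.sendall(line.encode("ascii"))
    finally:
        sock.close()

# --- Pull: the monitoring system scrapes the node over HTTP (Jolokia) ---
def pull_from_jolokia(mbean, host="cassandra-node-1", port=8778):
    """Read one JMX MBean through a Jolokia agent attached to the Cassandra JVM."""
    url = "http://%s:%d/jolokia/read/%s" % (host, port, mbean)
    resp = requests.get(url, timeout=5)
    resp.raise_for_status()
    return resp.json().get("value")

if __name__ == "__main__":
    # Example round-trip: pull a standard JVM metric, then push it onward.
    heap = pull_from_jolokia("java.lang:type=Memory/HeapMemoryUsage")
    push_to_graphite("cassandra.node1.jvm.heap_used", heap["used"])
```

Which direction we actually choose is a phase 2 decision; the sketch only shows the shape of each approach.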

Phase 3: How do we present the results?
In this phase, we consider how to present the metrics we have collected, and how to organize the monitoring dashboard so everyone can analyze the metrics easily.

We also need to consider which metrics we need to be alerted on when something goes wrong.

So let’s begin with phase 1.

In this development, we try to adopt documentation-driven development (DDD); see https://gist.github.com/zsup/9434452. In my understanding, it means documenting what we are going to do before doing the development. Every plan change or spontaneous idea that occurs in the middle of development needs to be documented first, before being implemented, because it is dangerous to end up with configuration we don’t know about.

What do we want to monitor?

We reasoned about the goals we want to achieve with this development. Our main goal is to monitor the Cassandra cluster. So we listed the problems we would face if we could not monitor the cluster, reasoned about how to handle each of those problems, turned that into objective goals, and then categorized the list. Here is the result:

Capacity:

When provisioning a Cassandra cluster, we need to provide well-reasoned resources to handle the workload, so that resources do not become a bottleneck for the running Cassandra process.

Server Resource
* We need to be able to know resource utilization at the server level (CPU, memory, disk, network)

JVM Resource
* We need to know the JVM resources utilized by Cassandra
* We need to know how much off-heap and on-heap memory is utilized

The Cassandra process runs on the JVM and uses both on-heap and off-heap memory for its components. In the JVM, high on-heap memory usage can cause long GC times, and application execution is temporarily stopped while the GC process runs. Off-heap memory, conversely, is not accounted for by the GC process.
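As a rough illustration, assuming a Jolokia agent is attached to the Cassandra JVM (the endpoint and node name below are placeholders), the standard JVM MBeans already expose the on-heap side, and the direct NIO buffer pool gives a partial view of off-heap usage:

```python
import requests

JOLOKIA = "http://cassandra-node-1:8778/jolokia/read"  # assumed agent endpoint

def read_mbean(mbean):
    """Read one MBean through Jolokia and return its 'value' payload."""
    resp = requests.get("%s/%s" % (JOLOKIA, mbean), timeout=5)
    resp.raise_for_status()
    return resp.json()["value"]

# On-heap: governed by the GC; long GC pauses stop the Cassandra process.
heap = read_mbean("java.lang:type=Memory/HeapMemoryUsage")

# Off-heap (partial view): direct NIO buffers are not subject to GC pauses.
direct = read_mbean("java.nio:type=BufferPool,name=direct")

print("heap used / max : %d / %d" % (heap["used"], heap["max"]))
print("direct buffers  : %d bytes in %d buffers" % (direct["MemoryUsed"], direct["Count"]))
```

Cassandra-specific off-heap consumers (bloom filters, compression metadata, off-heap memtables) should have their own metrics under org.apache.cassandra.metrics, which we plan to map out in phase 2.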

Server Error
* We need to know if there is any bottleneck in the Cassandra process caused by server resources
* We need to know if there is any data that could not be written to disk storage

Disk Usage
* We need to know how much disk is used by Cassandra data (per keyspace, per table, per log)

Performance:

Client Connection
* We need to know how many clients are currently connected to the Cassandra cluster

Client Request
* We need to be able to measure how fast the Cassandra cluster responds to client requests.
* How much time is required to complete a request (per coordinator node, per keyspace, per column family)?
* We need to know which replica nodes are not able to respond

Client Error
* We need to know how many client requests go unanswered, and why

Caching
* We want to know how effective our cache strategy is, and whether we need to enable the row cache or not

When creating a keyspace schema, we need to set the caching strategy we want to use. When writing/reading data, Cassandra checks the in-memory cache first before looking up the data on disk.

There are 2 caching strategies in Cassandra:
1. Row cache: a row consists of partition key data and column data; this strategy caches all the partition data in a row. It needs a lot of memory. The row cache is good for read-intensive row data that is rarely updated, because when partition data is updated it is evicted from the cache and is not cached again until that partition data is read.
2. Key cache: only caches the partition key data of a row; the column data stays in the SSTable file on disk storage. Less memory is needed.
The more cache hits we get, the more effective the strategy is (see the CQL sketch after this list).
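To show where that choice actually lives, here is a sketch using the DataStax Python driver; the contact point, keyspace, and table names are made up, and the caching map syntax is the one used by Cassandra 2.1 and later:

```python
from cassandra.cluster import Cluster

cluster = Cluster(["cassandra-node-1"])          # assumed contact point
session = cluster.connect("prism_ks")            # hypothetical keyspace

# Key cache only (the default): partition keys are cached, column data is read from SSTables.
session.execute("""
    ALTER TABLE events
    WITH caching = {'keys': 'ALL', 'rows_per_partition': 'NONE'}
""")

# Row cache for a small, read-heavy, rarely updated table: cache up to
# 100 rows per partition in memory (row_cache_size_in_mb must also be > 0
# in cassandra.yaml for this to take effect).
session.execute("""
    ALTER TABLE lookup_codes
    WITH caching = {'keys': 'ALL', 'rows_per_partition': '100'}
""")

cluster.shutdown()
```

Whether the row cache earns its memory then becomes a measurable question: compare cache hit rates before and after enabling it.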

Thread Pool
* We need to know if there is a bottleneck in a Cassandra processing stage when a client request error happens

Cassandra is based on a staged event-driven architecture (SEDA). Every event task in Cassandra, such as replicating data across the cluster, gossiping to each cluster node, or reading data, is grouped into a stage with other similar tasks. Each stage has its own queue and thread pool, and the stages communicate with each other through a messaging service.
If we are able to monitor the activity of each stage, it gives us baseline metrics for further analysis when something goes wrong.

Reference to:
- https://www.pythian.com/blog/guide-to-cassandra-thread-pools/
- http://batey.info/cassandra-tpstats.html
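An easy way to get those per-stage baselines is `nodetool tpstats`. The sketch below shells out to it and keeps the active/pending/blocked columns; the column layout is based on the versions we have looked at and may differ, so treat the parsing as an assumption:

```python
import subprocess

def tpstats(host="127.0.0.1"):
    """Return {pool_name: {'active', 'pending', 'blocked'}} parsed from nodetool tpstats."""
    out = subprocess.check_output(["nodetool", "-h", host, "tpstats"]).decode()
    pools = {}
    for line in out.splitlines():
        parts = line.split()
        # Typical rows look like: MutationStage  0  0  12345  0  0
        if len(parts) >= 6 and parts[1].isdigit():
            pools[parts[0]] = {
                "active": int(parts[1]),
                "pending": int(parts[2]),
                "blocked": int(parts[4]),
            }
    return pools

if __name__ == "__main__":
    for pool, stats in tpstats().items():
        # Sustained pending/blocked tasks point at a bottleneck in that stage.
        if stats["pending"] or stats["blocked"]:
            print(pool, stats)
```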

Write Path
* We need to know the write throughput and how fast client requests write data into the Cassandra cluster
* We need to know how often the memtable is flushed and the commit log is purged

When writing data, Cassandra writes it into the commit log (on disk) and the memtable (in memory). We can configure how much memory the memtable may use. The memtable is flushed to disk when it is full, and the corresponding data in the commit log is purged once the memtable has been flushed. Because these processes impact disk I/O, a larger memtable means less frequent flushes and less frequent commit log purging.

[Figure: Cassandra write path]

Cassandra is good at writes because it writes data to memory and writes to disk sequentially: data in a file is located contiguously on disk blocks rather than randomly, which minimizes disk I/O seek time.

Reference to: http://docs.datastax.com/en/cassandra/3.0/cassandra/dml/dmlHowDataWritten.html
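For the flush-frequency bullet above, Cassandra exposes per-table metrics over JMX. Here is a hedged sketch reading two of them through Jolokia; the type=Table naming below follows Cassandra 3.x (older versions use type=ColumnFamily), the keyspace/table names are made up, and the exact metric names should be verified against the running version:

```python
import requests

JOLOKIA = "http://cassandra-node-1:8778/jolokia/read"     # assumed agent endpoint
KEYSPACE, TABLE = "prism_ks", "events"                    # hypothetical schema

def table_metric(name):
    """Read one per-table Cassandra metric MBean through Jolokia."""
    mbean = ("org.apache.cassandra.metrics:type=Table,"
             "keyspace=%s,scope=%s,name=%s" % (KEYSPACE, TABLE, name))
    resp = requests.get("%s/%s" % (JOLOKIA, mbean), timeout=5)
    resp.raise_for_status()
    return resp.json()["value"]

# How many times the memtable has been switched (i.e. flushed) for this table.
print("memtable switches :", table_metric("MemtableSwitchCount"))

# Current live size of the memtable, to see how close it is to flushing.
print("memtable live size:", table_metric("MemtableLiveDataSize"))
```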

Read Path
* We need to know how effective the read path latency is
* We need to know how often Cassandra needs to read from disk rather than from memory
* We need to know how many SSTables on disk need to be read on average
* We need to know how effective our bloom filter strategy is

[Figure: Cassandra read path]

Partition data is data queried by its partition key. When Cassandra needs to read partition data on the local node, it combines the data read from the memtable (in memory) and from SSTables (on disk).

Cassandra is good at reads because when it needs to read data from disk, it can point directly to the exact location on disk where the desired data is located (using the compression offset map), minimizing the seek time needed to find the disk block holding the partition data in an SSTable file.

The bloom filter is used to tell which SSTable files may contain the partition data being queried. For each SSTable on the node, Cassandra checks its bloom filter to see whether that SSTable might hold data for the queried row, before touching the file on disk.

A good bloom filter strategy results in a low false-positive rate, minimizing the disk I/O spent looking into SSTable files per read, but it consumes more memory.

Reference to:
- http://shareitexploreit.blogspot.co.id/2012/09/cassandra-read-performance.html
- https://blog.medium.com/what-are-bloom-filters-1ec2a50c68ff#.sqm5zynle
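The bloom filter trade-off is tuned per table via bloom_filter_fp_chance, and its effectiveness can then be judged from the table's false-positive metrics. A minimal sketch with the DataStax Python driver (contact point, keyspace, and table names are made up, and 0.01 is only an example value):

```python
from cassandra.cluster import Cluster

cluster = Cluster(["cassandra-node-1"])       # assumed contact point
session = cluster.connect("prism_ks")         # hypothetical keyspace

# Lower fp_chance = fewer needless SSTable reads, but a bigger in-memory filter.
session.execute("""
    ALTER TABLE events
    WITH bloom_filter_fp_chance = 0.01
""")

cluster.shutdown()
```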

Operation:

The main goal for this category is to know how often Cassandra's internals are busy with cluster operation tasks, as opposed to the higher-priority work of handling client request tasks (writing/reading data).

Eventual Consistency
* We need to know how the data consistency process is done

Every piece of data stored in Cassandra is replicated across the cluster based on its replication factor. When the coordinator node cannot write to a replica node, it stores the data on itself (as a hint), then re-attempts the write once the replica node becomes available.

Reference to: http://docs.datastax.com/en/cassandra/3.0/cassandra/operations/opsRepairNodesTOC.html

Compaction
* We want to know how effective our compaction strategy is, i.e. whether it saves disk usage with minimal read performance impact

When creating a keyspace schema, we need to set the compaction strategy we want to use. Data deleted in Cassandra is not immediately removed from disk; it is tagged as a tombstone and cleaned up during compaction. The compaction process merges SSTables. Old data that fails to be cleaned up consumes a lot of disk space.
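As with caching, this is a per-table schema setting. A minimal sketch with the DataStax Python driver, with made-up names and values chosen only for illustration (the right strategy depends on the workload):

```python
from cassandra.cluster import Cluster

cluster = Cluster(["cassandra-node-1"])       # assumed contact point
session = cluster.connect("prism_ks")         # hypothetical keyspace

# Size-tiered compaction (the default), plus a 1-day tombstone grace period.
# gc_grace_seconds should stay longer than the repair interval, or deleted
# data can resurrect; 86400 here is only for illustration.
session.execute("""
    ALTER TABLE events
    WITH compaction = {'class': 'SizeTieredCompactionStrategy'}
    AND gc_grace_seconds = 86400
""")

cluster.shutdown()
```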

When compaction happens, it temporarily increases disk I/O, which can impact read performance for reads that are not served by the cache.

Reference to: https://www.instaclustr.com/blog/2016/01/27/apache-cassandra-compaction/

That's all for the Cassandra monitoring objective goals we have set so far. In the next post, we will reason about how we can monitor our Cassandra cluster to achieve these goals.

to be continued..

References:

1. http://www.slideshare.net/aaronmorton/cassandra-day-atlanta-2016-monitoring-cassandra
