[PoC7] Storing ~2 million records/minute in Cassandra from requests to a Netty-based REST web service

Mert Çalışkan
Jul 28, 2015

Having dealt with REST web service implementations based on RESTEasy and Netty for quite some time, I’ve managed to handle from 20k to 40k requests/sec on EC2 instances. But without storing any of those requests, that doesn’t make much sense for my production setup. So it’s time to move on to NoSQL land. I went with Cassandra, a wide-column store with scalability and availability features.

I installed a Cassandra cluster on 5 nodes and got it up and running. You can find the steps to do it here. To connect to Cassandra, I modified my netty-rest-simple REST web service client and named it cassandra-netty-rest-simple. It uses Achilles under the hood, which is an ORM-like persistence manager for Cassandra. Other than that, it uses RESTEasy for the JAX-RS compliant REST web service implementation and Netty for handling the high throughput with its NIO architecture.

I used the AMI ami-a8221fb5 (from the Frankfurt region) with the instance types i2.2xlarge and m4.2xlarge to create all 7 instances given in the diagram below.

After setting up the Cassandra cluster, execute nodetool status on one of the nodes and you should get output like the one below. I omitted the Host ID and Rack columns from the output for simplicity.

Datacenter: datacenter1
=======================
Status=Up/Down
|/ State=Normal/Leaving/Joining/Moving
--  Address  Load       Tokens  Owns (effective)
UN  IP1      51.63 KB   256     42.9%
UN  IP2      66.51 KB   256     37.9%
UN  IP3      82.5 KB    256     39.4%
UN  IP4      66.58 KB   256     38.4%
UN  IP5      104.86 KB  256     41.4%

To get cassandra-netty-rest-simple (instance F) up and running, build a fat jar out of it with the maven-shade-plugin and then execute the command below. The IPs of the Cassandra cluster nodes are passed as a space-delimited list of arguments.

java -jar cassandra-netty-rest-simple-0.0.1.jar IP1 IP2 IP3 IP4 IP5
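As a rough illustration of how a space-delimited IP list like this can be turned into contact points, here is a minimal sketch. The class and method names are hypothetical and this is not the actual code of cassandra-netty-rest-simple, which hands the connection over to Achilles. The only assumption about Cassandra itself is the default native protocol port, 9042.

```java
import java.net.InetSocketAddress;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, not part of cassandra-netty-rest-simple.
class ContactPoints {
    // Default port of Cassandra's native (CQL) protocol.
    static final int NATIVE_PORT = 9042;

    // Turns the command-line arguments (one IP per argument) into contact points.
    static List<InetSocketAddress> parse(String[] args) {
        if (args.length == 0) {
            throw new IllegalArgumentException("usage: java -jar <jar> IP1 [IP2 ...]");
        }
        List<InetSocketAddress> points = new ArrayList<>();
        for (String host : args) {
            // createUnresolved avoids a DNS lookup at parse time.
            points.add(InetSocketAddress.createUnresolved(host.trim(), NATIVE_PORT));
        }
        return points;
    }

    public static void main(String[] args) {
        for (InetSocketAddress p : parse(args)) {
            System.out.println(p.getHostString() + ":" + p.getPort());
        }
    }
}
```

Passing all five nodes means the service can still bootstrap its connection if one of them happens to be down at startup.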

Within the connector part of the code, the replication factor was set to 3 and the consistency level was set to ONE for both reads and writes. You can have a peek at the code to get a basic understanding.
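To make the trade-off behind these settings concrete, the sketch below applies the usual rule of thumb that a read is guaranteed to see the latest acknowledged write only when the number of replicas read plus the number of replicas written exceeds the replication factor. The class and method names are made up for illustration; only the replication factor of 3 and the consistency level ONE come from the setup above.

```java
// Hypothetical helper illustrating the R + W > RF rule of thumb.
class ConsistencyCheck {
    // Number of replicas that must respond for a given consistency level.
    static int replicasFor(String level, int rf) {
        switch (level) {
            case "ONE":    return 1;
            case "TWO":    return 2;
            case "THREE":  return 3;
            case "QUORUM": return rf / 2 + 1; // majority of the replicas
            case "ALL":    return rf;
            default: throw new IllegalArgumentException("unsupported level: " + level);
        }
    }

    // True when every read overlaps with at least one replica of the latest write.
    static boolean readsSeeLatestWrite(String readLevel, String writeLevel, int rf) {
        return replicasFor(readLevel, rf) + replicasFor(writeLevel, rf) > rf;
    }

    public static void main(String[] args) {
        int rf = 3;
        System.out.println("ONE/ONE: " + readsSeeLatestWrite("ONE", "ONE", rf));
        System.out.println("QUORUM/QUORUM: " + readsSeeLatestWrite("QUORUM", "QUORUM", rf));
    }
}
```

With a replication factor of 3, ONE for both reads and writes gives 1 + 1 = 2, which is not greater than 3, so this PoC trades read-your-writes guarantees for the lowest write latency. That is a reasonable choice for a throughput test that only inserts.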

Execution of the http-requester was the same as in the previous PoCs (here and there).

java -jar http-requester-0.0.1.jar <IP> <PORT> 1000000 1000000 200

One minute of sampling gave the numbers below, which sum up to 1,675,121 requests.

22690 req/sec
31063 req/sec
31557 req/sec
26515 req/sec
27138 req/sec
31510 req/sec
22223 req/sec
29160 req/sec
31140 req/sec
29701 req/sec
26529 req/sec
31512 req/sec
27290 req/sec
26686 req/sec
21130 req/sec
31218 req/sec
31766 req/sec
26539 req/sec
31290 req/sec
26965 req/sec
29155 req/sec
23700 req/sec
24841 req/sec
31416 req/sec
26128 req/sec
31541 req/sec
31692 req/sec
21518 req/sec
31582 req/sec
26209 req/sec
26203 req/sec
25848 req/sec
31205 req/sec
31477 req/sec
22138 req/sec
26038 req/sec
30692 req/sec
25506 req/sec
25211 req/sec
25685 req/sec
27375 req/sec
30977 req/sec
26741 req/sec
29672 req/sec
26129 req/sec
31119 req/sec
26148 req/sec
31313 req/sec
27776 req/sec
26157 req/sec
28272 req/sec
25946 req/sec
30600 req/sec
23647 req/sec
29033 req/sec
26308 req/sec
30148 req/sec
26429 req/sec
29779 req/sec
30145 req/sec
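The 60 samples above can be checked with a few lines of code. This is only a verification sketch (the class name is made up): it re-adds the per-second values listed above and reports the total, minimum, maximum and average.

```java
import java.util.IntSummaryStatistics;
import java.util.Locale;
import java.util.stream.IntStream;

// Hypothetical helper that summarizes the per-second samples listed above.
class SampleStats {
    static final int[] SAMPLES = {
        22690, 31063, 31557, 26515, 27138, 31510,
        22223, 29160, 31140, 29701, 26529, 31512,
        27290, 26686, 21130, 31218, 31766, 26539,
        31290, 26965, 29155, 23700, 24841, 31416,
        26128, 31541, 31692, 21518, 31582, 26209,
        26203, 25848, 31205, 31477, 22138, 26038,
        30692, 25506, 25211, 25685, 27375, 30977,
        26741, 29672, 26129, 31119, 26148, 31313,
        27776, 26157, 28272, 25946, 30600, 23647,
        29033, 26308, 30148, 26429, 29779, 30145
    };

    // Count, sum, min, max and average in a single pass.
    static IntSummaryStatistics stats() {
        return IntStream.of(SAMPLES).summaryStatistics();
    }

    public static void main(String[] args) {
        IntSummaryStatistics s = stats();
        System.out.println(String.format(Locale.ROOT,
            "samples=%d total=%d min=%d max=%d avg=%.1f",
            s.getCount(), s.getSum(), s.getMin(), s.getMax(), s.getAverage()));
    }
}
```

The total matches the one-minute figure, and the spread between the slowest second (21,130) and the fastest one (31,766) shows that throughput is far from steady even in this run.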

Fair enough, but since Amazon creates a bottleneck on the network with its 1 Gbps bandwidth, I thought I should fire up m4.10xlarge machines instead of the m4.2xlarge ones :). 40 cores and a 10 Gbps network, yay! Running this configuration for a minute ended up at 2,375,654 requests/min. Yeah, better, but not good enough... The graph of the requests/sec was as follows.
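To put the two one-minute totals side by side, the sketch below converts each to an average rate per second and computes the relative gain of the second run. The class and method names are made up for illustration; the two totals are the only inputs taken from the runs above.

```java
import java.util.Locale;

// Hypothetical helper comparing the two one-minute runs.
class RunComparison {
    // Average requests per second for a one-minute total.
    static double perSecond(long perMinute) {
        return perMinute / 60.0;
    }

    // Relative gain of 'after' over 'before', in percent.
    static double gainPercent(long before, long after) {
        return (after - before) * 100.0 / before;
    }

    public static void main(String[] args) {
        long smallInstances = 1_675_121L; // m4.2xlarge run
        long largeInstances = 2_375_654L; // m4.10xlarge run
        System.out.println(String.format(Locale.ROOT,
            "m4.2xlarge: %.1f req/sec, m4.10xlarge: %.1f req/sec, gain: %.1f%%",
            perSecond(smallInstances), perSecond(largeInstances),
            gainPercent(smallInstances, largeInstances)));
    }
}
```

The bigger machines average close to 40k requests/sec, a gain of roughly 42% over the first run, which is well short of what ten times the network bandwidth might suggest.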

I see numbers as high as 70k/sec but the whole replay spikes badly. Looks like I need to take care of some sort of drain. Stay tuned!
