3 Reasons I chose Datastax Astra over AWS Keyspace for Cassandra Cloud Hosting (mid-2021)

Chang Xiao
Published in Siggy Recommender
Jul 26, 2021
Photo by Carlos Muza on Unsplash

We recently completed a major migration of our database technology from HBase (part of the Hadoop stack) to Apache Cassandra.

One of the original attractions of Cassandra was AWS Keyspace, since Siggy runs entirely on AWS infrastructure.

However, after testing Siggy on AWS Keyspace, we found the following reasons to stay away from it:

1. No support for count queries or the token() function

We use a count query to fetch index status from the database, which is a common use case. However, AWS does not support this at the moment (https://forums.aws.amazon.com/thread.jspa?threadID=324957).
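To make the use case concrete, here is a minimal sketch of the kind of count query described above, written against the Datastax Python driver's session interface. The keyspace and table names are hypothetical placeholders, not Siggy's actual schema.

```python
def count_rows(session, keyspace, table):
    """Return the row count of keyspace.table using CQL's COUNT(*).

    `session` is expected to behave like cassandra.cluster.Session.
    `keyspace` and `table` are trusted identifiers (hypothetical names here).
    """
    query = f"SELECT COUNT(*) FROM {keyspace}.{table}"
    # COUNT(*) returns a single row whose first column is the count.
    row = session.execute(query).one()
    return row[0]


# Typical usage (assumes a reachable cluster and the cassandra-driver package):
#   from cassandra.cluster import Cluster
#   session = Cluster(["127.0.0.1"]).connect()
#   print(count_rows(session, "siggy", "index_status"))
```

Keeping the session as a parameter means the same function can be pointed at any Cassandra-compatible backend to check whether the query is accepted.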

We also need to fetch a large amount of data from the database periodically. We use the token() function to paginate and combine our results, and AWS does not support it either (https://docs.datastax.com/en/developer/python-driver/3.24/cqlengine/queryset/#token-function).
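The token-based pagination mentioned above can be sketched roughly as follows. This is an illustration in plain CQL rather than our production code, and it assumes a table with a single-column partition key and no clustering columns (one row per partition), so that resuming after the last key's token never skips rows. The table and column names are hypothetical.

```python
def iter_all_rows(session, table, key_column, page_size):
    """Yield every row of `table`, fetching `page_size` rows per query.

    Each page after the first resumes from the token of the last partition
    key seen. Assumes one row per partition (no clustering columns).
    """
    first_page = f"SELECT * FROM {table} LIMIT {page_size}"
    next_page = (
        f"SELECT * FROM {table} "
        f"WHERE token({key_column}) > token(%s) LIMIT {page_size}"
    )
    rows = list(session.execute(first_page))
    while rows:
        yield from rows
        # Rows come back in token order, so the last row marks the resume point.
        last_key = getattr(rows[-1], key_column)
        rows = list(session.execute(next_page, (last_key,)))


# Typical usage (assumes a connected cassandra.cluster.Session):
#   for row in iter_all_rows(session, "siggy.items", "id", 1000):
#       process(row)
```

The loop stops when a query returns an empty page, so it always issues one extra query after the final non-empty page.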

2. Lack of technical documentation

As mentioned previously, we need to select a large amount of data from time to time. AWS Keyspace limits a SELECT to 1MB of data and will “paginate the result automatically”.

This would be fine, except there is no documentation on how this is done, and the code examples AWS provides are rather lacking (https://github.com/aws-samples/amazon-keyspaces-examples/tree/main/python/datastax-v3/connection-sigv4).

The only way for us to test this “pagination” feature would be to import a large database and run a SELECT query returning over 1MB of data, an approach that did not pass our sanity check.
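For comparison, this is roughly how paging looks in the Datastax Python driver, where a statement's fetch_size controls the page size and the result set exposes each page. It is a sketch of the driver's general paging interface only; whether AWS Keyspace's automatic 1MB pagination surfaces through it in the same way is exactly what we could not confirm from the documentation. The table name in the usage comment is hypothetical.

```python
def iter_pages(session, statement):
    """Yield the rows of each page of a paged query, one list per page.

    `session` is expected to behave like cassandra.cluster.Session, whose
    execute() returns a ResultSet with current_rows, has_more_pages and
    fetch_next_page().
    """
    result = session.execute(statement)
    while True:
        yield list(result.current_rows)
        if not result.has_more_pages:
            break
        # Synchronously load the next page into result.current_rows.
        result.fetch_next_page()


# Typical usage (assumes the cassandra-driver package and a connected session):
#   from cassandra.query import SimpleStatement
#   stmt = SimpleStatement("SELECT * FROM siggy.items", fetch_size=500)
#   for page in iter_pages(session, stmt):
#       print(len(page))
```

Simply iterating over the result set also works, since the driver fetches further pages implicitly; walking page by page just makes the page boundaries visible.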

3. Datastax (literally) wrote the code

Since Datastax literally wrote the Cassandra client driver (https://github.com/datastax/python-driver), the examples in their documentation (https://docs.datastax.com/en/developer/python-driver/3.25/) work with their cloud-hosted Cassandra (Astra).

This is a huge benefit and relief for any developer who starts implementing Cassandra only to run into the limitations of AWS Keyspace.

Conclusion and Observations

AWS generally has excellent cloud offerings for different technologies. However, it looks like they built Keyspace around how they believe they can optimize performance and cost for AWS, rather than around how developers will actually use the technology.

It’s especially disappointing to hear comments from AWS like:

“Aggregator functions in Cassandra have notoriously poor performance and scalability and are not really recommended for operational, production workloads” (https://forums.aws.amazon.com/thread.jspa?threadID=324957).

From a developer/AWS customer’s perspective, it looks like a multi-billion dollar tech company making excuses. I’m certain they have the technical experience to overcome these “performance” issues.

Disclaimers

I am not affiliated with or paid by either AWS or Datastax in any way. These are my own experiences and observations.

Chang Xiao
Siggy Recommender

Starter, dev, digital consultant, cyclist, tennis player. Currently focused on data science and specifically recommendation systems.