NoSQL Cloud Service Benchmarking: DynamoDB vs. Datastore
Many new applications are turning to NoSQL databases (such as MongoDB and Cassandra) instead of relational databases (such as MySQL and PostgreSQL). NoSQL databases typically do not enforce a fixed schema on stored data and often relax ACID (Atomicity, Consistency, Isolation, Durability) guarantees. As a result, they tend to offer more scalability and flexibility, which is part of why NoSQL databases have become popular in recent years.
NoSQL cloud services
Besides the self-hosted NoSQL databases mentioned above, there are also managed NoSQL services such as AWS DynamoDB and Google Cloud Datastore, which take care of database setup and operational tasks like scaling for you.
Which one should we use?
There are many aspects to consider when choosing a NoSQL cloud service, such as performance, scalability, ease of use, pricing, and consistency model.
In this experiment, we focus on the performance of two NoSQL cloud services: AWS DynamoDB and Google Cloud Datastore.
Overview
In this experiment, we will …
- Benchmark the average throughput and latency of both services on different workloads.
- Test on the free tier of both NoSQL services.
- Use each provider's IaaS offering (Google Compute Engine and AWS EC2) as the client that benchmarks the NoSQL services.
- Use YCSB (Yahoo! Cloud Serving Benchmark) as the benchmarking tool.
- Test for 7 days; each day, run each workload twice (once from Compute Engine and once from EC2).
Metrics
We will measure average throughput and average latency for each workload.
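A minimal sketch of how per-run numbers could be averaged, assuming each YCSB run has already been reduced to one throughput value and one average latency value. The record layout (`service`, `workload`, `throughput`, `latency_us` keys) and the function name `average_by_workload` are hypothetical, not part of the original setup. This version pools the EC2 and Compute Engine runs; add the client to the grouping key to report them separately.

```python
from collections import defaultdict
from statistics import mean


def average_by_workload(runs):
    """Average throughput and latency per (service, workload) across all runs.

    Each run is a dict with the hypothetical keys
    'service', 'workload', 'throughput' (ops/sec) and 'latency_us'.
    """
    grouped = defaultdict(list)
    for run in runs:
        grouped[(run["service"], run["workload"])].append(run)
    # One (mean throughput, mean latency) pair per service/workload combination.
    return {
        key: (mean(r["throughput"] for r in rs), mean(r["latency_us"] for r in rs))
        for key, rs in grouped.items()
    }
```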
Workloads
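We will use 4 different workloads, each consisting of 1000 operations. Note that these read/update/insert mixes are defined for this experiment and do not match YCSB's bundled workloada to workloadd files, despite the similar letters.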
- Workload A: 95% read, 5% update (read mostly)
- Workload B: 50% read, 50% update (balanced)
- Workload C: 5% read, 95% update (write mostly)
- Workload D: 70% read, 15% update, 15% insert (read mostly with insert)
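The four mixes above could be expressed as YCSB CoreWorkload property files roughly like this. `readproportion`, `updateproportion`, `insertproportion`, `recordcount` and `operationcount` are standard CoreWorkload properties; the `recordcount` value, the output directory and the helper names are assumptions for this sketch, and the `site.ycsb` package prefix applies to recent YCSB releases (older ones used `com.yahoo.ycsb`).

```python
from pathlib import Path

# The experiment's custom mixes (not YCSB's bundled workloada..workloadd).
WORKLOADS = {
    "A": {"readproportion": 0.95, "updateproportion": 0.05, "insertproportion": 0.0},
    "B": {"readproportion": 0.50, "updateproportion": 0.50, "insertproportion": 0.0},
    "C": {"readproportion": 0.05, "updateproportion": 0.95, "insertproportion": 0.0},
    "D": {"readproportion": 0.70, "updateproportion": 0.15, "insertproportion": 0.15},
}

OPERATION_COUNT = 1000
RECORD_COUNT = 1000  # assumption: the preloaded record count is not stated in the text


def workload_properties(name: str) -> str:
    """Render one workload as the text of a YCSB CoreWorkload property file."""
    mix = WORKLOADS[name]
    total = sum(mix.values())
    if abs(total - 1.0) > 1e-9:
        raise ValueError(f"workload {name} proportions sum to {total}, not 1")
    lines = [
        "workload=site.ycsb.workloads.CoreWorkload",
        f"recordcount={RECORD_COUNT}",
        f"operationcount={OPERATION_COUNT}",
        "scanproportion=0",
        "readmodifywriteproportion=0",
    ]
    lines += [f"{key}={value}" for key, value in mix.items()]
    return "\n".join(lines) + "\n"


def write_workloads(directory: str = "workloads-custom") -> list:
    """Write one property file per workload (hypothetical directory name)."""
    out = Path(directory)
    out.mkdir(parents=True, exist_ok=True)
    paths = []
    for name in WORKLOADS:
        path = out / f"workload{name.lower()}"
        path.write_text(workload_properties(name))
        paths.append(path)
    return paths
```

Explicitly setting every proportion, rather than relying on YCSB's defaults, keeps each file self-describing.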
NoSQL cloud service configuration
AWS DynamoDB
We will base the configuration on the DynamoDB free tier, with the following settings …
- 25 RCU (Read Capacity Unit)
- 25 WCU (Write Capacity Unit)
- No auto scaling
- Tokyo region (ap-northeast-1)
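To get a feel for what 25 RCU and 25 WCU allow, here is a back-of-the-envelope sketch. It assumes the standard provisioned-capacity sizing (one RCU covers one strongly consistent read per second of an item up to 4 KB, or two eventually consistent reads; one WCU covers one write per second of an item up to 1 KB) and ignores burst capacity, so treat the result as a rough steady-state ceiling, not a prediction of the measured numbers. The function name is hypothetical.

```python
import math

# Assumed DynamoDB capacity-unit sizes (provisioned mode).
READ_UNIT_BYTES = 4096   # 1 RCU: one strongly consistent read/sec of up to 4 KB
WRITE_UNIT_BYTES = 1024  # 1 WCU: one write/sec of up to 1 KB


def max_sustained_ops(read_frac: float, write_frac: float, item_bytes: int,
                      rcu: int = 25, wcu: int = 25,
                      eventually_consistent: bool = False) -> float:
    """Rough ops/sec ceiling before throttling for a read/write mix (no burst)."""
    read_cost = math.ceil(item_bytes / READ_UNIT_BYTES)
    if eventually_consistent:
        read_cost /= 2  # eventually consistent reads cost half an RCU per 4 KB
    write_cost = math.ceil(item_bytes / WRITE_UNIT_BYTES)
    limits = []
    if read_frac > 0:
        limits.append(rcu / (read_frac * read_cost))
    if write_frac > 0:
        limits.append(wcu / (write_frac * write_cost))
    # Whichever capacity runs out first bounds the whole mix.
    return min(limits) if limits else math.inf
```

For example, `max_sustained_ops(0.5, 0.5, 1000)` gives 50.0 ops/sec for the balanced workload B, while an item just over 1 KB (`item_bytes=1100`) doubles the write cost and lowers that ceiling to 25.0. Inserts and updates both count as writes, so workload D would use `write_frac=0.30`.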
Google Cloud Datastore
Datastore has no provisioned RCU/WCU to configure (its free tier is instead a daily quota of entity reads, writes, and deletes), so we will use the default configuration and set the location to Tokyo (asia-northeast1).
Prepare the benchmarking script
This script will be run on both IaaS instances.
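The original script is not reproduced here, so the following is only a minimal sketch of what such a script might do, assuming YCSB is installed on the instance and that each service has its own property file holding endpoint and credential settings. The binding names `dynamodb` and `googledatastore`, the paths, and all helper names are assumptions; the parser relies on YCSB's summary lines of the form `[SECTION], Metric, value`.

```python
import subprocess
from collections import defaultdict

# Assumed YCSB binding names; check the bindings available in your YCSB version.
BINDINGS = {"dynamodb": "dynamodb", "datastore": "googledatastore"}


def ycsb_command(ycsb_home: str, service: str, phase: str,
                 workload_file: str, db_props: str) -> list:
    """Build (but do not run) a YCSB command line; phase is 'load' or 'run'."""
    return [f"{ycsb_home}/bin/ycsb", phase, BINDINGS[service],
            "-P", workload_file, "-P", db_props]


def run_ycsb(cmd: list) -> str:
    """Run YCSB and return stdout, where the summary metrics are printed."""
    result = subprocess.run(cmd, capture_output=True, text=True, check=True)
    return result.stdout


def parse_ycsb(text: str) -> dict:
    """Parse summary lines such as '[READ], AverageLatency(us), 5123.4'."""
    metrics = defaultdict(dict)
    for line in text.splitlines():
        parts = [p.strip() for p in line.split(",")]
        if len(parts) != 3 or not parts[0].startswith("["):
            continue  # skip log output and anything that is not a metric line
        section, name, value = parts
        try:
            metrics[section.strip("[]")][name] = float(value)
        except ValueError:
            continue
    return metrics


def summarize(metrics: dict) -> tuple:
    """Overall throughput plus operation-weighted average latency (microseconds)."""
    throughput = metrics["OVERALL"]["Throughput(ops/sec)"]
    total_ops = total_latency = 0.0
    for section in ("READ", "UPDATE", "INSERT"):
        stats = metrics.get(section, {})
        ops = stats.get("Operations", 0.0)
        total_ops += ops
        total_latency += ops * stats.get("AverageLatency(us)", 0.0)
    avg_latency = total_latency / total_ops if total_ops else float("nan")
    return throughput, avg_latency
```

Because YCSB reports average latency per operation type, `summarize` weights each type's average by its operation count to get one overall latency figure per run.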
EC2 and Compute Engine instance setup
AWS EC2 setup
- t2.micro
- Tokyo region (ap-northeast-1)
Google Compute Engine setup
- f1.micro
- Tokyo region (asia-northeast1)
On both instances, we need to:
- Set up credentials for both DynamoDB and Datastore
- Install YCSB
- Schedule a cron job to run the script
Results
For workload A (read mostly), DynamoDB has higher average throughput and lower average latency than Google Cloud Datastore.
For workload B (balanced), DynamoDB has higher average throughput and lower average latency than Google Cloud Datastore.
For workload C (write mostly), DynamoDB has higher average throughput and lower average latency than Google Cloud Datastore.
For workload D (read mostly with insert), DynamoDB has higher average throughput and lower average latency than Google Cloud Datastore.
Conclusion
- In this experiment, DynamoDB had significantly higher average throughput and lower average latency on all four workloads, which means the DynamoDB free tier performed better than the Datastore free tier.
- DynamoDB can still be scaled further if needed by provisioning more RCU/WCU or enabling auto scaling, while Google Cloud Datastore offers no capacity settings to tune because it scales automatically.
- DynamoDB's average throughput and latency fluctuated across days and workloads, while Google Cloud Datastore's remained stable.
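The stability observation in the last point could be quantified with the coefficient of variation (standard deviation divided by the mean) of each service's daily averages for a given workload. This is a suggested check, not something the original analysis reports, and the function name is hypothetical.

```python
from statistics import mean, pstdev


def coefficient_of_variation(values):
    """Population standard deviation divided by the mean; 0.0 means perfectly stable."""
    m = mean(values)
    if m == 0:
        raise ValueError("coefficient of variation is undefined for a zero mean")
    return pstdev(values) / m
```

Comparing this ratio for DynamoDB and Datastore on the same workload gives a single number per service, so "fluctuates" versus "stable" can be stated on a common, unit-free scale.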