All data stores mentioned in the book "Designing Data-Intensive Applications"

When looking for good references to improve my software architecture skills, I came across the book “Designing Data-Intensive Applications” by Martin Kleppmann. As soon as I had read the last page, I did a simple exercise: I tried to recall the databases mentioned throughout the previous 624 pages. Checking personal notes or the book itself was strictly forbidden.

Since I could easily remember more than 20 products, my immediate conclusion was that I needed to narrow down my studies. Before working out what could be useful in my future projects, I had to come up with a way of choosing a focus. Maybe the most cited technologies? That’s when I remembered one of the most straightforward but useful applications of Apache Spark: counting words!


I converted the Kindle book (purchased through …) to a .txt file and loaded the contents into an Apache Spark server using Python. After experimenting with a couple of other strategies (most frequent capitalised words, TF-IDF), I isolated the Index section and extracted the capitalised expressions at the start of each line.
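To make the index step concrete, here is a minimal sketch in plain Python rather than Spark (the same per-line logic could be applied to each line of a Spark dataset). It assumes the Index section has already been cut out of the .txt file, that top-level entries start at the beginning of a line while sub-entries are indented, and that page numbers follow the first comma. The function name `index_candidates` and the sample lines are hypothetical.

```python
def index_candidates(index_text: str) -> list[str]:
    """Return distinct capitalised expressions that start index lines, in order of appearance."""
    found = []
    for line in index_text.splitlines():
        # Skip blank lines, indented sub-entries (they start with a space)
        # and lowercase entries such as "abstraction".
        if not line or not line[0].isupper():
            continue
        # Drop the page numbers that follow the first comma.
        term = line.split(",", 1)[0].strip()
        if term not in found:
            found.append(term)
    return found


sample = "\n".join([
    "Kafka (messaging), 137, 448",
    "    consumer groups, 450",
    "R-trees, 87",
    "ZooKeeper (coordination service), 370",
    "abstraction, 321",
])
print(index_candidates(sample))
```

A filter this simple still lets through non-products such as “R-trees”, and it would miss a lowercase name such as etcd, which is why a manual pass over the candidates remains necessary.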

The outcome was a list of 342 words, which I verified manually to filter out expressions such as “R-trees” and “ETL”. Since this step forced me to recall the meaning of each name, and to look up official websites when in doubt, I decided not to automate it.

Once the list was narrowed down to 72 items, a straightforward word counter did the job. For every product whose name has more than one word, I searched the book for its single most meaningful word; e.g., the count for “Apache Kafka” is the number of times “Kafka” is mentioned. “(IBM) System R” had to be counted as a single expression so as not to be mixed up with other kinds of systems. In the book, “(Google) Bigtable” sometimes refers to the Bigtable data model, first proposed by Google’s database and later implemented in other products. In the end, I decided to count both cases in favor of Google's product.
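The counting rule described above (one key term per product, with “System R” treated as a single phrase) can be sketched with Python's `re` module. This is an illustration under assumptions, not the author's Spark job: the `KEY_TERMS` mapping shows only four hypothetical entries, matching is case-sensitive, and `\b` word boundaries keep a term from matching inside a longer word.

```python
import re

# Hypothetical excerpt of the product -> key term mapping.
# Multi-word terms such as "System R" are matched as one phrase.
KEY_TERMS = {
    "Apache Kafka": "Kafka",
    "Apache ZooKeeper": "ZooKeeper",
    "IBM System R": "System R",
    "Google Bigtable": "Bigtable",
}


def count_mentions(text: str, key_terms: dict[str, str]) -> dict[str, int]:
    """Count case-sensitive, whole-word occurrences of each product's key term."""
    counts = {}
    for product, term in key_terms.items():
        # \b on both sides stops "Kafka" from matching inside e.g. "Kafkaesque",
        # and case sensitivity keeps a generic "system" from counting as "System R".
        pattern = r"\b" + re.escape(term) + r"\b"
        counts[product] = len(re.findall(pattern, text))
    return counts


text = "Kafka and ZooKeeper. System R predates Bigtable. A Bigtable-like model. Kafka again, not a system."
print(count_mentions(text, KEY_TERMS))
```

Because “Bigtable-like” also matches `\bBigtable\b`, mentions of the data model are credited to Google's product, mirroring the decision above.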

In a few cases, it’s hard to draw a clear line between what is a data store and what is not. Apache Lucene, a dependency of both Elasticsearch and Apache Solr, was also added to the list.

“(46) Apache ZooKeeper” means that ZooKeeper is mentioned 46 times in the book (not counting the Index section).
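Excluding the Index section and printing the results in the “(count) Name” format could look roughly like the sketch below. It assumes the index starts at the last line of the .txt file that reads exactly “Index” (searching from the end, since the table of contents may also list it); both function names are hypothetical.

```python
def strip_index(book_text: str, heading: str = "Index") -> str:
    """Drop everything from the last line equal to `heading` onwards."""
    lines = book_text.splitlines()
    # Search backwards: the table of contents near the start may also contain "Index".
    for i in range(len(lines) - 1, -1, -1):
        if lines[i].strip() == heading:
            return "\n".join(lines[:i])
    return book_text  # no index heading found; keep the whole text


def format_ranking(counts: dict[str, int]) -> list[str]:
    """Render counts as '(n) Name' lines, most-mentioned first (ties keep insertion order)."""
    ranked = sorted(counts.items(), key=lambda item: item[1], reverse=True)
    return [f"({n}) {name}" for name, n in ranked]


print(format_ranking({"PostgreSQL": 44, "Apache ZooKeeper": 46, "MySQL": 42}))
```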

None of the logos were created by or belong to me, so I take no responsibility for their eccentric design.

(46) Apache ZooKeeper

(44) PostgreSQL

(42) MySQL

(41) Apache Kafka

(40) Apache Cassandra

(37) Oracle Database

(33) MongoDB

(31) Riak

(28) Apache HBase

(20) Microsoft SQL Server

(19) VoltDB

(17) Amazon DynamoDB

(14) Apache Lucene

(14) Project Voldemort

(13) Apache CouchDB

(13) etcd

(13) Datomic

(12) IBM Db2

(11) Google Spanner

(10) Elasticsearch

(9) Couchbase Server

(9) Redis

(8) LinkedIn Espresso

(8) Google Bigtable

(8) RethinkDB

(8) LevelDB


(7) IBM System R

(6) Apache Solr

(5) RocksDB

(4) RabbitMQ

(4) Vertica

(3) Microsoft Azure Storage

(3) Event Store


(3) HornetQ

(3) Amazon S3

(3) Neo4j

(3) Apache DistributedLog

(3) Apache ActiveMQ

(3) Memcached

(3) Teradata

(3) FoundationDB

(3) Bayou

(2) IBM MQ

(2) NonStop SQL

(2) StatsD

(2) HyperDex

(2) Terrapin

(2) Yahoo! Pistachio

(2) ZeroMQ

(2) ParAccel

(2) Brubeck

(2) LMDB

(2) MemSQL

(2) Druid

(2) Consul

(2) Firebase

(2) ElephantDB

(2) Apache BookKeeper

(2) IBM WebSphere

(1) Microsoft Azure Service Bus

(1) Apache Qpid

(1) AllegroGraph

(1) Apache HAWQ

(1) Titan

(1) Amazon Redshift

(1) InfiniteGraph

(1) NATS

(1) RAMCloud

(1) MSMQ

(1) webMethods

If it isn’t obvious yet, yes, you should get yourself a copy of Martin’s book. Well worth the investment of time and money.