Hacking with Elassandra
Cassandra + Elasticsearch, The marriage made in heaven
Cassandra
Cassandra is a distributed, highly available data store. It has high write throughput, and any node in the cluster can accept writes. In CAP terms, Cassandra can be classified as an Available and Partition Tolerant system that supports Eventual Consistency. The following are two major concerns with Cassandra.
- It does not support full-text search
- Secondary indexes, which are needed for additional search patterns, add overhead
Elasticsearch
Elasticsearch is a powerful search engine built on the Lucene index. It stores documents as JSON objects and supports full-text search over the document data. When it stores a document, it maps each field of the document into an indexed form. In CAP terms, Elasticsearch can be classified as a Consistent and Available system (it is not fully Partition Tolerant). The following are the main concerns with Elasticsearch.
- Writes are not masterless: each write must go through the primary shard for its document, and cluster state changes go through the elected master node
- Not ideal as primary data storage (Elasticsearch appears to lose writes, both updates and even non-conflicting inserts)
- No multi-datacenter clustering (it has to be achieved with an additional tool such as Kafka)
Elassandra
In Elassandra, Elasticsearch is embedded into Cassandra as a plugin, hence the name: Elassandra = Elasticsearch + Cassandra. Elassandra provides the features of both Elasticsearch and Cassandra, and it overcomes the drawbacks of each system mentioned above. The following are some notable features. The Elassandra documentation (see References) describes them in more detail.
- Full text search on Cassandra data
- Elasticsearch native API to access Cassandra data
- JSON REST API to access Cassandra data
- Search on multiple keyspaces in one query
Elassandra architecture
Each Elassandra node runs an Elasticsearch instance and a Cassandra instance. It exposes both the Cassandra API (e.g. cqlsh) and the Elasticsearch API (e.g. the REST API) to manipulate the data. When data is written to Cassandra, it is automatically (synchronously) indexed in Elasticsearch through a custom secondary index. Written data first goes to memory (the Cassandra memtable and the in-memory Lucene index) and is periodically flushed to disk, where Cassandra stores it in SSTables. Data written to Elasticsearch becomes searchable after the index is refreshed. By default, Elasticsearch refreshes its indexes every second (the default refresh_interval is 1 second). An Elassandra cluster follows a masterless ring architecture, the same as Cassandra, and replicates data according to the replication factor.
In Elassandra, the original JSON documents (_source) are stored in Cassandra. Elasticsearch stores only the Lucene indexes (there is no Elasticsearch _source). An Elasticsearch index can also be built from data that already exists in Cassandra (basically, the index is built from the data in the SSTables). More information about Elassandra can be found in its documentation (see References).
Elassandra use case
The following figure shows an example use case of Elassandra. In this scenario I use Elassandra to build a full-text search REST API over data stored in Cassandra. I write data to Cassandra from application-service (which uses the Cassandra API to write the data). Elassandra then indexes the Cassandra data in Elasticsearch. I can search the data from rest-service (which uses the Elasticsearch API to read the data) to build the full-text search API.
Run Elassandra
I run a three-node Elassandra cluster with Docker in my local environment. I built a custom Elassandra image that supports running in cluster mode. The following is the docker-compose.yml used to run the cluster with that Elassandra Docker image.
Now I can start the Elassandra cluster one node at a time. elassandra1 is the seed node, so please wait a few minutes until elassandra1 has started before starting the other nodes.
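Instead of waiting a fixed number of minutes, the wait for the seed node can be automated. The following is a minimal Python sketch, assuming that elassandra1 publishes the Cassandra CQL port (9042) on a host and port reachable from where the script runs; the function name wait_for_port and its parameters are hypothetical helpers, not part of Elassandra. An open port is only a rough readiness signal, so treat it as a hint rather than a guarantee that the node has fully joined the ring.

```python
import socket
import time


def wait_for_port(host: str, port: int, timeout: float = 300.0,
                  interval: float = 5.0) -> bool:
    """Return True once a TCP connection to host:port succeeds.

    Returns False if no connection could be made before `timeout`
    seconds have elapsed. `interval` is the pause between attempts.
    """
    deadline = time.monotonic() + timeout
    while True:
        try:
            # A successful connect means something is listening on the port.
            with socket.create_connection((host, port), timeout=2.0):
                return True
        except OSError:
            if time.monotonic() >= deadline:
                return False
            time.sleep(interval)


# Hypothetical usage, assuming port 9042 of elassandra1 is mapped to localhost:
#   if wait_for_port("127.0.0.1", 9042):
#       ...start elassandra2 and elassandra3...
```

Polling the CQL port is a deliberate simplification: it avoids any driver dependency, at the cost of not checking the node status that a tool such as nodetool status would report.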
Create Cassandra table
Next I created a Cassandra keyspace (storage_document), user-defined types (attachment, party), and a table (documents).
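To give a feel for the shape of such a schema, here is a minimal Python sketch that produces the CQL statements for a keyspace, two user-defined types, and a table. Only the names storage_document, attachment, party, and documents come from this post. Every column and field name, the datacenter name DC1, and the replication settings are assumptions made for illustration and will differ from the actual schema.

```python
from typing import List


def schema_statements(keyspace: str = "storage_document",
                      datacenter: str = "DC1",
                      replication_factor: int = 1) -> List[str]:
    """Build illustrative CQL statements (hypothetical columns and fields)."""
    return [
        # Keyspace replicated within one (assumed) datacenter.
        f"CREATE KEYSPACE IF NOT EXISTS {keyspace} WITH replication = "
        f"{{'class': 'NetworkTopologyStrategy', '{datacenter}': {replication_factor}}}",
        # User-defined types; the field lists are assumptions.
        f"CREATE TYPE IF NOT EXISTS {keyspace}.attachment (id text, name text)",
        f"CREATE TYPE IF NOT EXISTS {keyspace}.party (id text, name text)",
        # Table using the UDTs as a frozen column and a list of frozen values.
        f"CREATE TABLE IF NOT EXISTS {keyspace}.documents ("
        "id text PRIMARY KEY, "
        "name text, "
        "owner frozen<party>, "
        "attachments list<frozen<attachment>>, "
        "created_at timestamp)",
    ]


for statement in schema_statements():
    print(statement + ";")
```

Each printed statement can be pasted into cqlsh, or executed one by one through a Cassandra driver session.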
Create Elasticsearch index
I need to create an Elasticsearch index that corresponds to the storage_document.documents table in Elassandra. The following is the way to do that. It creates an Elasticsearch documents index whose mapping corresponds to the storage_document.documents table. Read more about creating Elassandra indexes in the Elassandra documentation (see References).
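As an illustration of what such an index creation request can look like, the Python sketch below builds a PUT request for the documents index. It assumes the Elasticsearch REST API is reachable at http://localhost:9200 and relies on the discover setting described in the Elassandra documentation to derive the mapping from the columns of the Cassandra table; the exact mapping used in this post may instead list fields explicitly. The helper names are hypothetical.

```python
import json
import urllib.request


def index_mapping_body(keyspace: str, table: str) -> dict:
    """Request body that binds an index to a keyspace and discovers columns."""
    return {
        # Elassandra setting: the Cassandra keyspace backing this index.
        "settings": {"keyspace": keyspace},
        # The mapping type is named after the table; ".*" asks Elassandra
        # to discover every column of that table.
        "mappings": {table: {"discover": ".*"}},
    }


def build_create_index_request(base_url: str, index: str,
                               keyspace: str, table: str) -> urllib.request.Request:
    """Prepare (but do not send) the PUT request that creates the index."""
    body = json.dumps(index_mapping_body(keyspace, table)).encode("utf-8")
    return urllib.request.Request(
        url=f"{base_url}/{index}",
        data=body,
        method="PUT",
        headers={"Content-Type": "application/json"},
    )


request = build_create_index_request(
    "http://localhost:9200", "documents", "storage_document", "documents")
print(request.get_method(), request.full_url)
print(request.data.decode("utf-8"))
# To actually send it against a running node:
#   urllib.request.urlopen(request)
```

Building the request separately from sending it keeps the sketch safe to run without a cluster and makes the JSON body easy to inspect.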
Bootstrap data
The mapping between the Cassandra storage_document.documents table and the Elasticsearch documents index is now created, so all data saved in the storage_document.documents table is automatically indexed in the Elasticsearch documents index. I bootstrapped the Cassandra storage_document.documents table with the following sample data set.
Search data
Now I can search the data using the Elasticsearch REST API in Elassandra. The following are some sample Elasticsearch queries that I used to search the data.
1. Search all data
2. Single boolean match
3. Multi boolean match
4. Nested match
5. Boolean match with nested match
6. Pagination
7. Sort
8. Pagination with Sort
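The query shapes listed above can be sketched with the standard Elasticsearch query DSL. The Python helpers below only build the JSON request bodies that would be sent to the documents/_search endpoint; they do not contact a cluster. The field names name, owner, owner.name, and created_at are hypothetical and stand in for whatever columns the documents table really has.

```python
import json
from typing import Any, Dict

Query = Dict[str, Any]


def match_all() -> Query:
    """1. Search all data."""
    return {"query": {"match_all": {}}}


def match(field: str, value: Any) -> Query:
    """A single match clause, to be placed inside a bool query."""
    return {"match": {field: value}}


def nested(path: str, clause: Query) -> Query:
    """4. Nested clause that searches inside a nested object such as a UDT."""
    return {"nested": {"path": path, "query": clause}}


def bool_must(*clauses: Query) -> Query:
    """2, 3, 5. Boolean query in which every clause must match."""
    return {"query": {"bool": {"must": list(clauses)}}}


def paged(body: Query, start: int, size: int) -> Query:
    """6. Pagination: skip `start` hits and return at most `size` hits."""
    return {**body, "from": start, "size": size}


def sorted_by(body: Query, field: str, order: str = "asc") -> Query:
    """7. Sort the hits by one field."""
    return {**body, "sort": [{field: {"order": order}}]}


# 8. Pagination with sort, combined with a boolean and a nested match.
example = sorted_by(
    paged(
        bool_must(
            match("name", "invoice"),
            nested("owner", match("owner.name", "alice")),
        ),
        start=0,
        size=10,
    ),
    "created_at",
    "desc",
)
print(json.dumps(example, indent=2))
```

Composing small helpers mirrors how the eight queries relate to each other: pagination and sorting are top-level keys added to any query body, while nested clauses slot into a boolean query like any other clause.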
References
- http://elassandra.readthedocs.io/en/latest/
- https://github.com/strapdata/elassandra
- https://simongui.github.io/2016/07/20/elassandra.html
- https://medium.com/@itseranga/cassandra-lucene-queries-with-udt-e5d1a10d2b9c
- https://medium.com/rahasak/deploy-multi-data-center-elassandra-cluster-c6cb4abf50d1
- https://medium.com/rahasak/migrate-cassandra-data-to-elassandra-1d22adfbf11a
- https://medium.com/rahasak/dynamically-add-new-fields-to-elassandra-b1c67b6b6cbd