Hacking with Elassandra

Cassandra + Elasticsearch, a marriage made in heaven

(λx.x)eranga
Effectz.AI
4 min read · Apr 18, 2018

Cassandra

Cassandra is a distributed, highly available data store. It offers high write throughput, and any node in the cluster can accept writes. In CAP terms, Cassandra can be classified as an Available and Partition-tolerant (AP) system that supports eventual consistency. Following are some major concerns with Cassandra.

  1. It does not support full-text search
  2. Secondary indexes, which are used for additional search paths, add overhead

Elasticsearch

Elasticsearch is a powerful search engine built on Lucene indexes. It stores documents as JSON objects and supports full-text search over the document data. When storing a document, it maps the individual fields of the document into an indexed form. Elasticsearch can be classified as a Consistent and Available system (it is not fully partition tolerant). Following are the main concerns with Elasticsearch.

  1. Writes must go through a primary shard and the cluster depends on an elected master node, so it has no write-anywhere model like Cassandra
  2. Not ideal for primary data storage (Elasticsearch appears to lose writes, both updates and even non-conflicting inserts)
  3. No multi-datacenter clustering (it has to be achieved with an additional tool such as Kafka)

Elassandra

In Elassandra, Elasticsearch is embedded into Cassandra as a plugin, hence the name: Elassandra = Elasticsearch + Cassandra. It offers the features of both Elasticsearch and Cassandra, and it overcomes the drawbacks of each that are listed above. Following are some notable features. More on these features can be found in the Elassandra documentation listed in the references.

  1. Full text search on Cassandra data
  2. Elasticsearch native API to access Cassandra data
  3. JSON REST API to access Cassandra data
  4. Search on multiple keyspaces in one query

Elassandra architecture

Each Elassandra node runs an Elasticsearch instance and a Cassandra instance. It provides both the Cassandra API (e.g. cqlsh) and the Elasticsearch API (e.g. the REST API) to manipulate the data. When data is written to Cassandra, it is automatically (synchronously) indexed in Elasticsearch via a custom secondary index. Written data first goes to an in-memory Lucene index and is periodically flushed to SSTables on disk. Data written to Elasticsearch becomes searchable after the index is refreshed. By default Elasticsearch refreshes its indexes every second (the default refresh_interval is 1 second). An Elassandra cluster follows a masterless ring architecture (the same as Cassandra) and replicates the data according to the replication factor.

In Elassandra, the original JSON documents (_source) are stored in Cassandra. Only the Lucene indexes are kept on the Elasticsearch side (there is no Elasticsearch _source). An Elasticsearch index can also be built from data that already exists in Cassandra (essentially, the index is built from the data in the SSTables). More information about Elassandra can be found in the references at the end of this post.

Elassandra use case

The following figures show an example use case of Elassandra. In this scenario I have used Elassandra to build a full-text search REST API on top of data stored in Cassandra. I write data to Cassandra from application-service, which uses the Cassandra API. Elassandra then indexes that Cassandra data in Elasticsearch. From rest-service, which uses the Elasticsearch API to read the data, I can search the data and so build the full-text search API.

Run Elassandra

I run a three-node Elassandra cluster with Docker in my local environment, using a custom Elassandra image that I built to support cluster mode. Following is the docker-compose.yml that runs the cluster with the Elassandra Docker image.

Now I can start the Elassandra cluster one node at a time. elassandra1 is the seed node, so wait a few minutes for elassandra1 to start before starting the other nodes.
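Instead of waiting a fixed few minutes, the start-up order can be scripted. The following is a minimal sketch in Python that polls a TCP port until it accepts connections. It assumes the seed node exposes Cassandra's default CQL port 9042 on localhost (the port mapping is not shown in this post, so treat host and port as placeholders). An open port only suggests that the node is up; it does not prove the node has finished joining the ring.

```python
import socket
import time


def wait_for_port(host, port, timeout_s=300.0, interval_s=5.0):
    """Return True once host:port accepts a TCP connection, False on timeout."""
    deadline = time.monotonic() + timeout_s
    while True:
        try:
            # A successful connect means something is listening on the port.
            with socket.create_connection((host, port), timeout=2.0):
                return True
        except OSError:
            # Connection refused or timed out: give up or retry after a pause.
            if time.monotonic() >= deadline:
                return False
            time.sleep(interval_s)
```

For example, `wait_for_port("localhost", 9042)` could be called after starting elassandra1 and before starting the remaining nodes.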

Create Cassandra table

Next I created a Cassandra keyspace (storage_document), user-defined types (attachment, party) and a table (documents).
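A sketch of what such a schema might look like, generated from Python as plain CQL strings. Only the keyspace, type and table names come from the text above. Every column and field name, the datacenter name DC1 and the replication factor are hypothetical placeholders, because the original schema is not reproduced here.

```python
from typing import List

KEYSPACE = "storage_document"


def schema_statements(datacenter: str = "DC1", replication_factor: int = 1) -> List[str]:
    """Build CQL statements for the keyspace, the two UDTs and the table.

    All field and column names below are illustrative assumptions.
    """
    return [
        f"CREATE KEYSPACE IF NOT EXISTS {KEYSPACE} WITH replication = "
        f"{{'class': 'NetworkTopologyStrategy', '{datacenter}': {replication_factor}}}",
        f"CREATE TYPE IF NOT EXISTS {KEYSPACE}.attachment (id text, name text)",
        f"CREATE TYPE IF NOT EXISTS {KEYSPACE}.party (id text, name text)",
        f"CREATE TABLE IF NOT EXISTS {KEYSPACE}.documents ("
        "id text PRIMARY KEY, name text, created_at timestamp, "
        "owner frozen<party>, attachments list<frozen<attachment>>)",
    ]


# Print the statements so they can be pasted into cqlsh.
for statement in schema_statements():
    print(statement + ";")
```

The printed statements can be run in cqlsh, or executed one by one through a Cassandra driver session.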

Create Elasticsearch index

I need to create an Elasticsearch index that corresponds to the storage_document.documents table in Elassandra. Following is the way to do that. It creates an Elasticsearch documents index whose mapping corresponds to the storage_document.documents table. Read more about creating an Elassandra index in the Elassandra documentation listed in the references.
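As a hedged sketch of that step, the Python below builds a PUT request for the index. It assumes Elassandra's REST API is reachable at http://localhost:9200 (the Elasticsearch default port), and it relies on the `keyspace` index setting and the `discover` mapping option as I understand them from the Elassandra documentation, where `discover` asks Elassandra to derive the field mapping from the table's columns. Check both against the version you run.

```python
import json
import urllib.request


def build_index_request(base_url="http://localhost:9200", index="documents",
                        keyspace="storage_document", table="documents"):
    """Build (but do not send) the PUT request that creates the index."""
    body = {
        # Bind the Elasticsearch index to the Cassandra keyspace.
        "settings": {"keyspace": keyspace},
        # Ask Elassandra to discover every column of the table (regex ".*").
        "mappings": {table: {"discover": ".*"}},
    }
    return urllib.request.Request(
        url=f"{base_url}/{index}",
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="PUT",
    )
```

Sending it is a single call, `urllib.request.urlopen(build_index_request())`, once the cluster is up.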

Bootstrap data

The mapping between the Cassandra storage_document.documents table and the Elasticsearch documents index is now in place, so all data saved in the storage_document.documents table is automatically indexed in the Elasticsearch documents index. I have bootstrapped the Cassandra storage_document.documents table with the following sample data set.
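One way to bootstrap rows is CQL's `INSERT ... JSON` form, sketched below in Python. The sample row is made up for illustration and its column names are hypothetical; they are not the data set used in this post.

```python
import json


def insert_json_statement(row, keyspace="storage_document", table="documents"):
    """Build a CQL INSERT ... JSON statement for one row (a plain dict)."""
    # CQL string literals escape a single quote by doubling it.
    payload = json.dumps(row).replace("'", "''")
    return f"INSERT INTO {keyspace}.{table} JSON '{payload}'"


# Hypothetical sample row; user-defined types are written as nested JSON objects.
sample_row = {
    "id": "doc-001",
    "name": "invoice",
    "owner": {"id": "p-1", "name": "alice"},
    "attachments": [{"id": "a-1", "name": "scan.pdf"}],
}

print(insert_json_statement(sample_row) + ";")
```

Because writes go through Cassandra, rows inserted this way should show up in the Elasticsearch index after the next index refresh.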

Search data

Now I can search the data using the Elasticsearch REST API in Elassandra. Following are some sample Elasticsearch queries that I have used to search the data.

1. Search all data
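A minimal sketch of this query in Python, assuming the index is named documents and the REST API listens on http://localhost:9200. The `search` helper is a hypothetical convenience function, not part of any library.

```python
import json
import urllib.request

# match_all returns every document in the index.
MATCH_ALL = {"query": {"match_all": {}}}


def search(body, base_url="http://localhost:9200", index="documents"):
    """POST a query body to the index's _search endpoint and parse the reply."""
    req = urllib.request.Request(
        url=f"{base_url}/{index}/_search",
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)
```

With a running cluster, `search(MATCH_ALL)["hits"]["hits"]` would hold the matching documents.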

2. Single boolean match
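A sketch of a bool query with a single `must` clause, built as a Python dict. The field name and value in the example call are hypothetical.

```python
import json


def bool_match(field, value):
    """Build a bool query that requires one field to match a value."""
    return {"query": {"bool": {"must": [{"match": {field: value}}]}}}


# Hypothetical example: documents whose "name" field matches "invoice".
print(json.dumps(bool_match("name", "invoice"), indent=2))
```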

3. Multi boolean match
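With several `must` clauses, a document has to satisfy all of them. A sketch, again with hypothetical field names:

```python
import json


def bool_match_all(criteria):
    """Build a bool query with one must/match clause per (field, value) pair."""
    clauses = [{"match": {field: value}} for field, value in criteria.items()]
    return {"query": {"bool": {"must": clauses}}}


# Hypothetical example: both conditions must hold for a document to match.
print(json.dumps(bool_match_all({"name": "invoice", "id": "doc-001"}), indent=2))
```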

4. Nested match
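A nested query searches inside a field that holds sub-objects, such as a column built from a user-defined type. The sketch below assumes that field is mapped with the Elasticsearch `nested` type; if it is mapped as a plain `object`, an ordinary match on the dotted field name would be used instead. The path and field names are hypothetical.

```python
import json


def nested_match(path, field, value):
    """Build a nested query matching one sub-field of a nested object."""
    return {
        "query": {
            "nested": {
                "path": path,
                # Sub-fields are addressed by their full dotted path.
                "query": {"match": {f"{path}.{field}": value}},
            }
        }
    }


# Hypothetical example: documents with an attachment named "scan.pdf".
print(json.dumps(nested_match("attachments", "name", "scan.pdf"), indent=2))
```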

5. Boolean match with nested match
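The two forms can be combined by placing a nested clause next to a regular match inside `must`. This sketch assumes the nested field uses the `nested` mapping type, and all field names are hypothetical.

```python
import json


def bool_with_nested(field, value, path, nested_field, nested_value):
    """Build a bool query with one top-level match and one nested match."""
    return {
        "query": {
            "bool": {
                "must": [
                    {"match": {field: value}},
                    {
                        "nested": {
                            "path": path,
                            "query": {"match": {f"{path}.{nested_field}": nested_value}},
                        }
                    },
                ]
            }
        }
    }


# Hypothetical example: invoices that carry an attachment named "scan.pdf".
print(json.dumps(bool_with_nested("name", "invoice", "attachments", "name", "scan.pdf"), indent=2))
```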

6. Pagination
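Elasticsearch paginates with the `from` (offset) and `size` (page length) fields of the search body. A small sketch that converts a zero-based page number into those fields; the helper name is hypothetical.

```python
import json


def paginated(query, page, page_size):
    """Wrap a query clause with from/size for the given zero-based page."""
    if page < 0 or page_size <= 0:
        raise ValueError("page must be >= 0 and page_size must be > 0")
    return {"query": query, "from": page * page_size, "size": page_size}


# Third page (page index 2) with ten hits per page.
print(json.dumps(paginated({"match_all": {}}, page=2, page_size=10)))
```

By default Elasticsearch rejects requests where `from + size` exceeds 10,000 (the `index.max_result_window` setting), so this style suits shallow paging.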

7. Sort
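Sorting is expressed with a `sort` array in the search body. The sketch assumes the sort field is a date, numeric or keyword field, since sorting on an analyzed text field normally needs extra mapping work. The field name is hypothetical.

```python
import json


def sorted_search(query, field, descending=True):
    """Wrap a query clause with a single-field sort."""
    order = "desc" if descending else "asc"
    return {"query": query, "sort": [{field: {"order": order}}]}


# Hypothetical example: newest documents first.
print(json.dumps(sorted_search({"match_all": {}}, "created_at")))
```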

8. Pagination with Sort
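Paging and sorting go in the same search body, which keeps page boundaries stable as long as the sort order is deterministic. A sketch with a hypothetical sort field:

```python
import json


def page_sorted(query, page, page_size, sort_field, descending=True):
    """Build a search body with from/size paging and a single-field sort."""
    if page < 0 or page_size <= 0:
        raise ValueError("page must be >= 0 and page_size must be > 0")
    return {
        "query": query,
        "from": page * page_size,
        "size": page_size,
        "sort": [{sort_field: {"order": "desc" if descending else "asc"}}],
    }


# Second page (page index 1) of five hits, newest first.
print(json.dumps(page_sorted({"match_all": {}}, 1, 5, "created_at")))
```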

References

  1. http://elassandra.readthedocs.io/en/latest/
  2. https://github.com/strapdata/elassandra
  3. https://simongui.github.io/2016/07/20/elassandra.html
  4. https://medium.com/@itseranga/cassandra-lucene-queries-with-udt-e5d1a10d2b9c
  5. https://medium.com/rahasak/deploy-multi-data-center-elassandra-cluster-c6cb4abf50d1
  6. https://medium.com/rahasak/migrate-cassandra-data-to-elassandra-1d22adfbf11a
  7. https://medium.com/rahasak/dynamically-add-new-fields-to-elassandra-b1c67b6b6cbd
