Migrate data from Cassandra to Elassandra

With sstableloader

About Elassandra

Elassandra is a distributed datastore built by combining Elasticsearch with Cassandra. Elasticsearch ships as a Cassandra plugin, so Elassandra exposes both the Cassandra API and the Elasticsearch API. When data is saved to a Cassandra table that has an Elasticsearch mapping, it is automatically indexed in Elasticsearch. This makes it a powerful way to get full-text search on Cassandra. Read more about Elassandra from here.

Scenario

I have an existing Cassandra cluster with a keyspace and several tables, and I wanted to migrate its data into an Elassandra cluster. The Cassandra keyspace is named zchain, and it has a table named users. In this post I’m gonna show how I migrated the existing data in the Cassandra zchain.users table to the Elassandra zchain.users table.

To migrate data from Cassandra to Elassandra I’m gonna use Cassandra’s sstableloader command, which bulk loads SSTable data files into a cluster. According to the documentation, it streams a set of SSTable data files to a live cluster. It does not simply copy the set of SSTables to every node, but transfers the relevant part of the data to each node, conforming to the replication strategy of the cluster.

Cassandra cluster

Following are the CQL commands that I used to create the Cassandra zchain keyspace and users table.

I’m running this Cassandra cluster on top of Docker. Following is the docker-compose file I used to run Cassandra.

Cassandra keeps SSTable data on disk (inside the Cassandra data directory). The data for the zchain keyspace is in the data/zchain directory inside the Cassandra data directory. Since I’m running Cassandra in Docker, I have volume-mapped the Cassandra data directory to my host machine’s /private/var/services/casssandra directory, so all of the Cassandra data directory content is available on my host machine.

It contains the zchain.users table data in the data/zchain/users-6a095200ee2011e8bd6d6d2c86545d91 directory.
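The directory name follows the pattern table name, hyphen, table ID, and its parent directory is the keyspace. As a small illustration (plain shell parameter expansion, nothing Cassandra-specific, and the variable names are my own), the keyspace and table can be recovered from the path like this:

```shell
# Path of the table directory, relative to the Cassandra data directory.
table_dir="data/zchain/users-6a095200ee2011e8bd6d6d2c86545d91"

dir_name="${table_dir##*/}"   # last path component: users-6a09...
table="${dir_name%%-*}"       # text before the first hyphen
table_id="${dir_name#*-}"     # text after the first hyphen
parent="${table_dir%/*}"      # data/zchain
keyspace="${parent##*/}"      # last component of the parent directory

echo "keyspace=$keyspace table=$table table_id=$table_id"
```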

With sstableloader I’m going to load the data in the data/zchain/users-6a095200ee2011e8bd6d6d2c86545d91 directory into Elassandra (the sstableloader command will migrate all the data in the Cassandra zchain.users table to the Elassandra zchain.users table).

Elassandra cluster

To migrate data from Cassandra to Elassandra, I need to create the same keyspace and table on Elassandra. Following are the CQL commands to create them on Elassandra.
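As a rough sketch of this step: the keyspace statement below uses NetworkTopologyStrategy with the data center name DC1 used in this post, but the replication factor of 1, the container names cassandra and elassandra, and the function names are all assumptions. The users table definition has to match the source table, so instead of guessing its columns the sketch reads the definition from the Cassandra side with cqlsh’s DESCRIBE TABLE and replays it on Elassandra.

```shell
# Assumed container names; adjust to your docker-compose services.
CASSANDRA_CONTAINER="${CASSANDRA_CONTAINER:-cassandra}"
ELASSANDRA_CONTAINER="${ELASSANDRA_CONTAINER:-elassandra}"

# Keyspace on the Elassandra side. The replication factor (1) is an assumption.
keyspace_cql() {
  cat <<'CQL'
CREATE KEYSPACE IF NOT EXISTS zchain
  WITH replication = {'class': 'NetworkTopologyStrategy', 'DC1': 1};
CQL
}

# Print the CREATE TABLE statement of the existing Cassandra table.
describe_users_table() {
  docker exec "$CASSANDRA_CONTAINER" cqlsh -e "DESCRIBE TABLE zchain.users"
}

# Create the keyspace, then replay the table definition on Elassandra.
create_schema_on_elassandra() {
  docker exec "$ELASSANDRA_CONTAINER" cqlsh -e "$(keyspace_cql)" &&
  docker exec "$ELASSANDRA_CONTAINER" cqlsh -e "$(describe_users_table)"
}
```

Calling create_schema_on_elassandra runs both steps. If the two clusters run different Cassandra versions, the DESCRIBE output may need some manual cleanup before it is accepted.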

In Elassandra we have to use NetworkTopologyStrategy as the replication strategy. The replication factor is specified per data center, using the data center name DC1. I’m running Elassandra with Docker. Following is the docker-compose file I used to run Elassandra.

Migrate data

The sstableloader command is available inside the Elassandra container. I’m using it to load the data from the data/zchain/users-6a095200ee2011e8bd6d6d2c86545d91 directory into the Elassandra zchain.users table. To do that, I first copied the Cassandra zchain.users table content from my host machine into the Elassandra container.
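The copy step could look roughly like this. The container name elassandra, the /tmp/zchain/users destination, and the function name are assumptions of mine. The source path combines the host data directory and the table directory mentioned earlier. I drop the table ID suffix on the destination so that the path ends in keyspace/table.

```shell
ELASSANDRA_CONTAINER="${ELASSANDRA_CONTAINER:-elassandra}"   # assumed container name
SRC_DIR="/private/var/services/casssandra/data/zchain/users-6a095200ee2011e8bd6d6d2c86545d91"
DEST_DIR="/tmp/zchain/users"                                  # assumed path in the container

copy_sstables() {
  # docker cp needs the parent of the destination to exist already.
  docker exec "$ELASSANDRA_CONTAINER" mkdir -p "${DEST_DIR%/*}" &&
  # DEST_DIR does not exist yet, so docker cp creates it and copies
  # the contents of SRC_DIR into it.
  docker cp "$SRC_DIR" "$ELASSANDRA_CONTAINER:$DEST_DIR"
}
```

If the source node is still receiving writes, running nodetool flush zchain on it before copying makes sure the most recent writes are in the SSTable files and not only in memory.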

Then I connected to the Elassandra container and executed the following sstableloader command.
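A minimal sketch of that invocation, meant to run inside the Elassandra container. It assumes the SSTable files sit in /tmp/zchain/users (a path picked for illustration) and that a node is reachable at 127.0.0.1; the function name is hypothetical too. The -d option is sstableloader’s list of initial nodes to connect to.

```shell
ELASSANDRA_HOST="${ELASSANDRA_HOST:-127.0.0.1}"   # assumed node address
SSTABLE_DIR="${SSTABLE_DIR:-/tmp/zchain/users}"   # assumed path inside the container

load_users_table() {
  # -d takes a comma-separated list of initial nodes to connect to.
  # The directory path should end in <keyspace>/<table>, which tells
  # sstableloader where the data belongs.
  sstableloader -d "$ELASSANDRA_HOST" "$SSTABLE_DIR"
}
```

Calling load_users_table streams the files to the cluster. The zchain keyspace and users table must already exist on Elassandra at this point.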

Elasticsearch index

Now all of the data has been migrated to zchain.users on Elassandra. Following is the content of the zchain.users Elassandra table. This data is not yet indexed in Elasticsearch.

To index it in Elasticsearch, we need to create an Elasticsearch mapping for the Cassandra table and flush the data from Cassandra to Elasticsearch. Following is the curl command to do that.
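A sketch of what such a request might look like. The body relies on Elassandra’s keyspace index setting and discover mapping option as I understand them from the Elassandra documentation, so treat it as an assumption to check against your version. The endpoint http://localhost:9200 and the function names are also assumptions.

```shell
ES_URL="${ES_URL:-http://localhost:9200}"   # assumed Elasticsearch endpoint

# Index body: bind the index to the zchain keyspace and let Elassandra
# derive the field mapping from the columns of the users table.
users_index_body() {
  cat <<'JSON'
{
  "settings": { "keyspace": "zchain" },
  "mappings": { "users": { "discover": ".*" } }
}
JSON
}

# Create the users index with that body.
create_users_index() {
  curl -XPUT "$ES_URL/users" \
       -H 'Content-Type: application/json' \
       -d "$(users_index_body)"
}
```

The mapping type is named users so that it lines up with the Cassandra table name.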

Here I’m creating a users Elasticsearch index from the data in the zchain.users table on Elassandra. We can view this data with the following curl command.
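For example, a plain search request against the users index lists the indexed documents. The endpoint http://localhost:9200 and the function name are assumptions.

```shell
ES_URL="${ES_URL:-http://localhost:9200}"   # assumed Elasticsearch endpoint

search_users() {
  # Without a query body, _search matches all documents and returns
  # the first page of hits; ?pretty formats the JSON response.
  curl -XGET "$ES_URL/users/_search?pretty"
}
```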

Reference

  1. https://www.datastax.com/dev/blog/using-the-cassandra-bulk-loader-updated
  2. https://medium.com/@itseranga/elassandra-936ab46a6516
  3. https://docs.datastax.com/en/cassandra/3.0/cassandra/tools/toolsBulkloader.html
  4. http://thelastpickle.com/blog/2018/04/03/cassandra-backup-and-restore-aws-ebs.html