Migrate data from Cassandra to Elassandra
With sstableloader
About Elassandra
Elassandra is a distributed data store built by combining Elasticsearch with Cassandra. It ships Elasticsearch as a Cassandra plugin, so it exposes both the Cassandra API and the Elasticsearch API. When data is saved in Cassandra, it is automatically indexed in Elasticsearch. That makes it a powerful way to get full-text search on Cassandra. Read more about Elassandra here.
Scenario
I have an existing Cassandra cluster with a keyspace and several tables. I wanted to migrate the existing data on this Cassandra cluster into an Elassandra cluster. The Cassandra keyspace name is zchain. It has a table named users. In this post I’m gonna show how I migrated the existing data in the Cassandra zchain.users table to the Elassandra zchain.users table.
To migrate data from Cassandra to Elassandra I’m gonna use the Cassandra sstableloader command. The sstableloader command can be used to bulk load SSTable data into a Cassandra cluster. According to the documentation, it streams a set of SSTable data files to a live cluster. It does not simply copy the set of SSTables to every node, but transfers the relevant part of the data to each node, conforming to the replication strategy of the cluster.
Cassandra cluster
Following are the CQL commands that I used to create the Cassandra zchain keyspace and users table.
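The original commands are not reproduced here, so the following is only a sketch of what they might look like when run through cqlsh from a shell. The keyspace and table names come from this post; the SimpleStrategy replication settings, the column list (id, name, email) and the create_cassandra_schema helper name are hypothetical placeholders.

```shell
# Sketch: create the zchain keyspace and users table on the source Cassandra node.
# Assumptions (not from the post): SimpleStrategy with replication factor 1,
# and the id/name/email columns, which stand in for the real schema.
create_cassandra_schema() {
  local host="${1:-127.0.0.1}"   # Cassandra contact point
  cqlsh "$host" -e "
    CREATE KEYSPACE IF NOT EXISTS zchain
      WITH replication = {'class': 'SimpleStrategy', 'replication_factor': 1};
    CREATE TABLE IF NOT EXISTS zchain.users (
      id    text PRIMARY KEY,
      name  text,
      email text
    );"
}

# Example: create_cassandra_schema 127.0.0.1
```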
I’m running this Cassandra cluster on top of Docker. Following is the docker-compose file I used to run Cassandra.
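The compose file itself is not reproduced here. As a rough equivalent, a single-node Cassandra container with the data directory mapped to the host could be started with a plain docker run, sketched below. The container name, the cassandra:3.11 image tag, the published port and the start_cassandra helper are assumptions; /var/lib/cassandra is the data directory of the official Cassandra image.

```shell
# Sketch: run one Cassandra node and map its data directory to the host.
# Assumptions (not from the post): container name, image tag and port mapping.
start_cassandra() {
  local host_dir="${1:-/private/var/services/casssandra}"   # host path used in this post
  docker run -d --name cassandra \
    -p 9042:9042 \
    -v "$host_dir:/var/lib/cassandra" \
    cassandra:3.11
}

# Example: start_cassandra /private/var/services/casssandra
```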
Cassandra keeps SSTable data on disk, inside the Cassandra data directory. The data for the zchain keyspace is in the data/zchain directory inside the Cassandra data directory. Since I’m running Cassandra in Docker, I have volume-mapped the Cassandra data directory to the /private/var/services/casssandra directory on my host machine, so all of the Cassandra data directory content is available on my host machine.
It contains the zchain.users table data in the data/zchain/users-6a095200ee2011e8bd6d6d2c86545d91 directory.
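Cassandra names each table directory after the table followed by a generated table id, so the suffix will differ on every cluster. A small sketch for locating that directory from the host is shown below; the find_table_dir helper name is hypothetical.

```shell
# Sketch: locate the on-disk directory of a table inside the Cassandra data directory.
# Table directories are named <table>-<table id>, so a glob on the prefix is enough.
find_table_dir() {
  local data_dir="$1" keyspace="$2" table="$3"
  find "$data_dir/data/$keyspace" -maxdepth 1 -type d -name "$table-*"
}

# Example: find_table_dir /private/var/services/casssandra zchain users
```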
With sstableloader I’m going to load the data in the data/zchain/users-6a095200ee2011e8bd6d6d2c86545d91 directory into Elassandra (the sstableloader command will migrate all the data in zchain.users to the Elassandra zchain.users table).
Elassandra cluster
To migrate data from Cassandra to Elassandra I need to create the same keyspace and table on Elassandra. Following are the CQL commands to create them on Elassandra.
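As a sketch of what those commands might look like, the helper below creates the keyspace with NetworkTopologyStrategy for the DC1 data center and then the users table. The replication factor of 1, the column list and the create_elassandra_schema helper name are hypothetical; the table definition has to match the one on the source cluster so that the streamed SSTables can be read.

```shell
# Sketch: create the zchain keyspace and users table on the Elassandra node.
# Assumptions (not from the post): replication factor 1 for DC1 and the
# id/name/email columns, which must mirror the source table's real schema.
create_elassandra_schema() {
  local host="${1:-127.0.0.1}"   # Elassandra contact point
  cqlsh "$host" -e "
    CREATE KEYSPACE IF NOT EXISTS zchain
      WITH replication = {'class': 'NetworkTopologyStrategy', 'DC1': 1};
    CREATE TABLE IF NOT EXISTS zchain.users (
      id    text PRIMARY KEY,
      name  text,
      email text
    );"
}

# Example: create_elassandra_schema 127.0.0.1
```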
In Elassandra we have to use NetworkTopologyStrategy as the replication strategy. The replication factor is specified per data center, here with the data center name DC1. I’m running Elassandra with Docker. Following is the docker-compose file I used to run Elassandra.
Migrate data
The sstableloader command is available inside the Elassandra container. I’m using that command to load the data from the data/zchain/users-6a095200ee2011e8bd6d6d2c86545d91 directory into the Elassandra zchain.users table. To do that, I first copied the Cassandra zchain.users table content from my host machine into the Elassandra container.
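A minimal sketch of that copy step with docker cp is shown below. The container name elassandra, the destination path and the copy_sstables helper are assumptions. The destination ends in zchain/users because sstableloader derives the keyspace and table names from the last two components of the directory path.

```shell
# Sketch: copy a table's SSTable files from the host into a container.
# Assumptions (not from the post): container name and destination path.
copy_sstables() {
  local src_dir="$1" container="$2" dest_dir="$3"
  # Create the destination first, then copy the directory content into it.
  docker exec "$container" mkdir -p "$dest_dir" &&
  docker cp "$src_dir/." "$container:$dest_dir"
}

# Example:
# copy_sstables \
#   /private/var/services/casssandra/data/zchain/users-6a095200ee2011e8bd6d6d2c86545d91 \
#   elassandra /tmp/zchain/users
```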
Then I connected to the Elassandra container and executed the following sstableloader command.
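The exact invocation is not reproduced here. In its basic form sstableloader takes the address of a node in the target cluster with -d, followed by the directory holding the SSTables. The sketch below runs it through docker exec rather than an interactive shell; the container name, node address, directory path and the load_into_elassandra helper are assumptions.

```shell
# Sketch: stream SSTables from a directory inside the container to the cluster.
# -d is the initial contact node; the directory must end in <keyspace>/<table>.
load_into_elassandra() {
  local container="$1" node="$2" sstable_dir="$3"
  docker exec "$container" sstableloader -d "$node" "$sstable_dir"
}

# Example (hypothetical node address):
# load_into_elassandra elassandra 172.17.0.2 /tmp/zchain/users
```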
Elasticsearch index
Now all of the data has been migrated to zchain.users on Elassandra. Following is the content of the Elassandra zchain.users table. This data is not yet indexed in Elasticsearch.
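The migrated rows can be checked with a simple query through cqlsh, sketched below. The contact address, the LIMIT of 10 and the show_users helper name are assumptions; the output depends on the data and is not shown.

```shell
# Sketch: print a few rows of the migrated table from the Elassandra node.
show_users() {
  local host="${1:-127.0.0.1}"   # Elassandra contact point
  cqlsh "$host" -e "SELECT * FROM zchain.users LIMIT 10;"
}

# Example: show_users 127.0.0.1
```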
To index it in Elasticsearch we need to create an Elasticsearch mapping for the Cassandra table and flush the data from Cassandra to Elasticsearch. Following is the curl command to do that.
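The original request is not reproduced here. A sketch of such a mapping request follows, assuming Elasticsearch listens on localhost:9200 and that the mapping uses Elassandra's discover option to pick up the columns of the existing table. The index name users and the keyspace zchain come from this post; the URL, the request body layout and the create_users_index helper are assumptions.

```shell
# Sketch: create the users index on top of the zchain.users table.
# "keyspace" binds the index to the Cassandra keyspace, and "discover" asks
# Elassandra to derive field mappings from the table columns matching the regex.
create_users_index() {
  local es_url="${1:-http://localhost:9200}"
  curl -s -XPUT "$es_url/users" \
    -H 'Content-Type: application/json' \
    -d '{
      "settings": { "keyspace": "zchain" },
      "mappings": { "users": { "discover": ".*" } }
    }'
}

# Example: create_users_index http://localhost:9200
```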
Here I’m creating a users Elasticsearch index from the data in the Elassandra zchain.users table. We can view this data with the following curl command.
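A sketch of that query using the standard Elasticsearch search endpoint is shown below; the localhost:9200 address and the search_users helper name are assumptions.

```shell
# Sketch: list documents in the users index through the Elasticsearch search API.
search_users() {
  local es_url="${1:-http://localhost:9200}"
  curl -s -XGET "$es_url/users/_search?pretty"
}

# Example: search_users http://localhost:9200
```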