Migrate data from Cassandra to Elassandra
Elassandra is a distributed storage system built by combining Elasticsearch and Cassandra. It ships Elasticsearch as a Cassandra plugin, so it exposes the Cassandra API as well as the Elasticsearch API. When data is saved in Cassandra, it is automatically indexed in Elasticsearch. It’s an ideal and powerful solution for full-text search on Cassandra. Read more about Elassandra from here.
I have an existing Cassandra cluster with a keyspace and several tables, and I wanted to migrate the existing data on this cluster into an Elassandra cluster. The Cassandra keyspace is named zchain and it has a table named users. In this post I’m gonna show how I migrated the existing data in the Cassandra zchain.users table to Elassandra.
To migrate data from Cassandra to Elassandra I’m gonna use Cassandra’s sstableloader. The sstableloader command bulk loads SSTable data files into a Cassandra cluster. According to the documentation, it streams a set of SSTable data files to a live cluster. It does not simply copy the set of SSTables to every node, but transfers the relevant part of the data to each node, conforming to the replication strategy of the cluster.
The following are the CQL commands I used to create the Cassandra zchain keyspace and the users table.
I’m running this Cassandra cluster on top of Docker. The following is the docker-compose file I used to run Cassandra.
Cassandra keeps SSTable data on disk (inside the Cassandra data directory). The data for the zchain keyspace is in the data/zchain directory inside the Cassandra data directory. Since I’m running Cassandra in Docker, I have volume-mapped the Cassandra data directory to the /private/var/services/casssandra directory on my host machine, so all of the Cassandra data directory content is available on my host machine.
It contains the zchain.users table data. With sstableloader I’m going to load the data in the data/zchain/users-6a095200ee2011e8bd6d6d2c86545d91 directory into Elassandra (the sstableloader command will migrate all the data in zchain.users to Elassandra).
To migrate data from Cassandra to Elassandra, I first need to create the same keyspace and table on Elassandra. The following are the CQL commands to create them on Elassandra.
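As a rough sketch, the keyspace part of that step could look like the following. The replication factor of 1 and the file name zchain_keyspace.cql are assumptions for illustration only. The users table definition is deliberately not spelled out, because it has to match the source table's columns exactly; the comments show one way to pull it from the source cluster with cqlsh.

```shell
#!/bin/sh
# Assumptions (not from the original setup): replication factor 1,
# and the output file name zchain_keyspace.cql.
cat > zchain_keyspace.cql <<'EOF'
CREATE KEYSPACE IF NOT EXISTS zchain
  WITH replication = {'class': 'NetworkTopologyStrategy', 'DC1': 1};
EOF

# The users table must have the same schema as on the source cluster.
# One way to get it is to dump the definition from the source and append it:
#   cqlsh <source-host> -e "DESCRIBE TABLE zchain.users" >> zchain_keyspace.cql
# Then apply the file on the Elassandra node:
#   cqlsh <elassandra-host> -f zchain_keyspace.cql

# Show what was generated.
cat zchain_keyspace.cql
```

Keeping the statements in a file makes it easy to review them before running them against the Elassandra node.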
In Elassandra we have to use NetworkTopologyStrategy as the replication strategy. The replication factor is specified per data center, in this case for the data center named DC1. I’m running Elassandra with Docker. The following is the docker-compose file I used to run Elassandra.
The sstableloader command exists inside the Elassandra container. I’m using that command to load the data from the data/zchain/users-6a095200ee2011e8bd6d6d2c86545d91 directory into the Elassandra zchain.users table. To do that, I first copied the Cassandra zchain.users table content from my host machine into the Elassandra container. Then I connected to the Elassandra container and executed the following sstableloader command.
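The copy and load steps can be sketched roughly as below. The container name elassandra, the staging path /tmp/zchain/users, and the target address 127.0.0.1 are all hypothetical; the host path is derived from the volume mapping described earlier, and sstableloader is assumed to be on the PATH inside the container. The small run helper only prints the commands by default, so nothing touches a cluster until DRY_RUN=0 is set.

```shell
#!/bin/sh
# Hypothetical names: container, staging path, and target host are assumptions.
CONTAINER="elassandra"
HOST_DATA="/private/var/services/casssandra/data/zchain/users-6a095200ee2011e8bd6d6d2c86545d91"
# sstableloader infers keyspace and table from the last two path components,
# so the staging directory ends in zchain/users.
STAGING="/tmp/zchain/users"
TARGET="127.0.0.1"

# Print each command by default; set DRY_RUN=0 to really execute it.
run() {
  if [ "${DRY_RUN:-1}" = "1" ]; then
    echo "$*"
  else
    "$@"
  fi
}

# Create the parent directory, copy the SSTable files in, then stream them.
run docker exec "$CONTAINER" mkdir -p /tmp/zchain
run docker cp "$HOST_DATA" "$CONTAINER:$STAGING"
run docker exec "$CONTAINER" sstableloader -d "$TARGET" "$STAGING"
```

The -d option gives sstableloader the initial node(s) to connect to; the keyspace and table must already exist on the target before the load starts.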
Now all of the data has been migrated to zchain.users on Elassandra. The following is the content of the zchain.users table in Elassandra. This data is not yet indexed in Elasticsearch.
To index it in Elasticsearch, we need to create an Elasticsearch mapping for the Cassandra table and flush the data from Cassandra to Elasticsearch. The following is the curl command to do that.
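A minimal sketch of such a mapping request is shown below. It assumes the Elassandra HTTP API is reachable at localhost:9200 and relies on Elassandra's keyspace index setting and discover mapping option (as I understand them from the Elassandra docs) to derive the fields from the table columns. The file name users_mapping.json and the create_users_index function are hypothetical helpers.

```shell
#!/bin/sh
# Assumption: Elassandra's Elasticsearch API listens on localhost:9200.
ES_URL="http://localhost:9200"

# Mapping body: bind the index to the zchain keyspace and let Elassandra
# discover the fields of the users table (".*" matches every column).
cat > users_mapping.json <<'EOF'
{
  "settings": { "keyspace": "zchain" },
  "mappings": {
    "users": { "discover": ".*" }
  }
}
EOF

# Creates the users index from the mapping file when called.
create_users_index() {
  curl -XPUT "$ES_URL/users" \
    -H 'Content-Type: application/json' \
    -d @users_mapping.json
}
```

Calling create_users_index once Elassandra is up sends the request; the mapping type name users is meant to match the Cassandra table name.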
Here I’m creating a users Elasticsearch index from the data in the zchain.users table on Elassandra. We can view this data with the following curl command.
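For example, a plain search against the index could be sketched like this, again assuming the Elasticsearch API is on localhost:9200. The search_users function name is hypothetical; _search with the pretty flag is the standard Elasticsearch search endpoint.

```shell
#!/bin/sh
# Assumption: Elassandra's Elasticsearch API listens on localhost:9200.
ES_URL="http://localhost:9200"
INDEX="users"
SEARCH_URL="$ES_URL/$INDEX/_search?pretty"

# Returns the indexed documents as pretty-printed JSON when called.
search_users() {
  curl -s -XGET "$SEARCH_URL"
}
```

Running search_users after the index has been created should list the rows that were migrated into zchain.users.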