The best way to manage schema migrations in Cassandra

Cobli Brasil, published in Cobli, May 29, 2017

by Daniel Miranda, Lucas Brunialti, Tobin Fulton

Applying schema migrations in relational databases (SQL) is common practice. Schema migration tools help replicate environments (local, dev and prod) and facilitate continuous integration. Moreover, these tools allow for easy documentation, including versioning database changes.

In NoSQL systems, however, this practice is much less common. So we decided to open source a tool, cassandra-migrate, to better manage Cassandra schema migrations.

Cobli is a B2B startup in the Internet of Things (IoT) space based in São Paulo, Brazil. We specialize in building state-of-the-art software solutions that help companies better manage their vehicle fleets.

We use OBDII devices to connect these vehicles to the internet, collecting telematics data such as geolocation, speed, and acceleration as well as engine data such as rpm, pedal angle, and voltage. We then show companies how they can optimize their commercial fleet operations based on the insights we generate.

In order to do this, we have to deal with a lot of data.

We adopted Cassandra (C*) as a database to handle these large data streams coming from OBDII devices. We also use Scala and Spark to analyze and run machine learning models on the data, saving all results in Cassandra.

We now receive over 1.5M events per day, a figure that is growing exponentially month over month as more and more companies throughout Brazil and the world buy Cobli’s software solutions. :)

Since we add new features every month, we’ve found it quite difficult to manage C* schema migrations, especially on running production clusters. Historically, we’ve had trouble replicating the schema across environments, documenting and versioning schema migrations, and keeping those migrations in sync with the product codebase.

cassandra-migrate has helped us manage all of these complexities, in part by supporting a baseline version for a schema that already exists. :)

Also, although Cassandra does not have transactional DDL, we were able to use lightweight transactions (CAS, compare-and-set) to prevent multiple schema changes from being applied at the same time. This is especially useful when automating deployments.
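To make the locking idea concrete, here is a minimal sketch in Python. It is not the tool's actual implementation: it uses an in-memory stand-in for a table that is only touched through lightweight transactions, and the lock name, table name, column names and helper functions are all hypothetical. The point is only to show the compare-and-set pattern: a deployer runs migrations only if its conditional insert was applied.

```python
from typing import Callable, Dict


class InMemoryLockTable:
    """In-memory stand-in for a Cassandra table used only via LWTs.

    Roughly emulates these statements (hypothetical table and columns):
      INSERT INTO locks (name, owner) VALUES (?, ?) IF NOT EXISTS
      DELETE FROM locks WHERE name = ? IF owner = ?
    """

    def __init__(self) -> None:
        self._rows: Dict[str, str] = {}

    def insert_if_not_exists(self, name: str, owner: str) -> bool:
        # Like an LWT, the write is applied only if no row exists yet.
        if name in self._rows:
            return False
        self._rows[name] = owner
        return True

    def delete_if_owner(self, name: str, owner: str) -> bool:
        # Only the current holder may release the lock.
        if self._rows.get(name) != owner:
            return False
        del self._rows[name]
        return True


def run_migrations(table: InMemoryLockTable, deployer: str,
                   apply: Callable[[], None]) -> bool:
    """Run `apply` only if this deployer wins the lock; report whether it ran."""
    if not table.insert_if_not_exists("schema_migration", deployer):
        return False  # another instance is migrating, so back off
    try:
        apply()
    finally:
        # Release the lock even if a migration raised an error.
        table.delete_if_owner("schema_migration", deployer)
    return True
```

With a real cluster, the two methods would be conditional CQL statements whose result tells the client whether the write was applied; everything else in the sketch stays the same.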

In summary, cassandra-migrate has the following features:

  • Written in Python for easy installation
  • Does not require cqlsh, just the Python driver
  • Supports baselining an existing database into versions
  • Supports unique environments for multiple profiles
  • Supports partial advancement
  • Supports locking for concurrent instances using Lightweight Transactions
  • Verifies stored migrations against configured migrations
  • Stores content, checksum, date and state of every migration
  • Supports deploying with different keyspace configurations for different environments
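As an illustration of what verifying stored migrations against configured ones can mean, here is a small Python sketch. It assumes SHA-256 over the script text as the checksum and uses hypothetical function names; the tool's real algorithm and internals may differ.

```python
import hashlib
from typing import Dict, List


def checksum(content: str) -> str:
    """Hex SHA-256 digest of a migration script (the algorithm is an assumption)."""
    return hashlib.sha256(content.encode("utf-8")).hexdigest()


def find_mismatches(stored: Dict[str, str], configured: Dict[str, str]) -> List[str]:
    """Return names of applied migrations whose script is missing or has changed.

    stored:     migration name -> checksum recorded when it was applied
    configured: migration name -> current script content on disk
    """
    problems = []
    for name, recorded in sorted(stored.items()):
        content = configured.get(name)
        if content is None or checksum(content) != recorded:
            problems.append(name)
    return problems
```

A deploy can then refuse to continue whenever `find_mismatches` returns a non-empty list, which catches someone editing a migration that has already been applied.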

Here is an example of one of our deploys:

$ cassandra-migrate -H <host> -u <user> -P <pwd> status
INFO:Migrator:Database is already up-to-date
Keyspace: <ksp>
Migrations table: database_migrations
Current DB version: 14
Latest DB version: 14
## Applied migrations
#   Name                  State      Date applied         Checksum
--  --------------------  ---------  -------------------  ------------
1   v001_initial.cql      SKIPPED    2017-05-19 17:13:09  430f7620381f
2   v002_event.cql        SKIPPED    2017-05-19 17:13:09  cd7784863a09
3   v003_geof.cql         SKIPPED    2017-05-19 17:13:10  62175ca4e94c
4   v004_routing.cql      SKIPPED    2017-05-19 17:13:10  ae14c51a0d78
5   v005_ds.cql           SKIPPED    2017-05-19 17:13:11  b9ef6a787adf
6   v006_fgp.cql          SKIPPED    2017-05-19 17:13:11  a3f7d24794fc
7   v007_routing.cql      SKIPPED    2017-05-19 17:13:11  c04e48715769
8   v008_usr_l_log.cql    SKIPPED    2017-05-19 17:13:12  8b9ddb31f469
9   v009_user_lang.cql    SKIPPED    2017-05-19 17:13:12  81f9bba8fa91
10  v010_dev_st.cql       SKIPPED    2017-05-19 17:13:12  d425beae9684
11  v011_user_subs.cql    SKIPPED    2017-05-19 17:13:13  6d1bf7a0e5c2
12  v012_device_acc.cql   SKIPPED    2017-05-19 17:13:13  0ab035a35f14
13  v013_fix_cache.cql    SKIPPED    2017-05-19 17:13:13  391d876b24ab
14  v014_vehicle.cql      SUCCEEDED  2017-05-19 17:24:26  770fb320764d
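The output above is consistent with a database that was baselined at version 13 and then migrated: the first thirteen scripts are recorded as SKIPPED without being run, and only v014 was actually executed. A rough Python sketch of that bookkeeping follows. The function name and the PENDING label are hypothetical and only illustrate the idea; they are not taken from the tool.

```python
from typing import List, Tuple


def plan_states(names: List[str], baseline_version: int) -> List[Tuple[int, str, str]]:
    """Pair each migration with the state it would be recorded in.

    Versions are 1-based positions in `names`. Versions up to and including
    `baseline_version` are recorded as SKIPPED (assumed to be present in the
    database already); later ones are marked PENDING, a hypothetical label
    for scripts that still have to be executed.
    """
    return [
        (version, name, "SKIPPED" if version <= baseline_version else "PENDING")
        for version, name in enumerate(names, start=1)
    ]
```

For example, `plan_states(["v001_initial.cql", "v002_event.cql", "v003_geof.cql"], 2)` marks the first two scripts as SKIPPED and leaves only the third to be run.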

By open sourcing cassandra-migrate, we hope to help others with similar problems, as well as learn from users running it in different environments. So if you decide to use cassandra-migrate, please get in touch! :)

And by the way, we’re hiring!

Cobli: a complete fleet management solution. We build the future of logistics by bringing innovation to field operations across the country.