Replicating NPM to Riak

I did this crazy thing to replicate NPM to Riak

Robert Oroszi
Feb 15, 2014

Last night I had this crazy idea to replicate NPM’s CouchDB database to Riak. There are a couple of blog posts on the internet about how to do continuous replication between two CouchDB instances, but I’m not really into CouchDB and MongoDB. They are fine, but replication in both of those NoSQL databases is a pain in the ass. There are some modules which address this problem: https://www.npmjs.org/package/npm-replication-watcher, https://github.com/npm/couch-readonly-replica. As you can see, many people are struggling with replication (including the guys at Nodejitsu and even NPM Inc.).

So last summer I took two weeks off and did some deep database benchmarking, including MongoDB, CouchDB, Riak and a little bit of Cassandra. The first two are easy to set up and easy to use. The latter two aren’t, in my opinion, but I was amazed at how awesome they are at replication (I was even pulling the power cable out of the computers, hitting the power button and `kill -9`-ing the processes). This was one of the reasons why I did the replication.

Also, I always wanted to have my own NPM registry ☺.

So I found @dominictarr’s awesome level-couch-sync module, which powers npmd, and rewrote some parts of it to make it a more generic solution for syncing from CouchDB. Basically it’s just an EventEmitter which emits an event whenever something changes in CouchDB. You can grab it from here: https://github.com/oroce/couchdb-sync or from NPM.
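To make the "EventEmitter that emits on every change" idea concrete, here is a minimal sketch. It is not the actual couchdb-sync code: the `ChangeFeed` class and its `write` method are names I made up for illustration. It only assumes what CouchDB's continuous `_changes` feed gives you, which is one JSON object per line, with empty lines as heartbeats.

```javascript
const { EventEmitter } = require('events');

// Hypothetical helper (not the couchdb-sync API): turns raw text chunks of a
// continuous _changes response into one 'data' event per change.
class ChangeFeed extends EventEmitter {
  constructor() {
    super();
    this.buffer = ''; // holds a partial line between chunks
  }

  // Feed a chunk of text; every complete line is parsed and emitted.
  write(chunk) {
    this.buffer += chunk;
    const lines = this.buffer.split('\n');
    this.buffer = lines.pop(); // last piece may be an incomplete line
    for (const line of lines) {
      if (line.trim() === '') continue; // skip heartbeat newlines
      this.emit('data', JSON.parse(line));
    }
  }
}
```

In real use you would call `res.setEncoding('utf8')` on the HTTP response from CouchDB and pass every chunk to `feed.write(chunk)`, so that multi-byte characters are never split across chunks.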

After that the replication was easy as hell: I just had to handle the data events and put every document into Riak. You can even dig into the source code, it’s less than 100 LOC: https://github.com/oroce/replicate-npm-to-riak.
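The "handle the data events, put every document into Riak" part could look roughly like this. It is a sketch, not the code from the repo: `riakUrl`, `putDoc` and `replicate` are hypothetical names, and I'm assuming that each data event carries a CouchDB change object with a `doc` field (what you get with `include_docs=true`) and that Riak's HTTP interface is reachable under `/riak/<bucket>/<key>`.

```javascript
const http = require('http');

// Build the key URL for Riak's HTTP interface, e.g. <base>/riak/docs/debug.
function riakUrl(base, bucket, key) {
  return base.replace(/\/+$/, '') + '/riak/' +
    encodeURIComponent(bucket) + '/' + encodeURIComponent(key);
}

// Store one JSON document with an HTTP PUT; calls done(err) when finished.
function putDoc(base, bucket, doc, done) {
  const body = JSON.stringify(doc);
  const req = http.request(riakUrl(base, bucket, doc._id), {
    method: 'PUT',
    headers: {
      'Content-Type': 'application/json',
      'Content-Length': Buffer.byteLength(body)
    }
  }, (res) => {
    res.resume(); // drain the response so 'end' fires
    res.on('end', () => {
      done(res.statusCode >= 300 ? new Error('Riak answered ' + res.statusCode) : null);
    });
  });
  req.on('error', done);
  req.end(body);
}

// Wire a change feed (any EventEmitter with 'data' events) to a store function.
// Assumption: each change looks like { id, doc, deleted? }.
function replicate(feed, store, onError) {
  feed.on('data', (change) => {
    if (!change || !change.doc || change.deleted) return; // nothing to store
    store(change.doc, (err) => { if (err) onError(err); });
  });
  return feed;
}
```

Wiring it up would be something like `replicate(feed, (doc, done) => putDoc('http://localhost:8098', 'docs', doc, done), console.error)`, where the host and port are placeholders for your own Riak node. Passing the store function in makes it easy to swap Riak for something else later, or to test without a running node.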

So what’s working:

  • my registry without attachments works fine (`npm --registry url view debug` is working great)
  • my registry is only partial currently, I’m dealing with some timeouts, but you can install some packages (`npm install --registry url debug`)

What’s on my TODO list?

  • proper server
  • instead of putting data into Riak, trying out Riak Cloud Storage
  • create a demo using MooseFS (at purposeindustries we are heavily using it across multiple servers)
  • replication between multiple Riak nodes
  • multiple Riak nodes with haproxy and keepalived
  • module uploading to Riak (using it as a private registry)
  • searching

Wanna give it a try? http://registry.oroszi.net:8008/riak/docs. Please keep in mind this is an Intel Atom server located in my living room with an 80/25 Mbit/s internet connection. Here’s an example gist to try out: https://gist.github.com/oroce/9021938.

If you have any questions, open an issue on GitHub or ping me on Twitter (@oroce).
