How we use RavenDB

The short story of a long relationship between Bookster, a public library, and RavenDB, not your usual NoSQL database.


Bookster is a public library with an online experience: you can borrow physical books from us through a web platform and we’ll deliver them to you (we’ll even pick them up when you’re done!).

We take great pride in the quality of our selection, with thousands of titles that are read-worthy, in various categories such as personal development, business, fiction and more.

In the two years since the launch of our service, the number of subscribers has grown exponentially, to more than 15,000 at the moment, putting a lot of pressure on the infrastructure. At the beginning of this year our MVP was starting to show its limitations, so it was high time to invest in rebuilding the system with scalability and performance in mind.

It’s important to note that the original Bookster platform used a WordPress backend with a classic MySQL database as the core CMS, while the frontend was jQuery-driven. It worked well for our users, especially while the load stayed under a couple of thousand users. But as more and more accounts were created, performance began to degrade severely.

We quickly figured out that the bottleneck wasn’t MySQL but the WordPress engine, which would at times issue so many queries that it maxed out the database machine’s CPU. The heavy-load scenarios come down to the way WordPress stores and reads post information through meta values linked to entities (the anti-pattern commonly known as N+1 selects, in which fully loading each item requires several additional queries for its details).

While it is perfectly fine for a single post view to load several attributes, when browsing our full catalogue of thousands of books the same calls per book quickly add up to tens of thousands of queries, multiplied again by the number of concurrent users on the website.
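To make the shape of the problem concrete, here is a hypothetical C# sketch of the N+1 pattern. The table names mimic WordPress, but the entities and the Dapper-style data access are purely illustrative, not what WordPress actually runs:

    using System.Collections.Generic;
    using System.Data;
    using System.Linq;
    using Dapper; // illustrative data-access layer; any ORM exhibits the same shape

    public class Book
    {
        public int Id { get; set; }
        public string Title { get; set; }
        public Dictionary<string, string> Meta { get; set; }
    }

    public class MetaRow
    {
        public string MetaKey { get; set; }
        public string MetaValue { get; set; }
    }

    public static class CatalogueQueries
    {
        // The N+1 shape: one query for the list, then one more query per row.
        public static List<Book> LoadWithNPlusOne(IDbConnection db)
        {
            var books = db.Query<Book>(
                "SELECT ID AS Id, post_title AS Title FROM wp_posts WHERE post_type = 'book'")
                .ToList();

            foreach (var book in books)
            {
                // One extra round-trip per book; over thousands of books,
                // multiplied by concurrent users, this saturates the database.
                book.Meta = db.Query<MetaRow>(
                    "SELECT meta_key AS MetaKey, meta_value AS MetaValue " +
                    "FROM wp_postmeta WHERE post_id = @Id",
                    new { book.Id })
                    .ToDictionary(m => m.MetaKey, m => m.MetaValue);
            }

            return books;
        }
    }

The fix on the relational side is a single joined query that brings the meta rows along in one round-trip; the point is that the per-item loop, not MySQL itself, was the bottleneck.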

This was a design problem that could be solved by issuing queries differently, but at that point we knew we could do more than just optimize queries. Being a public library, we also need great search capabilities, as well as various ranking information about the books (think imdb.com). For comparison, the original search was built on top of MySQL with simple queries (such as LIKE '%search_term%') and had no suggestions or more advanced term matching.


This is where RavenDB came into play: we would complement the relational store (MySQL) with a secondary NoSQL store that gives us more features, faster and more effectively.

Search

We built a synchronization service bridge that pulls data out of MySQL into Raven and recreates the full book catalogue as JSON. This store can now be indexed using various Lucene indexes (tweaked to our needs), and we are happy to report that search is now both much faster and more relevant. The boost feature of the query index came in very handy: we match a book first by title, then by author and then by category (matches are more relevant in that order).
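As a rough sketch of how this looks with the RavenDB client (the index and entity shapes are ours for illustration, and exact signatures vary between client versions), the index analyzes the three fields with Lucene and the query boosts them in decreasing order of relevance:

    using System.Collections.Generic;
    using System.Linq;
    using Raven.Abstractions.Indexing;
    using Raven.Client;
    using Raven.Client.Indexes;

    public class Book
    {
        public string Id { get; set; }
        public string Title { get; set; }
        public string Author { get; set; }
        public string Category { get; set; }
    }

    // Full-text index over the catalogue: all three fields go through
    // Lucene's analyzer so they can be matched on individual terms.
    public class Books_Search : AbstractIndexCreationTask<Book>
    {
        public Books_Search()
        {
            Map = books => from b in books
                           select new { b.Title, b.Author, b.Category };

            Index(b => b.Title, FieldIndexing.Analyzed);
            Index(b => b.Author, FieldIndexing.Analyzed);
            Index(b => b.Category, FieldIndexing.Analyzed);
        }
    }

    public static class CatalogueSearch
    {
        // Title matches outrank author matches, which outrank category matches.
        public static List<Book> Run(IDocumentSession session, string term)
        {
            return session.Query<Book, Books_Search>()
                .Search(b => b.Title, term, boost: 10)
                .Search(b => b.Author, term, boost: 5, options: SearchOptions.Or)
                .Search(b => b.Category, term, boost: 2, options: SearchOptions.Or)
                .ToList();
        }
    }

The boost values themselves are tuned by feel; what matters is their relative order.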

An even more exciting feature was the introduction of search suggestions for our users, combined with search term highlighting, in an interface that closely resembles the search experience on Amazon. It’s fast, intuitive and lends a helping hand with typos or approximate search terms.
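The suggestions side takes very little code as well. A minimal sketch, assuming suggestions have been enabled on the Title field of the index above (e.g. with a Suggestion(b => b.Title) call in its constructor), and with the usual caveat that the API differs slightly between client versions:

    using System.Linq;
    using Raven.Client;

    public static class CatalogueSuggestions
    {
        // Returns near-matches ("did you mean ...") for a misspelled
        // or approximate search term.
        public static string[] For(IDocumentSession session, string term)
        {
            var result = session.Query<Book, Books_Search>()
                .Where(b => b.Title == term)
                .Suggest();

            return result.Suggestions;
        }
    }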

Bulk Inserts

While the search improvements were a quick win, another aspect of our re-engineering involved creating ranked collections of materials for each of our users. A recommendation engine (Apache Mahout) is fed with consumption data and produces a set of n (e.g. 100) recommendations for you to read next. Stored relationally, this data would mean at least 15,000 x 100 = 1.5 million inserts for all of our users.

It would also be pretty slow to store, no matter how well you could perform bulk inserts. But the data isn’t needed in a relational format, so we also use Raven to save user-specific documents that contain the next-day recommendations. When any of the 15,000 users logs in, we pull out the document, check the recommendations against real-time stock data and give them a fast (and by that I mean milliseconds) set of available books.
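A minimal sketch of the approach, with illustrative document shapes and ID conventions, using the client’s bulk-insert API:

    using System.Collections.Generic;
    using Raven.Client;
    using Raven.Client.Document;

    // One document per user, holding the ~100 next-day picks.
    public class UserRecommendations
    {
        public string Id { get; set; }            // e.g. "UserRecommendations/1234"
        public List<string> BookIds { get; set; }
    }

    public static class RecommendationStore
    {
        // Nightly job: stream one document per user through the bulk-insert
        // API instead of issuing 1.5 million relational row inserts.
        public static void StoreAll(DocumentStore store,
                                    IDictionary<string, List<string>> picksByUser)
        {
            using (var bulk = store.BulkInsert())
            {
                foreach (var pair in picksByUser)
                {
                    bulk.Store(new UserRecommendations
                    {
                        Id = "UserRecommendations/" + pair.Key,
                        BookIds = pair.Value
                    });
                }
            }
        }

        // Login path: a single document load by a known ID, which is why
        // it comes back in milliseconds.
        public static UserRecommendations ForUser(IDocumentSession session, string userId)
        {
            return session.Load<UserRecommendations>("UserRecommendations/" + userId);
        }
    }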

Serialized complex entities

There’s also a third model that fits really well with Raven: highly hierarchical data models that would otherwise require multiple tables to store in a normalized fashion. To give you an example, imagine the home page of IMDB split into various sections: featured movies, tops, trending, news, etc. Suppose you wanted to control the rendering of such an interface by defining templates on the server and serving personalized layouts to your users, depending on their profile.

In our case, we ended up with a model that has a hierarchical depth of 4 (a root containing sections, each section holding some form of content, and each piece of content carrying attributes). As mentioned before, storing this in a relational model is perfectly achievable, though not without some overhead during the read/write phases. Instead, we chose the JSON-serializable format and store the entire layout as a single record in Raven, making reads and writes atomic and effortless. It takes literally two lines of code for the whole thing.
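A minimal sketch of what that looks like; the Layout/Section/ContentItem shapes below are illustrative stand-ins for our actual model:

    using System.Collections.Generic;
    using Raven.Client;

    // Depth 4: root layout -> sections -> content items -> attributes.
    public class Layout
    {
        public string Id { get; set; } // e.g. "layouts/home/users-1234"
        public List<Section> Sections { get; set; }
    }

    public class Section
    {
        public string Title { get; set; }
        public List<ContentItem> Items { get; set; }
    }

    public class ContentItem
    {
        public string BookId { get; set; }
        public Dictionary<string, string> Attributes { get; set; }
    }

    public static class LayoutStore
    {
        public static void Save(IDocumentSession session, Layout layout)
        {
            // The whole hierarchy serializes to one JSON document, so the
            // write is a single atomic request: these are the two lines.
            session.Store(layout);
            session.SaveChanges();
        }

        public static Layout Load(IDocumentSession session, string id)
        {
            return session.Load<Layout>(id);
        }
    }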

Our Conclusions

All in all, Raven is a great product for the three scenarios we needed:

  1. Search (indexing with boosting, search suggestions, search term highlighting)
  2. Bulk insert substitution with document based collections
  3. Relational model substitution with json serialization of hierarchically complex entities

Performance and stability were great during our 4 months of usage in production (so far), and there’s a solid management interface that lets you control pretty much everything you need. Last but not least, the documentation was spot on, although I may be biased since I have used Lucene before and was already familiar with some of the search-related material.

The decision to use RavenDB instead of any other NoSQL store was half due to its great C# integration and half due to the trustworthy name of its creator, Ayende, who had previously contributed to a great many projects I had used successfully, among them NHibernate and NHibernate Profiler. That trust was well placed, as our experience can confirm ;)