Three Traps To Avoid When Using a NoSQL Document Store
In my previous article, I discussed three advantages of using a NoSQL document store in a web application. If you’ve recently decided to forge ahead and build a new application with NoSQL, the following guidelines may help you avoid some potential (and potentially serious) performance bottlenecks. And if you already have experience with a similar project, they might still be food for thought.
When working with a NoSQL document store, it is important to keep in mind that every view must examine every document in the store at least once. When a view is created, its map and reduce functions are applied to all the documents in the store. And when a document is added to the store, every view is updated accordingly. Therefore, it is beneficial to the performance of your system to aim for a balance amongst:
- the number of views;
- the complexity of the views;
- the complexity of the web/application server; and
- adds/updates of documents vs. retrieval of documents.
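The cost model described above can be sketched with a toy in-memory store. This is only an illustration, not any particular product's API: `ToyStore`, `create_view`, `save`, and the `map_calls` counter are hypothetical names, and the views are map-only. The toy updates views eagerly on every write; a real store may defer or batch that work, but the total amount of work stays the same.

```python
from typing import Any, Callable, Dict, Iterable, List, Tuple

Doc = Dict[str, Any]
MapFn = Callable[[Doc], Iterable[Tuple[Any, Any]]]


class ToyStore:
    """Hypothetical in-memory stand-in for a document store with map-only views."""

    def __init__(self) -> None:
        self.docs: Dict[str, Doc] = {}
        self.views: Dict[str, MapFn] = {}
        # view name -> document ID -> rows emitted for that document
        self.index: Dict[str, Dict[str, List[Tuple[Any, Any]]]] = {}
        self.map_calls = 0  # how many times any map function has run

    def _run(self, view: str, doc: Doc) -> None:
        self.map_calls += 1
        self.index[view][doc["_id"]] = list(self.views[view](doc))

    def create_view(self, name: str, map_fn: MapFn) -> None:
        # A new view has to visit every existing document once.
        self.views[name] = map_fn
        self.index[name] = {}
        for doc in self.docs.values():
            self._run(name, doc)

    def save(self, doc: Doc) -> None:
        # An add or update re-runs every view's map function on that document.
        self.docs[doc["_id"]] = doc
        for name in self.views:
            self._run(name, doc)


store = ToyStore()
for i in range(100):
    store.save({"_id": f"doc{i}", "type": "customer", "n": i})

store.create_view("by_type", lambda d: [(d["type"], None)])
store.create_view("by_n", lambda d: [(d["n"], None)])
calls_after_build = store.map_calls  # 2 views x 100 documents

store.save({"_id": "doc100", "type": "customer", "n": 100})
calls_for_one_write = store.map_calls - calls_after_build  # one call per view
```

Building the views costs one map call per view per document, and each later write costs one map call per view, so both the number of views and the cost of each map function multiply into the write path.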
The following three pitfalls illustrate the four concerns listed above.
1. Creating Too Many Views
When working with a SQL database, it’s easy to build a large collection of queries that perform various CRUD (Create, Read, Update, Delete) tasks. A unique SQL query can be used for every little variation in how data needs to be handled. Unfortunately, it’s just as easy to carry on thinking like this after switching to NoSQL.
Every view that you add to a NoSQL document store must initially examine every document in the store. That’s not really an issue on its own and is perfectly fine for applications that will be doing a lot more fetching than storing. However, since every new or updated document triggers an update of all the views, having a lot of views for applications that frequently add or update documents can harm the document store’s performance.
Where frequent adds and updates are required, you might consider fetching documents by ID and doing filtering, sorting, and processing on the application server, while relying on only a small number of specialised views that add real value by simplifying the application server.
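As a rough sketch of that alternative, the snippet below fetches documents by ID and does the filtering and sorting in application code. The `DOCS` dictionary stands in for the store's primary index, and `fetch_by_ids` and `open_orders_by_total` are hypothetical helpers invented for this example.

```python
from typing import Any, Dict, Iterable, List

Doc = Dict[str, Any]

# Hypothetical stand-in for the store's primary index: documents keyed by ID.
DOCS: Dict[str, Doc] = {
    "order:1": {"_id": "order:1", "status": "open", "total": 40},
    "order:2": {"_id": "order:2", "status": "shipped", "total": 15},
    "order:3": {"_id": "order:3", "status": "open", "total": 25},
}


def fetch_by_ids(ids: Iterable[str]) -> List[Doc]:
    """Look documents up by ID; no view is involved in this path."""
    return [DOCS[i] for i in ids if i in DOCS]


def open_orders_by_total(ids: Iterable[str]) -> List[Doc]:
    """Filter and sort on the application server instead of in a view."""
    docs = [d for d in fetch_by_ids(ids) if d["status"] == "open"]
    return sorted(docs, key=lambda d: d["total"])


result = [d["_id"] for d in open_orders_by_total(["order:1", "order:2", "order:3"])]
```

The trade-off is that this work is repeated on every request, which is acceptable when writes dominate and the result sets are small.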
2. Using Views Only as Filters
A SQL database divides different data entities into different tables. For example, customers, products, sales, invoices, etc. would all be separate tables in a hypothetical online store’s database. If one blindly maps this concept to a NoSQL document store, then one might have a different document type for each of the previously mentioned entities. However, since all documents are lumped together in the store, each document type will need to have an attribute, say, type, that describes what kind of entity the document is. From this point, it’s easy to set up a view that returns a collection of documents where type is “customer” or “product” and so on.
This is not necessarily bad practice, but if filtering is all that your views do, then you’re not really using the full potential of a NoSQL document store. Worse still, your application server is probably processing the documents itself. For instance, say you want the documents in a particular sort order: your server is suddenly doing the sorting, whereas a SQL query could have done it for you. In other words, you’re writing out in code what you could have written in a query or a view!
If filtering is all you need in some cases, then this is perfectly acceptable. But if you find yourself flattening lots of SQL queries into simple views plus server-side code, ask yourself if you’re adding value or just adding more code.
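To make the difference concrete, here is a minimal sketch that contrasts a filter-only view with one whose key does the sorting. It assumes a store that returns view rows ordered by their emitted key, which is an assumption about your particular store; the toy `query` function, the map functions, and the sample documents are all hypothetical.

```python
from typing import Any, Callable, Dict, Iterator, List, Tuple

Doc = Dict[str, Any]
Row = Tuple[Any, Any]

DOCS: List[Doc] = [
    {"_id": "c1", "type": "customer", "name": "Zoe"},
    {"_id": "p1", "type": "product", "name": "Kettle"},
    {"_id": "c2", "type": "customer", "name": "Adam"},
]


def map_filter_only(doc: Doc) -> Iterator[Row]:
    # Filter-only: the key (the document ID) says nothing about the order we want.
    if doc["type"] == "customer":
        yield (doc["_id"], doc)


def map_customers_by_name(doc: Doc) -> Iterator[Row]:
    # Same filter, but the emitted key is the field we want to sort on.
    if doc["type"] == "customer":
        yield (doc["name"], doc)


def query(map_fn: Callable[[Doc], Iterator[Row]]) -> List[Row]:
    """Toy view query: collect the emitted rows and return them ordered by key."""
    rows = [row for doc in DOCS for row in map_fn(doc)]
    return sorted(rows, key=lambda row: row[0])


# With the filter-only view, the application still has to sort by name.
names_app_sorted = sorted(doc["name"] for _, doc in query(map_filter_only))

# With the keyed view, the rows already arrive in the order we want.
names_view_sorted = [key for key, _ in query(map_customers_by_name)]
```

Both approaches give the same list of names, but in the second the ordering lives in the view definition rather than in server code.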
3. Overloading The Document Store or The Application Server
A NoSQL view does not have to be small in its capabilities — its map function can be large and sophisticated enough to do some really heavy lifting. One of the web applications developed by the company I work for allows a user to create calendar events that repeat periodically. For example, “08:30 to 09:30, every Tuesday for the next three months”. Instead of computing all of the events (one for each Tuesday from some starting date to an ending date) and then storing each event in the database, only the event specification is stored. A colleague of mine wrote a view that then expands the specification, producing an event for each day on which the event should appear.
The benefit of this approach is that the view does this once for an event specification and then every time a client looks at the calendar, the relevant events are fetched straight from the view instead of having to be computed on the fly every time. When you can apply the same approach to data processing that requires a lot of computing time, the savings can be substantial.
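The idea can be sketched as a map function that emits one row per occurrence. This is not my colleague's actual view, only an illustration under simplifying assumptions: events repeat weekly, and the document fields (`type`, `first_day`, `last_day`, `start`, `end`, `title`) are hypothetical names chosen for the example.

```python
from datetime import date, timedelta
from typing import Any, Dict, Iterator, Tuple


def map_expand_weekly(doc: Dict[str, Any]) -> Iterator[Tuple[str, Dict[str, str]]]:
    """Emit one (day, event) row for every week the specification covers."""
    if doc.get("type") != "event_spec":
        return  # ignore every other kind of document
    day = date.fromisoformat(doc["first_day"])
    last = date.fromisoformat(doc["last_day"])
    while day <= last:
        # The key is the ISO date, so a calendar page can ask for a date range.
        yield (day.isoformat(),
               {"title": doc["title"], "start": doc["start"], "end": doc["end"]})
        day += timedelta(weeks=1)


spec = {
    "_id": "spec1",
    "type": "event_spec",
    "title": "Team meeting",
    "first_day": "2024-01-02",  # a Tuesday
    "last_day": "2024-01-30",
    "start": "08:30",
    "end": "09:30",
}

rows = list(map_expand_weekly(spec))
days = [key for key, _ in rows]
```

Only the single specification document is stored. The expanded rows exist only in the view's index, keyed by date so that the relevant range can be read back directly.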
Nevertheless, the more complex and time-consuming a view is to execute, the more time the document store will spend updating views whenever a document is added or updated. If the benefit of performing a large computation on the document store is not obvious, then it’s probably better to do it on the application server. The other side of this coin is that if you find that the application server is processing the same data for multiple requests, then perhaps the computation would be best performed by the document store.
To attain high overall performance, you must carefully design both the document store views and the application server code that uses them. The following questions can help:
- How is my data used: is it frequently created/modified, frequently retrieved, or both?
- Where is processing time better spent: the document store or the application server?
- Do I need to retrieve all the documents, a subset, or even just a computed value?
- Am I adding value, or am I writing an elaborate version of SELECT * FROM users;?