Week 6.3: SQL or noSQL?

Published in

Kent’s CS3216 Blog

5 min readOct 23, 2016

[This is a long overdued blog post, serving as the 3rd part of the series on Happ, our Assignment 3’s project. Previous part is here.]

I’ve been asked this questions oh so many times but I could never give a definitive satisfying answer. So today I will try to provide some comparisons between the two, and towards the end I’ll briefly explain which should be the more suitable database for Happ.

Let’s clarify a few myths first.

Myth #1: noSQL is better/worse than SQL

No, neither of them is.

Some projects are better suited to using a relational SQL database. Some need the flexibility of noSQL. Some can even use both for more specialized purposes.

Nowadays modern application frameworks can be configured to readily use multiple databases in a matter of minutes.

Myth #2: NoSQL = No SQL

Rather than being limited to tables in relational databases, there are various ways for NoSQL databases to store data. Therefore it’s preferable to understand NoSQL as Not Only SQL.

Here are the main 5 categories, with some of the more well-known representatives:

Document: MongoDB, CouchDB, Firebase Database
Key-value: Redis, Firebase Remote Config, Aerospike (very interesting key-value storage that utilize the zero-copy capability of SSDs to maximize reading/writing speed)
Column: HBase (familiar to those working with big data)
Graph: Neo4j. Those peeps taking CS3233 — Competitive Programming will know graphs are cool.
Multi-model: databases that support at least two models, like FoundationDB.

All these NoSQL variants further illustrate the point that there are multiple types of database that fit different requirements, so the question “SQL or noSQL” without a specific context is just meaningless.

Myth #3: The “C” in ACID and CAP is the same

That’s actually a pretty common misconception. Let’s step back for some theoretical juicy bits first.

Here are definitions, along with the corresponding links for terms, exactly as Wikipedia list on the pages:

ACID = Atomicity, Consistency, Isolation, Durability
CAP = Consistency, Availability, Partition tolerance

Notice how Consistency in both abbreviations points to the same link https://en.wikipedia.org/wiki/Consistency_(database_systems)? Ouch. In reality, they are two completely different concepts.

Consistency, in the ACID sense, means that the database will make sure data will follow whatever rules and constraints defined. In the relational world, it means integrity constraints must be fulfilled. Some examples would be:

Setting NOT NULL for a given field.
An INTEGER field is guaranteed to not allow string data.
A foreign key (referential integrity constraint) that prevents you from deleting a row in table department if table employee has records that refer to a department row.

NoSQL databases do not support integrity constraints. Referential and other types of integrity constraint enforcement must be manually built into the application itself. Thus these databases can never be fully ACID compliant; strong consistency is traded for higher scalability, better performance for huge stream of data, and continuous availability.

MongoDB Fun fact #1: MongoDB does not provide ACID consistency, even for the documentation level, in direct contrast to what they claim in the official FAQ.

However, when you hear NoSQL people talking about consistency, most likely they are not talking about the “C” in ACID, but rather about a limitation that a distributed database has to overcome.

A distributed database normally replicates a piece of data to multiple machines. When that piece of information is updated on one of the machine, it will take an inevitable time lag (usually milliseconds) before the updated notice reaches all other machine instances. Although the lag is small, it’s not negligible: there’s a possibility that you can get old information that hasn’t been updated on a replica. Traditionally in the relational world, you might have heard of this as a dirty read.

MongoDB Fun fact #2: To advocate their misleading ACID-compliant claim, the MongoDB FAQ instead brings out distributed consistency.

Some other authors would suggest linearizability as a better alternative for distributed consistency, but it’s okay as long as we know clearly what we’re talking about. Distributed consistency, along with two other aspects, forms the essence of the CAP theorem.

The CAP theorem states that in a distributed system, you can simultaneously provide at most two out of three guarantees Consistency, Availability or Partition tolerance. Partition tolerance is not a choice we can trade off, as explained quite thoroughly here:

For a distributed (i.e., multi-node) system to not require partition-tolerance it would have to run on a network which is guaranteed to never drop messages (or even deliver them late) and whose nodes are guaranteed to never die. You and I do not work with these types of systems because they don’t exist.

Some NoSQL databases choose to sacrifice distributed consistency for availability, making them AP. Nevertheless, they do provide an alternative (weaker) guarantee known as eventual consistency, which simply means the replicas will share the same data after some finite time duration.

Here’s a nice visual guide that summaries versions of NoSQL and how they fit in CAP:

The case of Happ

Happ, being a location-based app, needs to store a large number of geospatial objects. In fact, there are several geospatial solutions for both relational and noSQL alike:

PostGIS for PostgreSQL
Geospatial indexes for MongoDB
Geodis for Redis

and many more.

Initially we chose PostgreSQL + PostGIS for Happ’s MVP (to reuse codes from earlier projects and save time), but in retrospect noSQL would be a better choice.

The most important objects in Happ are happs (user-contributed posts) and events. Although they share the same purpose of providing information to place markers on the map, their main attributes are different. So in our database design, we extracted the two objects’ common attributes into another table: thing. Each row from tables happ and event must refer to a thing.

Admittedly, it was difficult to implement this in the backend, but luckily the design benefited the frontend a lot. For example, users can vote or comment on both happs and events. With this abstraction, it was possible to defer these actions to the more abstract concept thing (e.g. /thing/vote and /thing/comment), in a way standardizing the API.

As we will possibly continue development of Happ after the semester ends, we might consider replacing PostgreSQL with a document-based alternative as the main database. This way, we can store both happs and events under the same thing collection, simplifying backend implementation.

However, our time playing with PostgreSQL did give us some insights of the database, which will be the topic of my next blog post in this series. Stay tuned.

Week 6.3: SQL or noSQL?

The case of Happ

Written by Kent Nguyen