NoSQL vs postgreSQL
To NoSQL or to SQL. When considering which is more practical for modern-day applications, the specificity of the application is what matters. Think about which apps you use most often in your day to day. Social media surely tops it with popular sites like Facebook, Twitter, messenger, or Instagram, Next up is most likely entertainment- Netflix, YouTube, etc. All of these applications use NoSQL in some way.
The other main benefit of NoSQL is its scalability. Think about the scale trajectory of Amazon. When it was initially operating out of a garage, it didn’t have many different distribution centers or customers to handle. If Amazon was still running its services on SQL today, it would be a nightmare to manage. Some may argue that perhaps a company should use SQL when it is just starting out and eventually transition to NoSQL once it reaches a certain size. However when a company is small, NoSQL could also be considered because it is easier to catch mistakes. There are fewer customers, less data, any discrepancies could be spotted.
Another concern is speed. NoSQL is more efficient than SQL in processing data. Its method of clustering different datasets together on different files allows users to parse certain sets of data as opposed to having to search through all known data. There is also a practical example of this.
Consider the New York City bus system. There are over 300 bus routes and 5,800 vehicles. Each of these vehicles has a destination, ID, expected arrival time, scheduled arrival times, latitude, and longitude. Now imagine that every 10 minutes, data is collected on each of these variables; for an entire month. Assuming the best case where there is only ever 1 bus running on each route, in a given 10 minute window, there are 1,800 unique data points generated. With a quick calculation, that produces 259,200 unique data points per day. I recently tried pulling the longitude and latitude data for all buses in the Manhattan area in a single hour on the weekend using postgreSQL.
It took nearly 4 and a half minutes. Imagine if I also wanted to pull that data for buses in Brooklyn. Such a process could have taken 9 minutes. When the same process is attempted on Google’s bigdata table, it runs much faster.
It took only 16 seconds. That’s over 85% more time efficient! These discrepancies are apparent on even smaller data filters. Say I want to filter the longitude and latitude data on buses which run on the M86-SBS route, only when they are in the westbound direction. The SQL filter for that was run.
It took 46 seconds. Certainly faster than the SQL filter of all Brooklyn buses. But could it beat the NoSQL model?
Nope. The NoSQL model took 17 seconds. The NoSQL model was still faster that the SQL model even on a smaller data set. When it comes to processing data, regardless of the size of the dataset, the NoSQL model is always faster.
Speed helps play a critical role when considering use of databases. For researchers who might want to improve New York City’s bus system, it helps process data more efficiently, meaning they can provide insights on future projects before ground is broken. For users doing online shopping or looking at social media, slow retrieval of data can break the users’ experiences.
In summary, the main advantages of NoSQL over SQL are its speed and scalability. It appears that NoSQL is a clear winner over SQL. But are there situations where SQL may be more efficient? Think of situations where data on a single entity should be pulled, and you want to make sure that all the data related to that entity is not only reliable, but also accessible. In those situations SQL would win. What situations are those? The best example is also the one that most impacts everyone: healthcare.
When a patient arrives in the hospital, doctors need access to all of that patients’ history because every piece of data about a patient impacts the type of care they need to receive. In this example, the relational model works better than the NoSQL model.
At a more meta-level, the key difference between choosing SQL vs NoSQL comes down to what you care more about. The individual row of your data or the column. Because NoSQL splits data up its more likely to be fragmented- which isn’t a problem assuming you already know which fragment you want. But if you want to pull all the data related to a single entity, then SQL is the right choice for you.
To revisit the original question of which model to select for a “modern day application”, it depends. Some data is just too important to risk compromising and the relational model would operate as an excellent safeguard. At the same time, NoSQL’s strength lies in its ability to scale up and support large amounts of data, indicating an application which is meant to be quickly accessible by multiple people throughout the day. Those types of applications are used by us the most in a day to day setting, giving NoSQL a slight edge when we imagine a “modern day application”.