NoSQL Storage
Visit systemdesign.us for System Design Interview Questions tagged by companies and their Solutions. Follow us on YouTube, Facebook, LinkedIn, Twitter, Medium, Notion, Quora
A NoSQL data storage system is a type of database that stores and serves data in a relatively unstructured or loosely structured format. NoSQL databases are purpose-built for specific data models and largely have very flexible schemas.
The development of NoSQL databases has been a natural consequence of the proliferation of distributed systems. NoSQL databases are themselves distributed systems, so they inherit many of the advantages and disadvantages that come with distributed design. Another trend that helped NoSQL emerge is the falling cost of storage. SQL databases were optimized to reduce data duplication, which made sense when storage was expensive but also made their schemas more complex and harder to maintain. NoSQL databases are simpler in comparison, and development with them tends to be faster.
What exactly does NoSQL mean?
To be completely truthful, NoSQL is a marketing term more than it is a technical one. These databases are not new, but they have become far more popular than they ever were. NoSQL means some combination of non-SQL, non-relational or not-only-SQL, depending on the context. The main purpose of the term is to let users know that these databases are different from the relational or SQL databases (RDBMS) that have traditionally been more popular.
SQL, or Structured Query Language, is the language used to interact with a relational database system, and the data stored in an RDBMS is also structured. In contrast, NoSQL databases do not rely on rigid structures to store data and can have a variety of flexible data models. This has helped many applications that want to store their unstructured data in a more performant, scalable and flexible manner. The need for such databases has arisen because of rapid growth in unstructured data like user chats, messaging, time series and large data blobs like videos and images.
NoSQL databases can also store structured data, and some can be queried using SQL, so the name NoSQL is something of a misnomer. This is important to know because there are projects now that use NoSQL and relational databases together.
How does a NoSQL database work?
As a very high-level generalization, a NoSQL database can be thought of as a document-oriented database where JSON is the default format for storing information. This reduces the dependency on designing an object-oriented schema upfront and also removes the overhead of avoiding data duplication. Overall, this simplifies application development.
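To make the point about duplication concrete, here is a minimal sketch in plain Python, with no real database involved. The `customers` and `orders` data and the `to_document` helper are hypothetical names chosen for illustration. It shows a relational-style layout, where an order refers to a customer by id, next to a document-style layout, where the customer is copied into the order.

```python
import json

# Relational-style layout: the order refers to the customer by id,
# so reading an order together with its customer needs a join.
customers = {1: {"name": "Ada", "city": "London"}}
orders = [{"order_id": 100, "customer_id": 1, "total": 42.5}]


def to_document(order, customers):
    """Embed the customer inside the order, duplicating it per order."""
    customer = customers[order["customer_id"]]
    return {
        "order_id": order["order_id"],
        "total": order["total"],
        "customer": dict(customer),  # copied, not referenced
    }


doc = to_document(orders[0], customers)
print(json.dumps(doc, indent=2))
```

The document can be read in a single lookup, at the cost of storing the customer again in every order and having to update each copy if the customer changes.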
In practice, NoSQL databases use a variety of data models for storing, maintaining and accessing information. They are optimized for storing massive amounts of data and serving it with very low latency. Many SQL databases cannot achieve this without significant customization. NoSQL databases achieve it by using flexible data models and by relaxing some restrictions, such as strict consistency.
Data can be stored in NoSQL databases without defining a complete and detailed schema upfront. This makes it possible to iterate on software quickly, as the schema can be defined as you go. It also provides flexibility in terms of how the data is queried.
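The idea of defining the schema as you go can be sketched with a list of dictionaries standing in for a collection. This is an illustration only, not the API of any particular database; `users`, `find` and `_MISSING` are hypothetical names.

```python
# A "collection" is just a list of dicts here; no schema is declared anywhere.
users = []

users.append({"id": 1, "name": "Ada"})
# Later in development a new field appears; older documents are left untouched.
users.append({"id": 2, "name": "Lin", "email": "lin@example.com"})

_MISSING = object()  # sentinel so that an absent field never equals a real value


def find(collection, **criteria):
    """Return the documents whose fields equal every given criterion."""
    return [
        doc for doc in collection
        if all(doc.get(field, _MISSING) == value for field, value in criteria.items())
    ]


by_name = find(users, name="Lin")
by_email = find(users, email="lin@example.com")
```

Both documents live in the same collection even though only one has an `email` field, and queries on that field simply skip documents that lack it.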
What are the different types of NoSQL database?
Four types of NoSQL database are the most popular today. They differ mainly in the format in which information is stored.
- Document — Data is stored in documents as JSON-like objects. JSON is a data format defined by pairs of fields and values. As with JSON, the values in these documents can take a variety of types, matching the ones developers already use within the application. Specific fields in a document can be indexed if required.
- Key-Value — These are very simple to work with, since each item in the database is a key-value pair. Conceptually this can be thought of as a distributed HashMap, where a value can be retrieved only through the key under which it is stored. As a result, the queries that you can run on this type of database are generally very simple.
- Wide Column — This type stores data in a way that is conceptually similar to an RDBMS, with some differences. Data is stored in tables, but each row is not required to have the same set of columns, somewhat like a two-dimensional key-value store. The schema is designed primarily around the data you want to retrieve with each query.
- Graph — This type suits data that is best represented as a graph. The database stores entities as graph nodes and the relationships between them as edges, which can then be queried. This is very useful for visualizing, analyzing and finding relationships between different pieces of the data.
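The four models above can be sketched with ordinary in-memory Python structures. This is only a rough illustration of the shape of the data in each model, not how any real database stores it; all names and sample values are hypothetical.

```python
# Document: a nested, JSON-like object looked up by id.
documents = {
    "user:1": {"name": "Ada", "tags": ["admin"], "address": {"city": "London"}},
}

# Key-value: an opaque value that can be retrieved only by its key.
key_value = {"session:abc": b"serialized-session-bytes"}

# Wide column: row key -> {column name -> value}; rows may have different columns.
wide_column = {
    "user:1": {"name": "Ada", "email": "ada@example.com"},
    "user:2": {"name": "Lin", "last_login": "2024-01-01"},
}

# Graph: nodes plus directed, labelled edges between them.
nodes = {"ada", "lin", "sam"}
edges = [("ada", "FOLLOWS", "lin"), ("lin", "FOLLOWS", "sam")]


def neighbours(node, label):
    """Nodes reachable from `node` over one edge with the given label."""
    return [dst for src, lbl, dst in edges if src == node and lbl == label]


def follows_of_follows(node):
    """Two-hop traversal: who is followed by the people `node` follows."""
    found = {
        second
        for first in neighbours(node, "FOLLOWS")
        for second in neighbours(first, "FOLLOWS")
    }
    return sorted(found - {node})


two_hops = follows_of_follows("ada")
```

The two-hop traversal is the kind of relationship query that is natural in the graph model and awkward in the other three.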
What are the benefits of using a NoSQL database?
Following are some of the benefits of using a NoSQL database. Although they vary depending on the type of NoSQL database, they are a good starting point for understanding the trade-offs.
- Flexibility — The greatest benefit of NoSQL is flexibility in schema definition and data retrieval. It allows teams to change their application freely and rapidly. The complexity of the schema can be matched to the complexity the application actually requires, and developers are not bogged down trying to adhere to the constraints of the database.
- Scalability — Most NoSQL databases are designed from the start to be horizontally scalable. This means that adding more hardware is generally enough to scale the system out roughly linearly for more users, provided that the query patterns have been designed correctly.
- Performance — Because NoSQL databases scale horizontally, fast and predictable performance can be maintained as the data and traffic into the system increase. In many cases this can be achieved through automatic provisioning of resources and does not require manual intervention such as sharding.
- Availability — NoSQL databases are distributed, and data can be replicated as and when it becomes available. The replication can be across multiple servers, data centers or even cloud providers. Replicas keep the data reachable when a server fails, and placing them close to users also reduces latency.
- Functionality — Newer distributed applications have seen unprecedented scale in terms of the number of users, the amount of data stored and the bandwidth required for things like video and game streaming. NoSQL is a good fit for many of these applications.
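Horizontal scaling usually relies on spreading keys across nodes. The following is a minimal sketch, assuming a simple hash-modulo placement; the `Cluster` class, `node_for` function and node names are hypothetical and exist only for this illustration. Real systems typically use more elaborate schemes, such as consistent hashing, so that adding a node does not move most keys.

```python
import hashlib


def node_for(key, node_names):
    """Pick a node by hashing the key; the same key always maps to the same node."""
    digest = hashlib.sha256(key.encode("utf-8")).digest()
    return node_names[int.from_bytes(digest[:8], "big") % len(node_names)]


class Cluster:
    """Toy cluster: each node is a dict, and keys are spread across nodes by hash."""

    def __init__(self, node_names):
        self.node_names = list(node_names)
        self.storage = {name: {} for name in self.node_names}

    def put(self, key, value):
        self.storage[node_for(key, self.node_names)][key] = value

    def get(self, key):
        return self.storage[node_for(key, self.node_names)].get(key)


cluster = Cluster(["node-a", "node-b", "node-c"])
for i in range(1000):
    cluster.put(f"user:{i}", {"id": i})

# Number of keys that ended up on each node.
sizes = {name: len(items) for name, items in cluster.storage.items()}
print(sizes)
```

Each read or write touches only the node that owns the key, which is why lookups by key scale well and why queries that span many keys are harder.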
What are the downsides of using a NoSQL database?
As we’ll learn again and again while working with distributed systems, everything comes at a cost. Here are a few of the downsides of working with a NoSQL database.
- Relaxed ACID constraints — The main benefit of a NoSQL database, its distributed nature, also makes ACID properties hard to maintain. This makes some operations, such as handling transactions, quite challenging with NoSQL.
- Complexity — As the application becomes more complex, developers have to manage that complexity themselves. For example, queries that need joins across multiple tables often have to be implemented in application code.
- Responsibility — In traditional SQL databases, the detailed and optimized schema allows the database engine to optimize certain queries and data storage. In NoSQL databases, users must anticipate these issues themselves and optimize the design and queries on the application side.
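As an illustration of the complexity and responsibility points, here is a sketch of a join performed in application code over two hypothetical collections, `users` and `orders`, held in memory. In a relational database the same result would typically come from a single JOIN with GROUP BY executed by the engine.

```python
# Two hypothetical collections in a store that offers no join support.
users = {
    "u1": {"name": "Ada"},
    "u2": {"name": "Lin"},
}
orders = [
    {"order_id": 1, "user_id": "u1", "total": 30.0},
    {"order_id": 2, "user_id": "u2", "total": 12.5},
    {"order_id": 3, "user_id": "u1", "total": 7.5},
]


def total_spent_by_name(users, orders):
    """Join orders to users in application code and sum the totals per user name."""
    totals = {}
    for order in orders:
        user = users.get(order["user_id"])
        if user is None:
            continue  # dangling reference: nothing enforces foreign keys here
        totals[user["name"]] = totals.get(user["name"], 0.0) + order["total"]
    return totals


spend = total_spent_by_name(users, orders)
```

The application has to decide how to handle missing users, how to fetch both collections efficiently and what to do if one of them changes in between, all of which a relational engine would otherwise handle.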