Learning NoSQL — NoSQL Database Designing
This article covers one of the most important topics in NoSQL: how to structure collections and documents efficiently.
Part I, SQL vs NoSQL, introduced the basic concepts along with the MongoDB fundamentals. This part turns to schema design, one of the most debated topics in NoSQL.
Modeling in NoSQL vs SQL:
I will try to explain each concept with multiple examples. Modeling decisions differ from one situation to another, depending on requirements. When modeling a schema, keep in mind that MongoDB limits each document to a maximum size of 16 MB.
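The size limit can be turned into a quick design-time check. The following is a minimal sketch, not MongoDB's own measurement: it uses the length of the JSON encoding as a rough proxy for the BSON size, and the helper names, the safety margin, and the sample data are all hypothetical.

```python
import json

# MongoDB's per-document limit of 16 MB, expressed in bytes.
MAX_DOCUMENT_BYTES = 16 * 1024 * 1024


def approx_document_bytes(document: dict) -> int:
    """Rough size estimate: length of the UTF-8 encoded compact JSON form.

    This is only a proxy. The real limit applies to the BSON encoding,
    which can be somewhat larger or smaller than the JSON form.
    """
    return len(json.dumps(document, separators=(",", ":")).encode("utf-8"))


def fits_in_one_document(document: dict, safety_margin: float = 0.5) -> bool:
    """True if the estimate stays below a fraction of the limit (assumed margin)."""
    return approx_document_bytes(document) < MAX_DOCUMENT_BYTES * safety_margin


# Hypothetical example: a country document with many embedded cities.
country = {
    "name": "Exampleland",
    "cities": [{"name": f"City {i}", "population": 1000 + i} for i in range(1000)],
}

print(approx_document_bytes(country), fits_in_one_document(country))
```

If an embedded array is expected to keep growing, a check like this is a hint that the array may belong in its own collection.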
In NoSQL, you can either define a single collection whose documents contain nested objects, or use multiple collections where each holds a simple object definition. The following cases walk through SQL-to-NoSQL schema transformations.
Case A: User Address:
Design Explanation:
- The address has been moved into an array, since an array is flexible enough to accommodate additional entries in the future.
- Each item in the address array has a limited number of fields, so there is no need to create a separate collection.
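As an illustration of this transformation, here is a minimal sketch using plain Python dictionaries in place of tables and documents. The column names, field names, and sample values are hypothetical and are not taken from the original schema.

```python
# Relational form: two tables joined by user_id (hypothetical columns).
user_rows = [{"user_id": 1, "name": "Alice"}]
address_rows = [
    {"address_id": 10, "user_id": 1, "type": "home", "street": "1 Main St", "city": "Lahore"},
    {"address_id": 11, "user_id": 1, "type": "office", "street": "9 Park Rd", "city": "Karachi"},
]


def embed_addresses(users, addresses):
    """Fold each user's address rows into an `addresses` array on the user document."""
    documents = []
    for user in users:
        embedded = [
            # Drop the relational keys; the nesting itself now expresses the link.
            {key: value for key, value in row.items() if key not in ("address_id", "user_id")}
            for row in addresses
            if row["user_id"] == user["user_id"]
        ]
        documents.append({"_id": user["user_id"], "name": user["name"], "addresses": embedded})
    return documents


user_documents = embed_addresses(user_rows, address_rows)

# A future address is just one more array entry; no schema change is needed.
user_documents[0]["addresses"].append(
    {"type": "billing", "street": "5 Canal Rd", "city": "Lahore"}
)
print(user_documents[0])
```

The user and all of their addresses can then be read in a single lookup, which is the main benefit of embedding a small, bounded array.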
Case B: Region, Country, State, and City Model — Approach I:
Design Explanation:
- 1:M relationships are best handled in a single model when the data size does not grow drastically. Region, country, and state are therefore kept in a single document, since they form a fairly fixed set of data.
- Cities are kept in a separate collection for two reasons:
  - For some countries, the number of cities is huge, which would make the document much bigger.
  - To avoid unnecessarily deep nesting and complex modeling in the region document.
Schema - A vs B — Better Query Performance:
- The city collection offers a wider range of options for querying region data quickly, based on “region_id”, “country_id” and “state_id”.
- Cities can be grouped easily and efficiently by “region_id”, “country_id” or “state_id”.
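Approach I can be sketched with in-memory data as follows. The field names "region_id", "country_id" and "state_id" come from the text above; everything else (the `_id` values, the sample names, and the two helper functions) is hypothetical, and the helpers only imitate a simple equality filter and a grouping step rather than calling a database.

```python
from collections import defaultdict

# One region document embeds its countries and states (a small, fairly fixed data set).
region = {
    "_id": "r1",
    "name": "Asia",
    "countries": [
        {
            "_id": "c1",
            "name": "Pakistan",
            "states": [{"_id": "s1", "name": "Punjab"}, {"_id": "s2", "name": "Sindh"}],
        }
    ],
}

# Cities live in their own collection and carry the three reference ids.
cities = [
    {"_id": "ct1", "name": "Lahore", "region_id": "r1", "country_id": "c1", "state_id": "s1"},
    {"_id": "ct2", "name": "Multan", "region_id": "r1", "country_id": "c1", "state_id": "s1"},
    {"_id": "ct3", "name": "Karachi", "region_id": "r1", "country_id": "c1", "state_id": "s2"},
]


def find_cities(collection, **criteria):
    """Return the cities whose fields equal every given criterion."""
    return [c for c in collection if all(c.get(k) == v for k, v in criteria.items())]


def group_city_names(collection, key):
    """Group city names by one of the reference ids."""
    groups = defaultdict(list)
    for city in collection:
        groups[city[key]].append(city["name"])
    return dict(groups)


punjab_cities = find_cities(cities, state_id="s1")
by_state = group_city_names(cities, "state_id")
print([c["name"] for c in punjab_cities], by_state)
```

Because every city carries all three ids, a filter or grouping at any level works on the city collection alone, without walking the nested region document.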
Case C: Region, Country, State, and City Model — Approach II:
Design Explanation:
- This schema has fewer nested objects and is therefore clearer.
- It is much easier for applications, such as Node.js apps, to handle queries based on predefined schemas.
Schema — A vs B — Better Query Performance:
- B does contain redundancy, but in NoSQL performance is preferred over normalization.
- The data in the above example is relatively limited, but in a business case involving millions of documents, the performance of Schema-A is no match for the throughput of Schema-B.
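The trade-off between redundancy and nesting can be sketched as follows. This reads Approach II as a set of flat collections in which each document repeats the ids of all its ancestors; that reading is an assumption, and the `_id` values, sample names, and helper functions are hypothetical.

```python
# Flat collections: each level repeats the ids of its ancestors (deliberate redundancy).
regions = [{"_id": "r1", "name": "Asia"}]
countries = [{"_id": "c1", "name": "Pakistan", "region_id": "r1"}]
states = [
    {"_id": "s1", "name": "Punjab", "country_id": "c1", "region_id": "r1"},
    {"_id": "s2", "name": "Sindh", "country_id": "c1", "region_id": "r1"},
]
cities = [
    {"_id": "ct1", "name": "Lahore", "state_id": "s1", "country_id": "c1", "region_id": "r1"},
    {"_id": "ct2", "name": "Karachi", "state_id": "s2", "country_id": "c1", "region_id": "r1"},
]


def index_by_id(collection):
    """Build an id -> document lookup, standing in for an index on _id."""
    return {doc["_id"]: doc for doc in collection}


regions_by_id = index_by_id(regions)
countries_by_id = index_by_id(countries)
states_by_id = index_by_id(states)


def describe_city(city):
    """Resolve every ancestor with one direct lookup each; no nested arrays to walk."""
    state = states_by_id[city["state_id"]]
    country = countries_by_id[city["country_id"]]
    region = regions_by_id[city["region_id"]]
    return ", ".join([city["name"], state["name"], country["name"], region["name"]])


print(describe_city(cities[0]))
```

The repeated "region_id" on states and cities is redundant in the relational sense, since it could be derived through the country, but it lets each level be queried directly.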
Case D: Online Users, Products, and Orders:
Country and city references are omitted from the following address schema for simplicity, to keep the focus on the M:M relationship:
Design Explanation:
- The M:M relationship between orders and products is handled by a single collection.
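One way to picture this is an order document that embeds its product line items, so no separate join table is needed. The sketch below is an illustration under that assumption; the field names (`items`, `product_id`, `quantity`, `unit_price`), the sample data, and the helper functions are hypothetical.

```python
# Users and products are simple collections; country and city references are left out.
users = [{"_id": "u1", "name": "Alice", "address": {"street": "1 Main St", "zip": "54000"}}]
products = [
    {"_id": "p1", "name": "Keyboard", "price": 25.0},
    {"_id": "p2", "name": "Mouse", "price": 10.0},
]

# The orders collection alone carries the M:M link: each order lists many products,
# and the same product id can appear in many orders.
orders = [
    {
        "_id": "o1",
        "user_id": "u1",
        "items": [
            {"product_id": "p1", "quantity": 1, "unit_price": 25.0},
            {"product_id": "p2", "quantity": 2, "unit_price": 10.0},
        ],
    },
    {
        "_id": "o2",
        "user_id": "u1",
        "items": [{"product_id": "p2", "quantity": 1, "unit_price": 10.0}],
    },
]


def order_total(order):
    """Sum quantity * unit_price over the embedded line items."""
    return sum(item["quantity"] * item["unit_price"] for item in order["items"])


def orders_containing(collection, product_id):
    """Return the ids of all orders that include the given product."""
    return [
        order["_id"]
        for order in collection
        if any(item["product_id"] == product_id for item in order["items"])
    ]


print(order_total(orders[0]), orders_containing(orders, "p2"))
```

Storing the unit price on each line item is a design choice in this sketch: it keeps an order's total stable even if the product's current price changes later.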
Conclusion:
- Do not overcomplicate a structure with deeply nested objects.
- (1:M) and (M:M) relationships are easier to model in a single collection when document sizes stay reasonable.
- Split nested objects into separate collections, either for simplicity or to limit document size.
- Normalization is not a priority in NoSQL. Its power comes from its flexibility. NoSQL is well suited to unstructured data, and its focus is on fast responses and high throughput.
Upcoming:
The next article covers installing MongoDB and a few tools, so that we can start exploring MongoDB in detail.