An Overview of Databases — Part 2: Non-Relational DB vs Relational

10 min readJul 24, 2024

Part 1: DBMS Flow
Part 2: Non-Relational DB vs Relational
Part 3: CAP and BASE Theorem
Part 4: How to choose a Database?
Part 5: Different Solutions for Different Problems
Part 6: Concurrency Control
Part 7: Distributed DBMS
>> Part 7.1: Distributed DBMS (Apache Spark, Parquet + Pyspark + Node.js)
Part 8: Clocks
>> Part 8.1: Clocks (Causal Consistency With MongoDB)
>> Part 8.2: Clocks (MongoDB Replica and Causal Consistency)
Part 9: DB Design Mastery
Part 10: Vector DB
Part 11: An interesting case, coming soon!

We are going to talk about the most important parts of relational and non-relational databases. This will give you a broad understanding. If you want to learn more, you should read reference books. We will not repeat what is in the books here. In the near future, I will write more detailed posts about real challenges and how to solve them.

Relational DBMSs (RDBMSs)

Relational databases are based on the relational model, which organizes data into tables (or relations) consisting of rows and columns. Key characteristics of RDBMS include:

1. Fixed Schema
RDBMS requires a predefined schema, meaning that the structure of data must be defined before storing any data.

2. ACID Properties
RDBMSs provide strong consistency by enforcing ACID (Atomicity, Consistency, Isolation, Durability) properties, ensuring reliable transactions and data integrity.

3. Normalization
Data is organized into tables with relationships established through foreign keys. Normalization reduces data redundancy and improves data integrity.

Main Key Components

Tables (Relations): The primary structure in a relational database. Each table has a unique name and consists of rows (records) and columns (attributes).
Rows (Tuples): Each row represents a single record in the table.
Columns (Attributes): Each column represents a specific attribute of the record, and all entries in a column are of the same data type.
Primary Key: A unique identifier for each record in a table. Ensures that each row can be uniquely identified.
Foreign Key: An attribute in a table that creates a link between two tables. It is a primary key in another table, establishing a relationship between the tables.
Indexes: Indexes are used to speed up the retrieval of data. They can be created on one or more columns of a table.
Views: A view is a virtual table based on the result-set of an SQL query. Views can simplify complex queries, encapsulate data access, and provide a level of security by restricting access to a subset of data.

ERD (Entity-Relationship Diagram):

An ER diagram is a graphical representation of an entity-relationship model. It depicts the entities in the database, their attributes, and the relationships between entities. More on this topic here.

Relationships

One-to-One: A single row in Table A is related to a single row in Table B.
One-to-Many: A single row in Table A is related to multiple rows in Table B.
Many-to-Many: Multiple rows in Table A are related to multiple rows in Table B, typically implemented using a join table.

Normalization

Normalization reduces redundancy and increases data integrity by ensuring that data is stored in a structured manner, following normal forms (NF). Here’s how to normalize:

1NF (First Normal Form): Ensure that each column contains only atomic, indivisible values and each row is unique.
2NF (Second Normal Form): Fulfill 1NF and remove partial dependencies, ensuring that non-key attributes are fully functionally dependent on the primary key.
3NF (Third Normal Form): Fulfill 2NF and remove transitive dependencies, ensuring that non-key attributes are not dependent on other non-key attributes.
Read more in details here

Non-Relational DBs

NoSQL stands for “Not Only SQL.” These databases are designed to handle large amounts of unstructured, semi-structured, or structured data. The main characteristics of NoSQL databases are:

1. Schema-less
NoSQL databases do not require a fixed schema, which means you can store different kinds of data in the same database.

2. Scalability
NoSQL databases are designed to scale out by distributing data across multiple servers. They are horizontally scalable, allowing you to add more servers to handle increased load.

3. Variety of Data Models

NoSQL databases support various data models, including key-value, document, columnar, and graph databases.
Key-Value Databases: Store data as a collection of key-value pairs. Examples: Redis, DynamoDB.
Document Databases: Store data as JSON, BSON, or XML documents. Examples: MongoDB, CouchDB.
Column-Family Stores: Store data in columns rather than rows. Examples: Cassandra, HBase.
Graph Databases: Store data as nodes and edges, representing entities and their relationships. Examples: Neo4j, OrientDB.

4. High Performance
NoSQL databases are optimized for fast read and write operations, making them suitable for real-time applications and large-scale data processing.

5. Types of Consistency
Many NoSQL databases offer eventual consistency, meaning that data will become consistent over time, as opposed to immediate consistency.

Denormalization

Unlike Relational DBs, most of the times we need to denormalize our data when we are modeling the application using a non-relational DB.
Denormalization is the process of optimizing read performance of a database by adding redundant data. This is basically the opposite of normalization, where the goal is to reduce data redundancy and improve data integrity.

Improves read performance by reducing the number of queries needed to retrieve related data.
Often used in read-heavy applications where performance is critical.
Can lead to data redundancy, which requires careful management to maintain consistency.
We will go into more detail in the DB Design Mastery posts.

Key Differences

1. Data Model:

NoSQL: Flexible data models (key-value, document, column-family, graph).
RDBMS: Strictly table-based with predefined schemas.

2. Schema Flexibility:

NoSQL: Schema-less, allowing for dynamic and varying data structures.
RDBMS: Fixed schema that must be defined before storing data.

3. Scalability:

NoSQL: Horizontally scalable, making it easier to distribute data across multiple servers.
RDBMS: Vertically scalable, relying on enhancing the capacity of a single server.

4. Consistency:

NoSQL: Eventual consistency, but some NoSQL databases provide options for strong consistency.
RDBMS: Strong consistency due to ACID properties.

5. Query Language:

NoSQL: No standard query language; varies by implementation (e.g., MongoDB uses MQL, Cassandra uses CQL).
RDBMS: Standardized on SQL.

6. Use Cases:

NoSQL: Suitable for big data, real-time applications, content management, and scenarios where schema flexibility is needed.
RDBMS: Suitable for transactional and analytical systems, where data integrity and complex queries are crucial.

Real World Edge Case of NO SQL DBMSs

NoSQL databases, while flexible and scalable, present certain edge cases and challenges that developers must consider. Here are some key edge cases and considerations:

1. Schema Evolution and Flexibility

Issue: NoSQL databases are often schema-less or schema-flexible, allowing for easy evolution of data models.
Edge Case: This flexibility can lead to inconsistent data if not managed properly. Different records in the same collection might have varying fields, making data validation and querying complex.

2. Data Consistency

Issue: Many NoSQL databases prioritize availability and partition tolerance over strict consistency (following the CAP theorem).
Edge Case: In distributed systems, achieving eventual consistency can lead to stale reads, where data retrieved is not the latest. This can be problematic for applications requiring real-time accuracy.

3. Transactional Integrity

Issue: Traditional relational databases support ACID (Atomicity, Consistency, Isolation, Durability) transactions.
Edge Case: NoSQL databases often provide limited transactional support, sometimes only at the document level (e.g., MongoDB) but not across multiple documents or collections. This can complicate multi-step operations that need to be atomic.

4. Data Duplication and Denormalization

Issue: NoSQL databases often encourage denormalization (embedding data) to optimize read performance.
Edge Case: Data duplication can lead to update anomalies where a change in one location requires multiple updates across different records. Failure to do so can result in inconsistent data.

5. Query Complexity

Issue: NoSQL databases provide powerful but sometimes limited query capabilities compared to SQL.
Edge Case: Complex queries, especially those requiring joins, can be difficult to implement efficiently. Some NoSQL databases might not support joins natively, requiring workarounds like manual joins in application code, which can be less efficient and more error-prone.

6. Indexing Constraints

Issue: Proper indexing is crucial for performance in any database.
Edge Case: NoSQL databases may have limitations on the number of indexes or the types of indexed queries they support. Misconfigured indexes can degrade performance or cause operational issues, such as increased write latency.

7. Scalability and Sharding

Issue: NoSQL databases are designed to scale horizontally by adding more nodes.
Edge Case: Sharding introduces complexity in data distribution and retrieval. Uneven shard distribution can lead to hotspots, where some shards handle more data and traffic than others, impacting performance.

8. Backup and Restore

Issue: Ensuring data durability and recoverability is crucial.
Edge Case: Backing up distributed NoSQL databases can be complex due to the distributed nature of data. Coordinating a consistent snapshot across all nodes and ensuring a reliable restore process can be challenging.

9. Security

Issue: Securing data in any database is critical.
Edge Case: NoSQL databases may offer less mature security features compared to traditional relational databases. Implementing robust authentication, authorization, and encryption can be more complex and require additional configuration and management.

10. Handling Large Documents

Issue: NoSQL databases can store large documents or records.
Edge Case: Extremely large documents can lead to performance issues during reads and writes. Databases might have limits on document size (e.g., MongoDB’s 16MB document size limit), necessitating careful design to avoid hitting these limits.

11. Migration and Integration

Issue: Migrating from a relational database to a NoSQL database, or integrating with other systems, can be complex.
Edge Case: Differences in data models and query languages can complicate migration efforts. Ensuring data integrity and consistency during the transition requires careful planning and possibly custom tooling.

12. Time-to-Live (TTL) and Data Expiry

Issue: Some NoSQL databases support TTL indexes to automatically expire data.
Edge Case: Incorrectly configured TTL settings can lead to unintentional data loss. Understanding and monitoring TTL behavior is crucial to ensure important data is not inadvertently deleted.

Recommending some references

A Music Streaming Service on PostgreSQL and MongoDB

Version 1: Without Normalization

In this version, data is not normalized, which means there may be redundancy and potential inconsistencies.

CREATE TABLE Users (
    user_id SERIAL PRIMARY KEY,
    username VARCHAR(255) NOT NULL,
    email VARCHAR(255) UNIQUE NOT NULL,
    password VARCHAR(255) NOT NULL,
    subscription_level VARCHAR(50)
);
CREATE TABLE Artists (
    artist_id SERIAL PRIMARY KEY,
    name VARCHAR(255) NOT NULL,
    genre VARCHAR(255),
    bio TEXT,
    albums JSONB -- Storing albums as a JSON object
);
CREATE TABLE Albums (
    album_id SERIAL PRIMARY KEY,
    title VARCHAR(255) NOT NULL,
    release_date DATE,
    artist_info JSONB -- Storing artist info as a JSON object
);
CREATE TABLE Tracks (
    track_id SERIAL PRIMARY KEY,
    title VARCHAR(255) NOT NULL,
    duration INTEGER,
    album_info JSONB,
    artist_info JSONB
);
CREATE TABLE Playlists (
    playlist_id SERIAL PRIMARY KEY,
    name VARCHAR(255) NOT NULL,
    description TEXT,
    user_id INTEGER REFERENCES Users(user_id),
    tracks JSONB
);
CREATE TABLE UserHistory (
    user_id INTEGER REFERENCES Users(user_id),
    track_info JSONB,
    played_at TIMESTAMP NOT NULL DEFAULT CURRENT_TIMESTAMP,
    PRIMARY KEY (user_id, track_info, played_at)
);

Version 2: With Normalization

CREATE TABLE Users (
    user_id SERIAL PRIMARY KEY,
    username VARCHAR(255) NOT NULL,
    email VARCHAR(255) UNIQUE NOT NULL,
    password VARCHAR(255) NOT NULL,
    subscription_level VARCHAR(50)
);

A Music Streaming Service on MongoDB

Lets take a look into a sample, I presume we have a music streaming service and we want to model the early version of the system, Please note when we talk about (semi)unstructured data, it is up to you how to design your model, maybe you want to embed user’s playing history into the user model directly, maybe you just want to add references. There will be no classic normalization. We will go into more detail in the DB Design Mastery posts.

Users:
  - _id: user1
    username: "john_doe"
    email: "john_doe@example.com"
    password: "hashed_password"
    subscription_level: "premium"
    listened_tracks:  
      - track_id: track1
        played_at: "2023-10-01T12:30:00Z"
      - track_id: track2
        played_at: "2023-10-02T14:45:00Z"
    preferences:     
      favorite_genres: ["Rock", "Progressive Rock"]
      notification_settings: 
        email: true
        push: false

Artists:
  - _id: artist1
    name: "Pink Floyd"
    genre: ["Rock", "Progressive Rock"] 
    bio: "Pink Floyd was an English rock band formed in London in 1965."
    albums:                  
      - album_id: album1
        title: "The Dark Side of the Moon"
        release_date: "1973-03-01"
      - album_id: album2
        title: "The Wall"
        release_date: "1979-11-30"

Albums:
  - _id: album1
    title: "The Dark Side of the Moon"
    release_date: "1973-03-01"
    artist: 
      id: artist1
      name: "Pink Floyd"
    tracks:                      
      - track_id: track1
        title: "Time"
        duration: 412
      - track_id: track2
        title: "Money"
        duration: 382
  - _id: album2
    title: "The Wall"
    release_date: "1979-11-30"
    artist: 
      id: artist1
      name: "Pink Floyd"
    tracks:                    
      - track_id: track3
        title: "Another Brick in the Wall, Part 2"
        duration: 235
      - track_id: track4
        title: "Comfortably Numb"
        duration: 384

Playlists:
  - _id: playlist1
    name: "Pink Floyd Favorites"
    description: "A collection of my favorite Pink Floyd songs."
    user_id: user1
    tracks:                     
      - track_id: track1
        added_at: "2023-09-30T12:00:00Z"
      - track_id: track3
        added_at: "2023-09-30T12:30:00Z"
    shared_with:              
      - user_id: user2
      - user_id: user3

Tracks:
  - _id: track1
    title: "Time"
    duration: 412
    album:
      id: album1
      title: "The Dark Side of the Moon"
    artist:
      id: artist1
      name: "Pink Floyd"
  - _id: track2
    title: "Money"
    duration: 382
    album:
      id: album1
      title: "The Dark Side of the Moon"
    artist:
      id: artist1
      name: "Pink Floyd"
  - _id: track3
    title: "Another Brick in the Wall, Part 2"
    duration: 235
    album:
      id: album2
      title: "The Wall"
    artist:
      id: artist1
      name: "Pink Floyd"
  - _id: track4
    title: "Comfortably Numb"
    duration: 384
    album:
      id: album2
      title: "The Wall"
    artist:
      id: artist1
      name: "Pink Floyd"