An Overview of Databases — Part 2: Non-Relational DB vs Relational

Saeed Vayghani
10 min readJul 24, 2024

--

Part 1: DBMS Flow
Part 2: Non-Relational DB
vs Relational

Part 3: CAP and BASE Theorem

Part 4: How to choose a Database?
Part 5: Different Solutions for Different Problems
Part 6: Concurrency Control
Part 7: Distributed DBMS
>> Part 7.1: Distributed DBMS (Apache Spark, Parquet + Pyspark + Node.js)
Part 8: Clocks
>> Part 8.1: Clocks (Causal Consistency With MongoDB)
>>
Part 8.2: Clocks (MongoDB Replica and Causal Consistency)
Part 9: DB Design Mastery
Part 10: Vector DB
Part 11: An interesting case, coming soon!

We are going to talk about the most important parts of relational and non-relational databases. This will give you a broad understanding. If you want to learn more, you should read reference books. We will not repeat what is in the books here. In the near future, I will write more detailed posts about real challenges and how to solve them.

Relational DBMSs (RDBMSs)

Relational databases are based on the relational model, which organizes data into tables (or relations) consisting of rows and columns. Key characteristics of RDBMS include:

1. Fixed Schema
RDBMS requires a predefined schema, meaning that the structure of data must be defined before storing any data.

2. ACID Properties
RDBMSs provide strong consistency by enforcing ACID (Atomicity, Consistency, Isolation, Durability) properties, ensuring reliable transactions and data integrity.

3. Normalization
Data is organized into tables with relationships established through foreign keys. Normalization reduces data redundancy and improves data integrity.

Main Key Components

  1. Tables (Relations): The primary structure in a relational database. Each table has a unique name and consists of rows (records) and columns (attributes).
  2. Rows (Tuples): Each row represents a single record in the table.
  3. Columns (Attributes): Each column represents a specific attribute of the record, and all entries in a column are of the same data type.
  4. Primary Key: A unique identifier for each record in a table. Ensures that each row can be uniquely identified.
  5. Foreign Key: An attribute in a table that creates a link between two tables. It is a primary key in another table, establishing a relationship between the tables.
  6. Indexes: Indexes are used to speed up the retrieval of data. They can be created on one or more columns of a table.
  7. Views: A view is a virtual table based on the result-set of an SQL query. Views can simplify complex queries, encapsulate data access, and provide a level of security by restricting access to a subset of data.

ERD (Entity-Relationship Diagram):

An ER diagram is a graphical representation of an entity-relationship model. It depicts the entities in the database, their attributes, and the relationships between entities. More on this topic here.

Relationships

  1. One-to-One: A single row in Table A is related to a single row in Table B.
  2. One-to-Many: A single row in Table A is related to multiple rows in Table B.
  3. Many-to-Many: Multiple rows in Table A are related to multiple rows in Table B, typically implemented using a join table.

Normalization

Normalization reduces redundancy and increases data integrity by ensuring that data is stored in a structured manner, following normal forms (NF). Here’s how to normalize:

  1. 1NF (First Normal Form): Ensure that each column contains only atomic, indivisible values and each row is unique.
  2. 2NF (Second Normal Form): Fulfill 1NF and remove partial dependencies, ensuring that non-key attributes are fully functionally dependent on the primary key.
  3. 3NF (Third Normal Form): Fulfill 2NF and remove transitive dependencies, ensuring that non-key attributes are not dependent on other non-key attributes.
  4. Read more in details here

Non-Relational DBs

NoSQL stands for “Not Only SQL.” These databases are designed to handle large amounts of unstructured, semi-structured, or structured data. The main characteristics of NoSQL databases are:

1. Schema-less
NoSQL databases do not require a fixed schema, which means you can store different kinds of data in the same database.

2. Scalability
NoSQL databases are designed to scale out by distributing data across multiple servers. They are horizontally scalable, allowing you to add more servers to handle increased load.

3. Variety of Data Models

  • NoSQL databases support various data models, including key-value, document, columnar, and graph databases.
  • Key-Value Databases: Store data as a collection of key-value pairs. Examples: Redis, DynamoDB.
  • Document Databases: Store data as JSON, BSON, or XML documents. Examples: MongoDB, CouchDB.
  • Column-Family Stores: Store data in columns rather than rows. Examples: Cassandra, HBase.
  • Graph Databases: Store data as nodes and edges, representing entities and their relationships. Examples: Neo4j, OrientDB.

4. High Performance
NoSQL databases are optimized for fast read and write operations, making them suitable for real-time applications and large-scale data processing.

5. Types of Consistency
Many NoSQL databases offer eventual consistency, meaning that data will become consistent over time, as opposed to immediate consistency.

Denormalization

Unlike Relational DBs, most of the times we need to denormalize our data when we are modeling the application using a non-relational DB.
Denormalization is the process of optimizing read performance of a database by adding redundant data. This is basically the opposite of normalization, where the goal is to reduce data redundancy and improve data integrity.

  • Improves read performance by reducing the number of queries needed to retrieve related data.
  • Often used in read-heavy applications where performance is critical.
  • Can lead to data redundancy, which requires careful management to maintain consistency.
  • We will go into more detail in the DB Design Mastery posts.

Key Differences

1. Data Model:

  • NoSQL: Flexible data models (key-value, document, column-family, graph).
  • RDBMS: Strictly table-based with predefined schemas.

2. Schema Flexibility:

  • NoSQL: Schema-less, allowing for dynamic and varying data structures.
  • RDBMS: Fixed schema that must be defined before storing data.

3. Scalability:

  • NoSQL: Horizontally scalable, making it easier to distribute data across multiple servers.
  • RDBMS: Vertically scalable, relying on enhancing the capacity of a single server.

4. Consistency:

  • NoSQL: Eventual consistency, but some NoSQL databases provide options for strong consistency.
  • RDBMS: Strong consistency due to ACID properties.

5. Query Language:

  • NoSQL: No standard query language; varies by implementation (e.g., MongoDB uses MQL, Cassandra uses CQL).
  • RDBMS: Standardized on SQL.

6. Use Cases:

  • NoSQL: Suitable for big data, real-time applications, content management, and scenarios where schema flexibility is needed.
  • RDBMS: Suitable for transactional and analytical systems, where data integrity and complex queries are crucial.

Real World Edge Case of NO SQL DBMSs

NoSQL databases, while flexible and scalable, present certain edge cases and challenges that developers must consider. Here are some key edge cases and considerations:

1. Schema Evolution and Flexibility

  • Issue: NoSQL databases are often schema-less or schema-flexible, allowing for easy evolution of data models.
  • Edge Case: This flexibility can lead to inconsistent data if not managed properly. Different records in the same collection might have varying fields, making data validation and querying complex.

2. Data Consistency

  • Issue: Many NoSQL databases prioritize availability and partition tolerance over strict consistency (following the CAP theorem).
  • Edge Case: In distributed systems, achieving eventual consistency can lead to stale reads, where data retrieved is not the latest. This can be problematic for applications requiring real-time accuracy.

3. Transactional Integrity

  • Issue: Traditional relational databases support ACID (Atomicity, Consistency, Isolation, Durability) transactions.
  • Edge Case: NoSQL databases often provide limited transactional support, sometimes only at the document level (e.g., MongoDB) but not across multiple documents or collections. This can complicate multi-step operations that need to be atomic.

4. Data Duplication and Denormalization

  • Issue: NoSQL databases often encourage denormalization (embedding data) to optimize read performance.
  • Edge Case: Data duplication can lead to update anomalies where a change in one location requires multiple updates across different records. Failure to do so can result in inconsistent data.

5. Query Complexity

  • Issue: NoSQL databases provide powerful but sometimes limited query capabilities compared to SQL.
  • Edge Case: Complex queries, especially those requiring joins, can be difficult to implement efficiently. Some NoSQL databases might not support joins natively, requiring workarounds like manual joins in application code, which can be less efficient and more error-prone.

6. Indexing Constraints

  • Issue: Proper indexing is crucial for performance in any database.
  • Edge Case: NoSQL databases may have limitations on the number of indexes or the types of indexed queries they support. Misconfigured indexes can degrade performance or cause operational issues, such as increased write latency.

7. Scalability and Sharding

  • Issue: NoSQL databases are designed to scale horizontally by adding more nodes.
  • Edge Case: Sharding introduces complexity in data distribution and retrieval. Uneven shard distribution can lead to hotspots, where some shards handle more data and traffic than others, impacting performance.

8. Backup and Restore

  • Issue: Ensuring data durability and recoverability is crucial.
  • Edge Case: Backing up distributed NoSQL databases can be complex due to the distributed nature of data. Coordinating a consistent snapshot across all nodes and ensuring a reliable restore process can be challenging.

9. Security

  • Issue: Securing data in any database is critical.
  • Edge Case: NoSQL databases may offer less mature security features compared to traditional relational databases. Implementing robust authentication, authorization, and encryption can be more complex and require additional configuration and management.

10. Handling Large Documents

  • Issue: NoSQL databases can store large documents or records.
  • Edge Case: Extremely large documents can lead to performance issues during reads and writes. Databases might have limits on document size (e.g., MongoDB’s 16MB document size limit), necessitating careful design to avoid hitting these limits.

11. Migration and Integration

  • Issue: Migrating from a relational database to a NoSQL database, or integrating with other systems, can be complex.
  • Edge Case: Differences in data models and query languages can complicate migration efforts. Ensuring data integrity and consistency during the transition requires careful planning and possibly custom tooling.

12. Time-to-Live (TTL) and Data Expiry

  • Issue: Some NoSQL databases support TTL indexes to automatically expire data.
  • Edge Case: Incorrectly configured TTL settings can lead to unintentional data loss. Understanding and monitoring TTL behavior is crucial to ensure important data is not inadvertently deleted.

Recommending some references

A Music Streaming Service on PostgreSQL and MongoDB

Version 1: Without Normalization

In this version, data is not normalized, which means there may be redundancy and potential inconsistencies.

CREATE TABLE Users (
user_id SERIAL PRIMARY KEY,
username VARCHAR(255) NOT NULL,
email VARCHAR(255) UNIQUE NOT NULL,
password VARCHAR(255) NOT NULL,
subscription_level VARCHAR(50)
);
CREATE TABLE Artists (
artist_id SERIAL PRIMARY KEY,
name VARCHAR(255) NOT NULL,
genre VARCHAR(255),
bio TEXT,
albums JSONB -- Storing albums as a JSON object
);
CREATE TABLE Albums (
album_id SERIAL PRIMARY KEY,
title VARCHAR(255) NOT NULL,
release_date DATE,
artist_info JSONB -- Storing artist info as a JSON object
);
CREATE TABLE Tracks (
track_id SERIAL PRIMARY KEY,
title VARCHAR(255) NOT NULL,
duration INTEGER,
album_info JSONB,
artist_info JSONB
);
CREATE TABLE Playlists (
playlist_id SERIAL PRIMARY KEY,
name VARCHAR(255) NOT NULL,
description TEXT,
user_id INTEGER REFERENCES Users(user_id),
tracks JSONB
);
CREATE TABLE UserHistory (
user_id INTEGER REFERENCES Users(user_id),
track_info JSONB,
played_at TIMESTAMP NOT NULL DEFAULT CURRENT_TIMESTAMP,
PRIMARY KEY (user_id, track_info, played_at)
);

Version 2: With Normalization

CREATE TABLE Users (
user_id SERIAL PRIMARY KEY,
username VARCHAR(255) NOT NULL,
email VARCHAR(255) UNIQUE NOT NULL,
password VARCHAR(255) NOT NULL,
subscription_level VARCHAR(50)
);

A Music Streaming Service on MongoDB

Lets take a look into a sample, I presume we have a music streaming service and we want to model the early version of the system, Please note when we talk about (semi)unstructured data, it is up to you how to design your model, maybe you want to embed user’s playing history into the user model directly, maybe you just want to add references. There will be no classic normalization. We will go into more detail in the DB Design Mastery posts.

Users:
- _id: user1
username: "john_doe"
email: "john_doe@example.com"
password: "hashed_password"
subscription_level: "premium"
listened_tracks:
- track_id: track1
played_at: "2023-10-01T12:30:00Z"
- track_id: track2
played_at: "2023-10-02T14:45:00Z"
preferences:
favorite_genres: ["Rock", "Progressive Rock"]
notification_settings:
email: true
push: false

Artists:
- _id: artist1
name: "Pink Floyd"
genre: ["Rock", "Progressive Rock"]
bio: "Pink Floyd was an English rock band formed in London in 1965."
albums:
- album_id: album1
title: "The Dark Side of the Moon"
release_date: "1973-03-01"
- album_id: album2
title: "The Wall"
release_date: "1979-11-30"

Albums:
- _id: album1
title: "The Dark Side of the Moon"
release_date: "1973-03-01"
artist:
id: artist1
name: "Pink Floyd"
tracks:
- track_id: track1
title: "Time"
duration: 412
- track_id: track2
title: "Money"
duration: 382
- _id: album2
title: "The Wall"
release_date: "1979-11-30"
artist:
id: artist1
name: "Pink Floyd"
tracks:
- track_id: track3
title: "Another Brick in the Wall, Part 2"
duration: 235
- track_id: track4
title: "Comfortably Numb"
duration: 384

Playlists:
- _id: playlist1
name: "Pink Floyd Favorites"
description: "A collection of my favorite Pink Floyd songs."
user_id: user1
tracks:
- track_id: track1
added_at: "2023-09-30T12:00:00Z"
- track_id: track3
added_at: "2023-09-30T12:30:00Z"
shared_with:
- user_id: user2
- user_id: user3

Tracks:
- _id: track1
title: "Time"
duration: 412
album:
id: album1
title: "The Dark Side of the Moon"
artist:
id: artist1
name: "Pink Floyd"
- _id: track2
title: "Money"
duration: 382
album:
id: album1
title: "The Dark Side of the Moon"
artist:
id: artist1
name: "Pink Floyd"
- _id: track3
title: "Another Brick in the Wall, Part 2"
duration: 235
album:
id: album2
title: "The Wall"
artist:
id: artist1
name: "Pink Floyd"
- _id: track4
title: "Comfortably Numb"
duration: 384
album:
id: album2
title: "The Wall"
artist:
id: artist1
name: "Pink Floyd"

--

--

Saeed Vayghani

Software engineer and application architecture. Interested in free and open source software.