MongoDB Schema Design : Guidelines

Published in

PeerIslands Engineering Blogs

10 min read · Feb 13, 2023

When designing a MongoDB schema, it’s important to consider several key factors to ensure your schema is efficient, scalable, and flexible. Here are some general guidelines and limitations to keep in mind:

Guidelines

Denormalize data: MongoDB is a document database designed to store semi-structured data. In many cases, this means denormalizing data so that related information is stored together in the same document. This can result in faster reads because fewer queries are needed, at the cost of some data duplication that has to be kept in sync on writes.
Choose the right data type: MongoDB supports a variety of data types, including strings, numbers, dates, arrays, and nested documents. Choose the data type that is most appropriate for the information you are storing, and consider using arrays or nested documents to store related information.
Use embedded documents: Embedding related data within a single document is a key feature of MongoDB. This makes it easier to query and update related information together, and it reduces the need for joins.
Consider data modeling patterns: MongoDB supports a variety of data modeling patterns, including embedded data, referenced data, and hybrid approaches. Consider which pattern will work best for your use case and choose accordingly.
Use indexes: MongoDB uses indexes to improve query performance. Make sure to create indexes on the fields that you will be using for queries, and consider using a compound index if you will be querying on multiple fields.
Use sharding: Sharding is a feature in MongoDB that allows you to distribute your data across multiple servers. This can be useful for improving performance and scalability, especially for large datasets.
Plan for scalability: MongoDB is designed to be scalable, but it’s important to plan for scalability when designing your schema. Consider how your schema will need to change as your data grows and your needs evolve, and design your schema accordingly.
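To make the compound-index advice above more concrete, the sketch below models the index prefix rule in plain JavaScript: a compound index can serve an equality query through its leading fields. The function name `usableIndexPrefix` and the field names are hypothetical, and the check is a deliberate simplification of what the MongoDB query planner does (it ignores sorts and range predicates).

```javascript
// Simplified model of the compound-index prefix rule (not MongoDB's planner).
// indexFields: ordered field names of a compound index.
// queryFields: field names used as equality filters in a query.
function usableIndexPrefix(indexFields, queryFields) {
  const wanted = new Set(queryFields);
  const prefix = [];
  for (const field of indexFields) {
    if (!wanted.has(field)) break; // the prefix ends at the first field the query lacks
    prefix.push(field);
  }
  return prefix;
}

// Hypothetical compound index on an articles collection.
const index = ["author_id", "published", "title"];

console.log(usableIndexPrefix(index, ["author_id"]));              // [ 'author_id' ]
console.log(usableIndexPrefix(index, ["published", "author_id"])); // [ 'author_id', 'published' ]
console.log(usableIndexPrefix(index, ["title"]));                  // []
```

Under this model, a query that filters only on `title` gets no help from the index, which suggests ordering the fields of a compound index around the queries you run most often.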

Limitations

Size Limitations: MongoDB documents have a maximum size of 16 MB. This limit is fixed rather than a configurable default. When designing your schema, it’s important to keep in mind the size of the data that you’ll be storing in each document and to split documents into smaller chunks if necessary.
Index Limitation: A single collection can have at most 64 indexes, and every index adds overhead to each write. When designing your schema, it’s important to consider the number of indexes that you’ll need and to optimize your schema to reduce the number of indexes required.
Array Limitation: MongoDB supports arrays, but an array counts toward the size of the document that contains it. The maximum size of a single document in MongoDB is 16 MB, so it’s important to ensure that your arrays do not push a document past this limit. When storing large arrays, it’s recommended to split the arrays into smaller chunks or to store the elements as separate documents.
Date and Time Precision: MongoDB stores dates and times with a precision of milliseconds. If you need a higher precision, you need to store the value in another form, such as a string or an integer count of microseconds or nanoseconds since the epoch.
String Limitation: The maximum size of a string in MongoDB is limited by the maximum size of a document. If you need to store large strings, it’s recommended to store them as files, for example with GridFS or in external storage, and to keep only a reference in the document. This can help reduce the amount of memory used by the database and improve query performance.
Time Zone Support: MongoDB stores dates and times in UTC and does not keep the original time zone. If you need to preserve a local time zone, store the zone name or offset in a separate field alongside the UTC value and convert when reading.
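One way to act on the array limitation above is to split a large array across several smaller "bucket" documents. The sketch below does this in plain JavaScript. The helper names `approxBytes` and `toBuckets`, the field names `author_id`, `bucket`, and `articles`, and the bucket size of 1000 are all illustrative assumptions. The size estimate uses JSON byte length, which only approximates the BSON size that MongoDB enforces.

```javascript
const MAX_DOC_BYTES = 16 * 1024 * 1024; // the 16 MB document limit

// Rough size estimate: JSON length is only an approximation of BSON size.
function approxBytes(doc) {
  return Buffer.byteLength(JSON.stringify(doc), "utf8");
}

// Split one large array into several bucket documents of at most `size` items.
function toBuckets(authorId, articles, size) {
  const buckets = [];
  for (let i = 0; i < articles.length; i += size) {
    buckets.push({
      author_id: authorId,
      bucket: buckets.length, // 0, 1, 2, ...
      articles: articles.slice(i, i + size),
    });
  }
  return buckets;
}

// Made-up data: 2500 small article stubs for one author.
const manyArticles = Array.from({ length: 2500 }, (_, i) => ({ title: "Article " + i }));
const buckets = toBuckets("author-1", manyArticles, 1000);

console.log(buckets.length);                                       // 3
console.log(buckets.map((b) => b.articles.length));                // [ 1000, 1000, 500 ]
console.log(buckets.every((b) => approxBytes(b) < MAX_DOC_BYTES)); // true
```

Each bucket stays well below the limit, and a reader that needs only the most recent items can fetch a single bucket instead of the whole array.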

Schema Models

In MongoDB, there are several common schema models that you can use to structure your data:

Flat Model: In this model, all fields are stored in a single document. This model is best for small datasets with few fields.
Embedded Model: In this model, related data is stored as an embedded document within the main document. This model is best for small to medium-sized datasets with a one-to-one or one-to-few relationship between the data.
Referenced Model: In this model, related data is stored in separate documents and referenced by the main document. This model is best for large datasets with a one-to-many relationship between the data.
Hybrid Model: In this model, data is stored using a combination of embedded and referenced documents. This model is best for datasets with a mixture of one-to-one, one-to-few, and one-to-many relationships between the data.

Let’s look at the Embedded and Referenced Models in detail.

Embedded Models

Mongo Schema : Embedded Document

Embedded data is a data modeling pattern in MongoDB where related information is stored within a single document. This can help reduce the need for joins and improve query performance, as all the related information is stored together in the same document.

For example, consider a simple blog application that needs to store information about authors and their articles. One way to model this data in MongoDB is to store each article and its associated author information within a single document. This way, when you query for an article, you can retrieve both the article and its author information in a single request, without having to perform a join.

Here is an example of what a document in this schema might look like:

{
  "_id": ObjectId("5f9c8b1f2b4953f7cc6a0d20"),
  "title": "A Guide to MongoDB Schema Design",
  "author": {
    "name": "John Doe",
    "email": "john.doe@example.com",
    "bio": "John Doe is a software engineer with a passion for databases."
  },
  "content": "In this guide, we will explore the best practices for designing a MongoDB schema..."
}

As you can see, the author information is stored within the same document as the article, as a nested document. This makes it easier to retrieve both the article and author information in a single query and also reduces the need for joins.

It’s important to note that while embedded data can be a useful data modeling pattern in MongoDB, it’s not always the best option for every use case. In some cases, it may be better to store related information in separate documents and use references to connect the documents.
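One reason is update cost. When author details are embedded in every article, a change to the author has to be applied to each copy. The small in-memory sketch below illustrates this. The array stands in for an articles collection, the third article and its author are made up, and the helper `updateEmbeddedAuthorEmail` is hypothetical rather than a MongoDB API.

```javascript
// In-memory stand-in for an articles collection with embedded author data.
const articles = [
  { _id: "a1", title: "A Guide to MongoDB Schema Design",
    author: { name: "John Doe", email: "john.doe@example.com" } },
  { _id: "a2", title: "Introduction to NoSQL Databases",
    author: { name: "John Doe", email: "john.doe@example.com" } },
  { _id: "a3", title: "Another Post",
    author: { name: "Jane Roe", email: "jane.roe@example.com" } },
];

// Update every embedded copy of an author's email; returns how many documents changed.
function updateEmbeddedAuthorEmail(docs, oldEmail, newEmail) {
  let modified = 0;
  for (const doc of docs) {
    if (doc.author.email === oldEmail) {
      doc.author.email = newEmail;
      modified += 1;
    }
  }
  return modified;
}

const modified = updateEmbeddedAuthorEmail(articles, "john.doe@example.com", "john@example.com");
console.log(modified); // 2: one write per article that embeds this author
```

With a referenced model, the same change would touch a single author document, which is why data that changes often is usually a weaker candidate for embedding.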

Mongo Schema : Embedded Array Documents

An embedded array is a data modeling pattern in MongoDB where an array of related information is stored within a single document. This can be useful when you need to store multiple instances of a related entity within a single document.

For example, consider a simple blog application that needs to store information about authors and their articles. If an author can have multiple articles, you could store each article as an element in an array within the author document.

Here is an example of what the author document might look like:

{
  "_id": ObjectId("5f9c8b1f2b4953f7cc6a0d21"),
  "name": "John Doe",
  "email": "john.doe@example.com",
  "bio": "John Doe is a software engineer with a passion for databases.",
  "articles": [
    {
      "title": "A Guide to MongoDB Schema Design",
      "content": "In this guide, we will explore the best practices for designing a MongoDB schema..."
    },
    {
      "title": "Introduction to NoSQL Databases",
      "content": "NoSQL databases are a type of database that differ from traditional relational databases..."
    }
  ]
}

As you can see, the author document contains an embedded array of articles, where each element in the array represents an article. This can help reduce the need for joins and improve query performance as all the related information is stored together in the same document.

It’s important to note that while embedded arrays can be a useful data modeling pattern in MongoDB, they are not suitable for every use case. For example, if an article needs to be shared across multiple authors, it would be better to store the article as a separate document and use references to connect the articles and authors. The choice between embedded arrays and references will depend on the specific needs and requirements of your use case.
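To illustrate the alternative, the sketch below converts an author document with an embedded array into one author document plus separate article documents that point back to it. It is plain JavaScript with string ids standing in for ObjectId values. The function `toReferenced`, the `makeId` callback, and the `author_ids` field are hypothetical names chosen for this example.

```javascript
// Author document with an embedded array of articles (ids are plain strings here).
const embeddedAuthor = {
  _id: "author-1",
  name: "John Doe",
  articles: [
    { title: "A Guide to MongoDB Schema Design" },
    { title: "Introduction to NoSQL Databases" },
  ],
};

// Split the embedded form into one author document plus separate article documents.
function toReferenced(authorDoc, makeId) {
  const { articles, ...author } = authorDoc;
  const articleDocs = articles.map((article, i) => ({
    _id: makeId(i),
    author_ids: [authorDoc._id], // an array, so an article can later gain co-authors
    ...article,
  }));
  return { author, articleDocs };
}

const { author, articleDocs } = toReferenced(embeddedAuthor, (i) => "article-" + (i + 1));

console.log(author);                        // { _id: 'author-1', name: 'John Doe' }
console.log(articleDocs.map((d) => d._id)); // [ 'article-1', 'article-2' ]
```

Because each article now lives in its own document, sharing it with a second author only means adding another id to `author_ids` rather than copying the article.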

Referenced Models

Mongo Schema : Referenced Document

Referenced data is a data modeling pattern in MongoDB where related information is stored in separate documents and connected using references. In this pattern, each document contains a reference to another document, which can be used to retrieve the related information.

For example, consider a simple blog application that needs to store information about authors and their articles. Instead of storing the author information within the same document as the article (as in the case of embedded data), you could store each author as a separate document and use references to connect the authors and their articles.

Here is an example of what an author document might look like:

{
  "_id": ObjectId("5f9c8b1f2b4953f7cc6a0d21"),
  "name": "John Doe",
  "email": "john.doe@example.com",
  "bio": "John Doe is a software engineer with a passion for databases."
}

And here is an example of what an article document might look like:

{
  "_id": ObjectId("5f9c8b1f2b4953f7cc6a0d22"),
  "title": "A Guide to MongoDB Schema Design",
  "author_id": ObjectId("5f9c8b1f2b4953f7cc6a0d21"),
  "content": "In this guide, we will explore the best practices for designing a MongoDB schema..."
}

As you can see, the article document contains a reference to the author document using the author_id field. To retrieve the author information for an article, you would need to perform a second query for the author document whose _id matches the author_id value, or use a $lookup stage in an aggregation pipeline.
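The two-step retrieval can be sketched with in-memory arrays standing in for the two collections. This is plain JavaScript rather than driver code: the ids are the strings from the example documents above, and `findArticleWithAuthor` is a hypothetical helper name.

```javascript
// In-memory stand-ins for the authors and articles collections.
const authors = [
  { _id: "5f9c8b1f2b4953f7cc6a0d21", name: "John Doe", email: "john.doe@example.com" },
];
const articles = [
  { _id: "5f9c8b1f2b4953f7cc6a0d22", title: "A Guide to MongoDB Schema Design",
    author_id: "5f9c8b1f2b4953f7cc6a0d21" },
];

// Step 1: fetch the article. Step 2: follow author_id to fetch the author.
function findArticleWithAuthor(articleId) {
  const article = articles.find((a) => a._id === articleId);
  if (!article) return null;
  const author = authors.find((u) => u._id === article.author_id) || null;
  return { ...article, author };
}

const result = findArticleWithAuthor("5f9c8b1f2b4953f7cc6a0d22");
console.log(result.title);       // A Guide to MongoDB Schema Design
console.log(result.author.name); // John Doe
```

Against a real database, each `find` here would be a separate round trip, which is the extra cost a referenced model pays compared with an embedded one.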

Referenced data can be a useful data modeling pattern in MongoDB when you need to store large amounts of information about a particular entity and you don’t need to retrieve all of that information in every query. By using references to connect the related documents, you keep each document small and avoid duplicating data, at the cost of an additional query or $lookup when the related information is needed.

Mongo Schema : Referenced Array Documents

A referenced array is a data modeling pattern in MongoDB where related information is stored in separate documents and connected using references stored in an array. In this pattern, a document contains an array of references to other documents, which can be used to retrieve the related information.

Here is an example of what the author document might look like:

{
  "_id": ObjectId("5f9c8b1f2b4953f7cc6a0d21"),
  "name": "John Doe",
  "email": "john.doe@example.com",
  "bio": "John Doe is a software engineer with a passion for databases.",
  "articles": [
    ObjectId("5f9c8b1f2b4953f7cc6a0d22"),
    ObjectId("5f9c8b1f2b4953f7cc6a0d23")
  ]
}

And here is an example of what an article document might look like:

{
  "_id": ObjectId("5f9c8b1f2b4953f7cc6a0d22"),
  "title": "A Guide to MongoDB Schema Design",
  "author_id": ObjectId("5f9c8b1f2b4953f7cc6a0d21"),
  "content": "In this guide, we will explore the best practices for designing a MongoDB schema..."
}

As you can see, the author document contains a referenced array of articles, where each element in the array is a reference to an article document. To retrieve the articles for an author, you would need to perform a query to retrieve the article documents based on the references stored in the articles array.
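A minimal sketch of that retrieval, again using in-memory arrays and string ids in place of ObjectId values. The helper `articlesFor`, the third article, and the titles of the second and third articles are assumptions made for illustration.

```javascript
// Author document holding an array of article references.
const author = {
  _id: "5f9c8b1f2b4953f7cc6a0d21",
  name: "John Doe",
  articles: ["5f9c8b1f2b4953f7cc6a0d22", "5f9c8b1f2b4953f7cc6a0d23"],
};

// In-memory stand-in for the articles collection.
const articles = [
  { _id: "5f9c8b1f2b4953f7cc6a0d22", title: "A Guide to MongoDB Schema Design" },
  { _id: "5f9c8b1f2b4953f7cc6a0d23", title: "Introduction to NoSQL Databases" },
  { _id: "5f9c8b1f2b4953f7cc6a0d24", title: "An Article by Someone Else" },
];

// Keep the articles whose _id appears in the author's reference array,
// similar in spirit to filtering with { _id: { $in: author.articles } }.
function articlesFor(authorDoc, allArticles) {
  const ids = new Set(authorDoc.articles);
  return allArticles.filter((a) => ids.has(a._id));
}

const found = articlesFor(author, articles);
console.log(found.length);               // 2
console.log(found.map((a) => a.title));
```

All referenced articles are fetched with one membership test over the id array, so the number of queries does not grow with the number of references.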

Referenced arrays can be useful when an entity relates to many other documents that are not needed in every query. The references keep the parent document small and avoid data duplication, although the array itself still grows with the number of related documents and counts toward the document size limit.

Read Heavy vs Write Heavy Applications

When designing a MongoDB schema, it’s important to consider the read vs write ratio of your application to ensure that your schema is optimized.

For read-heavy applications, where the majority of operations are read operations, the focus is on optimizing read performance. To achieve this, you need to ensure that your schema is optimized for fast querying and indexing. This may involve denormalizing your data to reduce the number of joins required for querying and to ensure that the data can be retrieved with a single query.

For write-heavy applications, where the majority of operations are write operations, the focus is on optimizing write performance. To achieve this, you need to ensure that your schema is optimized for fast writes, even if this means sacrificing some read performance. This may involve normalizing your data to ensure that writes can be performed quickly, even if this results in slower read operations.

Read-Heavy Applications:

Denormalize your data: To reduce the number of queries required to retrieve data, consider denormalizing your data.
Use indexes: To optimize querying performance, use indexes on the fields that will be used in your queries.
Consider using embedded documents: To reduce the number of joins required to retrieve data, consider using embedded documents.
Pre-aggregate data: To reduce the amount of data that needs to be processed, consider pre-aggregating data.
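As a sketch of the pre-aggregation idea above, the example below keeps running totals on the author document and updates them on every write, so a read never has to scan the articles. It is plain JavaScript over in-memory objects. The field names `articleCount` and `totalWords` and the helper `publishArticle` are hypothetical; in MongoDB the counter update would typically be done with the `$inc` operator.

```javascript
// Author document carrying pre-aggregated fields (hypothetical names).
const author = { _id: "author-1", name: "John Doe", articleCount: 0, totalWords: 0 };
const articles = []; // in-memory stand-in for the articles collection

// Write path: store the article and update the running totals together.
function publishArticle(authorDoc, store, article) {
  const words = article.content.split(/\s+/).filter(Boolean).length;
  store.push({ ...article, author_id: authorDoc._id });
  authorDoc.articleCount += 1;
  authorDoc.totalWords += words;
}

publishArticle(author, articles, { title: "Post one", content: "MongoDB stores documents" });
publishArticle(author, articles, { title: "Post two", content: "Schemas follow access patterns" });

// Read path: the totals are already there, no scan over the articles is needed.
console.log(author.articleCount); // 2
console.log(author.totalWords);   // 7
```

The trade-off is a slightly more expensive write in exchange for a constant-time read, which suits the read-heavy case described here.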

Write-Heavy Applications:

Normalize your data: To optimize write performance, consider normalizing your data to reduce the amount of data that needs to be written with each operation.
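Use write concerns: A write concern controls how much acknowledgment MongoDB waits for before reporting a write as successful. A lower write concern such as w: 1 returns sooner than w: "majority", at the cost of weaker durability guarantees, so choose the level that matches how much data loss your application can tolerate.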
Use bulk operations: To optimize write performance, consider using bulk operations to perform multiple writes in a single operation.
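The bulk-operation advice above can be sketched as follows. The code only prepares the batches in plain JavaScript and does not talk to a database. The helper names `toInsertOps` and `inBatches` and the batch size of 1000 are assumptions; the `{ insertOne: { document } }` shape is the one used by `bulkWrite` operations.

```javascript
// Build operations in the shape used by bulkWrite: { insertOne: { document } }.
function toInsertOps(docs) {
  return docs.map((document) => ({ insertOne: { document } }));
}

// Group operations into batches so each batch can be sent in one round trip.
function inBatches(ops, batchSize) {
  const batches = [];
  for (let i = 0; i < ops.length; i += batchSize) {
    batches.push(ops.slice(i, i + batchSize));
  }
  return batches;
}

// Made-up data: 2500 article stubs to insert.
const docs = Array.from({ length: 2500 }, (_, i) => ({ title: "Article " + i }));
const batches = inBatches(toInsertOps(docs), 1000);

console.log(batches.length);                 // 3
console.log(batches[2].length);              // 500
console.log(JSON.stringify(batches[0][0]));  // {"insertOne":{"document":{"title":"Article 0"}}}
```

With the Node.js driver, each batch could then be passed to something like `collection.bulkWrite(batch, { ordered: false })`, so that 2500 inserts cost three round trips instead of 2500.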

Conclusion

The final decision on the schema design will depend on the specific requirements and goals of your application. Here are some factors to consider when making that decision:

Data Volume: Consider the volume of data you expect to store and whether it can be stored on a single server or if it will require sharding.
Data Access Patterns: Consider the most common read and write operations your application will perform and design the schema to support these operations efficiently.
Data Relationships: Consider the relationships between your data and decide whether to use an embedded, referenced, or hybrid model to store the data.
Query Performance: Consider the query performance requirements of your application and use indexes to optimize query performance.

In conclusion, it’s important to carefully consider these factors when making a final decision on your MongoDB schema design.