Analyzing Data Storage: Regular Collection vs Time Series Collection

Kushagra Kesav
Published in CodeX
3 min read · Apr 9, 2023


In the digital age, data storage has become an essential part of our lives. With the continuous generation of large amounts of data, it is crucial to select the right storage solution that can efficiently store and retrieve data.

MongoDB is a popular document database used by many organizations due to its scalability, flexibility, and ease of use. One of the significant features of MongoDB is its ability to store time-series data efficiently using the Time Series Collection, which is a specialized bucketing pattern designed explicitly for this purpose.

Temperature time-series data — Picture by danh

Time-series data are generated with monotonically increasing timestamps by sources such as IoT devices, application logging, infrastructure monitoring, DevOps tooling, stock trading, and cryptocurrency markets.

In a regular collection, MongoDB stores each document individually and, unless other indexes are created, maintains only the default index on _id. In contrast, the Time Series Collection combines the bucketing pattern with data compression to store data efficiently.
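To make the bucketing idea concrete, here is a minimal sketch in plain JavaScript. It is only an illustration of the general bucket pattern, not MongoDB's actual internal storage format: the `bucketize` helper, the one-minute window, and the bucket field names are all assumptions made for this example.

```javascript
// Simplified illustration of the bucket pattern (NOT MongoDB's internal format).
// Readings that share the same metadata and fall into the same time window
// are folded into one bucket document with column-like arrays.
function bucketize(docs, windowMs) {
  const buckets = new Map();
  for (const doc of docs) {
    const windowStart = Math.floor(doc.timestamp.getTime() / windowMs) * windowMs;
    const key = JSON.stringify(doc.metadata) + "|" + windowStart;
    if (!buckets.has(key)) {
      buckets.set(key, {
        metadata: doc.metadata, // stored once per bucket instead of once per reading
        windowStart: new Date(windowStart),
        timestamps: [],
        measures: [],
      });
    }
    const bucket = buckets.get(key);
    bucket.timestamps.push(doc.timestamp);
    bucket.measures.push(doc.measure);
  }
  return [...buckets.values()];
}

// 120 one-second readings starting on a minute boundary, bucketed per minute
const start = Date.UTC(2023, 2, 30, 11, 0, 0);
const readings = Array.from({ length: 120 }, (_, i) => ({
  timestamp: new Date(start + i * 1000),
  measure: 20 + (i % 5),
  metadata: { unit: "Celsius" },
}));
const buckets = bucketize(readings, 60 * 1000);
console.log(buckets.length, buckets[0].measures.length); // 2 60
```

Grouping this way stores the repeated metadata once per bucket and keeps similar values next to each other, which is the kind of layout that compresses well.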

To compare the disk usage of the two collections, let’s generate and insert sample data using the script below:

const intervalMs = 1000; // one reading per second
const docsPerMinute = 60; // batch size: one minute of readings per insertMany call
const endMs = Date.now(); // fix the end time once so the loop bound does not drift
let sensorData = [];
let timestampMs = endMs - (10 * 24 * 60 * 60 * 1000); // start 10 days ago

while (timestampMs < endMs) { // run until the fixed end time is reached
  sensorData.push({
    timestamp: new Date(timestampMs),
    measure: -20 + Math.floor(Math.random() * 100), // random integer from -20 to 79
    metadata: {
      unit: "Celsius"
    }
  });
  if (sensorData.length >= docsPerMinute) {
    db.sensor.insertMany(sensorData, {ordered: false});
    sensorData = [];
  }
  timestampMs += intervalMs;
}

// flush any documents left over from a partial final batch
if (sensorData.length > 0) {
  db.sensor.insertMany(sensorData, {ordered: false});
}

This script generates random temperature data for the last 10 days and inserts it into the sensor collection. It creates one document per second, each with a timestamp, a measure, and metadata, and inserts them in batches of 60 (one minute of readings). We then insert the same data into a regular collection and compare the disk usage of the two collections to analyze how efficiently each stores the data. Each inserted document looks like this:

{
  "timestamp": {
    "$date": "2023-03-30T11:07:25.142Z"
  },
  "metadata": {
    "unit": "Celsius"
  },
  "measure": 26,
  "_id": {
    "$oid": "64329c6da9d1ea10e8973dc9"
  }
}
... and so on
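The insert script assumes that sensor already exists as a time series collection, a step the walkthrough does not show. A minimal sketch of how that setup might look in mongosh follows. The granularity value and the name `sensor_regular` for the regular collection are assumptions for illustration; the field names come from the documents above.

```javascript
// Options for a time series collection matching the script's field names.
// "granularity" is an assumption chosen to match one reading per second.
const timeseriesOptions = {
  timeseries: {
    timeField: "timestamp", // required: the field holding each reading's date
    metaField: "metadata",  // optional: labels that rarely change within a series
    granularity: "seconds",
  },
};

// `db` only exists inside mongosh; the guard keeps the snippet harmless elsewhere.
if (typeof db !== "undefined") {
  db.createCollection("sensor", timeseriesOptions); // time series collection
  db.createCollection("sensor_regular");            // hypothetical name for the regular collection
}
```

A time series collection has to be created explicitly with these options. Inserting into a collection name that does not exist yet would create a regular collection instead.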

To demonstrate the difference in storage efficiency between regular and time-series collections, we inserted 864,000 documents (one per second for 10 days) into each collection on MongoDB 6.0.5 and checked their respective sizes using the db.collection.stats() method.
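As a rough sketch of how those numbers can be read in mongosh, the helper below picks out only the size-related fields from a stats() result. The `collectionSizes` function is a hypothetical convenience wrapper written for this example, not part of MongoDB.

```javascript
// Pull the size-related fields out of a stats() result.
// `database` is expected to behave like the mongosh `db` object.
function collectionSizes(database, name) {
  const stats = database.getCollection(name).stats();
  return {
    count: stats.count,
    storageSize: stats.storageSize,       // bytes allocated on disk for the data
    totalIndexSize: stats.totalIndexSize, // bytes used by all indexes
    totalSize: stats.totalSize,           // storageSize + totalIndexSize
  };
}

// `db` only exists inside mongosh; the guard keeps the snippet harmless elsewhere.
if (typeof db !== "undefined") {
  console.log(collectionSizes(db, "sensor"));
}
```

All sizes are reported in bytes, so they need to be divided down before being compared in MB.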

The Time Series Collection’s storage size was around 1.05 MB, while the regular collection’s was approximately 16.88 MB. In other words, the Time Series Collection took less than one-sixteenth of the space the regular collection needed to store the same number of documents.

Time series collection vs Regular collection

The specialized data structure of the Time Series Collection enables efficient storage and retrieval of time-series data. Starting in MongoDB 5.2, time series collections use column compression, which improves the compression achieved in practice, reduces overall disk usage, and improves read performance.

The gap widens further if we compare totalSize (storageSize + totalIndexSize). For the regular collection it is 26.2 MB, as the following stats show:

  count: 864000,
  storageSize: 16879616,
  totalIndexSize: 9322496,
  totalSize: 26202112,
  indexSizes: { _id_: 9322496 },
  avgObjSize: 87

For the time-series collection, the total size is about 1.05 MB (1,048,576 bytes), with no index overhead:

  count: 864000,
  size: 1403590,
  storageSize: 1048576,
  totalIndexSize: 0,
  totalSize: 1048576
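The ratios can be worked out directly from the byte counts in the two stats outputs above. The short sketch below does only that arithmetic; the object and variable names are chosen for this example.

```javascript
// Byte counts copied from the two stats outputs above
const regular = { storageSize: 16879616, totalIndexSize: 9322496, totalSize: 26202112 };
const timeSeries = { storageSize: 1048576, totalIndexSize: 0, totalSize: 1048576 };

// Convert bytes to decimal megabytes with two decimal places
const toMB = (bytes) => (bytes / 1e6).toFixed(2);

const storageRatio = regular.storageSize / timeSeries.storageSize;
const totalRatio = regular.totalSize / timeSeries.totalSize;

console.log(`regular: ${toMB(regular.storageSize)} MB data, ${toMB(regular.totalSize)} MB total`);
console.log(`time series: ${toMB(timeSeries.totalSize)} MB total`);
console.log(`storage ratio: ${storageRatio.toFixed(1)}x, total ratio: ${totalRatio.toFixed(1)}x`);
```

On data alone the regular collection is roughly 16 times larger. Once its _id index is counted, the difference grows to roughly 25 times.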

Using the Time Series Collection in MongoDB to store time-series data can optimize storage space and improve query performance. For non-time-series data, however, a regular collection remains the better option.

Following the recommended best practices can further enhance the performance of a time-series collection. For more information on these best practices, please refer to the MongoDB Documentation.
