How to prepare a Python date object to be stored in MongoDB and run queries by dates and range of dates.
How I managed to insert dates in MongoDB and run queries between days.
Introduction
I think one of the most tedious things in Computer Science is dealing with datetime and timezone. If Computer Science were just ordering records by day or querying DBs for records created between ranges of days, I would call myself out of the game.
I learned that the way a timestamp is stored makes the difference and could allow you to run the right query or gives you one of those days.
My case was pretty simple, yet it took me hours to sort it out. We had a MongoDB collection with timestamps stored as a String. ( first big mistake ). Moreover, the structure of each record was a nested JSON ( second mistake) that is not the best one when it comes to query documents. Easy to visualise, not as easy for querying.
So, suppose we have to store actions made by a user and, for each action, we want to store a timestamp.
The initial data format looked like
{
"user_email": "jakebrown@test.com",
"delivery": {
"box1": {
"97cc5e852cf44fc38fdb55dc006d1d01": {
"owner_name": "Peter Parker",
"owner_email": "peter.parker@spiderman.com",
"registration_date": "2021-03-01T18:12:07+00:00"
}
},
"box2": {
"a7bd1e34f809410590679cd2c1172cd3": {
"owner_name": "Dylan Dog",
"owner_email": "dylan.dog@test.com",
"registration_date": "2021-03-02T12:12:01+00:00"
},
"box3": {
"b787fdeb892d406196fda2ca9a15074b": {
"owner_name": "Nathan Never",
"owner_email": "nathan.never@alpha.com",
"registration_date": "2020-12-02T18:12:07+00:00"
}
},
"box4": {
"1f8741da011640988359fec4d47225f3": {
"owner_name": "Jerry Drake",
"owner_email": "jerry.drake@misterno.com",
"registration_date": "2021-02-02T14:07:07+00:00"
}
}
}
}
}
There were two problems with the above structure. The first was that the registration_date value was stored as String. It didn’t allow me to query results by a range of dates. Dates stored in String format are nice to display and, assuming the date matches a standard format, are relatively easy to convert back to a datetime object but I couldn’t run a query like
{“registration_date”: {“$gte”: ‘2021–03–03T18:12:07+00:00’, “$lte”: ‘2021–03–02T18:12:07+00:00’}}
The second problem was that it was quite tricky to query by the ID ( e.g. 1f8741da011640988359fec4d47225f3), which was the key of each nested dictionary. In this regards, this article is quite explicative and helped me to understand the error I made in the initial data design.
Refactor data so to follow the MongoDB way
After going around the documentation of MongoDB, I realise that the right way of storing a timestamp is a JavaScript Date type. Period.
As reported in the official documentation,
MongoDB has no special datetime object and it uses the JavaScript Date type, which is stored in BSON form. Internally, Date objects are stored as a signed 64-bit integer representing the number of milliseconds since the Unix epoch (Jan 1, 1970). The official BSON specification refers to the BSON Date type as the UTC datetime. BSON Date type is signed. Negative values represent dates before 1970.
Date() returns the current date as a string in the mongo shell.
new Date() returns the current date as a Date object. The mongo shell wraps the Date object with the ISODate helper. The ISODate is in UTC.
MongoDB automatically wrapped dates with the function: ISODate(). ISODate() is a built-in function that wraps the native JavaScript Date object providing a convenient way to represent dates in a human-readable format preserving the full use of dates queries and indexing.
So, out of the box, MongoDB gives a default ISO representation for dates. Should you need a format different from the ISO one, you need to convert yourself on the client-side.
Perfect. So, I first decided to modify the type of registration_date key to datetime only to realise that, with my original data structure, it was impossible to get back only the records that match my query’s criteria. It was a nested structure, so all the structure were returned. Then I decided to modify the structure and have it simpler. More records to store, but easier to query.
To insert the registration_date as a Python datetime I used the function datetime.today()
payload_to_insert = {
"user_email": "jakebrown@test.com",
"owner_name": "Peter Parker"
"owner_email": "peter.parker@spiderman.com",
"box_id" : "a7bd1e34f809410590679cd2c1172cd3",
"registration_date": datetime.today().replace(microsecond=0)
}
The registration_date is now a datetime object, it is stored as JavaScript Date type by MongoDB and then it is compatible with what MongoDB requires to run queries on dates.
Moreover, the ID that was a key now is a value of an attribute. I fixed the data structure and made it easy to query.
For a double check, we can run a query from the MongoDB shell to see that the registration_date’s format is Date().
db.test.find()
{ "_id" : ObjectId("..."), "registration_date" : new Date("2021-03-02T14:51:15+0000") }
Now, let’s play with Python and the datetime function to run a few queries.
Query by dates range
Probably one of the most common queries ever, “give me all the results between two days on the calendar”. That’s why components like this https://material.angular.io/components/datepicker/examples exist.
We have learned our lesson and we know we need datetime objects. So, let’s convert a string to a datetime first and then create our query.
from datetime import datetime, timedeltafrom_date = datetime.strptime(from_date, '%Y-%m-%d')
to_date = datetime.strptime(to_date, '%Y-%m-%d')criteria = {"$and": [{"registration_date": {"$gte": from_date, "$lte": to_date}}, {"owner_email": "peter.parker@spiderman.com"}]}
then we can use the criteria in our find()
import json
from pymongo import MongoClient
from bson.json_util import dumps
from datetime import datetime, timezone
from datetime import timedelta
def query_by_range_of_dates(mongo_url, db_name, collection_name, owner_email, from_date,
to_date):
client = MongoClient(mongo_url)
db_instance = client[db_name]
from_date = datetime.strptime(from_date, '%Y-%m-%d')
to_date = datetime.strptime(to_date, '%Y-%m-%d')
criteria = {"$and": [{"registration_date": {"$gte": from_date, "$lte": to_date}}, {"owner_email": owner_email}]}
return json.loads(dumps(db_instance[collection_name].find(criteria)))
Query by a specific day
What if I want all the actions made on a given day? For querying by a given day, we need to use a range from 00:00 to 23:59:59 of the same day.
from datetime import datetime, timedeltafrom_date_str = day_to_query + "T00:00:00+00:00"
to_date_str = day_to_query + "T23:59:59+00:00"
from_date = datetime.strptime(from_date_str, '%Y-%m-%dT%H:%M:%S%z')
to_date = datetime.strptime(to_date_str, '%Y-%m-%dT%H:%M:%S%z')
criteria = {"$and": [{"registration_date": {"$gte": from_date, "$lte": to_date}}, {"owner_email": "peter.parker@spiderman.com"}]}
N.B.: I need to specify the UTC timezone because by default MongoDB stores Date() objects as ISOFormat().
Query with timedelta
Another useful way for querying dates is to go back from a given number of days.
from datetime import datetime, timedeltatoday = datetime.today()
days_back = today - timedelta(days=30)
criteria = {"$and": [{"registration_date": {"$gte": days_back}}, {"owner_email": peter.parker@spiderman.com}]}
To sum it up
MongoDB dates can cause some frustration when it comes to date to store and to query. Also, a nested data structure is not our best friend when we need to query it. I have learned that it’s not always true that simple is easy. Storing a date as a String is simple and easy to visualize. However, it’s not easier to query.
The following is a list of resources that helped me
- https://docs.mongodb.com/manual/reference/operator/aggregation/dateFromString/
- https://www.tutorialspoint.com/how-to-search-date-between-two-dates-in-mongodb
- https://www.bogotobogo.com/python/MongoDB_PyMongo/python_MongoDB_pyMongo_tutorial_Range_Queries_Counting_Indexing.php
- https://www.mongodb.com/blog/post/building-with-patterns-the-attribute-pattern
- https://developer.mongodb.com/community/forums/t/query-nested-objects-in-a-document-with-unknown-key/14511
- https://julien.danjou.info/python-and-timezones/
- https://stackoverflow.com/questions/47390554/pymongo-date-rangesame-date-with-different-time-query-returns-no-results
- https://www.tutorialspoint.com/How-to-prepare-a-Python-date-object-to-be-inserted-into-MongoDB
System Snapshot
- Python 3.7
- MongoDB 4.4.1
- pymongo 3.11.3