Creating the MongoDB Database Backend for Django
Written by Jib Adegunloye (Senior Software Engineer @ MongoDB)
Django is an open source, high-level Python web framework that encourages rapid development and clean, pragmatic design. It takes care of much of the hassle of web development so you can focus on writing your app without needing to reinvent the wheel. Django is an SQL-based framework, so in order to build a NoSQL database backend for Django, we need careful thought and consideration. Part of our approach requires clever tactics, while other parts just require some good old-fashioned sweat equity.
Unpacking motivations
At MongoDB, we have several Django enthusiasts who have jumped at the idea of backing a long-term solution to combining MongoDB and Django. Having historically provided support for SQL-based open source frameworks like Entity Framework in .NET/C#, Doctrine in PHP, and many more, we are familiar with the territory. However, we still needed a strong narrative from the Django community to justify the organizational cost to develop and maintain this project.
That narrative came last year when we noted a growing presence of MongoDB usage with the Django framework. In the 2021 Django Developers Survey, MongoDB was not listed as a backend database used by Django developers. Then, in the 2022 Django Developers Survey, MongoDB was cited as the most used among the 6% of “other” databases developers used. Finally, in the 2023 Django Developers Survey, 8% of respondents reported using MongoDB in their Django applications. This data point caught the attention of the MongoDB team and motivated us to do more research.
We learned from developers that a common pattern for integrating MongoDB and Django is to connect to MongoDB using the APIs of PyMongo, the official MongoDB Python driver library, or open source Object Relational Mappers (ORMs) such as mongoengine. This differs from the standard way to connect Django to a database, which is to specify a database backend in the DATABASES setting. As a consequence, using the pre-built user accounts, authentication, and admin tools requires either an SQL database to manage those parts of the Django application or an abundant amount of developer time to integrate MongoDB within each of those components.
The cost of supporting two databases or doing boutique rewrites of the Django framework is significant, so those projects must really need the scalability and flexibility that MongoDB offers. We thought that by offering a more comprehensive compatibility layer, developed and supported by MongoDB, we could free those projects from this burden and allow them to build fully fledged Django apps backed solely by the MongoDB document database.
It felt best to accomplish this with a new library since our team could not find an actively maintained one. Take django-nonrel’s django-mongodb-engine as an example: it has not been updated since 2015 and was implemented against Django 1.6, which has long since reached end-of-life. With that library and others like it being so outdated, the technical debt we would take on felt too high, and the number of users we would impact felt too low. We wanted to start fresh without any concerns for backward compatibility.
With the understanding of the newfound demand of MongoDB in the Django ecosystem, coupled with our longstanding desire to integrate ourselves in the community, we got to work.
Problem breakdown
We wanted to create a “spork”: an object combining two tools with dissimilar functions but the same end goal to produce a new, more optimized tool that still pays homage to its origins. In the case of a spork, the applications of forks and spoons are well scoped; Django and MongoDB, however, each do a multitude of things. If we tried to include every feature we wanted, we risked going down the wrong path, racking up technical debt, and painting ourselves into a corner. Conversely, if we provided too little functionality, users might see no purpose in using the spork, leaving us with minimal feedback to iterate on. The crux of our problem became boiling Django and MongoDB down to their quintessential components, so that developers would look at the result and say “this is a spork!”
Before we could boil it down, we needed to understand how the two systems must coexist. The key comes from emphasizing the difference in scope between a web framework and a database library. Though it is integral to the framework’s function, Django defines the “Database Backend” as a component of the larger Django framework; other pieces of the codebase (admin, configs, views) only leverage APIs exposed by the database backend component. This is a boon for us, as it means we can follow the pattern of other third-party database backends and “plug” our library into the framework via the DATABASES setting, which dictates the library that the API calls will reference.
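As a rough illustration of what “plugging in” through the DATABASES setting looks like, here is a minimal settings sketch. The ENGINE path, connection string, and database name are assumptions made for illustration; check the backend’s own documentation for the values it actually expects.

```python
# settings.py (sketch). Every value below is an illustrative assumption,
# not something stated in this article.
DATABASES = {
    "default": {
        # Assumed import path of the backend package. Django loads the
        # database backend from whatever module this string names.
        "ENGINE": "django_mongodb_backend",
        # Placeholder connection string for a local MongoDB instance.
        "HOST": "mongodb://localhost:27017",
        # Placeholder database name.
        "NAME": "my_database",
    }
}
```

The rest of the framework never names the backend directly; it only goes through whatever this setting points at.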
Next came determining our minimum viable product (MVP) — the “quintessential spork”. We wanted to work on only the needs of the library, so we could share with other developers quickly and work on it iteratively. If it were a spork, it would not need to have all the teeth, but it would need to “spear” and “scoop”. Here is what we came up with:
This is by no means an exhaustive list, nor is it a representation of everything we believe makes either the framework or MongoDB indispensable.
django-mongodb
Minimum Requirements
- Able to define MongoDB collections using Django models
- Able to query data using Django’s QuerySets
- Able to use Django’s authentication system (django.contrib.auth)
- Able to use Django’s automatic admin interface (django.contrib.admin)
- Able to use Django’s management commands
- Able to use MongoDB’s Aggregation Pipeline
- Able to use MongoDB Embedded Documents
We believe these minimum requirements give developers many of the qualities they look for when choosing either of these libraries. By virtue of leveraging MongoDB as a backend, you get access to a wealth of MongoDB database experiences in addition to the quality of life wins from our flexible schema, performance wins from our scalability and querying power, and our overall intuitive design patterns that come from being document-based. This angled our focus toward the marriage of best coding practices in both the Django and PyMongo libraries.
Even with this list, it was difficult to have confidence we would have a functioning MVP, so we needed to define additional acceptance criteria. The difficulty came down to complexity: how complicated an application could a developer expect to be able to build with the MVP? Put another way, imagine we were creating a packing list for a backpacking trip. We would include the essentials like water and stable footwear, but what about hiking poles? Or a tent? The key for us was to focus on the essentials.
In the end, we took our cues from the official “Writing your first Django app” tutorial offered by the Django Software Foundation. We decided that if a developer could get through the whole tutorial using the django-mongodb library, performing all the functionality it introduced (defining models, creating migrations, creating views and templates, and using the automatic admin interface), we would consider our MVP ready to show early users who could help us improve it.
While we want to support much of the SQL-oriented functionality of Django, we still want the MVP to also capture elements that would emphasize a uniquely MongoDB experience. Our MVP needed to maintain the ease of scalability as well as the flexibility provided by MongoDB. We did not want to lose the spirit of a document with this solution and make something that only replicated SQL commands in NoSQL. To that end, we needed to introduce core elements of what makes MongoDB so powerful. That’s why we chose to support the use of Embedded Documents in our MVP. While we do want to make the integration feel seamless, we also want to promote a product that feels uniquely different from the supported backend databases. Including embedded documents ensures developers can structure their databases and collections the way we teach in MongoDB University. We know that balancing these elements of compatibility and individuality takes more work; however, our commitment to being active and eager maintainers of this project emphasizes how important we feel it is.
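To make the embedded-document idea concrete, here is a small sketch in plain Python that contrasts a relational-style layout with a document-style one. The collections, field names, and helper functions are all hypothetical and exist only for illustration; they are not part of the library.

```python
# Hypothetical data, for illustration only.

# Relational-style layout: the address lives in a second collection and is
# linked by an id, so reading a customer's city needs a second lookup.
customers = [{"_id": 1, "name": "Ada", "address_id": 10}]
addresses = [{"_id": 10, "city": "London", "postcode": "N1"}]

# Document-style layout: the address is embedded in the customer document,
# so a single read returns everything.
customer_doc = {
    "_id": 1,
    "name": "Ada",
    "address": {"city": "London", "postcode": "N1"},
}


def city_relational(customer_id):
    """Resolve the city by following the id link between two collections."""
    customer = next(c for c in customers if c["_id"] == customer_id)
    address = next(a for a in addresses if a["_id"] == customer["address_id"])
    return address["city"]


def city_embedded(doc):
    """Read the city straight out of the embedded sub-document."""
    return doc["address"]["city"]
```

Both helpers return the same city; the difference is that the embedded layout needs no second lookup.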
Finally, we still face the challenge of being a third-party library; any changes that require updates to the core library will take time to integrate. As such we aim to build a database backend without making any pull requests to Django itself. This approach allows us to iterate faster and release updates without waiting for Django’s feature release cycle.
Now that we had an MVP with itemized, concrete deliverables, we proceeded full steam with “making the thing.”
“Making the thing”
Even though we knew we wanted to make a new library, we had a strong hunch not to create a new library entirely from scratch. Our thinking was that building on an existing open-source Django backend for MongoDB (even if outdated) would save some time compared to writing a backend entirely from scratch. We chose django-nonrel/mongodb-engine since we saw it took the approach we wanted: a database backend that allows using Django models and QuerySets. However, with the last commit occurring over nine years ago in the era of Django 1.6, as well as running against MongoDB 3.2, we knew it would need to be updated to work with the latest versions of Django and MongoDB. Using it as a starting point, we made an updated backend that passed some of Django 5.0.x’s test suite. This work became the initial code commit to the mongodb-labs/django-mongodb-backend repository.
We found recording the number of passing tests in Django’s comprehensive test suite to be the best metric for tracking our progress; however, it came with an additional work tax. The framework has a daunting number of tests, and we were unsure which ones to pay attention to. Moreover, the initial commit only managed to pass a handful of the many Django test apps, signaling a large amount of work to reach parity with other third-party backend implementations. Nevertheless, we understood that the test suites give the best objective representation of code working as intended, so it was integral to familiarize ourselves with every one of them. The best way to do this was through triaging: the process of giving a preliminary assessment of tasks and determining their urgency based on the qualities of the problem. This is inherently time-consuming work, as it requires us to look over every test suite, and sometimes individual test cases, to figure out how soon we need to address the test failure. The end result was entirely worth it.
Triaging the Django test suite has helped us to identify bugs and missing features in our backend. As we’re able to add more Django test apps to our continuous integration check, we gain confidence that our backend is moving toward completion.
Triaging the test suite also allows us to identify and document Django features that MongoDB may not be able to support, at least for now. For instance, datetimes in MongoDB don’t support microsecond precision, and, as a result, Django’s DateTimeField and DurationField are unable to give the same microsecond precision provided by most SQL databases. Up until 2017, Django’s test suite had a supports_microsecond_precision database feature flag that backends could set to False if needed. This would adapt some assertions in Django’s test suite to account for this behavior. Unfortunately, the flag was obsoleted when Django dropped support for MySQL 5.5, the last database to use it. So part of our work was reintroducing this feature flag into our Django fork. In the future, we hope to contribute this patch back to Django. (To be clear, this patch only affects Django’s test suite, and django-mongodb-backend does not require a patched version of Django to run.)
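The precision gap can be shown with the standard library alone. BSON stores a datetime as a whole number of milliseconds, so anything finer is lost on the way in. The sketch below assumes the sub-millisecond part is simply dropped; the helper name is hypothetical and is not part of any library.

```python
from datetime import datetime, timedelta


def truncate_to_milliseconds(value: datetime) -> datetime:
    """Drop the sub-millisecond part of a datetime (hypothetical helper).

    This mimics what survives a round trip through a millisecond-precision
    store, assuming the extra digits are truncated rather than rounded.
    """
    return value.replace(microsecond=(value.microsecond // 1000) * 1000)


original = datetime(2024, 1, 1, 12, 0, 0, 123456)
stored = truncate_to_milliseconds(original)

# The difference is the precision a test asserting exact equality would trip on.
lost = original - stored
```

A test that writes `original` and expects to read back an identical value would fail by `lost`, which is the kind of assertion a feature flag like supports_microsecond_precision lets a test suite relax.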
Making queries work
The Django framework comes with a powerful ORM to query the database using an abstraction API, without the need to write SQL queries. For django-mongodb, we were confident we could replicate this experience seamlessly using the MongoDB Query API.
While attempting to add support for more complicated queries to our initial commit, we observed that the NoSQL generation code took a different logical approach than Django’s SQL generation. While it handled negation (NOT) logic and compound logic (AND, OR), the code was complicated and difficult to grasp, and trying to extend it proved difficult.
We wondered if we could refactor it to be more similar to Django’s logic. This effort was successful. Each object in Django’s query tree has an as_sql() method, and we found that by attaching an as_mql() method to each class that might be in the tree, we could build the correct MQL by traversing the tree and calling this method for each object.
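The tree-walking idea can be sketched with a few toy node classes. These are hypothetical stand-ins, not Django’s classes, and the query documents they produce are only one plausible shape, not necessarily what the backend emits.

```python
# Hypothetical stand-ins for nodes in a query tree; not Django's classes.
class Exact:
    def __init__(self, field, value):
        self.field, self.value = field, value

    def as_mql(self):
        return {self.field: {"$eq": self.value}}


class GreaterThan:
    def __init__(self, field, value):
        self.field, self.value = field, value

    def as_mql(self):
        return {self.field: {"$gt": self.value}}


class And:
    def __init__(self, *children):
        self.children = children

    def as_mql(self):
        # Each child compiles itself; the parent only combines the results.
        return {"$and": [child.as_mql() for child in self.children]}


class Or:
    def __init__(self, *children):
        self.children = children

    def as_mql(self):
        return {"$or": [child.as_mql() for child in self.children]}


class Not:
    def __init__(self, child):
        self.child = child

    def as_mql(self):
        # $nor with a single clause negates that clause.
        return {"$nor": [self.child.as_mql()]}


# status == "published" AND NOT (votes > 10)
tree = And(Exact("status", "published"), Not(GreaterThan("votes", 10)))
query = tree.as_mql()
```

Because every node knows how to compile itself, supporting a new kind of expression means adding one method to one class rather than extending a central translator.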
As we triaged more of the test suite, we frequently encountered errors like “'<Func>' object has no attribute 'as_mql'”. If the function was supported by MongoDB, we implemented an as_mql method for the Func; otherwise, we raised NotSupportedError.
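The “implement it or raise” pattern might look roughly like the sketch below. The Func class, the lookup table, and the error class here are simplified, hypothetical stand-ins (Django ships its own django.db.NotSupportedError); only the MongoDB operator names are real.

```python
class NotSupportedError(Exception):
    """Local stand-in for django.db.NotSupportedError."""


# Hypothetical mapping from SQL-style function names to MongoDB
# aggregation operators.
SUPPORTED_FUNCS = {"UPPER": "$toUpper", "LOWER": "$toLower", "ABS": "$abs"}


class Func:
    """Toy database function applied to a single field."""

    def __init__(self, name, field):
        self.name, self.field = name, field

    def as_mql(self):
        operator = SUPPORTED_FUNCS.get(self.name)
        if operator is None:
            # Fail loudly instead of silently producing a wrong query.
            raise NotSupportedError(f"{self.name} is not supported on MongoDB.")
        # "$field" is MongoDB's syntax for referencing a document field.
        return {operator: f"${self.field}"}
```

Raising an explicit error keeps unsupported features visible, which also makes them easy to document.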
As we added support for more complicated queries, we took advantage of MongoDB’s aggregation pipeline, which performs sequential operations in stages. Mutated documents are passed from one stage to the next. You can think of the MongoDB aggregation pipeline as a traditional assembly line with each stage of the pipeline being an assembly line step, while the database documents are the items molded on that line: “$match all the documents with x, $group the matched documents by their field value z and sum value y, take the grouped elements and project the groupings in sorted fashion to show the top 10 sums of value y”.
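Written out as a PyMongo-style pipeline, the assembly line described above could look like the following. The field names x, y, and z come from the description; reading “documents with x” as “documents where x exists” and the output name total_y are assumptions.

```python
# Each dict is one stage; documents flow through the stages in order.
pipeline = [
    # Keep only documents that have a field x (assumed interpretation).
    {"$match": {"x": {"$exists": True}}},
    # Group by the value of z and sum y within each group.
    {"$group": {"_id": "$z", "total_y": {"$sum": "$y"}}},
    # Largest sums first.
    {"$sort": {"total_y": -1}},
    # Keep the top 10 groups.
    {"$limit": 10},
]
```

With a PyMongo collection object in hand, this list would be passed to `collection.aggregate(pipeline)`.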
JOINs
One of the biggest questions we faced was whether or not to provide support for queries that use the SQL JOIN operation. MongoDB does have an operator that mirrors a JOIN, $lookup, but just because MongoDB supports it does not mean we want to promote its usage. Fundamentally, relational databases thrive on having several different tables linked by some shared attribute (primary key, foreign keys, shared enums, etc.). A consequence of this is that JOINs are essential to make these systems function optimally. Conversely, MongoDB thrives when all the information is stored in the same collection. We generally avoid multiple lookup operations between collections and push for a document within a collection to have most, if not all, of the information packaged within it. Thus, whether or not to support JOINs using the $lookup operator was a question of philosophy.
Ultimately, philosophy gave way to practicality, and we chose to support multi-collection queries as a part of our MVP requirements. We couldn’t support Django’s built-in contrib apps, such as the admin and authentication systems, without them, at least not without some fairly invasive patching of Django itself, which we did not want to do. Doing so would delay our work until the next major release of Django or later, and there is no guarantee that Django would accept such patches. Similarly, when thinking about compatibility with the large ecosystem of third-party Django libraries, supporting these queries is necessary.
The implementation was fairly straightforward. We attach an as_mql function to Django’s Join data structure, and add the logic of each Join to a “lookup pipeline” stage in the aggregation pipeline.
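As a sketch of how a join could be expressed as aggregation stages, the hypothetical helper below builds a $lookup followed by an $unwind. The function name, its parameters, and the example collection and field names are illustrative assumptions and do not reflect the library’s internals.

```python
def join_to_lookup_stages(from_collection, local_field, foreign_field, alias):
    """Hypothetical helper: build stages that behave like an INNER JOIN."""
    return [
        {
            "$lookup": {
                "from": from_collection,
                "localField": local_field,
                "foreignField": foreign_field,
                "as": alias,
            }
        },
        # $lookup yields an array of matches. $unwind emits one document per
        # match and, by default, drops documents with no match.
        {"$unwind": f"${alias}"},
    ]


# Example: attach the related user document to each row of some collection.
stages = join_to_lookup_stages("auth_user", "user_id", "_id", "user")
```

These stages would sit in the pipeline ahead of any $match that filters on the joined fields.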
Conclusion
“If you want to build a ship, don’t drum up the men to gather wood, divide the work, and give orders. Instead, teach them to yearn for the vast and endless sea.” — Antoine de Saint-Exupéry
I think of the above quote often when thinking about this work — Django and MongoDB being the sea, and the new library being our ship — our eyes are acutely fixed on how magnificent it will be when it works. The MongoDB team is passionate about this project, not just because we have seen data points telling us that there is a market fit, but because we like Django a lot.
That joy of developing is what has kept us going in rigorously hashing out our MVP details, poring through lines upon lines of code to make every necessary SQL-to-MongoDB conversion, and grappling with individual tests in Django’s test suite to ruthlessly document what we can and cannot achieve.
We are committed to working on the MongoDB backend for Django, and have made significant strides to ensure that we’ll provide a first-class experience to the open source community.
This is a continual learning experience, and we would love to have similarly passionate developers try the library, give feedback, file issue tickets, or submit pull requests.
Whether you are a MongoDB or Django enthusiast (or both!), please check out the django-mongodb-backend repository.