Efficient Bulk Create with Django Rest Framework

Learn how to use the ListSerializer with bulk_create to build efficient POST API endpoints with Django Rest Framework

Chris Knorowski
The Startup
May 18, 2020

Generic Rest Framework endpoints are typically designed to modify one object at a time. However, this can become a huge performance bottleneck when you need to do thousands of creates or updates. In that case, instead of making thousands of calls to your endpoint, it is better to make one call that does a bulk create.

In this post, we are going to walk through how to perform efficient bulk creation of model objects using Django Rest Framework. A full working Django app with all of this code and unit tests can be found on GitHub here. The GitHub repo also contains the steps for doing efficient bulk updates, which we cover separately; see Part 2: Efficient Bulk Updates with Django Rest Framework to learn about faster updates to your endpoints.

Objectives

By the end of this tutorial, you should be able to:

  1. Implement a create API using standard Django Rest Framework.
  2. Perform bulk creates using a ListSerializer.
  3. Optimize the API using bulk_create, along with a few other tricks, to get over a 15x speed improvement.

Model Overview

We will reference two models, a Project model which has Task models associated with it. The models are defined in models.py as follows.

Additionally, whenever a new Task is added, the business logic requires that the Project's last_modified field be updated. To handle this we use Django's signal API, with post_save and post_delete handlers on the Task model. This way, after a save is called, the signal is also invoked and updates the Project's last_modified.

Views

The generic view classes are base classes from Django Rest Framework that implement handlers for the HTTP methods POST/PUT/GET/DELETE. For the create endpoint we will use generics.CreateAPIView, which provides the post method handler. The standard create view for a model such as Task is defined below, where we only need to specify the serializer and the URL that invokes the view.

Serializers

Serializers are responsible for taking the user input, validating it, and turning it into an object consumable by the database. They also handle converting objects from the database into something that can be returned to the user. Additionally, a serializer specifies which fields are required and what properties they have. For this project, we create the TaskSerializer, which expects a name, a description, and a project id in order to create a Task.

Since the id for the project is specified in the URL, we use a HiddenField with a CurrentProjectDefault class to specify how to pull the project id from the request and retrieve the Project object from the database. The CurrentProjectDefault class is defined as follows.

With all of these pieces in place, we have completed the basic implementation of our API. Let us now profile the performance. To do that we will use pytest to create a unit test where we generate 10,000 Task objects by calling the API once for each create.

To get the run time of the test, we will pass the --durations option when calling pytest (--durations=N reports the N slowest tests, and --durations=0 reports all of them).

It is slow: performing 10,000 creates takes around 30 seconds.

ListSerializer

Let’s go ahead and look at how to speed up the performance of the code. The first optimization we will do is to switch to using a ListSerializer. The list serializer will allow you to submit a request for multiple creates in a single call.

To do this we will override the get_serializer method of our generic CreateAPIView to check whether the input data is a list. If it is, it sets kwargs["many"] = True. This tells the serializer that it should use the list_serializer_class before calling the individual creates on each Task.

To test the change, we’ll create a test that passes a list of Tasks to be created and only needs to call the API once.

Let's go ahead and check the performance from that change.

So just by switching to a ListSerializer, the run time drops from around 30 seconds to 13.26 seconds, a bit more than a 2x speed improvement. Still, 13.26s for 10,000 creates is slow. The key to further optimizing the performance of this API is going to be reducing the number of calls to the database.

Consolidate Logic

Currently, our serializer is calling CurrentProjectDefault to get the project that is associated with each Task instance object it is creating. Instead, we are going to modify the post function of our view to pull the project and insert it into the request.data object. This way, we only need to do a single database hit for our project object for all of our Tasks.

We will also need to replace the CurrentProjectDefault field in our serializer with a custom field. We create a custom field named ModelObjectidField, which just returns the data passed into it.

bulk_create

Next, we will create a BulkCreateListSerializer, which uses Django's bulk_create (available since Django 1.4; the related bulk_update arrived in Django 2.2). This function allows you to perform a bulk create in the database by passing a list of objects. The following code describes the bulk create ListSerializer.

We also need to modify the serializer so that it no longer does a save in the create method, but only returns the new instances. Then, after we have created all of the new instances, our BulkCreateListSerializer will call the bulk_create method, which hits the database a single time to perform the creation.

What about our signals?

When doing a bulk_create, the signals for the models are no longer triggered. This is actually a good thing as, while convenient, signals can be incredibly inefficient. Instead, we create an update_project_last_modified function that updates the last modified date of the Project after the creation is performed.

Finally, let us go ahead and test the performance of our new function using the bulk create.

The test now runs in less than 2 seconds. That is a 15x performance improvement, and it comes without adding much complexity to the code.

Summary

With that, we have looked at how you can improve the performance of your Django app using a ListSerializer and Django's bulk_create functionality. Those two methods, along with paying careful attention to minimizing the number of calls to your database, can give you huge performance improvements without a lot of extra work.

Again, a full working Django project with all of this code and unit tests can be found on GitHub in the django_bulk_tutorial repository. I hope you enjoyed this post. I'll be following up soon with a tutorial on optimizing Bulk Updates as well.

Chris Knorowski
CTO/Cofounder of SensiML. Works at the intersection of physics, software engineering and machine learning.