Asynchronous Downloads in Django with Celery

Tara Yoo
Jan 2, 2020


At Compt, we have several complicated, long-running functions that generate data reports. These functions started taking longer as we scaled up, and we were concerned that report generation might time out or tie up the server for other users.

We didn’t find many guides on implementing async downloads in apps that use Django for both the front end and the back end, so we hope this guide will be helpful, especially for junior developers.

For most reports, a user clicks a download button and downloads the report directly. We previously served reports synchronously: the ReportView linked to the download button generated a report and sent it back as the Response. We decided to make this process asynchronous. Going forward, ReportView would start the report generation Celery task and immediately send back a 202 Accepted response.

Report Generation Before

Diagram of synchronous report generation. Clicking the download button generates a report then responds with the report.

Report Generation After

Diagram of asynchronous report generation. Clicking a download button starts a Celery task and responds with a 202 code.

After sending a request to ReportView, the client still needed a way to check whether the Celery task had finished and to retrieve the report.

There are several ways to do this. To keep this project as simple as possible, we decided to use short polling. In short polling, the client repeatedly asks the server, at fixed intervals, whether a task is done, until the server responds that the task is finished.
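To make the idea concrete, here is a minimal, framework-free sketch of short polling in Python. It is not our production code. poll_until_done, the check callback (a stand-in for one request to the server, returning None while the task is still running) and the max_attempts cap are illustrative assumptions.

```python
import time
from typing import Callable, Optional, TypeVar

T = TypeVar("T")


def poll_until_done(
    check: Callable[[], Optional[T]],
    interval_s: float = 0.5,
    max_attempts: int = 20,
) -> T:
    """Call `check` repeatedly until it returns a non-None result.

    `check` is a hypothetical stand-in for one request to the server;
    it returns None while the task is still running.
    """
    for attempt in range(max_attempts):
        result = check()
        if result is not None:
            return result  # the server says the task is finished
        if attempt < max_attempts - 1:
            time.sleep(interval_s)  # wait before asking again
    raise TimeoutError(f"task not finished after {max_attempts} polls")


# Usage: a fake check that reports "done" on the third request.
calls = {"n": 0}


def fake_check() -> Optional[str]:
    calls["n"] += 1
    return "report.csv" if calls["n"] >= 3 else None


print(poll_until_done(fake_check, interval_s=0.01))  # report.csv
print(calls["n"])  # 3
```

Each poll is a fresh, short request, which is what makes this approach simple: the server never holds a connection open while the report is generated.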

We found a great article from Pillow’s engineering blog that outlined the short polling process for a Rails / React stack, and decided to adapt it for our Django apps. Our final data flow looked like this:

Data flow diagram of the previous three paragraphs.

Back End Implementation

We started by preparing the back end (Django views and a model) to perform the following functions:

  1. Receive a request to start a Celery Task
  2. Save the Celery Task results in the database
  3. Return the Celery Task result to the client

Receiving requests to start a Celery Task

We set up an APIView using Django REST framework that starts the task, then responds to the client with a 202 Accepted code and the task’s id.

from rest_framework import status, views
from rest_framework.permissions import IsAuthenticated
from rest_framework.response import Response

# assumed location of the Celery task shown further below
from app_name.tasks import generate_report


class ReportView(views.APIView):
    """
    Starts a task that generates an expense details report
    asynchronously, and responds with the task's id.
    """
    # a one-element tuple needs the trailing comma
    permission_classes = (IsAuthenticated,)

    def get(self, request, *args, **kwargs):
        # start the task; pass the user so the result can be tied to them
        task = generate_report.delay(user_id=request.user.id)
        # return the task_id with a 202 Accepted response
        response = {"task_id": task.task_id}
        return Response(response, status=status.HTTP_202_ACCEPTED)
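The pattern ReportView follows (start the work, then answer at once with 202 and an id) can be sketched without Django or Celery. In this sketch a thread pool stands in for the Celery worker. start_report, slow_report and the jobs dict are hypothetical names used only for illustration.

```python
import time
import uuid
from concurrent.futures import Future, ThreadPoolExecutor
from http import HTTPStatus

# A thread pool stands in for the Celery worker in this sketch.
executor = ThreadPoolExecutor(max_workers=2)
jobs: dict[str, Future] = {}


def slow_report() -> str:
    """Hypothetical long-running report job."""
    time.sleep(0.2)
    return "user_id,amount\n1,10.00\n"


def start_report() -> tuple[int, dict]:
    """Submit the job and return (status, body) without waiting for it."""
    task_id = str(uuid.uuid4())
    jobs[task_id] = executor.submit(slow_report)
    return int(HTTPStatus.ACCEPTED), {"task_id": task_id}


started = time.monotonic()
status, body = start_report()
elapsed = time.monotonic() - started
print(status)  # 202, returned before the report exists
print(jobs[body["task_id"]].result())  # blocks until the report is ready
```

The response time no longer depends on how long the report takes, which is the property that protects the web server from slow reports.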

Saving the Celery Task results in the database — Model Setup

This model saves the JSON output of an asynchronous task, along with the task id and the requesting user. Storing the user is a safeguard so that users only get the results of tasks they kicked off.

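from django.conf import settings
from django.db import models
from django.utils.translation import gettext_lazy as _
from django_extensions.db.fields import CreationDateTimeField


class AsyncResults(models.Model):
    """
    Temporary records of an async task's task_id, its result as a
    JSON blob with a status code, and the user who requested the task.
    """
    # the id of the celery task that generated the result
    task_id = models.CharField(
        blank=False,
        max_length=255,
        null=False,
        verbose_name=_("task id"),
        db_index=True)
    # the user who requested the task; only they may read the result
    user = models.ForeignKey(
        settings.AUTH_USER_MODEL,
        on_delete=models.CASCADE,
        verbose_name=_("user"))
    # the task's result - represented as a JSON blob
    result = models.TextField(
        blank=False,
        verbose_name=_("task result"))
    created_on = CreationDateTimeField(
        db_index=True,
        editable=False,
        verbose_name=_("created_on"))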
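The reason for storing the requesting user next to the task id is that a lookup must match both. The following plain-Python sketch shows that rule, with a dict standing in for the AsyncResults table. ResultStore, StoredResult and their methods are hypothetical.

```python
import json
from dataclasses import dataclass, field
from datetime import datetime, timezone
from typing import Optional


@dataclass
class StoredResult:
    task_id: str
    user_id: int
    result: str  # JSON blob, like AsyncResults.result
    created_on: datetime = field(default_factory=lambda: datetime.now(timezone.utc))


class ResultStore:
    """In-memory stand-in for the AsyncResults table (illustration only)."""

    def __init__(self) -> None:
        self._rows: dict[tuple[str, int], StoredResult] = {}

    def save(self, task_id: str, user_id: int, payload: dict) -> None:
        self._rows[(task_id, user_id)] = StoredResult(task_id, user_id, json.dumps(payload))

    def get(self, task_id: str, user_id: int) -> Optional[dict]:
        # Both the task id and the user must match, mirroring the
        # task_id/user lookup in PollAsyncResultsView.
        row = self._rows.get((task_id, user_id))
        return json.loads(row.result) if row else None


store = ResultStore()
store.save("abc-123", user_id=1, payload={"status_code": 200, "filename": "report.csv"})
print(store.get("abc-123", user_id=1))  # {'status_code': 200, 'filename': 'report.csv'}
print(store.get("abc-123", user_id=2))  # None: another user cannot read it
```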

Saving the Celery Task results in the database — Celery Task Setup

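We use Celery’s shared_task since this is a Django app. When a task finishes, it saves a JSON blob to AsyncResults. On success, the blob holds status_code 200 with the report’s download location and filename. On failure, it holds status_code 500 and the error message, so we can go back and diagnose the problem.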

import json

from celery import shared_task

from app_name.models import AsyncResults


# bind=True makes the task instance available as `self`, so the task
# can read its own id from self.request.id and save the result under it
@shared_task(bind=True)
def generate_report(self, user_id, **kwargs):
    """
    Task: Generate a data report, store it for download, and save the
    download URL to the AsyncResults model once the task finishes running.
    """
    # read the task id first so the error branch can also use it
    task_id = self.request.id
    try:
        # generate a report, upload it to AWS S3, then return the
        # report's S3 url
        download_url = execute_generate_report()
        # generate a file name
        filename = f"{sensible_report_name()}.csv"
        result = {"status_code": 200,
                  "location": download_url,
                  "filename": filename}
    except Exception as exc:
        # save the error message with status code 500
        result = {"status_code": 500,
                  "error_message": f"{type(exc).__name__}: {exc}"}
    AsyncResults.objects.create(
        task_id=task_id, user_id=user_id, result=json.dumps(result))
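Stripped of Celery and S3, the task does two things. It runs the report, and it always writes exactly one JSON blob for the task id: 200 with the location on success, or 500 with the error message on failure. The following sketch shows that logic. record_result and the generate callback are hypothetical, and a dict stands in for the AsyncResults table. The task id is known before the report runs, so the error branch can always record it.

```python
import json
from typing import Callable


def record_result(
    task_id: str,
    generate: Callable[[], tuple[str, str]],
    results: dict[str, str],
) -> None:
    """Run `generate` and store a JSON blob for `task_id` either way.

    `generate` is a hypothetical stand-in for the report generation
    step; it returns (download_url, filename).
    """
    try:
        download_url, filename = generate()
        payload = {"status_code": 200, "location": download_url, "filename": filename}
    except Exception as exc:  # keep the message so the failure can be diagnosed
        payload = {"status_code": 500, "error_message": f"{type(exc).__name__}: {exc}"}
    results[task_id] = json.dumps(payload)


def failing_report() -> tuple[str, str]:
    raise ValueError("no expenses in range")


results: dict[str, str] = {}
record_result("t1", lambda: ("https://example.com/r.csv", "r.csv"), results)
record_result("t2", failing_report, results)
print(json.loads(results["t2"]))
# {'status_code': 500, 'error_message': 'ValueError: no expenses in range'}
```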

Returning the Celery Task result to the client

We set up another APIView using Django REST framework. This view looks up whether an AsyncResults record with the requested task_id exists for the requesting user. It returns the stored result with a 200 code if the task succeeded, a 202 code if no result exists yet, or a 500 error code if the task failed.

import json

from rest_framework import status, views
from rest_framework.permissions import IsAuthenticated
from rest_framework.response import Response

from app_name.models import AsyncResults


class PollAsyncResultsView(views.APIView):
    """
    API endpoint that returns whether an async job is finished, and
    what to do with the job.
    Once a related async task finishes, it saves a JSON blob to the
    AsyncResults table. PollAsyncResultsView looks for a JSON blob
    associated with the given task id and returns 202 Accepted
    until it finds one.

    The JSON blob looks like the below
    { status_code: 200,
      location: download url,
      filename: download file name }
    or, if there was an error processing the task,
    { status_code: 500, error_message: error message }
    """
    permission_classes = (IsAuthenticated,)

    def get(self, request, *args, **kwargs):
        task_id = self.kwargs.get("task_id", None)
        # there should only be one async_result with the task_id, user
        # combination; filter().first() returns None instead of raising
        # DoesNotExist while the task is still running
        async_result = AsyncResults.objects.filter(
            task_id=task_id, user=request.user).first()
        if async_result is None:
            # no result saved yet: the task is still processing
            return Response(status=status.HTTP_202_ACCEPTED)
        load_body = json.loads(async_result.result)
        # the task saved an error blob
        if load_body.get("status_code") == 500:
            return Response(
                status=status.HTTP_500_INTERNAL_SERVER_ERROR, data=load_body)
        return Response(status=status.HTTP_200_OK, data=load_body)
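The decision PollAsyncResultsView makes on each request can be isolated as a pure function, which makes the three outcomes easy to test. In the following sketch, a dict keyed by (task_id, user_id) stands in for the AsyncResults table. poll_response is a hypothetical name, and here the 500 body carries the stored error message.

```python
import json
from typing import Optional


def poll_response(
    results: dict[tuple[str, int], str], task_id: str, user_id: int
) -> tuple[int, Optional[dict]]:
    """Map a stored JSON blob (or its absence) to an HTTP status and body."""
    blob = results.get((task_id, user_id))
    if blob is None:
        return 202, None  # no row yet: the task is still running
    body = json.loads(blob)
    if body.get("status_code") == 500:
        return 500, body  # the task recorded an error
    return 200, body  # the task finished; body has location and filename


results = {
    ("ok-task", 1): json.dumps(
        {"status_code": 200, "location": "https://example.com/r.csv", "filename": "r.csv"}
    ),
    ("bad-task", 1): json.dumps({"status_code": 500, "error_message": "boom"}),
}
print(poll_response(results, "pending-task", 1))  # (202, None)
print(poll_response(results, "bad-task", 1)[0])  # 500
```

One consequence worth noting: because the lookup includes the user, someone polling another user's task id simply keeps receiving 202 and never sees the result.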

Front End Implementation

After setting up the back end to create and save Celery task results, we needed to set up the front end to retrieve our task outputs (the reports) and deliver them to the user. The front end needed to perform the following functions:

  1. Make a GET request to ReportView to start a Celery task and get back the task’s id
  2. Make a GET request to PollAsyncResultsView to check whether the Celery task is done
  3. Keep making GET requests to PollAsyncResultsView until the Celery task finishes

We decided to make the GET requests with jQuery’s .get() function. Before making them, we first needed to set up the URLs for ReportView and PollAsyncResultsView.

The final front-end code turned out to be a big chunk of JavaScript. I’ll break it up into sections according to purpose, then show the final result.

URL setup

from django.urls import re_path

from app_name.views import PollAsyncResultsView, ReportView

urlpatterns = [
    re_path(r'^api/poll_async_results/(?P<task_id>[0-9A-Za-z_\-]+)/?$',
            PollAsyncResultsView.as_view()),
    re_path(r'^api/report/?$', ReportView.as_view()),
]
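By default, Celery task ids are UUID strings, so the task_id group only needs letters, digits, hyphens and underscores. The following quick check uses Python’s re module on the poll route’s regex, with a leading ^ so it only matches from the start of the path. The sample paths are made up.

```python
import re
import uuid

# The poll route's pattern, anchored at both ends.
POLL_ROUTE = re.compile(r"^api/poll_async_results/(?P<task_id>[0-9A-Za-z_\-]+)/?$")

task_id = str(uuid.uuid4())  # the shape of a default Celery task id
match = POLL_ROUTE.match(f"api/poll_async_results/{task_id}")
print(match.group("task_id") == task_id)  # True
print(POLL_ROUTE.match("api/poll_async_results/../etc") is None)  # True
```

The optional trailing slash means the JavaScript can build the poll URL with or without a final "/".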

Making a GET request to ReportView to start off a celery task

We created a button with the id download-report-button:

<div class="container push-top date_range_elements" id="download_employee_spend_button">
  <div class="col-sm-12 text-center">
    <button id="download-report-button" class="btn ga-enabled" data-ga-event="clicked_spend_details_dashboard" data-ga-category="Reporting" data-ga-label="Downloaded spend details from Dashboard" role="button" style="margin: auto;" type="button" name="report_type" value="details">
      Download Report (CSV)
    </button>
  </div>
</div>

Clicking download-report-button triggers a GET request to ReportView.

$('#download-report-button').on('click', function(e) {
  e.preventDefault();
  const url = '/api/report/';
  // add any query parameters the report needs to this object
  const queryData = {};
  $.get(url, queryData)
    .done(function(data) {
      // ReportView returns the task's id as data.task_id.
      // This is where we'll pass in the function that makes GET
      // requests to PollAsyncResultsView
    });
});

Making a GET request to PollAsyncResultsView to check if the Celery task was done

Next, we needed a function inside the done() block above that makes repeated GET requests to PollAsyncResultsView until the Celery task finishes processing. Below is a simplified version of our end result.

// url is '/api/report/' (see the URL setup)
$.get(url)
  .done(function pollAsyncResults(data) {
    // data is ReportView's response: { task_id: ... }
    // see the URL setup for where this url came from
    const pollAsyncUrl = `/api/poll_async_results/${data.task_id}`;
    $.get(pollAsyncUrl)
      .done(function(asyncData, status, xhr) {
        // if the status isn't 202, the task has finished
        if (xhr.status !== 202) {
          // no new setTimeout is scheduled, so polling stops here.
          // To download, create a hidden anchor element and
          // simulate a click
          const a = document.createElement('a');
          a.style.display = 'none';
          a.href = asyncData.location;
          a.download = asyncData.filename;
          document.body.appendChild(a);
          a.click();
          a.remove();
          // change the button text back to normal
          $('#download-report-button').text('Download Report (CSV)');
        }
        // a 202 status means the async task is still processing
        else {
          $('#download-report-button').text('Loading...');
          // call pollAsyncResults again after waiting 0.5 seconds
          setTimeout(function() { pollAsyncResults(data); }, 500);
        }
      });
  });

The line setTimeout(function() { pollAsyncResults(data); }, 500); triggers pollAsyncResults again 500 milliseconds (0.5 seconds) after the previous poll’s response arrives. We chose setTimeout rather than setInterval so that a new poll is only scheduled once the previous one has finished, which keeps multiple pollAsyncResults calls from running at once.
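The difference can be made concrete with a small timing simulation, written in Python to stay framework-free. It assumes each poll request takes a known number of milliseconds. chained_starts models setTimeout-after-response, and fixed_starts models setInterval. The response times are made up for illustration.

```python
def chained_starts(durations_ms: list[int], delay_ms: int) -> list[int]:
    """setTimeout style: the next poll starts `delay_ms` after the previous response."""
    starts, t = [], 0
    for d in durations_ms:
        starts.append(t)
        t += d + delay_ms
    return starts


def fixed_starts(durations_ms: list[int], period_ms: int) -> list[int]:
    """setInterval style: polls start every `period_ms`, regardless of responses."""
    return [i * period_ms for i in range(len(durations_ms))]


def overlaps(starts: list[int], durations_ms: list[int]) -> int:
    """Count polls that start while the previous one is still in flight."""
    return sum(
        1
        for i in range(1, len(starts))
        if starts[i] < starts[i - 1] + durations_ms[i - 1]
    )


durations = [200, 900, 700, 150]  # hypothetical response times in ms
print(chained_starts(durations, 500))  # [0, 700, 2100, 3300]
print(overlaps(chained_starts(durations, 500), durations))  # 0
print(overlaps(fixed_starts(durations, 500), durations))  # 2
```

With slow responses, the fixed schedule starts new polls before earlier ones return. The chained schedule simply stretches out instead.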

Final Front End Code

$('#download-report-button').on('click', function(e) {
  e.preventDefault();
  // see the URL setup for where this url came from
  const url = '/api/report/';
  $.get(url)
    .done(function pollAsyncResults(data) {
      // data is ReportView's response: { task_id: ... }
      const pollAsyncUrl = `/api/poll_async_results/${data.task_id}`;
      $.get(pollAsyncUrl)
        .done(function(asyncData, status, xhr) {
          // if the status isn't 202, the task finished successfully
          if (xhr.status !== 202) {
            // no new setTimeout is scheduled, so polling stops here.
            // To download, create a hidden anchor element and
            // simulate a click. Browsers ignore the download
            // attribute for cross-origin URLs such as an S3 link;
            // there the response headers decide how the file is saved.
            const a = document.createElement('a');
            a.style.display = 'none';
            a.href = asyncData.location;
            a.download = asyncData.filename;
            document.body.appendChild(a);
            a.click();
            a.remove();
            // change the button text back to normal
            $('#download-report-button').text('Download Report (CSV)');
          }
          // a 202 status means the async task is still processing
          else {
            $('#download-report-button').text('Loading...');
            // call pollAsyncResults again after waiting 0.5 seconds
            setTimeout(function() { pollAsyncResults(data); }, 500);
          }
        })
        // see PollAsyncResultsView in the back end setup. If the
        // celery task fails, it saves a JSON blob with status_code
        // 500 and PollAsyncResultsView returns a 500 response.
        // No new setTimeout is scheduled, so polling stops here.
        .fail(function(xhr, status, error) {
          $('#download-report-button').text('Download Report (CSV)');
          // add a message, modal, or something to show the user
          // that there was an error; the error in this case comes
          // from the asynchronous task
        });
    })
    .fail(function(xhr, status, error) {
      // add a message, modal, or something to show the user that
      // there was an error; the error in this case comes from the
      // request that starts the async task
    });
});

We added failure handlers both for the main request (the one that kicks off the async task) and for the requests that poll for the async task’s results. It’s not in the code example, but Compt handles errors by showing a modal with the error message inside.
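The same client flow, with its two distinct failure points, can be modelled in Python for illustration. start, poll, KickoffError and ReportTaskError are hypothetical names. The max_polls cap is an addition in this sketch that the JavaScript above does not have. start stands in for the GET to /api/report/, and poll for one GET to the poll endpoint; both return (HTTP status, parsed body).

```python
import time
from typing import Callable, Optional


class KickoffError(Exception):
    """The request that starts the report task failed."""


class ReportTaskError(Exception):
    """The task ran but saved a status_code 500 result."""


def download_report(
    start: Callable[[], tuple[int, dict]],
    poll: Callable[[str], tuple[int, Optional[dict]]],
    interval_s: float = 0.5,
    max_polls: int = 120,
) -> dict:
    """Start a report task, then poll until it succeeds or fails."""
    status, body = start()
    if status != 202:
        raise KickoffError(f"report request failed with HTTP {status}")
    task_id = body["task_id"]
    for _ in range(max_polls):
        status, result = poll(task_id)
        if status == 200:
            return result  # has "location" and "filename"
        if status != 202:
            # the task (or the poll request itself) failed
            raise ReportTaskError((result or {}).get("error_message", f"HTTP {status}"))
        time.sleep(interval_s)  # still running; ask again later
    raise TimeoutError(f"task {task_id} still running after {max_polls} polls")


# Usage: the task is still running on the first poll, then fails.
responses = iter([(202, None), (500, {"status_code": 500, "error_message": "S3 upload failed"})])
caught = None
try:
    download_report(lambda: (202, {"task_id": "t1"}), lambda _id: next(responses), interval_s=0)
except ReportTaskError as err:
    caught = str(err)
print(caught)  # S3 upload failed
```

Keeping the two exception types separate is what lets the UI tell "we couldn't start your report" apart from "your report failed while generating".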

How did you implement asynchronous tasks to download files? Please give feedback or corrections if you know a better way to implement this feature.
