How to know whether a file was fully downloaded with Python and Nginx

Artem Hoholiuk
Published in Go Wombat, Feb 13, 2020

Introduction

A few weeks ago, I got an interesting task on my current project: find out whether a file was completely downloaded from a server and record that event in the database through an API. I looked for a solution and found one on StackOverflow that suggested using the Nginx server. For the API service, I used Django and Django Rest Framework. Once I started implementing the described solution, I realized it had many disadvantages, so here I’m going to show my own solution to the task.

First of all, you need to create a few microservices:

  • web (for managing API requests);
  • nginx (the Nginx web server);
  • logger (the part responsible for gathering logs and sending them to the web part).

Also, it is good practice to use docker containers for this purpose.

Nginx part

Now it is time to write the nginx.conf file. We will store information about every request in a custom log format, so first give the format a name, in our case ‘download’. Then describe the fields of the future log file: the client IP, the time of the request, the request line, the status code, the number of body bytes sent, and the URI of the file. To learn more, check the Nginx documentation. Traffic is the main parameter: in the API, you compare its value with the actual file size to find out whether the file was downloaded completely (a little trick).

log_format download '{"remote_addr" : "$remote_addr",' 
' "time":"$time_local",'
' "request":"$request", '
' "status":"$status", '
' "traffic":$body_bytes_sent, '
' "uri": "$uri" }';
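To illustrate what a line in this format looks like and how the "little trick" could work on the API side, here is a minimal sketch. The sample log line, the is_fully_downloaded helper, and the report.pdf file are hypothetical and not part of the original project; the sketch also assumes the API can read the served file from disk.

```python
import json
import os
import tempfile


def is_fully_downloaded(log_line: str, media_root: str) -> bool:
    """Compare the byte count Nginx logged with the file's size on disk."""
    entry = json.loads(log_line)
    file_name = os.path.basename(entry["uri"])
    expected_size = os.path.getsize(os.path.join(media_root, file_name))
    return entry["traffic"] == expected_size


# Hypothetical line in the 'download' format (all values are made up).
SAMPLE_LINE = (
    '{"remote_addr" : "172.18.0.1", "time":"13/Feb/2020:10:00:00 +0000", '
    '"request":"GET /media/report.pdf HTTP/1.1", "status":"200", '
    '"traffic":1024, "uri": "/media/report.pdf" }'
)

if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as media_root:
        # Create a 1024-byte stand-in for the served file.
        with open(os.path.join(media_root, "report.pdf"), "wb") as handle:
            handle.write(b"\0" * 1024)
        print(is_fully_downloaded(SAMPLE_LINE, media_root))
```

Because "traffic" is written without quotes in the format, it arrives as a JSON number and can be compared with the file size directly.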

/media is the directory in the Django app that stores files, so the location gets an alias pointing at it together with a proxy_pass to the web application. After this, add an access_log directive that writes requests for this location to a separate log file in the ‘download’ format.

location /media {
    proxy_pass http://web:8000;
    default_type application/octet-stream;
    alias /web/media/;
    access_log /var/log/nginx/download.checker.log download;
}

Everything else is like an ordinary Nginx config. The full config:

worker_processes auto;

events {}

http {

    log_format download '{"remote_addr" : "$remote_addr",'
                        ' "time":"$time_local",'
                        ' "request":"$request", '
                        ' "status":"$status", '
                        ' "traffic":$body_bytes_sent, '
                        ' "uri": "$uri" }';

    server {
        listen 80; # the port your site will be served on
        charset utf-8;

        client_max_body_size 700M; # max upload size

        # configs for the Django app
        location / {
            proxy_pass http://web:8000;
            proxy_redirect off;
            proxy_set_header Host $host;
            proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
        }

        location /media {
            proxy_pass http://web:8000;
            default_type application/octet-stream;
            alias /web/media/;
            access_log /var/log/nginx/download.checker.log download;
        }
    }
}

In the same directory, add a simple Dockerfile for Nginx:

FROM nginx

WORKDIR /nginx_logs
COPY nginx.conf /etc/nginx

EXPOSE 80

Logger part

Now it is the logger service’s turn. In the project root, create a logger directory, add a logs directory to it, and put a download.py file inside. You need to read data from the log file in real time. Use the Popen class from Python’s subprocess module:

log = subprocess.Popen(
    ['tail', '-F', 'download.checker.log'],
    stdout=subprocess.PIPE,
    stderr=subprocess.PIPE
)

while True:
    send_stats(log)
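If the tail binary is not available in your image, the same idea can be sketched in pure Python. This is a swapped-in alternative, not the approach the article uses: the follow generator below is a hypothetical helper that polls the file for new lines. Unlike tail -F, it reads from the start of the file and does not handle log rotation.

```python
import os
import tempfile
import time
from typing import Iterator, Optional


def follow(path: str, poll_interval: float = 0.5,
           max_idle_polls: Optional[int] = None) -> Iterator[str]:
    """Yield complete lines from `path`, waiting for new ones to be appended.

    Pass `max_idle_polls` to stop after that many empty polls in a row
    (handy for tests); leave it as None to follow the file forever.
    """
    idle_polls = 0
    pending = ""  # holds a partially written line until its newline arrives
    with open(path, "r", encoding="utf-8") as handle:
        while max_idle_polls is None or idle_polls < max_idle_polls:
            chunk = handle.readline()
            if not chunk:
                # Nothing new yet: wait and try again.
                idle_polls += 1
                time.sleep(poll_interval)
                continue
            idle_polls = 0
            pending += chunk
            if pending.endswith("\n"):
                yield pending.rstrip("\n")
                pending = ""


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as log_dir:
        log_path = os.path.join(log_dir, "download.checker.log")
        with open(log_path, "w", encoding="utf-8") as handle:
            handle.write('{"traffic":1024, "uri": "/media/report.pdf" }\n')
        # Stop after one empty poll so the demo terminates.
        for line in follow(log_path, poll_interval=0.0, max_idle_polls=1):
            print(line)
```

In the logger service you would call follow without max_idle_polls and pass each yielded line to the code that posts it to the API.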

The send_stats function sends data about each request to the web application, so you need to install the requests module. The full example is below:

import subprocess
import json
import requests
import logging


def send_stats(log: subprocess.Popen):
    # Block until tail emits the next log line, then parse it as JSON.
    data = json.loads(log.stdout.readline())
    payload = {
        "traffic": data["traffic"],
        "file": data["uri"].split("/")[-1],
        # you can add your fields
    }

    try:
        response = requests.post(
            url="http://web:8000/api/v1/stats",
            json=payload
        )
        response.raise_for_status()
    except Exception as e:
        logging.error(e)
    else:
        logging.info(f"Log was sent to web API {payload}")


if __name__ == "__main__":
    # Without this, logging.info messages are dropped (default level is WARNING).
    logging.basicConfig(level=logging.INFO)

    log = subprocess.Popen(
        ['tail', '-F', 'download.checker.log'],
        stdout=subprocess.PIPE,
        stderr=subprocess.PIPE
    )

    while True:
        send_stats(log)
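Note that send_stats parses each line directly, so a single blank or malformed line (for example, a request line containing an unescaped quote) would raise and stop the loop. A minimal sketch of a more defensive parsing step follows; build_payload is a hypothetical helper, not part of the original script.

```python
import json
from typing import Optional


def build_payload(raw_line: bytes) -> Optional[dict]:
    """Turn one raw log line into an API payload, or None if it should be skipped."""
    line = raw_line.strip()
    if not line:
        return None  # blank line, nothing to send
    try:
        entry = json.loads(line)
    except ValueError:
        # Covers both invalid JSON and undecodable bytes.
        return None
    if not isinstance(entry, dict) or "traffic" not in entry or "uri" not in entry:
        return None  # valid JSON, but not a 'download' log entry
    return {
        "traffic": entry["traffic"],
        "file": entry["uri"].split("/")[-1],
    }


if __name__ == "__main__":
    print(build_payload(b'{"traffic":1024, "uri": "/media/report.pdf" }\n'))
    print(build_payload(b"not json\n"))
```

The loop could then skip a line whenever build_payload returns None and post the payload otherwise.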

And the Dockerfile. Nothing out of the ordinary here, but it deserves special attention because it uses the slim variant of the Python image, which keeps the image size small.

FROM python:3.8-slim-buster

WORKDIR /logs
RUN pip install requests
COPY /logs /logs

CMD ["python", "download.py"]

Docker-compose file

As the final step, write a docker-compose file with a shared volume. Why do you need it? It is simple: your log file is in the Nginx container, and the logic for sending data is in another one. So, create a special volume, shared_logs, and connect both containers to it.

version: '3.7'
services:

  web:
    # your application with API

  nginx:
    build:
      context: ./nginx
    links:
      - web
    volumes:
      - "shared_logs:/var/log/nginx/"
    ports:
      - "80:80"

  logger:
    build:
      context: ./logger
    links:
      - web
    volumes:
      - "shared_logs:/logs"

volumes:
  shared_logs:

Conclusion

As you see, it is a simple task, but it may take a long time if you don’t know where to start. I hope you liked my article. Thank you for reading :)
