Better Programming

Advice for programmers.

Serving Static Assets in Django With Kubernetes

9 min read · Aug 21, 2020


This article focuses on a strategy to deploy static assets, specifically assets generated with the create-react-app toolchain. The assets are served using Django deployed in a Kubernetes cluster on the Google cloud GKE platform.

The technologies used include Django, React, Docker, and gcloud with its related CLIs, e.g. gsutil. The article walks through an example Docker build pipeline that generates the image and pushes the static assets to a Google Cloud Storage bucket. It borrows heavily from Google’s own guide titled “Running Django on Google Kubernetes Engine” (referred to as the Django GKE guide for the rest of the article).

The Django GKE guide has some blind spots, or rather specific use cases that it does not cover, and some of these gaps are addressed here. Anything not mentioned in this article is assumed to be covered in the Django GKE guide (e.g., creating GKE clusters, buckets, and databases), so refer to it if you need the whole picture.

The article is divided into three main sections: the React App section, the Django section, and the build and deploy section. Everything in this article can be found in the django_react_k8s GitHub repository. As such, we will only look at the core points as opposed to going into detail on how to set it all up.

React App Setup

In the django_react_k8s repository, the static assets can be found in the ./static folder and were created using the following command.

npx create-react-app static

This installs a standard React application, but we need to make some modifications in order to serve static assets to Django in a production setting. The first set of modifications is to install packages that allow us to customize the create-react-app config. These packages can be installed by running:

yarn add -D react-app-rewired customize-cra webpack-bundle-tracker@0.4.3 write-file-webpack-plugin
  • The first package is the react-app-rewired which overrides the create-react-app webpack configs without having to eject it.
  • The customize-cra works in conjunction with react-app-rewired and provides useful utilities to perform the overrides.
  • The webpack-bundle-tracker spits out some stats about the webpack compilation process to a file. In this case, the stats file will provide an interface that makes Django aware of the different assets that have been generated, including their locations on disk or in the cloud. This is useful as filenames tend to be dynamically hashed during the production build process, and it will be quite cumbersome to keep hardcoding the changes in the Django app. Note the version to install is 0.4.3 due to a bug with the package in newer versions (see issue 227).
  • The write-file-webpack-plugin is a handy package that writes the generated static files to the file system during development. This allows the content built by yarn start to also be found and served by Django locally.

The next set of customizations is to the React package.json file. Update the scripts as follows (read the react-app-rewired documentation for more detail). Note that the eject script keeps calling react-scripts, as the react-app-rewired documentation advises against switching it.

"scripts": {
  "start": "react-app-rewired start",
  "build": "react-app-rewired build",
  "test": "react-app-rewired test",
  "eject": "react-scripts eject"
}

In order to actually override the webpack settings, we need to create a config-overrides.js file within the static folder. The full file is in the django_react_k8s repository (check the customize-cra documentation for more detail).

This should be quite straightforward if you have used React before, or if you read through the docs for the individual packages. The important modifications are in the config.output.publicPath setting. In production, the publicPath should point to the Google cloud bucket, and otherwise, in the development cycle, should be the original setting. In addition, the variable REACT_GCLOUD_BUCKET is passed in during the npm build process and is useful as it allows for different bucket locations to be set, e.g., one for staging and another for production deployments. Changing the output.path to the buildPath allows Django to find the static files.

The optimization.splitChunks.name, optimization.runtimeChunk, and output.futureEmitAsset needed to be configured to get the setup to work in Django. I didn’t have time to investigate the root cause as to why this was the case. If you know, perhaps add a comment as to why.

Django Setup

In addition to the django-admin startproject command and the boilerplate code generated, some modifications were made. The only Python package required is the django-webpack-loader, which allows for the transparent use of webpack in Django. This can be installed by running pip install django-webpack-loader. The documentation for how to set it up can be found in the django-webpack-loader GitHub page. However, for this use case, the modifications to the settings file should be as follows:

INSTALLED_APPS = [
    ...,
    'webpack_loader',
]

# React build output (static/build) and the stats file written by webpack-bundle-tracker.
REACT_STATIC_PATH = BASE_DIR / 'static' / 'build'
REACT_STATS_PATH = REACT_STATIC_PATH / 'webpack-stats.json'

WEBPACK_LOADER = {
    'DEFAULT': {
        'CACHE': not DEBUG,  # cache the parsed stats file outside of debug mode
        'BUNDLE_DIR_NAME': str(REACT_STATIC_PATH),
        'STATS_FILE': str(REACT_STATS_PATH),
        'POLL_INTERVAL': 0.1,
        'TIMEOUT': None,
        'IGNORE': [r'.+\.hot-update.js', r'.+\.map'],
    },
}

First, add the webpack_loader package to the installed apps. Next, configure the loader behavior using the WEBPACK_LOADER setting. Within the dictionary, BUNDLE_DIR_NAME should point to the React build folder, i.e., to static/build/ as opposed to the root static folder, as we are only interested in the output of the React build process. STATS_FILE points to the webpack-stats.json file inside the build folder, which is generated by the webpack-bundle-tracker from the React section. This configuration allows Django to serve the hashed files generated during yarn start or yarn build.
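To make the role of the stats file more concrete, here is a minimal sketch of the kind of lookup a loader performs on it. It assumes the layout I understand webpack-bundle-tracker 0.4.x to write (a top-level status plus a chunks mapping from bundle name to file entries). The file names and the example-bucket URL are made up for illustration, and this is not django-webpack-loader's actual implementation.

```python
import json

# Hypothetical stats content, in the shape assumed for webpack-bundle-tracker 0.4.x.
BASE = "https://storage.googleapis.com/example-bucket/"
SAMPLE_STATS = json.dumps({
    "status": "done",
    "chunks": {
        "main": [
            {"name": "static/js/main.3f2a1b.chunk.js",
             "publicPath": BASE + "static/js/main.3f2a1b.chunk.js"},
            {"name": "static/js/main.3f2a1b.chunk.js.map",
             "publicPath": BASE + "static/js/main.3f2a1b.chunk.js.map"},
        ],
    },
})


def bundle_urls(stats_text, bundle_name, ignore_suffixes=(".map",)):
    """Return the public URLs of one bundle, skipping ignored file types."""
    stats = json.loads(stats_text)
    if stats.get("status") != "done":
        raise RuntimeError("webpack has not finished compiling")
    return [
        entry["publicPath"]
        for entry in stats["chunks"][bundle_name]
        if not entry["name"].endswith(ignore_suffixes)
    ]


print(bundle_urls(SAMPLE_STATS, "main"))
```

The point of the sketch is that Django never needs to know the hashed file names up front; it only needs the stats file, which is why that file is the one static artifact that has to travel with the application.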

Additionally, Django needs to know the location to collect the static files to and where to serve them from in production. This is done by adding the following to the settings.py file.

import os

STATICFILES_DIRS = [REACT_STATIC_PATH / 'static']
STATIC_ROOT = BASE_DIR / 'collectstatic'

if IS_DEPLOYED:
    # Serve static files from the bucket when running in the cluster.
    bucket = os.environ.get('GCLOUD_ASSET_BUCKET')
    STATIC_URL = f'https://storage.googleapis.com/{bucket}/static/'
else:
    STATIC_URL = '/static/'

The STATICFILES_DIRS setting tells Django where the static files of interest are located and is set inside the React build path, which should resolve to static/build/static. It’s a bit convoluted, but it is what it is. Next, STATIC_ROOT is quite straightforward and just tells Django where to collect the static files to. The important bit is to give Django the location to serve the static files from when deployed, as indicated by the IS_DEPLOYED variable. Here GCLOUD_ASSET_BUCKET is set as an environment variable, which is useful for separating staging from production buckets. The python-dotenv package can be used to store and read the environment variables. Check django_react_k8s for more detail; the env file should be under ./env/.env. Note the use of Python f-strings, so at a minimum, Python 3.6 is needed.
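The URL selection can also be pulled into a small function, which makes it easy to check outside of Django. This is only a sketch: the static_url name, the example-staging-assets bucket, and the guard against a missing variable are my own additions for illustration and are not part of the settings file in the repository.

```python
import os


def static_url(is_deployed, environ=os.environ):
    """Pick the URL prefix Django puts in front of static file paths."""
    if not is_deployed:
        return "/static/"
    bucket = environ.get("GCLOUD_ASSET_BUCKET")
    if not bucket:
        # Fail early instead of silently producing ".../None/static/".
        raise RuntimeError("GCLOUD_ASSET_BUCKET must be set when deployed")
    return f"https://storage.googleapis.com/{bucket}/static/"


print(static_url(False))
print(static_url(True, {"GCLOUD_ASSET_BUCKET": "example-staging-assets"}))
```

Failing loudly when the variable is missing is a design choice: with a plain os.environ.get, a forgotten variable would produce a URL containing the literal text None and every asset request would 404.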

The last modification that’s needed is to the base HTML file of the Django application, as follows.

{% load render_bundle from webpack_loader %}
<!DOCTYPE html>
<html lang="en">
  <body>
    <div id="root"></div>
    {% render_bundle 'vendor' %}
    {% render_bundle 'main' %}
  </body>
</html>

More of the available configuration can be found by reading the django-webpack-loader documentation.

Testing Setup

Testing the application in the local environment is quite straightforward. First, run yarn start and check that the default create-react-app page is shown on http://localhost:3000/. Next, run python manage.py runserver, which should run on the default port 8000. Navigating to http://localhost:8000/ should show the exact same page as the default create-react-app page.

Building and Deploying

In this section, there is overlap with the Django GKE guide, which goes into greater depth on how to deploy to Kubernetes. This section will focus more on making sure the docker image is built up correctly, ready to serve static assets once deployed to the cluster.

Docker

The first thing to look at is the Dockerfile, which is as follows:

FROM python:3.6.11-slim
ENV PYTHONUNBUFFERED 1
RUN mkdir /django_react_k8s
WORKDIR /django_react_k8s
RUN pip install --upgrade pip
COPY requirements.txt /django_react_k8s/
RUN pip install -r requirements.txt
COPY . /django_react_k8s/
CMD python manage.py runserver 0.0.0.0:8000

It’s a fairly standard configuration, so I won’t go into detail about it.

The next file is the .dockerignore file. This is important as it keeps the docker image small and prevents sensitive data from ending up in the image.

... # standard excludes e.g. .git and such
# custom
*Dockerfile*
**/node_modules
static
!static/build/webpack-stats.json
collectstatic
# env

This includes common ignore patterns, as well as some project-specific ones. The main exclude is node_modules, which is large and is not needed by the Django container.

The static folder is ignored, with the exception of the webpack-stats.json file as indicated by the exclamation mark. The webpack-stats.json is the only static file the docker image needs to contain. Everything else, including the location of the built static assets, is contained in the stats file as defined in the publicPath. This means each Docker container should in theory have a unique stats file if there was an npm build step in between. In practice, I usually also add the env file to the ignore file, as it contains sensitive data such as API keys and such, and then serve it as a Kubernetes secret or config map as needed.

Build scripts

The following is a basic example of a build and deploy pipeline.

The build phase steps are as follows:

  • Sanitize the React build folder using rm -rf static/build/*. This prevents old files from getting copied over during the collectstatic phase and keeps things lean.
  • Sanitize the Python environment as well by removing all the compiled files.
  • Run the npm build command, i.e., REACT_GCLOUD_BUCKET={get_bucket(env)} npm run --prefix static build. The bucket for the static files is provided as an environment variable, i.e., REACT_GCLOUD_BUCKET. Since the command is run outside the static folder, the --prefix argument is needed to indicate the build source, i.e., the location of the React app. This generates the static files under ./static/build/, and only ./static/build/webpack-stats.json is copied into the docker image.
  • The docker image is then built, which is a fairly standard process, i.e., docker build -t {get_docker_image(tag, env)} . (the trailing dot is the build context).
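The build steps above can be sketched as a small Python helper that assembles the commands in order. The get_bucket and get_docker_image bodies, the bucket names, the gcr.io/example-project image path, and the find command used to remove compiled Python files are all assumptions for illustration; the real helpers are in the repository and may look different.

```python
import shlex

# Hypothetical naming scheme; stand-ins for the helpers used in the article.
BUCKETS = {
    "staging": "example-staging-assets",
    "production": "example-production-assets",
}


def get_bucket(env):
    return BUCKETS[env]


def get_docker_image(tag, env):
    return f"gcr.io/example-project/django-react-k8s-{env}:{tag}"


def build_commands(tag, env):
    """Return the shell commands of the build phase, in order."""
    bucket = shlex.quote(get_bucket(env))
    image = shlex.quote(get_docker_image(tag, env))
    return [
        "rm -rf static/build/*",                # drop old React build output
        "find . -name '*.pyc' -delete",         # assumed way to drop compiled Python files
        f"REACT_GCLOUD_BUCKET={bucket} npm run --prefix static build",
        f"docker build -t {image} .",
    ]


for command in build_commands("v1", "staging"):
    print(command)
```

Each string could then be executed with subprocess.run(command, shell=True, check=True), so the pipeline stops at the first failing step. Returning the commands instead of running them directly also makes a dry run trivial.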

The deploy phase is as follows:

  • Sanitize the static folder to prevent old files from being transferred to the server, i.e., rm -rf collectstatic/*.
  • Collect the static files using python manage.py collectstatic -i node_modules --noinput. It’s important to ignore node_modules as it’s no longer required and adds significant redundancy to the bucket during the upload process.
  • The files are copied to the gcloud bucket using gsutil -m cp -Z -a public-read -r ./collectstatic/* gs://{get_bucket(env)}/static. This part differs from the prescribed method in the Django GKE guide, where the suggestion is to do an rsync, which only copies the changes, as opposed to a cp command, which does a full copy and replace. Even though rsync works, cp is superior in this use case purely because of the -Z argument. According to the documentation, -Z applies gzip content encoding to file uploads. Not only does it apply gzip encoding, but whenever a user requests the files, Cloud Storage returns the right representation based on the Accept-Encoding header sent by the client: the gzipped object for clients that accept gzip, and a decompressed copy otherwise. Thus there is no longer any need to play with the webpack config file to generate gzipped assets, or mess around with the nginx (or similar server) config to serve gzipped assets. All this is done for you. A second point of difference is the -a public-read argument. This is not mentioned in the Django GKE guide, but it makes the individual objects you copy to the bucket publicly available while everything else that was in the bucket remains private.
  • Finally, the docker image is pushed to the container registry, i.e., docker push {get_docker_image(tag, env)}.
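The deploy steps can be sketched in the same style. The deploy_commands name and the example bucket and image values are hypothetical; the gsutil, collectstatic, and docker commands are the ones listed above.

```python
import shlex


def deploy_commands(bucket, image):
    """Return the shell commands of the deploy phase, in order."""
    bucket = shlex.quote(bucket)
    image = shlex.quote(image)
    return [
        "rm -rf collectstatic/*",  # drop previously collected files
        "python manage.py collectstatic -i node_modules --noinput",
        # -Z gzips on upload, -a public-read makes the uploaded objects public
        f"gsutil -m cp -Z -a public-read -r ./collectstatic/* gs://{bucket}/static",
        f"docker push {image}",
    ]


commands = deploy_commands(
    "example-staging-assets",
    "gcr.io/example-project/django-react-k8s-staging:v1",
)
for command in commands:
    print(command)
```

Pushing the image last is deliberate in this sketch: by the time a new image can be pulled, the assets its webpack-stats.json points to are already in the bucket.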

And that’s the build and deploy phase.

Kubernetes

This part briefly highlights the Kubernetes deploy phase. As it has been covered in the Django GKE guide, there is not going to be a lot of detail about it. For simplicity's sake, the example in this case is a single pod and a service pointing to the pod, each with its own specification file.

It’s quite straightforward and can be deployed to a Minikube cluster (see my previous article on Docker and Minikube for reference) or a GKE cluster, as highlighted in the Django GKE guide.

The files are in the sample repo for this article and can be deployed to the cluster by running the following command from the root folder:

kubectl apply -f k8/
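For readers without the repository at hand, the following is a rough sketch of what a single pod plus a service pointing at it could look like, generated as JSON (which kubectl apply also accepts). The pod_and_service name, the app label, the image path, and the port are assumptions for illustration; the actual specification files in the k8/ folder may differ.

```python
import json


def pod_and_service(name, image, port=8000):
    """Build a Pod and a Service that selects it, wrapped in a v1 List."""
    labels = {"app": name}
    pod = {
        "apiVersion": "v1",
        "kind": "Pod",
        "metadata": {"name": name, "labels": labels},
        "spec": {
            "containers": [
                {"name": name, "image": image, "ports": [{"containerPort": port}]}
            ]
        },
    }
    service = {
        "apiVersion": "v1",
        "kind": "Service",
        "metadata": {"name": name},
        # The selector must match the pod labels for traffic to reach the pod.
        "spec": {"selector": labels, "ports": [{"port": port, "targetPort": port}]},
    }
    # A v1 List lets kubectl apply both objects from a single JSON file.
    return {"apiVersion": "v1", "kind": "List", "items": [pod, service]}


manifest = pod_and_service(
    "django-react-k8s",
    "gcr.io/example-project/django-react-k8s-staging:v1",
)
print(json.dumps(manifest, indent=2))
```

Saving the printed output to a file and passing it to kubectl apply -f would create both objects. In a real setup a Deployment is usually preferred over a bare pod, but a single pod keeps the example small.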

If you are following the GKE guide and have deployed to the cluster, in order to see the deployed pod in action without creating a load balancer and such, run the following port forwarding command:

kubectl port-forward <name_of_pod> 8000:8000

And you should see the same result as in the development environment on http://localhost:8000/.

Conclusion

This article has been an overview of one way to deploy static assets in a Django and Kubernetes application. The strategy is not tied to Kubernetes and can also be used in a bare-metal installation.

The main focus has been to separate the static content from the Django application context and thus create a separation of concerns where some of the responsibility is offloaded to the Google cloud platform. The platform has been optimized to deliver content so the application can focus on other traffic. There is no need to worry about cache invalidation as the static files are hashed. Thus, whenever a new deployment is rolled out containing the webpack-stats.json file, provided that the assets defined in the file exist in the bucket, the user will always get the new assets while the old assets will continue existing in case there needs to be a rollback on a deployment.

If you have any questions or anything needs clarification, you can book a time with me on https://mbele.io/mark


Written by Mark Gituma
