Modern Django Settings

The Gorleston Psalter: detail of a marginal scene of a man plowing with oxen, with a butterfly above (http://blogs.bl.uk/digitisedmanuscripts/2012/10/more-gorleston-psalter-virility-profane-images-in-a-sacred-space.html)

There is an abundance of advice on Django settings on the web, but in my opinion it is not always correct. I’d like to discuss what I think is an acceptable modern way to do settings. Firstly, let’s look at the assumptions for the kind of apps I’m talking about:

  • Commercial software that aims to achieve security, scalable performance, manageability, etc.
  • Containers (like Docker/Kubernetes)
  • Microservices

If you are doing a small app that does not have such requirements, you might use a simpler settings scheme than the one I suggest below. But how many times does it happen that you start a small project and it becomes a big project and you wish you’d used big project practices?

Fundamentals of settings

Two fundamental rules for Django settings:

  • Load customisations from the environment
  • Reduce complexity: use simple assignments without dependencies or side effects

You should stick to those two rules under all circumstances. Furthermore, I’ll express an opinion about two other things:

  • Use a deployment identifier, a string to identify your deployment
  • Use a key/value store for some configuration

I’ll get to each of these in turn.

Load from the environment

There is a pattern that many Django developers recommend where you have multiple Python settings files depending on whether your deployment is development, test, production, etc. Don’t do this. Load variables that are different depending on the deployment via the environment. For clarity, that means this:

SITE_URL = os.environ.get('SITE_URL')

If you find it difficult for some reason to set environment variables, use an environment file and load that. There are some excellent PyPI modules that will do this for you (like https://github.com/theskumar/python-dotenv).
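A small helper makes required variables explicit and keeps defaults visible. The `env()` helper is an illustrative name, not part of Django or python-dotenv, and the `setdefault` call only simulates what your deployment automation would normally set:

```python
import os

def env(name, default=None, required=False):
    """Read one setting from the environment (illustrative helper)."""
    value = os.environ.get(name, default)
    if required and value is None:
        raise RuntimeError(f'missing required environment variable: {name}')
    return value

# Simulate what deployment automation would normally provide:
os.environ.setdefault('SITE_URL', 'https://example.com')

SITE_URL = env('SITE_URL', required=True)
DEBUG = env('DEBUG', default='false').lower() in ('1', 'true', 'yes')
```

With this in place, a missing required variable fails loudly at startup rather than surfacing as a mysterious `None` later.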

The multiple-Python-file pattern has been around a long time and is very persistent. Avoid it. Check https://12factor.net/config if you want an independent authority. Normally, in a significant project, your devops team will set the environment for you with deployment automation scripts.

Use simple assignments

Your settings module needs to load cleanly every time from django.conf import settings runs, in numerous circumstances that require it to have no side effects or dependencies. Don’t try to get things from AWS, for instance (your AWS secret key is part of settings!), contact other remote third-party services, create directories, etc. Just assign variables. Nothing else (with one potential exception mentioned below).

When working in a team, or with multiple teams that depend on the settings file, every developer needs to be confident that nothing crazy is happening while settings load. You should not have to be familiar with every line of the settings module: humans should be able to follow the logic of how settings are defined. Likewise, it is better not to modify manage.py, because that results in unpredictability for the rest of the team and for developers new to the team.
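To illustrate the rule, a settings module should contain nothing but plain, linear assignments; the commented-out lines show the kinds of side effects that belong in deployment automation instead (the variable names here are illustrative):

```python
import os

# Good: plain, linear assignments with values taken from the environment.
AWS_ACCESS_KEY_ID = os.environ.get('AWS_ACCESS_KEY_ID', '')
MEDIA_ROOT = os.environ.get('MEDIA_ROOT', '/var/media')

# Bad: side effects and remote calls do not belong in settings.
# os.makedirs(MEDIA_ROOT)              # creates directories on import
# boto3.client('s3').list_buckets()    # contacts a remote service on import
```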

You might use a settings module like this:

BASE_DIR/mysite/settings/
    __init__.py   # most settings are here
    logging.py    # logging settings
    celery.py     # celery settings

Logging and celery settings are imported in __init__.py. Split it out further depending on what you think you require to stay sane. Note that these are not alternative settings; we are just splitting things out to make management of settings easier.
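A sketch of how the split works: __init__.py simply re-exports the split-out files, so any importer sees one flat namespace. The snippet below builds a throwaway copy of that layout on disk just to show the mechanics; the setting names inside are illustrative:

```python
import os
import sys
import tempfile
import textwrap

# Build a throwaway package mirroring mysite/settings/ from above.
root = tempfile.mkdtemp()
pkg = os.path.join(root, 'settings')
os.makedirs(pkg)

with open(os.path.join(pkg, 'logging.py'), 'w') as f:
    f.write("LOG_LEVEL = 'INFO'\n")                # illustrative setting
with open(os.path.join(pkg, 'celery.py'), 'w') as f:
    f.write("CELERY_TASK_ALWAYS_EAGER = False\n")  # illustrative setting
with open(os.path.join(pkg, '__init__.py'), 'w') as f:
    f.write(textwrap.dedent("""\
        # Most settings live here; the split files are just re-exported.
        from .logging import *  # noqa: F401,F403
        from .celery import *   # noqa: F401,F403
        DEBUG = False
    """))

sys.path.insert(0, root)
import settings  # one flat namespace; these are not alternative settings
```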

Use a deployment identifier

Use an environment variable to identify the deployment. There are two axes to building this string:

  • deployment level: local, test, development, staging, production
  • deployment range: application for customer A, application for customer B, etc.

Generally, you can have many local environments but they are isolated from one another. You mostly have only one devel, one test, one stage, etc. But you can have any number of services if, for instance, you achieve multi-tenancy via deployment of different processes on different nodes. Let's look at an example:

  • A: Inventory service
  • B: Product ordering service
  • C: Purchasing interface

You might have one of A and B but several instances of C for different customers. Your deployment ids will look like this for your devel deployment:

inventory-devel-0001 
product-devel-0001
purchase-devel-0001
purchase-devel-0002
purchase-devel-0003

or

inventory-prod-0001 
product-prod-0001
etc.

for production.

We follow this pattern:

<app>-<level>-<range>

Where:

  • app: name of the application or service, like inventory, product, purchase, etc.
  • level: local, devel, stage, prod
  • range: incremented number

You can use any string but I’d suggest a few rules:

  • Embed a unique id of the application, deployment level and deployment range
  • Use a delimited string (‘-’ in the example above)
  • Sluggable: make it something that can be used in a URL or an email address without urlencoding
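A small helper can build and validate identifiers following the pattern above. The regular expression below encodes the <app>-<level>-<range> scheme; the level names and the zero-padded four-digit range are taken from this article's examples, not from any standard:

```python
import re

# <app>-<level>-<range>, per the pattern above.
DEPLOYMENT_ID_RE = re.compile(
    r'^(?P<app>[a-z][a-z0-9]*)-(?P<level>local|devel|stage|prod)-(?P<range>\d{4})$'
)

def make_deployment_id(app, level, range_no):
    """Build a deployment id like 'purchase-devel-0002'."""
    return f'{app}-{level}-{range_no:04d}'

def parse_deployment_id(deployment_id):
    """Split a deployment id into its app/level/range parts, or fail loudly."""
    m = DEPLOYMENT_ID_RE.match(deployment_id)
    if m is None:
        raise ValueError(f'bad deployment id: {deployment_id}')
    return m.groupdict()
```

Validating the id at startup means a misconfigured deployment fails immediately rather than writing to the wrong queue or storage namespace.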

You might use the Kubernetes namespace for this purpose. However you do it, every service instance has a unique id. You might use this for a variety of purposes:

  • Looking up service urls
  • Namespace for AMQP queues
  • Reporting events in logs
  • Namespace for file storage in your cloud file system
  • etc.

Configuration service

This brings us to key/value stores that can be used as a configuration service. This is software that stores pieces of data against a key. Usually they will store either strings and numbers or possibly just strings; you might decide you only ever store JSON strings. There are many open-source key/value stores, but we are interested in those with specific qualities: simplicity, read performance, persistent storage (not just in-memory caching of values), reliability (clustering) and hierarchical keys. Here are some well-known examples:

etcd is massively popular. Consul can be paired with Vault, another HashiCorp project, which can use Consul as a datastore for encrypted information. Apache ZooKeeper is popular with Java projects.

This might appear to some to conflict with the Django dictum to define settings all in one place. But there are good reasons for using a key/value store as a configuration management service:

  • Keep information out of code that does not belong there (including out of helm charts or deployment automation scripts)
  • Update configuration without redeployments
  • Autodiscovery: let services register availability and endpoints dynamically

There are other key/value stores, like Redis, Riak, etc., but the ones above are specifically designed to support the configuration-service use case. You might keep URLs for the REST services of all your external integrations in the configuration store; those annoying endpoints that change all the time then won’t require redeploying your service. You really do not want to keep these in the database. For one thing, the database is only available once settings have been loaded, and configuration information should not be mixed with application information.

Consul is configured via the environment, so you can call it during the loading of settings. Yes, this conflicts with the rule not to call remote services while loading settings. But it does work and is sometimes practical, as long as you remember that if you load a Django setting from the configuration service, you may lose the ability to change that setting without at least restarting the process (uwsgi, gunicorn, etc.), depending on when you load it. Some configuration is better retrieved just-in-time (bearing in mind the cost and semantics of repeated lookups).

Don’t forget that settings are loaded when you run ./manage.py, so the configuration service would need to be available at that time. Calling the configuration service during unit tests is not desirable, and those settings should not be variable in that case anyway. So you might want to wrap calls to the configuration service to return static dict values while unit tests are running. You might also want to wrap it to store JSON strings and decode them when you read values. Consul is very easy to use (etcd is very similar):

curl http://localhost:8500/v1/kv/product-devel-0001/AWS_ACCESS_KEY
[
  {
    "LockIndex": 1,
    "Session": "1c3f5836-4df4-0e26-6697-90dcce78acd9",
    "Value": "HVPRXHALTNVZQTJYERCM",
    "Flags": 0,
    "Key": "product-devel-0001/AWS_ACCESS_KEY",
    "CreateIndex": 13,
    "ModifyIndex": 19
  }
]

This is an example of a hierarchical key (like with “folders”). Note that the live Consul KV API returns the Value field base64-encoded; it is shown decoded here for readability. Consul is perfect as a registry of service URLs for your own services and external APIs. Other variables are less useful in the configuration service. You might not store the DEBUG variable in Consul, because you would still need to restart your uwsgi worker or recreate your k8s pod if you changed it.
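The wrapping ideas mentioned above (static values during unit tests, storing JSON, decoding base64) can be sketched like this. The class and method names are my own and the standard library is used for the live lookup; a real implementation might use a client library such as python-consul instead:

```python
import base64
import json
import urllib.request

class ConfigService:
    """Sketch of a Consul-style KV wrapper (names are illustrative).

    When `static` is given (e.g. while unit tests run), values come
    from that dict and no network call is made.
    """

    def __init__(self, prefix, host='localhost', port=8500, static=None):
        self.prefix = prefix              # e.g. the deployment id
        self.base = f'http://{host}:{port}/v1/kv'
        self.static = static

    def get(self, key):
        if self.static is not None:
            return self.static[key]
        # Live lookup: Consul returns the Value field base64-encoded.
        with urllib.request.urlopen(f'{self.base}/{self.prefix}/{key}') as resp:
            payload = json.load(resp)
        return base64.b64decode(payload[0]['Value']).decode()

    def get_json(self, key):
        # Convention from this article: only ever store JSON strings.
        return json.loads(self.get(key))

# In a unit test, swap in static values instead of a live server:
cfg = ConfigService('product-devel-0001',
                    static={'INVENTORY_URL': '"https://inventory.internal"'})
```

Because the static dict is injected through the constructor, test settings stay deterministic and the configuration service never has to be reachable from CI.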

Such services also have web UIs for easy browsing and editing of values, which you may or may not wish to leave open to the world (think about that from a security point of view).

You should not change any Django framework setting after it is assigned. Remember why we use upper case as a convention in Python for constant definitions? Clearly, however, there are other variables, like external service URLs, that might change, as in the examples above.

Antipatterns

For an overview of different techniques, see https://code.djangoproject.com/wiki/SplitSettings. Most of these are not what I would do. They are too complex in some cases. They also often conflict with the idea that one team (devops) does deployment automation and another team does development (developers). Generally, devops owns your environment and you (the developer) own the Python code. Even if you start as a single developer on a project, it is wise to construct your project as though it could be managed by a full team with differentiated roles.

My opinion is that settings should be simple and easy to understand regardless of project sophistication. The following are anti-patterns for me:

Testing if a setting is available:

if hasattr(settings, "SOMEVARIABLE"):
    ...

Conditional loading:

try:
    from settings.local import *
except ImportError:
    pass

Any reference to locals() and globals():

locals()[setting] = getattr(config_module, setting)

I’m not saying such techniques can never be justified under any circumstances. But if you are starting a project and looking for a settings pattern, keep things as simple as possible for as long as possible.

Summary

In summary:

  • Customise via the environment
  • Use simple assignments without dependencies or side effects
  • Identify the deployment with a deployment identifier string
  • Use a configuration management service

The settings module needs to be a linear set of one-off assignments to variables and nothing else. It’s the first thing that happens when a runtime or manage.py is executed. Don’t try to refer to any other part of the project within settings. Avoid exotic solutions that make it hard to figure out what value a setting might have.

Resist the temptation to modify manage.py to do something that is unexpected. Any modification from the default that Django configures for you is probably bad.