Patterns for Continuous Integration with Docker on Travis CI

Part 3 of 3: Python tools for tagging & testing

Jamie Hewland
MobileForGood
8 min read · Nov 28, 2017


In the third and final installment of this blog series we will look at two tools we’ve created at Praekelt.org for working with Docker images in a continuous integration pipeline. Neither of these tools is strictly specific to Travis CI, but that is where we have used them and they form critical parts of our CI pipeline.

Before reading further, be sure to check out part 1 of the series for the basics of using Docker with Travis CI and part 2 where we describe the “Docker repo” pattern.

docker-ci-deploy (a.k.a. DCD)

The first of the two tools we’ll look at is called docker-ci-deploy, or “DCD” for short. DCD is a small Python script that can tag Docker images and push them to a registry.

We hinted at this tool all the way back in part 1 as a solution for adding version information to image tags. In part 2 it became clearer why such a tool is necessary: tagging images quickly becomes complicated. In the versioned Docker image example in part 2, the deploy section of the Travis file started to get a bit complicated.
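
The exact snippet isn’t reproduced here, but it looked something along these lines (a sketch: the image name is a placeholder and a $VERSION environment variable stands in for the real version):

deploy:
  provider: script
  script: >-
    docker login -u "$REGISTRY_USER" -p "$REGISTRY_PASS"
    && docker tag acme-corp/cake-service acme-corp/cake-service:"$VERSION"
    && docker push acme-corp/cake-service:"$VERSION"
    && docker push acme-corp/cake-service:latest
  on:
    branch: master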

This is how one can achieve the same result using DCD:
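
(Again a sketch, with the same placeholders; docker-ci-deploy is installed from PyPI, e.g. pip install docker-ci-deploy in a before_deploy step.)

# dcd tags the image and pushes all the resulting tags in one command.
# It can also manage latest and semver-style tags; see the dry-run example below.
deploy:
  provider: script
  script: >-
    docker login -u "$REGISTRY_USER" -p "$REGISTRY_PASS"
    && dcd --version "$VERSION" acme-corp/cake-service
  on:
    branch: master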

That’s much cleaner — DCD runs all the docker tag and docker push commands for us!

This is still quite a simple example, but the power of DCD becomes more obvious when we do some more complicated versioning. The documentation for the official Docker images maintained by the Docker team describes how images should be tagged with version information:

For example, given currently supported XYZ Software versions of 2.3.7 and 2.2.4, suggested aliases would be Tags: 2.3.7, 2.3, 2, latest and Tags: 2.2.4, 2.2, respectively.

Docker tags work a bit like Git tags — they are just pointers, in this case pointing to specific Docker images. They can also be updated to point to different images. This is perhaps best shown using a diagram:

Docker tags as new versions are released: 1.2.3 (1), 1.2.4 (2), and 1.3.0 (3)

In the diagram, the tags with a complete version, e.g. 1.2.3, always point to the same image, but the less precise tags like 1.2, 1, and latest are updated to point to the latest version in their series. This means that if the project practices proper semantic versioning, you can run a command like:

$ docker pull acme-corp/cake-service:1.2

…and always get the latest 1.2.x version with the latest bugfixes but (hopefully) none of the bigger, riskier changes that might have been added in 1.3.0.

docker-ci-deploy was built with this in mind and includes various configuration options for achieving these kinds of versioning patterns.

The best way to try this out is to use DCD’s dry-run functionality, which prints out the docker commands that would be run rather than executing them. For example:

$ image=acme-corp/cake-service
$ dcd --dry-run --version 1.2.3 --version-semver $image
docker tag acme-corp/cake-service acme-corp/cake-service:1.2.3
docker tag acme-corp/cake-service acme-corp/cake-service:1.2
docker tag acme-corp/cake-service acme-corp/cake-service:1
docker push acme-corp/cake-service:1.2.3
docker push acme-corp/cake-service:1.2
docker push acme-corp/cake-service:1

That’s about it for docker-ci-deploy! It’s simple, but we’ve found it very useful. Check out the project repo for full documentation.

Seaworthy 🌊 🚢

The second tool is a very new piece of software that we’ve only recently started using, but we’re quite excited about its potential. Seaworthy is a test harness for Docker containers. In other words, it allows you to write Python tests that make use of Docker containers. This makes it possible to test the Docker images that you are building in your CI pipeline.

There are several properties of your Docker image that you might (should) want to verify, such as:

  • Startup/shutdown: Does the container perform the tasks it needs to on startup? Does it shut down cleanly when receiving a signal?
  • Interaction points: Can the interaction points (e.g. database connections, volumes, etc.) be configured and do they work? What happens if these other services are unavailable?
  • Logging: Does the container log to the correct file descriptors (stdout/stderr) or files? Is the log format as expected?
  • Configuration: Can the application be configured using environment variables or other container-friendly methods? Do configuration changes take effect as expected?
  • Permissions: Do the expected processes in the container run as the correct user/group? Are file permissions correct?

These are just some areas that can be tested with Seaworthy. We’re looking forward to seeing what new uses people find for it.

I’ll introduce Seaworthy with a complete example. At Praekelt.org, we are big users of Django, a popular Python web framework. The typical infrastructure requirements for a simple Django project involve a reverse proxy (usually Nginx) and a database (usually PostgreSQL). My talk from PyConZA 2017 explored this architecture in detail, but the following is a simplified diagram of such a setup:

A basic Django deployment

It’s not important to understand all the details of this — I’m just using it as an example to show off Seaworthy. There are 3 different containers. The Nginx container (1) receives incoming HTTP requests and proxies them to the web application container (2). The web application container in the centre contains the actual application: Django served using Gunicorn. This Django application needs a database to store its data in, and that’s where the final container comes in — the PostgreSQL database container (3).

There are two Docker volumes shared between the Nginx and web application containers. The “socket volume” contains Gunicorn’s Unix socket; Nginx forwards HTTP requests to Gunicorn over this socket. The “static files” volume is used to share Django’s static files with Nginx, which can then serve them to users efficiently.

All the code for this would be far too much to share in a blog post so we’re just going to pick out a few interesting snippets from a larger repository linked below.

Docker resource definitions

The first step in using Seaworthy is to define the containers and other Docker resources that will be used in your tests. We do this using what Seaworthy calls “definitions”. Definitions provide a way to define everything about a Docker resource before that resource is actually created in Docker. We don’t have time to go through all the definitions for all the containers and volumes, so we’ll only cover the web application container definition here.
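
Here is a condensed sketch of that definition. The wait pattern, the mount paths, and the DATABASE_URL variable name are illustrative placeholders rather than the exact values in the real code, which lives in the repository:

from seaworthy.definitions import ContainerDefinition


class DjangoContainer(ContainerDefinition):
    # Log lines that indicate the container has finished starting up. The
    # exact pattern depends on the image; a Gunicorn startup line is assumed.
    WAIT_PATTERNS = (r'Booting worker',)

    def __init__(self, name, image, socket_volume, static_volume, database_url):
        super().__init__(name, image, wait_patterns=self.WAIT_PATTERNS)
        self.socket_volume = socket_volume
        self.static_volume = static_volume
        self.database_url = database_url

    def base_kwargs(self):
        # Extra parameters used when the container is created: mount the two
        # shared volumes and point Django at the database.
        # inner() returns the underlying Docker volume once it has been created.
        return {
            'volumes': {
                self.socket_volume.inner().name: {'bind': '/var/run/gunicorn', 'mode': 'rw'},
                self.static_volume.inner().name: {'bind': '/app/static', 'mode': 'rw'},
            },
            'environment': {'DATABASE_URL': self.database_url},
        }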

The DjangoContainer class

Above, we subclass ContainerDefinition, and in the constructor, __init__(), we pass the name of the container, the Docker image, and “WAIT_PATTERNS” to the super class. WAIT_PATTERNS is a list of regular expression patterns that will be matched against the container log output to determine when the container has fully started.

The base_kwargs method is overridden. This method provides a way to adjust the parameters that the container is created with. In this case, we set up the mounts for the two volumes and configure the database connection using an environment variable.

Finally, we need to configure the test framework we are using to use our definition. For this example we will use pytest. Seaworthy doesn’t have to be used with pytest but it includes a few useful integrations that make pytest a good choice.
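
Wiring the definitions up for pytest looks roughly like this condensed sketch (the image name is a placeholder, the import path for DjangoContainer is hypothetical, and the full wiring, including the Nginx container, is in the repository):

from seaworthy.containers.postgresql import PostgreSQLContainer
from seaworthy.definitions import VolumeDefinition

# DjangoContainer is the definition class sketched above; this module path
# is hypothetical.
from definitions import DjangoContainer

# The image under test; in CI this would be the image that was just built.
DJANGO_IMAGE = 'acme-corp/cake-service'

socket = VolumeDefinition('socket')
static = VolumeDefinition('static')
postgresql = PostgreSQLContainer('postgresql')

django = DjangoContainer(
    'django', DJANGO_IMAGE, socket, static, postgresql.database_url())

# Turn each definition into a pytest fixture...
socket_volume = socket.pytest_fixture('socket_volume')
static_volume = static.pytest_fixture('static_volume')
postgresql_container = postgresql.pytest_fixture('postgresql_container')

# ...and declare (by fixture name) the resources that must be set up before
# the Django container is created.
django_container = django.pytest_fixture(
    'django_container',
    dependencies=['socket_volume', 'static_volume', 'postgresql_container'])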

We create two VolumeDefinition instances named socket and static. We create an instance of DjangoContainer named django and pass in the two volumes, as well as the database connection URL (which happens to come from a PostgreSQLContainer instance).

Next, we use the pytest_fixture method to create — you guessed it — pytest fixtures. These are named socket_volume, static_volume, and django_container. This means that if we write a test that, for example, takes a parameter called django_container, pytest will automatically pass in our DjangoContainer instance, all prepared so that the container is actually created and running in Docker. Finally, we define the other Docker resources that the container needs to run using the dependencies keyword argument.

We’ll need quite a few fixtures to run our tests, with a number of dependencies between them:

A graph of all the pytest fixtures and their dependencies

Writing tests

Now that we have our fixtures, let’s test them!
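
The test itself is only a few lines. Sketched here, with the assertion on the page content chosen for illustration:

def test_admin_page(nginx_container):
    # nginx_container comes from the fixture for the Nginx definition (not
    # shown here), which depends on the Django container fixture.
    client = nginx_container.http_client()
    response = client.get('/admin')

    # Very basic checks: the admin login page is served successfully.
    assert response.status_code == 200
    assert 'Django administration' in response.text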

This test takes a parameter called nginx_container which pytest will fill with an Nginx container definition from a pytest fixture. If you recall the architecture diagram from earlier, the way we make HTTP requests to the web application is via Nginx. There’s a handy method in ContainerDefinition called http_client() which returns a Requests-based HTTP client that can be used to make requests against the container via a published port.

Here, we request the Django Admin page at the /admin path and check that we get roughly what we expect in response. These are very basic assertions, but they hide a lot of complexity. In order to serve our request, Nginx communicated with Django via the socket volume. Django needed a database in order to start up and serve the request, and it looked for static files to link to in the static volume. So we actually touched all the Docker resources in the entire architecture just by making this request!

Debugging

With all these moving parts, it’s quite common for things to go wrong in unexpected ways. For example, when I wrote the above test for the first time it failed: the request came back with a 400 Bad Request status rather than the expected 200.

What went wrong? Well, the best way to find out was to drop into a Python debugger shell, which can be done easily by passing the --pdb option to pytest.

The test failed and pytest gave me a Python debugger shell. The first thing I did was sanity-check that I still got a 400 status code — and, yes, I did. Next, I looked at what was actually in the response’s text and was greeted with a giant chunk of HTML. Unformatted HTML is hard to read, so I decided to try viewing the page in my web browser by getting the address of the Nginx container’s published port using the get_first_host_port() method.
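
The session went roughly like this (output abridged):

(Pdb) response.status_code
400
(Pdb) response.text
'<!DOCTYPE html>\n<html lang="en">\n...'
(Pdb) nginx_container.get_first_host_port()
'32772'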

Accessing http://0.0.0.0:32772/admin with my browser got me:

A Django debug page telling us we’re not allowed here

The exception message we’re given is Invalid HTTP_HOST header: ‘0.0.0.0:32772’. You may need to add ‘0.0.0.0’ to ALLOWED_HOSTS. Reading up on the ALLOWED_HOSTS setting, we see that Django will, by default, not allow us to access it from the address 0.0.0.0.

We can fix this by updating our Django settings to fetch the ALLOWED_HOSTS setting from an environment variable, as shown in this pull request.
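
The actual change is in the linked pull request; the idea boils down to something like this in the Django settings (the variable name and comma separator here are illustrative):

import os

# Allow the list of permitted Host header values to be set per-deployment
# (and by our tests), e.g. ALLOWED_HOSTS=0.0.0.0,localhost
ALLOWED_HOSTS = [h for h in os.environ.get('ALLOWED_HOSTS', '').split(',') if h]

After making the changes, our test passes!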

Conclusion

We’ve demoed docker-ci-deploy and given a sneak peek of some of Seaworthy’s functionality. These tools are open source and we hope that others find them helpful for tagging and testing their Docker images.

This brings us to the end of this blog series. Once again, be sure to check out part 1 and part 2 and thank you for reading!
