Paving the Way, part 2 — Docker all the things!


In these first installments of Paving the Way we’ll focus on the basics that the whole modernization is built upon. Since we want to build repeatable, scalable and reliable infrastructure, the first thing we need to do is make sure that whatever we build runs in Docker containers, as this is the ground-level requirement for modernization.

The aim of this article is to describe our decision-making process and the direction we decided to go, and to sprinkle some Docker best practices on top. Enjoy.

Read the previous part here: Paving the Way, part 1 — Getting started with modernization

Photo by frank mckenna on Unsplash

Why use Docker?

If you aren’t aware of the benefits Docker provides, here is a recap: Docker containers allow us to build things that run in isolated environments and behave the exact same way regardless of where they are run, be it a Linux server in the cloud or a Windows laptop on your desk. It’s a way to standardize the environment.

For production, this allows your deployments to become immutable and repeatable, so that they behave the same way regardless of where they are running or when they were started. In practice, this means that recovering a misbehaving application instance becomes a simple matter of throwing away the erroring container and spinning up a new one, scaling is just adding or removing containers, and deployments and rollbacks are just changes to the container specification you want to run; automation takes care of the rest. As a bonus, since containers are isolated from each other, it doesn’t matter where you run them: on the same server, on a bunch of different servers or a combination of both. For example, we ended up running in an environment where servers come and go dynamically according to utilization. On top of that, we are currently working on a setup where automation actively replaces servers with more cost-effective options while the containers themselves flow seamlessly between servers, and this is only possible with containers. More on that in the future, though.

In development, containers allow you to run the same setup as production on your local machine, down to the version numbers of native libraries, which (mostly) removes the oh-so-annoying “Works On My Computer” issues, since everyone is running their development code in a standardized environment. It also means that onboarding new developers becomes fast and simple, since you can use docker-compose to describe and create the whole development environment. For example, we have a rule that you should be able to start every local setup (databases et al. included) by just running docker-compose up (though some level of configuration may apply, e.g. deciding which ports to use for HTTP traffic).

Registry

For storing the Docker images, we ended up using Docker Cloud because it is the default registry for the Docker CLI, meaning you can just do docker push allermedia/my-fancy-project:v1.0.1 instead of having to add the whole URI of the registry every time, which introduces more possibilities for human error. At the point of selection we also weren’t exactly sure which cloud provider we would eventually end up with, so Docker Cloud allowed us to keep the application images separate from the platform provider.

Now, however, that we are squarely in bed with AWS and all Docker-related tasks are done in CI, choosing another registry (such as AWS ECR) would make sense, especially since it doesn’t require any extra configuration when running in AWS services.
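To illustrate the difference, here is a rough sketch of pushing to Docker Hub / Docker Cloud versus a provider-specific registry such as ECR (the AWS account ID and region below are made up for illustration):

# Docker Hub / Docker Cloud is the CLI default, so the short image name is enough
docker push allermedia/my-fancy-project:v1.0.1

# With ECR, the image has to be tagged and pushed using the full registry URI
docker tag allermedia/my-fancy-project:v1.0.1 123456789012.dkr.ecr.eu-west-1.amazonaws.com/my-fancy-project:v1.0.1
docker push 123456789012.dkr.ecr.eu-west-1.amazonaws.com/my-fancy-project:v1.0.1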

Dockerus Containeriosa!

Creating the Docker magic where your application runs in a container is actually quite simple; all the spell needs is a Dockerfile. However, there are a few rules you should follow in order to get the full benefits down the line.

One process per container. This allows for proper separation of concerns, isolating errors and making logging, troubleshooting and scaling easier. For example, in the case of WordPress running on PHP-FPM, you would have two separate containers: one running nginx and one running php-fpm.
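As a rough sketch of that split in docker-compose (image tags and file paths are illustrative, and the assumed nginx.conf proxies PHP requests to the php-fpm service on port 9000), each service runs exactly one process and can be scaled, restarted and debugged independently:

version: "3"
services:
  nginx:
    image: "nginx:1.15-alpine" # Serves HTTP, proxies PHP requests to the php-fpm service
    volumes:
      - "./nginx.conf:/etc/nginx/conf.d/default.conf"
      - "./wordpress:/var/www/html"
  php-fpm:
    image: "wordpress:fpm-alpine" # Runs only the php-fpm process
    volumes:
      - "./wordpress:/var/www/html"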

Must not store state internally. Since, in a proper environment, containers (and even the servers they run on) come and go as scaling and errors happen, a new container must always come up to the same state the already running containers are in. Optimally, this applies to the state of the dependencies as well, e.g. database priming, which is needed for things like onboarding a new developer or spinning up a new stack for a feature branch.
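As a minimal sketch of the idea (the MySQL service and volume name are illustrative), the application container itself holds no state, while anything that has to persist lives in a named volume outside the container, so that throwing away and recreating any container always results in the same state:

version: "3"
services:
  app:
    build: "." # Application container holds no state of its own
  db:
    image: "mysql:5.7"
    volumes:
      - "db-data:/var/lib/mysql" # Persistent state lives in a named volume, not inside the container
volumes:
  db-data: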

Application must be configurable through environment variables. As Docker images are essentially locked in time, we can’t change any configuration files after the image has been built. However, since we can still change the environment variables the container runs with, any configuration variable in the image must be overridable through environment variables. For example, in Node.js applications we use nconf to manage configuration for this purpose.
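As a minimal sketch of the pattern (the config.json file name is an assumption; the cache:enabled key matches the compose example later in this article), nconf can be layered so that environment variables take precedence over whatever was baked into the image:

// config.js
const nconf = require('nconf');

nconf
  .env()                          // 1. Environment variables (highest priority)
  .file({ file: 'config.json' }); // 2. Defaults baked into the image at build time

// With the container started with the environment variable cache:enabled=false,
// this returns "false" even if config.json says otherwise.
console.log(nconf.get('cache:enabled'));

module.exports = nconf;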

Dockerfile

The Dockerfile is the basis of making anything run in Docker: it describes what should be done (files copied over, libraries installed) in order to build an image for your application to run in.

Build from a specific image tag. The Dockerfile’s FROM instruction should always point to a specific image tag; if no tag is provided, Docker defaults to the latest tag, which should be pointing to the latest build and as such can change between two consecutive builds and result in different outcomes (FROM node vs. FROM node:8.9.3).

Remember the cache. One thing to remember is the build cache: essentially, every RUN or COPY/ADD instruction creates a cached layer, which can be used to speed up subsequent builds. For example, calling RUN npm install before copying any application code creates a cached layer which Docker can reuse in the next build when only application files have changed, speeding up the process dramatically.

Example of a Dockerfile for a Node.js application:

# Use alpine base-image for smaller container footprint
FROM node:8.9.3-alpine
# Create application directory and set it as cwd
RUN mkdir -p /usr/src/app
WORKDIR /usr/src/app
# Copy package.json(s) over and run install
COPY package*.json ./
RUN npm install
# Copy rest of the application code over
COPY . .
# Run npm start script
CMD ["npm", "start"]

Docker-compose

Whereas a Dockerfile describes a single container, docker-compose describes the whole system the container(s) run in: load balancers, databases and so on. Depending on where you want to run your system, you could use docker-compose for the production setup as well; however, we use it only for development purposes, as everything happening in AWS is defined in CloudFormation, which covers a far wider range of things.

The docker-compose setup itself is split into two different files: docker-compose.yml and docker-compose.override.yml.

docker-compose.yml

docker-compose.yml is used for defining a working system with sensible default configurations; this might include things like databases or worker containers.

Don’t expose ports. The default configuration should, however, not expose any ports, as they can’t be overridden at a later stage (separate port lists are merged, not overwritten) and may result in port reservation conflicts in development environments.

Example docker-compose.yml for a Node.js server:

version: "3"
services:
app:
build:
context: "."
dockerfile: "Dockerfile"
volumes:
- "/usr/src/app/node_modules" # Set node_modules data volume
environment:
- "cache:enabled=false" # Set dev configuration variables
command: "npm run dev" # Override default CMD

Note that there is a data volume defined for the node_modules directory, which prevents the host mapping of the source files (in docker-compose.override.yml) from overwriting the modules installed into the container itself. This ensures that any native modules, compiled for the Docker environment, are retained through the host mapping. (It does create the slightly annoying side effect that the state of your local node_modules is not mirrored to the container, but more on this later.)

docker-compose.override.yml

docker-compose.override.yml is used for configuring the docker-compose setup, such as exposing ports suitable for your local environment and overriding container config variables through environment variables.

Add to .gitignore. docker-compose.override.yml itself should be in .gitignore, as it essentially contains environment-specific configuration (e.g. ports for your local environment). Our repositories always have an example configuration provided in a docker-compose.override-example.yml file.

Set host volume mapping. As VirtualBox setups (Docker Toolbox) use different host drive names for host volume mappings, and there is a possibility that someone needs to use VirtualBox, host volume mappings should only be done in docker-compose.override.yml so that they can be environment-specific.

Example docker-compose.override-example.yml file:

# Docker-compose overrides for local development setup
#
# To enable, copy the file into docker-compose.override.yml
version: "3"
services:
  app:
    volumes:
      - ".:/usr/src/app" # Map source volume from host
    ports:
      - "8080:80" # Expose to system port 8080

Doned. Now what?

Once all of this is done, you should have a standardized development setup that runs the same on every computer, has a single-command spin-up (docker-compose up) and is fully configurable to suit any specific development machine. More importantly, you now have the foundation for running the application in most modern infrastructures in a fully automated fashion.
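In practice, onboarding onto one of these projects then boils down to a few commands; a sketch (the repository URL is made up):

git clone git@github.com:allermedia/my-fancy-project.git
cd my-fancy-project

# Copy the provided example into your own, git-ignored override file
# and adjust ports etc. to suit your machine
cp docker-compose.override-example.yml docker-compose.override.yml

# Build and start the whole development environment
docker-compose up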

The next part of the series will focus on the CI automation that powers all builds for the new infrastructure.

Next part: Paving the Way, part 3 — Automating tests and builds

— By Mikko Tikkanen, Technology Lead at Aller Media Finland
