MySQL Databases That Don’t Retain Data

A database’s purpose in life is to persist information, and making a database that can’t do this seems a little… weird. But sometimes we might want a database that is a known entity; think testing. Docker seems like a great tool to accomplish this, as containers are the same every time you start them by default.

However, MySQL Docker images are built with their prime directive in mind: they persist data using volumes. So how do we use database containers and control the information in them for testing purposes? There is more than one way to pet a cat, as nobody says, so we’ll look at some ways of controlling our databases and some of the trade-offs involved.

Also, there is an issue in the official Docker MySQl repo about whether there should be a volume by default that may change the calculus on some of this in the future.

My use case

We had a faculty member who sadly passed away unexpectedly. This faculty member had gathered some exceptional data on mongooses and stored it in a Drupal site that, to my knowledge, had gone defunct. I was given a SQL dump and a Drupal filesystem and asked to make some meaning out of it. For our current use, suffice it to say I decided to make a static image of the MySQL database to hold the data so I could put the site up in its original state wherever whenever. This is why some of the example files/images/containers are called ‘mongoose’.

Disclaimer:

A number of practices in this article, particularly setting database passwords and root access with ENV variables in our Dockerfile, are not appropriate for production. Please don’t publish your secrets accidentally because of this blog.

The Same for Everything

Some aspects of our Dockerfile will remain the same for the different methods. Mostly this will be the environment variables that create our database and base user for it. You can change these to literally anything you want your database/user/passwords to be (as long as it agrees with any application trying to access said database).

Also, in all cases, we will copy mongoose.sql, a SQL dump of the Drupal site, into the docker-entrypoint-initdb.d directory of our image. This directory exists for the purpose of bootstrapping a database. You can drop a .sql or .sh file in here and MySQL will load it on startup. It is the official way to bulk load data into a MySQL database on startup.

ENV MYSQL_USER=drupal \
MYSQL_PASSWORD=drupal \
MYSQL_DATABASE=drupal \
MYSQL_ROOT_PASSWORD='' \
MYSQL_ALLOW_EMPTY_PASSWORD=yes
COPY mongoose.sql /docker-entrypoint-initdb.d/mongoose.sql

Stop Persisting Data

The first thing we want to do is make sure MySQL isn’t persisting data from one session to the next. This will ensure that we won’t run into data that is sticking around from our last test run.

Option 1 — Change the datadir

MySQL’s Docker container by default writes its data to /var/lib/mysql and creates a volume of that directory on the host system. One way to keep our data from being persisted is to change the datadir directory.

We could do this by creating a custom .cnf file and loading it into our container, but there is a difference in where you put it depending on the version of MySQL you’re using, and creating a custom config file feels a bit heavy handed for what we want (though if you want to set other configs, like memory caps, this might be the option to go with). Instead, we can just pass the --datadir flag. We will do this using the CMD option in our Dockerfile. So a complete Dockerfile to do this might look like this:

FROM mysql:5
ENV MYSQL_USER=drupal \
MYSQL_PASSWORD=drupal \
MYSQL_DATABASE=drupal \
MYSQL_ROOT_PASSWORD='' \
MYSQL_ALLOW_EMPTY_PASSWORD=yes
COPY mongoose.sql /docker-entrypoint-initdb.d/mongoose.sql
RUN mkdir /var/lib/mysql-no-volume
CMD ["--datadir", "/var/lib/mysql-no-volume"]

This Dockerfile

  • Pulls a MySQL 5 image
  • Sets environment variables for our database name and users
  • Copies a sql file to the directory where MySQL will check to build a database
  • Creates an empty directory, then passes an argument to set the datadir to that folder. Remember, if a base image has an ENTRYPOINT command, using CMD in your Dockerfile will pass that as an argument to ENTRYPOINT.

Problem

The only real problem with this method is that MySQL’s base Dockerfile still creates a volume, we just never write to it. So unless we’re careful, we might end up creating a bunch of empty volumes on our host system (if our docker-compose file or run commands don’t specify which volume to use, Docker will create a new unnamed volume).

We can mitigate this by remembering to call docker-compose down -v where the -v tells docker-compose to remove volumes when you remove the container. Or even better make a script or CI to do this automatically. We could also make this into a named volume in our docker-compose.yml so it doesn’t create a new volume every time, it just uses the same one but never writes/reads to it. But if we’re removing or referencing an empty volume very time, we could perhaps just as well remove a filled volume every time (though the consequences of failing to remove an empty volume vs. one that contains a gig of database data is not insignificant). This dangling volume problem is not huge, but it kind of bugs me.

Option 2 — Bootstrap from the original MySQL Dockerfile

The Dockerfiles for the official MySQL images are available for use under a GNU Public License (there is also a more robust Dockerfile at the official Docker repo, but we’ll look at the one from the MySQL repo because it’s shorter/easier to understand; we would follow the exact same steps with either file). So maybe we can remix this just a touch to do what we want.

I have a fork of the MySQL repo with the larger Dockerfile edited a touch to make it work like below if you want to look at that. If you use either of these and drop in your .sql file it should build correctly.

We can fork the MySQL project and edit the Dockerfile (in this case, the file for MySQL 5.7). We have to make fairly minimal alterations to get this to be the static image we want. First, we add the same environment variables from before.

The fix to our dangling volume problem from before is to comment out (or delete) the VOLUME instruction. By getting rid of this command, we don’t even have to worry about changing our datadir because MySQL can write to /var/lib/mysql all it wants, anything written there will not be persisted to our host system.

Finally, we copy our .sql dump into docker-entrypoint-initdb.d to initialize our database when the container starts.

These three alterations give us an image that can hold a MySQL database that will be the same every time you start the container.

At this point, we run docker build -t elylibrary/mongoose:mysql-bootstrapped . from within the 5.7 directory in our forked repo to build an image from our Dockerfile (and you would tag it as whatever you like).

Now we can run our image and use it for our given purposes. For example, if I wanted to combine my MySQL database with the Drupal site it went with, I might use the super-basic docker-compose.yml below to start both containers and link them up (note: again, not production), then I can browse the Drupal site at localhost:8000 (and it can communicate with our MySQL database at localhost:3306).

I prefer this approach; I find it cleaner, even if we do have to take a step back from ‘just pull the image and add to it’ style of Docker-ing; it also might mean you don’t necessarily get security updates automatically.

Startup Time

This is enough to get us running and have a database that will be the same every time we start it up. There is one final step that we can take to cut down on the database startup time; in my super scientific measurements (read: I’m perfectly willing to admit I may be full of crap on this), this reduced startup time of the prefilled MySQL container from ~21 seconds to ~9 seconds. The tradeoff here is that you have to use the commit command, which takes a snapshot of your container at a point in time and makes an image from it. No longer is your Dockerfile the source of truth, meaning this step has to be kept in documentation rather than the executable Dockerfile. YMMV.

Once your bootstrapped container is running, you don’t need the SQL file anymore.

docker container exec container_name rm /docker-entrypoint-initdb.d/mongoose.sql

This command spawns a subprocess in your container and removes your SQL file so it won’t bootstrap on startup.

docker commit container_name elylibrary/mongoose:mysql-prefilled

This command ‘commits’ our container, so all the files in container_name will be written to an image with the tag elylibrary/mongoose:mysql-prefilled or whatever you choose to call it.

Your new container should get into a usable state quicker, but again, YMMV. In my case the committed image size was twice as large as the one built by the Dockerfile (264 v. 528 megs), so take that into consideration.

You can also use the commit command if you run migrations from an application and want to save just those migrations. In that case, you would probably just start up a MySQL container (maybe a forked one with no volumes or .sql files), connect to it, run your migrations, and commit the container. This should give you basically the same end result.

tl;dr

  • To create a reliable MySQL that does not persist data, either clone/fork the MySQL repo or use my starter.
  • Set the ENV variables, COPY your .sql dump, remove any VOLUME statements
  • docker build -t what-you-want/to-tag:your-image .
  • Optional: docker run --name mymysql -p 3306: 3306 what-you-want/to-tag:your-image -> docker container exec mymysql rm /docker-entrypoint-initdb.d/your_sql_dump.sql -> docker commit mymysql tag-of/your-new:image