Docker, file permissions, and line endings

Here at mPort, we have a multi-platform technology team (Windows, macOS, Linux). This brings interesting challenges when collaborating on the same components. Docker is an obvious choice to mitigate those challenges, but it introduces a few more corkers along the way.

This post is about our journey of fixing a problem with file permissions on the CI server, which led in turn to solving problems with line endings in text files.

Here’s a quick snapshot of the relevant tools involved: Docker and docker-compose, Jenkins, git, Node.js with npm, and Selenium.

Build cleanup failing

The Continuous Integration build of the static portion of the mPort web site generates static assets (html, js, and css files). It also runs some acceptance tests and those tests generate results.

The build step and the testing step are each run in Docker containers to encapsulate as much of the dependent state of the infrastructure as possible (for reproducibility). It also means that the output is the same, whether run on a Windows or macOS host (by developers) or a Linux host (by the CI system).

In order to package up the assets as an artifact for the Continuous Delivery pipeline and to share the results of the tests with human viewers later, these files are created on a volume in the Docker container mounted from a path on the host (e.g. --volume $PWD:/usr/src).

The Docker service runs as the root user. The user inside the Docker container is (unless changed) also root. Any files created in the container on volumes mounted from the host are thus owned by root: the uid of the user is 0 both on the host and in the container.

Now my problem was that Jenkins runs as the user jenkins, and it coordinates (via docker-compose) the Node.js container running the npm build and the Selenium and Node.js containers running the acceptance tests. All the files created by those containers were appearing in the Jenkins job working directory owned by root, writable only by root, and merely readable by everyone else.

This is fine up until the point Jenkins starts attempting to clean up old builds and artifacts. It can’t, because the file ownership and permissions don’t allow the jenkins user to delete these files.

Changing ownership

It is certainly possible to make a Docker image where a different user is added and becomes the default user inside containers based on that image. Making the uid of that user match up to the equivalent user on the host, though, is not possible to do a priori. One can, upon starting a container each time, include steps to add a user with a specific uid (matching one on the host). However, there is a different way, and it relies on the ownership of the working directory.

When I look at the contents of this directory on the host, I see something like this:

/var/lib/jenkins/workspace/www$ ls -lsa
total 999
4 drwxr-xr-x 12 jenkins jenkins 4096 Feb 23 06:20 .
4 drwxr-xr-x 7 jenkins jenkins 4096 Feb 23 06:22 ..
4 drwxr-xr-x 8 jenkins jenkins 4096 Feb 23 06:20 .git
4 -rw-r--r-- 1 jenkins jenkins 1759 Feb 23 06:20 README

When I look at the contents of the same directory mounted as volume in a Docker container, I see something like this:

/usr/src# ls -lsa
total 999
4 drwxr-xr-x 12 106 111 4096 Feb 23 06:20 .
4 drwxr-xr-x 27 root root 4096 Jan 18 01:01 ..
4 drwxr-xr-x 8 106 111 4096 Feb 23 06:20 .git
4 -rw-r--r-- 1 106 111 1759 Feb 23 06:20 README

On my host, the jenkins user happens to have a uid of 106 (and a gid of 111). In the Docker container there is no user with a uid of 106, but the ownership information is still recorded and valid.

Note that the jenkins user owns the job’s working directory. That means within the Docker container, we know who should own any new files. If the last thing we do before leaving the container is to change ownership, by the time the jenkins user on the host wants to delete them, it already owns them.

This is accomplished by adding the following to the commands executed inside the container:

find . -not -uid $(stat -c "%u" .) -exec chown --reference=. {} \;

This command looks for any files under the current directory not owned by the owner of the current directory (jenkins on the host) and changes the file ownership to the same as the ownership of the current directory (including, by the way, the group value).
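The pieces of that one-liner are easy to check individually. Here is a harmless breakdown in a scratch directory (using -print instead of chown, so nothing is modified):

```shell
cd "$(mktemp -d)" && touch build-output.js   # scratch dir for the demo

# uid of the owner of the current directory (jenkins on the host)
dir_uid=$(stat -c "%u" .)
echo "directory owned by uid $dir_uid"

# list entries whose owner differs from that uid; here it prints nothing,
# but inside the container it would find the root-owned files
find . -not -uid "$dir_uid" -print
```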

Maintaining status

This is great, but the exit status of that find command is now the exit status of the container, because it’s the last command to run. We actually want to preserve the exit status of the “real” work the container was doing, like the build command or the test runner.

To achieve this, we add a line before and after the change in file ownership:

# the command doing the real work has an exit status here
status=$? # save it
find . -not -uid $(stat -c "%u" .) -exec chown --reference=. {} \;
exit $status # exit with the saved status value

Because those extra commands now make the arguments to the Docker run (or in our case, the command: value in our docker-compose.yml file) particularly gnarly to the parsers involved (the $? is quite problematic), I encapsulated them and the “real” work in shell script files. These script files are within the directory that is mounted as a volume inside the container, and we simply invoke those files while in the container.
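Put together, a wrapper looks something like the sketch below. The file name and the use of "$@" as a stand-in for the real build command are my own illustration, not necessarily what’s in the mPort repository:

```shell
cd "$(mktemp -d)"   # scratch dir so the demo is self-contained

cat > run-and-chown.sh <<'EOF'
#!/bin/sh
"$@"                  # the "real" work, e.g. npm run build or the test runner
status=$?             # save its exit status before anything else runs
# hand ownership of anything we created back to the directory's owner
find . -not -uid "$(stat -c "%u" .)" -exec chown --reference=. {} \;
exit $status          # report the real work's status, not chown's
EOF

sh run-and-chown.sh false || echo "wrapper exited with $?"   # prints "wrapper exited with 1"
```

Without the save-and-restore, the wrapper would report the chown’s (successful) status and a failed build would look green.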

Line-ending havoc

Very soon, the Docker containers started having problems on the Windows hosts, with “file not found” errors being reported by the shell in the container when running the script files. Even more oddly, one of the developers only started experiencing issues after restarting Windows (usually the universal method of fixing all problems).

Worked fine on macOS hosts. Worked fine on Linux hosts. It’s a Docker container! Why is it behaving differently on a Windows host?!

The file was clearly there. Running the container interactively, I could see it in a listing. I could vi it. Every line in the script file worked when executed individually.

The carriage returned

Then I did something that made me think: there must be \r characters at the end of the lines in those files. What? How did they get there?

Remember, these files are on a volume in the container mounted from the host. The host being Windows means text files will, by default, have CRLF (\r\n) line endings rather than Unix’s LF (\n) line endings.
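The difference is easy to make visible. GNU cat’s -A flag marks a newline with $ and shows a carriage return as ^M:

```shell
printf 'unix line\n'      | cat -A    # unix line$
printf 'windows line\r\n' | cat -A    # windows line^M$
```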

But these files hadn’t been created on a Windows host; they hadn’t even been edited on the Windows hosts. They were created on macOS, pushed to the repository, and pulled onto the Windows hosts. Oh…

git magic

To maintain a platform-native view, git will automatically determine what to do with line endings in text files. This is normally no problem at all. If I create a text file on Unix, push it to the repository, and you pull it on Windows, the extra line-ending character is added in your working copy. If you edit the file and push it, the CRs will be in your copy, but when I pull, my working copy will drop them again. This can lead to some pretty crappy change sets, but whatever.

The main problem is that those shell script files weren’t being opened by a Windows process. They were being opened by the shell inside the Linux-based Docker container. Linux processes expect text files to have only \n line endings; the \r character is not special and is parsed like any other character. That is almost certainly where the “file not found” errors came from: a first line of #!/bin/sh with a trailing \r sends the kernel looking for an interpreter literally named /bin/sh\r, which doesn’t exist.
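As a one-off fix while diagnosing, the carriage returns can be stripped inside the container. tr -d '\r' is portable; the file names below are made up for the demo:

```shell
cd "$(mktemp -d)"                       # scratch dir for the demo
printf 'echo hello\r\n' > broken.sh     # simulate a script with CRLF endings
tr -d '\r' < broken.sh > fixed.sh       # write a cleaned copy
sh fixed.sh                             # prints hello
```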

Fortunately, git can be told how to deal with text files, overriding its platform-centric defaults. We could have applied the override only to files with names ending in “.sh”, but we decided to apply it to all text files and avoid confusion. Most editors on Windows know what to do with text files that have Unix-style line endings, and certainly the ones the team uses are perfectly happy working with them. The Unix emulators on Windows are obviously happy with them. Even PowerShell is fine with them.

The .gitattributes file in that repository now has the following directive:

# Treat all files as text without CR line endings (as we have Docker)
* text eol=lf
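One caveat worth knowing: the directive only affects files as they are checked in, so a repository that already has files stored with CRLF endings needs a renormalize pass (git add --renormalize, available since git 2.16). A sketch in a throwaway repository:

```shell
cd "$(mktemp -d)" && git init -q .
git config user.email demo@example.com && git config user.name demo  # throwaway identity

printf 'build step one\r\nbuild step two\r\n' > build.sh   # stored with CRLF
git add build.sh && git commit -qm 'script with CRLF endings'

printf '* text eol=lf\n' > .gitattributes
git add .gitattributes
git add --renormalize .                 # re-run line-ending conversion on tracked files
git commit -qm 'normalize line endings'

git show HEAD:build.sh | cat -A         # stored lines now end with $ only, no ^M
```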

That was a fun few hours!