The gotchas in docker!
I love it. It has no doubt made my life a thousand times easier. Want to test something out quickly on a linux machine? Spin up a container. Want to ship something to a peer? Pack it as an image. Want to automate everything on your local machine? You know the answer.
So far my experience with docker has been all positive. When I say docker, I literally mean docker. I have huge respect for other container runtimes like containerd. In fact, it is a goal of mine to go full-blown containerd in at least one project.
Coming back to docker: I do feel there are a few things that shouldn’t have been left unsaid. Not every dev or devops engineer expects things to behave in these peculiar ways. The points below have been noticed not just by me, but by several peers I have spoken with:
- Volumes — “Your user(id) is my user(id)”
When mounting volumes in docker, I had expected that any files I mount would be owned by the user I start the container with. Okay, I admit that’s an imperfect expectation. If that were the case, it would basically violate the permission model of linux systems: users with the right to spin up docker containers would be able to mount files they shouldn’t have access to and read them inside containers. Understood.
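A quick way to check what ownership a mounted file really carries is to look at the numeric id rather than the user name. Here is a minimal Python sketch; the function name `describe_owner` is made up for illustration, and it only relies on the standard `os` and `pwd` modules:

```python
import os
import pwd


def describe_owner(path):
    """Return (uid, user name or None) for the owner of `path`.

    The name is None when the uid has no passwd entry on the system
    where this runs, which can happen inside a container.
    """
    uid = os.stat(path).st_uid
    try:
        name = pwd.getpwuid(uid).pw_name
    except KeyError:
        # The uid exists on the file, but no user with that id is known here.
        name = None
    return uid, name
```

Running it against the same file on the host and inside the container (for example on a hypothetical mount at `/data/file`) lets you compare the two views: the uid is what to watch, while the name may differ or be missing.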
It took a little more poking around to realise that ownership on a mount follows the user id on the host, not the user in the container. That is, if you mount a file owned by user id 123 on the host machine, the owner of that file continues to be user id 123 inside the container, even though you might not have any such user there. Completely logical! But hard to catch! To add to this, the behaviour depends on the host OS. On a Mac, I never face such ownership-related issues, but on linux, it’s quite visible.
- Disk utilisation — “What’s up with the space?”
This has been an interesting finding. If you have several containers running and check the disk utilisation of your host, the overlay mounts appear to occupy a whole lot of space. This is the case even when your containers are not generating much data.
When we dug deeper, it became clear that this has nothing to do with the amount of data generated by the containers.
With the recently introduced overlay2 driver, docker uses layers and mounts to track how a container deviates from its base image. I’ll avoid diving into the finer details, as this discussion would take a totally different turn. In a nutshell, the overlay mounts are nothing but a way to track the diff of the containers from a shared base image. The mounts themselves occupy zero space. Executing `df -h` gives the illusion that the mounts are occupying way too much space. The best way we have found to avoid this illusion is to use `docker system df`, which gives a much better picture of disk utilisation by docker.
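The illusion can be reproduced without docker at all. Like `df`, Python’s `shutil.disk_usage` reports figures for the whole filesystem behind a path, so several paths on the same filesystem each show the full size. As far as I understand, overlay mounts behave the same way: each one reports the numbers of the filesystem backing it. The sketch below uses illustrative function names of my own, and the de-duplication is only a rough heuristic:

```python
import shutil


def naive_total_bytes(paths):
    """Add up the filesystem size reported for every path, the way
    someone skimming the rows of df output might add them up."""
    return sum(shutil.disk_usage(p).total for p in paths)


def deduplicated_total_bytes(paths):
    """Count each distinct filesystem size only once.

    Rough heuristic: two different filesystems that happen to have
    exactly the same size would be merged into one.
    """
    return sum({shutil.disk_usage(p).total for p in paths})
```

For two directories on the same disk, `naive_total_bytes` returns twice the size of that disk while `deduplicated_total_bytes` returns it once, which is the same kind of double counting that makes a list of overlay rows look alarming.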
There are a few other areas where I feel things are quite surprising, but they are very minor. While I understand the issues we faced aren’t really issues, it was quite hard to wrap our heads around them. One way this can be tackled is better documentation, and seriously, docker! The documentation needs some improvement regardless.
That said, I can’t even imagine how complicated things would have been had docker not existed. So, thank you for that!
Cheers!