Docker image with Tor, Privoxy and a process manager under 15 MB
At a time when we’re obsessed with saving minutes (sometimes seconds) in build/test/deploy times, container image sizes are an elephant in the room. The whole argument of “but the layers are cached!” doesn’t make much sense. Machines get autoscaled, base image tags get updated, containers get distributed, Docker storage devices fill up, more layers lower performance, and so on.
Large image sizes should no longer be the norm; they are an aberration.
I was recently looking for an image with Tor and Privoxy. My broadband speed is very low and all the images I found clocked in around 200 MB, so I ended up creating one. It runs both processes on start and weighs under 15 MB. Give it a spin.
docker run -d -p 8118:8118 -p 9050:9050 rdsubhas/tor-privoxy-alpine
curl --proxy http://localhost:8118 https://www.google.com
Thanks to @docker for featuring this in the Docker Weekly!
No low-level tricks, just the choice of tools.
Container Base Image
But it weighs a ton. The base image alone comes in at 180 MB. Brian Christner does an awesome job of visualizing Docker base image sizes. It’s a pretty normal sight to see project or third-party images weigh anywhere from 500 MB to a gig.
Enter Alpine Linux.
Seriously, the base Alpine image is just under 5 MB, and it has a huge repository of packages that you can add once it’s pulled (hint: use multiple % symbols to filter this list). It comes with zero bloat, not even bash (only sh). I’ve used it for a lot of different projects lately (Node, Ruby and Python) and it packs a punch.
When running multiple processes in a container, people usually resort to Supervisor (the most popular), Circus, God.rb, Foreman or any of the other innumerable process managers. But they have a few caveats.
- They need extra stuff to be installed — Python for supervisor/circus, Ruby for god.rb, Node or various platforms for Foreman, etc. This inflates the size of the image. If you’re writing a Ruby app and need to use Supervisor to start multiple processes, you basically end up installing both Ruby and Python in the image.
- They have additional overhead: time to boot up services, memory overhead (e.g. Supervisor takes around 20–25 MB of RAM along with the Python runtime), CPU overhead, etc.
- Some don’t support piping logs to STDOUT. This makes them problematic with “docker logs”, and you end up running yet another service just to pipe logs.
runit is just much smaller, faster and easier.
- Create a folder for each service with a “run” file that contains the command. Yeah, coming from Procfile or YML formats this can be an eyesore, but it gets the job done.
- Add all these service folders in a single folder (anywhere) in the docker image. I put them under /etc/service/.
- Set the Dockerfile CMD to “runsvdir /path/to/services/”. That’s it: fast and compact multi-process management. It runs everything in the foreground and pipes logs to STDOUT as well, so you get to do “docker logs” without running anything else.
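The steps above can be sketched as a small shell script that lays out the service folders before they are copied into the image. This is only an illustration, not the exact setup used in the image: SERVICE_ROOT, the make_service helper, and the commands inside the run files (including the Privoxy config path) are assumptions.

```shell
#!/bin/sh
# Sketch: create one runit service folder per process, each with a "run" file.
# SERVICE_ROOT, make_service and the commands below are illustrative assumptions.
set -eu

SERVICE_ROOT="${SERVICE_ROOT:-./etc/service}"

# make_service NAME COMMAND [ARGS...]
# Writes NAME/run so that it execs COMMAND in the foreground.
make_service() {
  name="$1"
  shift
  mkdir -p "$SERVICE_ROOT/$name"
  # exec replaces the shell, so the real process is what gets supervised;
  # 2>&1 merges stderr into stdout.
  printf '#!/bin/sh\nexec %s 2>&1\n' "$*" > "$SERVICE_ROOT/$name/run"
  chmod +x "$SERVICE_ROOT/$name/run"
}

# Both commands must stay in the foreground for supervision to work.
make_service tor tor
make_service privoxy privoxy --no-daemon /etc/privoxy/config
```

With the folders copied to /etc/service/ in the image, the Dockerfile CMD would be “runsvdir /etc/service”, matching the last step above.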
runit, with or without Alpine, is a great process manager. It’s been around for a very long time, and has been used since early container environments (baseimage-docker). But I still don’t see a lot of public images leveraging it. If all you need is to start and stop a bunch of services in a container, then consider runit as a first choice.
Commenters have pointed me to some awesome related projects, and I want to give them a shout-out:
s6 — process supervision similar to runit but seems to have more features
s6-overlay — docker base image with s6, also has an alpine flavor
docker-slim — static and dynamic analysis to create skinny containers