I'm not Docker's biggest fan, but I do see its benefits, although I think it has somehow managed to hide what Docker really is and what it really does. This post isn't about Docker, though; it's about namespaces.
Docker and LXC use a kernel feature that, in the simplest terms, lets a process be isolated at multiple levels (PIDs, filesystem mounts, hostnames, etc.).
So the question is: if a Docker/LXC container is just a process, how different is it from a normal process, let's say a clone() of "ls"?
The following is a normal ls:
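As a rough sketch of what "a normal ls" looks like at the clone() level, something like the following should do. The names `exec_ls`, `run_plain_ls` and `STACK_SIZE` are made up for this example, and the only flag passed is SIGCHLD, so no namespace is involved at all.

```c
#define _GNU_SOURCE
#include <sched.h>
#include <signal.h>
#include <stdio.h>
#include <stdlib.h>
#include <sys/wait.h>
#include <unistd.h>

#define STACK_SIZE (1024 * 1024)   /* arbitrary size for the child's stack */

/* Child entry point: replace the child process with ls. */
static int exec_ls(void *arg)
{
    (void)arg;
    execlp("ls", "ls", (char *)NULL);
    perror("execlp");              /* only reached if exec failed */
    return 127;
}

/* Run ls in a child created by clone() with no namespace flags.
 * Returns the exit status of ls, or -1 on error. */
int run_plain_ls(void)
{
    char *stack = malloc(STACK_SIZE);
    if (stack == NULL)
        return -1;

    /* clone() wants the top of the stack, since it grows downwards on
     * most architectures. SIGCHLD lets the parent wait for the child,
     * just like after fork(). */
    pid_t pid = clone(exec_ls, stack + STACK_SIZE, SIGCHLD, NULL);

    int status = 0;
    int rc = -1;
    if (pid == -1)
        perror("clone");
    else if (waitpid(pid, &status, 0) == -1)
        perror("waitpid");
    else if (WIFEXITED(status))
        rc = WEXITSTATUS(status);

    free(stack);
    return rc;
}
```

Calling `run_plain_ls()` from `main()` should list the current directory exactly like running ls from a shell would.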
We all know that on Linux fork() is really implemented on top of clone(). With no extra flags, clone() gives the child its own copy of the parent's memory, file descriptors, and so on.
This is a clone() example for executing a containerized "ls":
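Here is a minimal sketch of what such a call could look like. The flag set (CLONE_NEWPID, CLONE_NEWNS, CLONE_NEWUTS) is my assumption, picked to match the pids / fs / hostnames levels mentioned above, and `exec_ls_isolated` and `run_isolated_ls` are made-up names. Creating these namespaces needs CAP_SYS_ADMIN (in practice, root), so without it clone() is expected to fail with EPERM.

```c
#define _GNU_SOURCE
#include <sched.h>
#include <signal.h>
#include <stdio.h>
#include <stdlib.h>
#include <sys/wait.h>
#include <unistd.h>

#define STACK_SIZE (1024 * 1024)   /* arbitrary size for the child's stack */

/* Child entry point: runs inside the new namespaces, then becomes ls. */
static int exec_ls_isolated(void *arg)
{
    (void)arg;
    execlp("ls", "ls", (char *)NULL);
    perror("execlp");              /* only reached if exec failed */
    return 127;
}

/* Run ls in new PID, mount and UTS namespaces.
 * Returns the exit status of ls, or -1 if clone() or waitpid() failed
 * (for example EPERM when not running as root). */
int run_isolated_ls(void)
{
    char *stack = malloc(STACK_SIZE);
    if (stack == NULL)
        return -1;

    /* Same call as for a plain child, plus the CLONE_NEW* flags. */
    int flags = CLONE_NEWPID | CLONE_NEWNS | CLONE_NEWUTS | SIGCHLD;
    pid_t pid = clone(exec_ls_isolated, stack + STACK_SIZE, flags, NULL);

    int status = 0;
    int rc = -1;
    if (pid == -1)
        perror("clone");
    else if (waitpid(pid, &status, 0) == -1)
        perror("waitpid");
    else if (WIFEXITED(status))
        rc = WEXITSTATUS(status);

    free(stack);
    return rc;
}
```

Note that a new mount namespace starts as a copy of the parent's mounts, so this ls still sees the same files; the only difference from a plain clone() is the `flags` value.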
They are pretty much the same, but not quite: there are some extra arguments, the CLONE_NEW* flags, and they are what provides the isolation.
Let's pick CLONE_NEWPID as an example (see man 7 namespaces).
The man page says: “PID namespaces isolate the process ID number space, meaning that processes in different PID namespaces can have the same PID. PID namespaces allow containers to provide functionality such as suspending/resuming the set of processes in the container and migrating the container to a new host while the processes inside the container maintain the same PIDs.”
So that makes sense. I guess it also helps keep PID numbers quite low: on a massive host running plenty of containers and plenty of apps, PIDs would otherwise go quite high. We can verify this by doing:
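One way to check this without relying on ps is the sketch below; it is a stand-in I wrote for illustration, not the original test. The first process in a new PID namespace gets PID 1, so the next process it starts should get PID 2. The names `report_pids` and `second_pid_in_new_pidns` are made up, and CLONE_NEWPID needs root, so the function returns -1 when clone() is refused.

```c
#define _GNU_SOURCE
#include <sched.h>
#include <signal.h>
#include <stdio.h>
#include <stdlib.h>
#include <sys/wait.h>
#include <unistd.h>

#define STACK_SIZE (1024 * 1024)   /* arbitrary size for the child's stack */

/* Runs as the first process of the new PID namespace. It forks once and
 * reports the PID it sees for that child through its own exit status
 * (only workable here because that PID is expected to be tiny). */
static int report_pids(void *arg)
{
    (void)arg;
    pid_t kid = fork();
    if (kid == -1)
        return 255;
    if (kid == 0)
        _exit(0);                  /* the second process does nothing */
    waitpid(kid, NULL, 0);
    printf("inside: my pid = %ld, my child's pid = %ld\n",
           (long)getpid(), (long)kid);
    return (int)kid;
}

/* Returns the PID of the second process as seen inside the namespace,
 * or -1 if the namespace could not be created. */
int second_pid_in_new_pidns(void)
{
    char *stack = malloc(STACK_SIZE);
    if (stack == NULL)
        return -1;

    fflush(stdout);                /* avoid duplicating buffered output */
    pid_t pid = clone(report_pids, stack + STACK_SIZE,
                      CLONE_NEWPID | SIGCHLD, NULL);

    int status = 0;
    int rc = -1;
    if (pid == -1)
        perror("clone");
    else if (waitpid(pid, &status, 0) == -1)
        perror("waitpid");
    else if (WIFEXITED(status))
        rc = WEXITSTATUS(status);

    free(stack);
    return rc;
}
```

From the parent's side the value returned by clone() is an ordinary, much larger PID on the host; the small numbers only exist inside the namespace.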
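Somehow I got ps on PID 2, which is pretty neat, and it fits: the first process in a new PID namespace gets PID 1, so the next one started inside it gets PID 2.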
So there are a lot of namespace flags you can pass to clone(). Since clone() takes a function as an argument (amongst others), you can "containerize" pretty much anything.
Here child_main() is just a random C function.
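To make that concrete, here is a hedged sketch of a `child_main()` plus a small helper that runs any function under a chosen set of namespace flags. The body of `child_main()` and the helper `run_in_namespaces()` are my own illustration, not the original code: the function just changes the hostname, which only affects the new UTS namespace when CLONE_NEWUTS is passed. As before, this needs root, and the helper returns -1 if clone() fails.

```c
#define _GNU_SOURCE
#include <sched.h>
#include <signal.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <sys/wait.h>
#include <unistd.h>

#define STACK_SIZE (1024 * 1024)   /* arbitrary size for the child's stack */

/* An ordinary C function; arg is the hostname to set (illustrative). */
int child_main(void *arg)
{
    const char *name = arg;
    char buf[64] = "";

    if (sethostname(name, strlen(name)) == -1) {
        perror("sethostname");
        return 1;
    }
    if (gethostname(buf, sizeof buf - 1) == -1) {
        perror("gethostname");
        return 1;
    }
    printf("child_main: pid %ld, hostname %s\n", (long)getpid(), buf);
    return 0;
}

/* Run fn(arg) in a child created with the given CLONE_NEW* flags.
 * Returns the child's exit status, or -1 on error. */
int run_in_namespaces(int (*fn)(void *), void *arg, int ns_flags)
{
    char *stack = malloc(STACK_SIZE);
    if (stack == NULL)
        return -1;

    fflush(stdout);                /* avoid duplicating buffered output */
    pid_t pid = clone(fn, stack + STACK_SIZE, ns_flags | SIGCHLD, arg);

    int status = 0;
    int rc = -1;
    if (pid == -1)
        perror("clone");
    else if (waitpid(pid, &status, 0) == -1)
        perror("waitpid");
    else if (WIFEXITED(status))
        rc = WEXITSTATUS(status);

    free(stack);
    return rc;
}
```

A call such as `run_in_namespaces(child_main, "demo", CLONE_NEWUTS)` should leave the host's own hostname untouched, since the change happens in the child's private UTS namespace.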
I hope that explains the internals of namespaces a bit. There's a missing part, which is linking namespaces with cgroups. Maybe for another article.