Namespaces in Go - Mount

Ed King
7 min read · Dec 17, 2016

One of the fundamental features of container implementations today is the ability to run containers of differing Linux distros on the same host machine. It’s not uncommon, for example, to install Docker on an Ubuntu host and to then start a bunch of containers on that host using BusyBox, CentOS, or any other distro you like the look of.

In this article we will take a look at what makes this possible - namely a combination of the Mount namespace and the pivot_root system call. Let's start by reviewing the Mount namespace implementation in ns-process as it currently stands. If you’ve not been following along with this series so far, be sure to check out the previous article(s) first.

💁 The following has been tested on Ubuntu 16.04 Xenial with Go 1.7.1

# Git repo: https://github.com/teddyking/ns-process
# Git tag: 3.0
$ go build
$ ./ns-process
>> namespace setup code goes here <<
-[ns-process]- # cat /proc/mounts
/dev/sda1 / ext4 rw,relatime,data=ordered 0 0
tmpfs /dev/shm tmpfs rw,nosuid,nodev 0 0
proc /proc proc rw,nosuid,nodev,noexec,relatime 0 0
# ...

There are a number of mounts already listed in the /proc/mounts file. This may be a little surprising given that we’re requesting a new Mount namespace (via the CLONE_NEWNS flag) and have yet to do any explicit Mount namespace setup.

This doesn’t feel very container-like. Our namespaced process should know as little as possible about the host it’s running on, and certainly shouldn’t be able to see a list of all the host’s mounts. So why’s this happening? Fortunately, an explanation can be found within the mount_namespaces(7) man page.

“When a process creates a new mount namespace using clone(2) or unshare(2) with the CLONE_NEWNS flag, the mount point list for the new namespace is a copy of the caller’s mount point list.”

It seems that this is actually intended behaviour, and it explains why /proc/mounts is already populated as soon as our namespaced process starts. With this in mind the question now becomes, “What do we do about it?”. We need some way of clearing the host’s mounts from the new Mount namespace in order to keep them secure and away from prying eyes - we need to pivot_root.
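If you'd rather inspect that inherited list from Go than with cat, the file is easy to parse. The following is only a minimal sketch: the mountEntry type and parseMounts func are names made up for illustration and are not part of ns-process. It reads the first four whitespace-separated fields of each line.

```go
package main

import (
	"bufio"
	"fmt"
	"os"
	"strings"
)

// mountEntry holds the first four fields of one /proc/mounts line.
// The type and its field names are illustrative, not from ns-process.
type mountEntry struct {
	Source  string
	Target  string
	FSType  string
	Options string
}

// parseMounts parses text in the /proc/mounts format, skipping any
// line that has fewer than four whitespace-separated fields.
func parseMounts(data string) []mountEntry {
	var entries []mountEntry
	scanner := bufio.NewScanner(strings.NewReader(data))
	for scanner.Scan() {
		fields := strings.Fields(scanner.Text())
		if len(fields) < 4 {
			continue
		}
		entries = append(entries, mountEntry{
			Source:  fields[0],
			Target:  fields[1],
			FSType:  fields[2],
			Options: fields[3],
		})
	}
	return entries
}

func main() {
	data, err := os.ReadFile("/proc/mounts")
	if err != nil {
		fmt.Println("could not read /proc/mounts:", err)
		return
	}
	for _, m := range parseMounts(string(data)) {
		fmt.Printf("%s on %s type %s\n", m.Source, m.Target, m.FSType)
	}
}
```

Run inside the namespaced shell as it stands at tag 3.0, this should list the same entries as the host, since the new namespace starts with a copy of the caller's mount list.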

🔄 pivot_root

pivot_root allows you to set a new root filesystem for the calling process. In other words, it allows you to change what / is. It does this by mounting the current root filesystem somewhere else while simultaneously mounting some new root filesystem on /. Once the previous root has been moved, it is then possible to umount it. Thus we have a mechanism for 'clearing' the host's mounts from inside a new Mount namespace - we simply pivot away and then umount them!

This is what allows the aforementioned Ubuntu host machine to run a CentOS container. As long as the Ubuntu host has a copy of a CentOS filesystem on disk, we can create a new Mount namespace, call pivot_root pointing to the CentOS filesystem and then run whatever processes we want to inside the 'pivoted' namespace. The processes will believe they’re running on CentOS the entire time.

Incidentally this is where the reexec from the previous article comes in handy. pivot_root must be called from within the new Mount namespace, otherwise we'll end up changing the host's / which is not the intention! And we want all this to happen before the namespaced shell starts so that the requested root filesystem is ready for when it does.

👉 Let’s Go

In Go, pivot_root is implemented via the PivotRoot func found in the syscall package.

func PivotRoot(newroot string, putold string) (err error)

newroot is the path to the desired new root filesystem and putold is a path to a directory in which to move the current root. There are a few restrictions imposed on newroot and putold by the underlying pivot_root system call that we need to be aware of:

  1. They must both be directories
  2. They must not be on the same filesystem as the current root
  3. putold must be underneath newroot
  4. No other filesystem may be mounted on putold

Most of these are fine but the second point there will require a small workaround, as we’ll see in a moment. We’re also going to need a suitable newroot to pivot to.
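The third restriction is one we can sanity-check ourselves before calling pivot_root. Here's a minimal sketch of such a check. The isUnder helper is hypothetical (it isn't part of ns-process or the syscall package), it compares paths purely lexically without resolving symlinks, and it treats a path equal to the root as being "under" it, which is a choice made for this sketch.

```go
package main

import (
	"fmt"
	"path/filepath"
	"strings"
)

// isUnder reports whether path sits at or below root. The comparison
// is lexical only: symlinks are not resolved and nothing is stat'ed.
// This is a hypothetical helper, not part of ns-process.
func isUnder(root, path string) bool {
	rel, err := filepath.Rel(root, path)
	if err != nil {
		return false
	}
	// A relative path that climbs out of root starts with "..".
	return rel != ".." && !strings.HasPrefix(rel, ".."+string(filepath.Separator))
}

func main() {
	newroot := "/tmp/ns-process/rootfs"

	fmt.Println(isUnder(newroot, filepath.Join(newroot, ".pivot_root")))
	fmt.Println(isUnder(newroot, "/tmp/ns-process/elsewhere"))
}
```

A check like this only catches the obvious mistakes early. The kernel remains the final judge, and pivot_root will still return an error if any of the restrictions are violated.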

The process of preparing a newroot filesystem can be quite a detailed and complex one. Take for example Docker’s layered filesystem approach in which many filesystem “layers” are joined together to present a single coherent root. We’re going to do something much simpler, which is to assume that a suitable root filesystem has already been prepared for use.

# Git repo: https://github.com/teddyking/ns-process
# Git tag: 4.0
$ mkdir -p /tmp/ns-process/rootfs
$ tar -C /tmp/ns-process/rootfs -xf assets/busybox.tar

From now on, ns-process will expect a root filesystem to exist at this path and will raise an error if one can’t be found. Note that although we’re using BusyBox for this particular example, you could just as easily use any other distro.

Now that we have our newroot, let’s write some code to make use of it.

# Git repo: https://github.com/teddyking/ns-process
# Git tag: 4.0
# Filename: rootfs.go
func pivotRoot(newroot string) error {
	putold := filepath.Join(newroot, "/.pivot_root")

	// bind mount newroot to itself - this is a slight hack
	// needed to work around a pivot_root requirement
	if err := syscall.Mount(
		newroot,
		newroot,
		"",
		syscall.MS_BIND|syscall.MS_REC,
		"",
	); err != nil {
		return err
	}

	// create putold directory
	if err := os.MkdirAll(putold, 0700); err != nil {
		return err
	}

	// call pivot_root
	if err := syscall.PivotRoot(newroot, putold); err != nil {
		return err
	}

	// ensure current working directory is set to new root
	if err := os.Chdir("/"); err != nil {
		return err
	}

	// umount putold, which now lives at /.pivot_root
	putold = "/.pivot_root"
	if err := syscall.Unmount(
		putold,
		syscall.MNT_DETACH,
	); err != nil {
		return err
	}

	// remove putold
	if err := os.RemoveAll(putold); err != nil {
		return err
	}

	return nil
}

With the pivotRoot func in place, it’s time to put nsInitialisation to good use.

# Git repo: https://github.com/teddyking/ns-process
# Git tag: 4.0
# Filename: ns_process.go
func nsInitialisation() {
	newrootPath := os.Args[1]

	if err := pivotRoot(newrootPath); err != nil {
		fmt.Printf("Error running pivot_root - %s\n", err)
		os.Exit(1)
	}

	nsRun()
}

func main() {
	var rootfsPath string
	// ...
	cmd := reexec.Command("nsInitialisation", rootfsPath)
}

Notice that we’re now passing an argument, rootfsPath, to nsInitialisation. Once reexeced, this argument can be picked up by reading from os.Args[1]. Also notice how the call to pivotRoot comes before nsRun. By doing this, we're ensuring that the new root filesystem will already have been pivoted to before the /bin/sh process starts.
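To make the argument passing a little less magical, here's a stdlib-only sketch of the same idea. This is not the reexec package's code. The selfCommand func is a made-up name, and the sketch assumes Linux, where /proc/self/exe refers to the running binary. The key point is that the strings we pass become the child's argument vector verbatim, so the first is seen as os.Args[0] and the second as os.Args[1].

```go
package main

import (
	"fmt"
	"os"
	"os/exec"
)

// selfCommand builds a command that re-runs the current binary with
// args used verbatim as its argument vector: args[0] becomes the
// child's os.Args[0], args[1] its os.Args[1], and so on.
// Hypothetical helper, assuming Linux and a mounted /proc.
func selfCommand(args ...string) *exec.Cmd {
	return &exec.Cmd{
		Path: "/proc/self/exe",
		Args: args,
	}
}

func main() {
	// In the re-executed child, os.Args[0] carries the handler name.
	if os.Args[0] == "nsInitialisation" && len(os.Args) > 1 {
		fmt.Println("child received rootfs path:", os.Args[1])
		return
	}

	cmd := selfCommand("nsInitialisation", "/tmp/ns-process/rootfs")
	cmd.Stdout = os.Stdout
	cmd.Stderr = os.Stderr
	if err := cmd.Run(); err != nil {
		fmt.Println("re-exec failed:", err)
	}
}
```

No namespaces are requested in this sketch. It only demonstrates how a value travels from the parent to the re-executed child.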

With all that in place, let's run the updated Go program and check to see which mounts, if any, are available to us now.

💁 The following has been tested on Ubuntu 16.04 Xenial with Go 1.7.1

# Git repo: https://github.com/teddyking/ns-process
# Git tag: 4.0
$ go build
$ ./ns-process
-[ns-process]- # cat /proc/mounts
cat: can't open '/proc/mounts': No such file or directory

Ah … now that we’ve pivoted to a new /, we no longer have a /proc! This is actually a good thing as it means we definitely can’t see the host’s mounts anymore, which is one of the main reasons for doing all this work in the first place. But, there’s probably only so far we can get without a working /proc, so let’s add one to our new root.

# Git repo: https://github.com/teddyking/ns-process
# Git tag: 4.1
# Filename: rootfs.go
func mountProc(newroot string) error {
	source := "proc"
	target := filepath.Join(newroot, "/proc")
	fstype := "proc"
	flags := 0
	data := ""

	// create the mount point inside newroot if it doesn't already exist
	if err := os.MkdirAll(target, 0755); err != nil {
		return err
	}

	if err := syscall.Mount(
		source,
		target,
		fstype,
		uintptr(flags),
		data,
	); err != nil {
		return err
	}

	return nil
}

And just as with pivotRoot, mountProc should be called from nsInitialisation.

# Git repo: https://github.com/teddyking/ns-process
# Git tag: 4.1
# Filename: ns_process.go
func nsInitialisation() {
	newrootPath := os.Args[1]

	if err := mountProc(newrootPath); err != nil {
		fmt.Printf("Error mounting /proc - %s\n", err)
		os.Exit(1)
	}

	if err := pivotRoot(newrootPath); err != nil {
		fmt.Printf("Error running pivot_root - %s\n", err)
		os.Exit(1)
	}

	nsRun()
}
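The ordering is worth a second look. mountProc runs before pivotRoot, while / is still the host's root, so the mount target has to be spelled out as a path underneath newroot. Only after the pivot does that same directory become plain /proc. The sketch below illustrates the path arithmetic. Both procTarget and pathAfterPivot are hypothetical helpers for illustration, and they do lexical path manipulation only.

```go
package main

import (
	"fmt"
	"path/filepath"
	"strings"
)

// procTarget returns where proc has to be mounted *before* the pivot,
// i.e. as a path seen from the old root. Hypothetical helper.
func procTarget(newroot string) string {
	return filepath.Join(newroot, "/proc")
}

// pathAfterPivot translates a pre-pivot path under newroot into the
// path it will have once newroot has become /. Hypothetical helper.
func pathAfterPivot(newroot, prePivotPath string) (string, error) {
	rel, err := filepath.Rel(newroot, prePivotPath)
	if err != nil {
		return "", err
	}
	if rel == ".." || strings.HasPrefix(rel, ".."+string(filepath.Separator)) {
		return "", fmt.Errorf("%q is not under %q", prePivotPath, newroot)
	}
	return filepath.Join("/", rel), nil
}

func main() {
	newroot := "/tmp/ns-process/rootfs"

	before := procTarget(newroot)
	after, err := pathAfterPivot(newroot, before)
	if err != nil {
		fmt.Println(err)
		return
	}
	fmt.Println("before pivot:", before) // before pivot: /tmp/ns-process/rootfs/proc
	fmt.Println("after pivot: ", after)  // after pivot:  /proc
}
```

The same translation explains why pivotRoot reassigns putold to /.pivot_root before unmounting it.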

Ok, that should now be everything. Let’s try it out.

# Git repo: https://github.com/teddyking/ns-process
# Git tag: 4.1
$ go build
$ ./ns-process
-[ns-process]- # cat /proc/mounts
/dev/sda1 / ext4 rw,relatime,data=ordered 0 0
proc /proc proc rw,nodev,relatime 0 0

That’s looking much better - the host’s mounts are no longer visible to us and we have a new /proc mounted and ready for action. But wait … there is one more thing …

🤔 PID namespace

The changes implemented above have had an unintentional side effect on the PID namespace setup. Prior to mounting the new /proc, running ps inside the namespaced shell would’ve resulted in all the host’s processes being listed. This is because ps relies on /proc to detect running processes and we were still referencing the host’s /proc.

This is obviously a pretty terrible thing to happen from a container perspective! But fortunately now that we have our own /proc (and are requesting a new PID namespace via the CLONE_NEWPID flag), running ps shows only processes that are relevant to us.

# Git repo: https://github.com/teddyking/ns-process
# Git tag: 4.1
$ go build
$ ./ns-process
-[ns-process]- # ps
PID USER TIME COMMAND
1 root 0:00 {exe} nsInitialisation /tmp/ns-process/rootfs
5 root 0:00 /bin/sh
8 root 0:00 ps

📺 On the next…

We’re nearing the season finale of “Namespaces in Go”, but we’re still missing one key piece of configuration - networking. What needs to be done to allow our namespaced shell to talk to the Internets? The answer to this and plenty more coming up, stay tuned…

Update: Part 6, “Namespaces in Go - Network” has been published and is available here.

Ed King

A Software Engineer currently working on and with Kubernetes.