After receiving quite a few reports from Frida (www.frida.re) users of a kernel panic when attaching to a process for the second time, I finally got around to debugging the kernel to figure out what was going on.
First, a little background. Frida hooks function calls by rewriting the function’s prologue in memory. In order to do so it has to make the containing memory page writable, patch the code, and later revert it. As shared libraries are mapped and not copied into memory, the kernel can share their memory pages between processes. Those memory pages are copy-on-write, and any local modification will simply give your process its own copy of the memory page in question. Whenever Frida intercepts a function in a shared library, this side-effect occurs. Upon attaching to a process, Frida itself hooks one such function, and as a user of Frida you may be hooking plenty of them as well. Also, every time Frida attaches to a process it probes portions of its address space, which also means parsing the metadata of its loaded shared libraries. This parsing ends up reading some of those same memory pages.
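To make the copy-on-write behavior concrete, here is a minimal sketch using plain POSIX calls rather than anything from Frida. It maps a scratch file privately and read-only (roughly how a shared library's code is mapped), makes the page writable, patches one byte, reverts the protection, and then checks that the file itself is unchanged. The function name `patch_stays_private` and the scratch file under /tmp are assumptions made purely for illustration.

```c
#include <stdlib.h>
#include <sys/mman.h>
#include <unistd.h>

/*
 * Hypothetical helper, not part of Frida: patches one byte of a privately
 * mapped file and reports whether the file on disk stayed untouched.
 * Returns 1 if the write landed in a private copy, 0 if the file changed,
 * and -1 on error.
 */
int
patch_stays_private (void)
{
  char path[] = "/tmp/cow-demo-XXXXXX";
  const size_t page_size = (size_t) sysconf (_SC_PAGESIZE);
  char on_disk = 0;
  char * page;
  int fd, result = -1;

  fd = mkstemp (path);
  if (fd == -1)
    return -1;
  unlink (path); /* the file lives on until fd is closed */

  if (ftruncate (fd, (off_t) page_size) != 0)
    goto beach;
  if (pwrite (fd, "A", 1, 0) != 1)
    goto beach;

  /* Map the file read-only and private */
  page = mmap (NULL, page_size, PROT_READ, MAP_PRIVATE, fd, 0);
  if (page == MAP_FAILED)
    goto beach;

  /* Make the page writable, patch it, then revert the protection */
  if (mprotect (page, page_size, PROT_READ | PROT_WRITE) == 0)
  {
    page[0] = 'B'; /* the first write gives us our own copy of the page */
    mprotect (page, page_size, PROT_READ);

    if (pread (fd, &on_disk, 1, 0) == 1)
      result = (page[0] == 'B' && on_disk == 'A') ? 1 : 0;
  }

  munmap (page, page_size);

beach:
  close (fd);
  return result;
}
```

Calling `patch_stays_private ()` should return 1: the process sees its patched byte while the backing file still holds the original. On OS X, vmmap would presumably report such a page as private rather than copy-on-write, which is the side effect described above.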
I had grown increasingly frustrated by this looming kernel panic but never found a big enough chunk of time to investigate it properly, until an opportunity finally presented itself. I fired up /bin/cat as a guinea pig program in one terminal, attached to it with Frida once, then detached. Next, with /bin/cat still running, I ran vmmap and copied its output to my debugger machine. Then I asked Frida to attach a second time, and the kernel panic triggered as usual. The machine was now waiting for a debugger to attach, so I fired up lldb and attached to it.
A quick look at the call stack revealed that the kernel was hitting a failing assertion while handling mach_vm_read_overwrite. The arguments made it clear which address it had been asked to read from, and how many bytes. Looking back at the vmmap output, I noticed something peculiar. The read covered the first pages of a shared library, and unlike all the other libraries, and all other pages of this library, the second page was marked PRV (private) and not COW (copy-on-write). This made perfect sense, because I knew Frida hooked one function in this particular library. “Could it be a bug when handling a read spanning COW and PRV pages?” I quickly wrote a tiny C program to test this theory, and yep, that was the issue. After simplifying it further I arrived at this:
#include <unistd.h>
#include <mach/mach.h>
#include <mach-o/dyld.h>

#ifndef __LP64__
# define mach_vm_protect vm_protect
# define mach_vm_read_overwrite vm_read_overwrite
#endif

extern kern_return_t mach_vm_protect (vm_map_t,
    mach_vm_address_t, mach_vm_size_t, boolean_t, vm_prot_t);
extern kern_return_t mach_vm_read_overwrite (vm_map_t,
    mach_vm_address_t, mach_vm_size_t, mach_vm_address_t,
    mach_vm_size_t *);

int
main (int argc, char * argv[])
{
  volatile char * library;
  const mach_vm_size_t page_size = getpagesize ();
  const mach_vm_size_t buffer_size = 3 * page_size;
  char buffer[buffer_size];
  mach_vm_size_t result_size;

  library = (char *) _dyld_get_image_header (2);

  mach_vm_protect (mach_task_self (),
      (mach_vm_address_t) (library + page_size), page_size, FALSE,
      VM_PROT_READ | VM_PROT_WRITE | VM_PROT_COPY);

  library[page_size]++; /* COW -> PRV transition */
  library[page_size]--; /* undo dummy-modification */

  result_size = 0;
  /* panic! */
  mach_vm_read_overwrite (mach_task_self (),
      (mach_vm_address_t) library, buffer_size,
      (mach_vm_address_t) buffer, &result_size);

  return 0;
}
Compile and run, and observe an instant kernel panic on the latest OS X and iOS. Update: Incorporated improvements from https://github.com/jdmoreira/KernelPanic-10LOC.
The latest Frida from git now has a workaround that limits our reads to one page at a time. This will be part of the upcoming 1.6.9 release.
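The idea behind the workaround can be sketched as follows. This is not Frida's actual code, just an illustration of splitting a read so that no single request crosses a page boundary. The names `read_page_by_page`, `chunk_reader_t`, `read_stats_t` and `local_reader` are all made up for this example; the callback stands in for whatever performs the real read, which on OS X could be a thin wrapper around mach_vm_read_overwrite.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical callback that performs one read; returns 0 on success. */
typedef int (* chunk_reader_t) (uint64_t address, void * dest, size_t size,
    void * user_data);

/*
 * Reads [address, address + size) through read_chunk so that no single
 * call crosses a page boundary. page_size must be a power of two.
 * Returns 0 on success, or the first non-zero value from the callback.
 */
int
read_page_by_page (uint64_t address, void * dest, size_t size,
    size_t page_size, chunk_reader_t read_chunk, void * user_data)
{
  uint8_t * out = dest;

  while (size != 0)
  {
    /* Bytes left until the end of the page containing address */
    size_t chunk = page_size - (size_t) (address & (page_size - 1));
    int status;

    if (chunk > size)
      chunk = size;

    status = read_chunk (address, out, chunk, user_data);
    if (status != 0)
      return status;

    address += chunk;
    out += chunk;
    size -= chunk;
  }

  return 0;
}

/* Example reader for the current process that also records statistics. */
typedef struct
{
  size_t calls;
  size_t largest_chunk;
} read_stats_t;

int
local_reader (uint64_t address, void * dest, size_t size, void * user_data)
{
  read_stats_t * stats = user_data;

  memcpy (dest, (const void *) (uintptr_t) address, size);
  stats->calls++;
  if (size > stats->largest_chunk)
    stats->largest_chunk = size;

  return 0;
}
```

With this shape, a read that starts in the middle of one page and ends in the middle of another becomes a series of smaller reads, each confined to a single page, so a request never spans a COW page and a PRV page at once. The price is more calls for large ranges.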
Note: I reported this to Apple on the 20th of February 2015, though my impression from past events is that they’re not likely to fix this anytime soon.