Fun building shared libraries in Go
I was working on a problem recently when I thought to myself “what if I could cause read operations on file X to actually come from file Y”.
I’d seen examples of overriding system calls using C and LD_PRELOAD before and, ever the keen Gopher, I wondered whether I could do something similar in Go. As luck would have it, I found that as of version 1.5, the new ‘buildmode’ option enables the creation of C shared libraries! I hadn’t used cgo at all before, but with a little Googling I was able to establish the basic building blocks for creating ‘custom’ functions. This post describes my journey to working code.
The basics
My goal was simple: I wanted a command like ‘cat foo’ to actually return the result as if I had run ‘cat bar’. Before I could do this, I had to determine which system call needed to be overridden. I was already pretty sure, but for the sake of this post let’s take a look at how we can easily find out, starting with the ever-useful strace.
First, some sample files:
$ echo foo > foo
$ echo bar > bar
Now we run strace:
$ strace cat foo 2>&1 | grep foo
execve("/bin/cat", ["cat", "foo"], [/* 60 vars */]) = 0
open("foo", O_RDONLY) = 3
read(3, "foo\n", 131072) = 4
write(1, "foo\n", 4foo
OK, so we can see that cat uses the ‘open’ syscall which according to the manpage has the following signature:
int open(const char *pathname, int flags)
So, I would need to write my own open function which examines the path, modifies it if appropriate and then returns an integer which is either the file descriptor or an error (-1). For the sake of testing, I would use environment variables to set the ‘from’ and ‘to’ path names.
A first attempt
After some Googling I found some useful blog posts which pointed me in the right direction (see here for example). Ultimately I discovered I’d need to:
- import "C" before anything else
- convert a *C.char into a Go string with C.GoString before I can operate on it
- place ‘//export’ above my function for it to be recognized by C
- maintain the same signature as the underlying syscall
With this in mind, here was my starting point:
I should point out here that the syscall.Open function requires a third argument: mode. For the sake of this example I’m setting it to 0600, which is superfluous since we’re only reading the file.
To build this into a shared library:
$ go build -o preload -buildmode=c-shared
And now the test. First, here’s what it looks like without any preload:
$ cat foo
foo
Now, we set our ‘from’ and ‘to’ paths and preload our shared library:
$ export LD_FROM=foo
$ export LD_BAR=bar
$ LD_PRELOAD=./preload cat foo
bar
Success!
When open isn’t open
Now, let’s test it with a simple Python script which opens a file for reading and prints the first line:
$ cat test.py
import sys
print open(sys.argv[1]).readline().strip()
And now our test:
$ LD_PRELOAD=./preload python test.py foo
foo
Sad face. I had assumed this would work in exactly the same way! Let’s check it with strace:
$ strace python test.py foo 2>&1 | grep foo
execve("/usr/bin/python", ["python", "test.py", "foo"], [/* 60 vars */]) = 0
open("foo", O_RDONLY) = 3
read(3, "foo\n", 4096) = 4
write(1, "foo\n", 4foo
As you can see, this looks just like the output from our trace of the cat command. So, why wasn’t my open function being called!?
A little more Googling revealed that ‘open’ is both a syscall and a library function. This led me to try using ltrace which traces library calls:
$ ltrace python test.py foo 2>&1 | grep foo | grep open
fopen64("foo", "r") = 0x5599a95db140
Ah! So Python’s ‘open’ actually calls fopen64 underneath, rather than calling ‘open’ directly. Here’s the signature for fopen64:
FILE *fopen64(const char *pathname, const char *mode)
This complicated things. With ‘open’, I could simply pass off to syscall.Open and return the file descriptor integer. Now I needed to return a pointer to a FILE object. The Go syscall package doesn’t have an equivalent to fopen64 and certainly nothing that returns a FILE object. Some more Googling revealed that I should be able to refer to the FILE type and call the underlying fopen64 library function from C itself. A key difference here is that I would have to convert the new path name from a Go string back into a C string with C.CString, and free it afterwards with C.free. In order to refer to fopen64, I would need to include stdio.h as well. Here’s what the next iteration looked like:
This didn’t quite do what I wanted:
$ go build -o preload -buildmode=c-shared
./main.go:34:35: could not determine kind of name for C.FILE
./main.go:38:12: could not determine kind of name for C.fopen64
./main.go:37:8: could not determine kind of name for C.free
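Despite including the correct header, none of the C types could be found. After some trial and error, it seems that cgo is very particular about formatting: the comment holding the #include lines (the preamble) is only picked up when it sits immediately above import "C", with no blank line in between. So I changed my imports to look like this: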
package main
// #include <stdio.h>
// #include <stdlib.h>
import "C"
import (
"os"
"syscall"
)
This improved things slightly:
$ go build -o preload -buildmode=c-shared
./main.go:39:12: could not determine kind of name for C.fopen64
Hmm, so it found the references to ‘free’ and ‘FILE’ but not ‘fopen64’. In the name of desperation, I changed the reference to ‘C.fopen’ which compiled OK:
$ go build -o preload -buildmode=c-shared
$ LD_PRELOAD=./preload python test.py foo
bar
As you can see, the test was successful. Here’s the working code:
So, this appears to work as intended, but I’m concerned about the need to use fopen instead of fopen64, as I believe this could cause problems when reading a ‘large file’ (>2GB) on systems where fopen is limited to 32-bit file offsets.
Aiming for completeness
Although I’m not using it, I figured I should replicate fopen as well (in case other tools use it instead of fopen64). To do this I simply copied my fopen64 function and removed the ‘64’. This, however, produced two strange results!
First, the python test now produces an error:
$ LD_PRELOAD=./preload python test.py foo
fatal: morestack on g0
zsh: trace trap LD_PRELOAD=./preload python test.py foo
Second, running ‘ls’ under preload causes the process to hang:
$ LD_PRELOAD=./preload ls
<hangs here>
Simply removing the fopen function and rebuilding the shared library fixes this.
Using ltrace, I don’t see any calls to fopen at all, so I’m unsure as to why this is happening or how to debug it further. I’m hoping my question on the ever-useful golang-nuts board will turn up something useful.
When cgo can’t quite make it
While I was writing this post, I had a helpful reply to my golang-nuts question pointing out that fopen64 is only declared if the preprocessor macro _LARGEFILE64_SOURCE is defined. In order to do this, you must add a CFLAGS directive to the cgo comment at the top of your code like so:
// #cgo CFLAGS: -D_LARGEFILE64_SOURCE=1
Now, things really start to break:
$ go build -o preload -buildmode=c-shared
In file included from _cgo_export.c:3:0:
cgo-gcc-export-header-prolog:44:14: error: conflicting types for ‘fopen64’
In file included from ./main.go:5:0,
from _cgo_export.c:3:
/usr/include/stdio.h:298:14: note: previous declaration of ‘fopen64’ was here
extern FILE *fopen64 (const char *__restrict __filename,
^~~~~~~
_cgo_export.c:37:7: error: conflicting types for ‘fopen64’
FILE* fopen64(char* p0, char* p1)
^~~~~~~
In file included from ./main.go:5:0,
from _cgo_export.c:3:
/usr/include/stdio.h:298:14: note: previous declaration of ‘fopen64’ was here
extern FILE *fopen64 (const char *__restrict __filename,
^~~~~~~
Remember what I said about making sure the function signatures match? Well, as you can see here, cgo has ultimately created a signature with ‘char*’ but the declaration of the function in stdio.h uses ‘const char*’ (Go doesn’t have a const modifier). After some back and forth on golang-nuts with workarounds involving writing an fopen64 function in C and calling into my Go code, alas I wasn’t able to get further than this.
What about writes?
Remember when I said I wanted to impact all read operations? Well… Not all calls to ‘open’ are read-only. The code above would need to be modified to inspect the flags (or mode in the case of fopen) and only modify the file path for non-write calls.
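A rough sketch of how that check might look, assuming Linux flag values from the syscall package. The function names isReadOnlyFlags and isReadOnlyMode are made up for illustration and are not part of the code in this post.

```go
package main

import (
	"fmt"
	"strings"
	"syscall"
)

// isReadOnlyFlags reports whether open(2) flags request read-only access.
// The access mode lives in the low bits selected by O_ACCMODE.
func isReadOnlyFlags(flags int) bool {
	return flags&syscall.O_ACCMODE == syscall.O_RDONLY
}

// isReadOnlyMode reports whether an fopen mode string such as "r" or "rb"
// opens the stream for reading only. "r+", "w" and "a" all allow writes.
func isReadOnlyMode(mode string) bool {
	return strings.HasPrefix(mode, "r") && !strings.Contains(mode, "+")
}

func main() {
	fmt.Println(isReadOnlyFlags(syscall.O_RDONLY))                  // true
	fmt.Println(isReadOnlyFlags(syscall.O_WRONLY | syscall.O_CREAT)) // false
	fmt.Println(isReadOnlyMode("rb"))                               // true
	fmt.Println(isReadOnlyMode("r+"))                               // false
}
```

The overriding functions would then only rewrite the path when the relevant check returns true, and pass everything else through untouched.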
Conclusion
Well, this was a useful learning exercise. It taught me a little more about Go, much more about C and ultimately produced the behaviour I was after.
I would however love to be able to call C.fopen64 — if you know of a way please let me know!