Bash coprocess

My previous post was about the handling of background jobs in Bash. To start a command in the background, one simply appends an & to the command.

Commands like fg, %jobspec and kill can be used to bring the process to the foreground or to send signals to it. However, these commands don’t allow one to communicate with the background process, that is, to send it input or read its output.

Coprocesses

Bash versions 4.0 and above offer another way to start asynchronous processes in a subshell. This is done with the help of the coproc keyword.

Usage:

coproc command args
coproc name { command; }

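The first syntax starts a single, unnamed coprocess. The second syntax gives the coprocess an explicit name, which is what one needs in order to tell several coprocesses apart. A name can only be given when the command is a compound command (for example, a list wrapped in braces); with a simple command, Bash treats the first word as the command itself rather than as a name.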
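A minimal sketch of the two forms, run one after the other. The name worker and the helper variables are made up for illustration, cat merely stands in for a long-running command, and the {var}>&- form used to close a descriptor assumes Bash 4.1 or later.

```shell
#!/usr/bin/env bash

# Form 1: a simple command with no name. Bash uses the default name COPROC.
coproc cat
default_in=${COPROC[1]}      # descriptor connected to the coprocess's stdin
default_pid=$COPROC_PID
echo "unnamed coprocess started with PID $default_pid"

# Closing the write end gives cat an EOF, so it exits cleanly.
exec {default_in}>&-
wait "$default_pid"

# Form 2: an explicit name ("worker" is hypothetical) needs a compound command.
coproc worker { cat; }
worker_in=${worker[1]}
worker_pid=$worker_PID
echo "named coprocess started with PID $worker_pid"

exec {worker_in}>&-
wait "$worker_pid"
```

The first coprocess is finished and collected before the second one starts, so only one is ever active at a time.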

A bidirectional pipe is established between the executing shell and the coprocess. The input and output file descriptors and the pid of the coprocess are available to the shell.

If the coprocess was launched with an explicit name, the output and input file descriptors are available for use in an array of the same name. If no name is provided, the default name is COPROC.

The first element of this array (index 0) is connected to the standard output of the coprocess, and the second element (index 1) is connected to its standard input. The variable NAME_PID, where NAME is the name of the coprocess, holds the PID of the coprocess.
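To make the roles of the two array elements concrete, here is a small sketch of a round trip through a coprocess. The coprocess name upper, its body and the helper variables are all assumptions made for illustration; closing the descriptor with {var}>&- assumes Bash 4.1 or later.

```shell
#!/usr/bin/env bash

# Hypothetical coprocess "upper": sends every input line back in upper case.
coproc upper {
  while IFS= read -r line; do
    printf '%s\n' "${line^^}"
  done
}
upper_out=${upper[0]}   # index 0: read the coprocess's stdout from here
upper_in=${upper[1]}    # index 1: write to the coprocess's stdin here
upper_pid=$upper_PID

printf '%s\n' "hello coprocess" >&"$upper_in"
IFS= read -r reply <&"$upper_out"
echo "reply from PID $upper_pid: $reply"

# Close the write end so the loop sees EOF and the coprocess exits.
exec {upper_in}>&-
wait "$upper_pid"
```

The coprocess body is written with shell builtins on purpose: many external filters buffer their output when writing to a pipe, which would leave the read waiting.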

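Because the coprocess is started asynchronously, coproc itself always returns success. The return status of the coprocess is the exit status of the command, which the shell can collect with wait.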
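A minimal sketch of collecting that exit status with wait. It assumes a hypothetical coprocess named failing that blocks on stdin and then exits with status 3; the $? taken right after the coproc keyword only reports that the coprocess was started.

```shell
#!/usr/bin/env bash

# Hypothetical coprocess "failing": blocks until stdin is closed, then exits with 3.
coproc failing { read -r; exit 3; }
coproc_status=$?          # status of the coproc keyword itself
failing_in=${failing[1]}
failing_pid=$failing_PID

exec {failing_in}>&-      # EOF on stdin: read returns and the coprocess exits
wait "$failing_pid"
command_status=$?         # exit status of the command that ran as the coprocess

echo "coproc returned $coproc_status, the command exited with $command_status"
```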

As an example, take a coprocess named macaroons: its output and input descriptors are stored in an array named macaroons, and its PID is available in a variable called macaroons_PID. A script that prints these values and then reads the output of the coprocess produces:

The coprocess array: 63 60
The PID of the coprocess is 8441
The output of the coprocess is Sun Jun 2 12:22:27 PDT 2019
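A minimal sketch of a script that would print lines like these. Only the name macaroons and the shape of the output come from the example; the assumption here is that the coprocess simply runs date, and the helper variables fds, pid and output are made up.

```shell
#!/usr/bin/env bash

coproc macaroons { date; }
# Copy the values right away: Bash may unset them once the coprocess has exited.
fds=("${macaroons[@]}")
pid=$macaroons_PID

IFS= read -r output <&"${fds[0]}"

echo "The coprocess array: ${fds[*]}"
echo "The PID of the coprocess is $pid"
echo "The output of the coprocess is $output"
```

The descriptor numbers and the PID will differ from run to run.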

If the coprocess is launched without a name, the output and input descriptors are stored in an array called COPROC and the PID of the process is stored in a variable called COPROC_PID.

Gotchas

Since the coprocess runs asynchronously, there are no guarantees about when it will finish. It’s entirely possible for the coprocess to finish before its output has been read. Once Bash notices that the coprocess has exited, it closes the descriptors and unsets the array, in which case any later attempt by the shell to read the output will result in an error.
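One common way to soften this, which is my addition rather than anything coproc does by itself, is to duplicate the read end onto a descriptor that the script owns while the coprocess is still alive. The sketch below assumes Bash 4.1 or later for the {var}<& form; out_fd is a made-up variable name, and the sleeps only make the ordering deterministic.

```shell
#!/usr/bin/env bash

# A coprocess that produces one line and exits about a second later.
coproc { sleep 1; echo "short-lived output"; }

# Duplicate the read end onto a descriptor owned by the script.
# out_fd is a hypothetical variable name; Bash picks a free descriptor number.
exec {out_fd}<&"${COPROC[0]}"

sleep 2   # by now the coprocess has exited

# The duplicate still refers to the pipe, so the buffered output remains readable.
IFS= read -r line <&"$out_fd"
echo "got: $line"

exec {out_fd}<&-
```

This only narrows the window: a coprocess that exits before the duplication happens would still leave nothing to duplicate.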

Another point to remember is that the file descriptors of the coprocess are accessible only to the shell process that spawned it. They are not available in subshells: for example, any command launched within parentheses runs in a new subshell, as does each command of a pipeline.

The following example should serve as a case in point:

The print coprocess array: 63 60
The PID of the print coprocess is 11285
The next command will error.
./coproc_subshell.sh: line 13: "${print[1]}": Bad file descriptor
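A minimal sketch of a script that should reproduce this kind of failure. The body of the print coprocess and the messages sent to it are assumptions; only the coprocess name and the general shape of the output come from the example above, so line numbers and the exact error text may differ.

```shell
#!/usr/bin/env bash

# Hypothetical body for the "print" coprocess: echo back every line it receives.
coproc print {
  while IFS= read -r line; do
    echo "$line"
  done
}
print_in=${print[1]}
print_pid=$print_PID
echo "The print coprocess array: ${print[*]}"
echo "The PID of the print coprocess is $print_pid"

echo "The next command will error."
# Parentheses start a subshell, where the coprocess descriptors are not available.
( echo "hello from a subshell" >&"${print[1]}" )
subshell_status=$?

# The same redirection works in the shell that started the coprocess.
echo "hello from the parent shell" >&"${print[1]}"
IFS= read -r reply <&"${print[0]}"
echo "subshell status: $subshell_status, parent read back: $reply"

# Close the write end so the coprocess sees EOF, then collect it.
exec {print_in}>&-
wait "$print_pid"
```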

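Lastly, Bash might print a warning when a second coprocess is started while the first is still running; its documentation notes that only one coprocess may be active at a time. However, in Bash 5.0 and above, starting multiple coprocesses behaves correctly in the example below, despite the warning.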

./two_coprocesses.sh: line 7: warning: execute_coproc: coproc [11121:print] still exists
The print coprocess array: 63 60
The PID of the print coprocess is 11121
The PID of the second coprocess is 11122
The PID of the second coprocess is 62 58
The output of the print coprocess is hello world
The output of the second coprocess is goodbye world
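A hedged sketch of a script along these lines. The coprocess bodies are assumptions, as are the helper variables; each coprocess waits for a line on stdin after printing so that it stays alive until the script has read its output. Since the Bash documentation only promises one active coprocess at a time, treat this as an experiment rather than a supported pattern.

```shell
#!/usr/bin/env bash

# Each coprocess prints one line, then waits for a line on stdin so that it
# stays alive until the script has read its output.
coproc print { echo "hello world"; read -r; }
print_fds=("${print[@]}")
print_pid=$print_PID

coproc second { echo "goodbye world"; read -r; }   # Bash may warn here
second_fds=("${second[@]}")
second_pid=$second_PID

IFS= read -r print_output <&"${print_fds[0]}"
IFS= read -r second_output <&"${second_fds[0]}"
echo "The output of the print coprocess is $print_output"
echo "The output of the second coprocess is $second_output"

# Release both coprocesses by sending each an empty line, then collect them.
echo >&"${print_fds[1]}"
echo >&"${second_fds[1]}"
wait "$print_pid" "$second_pid"
```

Copying the descriptors and PIDs into separate variables right after each coproc keeps them usable even if Bash later stops tracking the first coprocess.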

Conclusion

As always, there’s a valid argument that anything approaching this level of complexity is probably better done in Python.

I have no good argument for why Bash would be preferable here; indeed, for more advanced use cases, descriptor handling can get somewhat fiddly. However, I still reckon it’s useful to know about Bash’s more advanced features, if only to know why and when not to Bash.