xargs, or how to easily parallelise processes in Bash

Orazio Angelini
4 min read · Jan 13, 2023

Most Linux systems in the world come with the ability to parallelise command-line processes out of the box. I recently noticed that this is not at all common knowledge.

Easy out of the box method

With xargs, which is included by default in most Linux distributions, you can parallelise any process in Bash out of the box. The main command is:

# for readability,
# set number of processes in NPROC.
# protip: the `nproc` command outputs
# the number of CPUs on the machine
NPROC=$(nproc)
<any_command_outputting_a_return_separated_list> \
| xargs -I {} -n 1 -P ${NPROC} \
<command> {}

This will execute commands by substituting each element of the list piped in on stdin where the {} is. It will run $NPROC commands in parallel, starting a new one only when a process in the queue finishes. For example:

seq 100 | xargs -I {} -n 1 -P 10 echo {}

Will run the following list of commands, 10 at a time in parallel:

echo 1
echo 2
echo 3
(...)
echo 10 # <-- up to here all processes start instantly
echo 11 # <-- this starts as soon as one above finishes..
echo 12 # <-- and so on
(...)
echo 99
echo 100
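To see the parallelism in action, here is a small timing sketch (my own illustration, not from the original article): eight one-second sleeps run with -P 8 all start at once, so the whole pipeline finishes in roughly one second instead of eight.

```shell
start=$(date +%s)
seq 8 | xargs -P 8 -I {} sleep 1   # eight sleeps, all running at once
end=$(date +%s)
echo "took $((end - start))s"      # roughly 1s; with -P 1 it would be ~8s
```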
  • You can control which character sequence in the command is substituted with the -I flag (in this case we chose {} ). Without the flag, the entries from the list are by default appended at the end of the command.
  • You can control the number of parallel processes with -P .
  • If you’re parallelising, generally don’t change -n 1. This is because the base use case of xargs, if no flags are given, is to turn a newline-separated list taken from stdin into a space-separated list and append it at the end of the given command. If the list is very long, -n breaks it into several chunks, usually to avoid exceeding the maximum number of arguments allowed by the command and the shell configuration (a common problem when you’re using e.g. rm on a directory with many files):
$ seq 3
1
2
3

$ seq 3 | xargs echo
# runs the command `echo 1 2 3`
1 2 3

# break in chunks of 3
$ seq 8 | xargs -n 3 echo
1 2 3
4 5 6
7 8
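The chunking also determines how many separate invocations xargs performs. A quick way to count them (an illustration of mine): each chunk becomes one echo run, and each echo run prints one line.

```shell
# 10000 arguments split into chunks of 1000 -> 10 separate `echo` runs,
# each printing one line:
seq 10000 | xargs -n 1000 echo | wc -l   # → 10
```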

Common gotcha: variable expansion

Any variable expansion in the command is performed by Bash before xargs runs and before the entries from stdin are substituted into the command. Therefore, if you want to, say, double a series of numbers with arithmetic expansion, this is the wrong command:

$ seq 3 | xargs -n 1 -I {} echo $(( {} * 2 ))
bash: {} * 2 : syntax error: operand expected (error token is "{} * 2 ")
[1] broken pipe seq 3

This is because Bash first tries to expand $(( {} * 2 )). That makes no sense, because {} has not yet been substituted with any number coming from seq 3, so Bash raises an error and never goes on to run seq or pass anything to xargs.
The quick-and-dirty solution is to run a subshell and let it interpret the command after the substitution has been performed:

$ seq 3 | xargs -n 1 -I {} bash -c 'echo $(( {} * 2 ))'
2
4
6

xargs in this case interprets everything after -I {} as a simple string, then performs the substitution of {}, and finally runs the command with the substituted string, which for the first line becomes bash -c 'echo $(( 1 * 2 ))'. Of course, this solution speeds things up only if the commands you are parallelising are computationally heavy, because you’re adding the overhead of starting one additional subshell per command run.
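A note of mine on safety: splicing {} directly into the script string means an input containing shell syntax could be executed as code. A slightly more robust variant passes the item to the subshell as a positional argument instead:

```shell
# The item is handed to bash as $1 rather than pasted into the script text;
# `_` fills $0. Arithmetic expansion then happens inside the subshell.
seq 3 | xargs -I {} bash -c 'echo $(( $1 * 2 ))' _ {}
# prints 2, 4, 6 (one per line)
```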

One piece of advice: check your commands before running them

In many cases you want to parallelise many operations and expect a lot of output, which makes it hard to tell whether everything ran correctly. Before you run a parallelised command with xargs , check what exactly will be done. You can always do this by prepending echo to the command. So instead of:

$ seq 3 | xargs ls

do:

$ seq 3 | xargs echo ls
ls 1 2 3
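The same trick works with -I substitution. Here is a hypothetical file-archiving command previewed before it touches anything (the .log names and the archive/ directory are made up for illustration):

```shell
# Prepending echo prints the commands that WOULD run, one per input item:
seq 3 | xargs -I {} echo mv {}.log archive/{}.log
# mv 1.log archive/1.log
# mv 2.log archive/2.log
# mv 3.log archive/3.log
```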

Another method to check commands before running them is the -p flag (interactive mode), which prints each command and asks you to type y or n to decide whether to run it.

What if I need to do something more complicated?

In that case you can use GNU parallel. It is a much more sophisticated program that gives you fine control over the processes run, their output, and the variable substitutions.

While it is not normally present on most Linux boxes, you can get it on any internet-capable system in seconds with:

# download it
wget https://git.savannah.gnu.org/cgit/parallel.git/plain/src/parallel
# allow execution
chmod 755 parallel
# run it
./parallel

The base use case is very similar to xargs:

<any_command_outputting_a_return_separated_list> \
| parallel -j ${NPROC} <command> {}

It automatically substitutes {}. Two interesting features: parallel groups the output of any command that produces multiple lines of text (you can disable this with --ungroup ), and it lets you preview the commands you’re about to run with the --dryrun flag. For the rest of its numerous options, please look into its documentation.
