When life gives you bash, make leMonad

Published in

The Hotels.com Technology Blog

7 min readMar 28, 2019

How fake functional programming turns scripts more readable

Why Bash?

Why not bash? I cannot vouch for the practicality of this choice. Not even in the context of dockerized tools, where its portability is guaranteed by packaging it with a hand picked runtime that promises compatibility. Or exactly because of this. A node.js, or a python script would prove equally trivial to execute. Also both provide fairly high level abstractions, that are easy to deal with. Yet I chose bash; maybe it’s because it is idiomatic to compose short and sweet single purpose bits, that do one thing well; or maybe it’s because of my attraction to the weird an unusual.

Either way, instead of dwelling on the why, consider this a given and instead let’s discuss how to put lipstick on a pig!

artist’s rendering of https://tenor.com/view/miss-piggy-ms-piggy-gif-7619915

Dummy Example

Let’s start with a simple dummy example, and iterate on it. Let’s write a script, that

checks out a git repository
renders mustache templates in certain directories
commits and pushes the changes to a different branch

What’s the use of it? Nothing, but it has enough moving parts to demonstrate the point. It has steps that don’t trivially succeed, and each of the steps strictly depend on the prevous one.

What’s Wrong with Spagetti?

artist’s rendering of https://giphy.com/gifs/buster-keaton-spaghetti-XylffH52LeiA0

Nothing, if it’s short and concise. Except that spagetti is not supposed to be short; seriously, don’t break it in half , it’ll just slide into the pot! But I digress.

repo=$1 
path=$2git clone $repofor path in $(find "$repo/$path" -type f -name "*.mustache"); do 
  cat $path | mo > ${path%.mustache} 
donegit add --all 
git commit -m "@noissue applied mustache template" 
git push

But your code won’t be this short either, because steps sometimes fail. And between them you’ll need to riddle the code with error checks and guards.

Let’s add some error checking and turn it into the promised spagetti monster:

artist’s rendering of https://imgur.com/gallery/dfsYUIB

repo=$1 
path=$2 
branch=$3 
git clone $repo

What if git clone fails?

if [ $? -ne 0 ]; then 
  echo "cannot clone repo" > /dev/stderr 
  exit 1 
fi 
paths=$(find $path -type f -name "*.mustache")

What if there are no mustache templates?

if [ -z "$paths" ]; then 
  echo "No templates in $path" > /dev/stderr 
  exit 1 
fi
for path in $paths; do 
  cat $path | mo > ${path%.mustache} 
done

Any of the git steps may fail, let’s chain them:

git add --all && \ 
  git commit -m "@noissue applied template" && \ 
  git push origin $branch

Did it work?

if [ $? -ne 0 ]; then 
  echo "cannot push changes" > /dev/stderr 
  exit 1 
fi 
echo DONE

That’s quite the mouthful of error checking. Discovering dependencies among the steps also adds to the already significant cognitive load. Let’s tackle the two issues separately

first orchestration
then error handling

Pipedreams

The previous example chains some git commands with &&, putting short circuiting to use. If only the dependent steps could pass partial results for further processing. Enter the almighty pipe!

artist’s rendering of https://gfycat.com/freethoughtfulamericancrayfish

I hear you, pipes are for asynchronous processing. Yes they are indeed. But it’s also true, that blocking on read can turn it into a synchronisation point; especially if we make sure to only pass exactly one or zero elements through.

Given the right plumbing we can make the most readablestest code ever, which bears cunning resemblence to human speach. Not really, but it’s fine:

main() { 
  repo=$1 
  path=$2 
  branch=$3git-clone $repo | 
  find-templates $path | 
  apply-templates | 
  git-commit-push $branch | 
  echo DONE 
}

Functionality is trivial now. Not the inner workings, but unless it needs to be tweaked, there should be no need to understand implementation details. To make this work, git-clone needs to pass the named of the cloned directory to find-templates, which in turn needs to pass mustache template file names to apply-templates, which passes the number of new files to git-commit-push. These are details that can be dealt with separately. Result is a collection of compact bash function, each of which can be written independently in a truly bashful spirit.

Each pipe spawns a new child process; by so we minimise the side effects of previous steps, starting with a fresh copy of the original parent shell. This means, that even if we changed the directory in one step, it has no effect on the next. Neither are newly added environment variables. The exception is the stuff that gets persisted to the disk. This helps composition more, than it hinders.

I say hinders, because, for example, you cannot rely on exit codes for error propagation between steps. $? gives you the error code in the current process, the one from the parent is lost. Adding an extra if [ $? -eq 0 ]; guard has no effect, regardless of exit codes, the chain executes all the way to the end.

What we need is two channels, one to transmit success responses, one for error signals. Pipe already operates on /dev/stdout, it is trivial to use that for success signals: just echo the result. Analogously /dev/stderr could be used for error signals, which could enable fine grained error handling. In the simple case examined here, a lack of success signal is treated as error signal, and /dev/stderr is reserved for logging. It is possible to do this, as long as pipes are synchronisation points.

leMonad: from Pipes to Flatmap

artist’s rendering of https://gameraboy1.tumblr.com/post/174606722966/legend-of-the-superheroes-1979-the-challenge

Let’s deal with errors! We want the pipe to express mapping on the success channel. We also want to apply different logic to the error channel, for example log an error, and propagate the error signal. This sounds exactly like a chain of Optional#flatmap-s in Java. Or a watered down maybe monad… a leMonad. Procrastinating on the implementation details further, let’s see an iteration on the original example, while retaining the script’s intent obvious:

main() { 
  repo=$1 
  path=$2 
  branch=$3git-clone $repo | 
  maybe find-templates $path | 
  maybe apply-templates | 
  maybe git-commit-push $branch | 
  maybe echo DONE 
}

We need maybe to invoke a mapping function, if the previous stage succeeded, or skip it, and propagate the error signal further.

artist’s rendering of https://weheartit.com/entry/46607769

Lambdas in bash? Yes, why not. Loose (as in almost non-existent) typing of bash makes it possible, of course sans the safeguards preventing you to shoot yourself in the foot. Here’s simple implementation:

maybe() { 
  read line if [ -z "$line" ]; 
    then echo "✘ $1" > /dev/stderr 
  else 
    echo "$line" | $@ 
  fi 
}

read blocks, so if nothing is sent down the pipe until the parent (ie the previous stage in the chain) terminates, $line remains empty. In which case we take to the error channel, and log an error as the simplest possible measure; error is also propagated implicitly, as nothing is written to /dev/stdout. Like ✘ apply-templates.

If something does come through the pipe, it is forwarded to squiggly-wiggly-snail-thing. Yup, that’s the lambda. Actually it’s all the parameters passed to maybe. $1 is invoked, because, well, it’s at the first place; the rest becomes the parameter list. In the example, $1 becomes find-templates, apply-templates, git-commit-push, and echo.

Mr. T, foo, bar, and baz

artist’s rendering of https://tenor.com/view/eye-roll-mr-t-gif-3528716

Let’s add some logging. We reserved /dev/stderr for that, so all we need to do, is redirect /dev/stdout, while not breaking the pipes. Mr T to the rescue:

log() { 
  tee >( while read line; do echo "[$1] $line"; done > /dev/stderr ) }

read everything from the pipe, copy to /dev/stderr with a prefix passed as parameter, and keep moving it through the pipe. For synchronization and to avoid duplication of the final output, we need to terminate tee with

end() { 
  tee > /dev/null 
}

Thus the final version of main takes the form of

main() { 
  repo=$1 
  path=$2 
  branch=$3 git-clone $repo | log "clone" | 
  maybe find-templates $path | log "find" | 
  maybe apply-templates | log "apply" | 
  maybe git-commit-push $branch | log "commit" | 
  maybe echo DONE | log "done" | 
  end 
}

Pipe is for asynchronous processing, it spawns a new subprocess. This is why in general the idomatic way of dealing with pipes on the consumer end is to read in a while loop. Thus the subprocess enters the loop the moment pipe is opened, and is kept alive via the loop as long as the pipe is open.

This is not what is done here: instead we synchronize the subprocess via read, which blocks until either something is written to the pipe, or it is closed. If you fail to synchronize, steps will no longer wait for input from the previous one.

If you pass multiple items through the pipe, like the result of a find, only the first result will be processed, the rest will be ignored. Hence the restriction to pipe exactly zero or one item.

Conclusion

I think this style produces bash scripts that are weird and certainly up to aquired taste to get used to. Yet if backed by a more robust maybe, it might prove to be a handy tool in your toolbox. Maybe not the daily toolbox, but for the one full of ikea wrenches and non-standard plugs, tucked away in the garden shed.

Originally published at kicsikrumpli.github.io on February 21, 2019.