#3: Desired outcomes

How Makefiles changed the way we think

Aljabr, Inc.
4 min read · Sep 4, 2018


In the previous post, we looked at simple scripts and runbooks for executing the parts of a workflow. The next tool in our survey is Make. Make was one of the first tools for managing dependency graphs in workflow execution. It is normally used for building software, and various attempts to simplify it (imake, Autoconf, Automake, etc.) have not displaced it as a simple and effective approach, albeit one with its historical quirks.

Make was designed for the era of Bourne shell scripting. Consider the following “hello world” of Makefiles. It has lines with colons “:” that declare dependencies, and the recipe lines that follow (bizarrely prefixed by invisible tab characters), which take the form of a traditional sequential workflow.

# Makefile
all: hello world
	./hello
	./world
hello.o: hello.c /usr/include/stdio.h
	gcc -c hello.c
world.o: world.c /usr/include/stdio.h
	gcc -c world.c
hello: hello.o
	gcc -o hello hello.o
world: world.o
	gcc -o world world.o

This example compiles and executes the code to print hello and world from two separate programs, as shown in the diagram below:

A Makefile behaves like a number of embedded pipelines, in staged clusters. In this example, it has no formally modeled desired end-state, so the parallel behavior is undefined.
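
GNU make makes this concrete: with the -j flag it runs independent branches of the graph concurrently, and nothing in this Makefile constrains how their output interleaves. A sketch of one possible run (the interleaving may differ from run to run):

$ make -j2
gcc -c hello.c
gcc -c world.c
gcc -o world world.o
gcc -o hello hello.o
./hello
./world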

What’s interesting about a Make program is that it works backwards. We start by declaring a desired end-state: the intent to execute the program; but we also declare that this depends on certain prerequisites. Instead of blindly attempting to run the code, Make checks to see whether the dependencies exist. If they don’t, it looks for a declaration of a workflow to build them. Thus we see the rules (each with their own recursive dependencies) explaining these subroutine pipelines. Make’s simplistic rule for executing a workflow is this: if a dependency is newer than its target, then rerun the rule.

This structure has a simple consequence. If any changes are made to the sources, Make will rebuild them and everything downstream that depends on them. By looking backwards from the desired end, it avoids the unnecessary work that an equivalent shell script would blindly repeat.
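
To see the minimal rebuild in action, here is a sketch of a session with the Makefile above (the exact output depends on your toolchain):

$ make            # first run: build and execute everything
gcc -c hello.c
gcc -o hello hello.o
gcc -c world.c
gcc -o world world.o
./hello
./world
$ touch hello.c   # simulate an edit to one source
$ make            # only the hello branch is rebuilt
gcc -c hello.c
gcc -o hello hello.o
./hello
./world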

Separation of control plane from data plane

If we now want to change a stage’s software inputs, e.g. by adding a compiler flag for debugging:

# Makefile change
all: hello.o world.o
hello.o: hello.c
	gcc -g -c hello.c
world.o: world.c
	gcc -g -c world.c

Had we started from the beginning, like a shell script, then all of the stages would need to be rebuilt, so Make is already an improvement. But Make focuses mainly on software dependencies, and workflows also have other kinds of build dependencies.
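
For contrast, the shell-script equivalent of this build has no notion of what is already up to date, so every run redoes every step. A minimal sketch:

#!/bin/sh
# build.sh: the imperative version of the Makefile above.
# It recompiles both programs on every run, whether or not
# their sources have changed.
gcc -g -c hello.c
gcc -g -c world.c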

Suppose the version of the compiler changed because of a crucial bug fix; then we should probably rebuild the software. But this doesn’t happen automatically, because the transformation inside a rule cannot itself be a dependency: Make is unaware of changes made to the commands inside the Makefile. We can work around this by pulling those commands out of the Makefile into separate “containers”, packaging the commands as files too, and adding them to the dependency list. Now we have to edit multiple files (analogous to building separate containers), which shifts the overhead to the source, where it belongs causally:

# Makefile containerized
all: hello.o world.o
hello.o: hello.c compile_hello.sh
	./compile_hello.sh
world.o: world.c compile_world.sh
	./compile_world.sh
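
The scripts themselves are now ordinary files in the build area, so changing a command (say, a compiler flag) updates a tracked dependency’s timestamp. A sketch of what compile_hello.sh might contain (the flags are illustrative):

#!/bin/sh
# compile_hello.sh: the stage's command, packaged as a file.
# Editing this file makes it newer than hello.o, so Make reruns the stage.
gcc -g -c hello.c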

In a cloud computation, stages will already be packaged as containers, so this adds no extra steps. Whatever distributed make system we create will be able to add this at no extra cost.

But even though we have decoupled the ballistics of the changes, the order of the output is still undefined, coordinated only by the random execution order of the containerized files: this is still an imperative chain that results in two merging but uncoordinated “ballistic” streams. To fix that, we need to give the output a desired end-state, and constrain the pipelines by completing the reverse DAG so that it converges to a single state.

# Makefile 3 for convergent pipeline
# Each target names the file its recipe creates, so Make can compare timestamps
end_state: hello_stage world_stage inputs
	cat hello_stage world_stage inputs > end_state
hello_stage: hello exec_hello.sh
	./exec_hello.sh > hello_stage
world_stage: world exec_world.sh
	./exec_world.sh > world_stage
hello: hello.o
	gcc -o hello hello.o
world: world.o
	gcc -o world world.o
hello.o: hello.c compile_hello.sh
	./compile_hello.sh
world.o: world.c compile_world.sh
	./compile_world.sh

A parallelized cluster “makefile” behaves like a number of staged clusters with a main flow direction, and several tributaries that feed into it, all of which converge on the final desired end-state outcome.

(The dangling reference to “inputs” can be any pure data file, such as a version declaration or license text, that lives in the build area and is captured in the dependencies, so that a change to it will trigger a new build.)

Now when we type “make”, the result is an output file called “end_state”, which is remade in response to changes to the input file “inputs”.

If a change occurs to any of the sources, or any intermediate result is lost, the correct outcome is recomputed with minimal waste!
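
For example, with the convergent Makefile above, both kinds of change trigger only the work that is actually needed (a sketch; exact output depends on your shell and make):

$ touch inputs        # change a pure data input
$ make                # only the final merge is redone
cat hello_stage world_stage inputs > end_state
$ rm world_stage      # lose an intermediate result
$ make                # only the missing stage and the merge are recomputed
./exec_world.sh > world_stage
cat hello_stage world_stage inputs > end_state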
