Random idea: Using data flow graphs for optimising serial programs
One of the cool ideas I came across recently was that of a data flow graph.
Basically, you represent a computation in the form of symbols and operations on them. Let’s take a very simple example: a four-step program.
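A minimal sketch of such a program might look like this (the concrete values and variable names are illustrative assumptions, chosen to match the dependencies described below):

```python
a = 2 + 3    # step 1
b = 4 * 5    # step 2
c = b - 1    # step 3: uses the result of step 2
d = c + b    # step 4: uses the results of steps 3 and 2
```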
In the above program, the computations in step 1 and step 2 are independent of each other. Step 3 depends on step 2, and step 4 depends on both step 3 and step 2.
Represented as a graph, it becomes something like this (a rough plain-text sketch; arrows point from a step to the steps that need its result):
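```
step 1      step 2
             |   \
             v    \
          step 3   |
             |     |
             v     v
            step 4
```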
Looking at the graph, the cool part is that step 1 and step 2 are independent of each other, so it doesn’t really make sense for one to wait for the other to finish. If we have multiple cores available to us (GPU or CPU), we can schedule these tasks simultaneously on two cores and speed up the overall computation. While this might not seem very useful in such a simple program, it can be a really powerful tool.
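To make that concrete, here’s a hedged sketch of how a scheduler could exploit that independence using a thread pool (the `step` functions just wrap the toy computations from above; a real engine would build and walk the graph automatically):

```python
from concurrent.futures import ThreadPoolExecutor

def step1():
    return 2 + 3

def step2():
    return 4 * 5

def step3(b):
    return b - 1

def step4(c, b):
    return c + b

with ThreadPoolExecutor(max_workers=2) as pool:
    # steps 1 and 2 share no edges in the graph, so they can run at the same time
    f1 = pool.submit(step1)
    f2 = pool.submit(step2)
    a, b = f1.result(), f2.result()
    c = step3(b)     # has to wait for step 2
    d = step4(c, b)  # has to wait for steps 3 and 2

print(a, b, c, d)    # 5 20 19 39
```

(For arithmetic this small, the threading overhead would dwarf any gain; the point is just the shape of the schedule.)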
- An example application would be a compiler that automatically converts serially written code into a symbolic data flow graph and schedules the resulting tasks onto free cores to speed up program execution.
- Another example would be companies with complicated build/render pipelines. The build process could be written in the normal serial way, and the data flow engine could automatically schedule independent tasks to run on different servers in the server farm. If a new git commit were pushed to production, the engine could recompile only the parts of the dependency graph that were affected, saving a lot of time compared to compiling everything from scratch (a toy sketch of this follows the list).
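Here’s the toy sketch promised above: given a (completely made-up) dependency graph, a change to one node only dirties the nodes downstream of it, and only those would need to be rebuilt:

```python
# Toy incremental-rebuild sketch: the graph and target names are hypothetical.
deps = {
    "app": ["libA", "libB"],   # app is built from libA and libB
    "libA": ["common"],
    "libB": ["common"],
    "common": [],
}

def affected(changed, deps):
    """Return every target that transitively depends on a changed node."""
    dirty = set(changed)
    grew = True
    while grew:
        grew = False
        for target, inputs in deps.items():
            if target not in dirty and any(i in dirty for i in inputs):
                dirty.add(target)
                grew = True
    return dirty

# A commit that only touches libB leaves libA and common untouched:
print(affected({"libB"}, deps))   # {'app', 'libB'} (set order may vary)
```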
I don’t know how the compiler would work exactly, or if it would even work, but it would surely be extremely convenient. After all, it’s a lot easier to write serial programs.
UPDATE: Found this cool paper which does the serial -> parallel compilation for C programs; check it out if you are interested: http://www.netlib.org/utk/people/JackDongarra/PAPERS/from-serial-europar2012.pdf