How do we learn from TensorFlow and dream about the future of Silicon Valley?
Last week while I was browsing Medium, I saw an interesting ML article that talked about AutoGraph, a tool for converting normal Python code into Graph code that TensorFlow uses for execution.
AutoGraph converts Python into TensorFlow graphs
By Alex Wiltschko, Dan Moldovan, Wolff Dobson
From my understanding, TensorFlow is built and optimized around computational graphs that model the flow of data. It would be very efficient if we could program directly in the graph language, but programming with graphs directly might not be as intuitive as programming in a typical language such as Python. Therefore, TensorFlow provides a Python library that lets users write graph-generating Python code: the code builds the graph first, and the TensorFlow core runs the graph later.
If we are working on simple projects that do not require much graph-level optimization, we can use eager execution. Basically, it’s a programming environment where TensorFlow executes operations directly from Python, at the price of some efficiency. With eager execution, we can directly invoke arithmetic operations such as matrix multiplication from the Python interpreter.
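To make the difference between the two styles concrete, here is a minimal sketch in plain Python. It is not TensorFlow's API: `Node`, `placeholder` and `run` are hypothetical names invented for this illustration. The eager function computes a value right away, while the deferred version only records operations and computes nothing until `run` is called.

```python
import operator

# Eager style: every operation runs immediately and returns a concrete value.
def eager_add_mul(a, b, c):
    return (a + b) * c

# Deferred style: operators only record graph nodes; nothing is computed yet.
class Node:
    def __init__(self, op, inputs=(), name=None):
        self.op = op          # "placeholder", "add" or "mul"
        self.inputs = inputs  # upstream Node objects
        self.name = name      # only used by placeholders

    def __add__(self, other):
        return Node("add", (self, other))

    def __mul__(self, other):
        return Node("mul", (self, other))

def placeholder(name):
    """Hypothetical helper: a named input that is filled in at run time."""
    return Node("placeholder", name=name)

OPS = {"add": operator.add, "mul": operator.mul}

def run(node, feed):
    """Evaluate a finished graph, reading placeholder values from `feed`."""
    if node.op == "placeholder":
        return feed[node.name]
    args = [run(inp, feed) for inp in node.inputs]
    return OPS[node.op](*args)

# Build the graph once, then run it as many times as needed.
a, b, c = placeholder("a"), placeholder("b"), placeholder("c")
graph = (a + b) * c
```

Calling `eager_add_mul(1, 2, 3)` and `run(graph, {"a": 1, "b": 2, "c": 3})` both give 9. The difference is that `graph` exists as data before it runs, so a runtime is free to inspect or optimize it first.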
The article that I read introduced AutoGraph, a TensorFlow tool that converts eager execution code into graph-generating code without much cost in runtime efficiency. With AutoGraph, we can live in a world where we write ML code in an intuitive, linear and reasonable fashion, but in the end, the code gets transpiled into a more efficient isomorphic version of itself for better execution.
Converting human-readable code into a more computer-friendly graph is such a brilliant idea. In fact, after reading about it, I think we can explore this idea a little further.
Why did we have human-readable computer programs in the first place?
To answer this, we need to revisit the time when we first invented anything with computing power. Back in the days when we had just tamed transistors and realized that with a pile of transistors and electricity we could read and write logic values on wires, we needed a good architecture capable of managing hundreds of thousands of transistors and doing meaningful work.
The Rise of Von Neumann
Then came the Von Neumann architecture. It proposed that an electronic digital computer should be built from the following components:
- A processing unit that contains an Arithmetic Logic Unit (ALU) and a set of internal registers.
- A Control Unit that contains registers for the instruction currently being executed (the Instruction Register) and for the memory address of the next instruction (the Program Counter).
- A Memory that stores data and instructions.
- I/O mechanisms.
Basically, the architecture gave us an abstraction for each key component in a computer. It also had an underlying premise that we have to break tasks down into units of instructions. Each instruction invokes work from a part of the system depending on its type. The work could be anything: computing 1+1, saving a value to memory, reading user input, and so on.
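As a rough illustration of that premise, here is a toy stored-program machine in Python. The instruction names (`LOAD`, `ADD`, `STORE`, `HALT`) and the single accumulator register are assumptions made up for this sketch and do not model any real CPU. Note how instructions and data share one memory, and how "computing 1+1" becomes several tiny steps.

```python
def run_machine(memory):
    """Fetch-decode-execute loop over a single shared memory."""
    pc = 0   # program counter: address of the next instruction
    acc = 0  # accumulator: the one register our toy ALU works on
    while True:
        ir = memory[pc]  # fetch the instruction into the instruction register
        pc += 1
        op, arg = ir     # decode it into an operation and an address
        if op == "LOAD":
            acc = memory[arg]
        elif op == "ADD":
            acc = acc + memory[arg]
        elif op == "STORE":
            memory[arg] = acc
        elif op == "HALT":
            return memory
        else:
            raise ValueError(f"unknown instruction {op!r}")

# Addresses 0-3 hold the program, addresses 4-6 hold the data.
memory = [
    ("LOAD", 4),     # 0: acc = memory[4]
    ("ADD", 5),      # 1: acc = acc + memory[5]
    ("STORE", 6),    # 2: memory[6] = acc
    ("HALT", None),  # 3: stop
    1,               # 4: first operand
    1,               # 5: second operand
    0,               # 6: the result goes here
]

result = run_machine(list(memory))
```

After the run, `result[6]` holds 2.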
We invented a set of building blocks (instructions) and a harness (the architecture) for assembling those blocks into something that can achieve arbitrary tasks on top of the silicon quicksand.
In fact, nearly all modern-day computers are still based on the Von Neumann architecture.
All Hail Assembly
While we are still using the architecture, there is always this inevitable step if we want to use the computer to achieve some tasks — to break down a complicated task into smaller, dumber instructions and use a combo of these instructions to achieve something bigger. That’s also why Computer Science education always focuses on breaking down large problems into smaller and achievable micro-problems all the way down to the granularity of CPU instructions.
During high school, I bought a TI-84 calculator for my Honors Algebra course. One day I discovered that I could also write programs on the calculator. The programming instructions from TI were the following.
Introduction to TI-Basic on your TI-84 Plus CE
You can use TI-Basic to create a program on your graphing calculator. You can create a program that will calculate a desired output or control an experience, such as a game.
What Is a Program?
A program is a set of one or more command lines, each containing one or more instructions. When you execute a program, the TI 84 Plus CE performs each instruction on each command line in the same order in which you entered them.
As the manual says, a program is a set of one or more command instructions. By combining if-statements, goto/label, and I/O functions such as detecting key presses or printing text, I created my first TI-84 game: Tank Warfare. It was a simple game that randomly spawned enemy targets, and the objective was to drive a tank with the arrow keys and shoot at the targets. I even posted on Facebook afterward, complaining about how I had spent 4 hours trying to figure out how to write a program.
Now that I think of it, the programming environment on the TI-84 was pretty primitive. The basic building blocks were nothing fancy, and it took me a long time to get used to the environment and adjust my way of thinking. The calculator starts execution at the first line of the program. The only way to jump around the instructions is to use goto and label. Basically, I place a label at the beginning of a functional block, and then I can use goto to direct the control flow to jump to any label.
If you print the program out on paper, you will notice that it is written in the same flattened format as those personality tests: if you choose A, go to question 3, otherwise go to question 2; if you choose B, go to question 4; and so on.
In fact, assembly code works the same way. Basically, when any program gets compiled down to assembly, it’s essentially just a 1-D unwrapped list of instructions with special control flow redirects sprinkled in between that give multidimensional meanings to the program.
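Here is a small sketch of what that flattening looks like, written in Python rather than TI-Basic or real assembly. The instruction names (`label`, `goto`, `if_zero_goto`, `add`, `dec`) are invented for this example. The structured loop `while n != 0: total += n; n -= 1` turns into a flat list where the only structure left is the jumps.

```python
def run_flat(program, variables):
    """Run a flat list of instructions that can only jump between labels."""
    # First pass: remember the position of every label.
    labels = {ins[1]: i for i, ins in enumerate(program) if ins[0] == "label"}
    pc = 0
    while pc < len(program):
        ins = program[pc]
        pc += 1
        kind = ins[0]
        if kind == "label":
            continue  # labels are only markers, they do no work
        elif kind == "goto":
            pc = labels[ins[1]]
        elif kind == "if_zero_goto":
            if variables[ins[1]] == 0:
                pc = labels[ins[2]]
        elif kind == "add":  # variables[dst] += variables[src]
            variables[ins[1]] += variables[ins[2]]
        elif kind == "dec":
            variables[ins[1]] -= 1
        else:
            raise ValueError(f"unknown instruction {kind!r}")
    return variables

# Flattened form of: while n != 0: total += n; n -= 1
program = [
    ("label", "loop"),
    ("if_zero_goto", "n", "done"),
    ("add", "total", "n"),
    ("dec", "n"),
    ("goto", "loop"),
    ("label", "done"),
]

final = run_flat(program, {"n": 4, "total": 0})
```

With `n` starting at 4, `final["total"]` ends up as 10 (4 + 3 + 2 + 1). The indentation that makes the loop obvious in a high-level language is gone, and the reader has to rebuild it from the labels.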
Birth of Programming Languages
Although writing in assembly is cool (you get to experience exactly what the CPU is doing), I have to admit that writing assembly code sucks, because our natural logical structure is not flat at all. It’s painful to bend one’s way of thinking into one long noodle and keep jumping back and forth.
Therefore, we created High-Level Programming Languages that are supposed to make more sense than assembly. Programs written in high-level languages are usually more readable and easier to develop with because we tend to hide CPU-specific details from developers and let them concentrate on their actual purpose.
The earliest programming languages mapped almost 1-to-1 onto assembly. Later languages usually need a more sophisticated compiler to turn human-readable code into CPU-readable assembly code. Scripting languages such as Python come with their own interpreter, which interprets the code at runtime and choreographs the corresponding CPU instructions accordingly. Technology has advanced so quickly that we now live in a world where we generate repetitive boilerplate code and fill in the remaining logic to create new programs.
With the birth of different programming languages, developers nowadays are more separated than ever before. Trailblazers pave the way and write god-like libraries in their favorite language, and all the later followers adopt the same language for convenience. Since there are so many parallel options for the same purpose, whether for the database, front-end, back-end or OS, we created the term “Tech Stack”. Every tech company has its own tech stack, which is basically its own selection of technologies that act as the backbone of the company’s core offering. Switching careers in the industry also often means that one has to ramp up on a new tech stack and adopt a new way of thinking.
This means that we have essentially segregated ourselves into different programming language silos. Sharing knowledge between silos has become hard, and the barriers to switching from one to another have gotten higher. It’s like what God did to mankind at the Tower of Babel: creating different languages and cultures so that human beings could not be as united as before.
Alternate Universe: Graph Layer & Meta-Programming
Circling back to the AutoGraph article: after reading it, I had the idea that maybe in the future we can unite all the different programming languages again and end this knowledge segregation.
We created programming languages because we don’t want to touch the bare-metal assembly code that has too much CPU-specific information. But we also suffer from different languages because each of them requires a different way of thinking and it’s hard to context-switch. The solution is to add something in between.
Intermediary Graph Layer
Just like how TensorFlow works on a computation graph, what if in an alternate universe, we create an intermediary graph layer between programming languages and assembly instructions? This layer basically contains purely computational and platform-agnostic graphs that can get interpreted and optimized into platform-specific CPU instructions. We then introduce an AutoGraph-like tool that compiles regular code into computational graphs.
The changes to the original programming flow happen under the hood — a normal piece of C or Go (or anything else) code gets compiled into a computational graph, and later a graph-interpreter runs it. Therefore the developer-facing programming experience will not change much.
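To give a feel for such a pipeline, here is a deliberately tiny sketch that handles only arithmetic expressions. It uses Python's standard `ast` module to parse the source. The graph format (a list of dicts) and the function names `compile_to_graph` and `run_graph` are assumptions invented for this illustration, not part of AutoGraph or any existing tool.

```python
import ast
import operator

OP_NAMES = {ast.Add: "add", ast.Sub: "sub", ast.Mult: "mul"}
OP_IMPLS = {"add": operator.add, "sub": operator.sub, "mul": operator.mul}

def compile_to_graph(source):
    """Turn an arithmetic expression into a list of graph nodes.

    "args" holds the indices of earlier nodes, so the list is already
    in a valid execution order.
    """
    nodes = []

    def emit(node):
        nodes.append(node)
        return len(nodes) - 1

    def visit(tree):
        if isinstance(tree, ast.BinOp) and type(tree.op) in OP_NAMES:
            left, right = visit(tree.left), visit(tree.right)
            return emit({"op": OP_NAMES[type(tree.op)], "args": [left, right]})
        if isinstance(tree, ast.Name):
            return emit({"op": "input", "name": tree.id, "args": []})
        if isinstance(tree, ast.Constant) and isinstance(tree.value, (int, float)):
            return emit({"op": "const", "value": tree.value, "args": []})
        raise ValueError(f"unsupported syntax: {ast.dump(tree)}")

    visit(ast.parse(source, mode="eval").body)
    return nodes

def run_graph(nodes, inputs):
    """A very small graph interpreter: evaluate nodes in order."""
    results = []
    for node in nodes:
        if node["op"] == "input":
            results.append(inputs[node["name"]])
        elif node["op"] == "const":
            results.append(node["value"])
        else:
            args = [results[i] for i in node["args"]]
            results.append(OP_IMPLS[node["op"]](*args))
    return results[-1]

graph = compile_to_graph("(price - discount) * 2")
answer = run_graph(graph, {"price": 10, "discount": 3})
```

Here `answer` is 14. The developer only ever wrote `(price - discount) * 2`, while the runtime only ever saw the five-node graph. A real version would need to cover control flow, state and I/O, which is where the hard problems start.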
What changes the most is how tech companies store large software projects. Today, every tech company probably has its own repository somewhere that stores the source code of its software product. In this alternate universe, tech companies store annotated computational graphs in the repository instead. One complete graph is an isomorphic representation of a working software system and can be directly executed by a TensorFlow-like runtime.
When a developer finishes writing a new feature in their most comfortable language, we run a first-order compiler that compiles the code into a graph and appends the graph to the existing graph repository. The compiler can also save developer-specific annotations such as variable and function names in the repository so that later we can reconstruct human-readable code from the graph.
When the developer wants to modify an existing sub-graph, the system extracts the sub-graph and its annotations and makes a best-effort attempt to generate human-readable code in any programming language from them. The developer can then modify the code and compile a new graph to replace the original one.
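A minimal sketch of that reverse direction might look like the following. Everything in it is hypothetical: the node format, the `annotations` mapping from node index to a saved variable name, and the `decompile` function. The "Go" output only swaps the assignment operator, which is enough to show the idea but nowhere near a real code generator.

```python
SYMBOLS = {"add": "+", "sub": "-", "mul": "*"}

def decompile(nodes, annotations, target="python"):
    """Rebuild one assignment per operation node, using saved names."""
    exprs = []  # the text that stands for each node, by index
    lines = []
    assign = ":=" if target == "go" else "="
    for index, node in enumerate(nodes):
        if node["op"] == "input":
            exprs.append(node["name"])
        elif node["op"] == "const":
            exprs.append(repr(node["value"]))
        else:
            left, right = (exprs[i] for i in node["args"])
            # Fall back to a generated name when no annotation was saved.
            name = annotations.get(index, f"tmp{index}")
            lines.append(f"{name} {assign} {left} {SYMBOLS[node['op']]} {right}")
            exprs.append(name)
    return "\n".join(lines)

# A stored sub-graph for (price - discount) * 2, plus its annotations.
nodes = [
    {"op": "input", "name": "price", "args": []},
    {"op": "input", "name": "discount", "args": []},
    {"op": "sub", "args": [0, 1]},
    {"op": "const", "value": 2, "args": []},
    {"op": "mul", "args": [2, 3]},
]
annotations = {2: "net_price", 4: "total"}

python_code = decompile(nodes, annotations)
go_code = decompile(nodes, annotations, target="go")
bare_code = decompile(nodes, {})
```

With annotations, `python_code` reads `net_price = price - discount` followed by `total = net_price * 2`. Without them, `bare_code` falls back to `tmp2` and `tmp4`, which shows why the annotations matter so much for readability.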
Imagine the workflow of building a new backend service that takes in information from one database and updates an entry in another database. On the graph level, this can be modeled as one input data node, one processing node, and one output data node. We can then implement each node with any programming language. Later we compile the nodes into sub-graphs and join them together. In this way, not only is the program itself represented as a graph, but the entire end-to-end system can also be modeled as a graph.
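The three-node service can be sketched in a few lines of Python. The two "databases" are plain dicts, and the node names, the sample data and the `run_system` helper are all made up for this example. The only library call is `graphlib.TopologicalSorter` from the standard library (Python 3.9+), which orders the nodes so that each one runs after the nodes it depends on.

```python
from graphlib import TopologicalSorter

def run_system(nodes, depends_on):
    """Run every node after its dependencies, passing results downstream."""
    results = {}
    for name in TopologicalSorter(depends_on).static_order():
        upstream = [results[dep] for dep in depends_on.get(name, [])]
        results[name] = nodes[name](*upstream)
    return results

# Stand-ins for the two databases.
orders_db = {"alice": [30, 12, 8]}
summary_db = {"alice": {"total": 0}}

def read_orders():            # input data node
    return "alice", orders_db["alice"]

def total_orders(record):     # processing node
    user, amounts = record
    return user, sum(amounts)

def write_summary(record):    # output data node
    user, total = record
    summary_db[user]["total"] = total
    return summary_db[user]

nodes = {"read": read_orders, "process": total_orders, "write": write_summary}
depends_on = {"read": [], "process": ["read"], "write": ["process"]}

results = run_system(nodes, depends_on)
```

After the run, `summary_db["alice"]["total"]` is 50. In this sketch each node is a Python function, but nothing in `run_system` cares what is behind a node as long as it takes the upstream results and returns a value.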
With this graph layer, tech companies can truly achieve a barrier-free developer experience. Any developer can develop on the same project regardless of the programming languages they use, and they can also work remotely because all the context is well-represented with the graph interface so they don’t have to learn unnecessary context.
Meta-Programming and the Future of DevOps
Meta-programming is a programming technique in which a program treats programs (including itself) as data and can generate new programs. In order to build the graph layer from any arbitrary language, we will need to do a little bit of meta-programming by designing a compiler that analyzes the input program and produces the output graph.
In fact, we have been doing something similar already. In modern tech companies, DevOps is usually in charge of making developers’ lives easier by building tools and scripts that automate a lot of repetitive processes. At my current workplace, we have code-generation tools maintained by DevOps that generate SQL bridges from Golang types so we don’t have to define them manually every time we create a new table. We also use Google’s Protocol Buffers, which automatically generate type-safe messages for different platforms and languages.
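I can't share our internal tool, but the general idea of generating SQL from a type definition can be sketched in Python with a dataclass standing in for a Golang struct. The `Account` type, the `SQL_TYPES` mapping and the `create_table_sql` function are all hypothetical and only handle a handful of simple field types.

```python
from dataclasses import dataclass, fields

# Assumed mapping from Python field types to SQL column types.
SQL_TYPES = {int: "INTEGER", str: "TEXT", float: "REAL", bool: "BOOLEAN"}

@dataclass
class Account:
    id: int
    name: str
    balance: float

def create_table_sql(model):
    """Read a dataclass definition as data and emit a CREATE TABLE statement."""
    columns = [f"{field.name} {SQL_TYPES[field.type]}" for field in fields(model)]
    return f"CREATE TABLE {model.__name__.lower()} ({', '.join(columns)});"

statement = create_table_sql(Account)
```

Here `statement` is `CREATE TABLE account (id INTEGER, name TEXT, balance REAL);`. The program reads a type definition as data and writes another program (a SQL statement) from it, which is meta-programming in miniature.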
I think DevOps is going to be crucial to the new software engineering experience. As more people enter the tech industry, entry barriers keep getting lower, and the amount of knowledge needed to bootstrap a simple web app or feature will keep decreasing. Eventually, feature developers are going to become the new generation of laborers.
In this ideal future, the current software engineers who have acquired more computer science knowledge than the newcomers will be meta-programming the tools and compilers that help feature developers convert and embed their work into the world of graphs.
However, we are still far away from the dream. We need to design sophisticated compilers that take in any arbitrary programming language and generate computational graphs. We also want to invent new reversible compilation algorithms so we can reconstruct code from graphs without losing context and coding habits. Also, we might want to further improve the developer experience by automatically reducing and merging repetitive sub-graphs.
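The last item, merging repetitive sub-graphs, is the one that is easiest to sketch today. One well-known approach is to give every node a key made of its operation and its inputs, and to reuse the existing node whenever the same key shows up again. The `GraphBuilder` class below is a hypothetical illustration of that idea for pure computations, not a design for the real graph repository.

```python
class GraphBuilder:
    """Builds a graph in which structurally identical nodes are stored once."""

    def __init__(self):
        self.nodes = []   # list of (op, args) tuples
        self._index = {}  # maps a node's structure to its position

    def add(self, op, *args):
        key = (op, args)
        if key not in self._index:
            self._index[key] = len(self.nodes)
            self.nodes.append(key)
        return self._index[key]

def build_sum_squared(builder):
    """Emit the sub-graph for (x + y) * (x + y)."""
    x = builder.add("input", "x")
    y = builder.add("input", "y")
    total = builder.add("add", x, y)
    return builder.add("mul", total, total)

builder = GraphBuilder()
first = build_sum_squared(builder)   # written by one developer
second = build_sum_squared(builder)  # written again by another developer
```

Both calls return the same node index, and the graph still contains only four nodes (`x`, `y`, the sum and the product). Merging sub-graphs that have side effects, or that are only approximately the same, is a much harder problem.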