The Architecture of Life’s Source Code
A Computer Scientist’s Musings About Gene Expression
I have been reading Robert Sapolsky’s book Behave, which explores brain development, what that development means for behavior, and the enormous complexity of animal behavior. The book is expansive in its approach. Sapolsky takes readers from what happens in the neurons that fire in the milliseconds before a behavior to the millions of years of evolutionary biology that caused those particular neurons to exist (and a lot of stuff in between). The book’s discussion of the chemical mechanisms that execute our genetic source code, DNA, really got me thinking.
At the same time I’m teaching a computer architecture class, which is largely about the electrical mechanisms that execute computer source code. The juxtaposition of these two wildly different, yet remarkably similar systems fascinates me. The genomes of Earth’s inhabitants are at once a mirror and a foil for modern computer programs.
I’ve been struck before by the similarities between DNA and computer code. Our DNA is made from a bunch of nucleotides, of which there are four kinds (A, T, C, and G). These nucleotides always bind together in pairs: A with T and C with G. Wave your hand a bit and DNA looks a lot like binary code. Perhaps “quaternary” code is more accurate, since each nucleotide carries two bits of information, but regardless: it’s a discrete system that closely resembles the way we store digital information.
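To make the “quaternary” idea concrete, here is a minimal sketch that packs a DNA string into an integer at two bits per nucleotide. The bit assignment (A=00, C=01, G=10, T=11) is my own arbitrary choice for illustration, not anything biology prescribes, and the function names are hypothetical.

```python
# Hypothetical 2-bit encoding; the specific bit assignment is arbitrary.
BASE_TO_BITS = {"A": 0b00, "C": 0b01, "G": 0b10, "T": 0b11}
BITS_TO_BASE = {bits: base for base, bits in BASE_TO_BITS.items()}

# Base-pairing rule from the text: A binds with T, C binds with G.
COMPLEMENT = {"A": "T", "T": "A", "C": "G", "G": "C"}


def encode_dna(sequence):
    """Pack a DNA string into an integer, two bits per nucleotide."""
    value = 0
    for base in sequence:
        value = (value << 2) | BASE_TO_BITS[base]
    return value


def decode_dna(value, length):
    """Unpack `length` nucleotides from an integer produced by encode_dna."""
    bases = []
    for _ in range(length):
        bases.append(BITS_TO_BASE[value & 0b11])
        value >>= 2
    return "".join(reversed(bases))


def complement(sequence):
    """Return the base each nucleotide pairs with, position by position."""
    return "".join(COMPLEMENT[base] for base in sequence)


packed = encode_dna("GATTACA")
print(bin(packed))
print(decode_dna(packed, 7))
print(complement("GATTACA"))
```

With this particular assignment, pairing a base with its partner is the same as flipping both of its bits, which is a pleasant coincidence of the encoding rather than a property of DNA.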
In their implementation, though, our “biological computers” are a far cry from modern electronic computers. Our genetic code is executed by chemical processes that bear little resemblance to the electronic processes inside a computer.
In a computer, programs are represented by a large collection of individual “instructions”. Each instruction is represented by a chunk of binary data. Collectively these chunks reside in “memory.” When an instruction is executed, the CPU retrieves it from memory and “decodes” it. This decoding process divides the binary chunk into even smaller sections; the values of these sections control which circuits are activated, which data is processed by those circuits, and where to store the result of that processing (usually somewhere in memory).
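The decoding step can be sketched in a few lines. This assumes a made-up 16-bit instruction format (4-bit opcode, then three 4-bit register numbers) and a two-entry operation table; real instruction sets are laid out differently, so treat every field name here as hypothetical.

```python
from typing import NamedTuple


class DecodedInstruction(NamedTuple):
    opcode: int  # which circuit to activate
    dest: int    # where to store the result
    src_a: int   # first input register
    src_b: int   # second input register


def decode_instruction(instruction):
    """Split a 16-bit toy instruction into four 4-bit fields."""
    return DecodedInstruction(
        opcode=(instruction >> 12) & 0xF,
        dest=(instruction >> 8) & 0xF,
        src_a=(instruction >> 4) & 0xF,
        src_b=instruction & 0xF,
    )


# Hypothetical opcode table: each opcode selects one "circuit".
OPERATIONS = {
    0x1: lambda a, b: a + b,  # ADD
    0x2: lambda a, b: a - b,  # SUB
}


def execute(instruction, registers):
    """Decode one instruction and apply it to the register list in place."""
    fields = decode_instruction(instruction)
    operation = OPERATIONS[fields.opcode]
    registers[fields.dest] = operation(
        registers[fields.src_a], registers[fields.src_b]
    )


registers = [0, 5, 7, 0]
execute(0x1312, registers)  # ADD r3 <- r1 + r2
print(registers)
```

The instruction 0x1312 is just a number, yet its four nibbles choose the operation, the inputs, and the destination.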
In a miraculous, beautiful, and strange loop, some instructions indicate which (and when) other instructions should be run. It’s binary data all the way down.
On the one hand, a gene is similar to a computer instruction. A gene is a series of discrete values (A, T, C, or G instead of 0 or 1) which, taken as a whole, mean something. In a computer an instruction means, “activate these pieces of hardware using this data.” In our bodies, a gene means, “make this protein.” Proteins do all kinds of things in our bodies. They are (or are used to make) enzymes, neurotransmitters, endorphins, steroids, and more. Collectively, proteins are the major players in cellular function.
On the other hand, our DNA is executed primarily by chemical processes which are completely unlike the electrical current that powers a modern CPU. Instead, DNA is “activated” by a special family of proteins called “transcription factors.” Each individual transcription factor is built in a unique way, which causes it to bind onto specific parts of our DNA — different transcription factors bind to different DNA sequences.
When a transcription factor binds to a section of DNA, it either attracts and enables or repels and blocks other cellular machinery that interacts with DNA. When an “activator” transcription factor binds to DNA, it enables an RNA polymerase to also bind to the DNA and begin “transcription,” the first step in the process through which proteins are made. When a “repressor” transcription factor binds to DNA, it blocks RNA polymerases and other transcription factors from the site, preventing transcription and therefore the production of a particular protein.
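Here is a toy model of that gating logic, written the way I might explain it to my architecture students. It assumes a drastic simplification: a factor “binds” if its recognition sequence appears anywhere in the regulatory region, and any bound repressor overrides every activator. The class, the factor names, and the sequences are all invented for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TranscriptionFactor:
    name: str
    binding_site: str   # DNA sequence this factor recognizes
    is_activator: bool  # True for an activator, False for a repressor


def is_transcribed(regulatory_region, factors):
    """Decide whether a gene is transcribed, under the toy rules above."""
    bound = [tf for tf in factors if tf.binding_site in regulatory_region]
    # Simplifying assumption: a bound repressor always blocks the site.
    if any(not tf.is_activator for tf in bound):
        return False
    # Otherwise at least one bound activator is needed to recruit polymerase.
    return any(tf.is_activator for tf in bound)


activator = TranscriptionFactor("ActA", "GACA", is_activator=True)
repressor = TranscriptionFactor("RepR", "GGCC", is_activator=False)

print(is_transcribed("TTGACATT", [activator, repressor]))
print(is_transcribed("TTGACAGGCC", [activator, repressor]))
```

Real regulation is graded and combinatorial rather than a clean boolean, so this captures only the shape of the idea.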
DNA has its own strange loop.
Transcription factors are themselves proteins. They are made in our bodies via the same process of transcription (followed by translation) as other proteins. We have genes that code for transcription factors which regulate which proteins get built, including other transcription factors. It’s proteins all the way down. Utterly miraculous, and at least as beautiful as the “Von Neumann architecture” that powers modern computers.
One more similarity between our DNA and computer architecture has caught my attention recently. While historically geneticists have been focused on genes, lately there has been more research into “non-coding” DNA. Such DNA — once dismissed as “junk” — comprises the vast majority of all DNA, upwards of 95%.
We have since discovered that non-coding DNA contains (among other things) the DNA sequences that transcription factors bind to. Non-coding DNA does not contain the instructions for building proteins. Instead, it contains the instructions for which proteins to build and when to build them. Our genetic memory is divided into (at least) two separate parts.
Computers also have separate memory sections. One major section holds all the instructions of a program. Another holds the data that the program creates. The data created by the program can (and does) influence the execution of the program itself, just like the creation of proteins (in particular transcription factors) influences the execution of our genetic code.
When transcription factors are made they will bind to non-coding sections of DNA, triggering the expression or suppression of specific genes. Sometimes this will cause the creation of other transcription factors: our cellular infinite loop.
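That loop is easy to play with in code. The sketch below is a tiny boolean network of three imaginary genes, A, B, and C, where each gene’s product is a transcription factor for another gene. The wiring (A switches on B, B switches on C, C shuts off A) and the all-or-nothing update rule are my own assumptions, chosen only so that the loop is visible.

```python
# Each gene maps to (activators, repressors), naming other genes' products.
# Hypothetical wiring: A is on by default, A activates B, B activates C,
# and C represses A.
NETWORK = {
    "A": ([], ["C"]),
    "B": (["A"], []),
    "C": (["B"], []),
}


def step(network, present):
    """Compute which proteins exist next, given which exist now."""
    next_present = set()
    for gene, (activators, repressors) in network.items():
        # A gene with no listed activators is treated as always activated.
        activated = not activators or any(a in present for a in activators)
        repressed = any(r in present for r in repressors)
        if activated and not repressed:
            next_present.add(gene)
    return frozenset(next_present)


def simulate(network, start, steps):
    """Return the list of protein sets visited, starting from `start`."""
    states = [frozenset(start)]
    for _ in range(steps):
        states.append(step(network, states[-1]))
    return states


for state in simulate(NETWORK, set(), 6):
    print(sorted(state))
```

Starting from an empty cell, the proteins appear one after another, C eventually shuts A down, the cascade drains away, and after six steps the cell is empty again and ready to repeat the cycle. Nothing outside the network tells it to loop.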
One crucial difference, though, is that these cellular computers are massively parallel. While a single CPU core, in the simplest model, executes one instruction at a time, plants’ and animals’ cells can all create proteins independently, and there are lots of cells in a single human.
Furthermore, our cells take input from other parts of the body as well as from the external world. These stimuli result in changes to gene expression and cause us to interact with the external world. Those interactions in turn are stimuli for other life forms, causing them to activate and deactivate genes in their own DNA.
Life is caught in an endless exchange of information, each cell humming along without the foggiest clue about how the other systems work. Viewed in this way, animal intelligence is more like a massively distributed system than a single neural network.
One thing that Behave has made clear to me is that the brain is far from an isolated system. The electrical signals created and processed by the brain are one small part of a much larger design. If we want to build a true “general artificial intelligence,” maybe computer scientists need to look beyond the function of neurons and into the depths of cellular chemical interactions.
AI has a long way to go before it reaches the level of complexity that “life” has achieved. Nevertheless, it’s quite fun to think about the escalating interactions of robots on the internet. Russian trollbots are being moderated by Twitter-built AIs. Automated spambots are being moderated by Google-built AIs. And some AIs have been trained to communicate privately with each other using AI-invented languages that humans do not understand.
Perhaps general intelligence in computers will arise the same way it did in animals — through the massive intermingling of an extraordinary number of individually simple systems. Only time will tell.