Is DNA Like a Blueprint, a Computer Program, or a List of Ingredients?

The term DNA (short for deoxyribonucleic acid) often appears in the media these days. Because of these frequent encounters, we all have a general idea of what the term means. When we think of DNA, we think of genes, and therefore we correctly associate DNA with genetic inheritance. In particular, we think of the human genome, which is the complete set of genes in a human being, distributed across 23 pairs of chromosomes. Yet our common understanding of DNA tends to be highly limited, and somewhat inaccurate. Consider, for example, the following three statements. Which one of these assertions would you judge to be the most accurate?

#1: The human genome is much like a set of blueprints. Our living cells use these blueprints as guides to construct the human body.

#2: The human genome operates like a computer program, with coded instructions for building and maintaining a human body.

#3: The human genome is primarily a set of lists that specify the sequence of ingredients for assembling proteins. Our DNA does not actually contain any plans or instructions for building and maintaining a human body.

There are good reasons to quibble with all three of these assertions, but the most accurate of the three is #3. The human genome is not at all like a blueprint, as we shall see. The comparison with a computer program is somewhat more enlightening, but still highly misleading. But it is indeed true that the human genome is primarily a collection of ingredient lists for assembling proteins. DNA also includes templates for RNA molecules that help regulate the manufacture of these proteins. Therefore DNA is all about making proteins, without any specific plans or instructions as to how these proteins are supposed to yield a human body — or an elephant, or an oak tree.

(Note: DNA and RNA are very similar types of molecules. Your genetic inheritance is permanently stored in long DNA molecules. RNA molecules tend to be much shorter in both length and lifespan, serving as useful but temporary copies of certain stretches of your DNA.)

Statement #3, although mostly accurate, can still mislead our thinking — primarily because the word “proteins” can send our thoughts in the wrong direction. The missing concept is that proteins control most of the development and maintenance of the human body. Therefore our DNA, due to its central role in the creation of proteins, indirectly controls these processes. To make sense of this idea, we first need to take a closer look at the wide variety of proteins in our bodies, along with the crucial roles that these proteins play. Then we can return to the question of how DNA actually works.

The Essential Roles of Proteins in Our Bodies

Most of us think of protein as the material that muscles are made of — and which we must therefore include in our diet to build and maintain a strong body. It is indeed true that our muscles are primarily made of protein — not just the skeletal muscles that allow us to move around, but also the muscles associated with internal organs, such as the heart. However, there are actually many different kinds of proteins in the human body, which perform several distinct roles:

1) Enzymes drive most of the chemical reactions that occur inside our cells. There are thousands of such reactions, with an incredibly wide range of results. Enzymes produce most of the chemical compounds needed by the body, and they also play a huge role in the construction of the body. For example, the cell membranes in your body are primarily built from lipid molecules, in a multi-step process driven by several different enzymes.

2) Messenger proteins carry signals between different parts of the body, coordinating biological processes that involve multiple cells. For example, the hormone insulin is a protein that tells the cells in your body when to absorb glucose from the blood, thereby regulating your blood sugar level.

3) Transport & storage proteins move atoms and small molecules from one place to another, either within cells, between cells, or across cell membranes. These proteins also help store some of these materials. A great example of a transport protein is hemoglobin, found in red blood cells, which carries molecules of oxygen from your lungs to all parts of your body.

4) Antibodies are mostly found in your blood, where they bind to foreign particles, such as viruses and bacteria, thereby disabling them.

5) Structural components provide structural support for cells, as well as allowing parts of the body to move (due to the ability of muscle cells to contract). The proteins that allow contraction are actin and myosin.

In short, proteins control most of the processes that occur in the human body. Proteins keep our bodies running, and (rather amazingly) mediate most of the human development process — in which we grow from a single-celled zygote into an adult human being. But it is our DNA that tells our bodies what proteins to build.

DNA as a Template for Proteins

Every human has approximately 20,000 genes in his or her DNA — and two copies of most of these genes. For some of your genes, the two copies are identical — but in many cases the two copies have subtle differences. But what exactly is a gene? The term was invented before we had any understanding of DNA, but now we think of a gene as a coded DNA template for assembling a specific protein. For the most part, each gene is unique — which implies that our DNA contains the instructions for building as many as 20,000 different proteins. However, the picture is actually more complicated than this, because some proteins are built by using only selected parts of a gene — and leaving out other parts. As a result, a single gene can often serve as a template for several variant proteins. Another complication is that some proteins are assembled by linking together two or more smaller proteins, each separately coded in the DNA. Therefore the number of distinct proteins that your body can build is probably at least 90,000.
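To see how one gene can serve as a template for several variant proteins, here is a minimal Python sketch. It treats a gene as an ordered list of segments and lists the ways of keeping some segments while leaving others out. The segment names and the rule of keeping at least two segments are made-up assumptions for illustration; a real cell uses only a few of the possible combinations.

```python
from itertools import combinations

# Hypothetical gene, split into four labelled segments (the names are invented).
GENE_SEGMENTS = ["seg1", "seg2", "seg3", "seg4"]

def variant_templates(segments, min_parts=2):
    """List every way of keeping at least `min_parts` segments, in their original order."""
    variants = []
    for size in range(min_parts, len(segments) + 1):
        # combinations() preserves the original left-to-right order of the segments
        for kept in combinations(segments, size):
            variants.append(kept)
    return variants

variants = variant_templates(GENE_SEGMENTS)
print(len(variants))  # 11
for kept in variants:
    print("-".join(kept))
```

Even this tiny four-segment gene allows 11 different templates under these assumptions, which suggests how roughly 20,000 genes could yield a much larger number of distinct proteins.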

So what exactly is a protein? A protein is a long chain of amino acids, strung together in a linear sequence like beads on a string. Amino acids are rather small molecules, each containing between 10 and 27 atoms. There are 20 distinct amino acids that are typically used for assembling proteins (although under certain circumstances two other amino acids can also be used). These 20 amino acids have names such as leucine, glutamine, and tryptophan. Each protein consists of a specific sequence of these amino acids. A typical protein is a chain of 250 to 500 amino acids — although some proteins can be much shorter or much longer. Most proteins fold up into a compact shape as soon as they are created — and the specific role that the protein plays is often highly dependent upon the precise shape of the folded molecule.

Each gene in your DNA is actually a coded list of amino acids — like a shopping list or a list of ingredients, except written in code. This list tells your body what amino acids to use when it builds a specific protein molecule. The list also indicates the exact sequence in which these amino acids should be strung together. In other words, the whole point of a gene is to contain information. This information is used by a living cell to build the appropriate proteins.

Inside each cell, embedded in the cytoplasm, there is a collection of tiny protein-building machines called ribosomes. The odd thing is that nearly all of these ribosomes are identical. None of them are specialists. Any ribosome can build any kind of protein that you ask it to build. You simply have to give the ribosome a coded message — copied from a gene in the DNA — that contains the recipe for the desired protein. (The molecule that carries the coded message is called messenger RNA.) The information in the message is just a long list of amino acids, in the precise sequence for building the protein. If the required amino acids are available, then the ribosome will crank out a brand new custom-built protein molecule, according to the specifications contained in the message.

The information for these protein recipes is stored in the DNA using a distinct coding system. To make sense of the coding system, it helps to picture the DNA molecule. You have probably seen illustrations of the DNA “double helix”, a spiraling ladder connected by evenly spaced “rungs”. There are four possible kinds of rung, which are called A, C, G, and T — and these rungs can appear in any order. These characteristics allow for encoded information to be embedded in the DNA molecule.

The core unit of information in a gene is a codon, which is equivalent to a 3-letter word (such as “GGA” or “TAC”), using this alphabet of just four letters (A, C, G, and T). In other words, each codon consists of exactly 3 small units of information (stored in 3 consecutive rungs), and each unit can be any one of 4 distinct values. Each of the four possible values is represented by a tiny molecule that is incorporated into the rung. These four molecules are named adenine, cytosine, guanine, and thymine, but for practical reasons we refer to them by their first letters.

Now here is the fascinating thing about the DNA coding system. There are 64 different ways to string the 4 possible letters into a 3-letter codon — and each of these 64 combinations has a specific meaning. The codon “GGA” means that the next amino acid in the protein should be glycine. The codon “CAC” means that the next amino acid should be histidine. Sixty-one of the 64 possible codons represent specific amino acids. But because there are only 20 amino acids that are typically used, several different codons can indicate the same amino acid. For example, “CAA” and “CAG” both indicate glutamine. The other three possible combinations — the ones that don’t usually correspond to any amino acids — all mean “stop”, which tells the ribosome that the protein is now complete.
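The coding system described above is compact enough to sketch in a few lines of Python. The sketch below builds the standard codon table, checks the counts mentioned in the text, and decodes a short made-up gene. It follows this essay in writing codons with DNA letters; the messenger RNA copy that a ribosome actually reads uses the letter U in place of T. The function name translate and the sample sequence are illustrative assumptions.

```python
from itertools import product

BASES = "TCAG"  # conventional ordering used when writing out the standard codon table

# One-letter amino acid codes for all 64 codons, listed in TCAG order.
# "*" marks a stop codon. (G = glycine, H = histidine, Q = glutamine, M = methionine)
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"

CODON_TABLE = {
    "".join(codon): amino
    for codon, amino in zip(product(BASES, repeat=3), AMINO_ACIDS)
}

def translate(dna):
    """Read a sequence three letters at a time, stopping at the first stop codon."""
    protein = []
    for i in range(0, len(dna) - 2, 3):
        amino = CODON_TABLE[dna[i:i + 3]]
        if amino == "*":
            break
        protein.append(amino)
    return "".join(protein)

print(len(CODON_TABLE))                              # 64 possible codons
print(sum(a == "*" for a in CODON_TABLE.values()))   # 3 of them mean "stop"
print(len(set(AMINO_ACIDS) - {"*"}))                 # 20 distinct amino acids
print(CODON_TABLE["CAA"], CODON_TABLE["CAG"])        # Q Q (both glutamine)
print(translate("ATGGGACACCAATAA"))                  # MGHQ
```

Notice that the table is pure data: a lookup from 3-letter words to amino acids, with no loops or branches stored in the sequence itself. The looping happens in the reader (here the translate function, in a cell the ribosome), not in the DNA.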

Not Like a Blueprint

So why is it incorrect to say that DNA is like a blueprint for your body? A real blueprint — such as the plans for a house or an office building — is a detailed visual representation of structure. A blueprint shows all the structural components of the planned building, indicating the precise location and dimensions (and sometimes composition) of each part. In the blueprints, you can see the dimensions of every space, the location and size of every column and beam, the routes of the plumbing and the air ducts, and many other key details. Before construction even begins, these blueprints specify the exact structure of the finished building — thereby allowing the project to proceed.

But DNA is nothing like that. DNA contains no depiction of the finished body. There is no schematic showing the skeleton, no sketch indicating the locations of the internal organs, and no diagram illustrating the routes of the blood vessels and nerves. There is nothing that depicts what the finished face should look like. There is no set of specifications indicating how long the finished arm and leg bones should be, nor how fine or coarse the hair should be. In fact, there are not even any specs to indicate how many fingers there should be, or how many vertebrae there should be. DNA is simply a coded list of the amino acids for each protein that the body can make — and somehow this information eventually yields the correct finished product: a human being, or an elephant, or an oak tree.

Not Like a Computer Program

So if DNA is not like a blueprint, then is it like a computer program? After all, a computer program is completely different from a blueprint. Like DNA, a computer program contains no depiction of the finished result. Instead, a computer program precisely defines a process, a set of steps to execute. The process encoded by a computer program is typically called an algorithm. An algorithm is seldom a straightforward list of steps. In addition to being extraordinarily detailed, it is also full of loops and conditional branches. A loop is when the algorithm repeats a certain set of steps over and over. A conditional branch is when the algorithm tests a certain condition and then, based on the results of that test, either continues forward or jumps to a different point in the process.

To get a sense of what these loops and branches are like, imagine that you have somehow acquired an industrial robot — like those on an automobile assembly line — and you want to program this robot to make drop cookies in your kitchen. You promptly set to work writing the program. When you get to the point where the robot deposits cookie dough by the spoonful onto a greased baking sheet, you realize that this part of the process is a set of nested loops — that is, a loop inside a loop inside a loop. The inner loop directs the robot to deposit dollops of cookie dough onto the baking sheet, one cookie at a time, until a row of dollops has been completed. The middle loop, which repeatedly calls the inner loop, directs the robot to create one row of dollops at a time, until the baking sheet is filled. The outer loop, which repeatedly calls the middle loop, fills one baking sheet at a time, until there is no more dough. Note that each of the three nested loops contains a distinct process to repeat, plus a test to determine when to stop looping.
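For readers who would like to see the three nested loops written out, here is a rough Python sketch. The robot itself is imaginary, so the function name, the sheet dimensions, and the idea of counting dough in dollops are all assumptions made for illustration.

```python
def fill_baking_sheets(dollops_of_dough, rows_per_sheet=3, cookies_per_row=4):
    """Deposit dough one dollop at a time; return the cookies placed per row, per sheet."""
    sheets = []
    # Outer loop: fill one baking sheet at a time, until there is no more dough.
    while dollops_of_dough > 0:
        sheet = []
        # Middle loop: create one row at a time, until the sheet is filled.
        while len(sheet) < rows_per_sheet and dollops_of_dough > 0:
            cookies_in_row = 0
            # Inner loop: deposit one dollop at a time, until the row is complete.
            while cookies_in_row < cookies_per_row and dollops_of_dough > 0:
                cookies_in_row += 1
                dollops_of_dough -= 1
            sheet.append(cookies_in_row)
        sheets.append(sheet)
    return sheets

print(fill_baking_sheets(30))  # [[4, 4, 4], [4, 4, 4], [4, 2]]
```

Each loop contains a process to repeat plus a test for when to stop, and the third sheet ends early because the dough runs out partway through its second row.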

Now let’s compare a computer program to a blueprint. Suppose, as part of a woodworking project, you needed to create a square piece of plywood, nine inches on each side. In a blueprint, you would see a nice drawing of a square, with a notation indicating that each edge is nine inches long. In contrast, a computer algorithm might say:

  1. Start with a large sheet of plywood.
  2. With a circular saw, make a nine-inch straight cut in the plywood.
  3. Turn the saw 90 degrees to the left.
  4. Repeat the previous two steps three more times.

This algorithm does not depict the finished result — it doesn’t even mention the idea of creating a square. But the specified steps do indeed result in a square piece of plywood, nine inches on each side. Most of the processes in the human body are comparable, in that the end product is not defined in advance — it simply emerges as a result of the process.
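The four-step plywood algorithm can also be written as a short Python sketch that tracks where the saw ends up after each cut. The coordinate system, the starting point, and the function name are assumptions added for illustration.

```python
def trace_cuts(side=9, repeats=4):
    """Follow the cut-and-turn steps; return the point reached after each cut."""
    x, y = 0, 0          # starting position of the saw, in inches
    dx, dy = 1, 0        # current heading (initially pointing "east")
    corners = [(x, y)]
    for _ in range(repeats):                      # steps 2-3, done four times in all
        x, y = x + side * dx, y + side * dy       # make a nine-inch straight cut
        dx, dy = -dy, dx                          # turn 90 degrees to the left
        corners.append((x, y))
    return corners

print(trace_cuts())  # [(0, 0), (9, 0), (9, 9), (0, 9), (0, 0)]
```

The word "square" never appears in the steps, yet the saw returns to its starting point after tracing four equal sides at right angles.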

In one sense, the comparison of DNA to a computer program is enlightening. There are countless processes going on within the human body at all times, each contributing to the development and maintenance of that body. These processes certainly involve a lot of looping and branching, and therefore they could validly be compared to the algorithms encoded in a complex computer program. The problem with this analogy is that these processes are not actually encoded in our DNA. The language of DNA does not contain codes to indicate the start or the end of a loop, nor when a branch should occur, nor what test to perform when a branch point is reached, nor where in the instructions to jump based on the results of that test. Worse yet, none of the individual steps in the algorithm are encoded either — the language of DNA does not contain any words for this. DNA is simply a coded set of lists that specify sequences of amino acids, plus some templates for creating RNA molecules. Therefore DNA is not like a computer program at all.

What Makes It All Work?

So how does DNA actually work? How does a collection of recipes for making proteins produce a human being, or an elephant, or an oak tree? It seems intuitive that something must be controlling the entire operation.

Well, you could say that there are three levels of management controlling this operation. Upper management consists of the molecules that control gene expression — in other words, turning the genes on and off. To put it another way, upper management decides which proteins actually get made, and when they get made, and in which cells they get made. Middle management takes these newly minted protein molecules and dresses them up as needed, making any final tweaks necessary for the proteins to start work. Lower management provides additional supervision, if needed, after the proteins are ready to begin work.

You might think of the entire operation as a huge ballet, with 90,000 roles (the various types of proteins) and trillions of protein molecules playing each role. Another analogy, perhaps a better one, is to think of the human body as a huge construction project, with each living cell serving as a job site within the project. The protein molecules are the workers — and each worker will have one of 90,000 specialties. Each cell hires its own workers and manages its own job site, but messages are constantly going back and forth between the cells to help coordinate it all. But who directs the whole enterprise? Who sits at the top, making sure that everything is going according to plan?

The best way to answer this question is to take a closer look at gene expression. At any given time, some of your genes are being “expressed” — copied onto messages for the creation of new protein molecules — and others are not. Furthermore, every cell in your body has its own copy of your DNA — and in each cell, gene expression operates independently, managed locally by the cell nucleus. Even for cells of the same kind, their specific location in the body (and many other factors) can result in a different expression of the genes. The upshot is that “upper management” in this enterprise is highly decentralized. As your body grows from a zygote into an adult human being, these differences in gene expression allow the cells to differentiate into various types — becoming skin, bone, muscle, blood, nerves, and so on. The differences in gene expression allow distinct organs to form within your body — heart, lungs, brain, liver, intestines, and so on. This grand ballet ultimately results in the final shape of your body, with two legs, five fingers on each hand, 12 pairs of ribs, 33 vertebrae, and so on.

Coordinating the Grand Project

This amazingly intricate process involves a huge number of distinct steps — along with a lot of looping and branching — just like an algorithm in an enormously complex computer program. Theoretically, one could imagine such an algorithm encoded somewhere in the body, directing all the processes of the body. But imagine what a beast that algorithm would be, in order to turn on and off each of the 20,000 genes in each of the 30 trillion cells in your body — based on an enormously long and intricate set of factors. Such an algorithm would be hopelessly complex — and far too long to store anywhere, regardless of the encoding system. Therefore the human body contains no central authority to micro-manage gene expression across all the cells. Instead, it all works as an emergent system. The details of the human body emerge as the result of all the behaviors of the contributing parts. It’s a bit like an ant colony, which functions as a coherent whole, even though no central authority — not even the queen of the colony — micro-manages the tasks performed by the individual ants. However, the development and maintenance of the human body is incredibly more complex than the behavior of an ant colony.

What about your brain? Doesn’t it control everything in your body? No, your brain controls a lot of things, but it does not control the expression of the genes within the cells of your body. And it does not control your development from a zygote to an adult.

Therefore, to make sense of how the body develops and maintains itself, we have to study the regulation of gene expression at the cellular level — knowing that each of the 30 trillion cells handles the process independently. This is a complicated topic, and there is much that we don’t yet know or understand. But due to the breakneck pace of scientific research, many of the key puzzle pieces are rapidly coming into view. First of all, there are those RNA molecules, copied from areas in your DNA that don’t code for proteins. One of the main purposes of these molecules is to help regulate gene expression. And then there are certain regulatory genes whose principal activity is to turn other genes on and off. Furthermore, there are yet other kinds of molecules that attach themselves to genes, affecting their expression.

A closely related issue is that cells need to cooperate with each other, especially with cells that are nearby. Therefore there are lots of messages going back and forth between the cells, and most of these messages take the form of molecules. (Certain other messages are carried by electrical impulses in the nervous system.) Unlike messenger RNA, most of these molecular messages are not coded in any sort of language, nor are the messages divided into distinct words (comparable to codons). Instead, each message tends to have a very simple objective, usually meaning “start doing this”, or “stop doing this”, or “do this faster”, or “do this slower”.
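The idea that identical cells can end up behaving differently, with no central authority telling them what to do, can be illustrated with a deliberately tiny toy model in Python. Everything in it is an assumption made for illustration: the row of six cells, the signal that weakens by half at each cell, the thresholds, and the gene names. Real gene regulation is vastly more complicated.

```python
def choose_expression(signal):
    """The one rule set that every cell shares (thresholds are arbitrary)."""
    if signal >= 0.5:
        return "gene_A"
    if signal >= 0.1:
        return "gene_B"
    return "gene_C"

def develop(cell_count, source_signal=1.0, relay_fraction=0.5):
    """Each cell reacts to the signal it receives, then passes on a weaker one."""
    expressed = []
    signal = source_signal
    for _ in range(cell_count):
        expressed.append(choose_expression(signal))  # a purely local decision
        signal *= relay_fraction                     # message relayed to the next cell
    return expressed

print(develop(6))
# ['gene_A', 'gene_A', 'gene_B', 'gene_B', 'gene_C', 'gene_C']
```

No cell is told which type to become, and the rule itself never mentions position. Three bands of cells appear anyway, simply because each cell responds to the message reaching it from its neighbor. The for loop here is only the simulator stepping along the row; it plays no part in any cell's decision.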

Summary and Implications

Let’s review a few of the key ideas that we have covered, along with some of the implications of those ideas:

1. DNA is not like a blueprint.

DNA does not contain any schematics or other direct information about the structure or appearance of the organism. If you take a single gene from a fish, and insert it into an embryonic tomato plant, then you won’t get a plant that has fins or fish eyes. If the inserted gene is expressed at all, then you’ll simply get one additional protein appearing in some of the plant cells. This protein will probably have little effect on the plant — unless the protein interacts significantly with other molecules, in which case it could have a huge impact on how the plant develops and lives.

2. DNA is not like a computer program.

Human DNA does not contain any instructions for building or maintaining a human body. And while the processes that occur in the body can be roughly compared to algorithms, DNA does not contain any codes to define the individual steps in an algorithm, nor does it contain any codes to control the loops and branches of the algorithm. DNA simply contains recipes for building proteins, and it contains templates for RNA molecules that regulate the production of these proteins. Therefore, you cannot approach the editing of DNA in the same manner as you would approach the creation and editing of a computer program.

3. DNA controls the development and maintenance of the body indirectly, through the proteins that it encodes.

Proteins drive most of the processes in the human body, including our development from a zygote into an adult. DNA contains the recipes for these proteins, and therefore DNA indirectly controls the development and maintenance of the body, although in a manner that is far from straightforward. To get a reasonably complete understanding of how DNA controls these processes, it is necessary to know at least four key things:

  • The molecules and processes that regulate the expression of each gene
  • The proteins that are produced by each gene
  • How these protein molecules interact with other molecules in the body
  • How genes, and the products of genes, interact with each other to achieve various end results

By studying DNA and the proteins that it encodes, we can draw correlations between the presence of certain genes and the effects of those genes. For example, by looking at someone’s DNA, it is possible to predict with a reasonable degree of certainty whether that person has brown eyes or blue eyes. But more often than not, the effects of our genes are quite complicated, and depend upon the interactions of many different genes, combined with various environmental factors.

4. There is no central authority, and no centralized set of instructions, that controls the development and maintenance of the human body.

Your body has trillions of cells, and every cell has its own copy of your DNA — including all 20,000 genes. Every cell independently regulates the expression of those genes. This gene expression is influenced by the molecules that come in contact with the outside surface of the cell membrane — but otherwise each cell does its own thing. There is no central authority — not in the brain, nor in any other part of the body — that drives the development and maintenance of the human body. Instead, the overall results are an example of an emergent system, in which the behaviors of all the component parts result in a distinct set of behaviors at a larger scale.

This has major implications for what could feasibly be accomplished with genetic engineering, even before getting into the moral and political issues. If DNA were like a blueprint, or like a computer program, then it would be possible for people to invent a completely new organism and to create the DNA necessary to produce that organism. Instead, the only viable approach is to start with the complete genome of an existing organism, and then to make very small modifications to the DNA. This, in fact, is how genetic engineering is currently conducted. Scientists take an existing genome (for example, that of a tomato plant) and then attempt to add or replace a single gene or a very small number of genes. A typical objective is to improve a single characteristic, such as cold hardiness or pest resistance. (Quite often, the inserted genes are found naturally in other members of the same species, or in closely related species.) If and when the objective is achieved, then the people conducting the research might try to improve another characteristic. It’s a gradual and incremental process, a feature that is shared with traditional plant breeding and with natural evolution.

Nevertheless, there are good reasons for society to engage in a serious debate about what specific limits to place on genetic engineering. But for that debate to be meaningful, it is important to have a basic understanding of how DNA actually works. Perhaps this essay has been a helpful step in that direction.