Image credit: mcmurryjulie via Pixabay [public domain]

Are You And I Related?

It seems a simple question: Are you and I related? When we phrase a question in this manner, we assume that there is a simple “yes” or “no” answer — either we are related, or we are not. This implies that there is a clear criterion for determining the answer. If the answer is yes, then we must have an ancestor in common. However, the hidden assumption is that any common ancestor must be within three generations (great-grandparents) or perhaps four generations (great-great-grandparents) — or else it doesn’t count. If we have to go back farther to find a common ancestor, then most people would say that we are not actually related.

A completely different mental model is to perceive all people in the world as being related. If you trace back two people’s ancestry for a sufficient number of generations, then you are guaranteed to find a common ancestor. Under this model, it is simply a matter of degree: I know that you and I are related — the question is how closely. This question can also be flipped on its head by asking how unrelated it is possible for two people to be. How many generations would you have to trace back to find a common ancestor of every person alive today?

There are also two different ways of thinking about how related two people are. In the first method, we consider only the family tree — in which case two people might be siblings, cousins, second cousins, and so on. In the second method, we consider how many of the same gene variants two people carry — which of course can be discovered through genetic testing. These two methods can provide somewhat different results. For example, you might have several first cousins — and under the family tree method, you are equally related to every one of them. Under the gene method, however, you might be slightly more related to one than to another. For the moment, though, let’s simply consider some interesting mathematical issues related to tracing your family tree.

Consider, for example, when someone boasts “My ancestors came over on the Mayflower!” The Mayflower landed on the American coast in 1620, approximately 400 years ago. The average time between human generations is roughly 25 years — therefore the Mayflower landed approximately 16 generations ago. In other words, that would be the time of your great-great-great-great-great-great-great-great-great-great-great-great-great-great-grandparents. This is an awkward phrase at best, so let’s use the term “P-16” to represent all ancestors that are 16 generations back. (The P-1 generation represents your parents; P-2 is your grandparents, and so on.) How many P-16 ancestors does a person have?

We have two parents, four grandparents, eight great-grandparents, and so on — doubling the number with each generation. The result is that each of us should have 65,536 ancestors in the P-16 generation. So when I hear the phrase “My ancestors came over on the Mayflower”, my first thought is “That must have been a really big boat!” (The Mayflower actually had only 102 passengers, plus crew.)

Of course, the reality is that no one actually has 65,536 distinct ancestors in the P-16 generation, because after so many generations there is always a certain degree of inbreeding (genealogists call this “pedigree collapse”). There are indeed 65,536 available slots in the P-16 generation of your family tree, but some of your ancestors occupy more than one slot in that tree.

Now consider the P-30 generation, which would be approximately 750 years ago, around 1270 A.D. This would put us into mediaeval times in Europe. Each of us has more than a billion slots in our family tree for the P-30 generation — yet the entire population of the world at that time was less than a billion people. If you could actually construct a family tree going back to the P-30 generation, then you would not only have a billion slots in that generation, but you would also have a half billion slots in the P-29 generation, a quarter-billion slots in the P-28, and so on — a total of more than two billion slots in the family tree. Of course, there would be a huge amount of repetition. Some of your ancestors would appear hundreds or even thousands of times in that tree.
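
The doubling can be checked with a few lines of Python. This is a minimal sketch: the function names are illustrative, and the counts are slots in the tree, not distinct people.

```python
def slots_in_generation(n: int) -> int:
    """Number of ancestor slots in generation P-n (P-1 = parents, P-2 = grandparents)."""
    return 2 ** n


def total_slots_through(n: int) -> int:
    """Total slots from P-1 through P-n: 2 + 4 + ... + 2**n, which equals 2**(n + 1) - 2."""
    return sum(slots_in_generation(k) for k in range(1, n + 1))


for gen in (16, 30):
    print(f"P-{gen}: {slots_in_generation(gen):,} slots; "
          f"{total_slots_through(gen):,} slots in P-1 through P-{gen}")
```

For P-16 this gives 65,536 slots, and for P-30 it gives 1,073,741,824 slots, with 2,147,483,646 slots in all from P-1 through P-30.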

Now let’s go back farther, to the time of the Roman Empire. Julius Caesar died more than 2000 years ago, in the year 44 BC. That’s approximately 84 generations back. Each of us has about 20,000,000,000,000,000,000,000,000 ancestor slots (20 septillion) in the P-84 generation. But back then, there were only about 200,000,000 people (200 million) in the entire world. This means that, on average, each person alive at that time must appear about 100,000,000,000,000,000 times (100 quadrillion) in your family tree. Of course, some of the people alive at the time don’t appear in your family tree at all, because they had no children, or they produced lines that soon died out, so the people who do appear must show up even more often than that.
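
A quick sketch of the arithmetic, using the article’s own rough figures (84 generations back and a world population of about 200 million, both approximations rather than precise values):

```python
# Rough figures from the text; both are approximations.
GENERATIONS_BACK = 84           # roughly the time of Julius Caesar
WORLD_POPULATION = 200_000_000  # approximate world population at that time

slots = 2 ** GENERATIONS_BACK   # slots in generation P-84
average_appearances = slots / WORLD_POPULATION

print(f"slots in P-{GENERATIONS_BACK}: {slots:.2e}")
print(f"average appearances per person: {average_appearances:.2e}")
```

This prints about 1.93e+25 slots and about 9.67e+16 appearances per person, that is, close to 100 quadrillion on average.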

Such huge numbers may seem completely counter-intuitive, but it’s the simple result of doubling the number of ancestors in each generation that you trace back.

Suppose now that you were to go back 7000 years, to around 5000 B.C. By this time humans had migrated to every continent except for Antarctica. In several parts of the world people were already developing agricultural civilizations. The people alive at that time would be your P-280 generation of ancestors, more or less. However, the number of P-280 slots in your family tree is greater than the number of atoms in the entire universe. Numbers of this magnitude put a completely different spin on the concept of ancestry. Instead of thinking about individual ancestors and where they lived, it makes more sense to think about genes — and where the variants of different ancestral genes first arose in the world.
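
The comparison with atoms is easy to verify with Python’s arbitrary-precision integers. The figure of 10^80 atoms is an assumed order-of-magnitude estimate commonly quoted for the observable universe, not a measured count.

```python
ATOMS_IN_UNIVERSE = 10 ** 80  # assumed order-of-magnitude estimate
slots_p280 = 2 ** 280         # slots in generation P-280

print(len(str(slots_p280)))            # number of decimal digits in 2**280
print(slots_p280 > ATOMS_IN_UNIVERSE)  # more slots than atoms?
```

This prints 85 and True: 2^280 is an 85-digit number, nearly 20,000 times larger than 10^80.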

So how far back would we have to go to be absolutely sure that you and I have an ancestor in common? If you and I both have European ancestry, or both have Asian ancestry, or both have African ancestry, then 750 years (30 generations) is enough to be certain that we have many ancestors in common. After all, we each have a billion slots in the P-30 generation of our family tree. But suppose that you and I have roots in completely different parts of the world. Then how far back would we have to go to be sure of a common ancestor?

The key factor here is migration. Throughout human history, there have been people who have moved from one place to another — sometimes very long distances. Just as important, people have not consistently moved in one direction. Although there have often been waves of people moving in a particular direction, there have always been people moving in all other directions as well. People moved not only within continents, but also between continents. Even the Bering Strait, dividing the Old World from the New World, was a highly permeable barrier, with people often moving back and forth across it. The result is that over a period of a few thousand years, genes that arose anywhere in the world could easily have migrated to any other place in the world where people lived.

If you go back far enough, then you will come to a time when there was someone alive who is an ancestor to everyone in the world today. Based on careful mathematical modeling of historical patterns of human migration, it appears that the most recent common ancestor of all of today’s humanity lived between 2,000 and 5,000 years ago. Therefore, even if you and I appear to have roots in completely different parts of the world, we can be reasonably confident that we have common ancestors sometime in the past 5000 years.

If we go back a little farther, then the story gets even stranger. Consider the people who were alive 7,000 years ago, in 5000 B.C. Everyone who was alive at that time is either an ancestor to everyone alive today, or else is an ancestor to no one alive today. Therefore, the list of your ancestors who were living 7,000 years ago is identical to the list of my ancestors who were living 7,000 years ago — and the same goes for every other person in the world.
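
To get a feel for why these two points can arrive so few generations back, here is a toy simulation. It is not the migration model described above. It assumes a small, constant population, no geography at all, and two distinct parents for every person, chosen uniformly at random from the previous generation. The function name and parameters are hypothetical. It reports the first generation back in which someone is an ancestor of everyone in the present generation, and the first generation back in which every person is an ancestor of either everyone or no one.

```python
import random


def find_mrca_and_ia(population: int, max_generations: int = 1000, seed: int = 0):
    """Toy random-mating model (hypothetical parameters, no geography).

    Returns (mrca_gen, ia_gen): the first generation back in which someone is an
    ancestor of everyone in the present generation, and the first generation back
    in which every individual is an ancestor of either everyone or no one.
    Either value is None if it is not reached within max_generations.
    """
    rng = random.Random(seed)
    everyone = (1 << population) - 1
    # Bit i of masks[j] is set if individual j descends to present-day person i.
    # Generation 0 is the present, so each person starts with only their own bit.
    masks = [1 << i for i in range(population)]
    mrca_gen = ia_gen = None
    for gen in range(1, max_generations + 1):
        parent_masks = [0] * population
        for child_mask in masks:
            # Each child gets two distinct parents from the previous generation,
            # and each parent inherits all of the child's present-day descendants.
            for parent in rng.sample(range(population), 2):
                parent_masks[parent] |= child_mask
        masks = parent_masks
        if mrca_gen is None and everyone in masks:
            mrca_gen = gen
        if all(m == 0 or m == everyone for m in masks):
            ia_gen = gen
            break
    return mrca_gen, ia_gen


print(find_mrca_and_ia(population=500))
```

In models of this simple kind, both points tend to grow only with the logarithm of the population size. Real populations, spread across continents with limited migration, push both points further into the past.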

But if everyone alive today has exactly the same set of ancestors as of 7000 years ago, then why don’t we all look alike? How could there be any genetic differences between us? There are several reasons for this:

1) Remember that the P-280 generation in your family tree contains more slots than there are atoms in the entire universe. In other words, there are far more P-280 slots than you have genes in your DNA. Therefore, you did not actually inherit DNA from every single P-280 slot in your family tree.

2) The P-280 generation is so vast that each of your ancestors from 7000 years ago appears trillions of trillions of times in your family tree. But the number of times one ancestor appears in your tree can be quite different from the number of times another ancestor appears. If ancestor A appears millions of times more often than ancestor B, then this will have a major impact on your particular mix of genes.

3) New gene variants inevitably appear over time, and some of these variants have appeared within the past few thousand years. Such recent genes are more likely to be restricted to limited subpopulations of the world.

To look at it another way, there has always been a tradeoff between genes moving from one place to another due to migration, and genes staying local over a period of many generations. This tradeoff means that the mixing of genes around the world is always incomplete, allowing typical mixtures of genes to become associated with specific places in the world.

Still another way of looking at the issue is to trace the ancestry of a single gene. Every person has 23 pairs of chromosomes, and each chromosome has an average of approximately 900 protein-encoding genes. Therefore, a human has approximately 20,000 pairs of genes. The genes are in pairs because you have two copies of each gene (for most of your genes) — one copy inherited from your mother, and one copy inherited from your father.

In one sense, every one of us has exactly the same set of 20,000 genes. This is the human genome, which is what makes us human. However, many of these genes exist as two or more variants — and it is these variants (also called alleles) that result in the genetic differences between people. To put it in over-simplified terms, all of us have a human gene for eye color — but some of us have the brown-eyed variant of the gene, while some of us have the blue-eyed allele instead. To be more precise, each of us has two copies of this gene — and we might have two identical copies, or we might have two different alleles. (This example, while easy to understand, is actually erroneous, because eye color is controlled by the interaction of several different genes, not just one gene.)

A much more accurate example involves your ABO blood type. The human genome contains a gene for blood type, and there are three variants of this gene, called A, B, and O. You inherit one copy from your mother and one from your father — and these two copies may be identical or different. The six possible genetic combinations are AA, AB, AO, BB, BO, and OO — although there are only four blood types: A, B, AB, and O. (AO produces the same blood type as AA, and BO produces the same blood type as BB.)
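
The mapping from the six genotypes to the four blood types can be written out directly. This sketch uses the standard textbook simplification in which A and B are both dominant over O and codominant with each other; the function names are illustrative.

```python
from itertools import combinations_with_replacement, product

ALLELES = ("A", "B", "O")


def blood_type(genotype: tuple[str, str]) -> str:
    """Map a pair of ABO alleles to a blood type; O is masked by A or B."""
    present = set(genotype) - {"O"}
    if not present:
        return "O"
    return "".join(sorted(present))  # "A", "B", or "AB"


# The six unordered allele pairs: AA, AB, AO, BB, BO, OO
genotypes = list(combinations_with_replacement(ALLELES, 2))
for g in genotypes:
    print("".join(g), "->", blood_type(g))


def child_blood_types(mother: tuple[str, str], father: tuple[str, str]) -> set[str]:
    """Blood types possible for a child who receives one allele from each parent."""
    return {blood_type((m, f)) for m, f in product(mother, father)}
```

For example, `child_blood_types(("A", "O"), ("B", "O"))` returns all four blood types, since two parents with AO and BO genotypes can have a child with any of AB, AO, BO, or OO.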

Each of our genes consists of a string of base pairs, which are given the abbreviations A, C, G, and T. A typical gene contains around 25,000 base pairs, although this number can vary enormously from one gene to another. Each allele, or variant, of a single gene tends to be quite similar to all the other alleles, differing by a relatively small number of base pairs. These differences slowly accumulate over time, making it possible to reconstruct the history of a gene — that is, the sequence in which the changes appeared, producing the various alleles. By making some logical assumptions about the rate of mutation, it is possible to produce an estimated date as to when each allele first appeared.
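
A bare-bones version of this molecular-clock reasoning: count the positions at which two aligned alleles differ, then divide by the rate at which differences are expected to accumulate. Because both lineages mutate independently after they split, the expected number of differences after T generations is about 2 * mu * L * T, for a per-site mutation rate mu and a sequence length L. This sketch assumes a constant rate and ignores repeated mutations at the same site. The sequences and the rate below are made up for illustration; real human rates are on the order of 10^-8 per site per generation.

```python
def count_differences(seq_a: str, seq_b: str) -> int:
    """Count positions at which two aligned, equal-length sequences differ."""
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    return sum(1 for a, b in zip(seq_a, seq_b) if a != b)


def estimate_divergence_generations(seq_a: str, seq_b: str, mu: float) -> float:
    """Rough clock estimate of how many generations ago two alleles diverged,
    assuming mu mutations per site per generation along each lineage."""
    d = count_differences(seq_a, seq_b)
    return d / (2 * mu * len(seq_a))


# Hypothetical alleles and an unrealistically high rate, for illustration only.
allele_1 = "ACGTACGTAC"
allele_2 = "ACGTTCGTAA"
print(count_differences(allele_1, allele_2))                      # 2
print(estimate_divergence_generations(allele_1, allele_2, 0.01))  # 10.0
```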

As you trace the history of a particular gene in this manner, you will eventually come to a version of the gene that is ancestral to all the alleles currently in existence. The person in whom this gene resided is therefore an ancestor to every person in the world today. If this ancestor lived more than 7000 years ago, then this is no real surprise, because everyone in the world at that time was either an ancestor to everyone alive today, or to no one. The interesting aspect, therefore, is that we can attach a specific gene to this ancestor. By examining the distribution of alleles around the world, and considering when each allele first appeared, we can often identify the approximate location where this long-ago common ancestor lived.

Some of the easiest genes to trace in this manner are found in our mitochondrial DNA, rather than our nuclear DNA — and we inherit this DNA directly from our mothers. (There are only 13 protein-encoding genes in mitochondrial DNA.) Researchers have concluded that the most recent common ancestor (MRCA) for our mitochondrial DNA lived in Africa about 100,000 to 150,000 years ago. This woman has been nicknamed “Mitochondrial Eve”. It is important to note that other men and women alive at the same time also contributed to our modern human DNA — but no other woman’s mitochondrial DNA has survived into the present. New variants have arisen since the time of Mitochondrial Eve, however, and therefore we have at least eight major lines of human mitochondrial DNA in the world today.
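
Why only one woman’s mitochondrial line survives can be illustrated with a toy model of maternal inheritance. This sketch assumes a constant number of women per generation, each with a mother chosen at random from the previous generation (a Wright-Fisher style model with hypothetical parameters); it is not how researchers dated Mitochondrial Eve. Tracing today’s maternal lines backward, two lines merge whenever the women carrying them share a mother, until only one line remains.

```python
import random


def generations_to_single_maternal_line(population: int, seed: int = 0,
                                        max_generations: int = 1_000_000) -> int:
    """Toy model: trace every present-day maternal line backward and return how
    many generations pass before they all merge into one woman's line."""
    rng = random.Random(seed)
    lineages = set(range(population))  # distinct maternal lines still being traced
    generations = 0
    while len(lineages) > 1:
        if generations >= max_generations:
            raise RuntimeError("lines did not merge within max_generations")
        # Each line steps back to a randomly chosen mother in the previous
        # generation; lines that pick the same mother collapse into one.
        lineages = {rng.randrange(population) for _ in lineages}
        generations += 1
    return generations


print(generations_to_single_maternal_line(population=300))
```

In this kind of model the number of generations needed is on the order of the population size. Meanwhile the other women of that time, like the men, can still appear many times in the ordinary family tree; only their mitochondrial lines are lost.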

In theory, we can go through a similar process for any of the genes in our nuclear DNA as well. First we choose a gene for which there are several different alleles. We study the sequence of base pairs in each allele, and use the results to reconstruct the history of that gene, tracing it back to a single variant in one of humanity’s common ancestors.

The upshot is that we have multiple methods to determine how related you and I are. One method is to compare your genes to my genes for each of the 20,000 pairs of genes — counting up how many times you and I have different alleles for the genes. The fewer differences there are, the more related we are. We could go even farther and take into consideration how related our different alleles are. Did most of our differing alleles diverge fairly recently, or a very long time ago?
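
A minimal sketch of the first method: count the genes at which two people carry different pairs of alleles. The gene names and genotypes are hypothetical, and this is only one of several possible tallies (one could instead count individual unshared alleles).

```python
# Hypothetical genotypes: gene name -> the pair of alleles a person carries.
Genotype = dict[str, tuple[str, str]]


def count_differing_genes(person_a: Genotype, person_b: Genotype) -> int:
    """Count genes where the two people do not carry the same pair of alleles.
    Allele order within a pair is ignored, so ("A", "O") matches ("O", "A")."""
    shared_genes = person_a.keys() & person_b.keys()
    return sum(
        1 for gene in shared_genes
        if sorted(person_a[gene]) != sorted(person_b[gene])
    )


person_1 = {"ABO": ("A", "O"), "GENE_X": ("x1", "x1"), "GENE_Y": ("y1", "y2")}
person_2 = {"ABO": ("O", "A"), "GENE_X": ("x1", "x2"), "GENE_Y": ("y2", "y1")}
print(count_differing_genes(person_1, person_2))  # 1 (only GENE_X differs)
```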

From a practical standpoint, the inexpensive commercial DNA tests now available do not attempt to sequence all the DNA in your genome. Instead, each of these services chooses a limited set of genetic “markers” and reads your DNA only at those positions. The theory is that these samples are adequate for the purpose — in the same way that opinion polls are based on the responses of a few thousand people, and yet are assumed to roughly represent the opinions of the entire nation.
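
The polling analogy can be illustrated with synthetic data: estimate the fraction of positions at which two genomes differ from a random sample of positions, then compare that estimate with the true fraction. Everything here is hypothetical. The genomes are random 0/1 variants, and real services choose their markers deliberately rather than at random.

```python
import random


def estimate_mismatch_fraction(genome_a: list[int], genome_b: list[int],
                               n_markers: int, seed: int = 0) -> float:
    """Estimate the fraction of positions where two genomes differ by checking
    only a random sample of positions, in the spirit of an opinion poll."""
    rng = random.Random(seed)
    positions = rng.sample(range(len(genome_a)), n_markers)
    mismatches = sum(genome_a[i] != genome_b[i] for i in positions)
    return mismatches / n_markers


# Synthetic data: 100,000 positions holding variant 0 or 1, with person B
# differing from person A at roughly 10% of them.
rng = random.Random(42)
genome_a = [rng.randint(0, 1) for _ in range(100_000)]
genome_b = [v if rng.random() > 0.10 else 1 - v for v in genome_a]

true_fraction = sum(a != b for a, b in zip(genome_a, genome_b)) / len(genome_a)
estimate = estimate_mismatch_fraction(genome_a, genome_b, n_markers=2_000)
print(f"true: {true_fraction:.3f}  estimated from 2,000 markers: {estimate:.3f}")
```

As with a poll, the estimate from a few thousand positions lands close to the figure for the whole genome, and its precision depends mainly on the number of markers checked.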

Written by R. Philip Bouchard.