We can all agree that discovering cousins in your relative matching report is an exciting event. But can we really detect all of our cousins using genetic genealogy? The answer is no. DNA can only detect a small fraction of our true cousins. This blog post explains why, and how DNA.Land’s Relatives of Relatives Report can be used to bridge the gap.
1: How many cousins do you have out there?
A few years ago, a report by Henn et al. from 23andMe sought to estimate how many true cousins an individual has. They used a simple model that assumes (a) a fertility rate of 2.5 children per generation, (b) all children survive to fertility age, and (c) all children reproduce at the same rate. The model is actually a fair approximation of the Western world. Gross fertility rates in Western societies were on the order of 4–5 kids for most of the 19th century (see Figure below), but death rates before fertility age were on the order of 40%–50%. The net effect is that an average family produced about 2–3 fertile offspring per generation, which is concordant with the Henn model.
Under the Henn model, you would have 190 third cousins, 940 fourth cousins, 4,700 fifth cousins, 23,000 sixth cousins and so on. For every one-degree increase in cousinship, we expect five times more cousins. The table below shows the number of cousins you should expect:
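The figures above can be reproduced with a short calculation. The sketch below is one reading of the model, not necessarily the exact formula used by Henn et al. It assumes that you have 2^d ancestral couples d+1 generations back, that each couple had 2.5 children of whom one is your own ancestor, and that each of the remaining 1.5 lines grew by a factor of 2.5 per generation for d generations. The function name `expected_cousins` is made up for illustration.

```python
def expected_cousins(degree, fertility=2.5):
    """Expected number of cousins of a given degree (1 = first cousins).

    Assumed reading of the model: every couple has `fertility` children,
    all of whom survive and reproduce at the same rate.
    """
    # Couples who are your ancestors (degree + 1) generations back.
    ancestral_couples = 2 ** degree
    # Children of each couple other than your own ancestor.
    sibling_lines = fertility - 1
    # Descendants of each sibling line in your own generation.
    descendants_per_line = fertility ** degree
    return ancestral_couples * sibling_lines * descendants_per_line


for degree in range(3, 7):
    print(degree, expected_cousins(degree))
```

Under these assumptions the function returns 187.5, 937.5, 4687.5 and 23437.5 for third through sixth cousins, which round to the figures quoted above. Each extra degree doubles the number of ancestral couples and multiplies the descendants per line by 2.5, which is where the factor of five comes from.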
2: You do not share DNA with many of your cousins.
To be able to detect a cousin via a DNA test, you need to share at least one identical DNA segment (technically known as an identical-by-descent, or “IBD”, segment). But what are the theoretical chances of sharing at least one identical segment with one of your cousins? With every generation, the random shuffling of DNA due to recombination makes it harder and harder to draw the same segments that your cousin inherited. An excellent paper by Donnelly (1983) analyzed the complex math of sharing IBD segments and reported the following grim picture:
You only share a DNA segment with 30% of your fifth cousins. In other words, DNA analysis, no matter how good, is totally blind to 70% of your fifth cousins. Of the 4,700 fifth cousins that are expected out there, just over 3,000 are inaccessible to DNA analysis. This has nothing to do with the actual algorithm that is being used, or whether you have been sequenced or genotyped, or tested with 23andMe or AncestryDNA. You and your cousin just did not inherit the same segments.
A similar analysis also applies to distant ancestors. You have only a 9% chance of sharing genetic material with your 13th generation ancestor. Even if you are connected to a person who came over on the Mayflower in 1620 (about 13 generations ago), chances are that you did not inherit any of this person’s DNA. If we could do a genetic test, you and your Mayflower relative would look like two unrelated people [as a geneticist, I do not think it is a big deal. Our identity and family legacy are bigger than any nucleic acid in our body, but this is a topic for a different post].
Until now, we have discussed the theoretical limitations of DNA analysis. In practice, the limitations are even more severe. It is computationally intractable for an algorithm to detect small DNA segments, genotype errors complicate finding identical segments, and DNA arrays miss many true genetic variations. The Henn et al. study estimated that the 23andMe algorithm would miss 85% of your fifth cousins. About 50% of your fourth cousins will also be inaccessible.
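To put these percentages in terms of people, the short sketch below combines the expected cousin counts from the first section with the miss rates quoted here. The dictionary and function names are illustrative only, and the numbers are simply the ones quoted in this post.

```python
# Expected cousin counts under the Henn model, as quoted earlier in this post.
EXPECTED = {"fourth": 940, "fifth": 4700}

# Fraction of cousins estimated to be missed in practice (Henn et al.).
MISSED_IN_PRACTICE = {"fourth": 0.50, "fifth": 0.85}

# Fraction of fifth cousins who share at least one segment with you in theory.
THEORETICAL_SHARE_FIFTH = 0.30


def detectable(degree):
    """Approximate number of cousins of this degree that remain detectable."""
    return round(EXPECTED[degree] * (1 - MISSED_IN_PRACTICE[degree]))


print("fourth cousins detectable in practice:", detectable("fourth"))
print("fifth cousins detectable in practice:", detectable("fifth"))
print("fifth cousins detectable in theory:",
      round(EXPECTED["fifth"] * THEORETICAL_SHARE_FIFTH))
```

With these inputs, roughly 470 of the 940 fourth cousins and about 705 of the 4,700 fifth cousins remain detectable in practice, against a theoretical ceiling of about 1,410 fifth cousins.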
3: How can you find your lost cousins?
While you do not share DNA with a large number of your fourth and fifth cousins, you do share DNA with some of them. For example, you might not share DNA with your fourth cousin Alice, but you might share DNA with her brother Bob. Of course, Bob and Alice share a lot of DNA with each other (just not the DNA that Bob shares with you). So if you have the list of relatives for Bob, you could add Alice to your list of relatives.
The Relatives of Relatives Report allows you to use your DNA relatives as springboards to search for additional relatives using their DNA. This way you can jump over missing DNA links in your family and find a larger number of relatives. The report presents the relationship with each individual via your shared relative, estimates the certainty of the match, and presents the shared DNA segments.
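The springboard idea can be pictured as a two-step walk through a network of DNA matches. The sketch below is only an illustration of that idea, not DNA.Land's actual algorithm: the `matches` dictionary, the names in it, and the function `relatives_of_relatives` are all hypothetical, and the sketch ignores certainty estimates and shared segments.

```python
def relatives_of_relatives(person, matches):
    """Return {candidate: [shared relatives]} for people two hops from `person`.

    `matches` maps each person to the set of people they share DNA with.
    """
    direct = matches.get(person, set())
    candidates = {}
    for relative in sorted(direct):
        for other in sorted(matches.get(relative, set())):
            # Skip yourself and anyone you already match directly.
            if other != person and other not in direct:
                candidates.setdefault(other, []).append(relative)
    return candidates


# Hypothetical example: you match Bob, Bob matches Alice, you do not match Alice.
matches = {
    "you": {"Bob"},
    "Bob": {"you", "Alice"},
    "Alice": {"Bob"},
}
print(relatives_of_relatives("you", matches))
```

Here Alice turns up as a candidate relative through Bob even though you share no DNA with her directly.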
Of course, the results of the report do not guarantee that a match is a true relative. For example, the report might show that you are connected to Alice via her first cousin Bob, who is your second cousin. But Bob might be connected to you on his paternal side and to Alice on his maternal side. Therefore, it is important to think about the report as a starting point to identify additional relatives and an opportunity to communicate and find true matches. For now, we recommend considering only relatives of relatives with “high” certainty to increase the chance of finding a true relative.
- DNA.Land is a free, not-for-profit website run by Columbia University and the New York Genome Center. Individuals can upload their DNA data to DNA.Land to learn more about their genome and contribute it to advance science.