Published in

Alexandria Science

How Much of an Ancestor’s or Relative’s DNA Do You Have?

The only cousin statistics that acknowledge the differences between paternal and maternal relatives that arise from recombination rates.

One million input Poisson rates, normalized, for maternal and paternal recombination.

A mother's genome recombines about 42 times per meiosis, on average. By contrast, a father's genome recombines only about 27 times. This leads to a conclusion that’s intuitive to geneticists: more recombination decreases variance, leading to narrower ranges of shared DNA for maternal relatives. Less recombination results in more variance, which is why fully or predominantly paternal relatives can share a much wider range of DNA. Graham Coop has blogged about this phenomenon.
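The link between crossover count and variance can be checked with a small toy simulation. The sketch below is not the model described in this article: it assumes 22 equal-length chromosomes, no crossover interference, and crossovers placed as a simple Poisson process, and the names `gamete_fraction` and `spread` are made up for illustration. It measures how much of one transmitted gamete traces back to a single grandparent.

```python
import random
import statistics

N_CHROMOSOMES = 22  # autosomes; treated as equal in length (a simplification)


def gamete_fraction(mean_crossovers, rng):
    """Fraction of one transmitted gamete that comes from one grandparent."""
    rate = mean_crossovers / N_CHROMOSOMES  # expected crossovers per chromosome
    total = 0.0
    for _ in range(N_CHROMOSOMES):
        from_a = rng.random() < 0.5  # which grandparental copy starts the chromosome
        pos = 0.0
        while True:
            # Exponential gaps between crossovers give a Poisson crossover count.
            nxt = pos + rng.expovariate(rate)
            end = min(nxt, 1.0)
            if from_a:
                total += end - pos
            if nxt >= 1.0:
                break
            pos = nxt
            from_a = not from_a  # switch grandparental copy at each crossover
    return total / N_CHROMOSOMES


def spread(mean_crossovers, trials=4000, seed=1):
    """Mean and standard deviation of the fraction over many simulated gametes."""
    rng = random.Random(seed)
    values = [gamete_fraction(mean_crossovers, rng) for _ in range(trials)]
    return statistics.mean(values), statistics.stdev(values)


maternal_mean, maternal_sd = spread(42)  # about 42 crossovers per meiosis
paternal_mean, paternal_sd = spread(27)  # about 27 crossovers per meiosis
print(f"maternal: mean={maternal_mean:.3f} sd={maternal_sd:.4f}")
print(f"paternal: mean={paternal_mean:.3f} sd={paternal_sd:.4f}")
```

Both means sit near one half, but the paternal standard deviation comes out larger, which is the wider paternal range described above.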

I’ve developed an autosomal DNA model. It doesn’t rely on mathematical tricks that take bad data and then stretch, compress, or otherwise manipulate them to reconcile them with the peer-reviewed literature. It’s a natural model, by which I mean that it simulates the processes that DNA goes through in real life. That includes being separated into two copies of 22 chromosomes of known lengths, recombining according to the known maternal and paternal recombination rates, which are sampled from Poisson distributions, and recombining according to the known maternal and paternal rates for each chromosome. The model also includes crossover interference. So far, recombination hotspots or jungles aren’t simulated, and they likely won’t be unless they’re required to improve accuracy. But one can see from the standard deviations below that the model is already very accurate.
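To make the steps in that description concrete, here is a minimal sketch of a single meiosis. It is an illustration under stated assumptions, not the model's actual code: the three-chromosome genome and its lengths and rates in `TOY_CHROMOSOMES` are made-up placeholders, crossovers are placed uniformly along each chromosome, and crossover interference is left out. The function names are hypothetical.

```python
import math
import random

# Hypothetical toy genome: (length in Mb, maternal Morgans, paternal Morgans).
# These numbers are placeholders for illustration, not published map lengths.
TOY_CHROMOSOMES = {
    "chrA": (200.0, 3.0, 2.0),
    "chrB": (120.0, 2.0, 1.2),
    "chrC": (50.0, 0.8, 0.5),
}


def poisson(mean, rng):
    """Draw a Poisson-distributed count (Knuth's multiplication method)."""
    limit = math.exp(-mean)
    count, product = 0, rng.random()
    while product > limit:
        count += 1
        product *= rng.random()
    return count


def make_gamete(parent_sex, rng):
    """Return {chromosome: [(start_mb, end_mb, haplotype), ...]} for one meiosis."""
    gamete = {}
    for name, (length_mb, maternal_m, paternal_m) in TOY_CHROMOSOMES.items():
        # Sex-specific expected number of crossovers on this chromosome.
        mean = maternal_m if parent_sex == "F" else paternal_m
        n_crossovers = poisson(mean, rng)
        # Uniform placement: no hotspots and no interference in this sketch.
        cuts = sorted(rng.uniform(0.0, length_mb) for _ in range(n_crossovers))
        haplotype = rng.randrange(2)  # which of the parent's two copies comes first
        segments, start = [], 0.0
        for cut in cuts + [length_mb]:
            segments.append((start, cut, haplotype))
            start = cut
            haplotype = 1 - haplotype  # switch copies at each crossover
        gamete[name] = segments
    return gamete


rng = random.Random(0)
example = make_gamete("F", rng)
for name, segments in example.items():
    print(name, len(segments) - 1, "crossover(s)")
```

Passing a child's two gametes (one from each parent) down through further generations, then comparing the segment labels of two descendants, is the general shape of a simulation like the one described here.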

It can compute the averages and ranges for any relationship, multiple cousin relationship, or combination of kits to calculate DNA coverage.
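As a rough illustration of what DNA coverage means when kits are combined, the sketch below takes the union of the segments that each kit carries from an ancestor on one chromosome. The function name `coverage`, the interval representation, and the example numbers are assumptions made for this sketch, and the model itself may compute coverage differently.

```python
def coverage(kits, chromosome_length):
    """Fraction of a chromosome covered by the union of segments from all kits.

    kits: list of kits, each a list of (start, end) segments in the same units
    as chromosome_length.
    """
    intervals = sorted(segment for kit in kits for segment in kit)
    covered = 0.0
    current_start = current_end = None
    for start, end in intervals:
        if current_end is None or start > current_end:
            # A gap: close the previous merged interval and start a new one.
            if current_end is not None:
                covered += current_end - current_start
            current_start, current_end = start, end
        else:
            # Overlap: extend the current merged interval.
            current_end = max(current_end, end)
    if current_end is not None:
        covered += current_end - current_start
    return covered / chromosome_length


# Hypothetical example: two kits on a chromosome of length 100.
kit_a = [(0, 30), (60, 80)]
kit_b = [(20, 50)]
combined = coverage([kit_a, kit_b], 100)
print(combined)
```

Overlapping segments are counted once, so adding a kit raises coverage only by the DNA that the other kits don't already carry.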

The model can’t produce wrong averages of shared DNA unless the user introduces an error when setting up a simulation. One example would be coding the simulation to compare oneself to an aunt when the intention was to compare to a half-sibling. It’s fairly difficult to make an error like that, but one should employ strict quality control to ensure it doesn’t happen. The means aren’t very sensitive to the number of trials. I generally run 500,000 trials per simulation, but for the means to start being off by a tenth of a percentage point for, say, full or half-siblings, one would have to decrease the number of trials to about 2,000.
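The weak dependence of the means on trial count matches the usual standard error of a mean, which shrinks with the square root of the number of independent trials. The short sketch below only compares the two trial counts mentioned above. The function name is hypothetical, and the standard deviation is set to 1.0 as a placeholder because only the ratio matters.

```python
import math


def standard_error(sd, trials):
    """Standard error of a sample mean, assuming independent trials."""
    return sd / math.sqrt(trials)


# Placeholder sd of 1.0: the ratio below does not depend on its value.
ratio = standard_error(1.0, 2_000) / standard_error(1.0, 500_000)
print(f"Going from 500,000 to 2,000 trials multiplies the error by {ratio:.1f}")
```

Cutting the trials by a factor of 250 widens the uncertainty in the mean by a factor of only about 16.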

You can judge the accuracy of a shared DNA chart or table by the known standard deviations of some of its data points. Veller et al. (2019, 2020) have calculated standard deviations for paternal grandparents/grandchildren, paternal half-siblings, full siblings, and maternal grandparents/grandchildren. They’ve calculated these for the genomic metric (bp), which represents the number of base pairs that two people actually share, and for the genetic linkage metric (centiMorgans), which shows what they would share as reported by a direct-to-consumer genotyping platform. While my correspondence with geneticists has revealed that they prefer the bp metric, I’m reporting genetic linkage results below for users of genotyping platforms.

I’m currently updating this page with the most recent results. Data resulting from the new changes will be shown below, then a clear separation will be made, and then data resulting from an older version of the model will be shown below that.

Table 1. Shared DNA between siblings. Standard deviations for relatives that have values available in the literature for comparison are given one extra decimal place here to show how closely they approximate known values.

It’s hard to say which is the bigger advantage of this method of computing shared DNA averages and ranges: that it’s the most accurate method or that it can handle any combination of relatives. The latter is illustrated below, as the model easily computes any type of 3/4 sibling or double first cousin.

Table 2. Results for shared DNA between six different types of 3/4 siblings. HIR = ‘half-identical regions,’ where one of the two chromosome homologues matches. FIR = ‘fully-identical regions,’ where both copies of a chromosome match. HIR + FIR = all of the points on chromosomes where two people match once plus all of the points where they match on both copies. HIR counting includes FIR bp, but only counts them as if they’re half-identical.
Table 3. Results for double first cousins. All parameters are the same as for Tables 1–2.
Table 4. Results for grandparents or descendants of grandparents. All parameters are the same as for Tables 1–3.
Table 5. Results for descendants of grandparents, continued, for half-relationships. All parameters are the same as for Tables 1–4.
Table 6. Results for descendants of great-grandparents.
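The HIR and FIR definitions in the caption of Table 2 amount to simple bookkeeping, sketched below. It assumes a comparison between two people has already been reduced to a list of segments, each labeled with the number of chromosome copies on which they match (0, 1, or 2). The function name, the segment representation, and the example lengths are hypothetical.

```python
def summarize_sharing(segments):
    """Total HIR, FIR, and HIR + FIR lengths in bp.

    segments: list of (length_bp, copies_matching), copies_matching in {0, 1, 2}.
    """
    # HIR counts every bp matching on at least one copy, so FIR bp are
    # included here, but only once, as if they were half-identical.
    hir = sum(length for length, copies in segments if copies >= 1)
    # FIR counts only the bp matching on both copies.
    fir = sum(length for length, copies in segments if copies == 2)
    return {"HIR": hir, "FIR": fir, "HIR + FIR": hir + fir}


# Hypothetical example: 100 bp unshared, 200 bp half-identical, 50 bp fully identical.
totals = summarize_sharing([(100, 0), (200, 1), (50, 2)])
print(totals)
```

In this example the 50 fully identical bp appear once in the HIR total and once more in the FIR total, so they count twice in HIR + FIR.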

Results below are from an older version of the model.

Table 7. Results for 2nd great-grandparents.
Table 8. Results for 3rd great-grandparents. All parameters are the same as for Table 7.

I hope you’ve found these results useful. More will be on the way.

Feel free to ask me about modeling & simulation, genetic genealogy, or genealogical research. And make sure to check out these ranges of shared DNA percentages or shared centiMorgans, which are the only published values that match peer-reviewed standard deviations. That model was also used to make a very accurate relationship prediction tool. Or, try a calculator that lets you find the amount of an ancestor’s DNA you have when combining multiple kits.

Originally published at


