Human genome variation and admixture

David Díez
Oct 2, 2015
a) Map of worldwide variation. b) Number of variants per sampled population. c) Number of singletons per population. ©2015 Macmillan Publishers Limited. All rights reserved.

The final papers from the 1000 Genomes Project are out. Links can be found here and here.

In this last release, the consortium unveils 2,504 human genomes from 26 worldwide populations, reconstructed by combining whole-genome sequencing, deep exome sequencing and microarray genotyping data. As they report, there are now more than 88 million variant positions in the human genome across populations, mainly Single Nucleotide Polymorphisms (SNPs) but also indels and tens of thousands of structural variants, such as Copy-Number Variants. Overall they represent up to 40% more variants than were previously known in humans. As the basic reference panel for human genetic variation, the data released by the 1000 Genomes Project is comparable only to the milestone that was the first human genome ever sequenced, back in 2003, by the Human Genome Project.

I really like the figure I reproduced above. It shows, for each sampled population, the percentage of variants that are exclusive to that population, exclusive to its continent, shared across continents or shared globally. With this simple representation we can get a glimpse of how variation is distributed in human populations across the globe. For example, African and Asian populations have more continent-specific variants than European or American ones. The relative size of each pie chart also reflects the total number of variants in that population, with African populations clearly more variable than the rest.
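To make the four slices of each pie concrete, here is a minimal sketch of how a variant could be assigned to one of those categories. It is not the consortium's pipeline: the population codes are real 1000 Genomes labels, but the variant names, the presence data and the function names (`sharing_category`, `sharing_profile`) are made up for illustration.

```python
from collections import Counter

# A small subset of 1000 Genomes population codes mapped to continental groups.
POP_TO_CONTINENT = {
    "YRI": "AFR", "LWK": "AFR",
    "CEU": "EUR", "TSI": "EUR",
    "CHB": "EAS", "JPT": "EAS",
}

def sharing_category(populations, pop_to_continent=POP_TO_CONTINENT):
    """Classify a variant by the set of populations in which it is polymorphic."""
    continents = {pop_to_continent[p] for p in populations}
    if len(populations) == 1:
        return "private to population"
    if len(continents) == 1:
        return "private to continent"
    if continents == set(pop_to_continent.values()):
        return "shared across all continents"
    return "shared across continents"

def sharing_profile(variant_presence, population):
    """Fraction of a population's variants that falls in each sharing category."""
    counts = Counter(
        sharing_category(pops)
        for pops in variant_presence.values()
        if population in pops
    )
    total = sum(counts.values())
    return {category: n / total for category, n in counts.items()}

# Hypothetical toy data: variant -> populations where it is polymorphic.
toy_presence = {
    "var1": {"YRI"},
    "var2": {"YRI", "LWK"},
    "var3": {"YRI", "CEU"},
    "var4": {"YRI", "LWK", "CEU", "TSI", "CHB", "JPT"},
    "var5": {"CEU"},
}

profile = sharing_profile(toy_presence, "YRI")
```

In this toy data set each of the four YRI variants lands in a different category, so every slice of the YRI pie would be the same size. With real data the slices differ widely between populations, which is exactly what the map shows.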

The map is cool, but figure b) is even better at representing the number of variants in each population. The authors make a good point in the text. If anybody still doubted the Out of Africa hypothesis for the origin of modern humans, figure b) is quite clarifying. Note how African populations are remarkably more diverse than Asian and European populations, probably as a consequence of the strong population bottleneck suffered by our distant cousins when leaving Africa. The case of American populations is interesting, with a number of variants intermediate between Africans and Eurasians. These populations show high levels of admixture with other populations and hence gained their current levels of variation in proportion to their percentage of admixture (mainly with Africans). In fact, these American populations have virtually no continent-exclusive variants.

Major migration events between human populations detected using haplotype-based methods. Busby et al. 2015. ©2015 Elsevier Ltd All rights reserved.

The role of admixture in variation is therefore a major concern when explaining how human populations were shaped by evolution over the last few thousand years. Not for nothing, several important recent papers, some including ancient DNA (here, here and here) and one using just modern data but novel methods (here), have put admixture in the spotlight as the most fundamental force sculpting human populations as we know them today.

I can't avoid, however, the feeling that somehow we may be missing the point here. If what we want is to explain what makes a population the way it is, then detecting that population A mixed with population B is only interesting in the context of detecting historical (or prehistorical) movements that can help us explain cultural, linguistic or behavioural differences (or similarities) between these populations. Which is a lot. But if what we really want is to explain the specific role of each evolutionary force in the process, then just detecting admixture events seems pointless. For example, imagine that we are able to detect traces of admixture between populations A and B. When did that happen? What was the actual impact of that admixture on population A? Did individuals of population B gain any relevant variant in the process? Was the level of admixture between them high or low? Enough to overcome the effects of selection on some variants in population B? All these questions are as important as the fact that admixture occurred between two populations. Sadly, in part because these questions are more difficult to address, they are receiving less attention. We cannot neglect the role of genetic drift in human evolution just because admixture is sexier.

There are, of course, very cool studies addressing questions about admixture like the ones I mentioned. For example, here, Emilia Huerta-Sánchez and colleagues found evidence of admixture between Denisovans and human populations from Asia. The interesting part, however, is that this interbreeding event probably introduced into Tibetans variants of the genes EGLN1 and EPAS1 that were already present in Denisovans. These variants are related to high-altitude adaptation through lower haemoglobin concentration, and later enabled Tibetans to deal better with life in the Himalayas. In summary, they illustrate how introgression from other hominins has provided genetic variation that helped human populations adapt to challenging environments.

Example of a genomic island of domestication in Asian and European domestic pig genomes (ASD and EUD) that is not present in wild boar (ASW and EUW). Frantz et al. 2015. © 2015 Nature America Inc. All rights reserved.

Another example of an admixture-based study that I like is this one about the origins of pig domestication from Laurent Frantz. They sequenced over a hundred pig genomes, including several European and Asian domestic breeds and wild boar lineages, and used Approximate Bayesian Computation (ABC) to estimate that gene flow between wild boar and domestic pigs has been continuous since the Neolithic domestication events. But the authors didn't stop there. They analyzed the effect of this gene flow on areas of the pig genome associated with domestic traits, such as loci that affect behaviour and morphology, and discovered that there are genomic islands of domestication: areas unaffected by gene flow from wild boars, probably due to artificial selection on these traits.
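For readers who have not met ABC before, its simplest flavour (rejection sampling) fits in a few lines. The sketch below is a toy, not the model used in the pig study. It assumes a single locus, a constant per-generation migration rate `m` from wild boar into domestic pigs, and made-up values for the allele frequencies, number of generations, sample size and observed frequency. The idea is to draw `m` from a prior, simulate data, and keep only the draws whose simulated summary statistic lands close to the observed one.

```python
import random

def simulate_domestic_freq(m, rng, p_wild=0.9, p_dom0=0.1,
                           generations=50, n_alleles=200):
    """Toy simulator: allele frequency observed in a sample of domestic pigs
    after `generations` of gene flow from wild boar at rate `m`."""
    # Expected frequency under constant one-way migration (drift is ignored).
    p = p_wild + (p_dom0 - p_wild) * (1.0 - m) ** generations
    # Binomial sampling noise from genotyping a finite number of alleles.
    count = sum(rng.random() < p for _ in range(n_alleles))
    return count / n_alleles

def abc_rejection(observed, n_draws=2000, tolerance=0.03, seed=1):
    """Return the prior draws of m whose simulations fall within
    `tolerance` of the observed summary statistic."""
    rng = random.Random(seed)
    accepted = []
    for _ in range(n_draws):
        m = rng.uniform(0.0, 0.1)  # prior: migration rate between 0 and 10%
        simulated = simulate_domestic_freq(m, rng)
        if abs(simulated - observed) <= tolerance:
            accepted.append(m)
    return accepted

# Hypothetical observed frequency of the "wild" allele in domestic pigs.
posterior = abc_rejection(observed=0.61)
posterior_mean = sum(posterior) / len(posterior)
```

The accepted values approximate the posterior distribution of `m`. Real analyses use many summary statistics computed from whole genomes and compare competing demographic models, but the accept/reject logic is the same.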

As with the 1000 Genomes Project, more and more human geneticists are understanding the value of collaborating by openly sharing their data. In the face of abundant and exciting datasets, including very good quality genomes both ancient and modern, I believe it is time to move forward and use hypothesis-driven analyses to understand how the interplay between evolutionary forces explains the origins and history of human populations, and not just describe the data.
