DNA.Land’s Trait Prediction Report

(This blog post was written by Richard Aufrichtig, DNA.Land’s User Engagement Coordinator, and Jie Yuan, a Computer Science PhD student at Columbia University.)

DNA.Land is excited to launch its newest feature: the Trait Prediction Report!

Trait Prediction is our attempt to predict real-world characteristics from genomic data. This feature draws on studies that have discovered associations between single nucleotide polymorphisms (SNPs) [positions in the genome where a single DNA base varies between individuals] and biological traits believed to have a genetic component. The trait prediction menu page lists the available traits and your predictions for each.

At the moment, only two traits are available on DNA.Land, but additional traits will be added periodically.

Trait predictions based on genetic information are complicated by many factors, including environmental effects, population stratification and statistical uncertainty. As a result, some of the predictions shown may be inaccurate. To keep improving our prediction models, we ask our users to complete a preliminary survey, which will enable us to refine our predictions. The survey must be completed before you see your report, so that your responses are not biased by the prediction.

If your prediction is inaccurate, please keep in mind that we will use the valuable information you’ve provided in the survey to improve our future prediction models.

DNA.Land users can find our new feature in the “My Reports” section.

A report page for each trait contains three main sections: Inferred Trait, Genotype Summary, and Genotype Details.

Inferred Trait:

The first section shows your predicted score, based on your uploaded genotype file and published research.

In this section, Genetic Confidence refers to how well the genomic variants discovered so far can predict an individual’s trait. For the first two traits we have launched, we deliberately chose traits from opposite ends of this spectrum: eye color has a high genetic confidence level, while Educational Attainment has a low one.

For almost all complex traits, a large fraction of the genetic variants thought to contribute to the trait have yet to be identified. These predictions therefore represent only our best guess based on current genomic knowledge. Keep in mind that the prediction takes into account only the genomic component. Because almost all human traits are determined by a combination of our genomes and our environment, your trait may be determined predominantly by your environment.

Genotype Summary:

This section of the report illustrates how the genetic variants in your genotype file contribute to the predicted score. It is divided into two subsections: Effect Sizes and SNP Locations.

An effect size is attached to one allele of each SNP in our report. It is the amount, in standard deviations, by which your trait score is expected to change if you have that particular allele. These effect sizes are summed according to your SNP profile and then adjusted according to population distributions to produce your final prediction.
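To make the summing-and-adjusting step concrete, here is a minimal Python sketch. It is not DNA.Land's actual code. It assumes an additive model in which each copy of the effect allele (0, 1 or 2 copies) adds that SNP's effect size to a raw score, and it assumes the population adjustment is a z-score computed from the mean and standard deviation implied by the effect-allele frequencies under Hardy-Weinberg equilibrium, treating SNPs as independent. The class `Snp`, the functions `raw_score`, `standardized_score` and `percentile`, and the example SNPs and numbers are all hypothetical.

```python
import math
from dataclasses import dataclass


@dataclass
class Snp:
    name: str            # hypothetical SNP label
    effect_size: float   # assumed change in trait (SD units) per copy of the effect allele
    effect_freq: float   # frequency of the effect allele in the reference population


def raw_score(snps, dosages):
    """Sum each effect size times the number of effect-allele copies (0, 1 or 2) carried."""
    return sum(s.effect_size * dosages[s.name] for s in snps)


def standardized_score(snps, dosages):
    """Express the raw score in population SD units.

    Assumes Hardy-Weinberg equilibrium and independent SNPs, so each SNP's dosage
    has mean 2p and variance 2p(1 - p), where p is the effect-allele frequency.
    """
    mean = sum(2 * s.effect_freq * s.effect_size for s in snps)
    var = sum(2 * s.effect_freq * (1 - s.effect_freq) * s.effect_size ** 2 for s in snps)
    return (raw_score(snps, dosages) - mean) / math.sqrt(var)


def percentile(z):
    """Share of the population expected to score below z, assuming a normal distribution."""
    return 0.5 * (1 + math.erf(z / math.sqrt(2)))


snps = [Snp("snp_a", 0.10, 0.5), Snp("snp_b", -0.05, 0.2)]
dosages = {"snp_a": 2, "snp_b": 0}  # two copies of snp_a's effect allele, none of snp_b's
z = standardized_score(snps, dosages)
print(f"raw={raw_score(snps, dosages):.3f}  z={z:.2f}  percentile={percentile(z):.1%}")
```

With these illustrative numbers the raw score lies above the population mean, so the z-score is positive and the percentile is above 50%.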

BLUE bars represent the maximum potential effect.

If the BLUE bar extends to the right of the 0.0 line, the SNP has the potential to increase the predicted score.
If the BLUE bar extends to the left of the 0.0 line, the SNP has the potential to decrease the predicted score.
If the BLUE bar extends to both sides, the SNP has the potential to both increase and decrease the predicted score.
Information in BLUE bars comes from published research.

RED bars represent the contribution of your genotypes to the predicted score.

If the RED bar extends to the right of the 0.0 line, your genotype at that SNP raises your score relative to the average.
If the RED bar extends to the left of the 0.0 line, your genotype at that SNP lowers your score relative to the average.
Information in RED bars comes from your genotype file.
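One way to read the RED bars is as each SNP's contribution relative to an average genotype. The sketch below is our illustration, not a description of DNA.Land's code: it assumes a SNP's contribution is its effect size multiplied by the difference between the number of effect-allele copies you carry and the population average of 2p copies (p being the effect-allele frequency). A positive value would be drawn to the right of the 0.0 line and a negative one to the left. The function names and example values are hypothetical.

```python
def contribution(effect_size, dosage, effect_freq):
    """Contribution relative to the population average dosage 2p (assumed additive model)."""
    return effect_size * (dosage - 2 * effect_freq)


def bar_side(value):
    """Which side of the 0.0 line a RED bar with this value would extend to."""
    if value > 0:
        return "right: raises your score relative to the average"
    if value < 0:
        return "left: lowers your score relative to the average"
    return "none: no change relative to the average"


# Hypothetical SNPs: (effect size, copies of effect allele you carry, effect-allele frequency)
examples = {"snp_a": (0.10, 2, 0.5), "snp_b": (-0.05, 0, 0.2)}
for name, (beta, dosage, p) in examples.items():
    c = contribution(beta, dosage, p)
    print(f"{name}: {c:+.3f} -> {bar_side(c)}")
```

Under this assumption, `snp_b` shows that not carrying a score-lowering allele can still place your RED bar to the right of the 0.0 line.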

If you click on one of the bars, the SNP Locations chart to the right will indicate where that SNP is located.

Genotype Details:

This section contains a table with detailed information about the most significant genetic variants in your genotype file.

Our predictions take into account varying allele frequencies between populations in order to cancel out biases introduced by population stratification. The allele frequencies are taken from the 1000 Genomes Project and grouped into five broad ancestry categories; the frequencies from one of these groups are applied to your predictions based on your inferred ancestry. Some columns of the table are empty when the corresponding values were not reported in the research paper on which our scores are based. These columns come from the published research and are not based directly on your score.
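The sketch below illustrates, under stated assumptions, why the choice of reference population matters. It assumes the five ancestry categories correspond to the five 1000 Genomes super-populations (AFR, AMR, EAS, EUR, SAS), but the allele frequencies are invented, and centering each genotype on 2p is our assumption about how the adjustment works. The names `FREQS_BY_GROUP`, `reference_freq` and `centered_contribution` are hypothetical.

```python
# Hypothetical effect-allele frequencies for a single SNP. The group codes are the
# 1000 Genomes super-populations; the frequency values are invented for illustration.
FREQS_BY_GROUP = {"AFR": 0.15, "AMR": 0.40, "EAS": 0.60, "EUR": 0.45, "SAS": 0.35}


def reference_freq(freqs_by_group, inferred_ancestry):
    """Return the allele frequency for the user's inferred ancestry group."""
    try:
        return freqs_by_group[inferred_ancestry]
    except KeyError:
        raise ValueError(f"no frequency for ancestry group {inferred_ancestry!r}") from None


def centered_contribution(effect_size, dosage, freq):
    """Contribution relative to the average dosage (2 * freq) in the chosen group."""
    return effect_size * (dosage - 2 * freq)


# The same genotype (one copy of the effect allele, effect size 0.1) compared with two groups.
for group in ("AFR", "EAS"):
    p = reference_freq(FREQS_BY_GROUP, group)
    print(group, round(centered_contribution(0.1, 1, p), 3))
```

In this made-up example the same genotype sits above the AFR average but below the EAS average, which is the kind of difference that ancestry-matched allele frequencies are meant to account for.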

Odds Ratio: This is a representation of the effect size of a SNP that is commonly used in case-control genetic studies. For example, a geneticist may conduct a study on two groups: those who have cancer and those who do not. If a particular allele has a very high odds ratio, then the odds of having the condition are highly elevated for an individual with one or two copies of the allele, compared to an individual with no copies of the allele.
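As a worked example with made-up counts, an odds ratio can be computed from a 2×2 table of allele carriers and non-carriers among cases and controls. An odds ratio of 1 means no association, and the natural log of the odds ratio is often used as the effect size in case-control studies. The function name `odds_ratio` and the counts are hypothetical.

```python
import math


def odds_ratio(cases_with, controls_with, cases_without, controls_without):
    """Odds ratio for carriers (one or two copies of the allele) versus non-carriers.

    Equivalent to (cases_with / controls_with) / (cases_without / controls_without),
    written as a cross-product to avoid intermediate divisions.
    """
    return (cases_with * controls_without) / (controls_with * cases_without)


# Hypothetical study: 500 people with the condition (cases), 500 without (controls).
or_value = odds_ratio(cases_with=300, controls_with=200, cases_without=200, controls_without=300)
log_or = math.log(or_value)  # log odds ratio, often used as the effect size
print(f"odds ratio = {or_value:.2f}, log odds ratio = {log_or:.3f}")
```

Here carriers have 2.25 times the odds of having the condition compared with non-carriers.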

Standard Error: These values pertain to the effect sizes of the SNPs and are extracted from the research papers that report these SNPs. They indicate how precisely the researchers were able to estimate each effect size, that is, how much the estimate would be expected to differ if it were measured in a different sample. The lower the standard error, the more confident we can be in the effect size.
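A common way to make a standard error easier to read is an approximate 95% confidence interval: the effect size plus or minus 1.96 standard errors. This assumes the estimate is approximately normally distributed. The function name and numbers below are hypothetical.

```python
def confidence_interval_95(effect_size, standard_error):
    """Approximate 95% confidence interval, assuming a normally distributed estimate."""
    margin = 1.96 * standard_error
    return effect_size - margin, effect_size + margin


# Same hypothetical effect size, estimated with high and low precision.
for se in (0.01, 0.05):
    low, high = confidence_interval_95(0.10, se)
    print(f"SE={se}: 95% CI = ({low:.3f}, {high:.3f})")
```

With the larger standard error the interval is five times wider, even though the reported effect size is identical.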

P-value: This is the result of a hypothesis test on the effect size of a SNP. A low P-value indicates stronger statistical evidence that the SNP is involved in determining the trait in question. A common significance threshold in genome-wide association studies is a P-value of 5 × 10⁻⁸.
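For intuition, a P-value for an effect size can be approximated with a Wald test: divide the effect size by its standard error to get a z-statistic, compute the two-sided tail probability of a standard normal distribution, and compare it with the commonly used genome-wide threshold of 5 × 10⁻⁸. The studies behind the report may use other tests, so this is only an illustrative sketch; the function and constant names and the numbers are hypothetical.

```python
import math

GENOME_WIDE_THRESHOLD = 5e-8


def wald_p_value(effect_size, standard_error):
    """Two-sided P-value for the null hypothesis effect size = 0 (normal approximation)."""
    z = effect_size / standard_error
    return math.erfc(abs(z) / math.sqrt(2))


# Same hypothetical effect size, estimated with two different standard errors.
for se in (0.01, 0.02):
    p = wald_p_value(0.10, se)
    verdict = "genome-wide significant" if p < GENOME_WIDE_THRESHOLD else "not genome-wide significant"
    print(f"SE={se}: P = {p:.1e} ({verdict})")
```

The same effect size clears the threshold when it is estimated precisely (z = 10) but not when its standard error doubles (z = 5).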

As this is a new feature, we are sure our users will have questions we have not anticipated, and we would very much appreciate your feedback! Please send questions by e-mail to info@dna.land. Keep your eyes peeled for additional traits in the coming weeks!