How accurate are DNA genotyping results?

Comparison of direct-to-consumer DNA testing results

Most personal genomics companies use Illumina BeadChip arrays, which offer high accuracy. For example, AncestryDNA uses an Illumina OmniExpress Plus array with an advertised reproducibility rate of 99.99%.

We have three raw DNA files, from 23andMe, Ancestry.com, and the nonprofit Genes for Good. Let’s check them for inconsistent SNP values. We will use this Ruby script to merge the raw DNA files and collect some statistics.
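The script itself is not reproduced in this post. As a minimal sketch of the comparison step only, assuming each raw file has already been loaded into a Ruby Hash that maps rsID to genotype (with no-calls removed), the overlap and the inconsistent SNPs could be found as follows. The method names `normalize` and `compare_genotypes` are hypothetical, and sorting the two alleles, so that “CT” and “TC” count as the same unphased call, is an assumption about how allele order should be treated.

```ruby
# Hypothetical helper, not the article's actual script.
# Treat "CT" and "TC" as the same unphased genotype by sorting the alleles.
def normalize(genotype)
  genotype.chars.sort.join
end

# a, b: Hash of rsID => genotype, with no-calls already removed.
# Returns the number of shared rsIDs and the list of mismatching calls.
def compare_genotypes(a, b)
  shared = a.keys & b.keys
  mismatches = shared.reject { |rsid| normalize(a[rsid]) == normalize(b[rsid]) }
  mismatches = mismatches.map { |rsid| [rsid, a[rsid], b[rsid]] }
  { intersections: shared.size, inconsistencies: mismatches }
end

# Made-up sample data for illustration.
first  = { "rs1" => "CT", "rs2" => "GG", "rs3" => "DD" }
second = { "rs1" => "TC", "rs2" => "GA", "rs3" => "II", "rs4" => "AA" }
result = compare_genotypes(first, second)

puts result[:intersections]  # => 3
p result[:inconsistencies]   # => [["rs2", "GG", "GA"], ["rs3", "DD", "II"]]
```

Strand orientation (for example, AG on one chip reported as CT on the opposite strand by another) is not handled here; a fuller comparison might need to account for it.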

Benchmark data

  • 23andMe v4 raw file with 599,694 SNPs
  • AncestryDNA v2 raw file with 667,430 SNPs
  • Genes for Good raw “GFG6_filtered_unphased_genotypes_23andMe.txt” file with 563,259 SNPs

These counts exclude “no call” SNPs, which are marked with “--” or “00”.
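One way to load these files and drop the no-call entries is sketched below. It assumes the common tab-separated layouts: 23andMe-style files with a single genotype column (the Genes for Good file above appears to use this format, judging by its name), and AncestryDNA-style files with separate `allele1` and `allele2` columns. The method name `parse_raw_dna` and the sample lines are made up for illustration.

```ruby
# Minimal sketch (hypothetical helper, not the article's script).
# Assumed tab-separated layouts:
#   23andMe style:     rsid  chromosome  position  genotype          e.g. "AG"
#   AncestryDNA style: rsid  chromosome  position  allele1  allele2  e.g. "A", "G"
NO_CALLS = ["--", "00"].freeze

# Returns a Hash of rsID => genotype string, skipping comments,
# the column-header line, and "no call" entries.
def parse_raw_dna(lines)
  genotypes = {}
  lines.each do |raw_line|
    line = raw_line.strip
    next if line.empty? || line.start_with?("#") || line.start_with?("rsid")

    fields = line.split("\t")
    # Join the two allele columns when present (AncestryDNA style).
    genotype = fields.size >= 5 ? fields[3] + fields[4] : fields[3]
    next if genotype.nil? || NO_CALLS.include?(genotype)

    genotypes[fields[0]] = genotype
  end
  genotypes
end

# Made-up sample lines: one called SNP and one no-call per format.
sample_23andme  = ["# comment", "rs1\t1\t100\tTT", "rs2\t1\t200\t--"]
sample_ancestry = ["rsid\tchromosome\tposition\tallele1\tallele2",
                   "rs3\t1\t300\tA\tG", "rs4\t1\t400\t0\t0"]

puts parse_raw_dna(sample_23andme).keys.inspect  # => ["rs1"]
puts parse_raw_dna(sample_ancestry)["rs3"]       # => AG
```

For a real file, `File.foreach(path)` (which returns an enumerator of lines when called without a block) can be passed in place of the sample arrays.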

Results

23andMe vs AncestryDNA: 305,751 overlapping SNPs and 103 inconsistencies, most of them in deletion and insertion calls (DD vs II).
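The post does not say exactly how the mismatches were classified. Assuming indels are coded with the “D” and “I” allele letters, as in the DD vs II example, a hypothetical `tally_mismatches` helper could split a list of mismatches into indel and regular-SNP cases:

```ruby
# Hypothetical helpers. A mismatch is [rsid, genotype_a, genotype_b].
INDEL_CODES = /[DI]/

# :indel if either call uses the deletion/insertion codes, otherwise :snp.
def mismatch_kind(genotype_a, genotype_b)
  (genotype_a.match?(INDEL_CODES) || genotype_b.match?(INDEL_CODES)) ? :indel : :snp
end

# Counts mismatches per kind, e.g. { indel: 1, snp: 2 }.
def tally_mismatches(mismatches)
  mismatches.map { |_rsid, a, b| mismatch_kind(a, b) }.tally
end

# Made-up sample mismatches for illustration.
counts = tally_mismatches([
  ["rs3", "DD", "II"],
  ["rs5", "CT", "CC"],
  ["rs6", "GA", "GG"]
])
puts counts[:indel]  # => 1
puts counts[:snp]    # => 2
```

`Enumerable#tally` requires Ruby 2.7 or later.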

23andMe vs Genes for Good: 135,623 overlapping SNPs and 14 inconsistencies, most of them in regular SNPs (CT vs CC, GA vs GG, and so on).

Genes for Good vs AncestryDNA: 191,656 overlapping SNPs and 16 inconsistencies, all of them in regular SNPs (TT vs CC, GG vs TG, etc.).
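Expressed as percentages, these counts mean very high agreement on the overlapping SNPs. The short calculation below divides the matching SNPs by the overlapping SNPs for each pair, using the figures reported above; `concordance_percent` is a hypothetical name.

```ruby
# Pairwise figures from the Results: [label, overlapping SNPs, inconsistent SNPs]
COMPARISONS = [
  ["23andMe vs AncestryDNA", 305_751, 103],
  ["23andMe vs Genes for Good", 135_623, 14],
  ["Genes for Good vs AncestryDNA", 191_656, 16]
].freeze

# Concordance = share of overlapping SNPs whose genotypes match, in percent.
def concordance_percent(shared, inconsistent)
  100.0 * (shared - inconsistent) / shared
end

COMPARISONS.each do |label, shared, inconsistent|
  puts format("%-30s %.3f%%", label, concordance_percent(shared, inconsistent))
end
```

This prints about 99.966%, 99.990% and 99.992% for the three pairs.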

Conclusion

Raw DNA data from different services may differ for several reasons: differences in data format, genotyping and processing errors, and even random mutations can play a role. However, these differences are relatively small and quite consistent with the declared accuracy values.
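To put “consistent with the declared accuracy” into numbers, a rough back-of-the-envelope check is sketched below. It is a simplification that assumes each array independently gets a given SNP wrong with probability 1 - 0.9999, treating the advertised 99.99% reproducibility as if it were per-SNP accuracy (strictly, reproducibility measures agreement between repeat runs, not accuracy). The name `expected_discordant` is hypothetical.

```ruby
# Assumption: the advertised 99.99% figure stands in for per-SNP accuracy,
# and errors on the two arrays are independent.
PER_SNP_AGREEMENT = 0.9999

# Probability that at least one of the two calls is wrong, times the number
# of overlapping SNPs. A slight overestimate of disagreements, since both
# arrays could in principle be wrong in the same way.
def expected_discordant(shared, agreement = PER_SNP_AGREEMENT)
  shared * (1 - agreement**2)
end

# [label, overlapping SNPs, observed inconsistencies] from the Results.
[
  ["23andMe vs AncestryDNA", 305_751, 103],
  ["23andMe vs Genes for Good", 135_623, 14],
  ["Genes for Good vs AncestryDNA", 191_656, 16]
].each do |label, shared, observed|
  puts format("%-30s expected ~%.0f, observed %d", label, expected_discordant(shared), observed)
end
```

Under these assumptions the expected counts are roughly 61, 27 and 38. The observed counts are of the same order of magnitude: higher for 23andMe vs AncestryDNA, where most mismatches were indels, and lower for the two pairs involving Genes for Good.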