New Paper Highlights Results from Recent PrecisionFDA Truth Challenge

DNAnexus
DNAnexus Science Frontiers
Jun 21, 2022

Scientists from DNAnexus, the Genome in a Bottle (GIAB) consortium, and other institutions have published a paper in Cell Genomics that shares results from the most recent precisionFDA challenge.

The challenge, which ran from May to June 2020, is a follow-up to the first Truth Challenge organized in 2016. Both challenges aimed to assess how well variant calling pipelines perform in challenging genomic regions. Participants were tasked with testing their pipelines against data from short- and long-read sequencing technologies that covered specific regions of the genome. Challenge organizers evaluated the submissions using best practices for benchmarking small variants from the GIAB consortium. The benchmarks assessed algorithms’ performance on difficult regions of the genome such as the major histocompatibility complex (MHC) and segmental duplications.
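For readers new to variant benchmarking, here is a minimal sketch of how a call set is commonly summarized once it has been compared against a truth set. This post does not spell out the exact metrics, so the precision, recall, and F1 calculations below are an assumption based on common small-variant benchmarking practice, and the class name, function names, and counts are purely illustrative.

```python
from dataclasses import dataclass


@dataclass
class VariantCounts:
    """Hypothetical container for the outcome of comparing calls to a truth set."""
    true_positives: int   # calls that match a benchmark variant
    false_positives: int  # calls with no matching benchmark variant
    false_negatives: int  # benchmark variants the pipeline missed


def precision(counts: VariantCounts) -> float:
    """Fraction of the pipeline's calls that are correct."""
    called = counts.true_positives + counts.false_positives
    return counts.true_positives / called if called else 0.0


def recall(counts: VariantCounts) -> float:
    """Fraction of benchmark variants that the pipeline found."""
    expected = counts.true_positives + counts.false_negatives
    return counts.true_positives / expected if expected else 0.0


def f1_score(counts: VariantCounts) -> float:
    """Harmonic mean of precision and recall."""
    p, r = precision(counts), recall(counts)
    return 2 * p * r / (p + r) if (p + r) else 0.0


# Made-up counts for illustration only; these are not challenge results.
example = VariantCounts(true_positives=90, false_positives=10, false_negatives=30)
print(f"precision={precision(example):.3f}")
print(f"recall={recall(example):.3f}")
print(f"F1={f1_score(example):.3f}")
```

Computing these numbers separately for each genomic stratification, such as the MHC or segmental duplications, is what makes it possible to compare pipelines region by region.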

Overview of the Results

A total of 20 teams participated in the challenge in 2020. They received FASTQ files from three human samples generated using three sequencing technologies. In total, they submitted 64 variant call files for the challenge datasets. Competitors used various methods in their pipelines to call variants, including graph-based algorithms and machine learning methods such as deep learning.

The new Cell Genomics paper includes details of pipelines from 15 of the 20 teams. The results showed that pipelines that used deep-learning approaches did well when they were applied to long-read datasets, while those that used graph-based methods performed best on short-read datasets. Below are more highlights of the pipelines’ performance reported in the paper:

  • For all the regions included in the benchmarks, the best-performing pipelines used data from all three sequencing technologies to make the variant calls. The next-best submissions used reads from PacBio, followed by Illumina and then Oxford Nanopore sequencing technologies.
  • For difficult-to-map regions of the genome, submissions that used reads from Oxford Nanopore technology did better than those that used reads from Illumina platforms.

Additional details on how each pipeline performed when applied to different genomic regions are provided in the supplementary documents for the paper.

Characterizing Clinically Relevant Parts of the Genome

The results from the challenge highlighted methods that scientists can use to characterize clinically relevant and highly polymorphic parts of the genome. Specifically, a graph-based method that used a pan-genome reference was able to improve variant calling performance and better capture the variability in the MHC region. Some of the submitted long-read-based methods also performed well in this region. One pipeline designed for calling variants in nanopore datasets worked particularly well on single nucleotide variants in the MHC and in other genomic regions assessed by the benchmarks.

Improved Benchmarks and Algorithms

The first version of the Truth Challenge was held in 2016. The results from this recent challenge demonstrate that both the pipelines and the benchmarks have advanced significantly since then. For example, pipelines in this challenge had lower error rates when used to call single nucleotide variants. In fact, algorithms in this round showed as much as a 10-fold improvement compared to those used in 2016. The benchmarks have also improved since the first round. Specifically, the V4.2 benchmark used for this challenge covers 7% more of the genome than the one used in 2016. Sequencing technologies have also come a long way, resulting in better coverage of the genome overall, even in regions with a lot of variation.

The full pFDA challenge results are published here, while the challenge data is available on the precisionFDA platform and here.
