Gencove for agricultural species and model organisms

Joe Pickrell
Published in The Gencove Blog
Jun 26, 2019

tl;dr: Gencove now supports analysis of low-pass sequencing data from cattle, chickens, pigs, mice, rats, maize, and soybeans.

We launched Gencove’s low-pass sequencing platform about a year ago with the goal of making genome sequencing technologies accessible and cost-effective for large-scale genomics. Indeed, we’ve been pleased to see low-pass sequencing adopted for human genetics applications ranging from eQTL mapping to pharmacogenetics and polygenic risk score profiling.

Outside of human genetics, we’ve seen increasing interest in sequencing from agricultural companies, who use genomics to guide breeding programs directed at traits as varied as meat quality in cattle, drought resistance in maize, and egg production in chickens. We are now pleased to release low-pass sequencing analysis software and sequencing services for a wide range of agricultural and model organisms, including cattle, chickens, pigs, mice, rats, maize, and soybeans. Compared to existing solutions like genotyping arrays, low-pass sequencing provides orders of magnitude more data (many millions of genetic variants called with >99% accuracy) and flexible extension to new species and/or breeds, while providing the throughput and cost-effectiveness needed for large-scale applications.

For each species, we’ve assembled or generated large haplotype reference panels that enable genotype imputation from low-pass sequencing data with high accuracy.

As an example of how this process works, in the remainder of this post we describe our cattle haplotype reference panel and benchmarking results produced in collaboration with Warren Snelling of the USDA; these results were recently presented at the annual Beef Improvement Federation meeting.

We first assembled a set of genome sequences from a wide range of cattle, with an emphasis on the major B. taurus breeds. All of the sequences, which came from diverse sources, were put through Gencove’s internal processing and joint variant calling pipeline; in total, we called around 48M bi-allelic variants in a set of 579 cattle. Principal component analysis (below) shows that the samples cluster according to breed differences, with the main split in these data being between Angus and Holstein cattle (since these breeds make up the largest share of our data). After phasing, the resulting 1,158 haplotypes (two per animal) comprise our imputation reference panel.

Principal component analysis of the Gencove cattle haplotype reference panel
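
To make the clustering step concrete, here is a minimal sketch of principal component analysis on a genotype matrix. It is not Gencove's pipeline: the function name `genotype_pca`, the 0/1/2 allele-count encoding, and the simulated two-group data are all assumptions made for illustration.

```python
import numpy as np


def genotype_pca(genotypes, n_components=2):
    """Project samples onto the top principal components.

    genotypes: (n_samples, n_variants) array of alternate-allele
    counts coded 0/1/2 (an assumed encoding, for illustration only).
    """
    g = np.asarray(genotypes, dtype=float)
    # Drop variants that do not vary across samples.
    g = g[:, g.std(axis=0) > 0]
    # Center each variant so the components describe variation around the mean.
    g = g - g.mean(axis=0)
    # SVD of the centered matrix; sample coordinates are U scaled by S.
    u, s, _ = np.linalg.svd(g, full_matrices=False)
    coords = u[:, :n_components] * s[:n_components]
    explained = (s ** 2) / np.sum(s ** 2)
    return coords, explained[:n_components]


# Toy data: two simulated groups with different allele frequencies,
# standing in for two breeds. These are not real cattle genotypes.
rng = np.random.default_rng(0)
n_per_group, n_variants = 20, 500
freq_a = rng.uniform(0.05, 0.95, n_variants)
freq_b = np.clip(freq_a + rng.normal(0, 0.3, n_variants), 0.02, 0.98)
group_a = rng.binomial(2, freq_a, size=(n_per_group, n_variants))
group_b = rng.binomial(2, freq_b, size=(n_per_group, n_variants))
genotypes = np.vstack([group_a, group_b])

coords, explained = genotype_pca(genotypes)
print(coords[:3])
print(explained)
```

In this toy setup the first component separates the two simulated groups, which is the same kind of structure the figure shows for breeds. The haplotype count follows from diploidy: 579 animals × 2 = 1,158 haplotypes.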

We then sought to evaluate the performance of our low-pass sequencing analysis and imputation pipeline when applied with this reference panel. In collaboration with Warren Snelling at the USDA, we used a set of bulls with known genotypes that had also been sequenced to over 4x coverage, and downsampled their sequence data to different levels of coverage. After processing the downsampled data through our analysis pipeline, we compared the resulting genotype calls to those from a high-density genotyping array. Restricting analysis to B. taurus-derived breeds, we see >99% concordance to arrays for most analyses (see below), even at the lowest levels of sequencing coverage.

Concordance (in percentage) between imputed low-pass sequencing at two sequencing coverages and high-density genotyping arrays in six B. taurus bulls from different breeds.
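
As a rough illustration of the metric, concordance can be computed as the percentage of sites where the imputed genotype matches the array genotype, counting only sites called by both. The sketch below assumes a 0/1/2 genotype encoding and a `MISSING` code of -1; both, along with the function name and the toy data, are hypothetical and not taken from the benchmark itself.

```python
import numpy as np

MISSING = -1  # hypothetical code for a missing genotype call


def genotype_concordance(imputed, array_calls):
    """Percent of sites where two sets of genotype calls agree.

    Both inputs are equal-length sequences of genotypes coded 0/1/2
    (an assumed encoding). Sites missing in either set are skipped.
    """
    imputed = np.asarray(imputed)
    array_calls = np.asarray(array_calls)
    if imputed.shape != array_calls.shape:
        raise ValueError("genotype vectors must have the same shape")
    # Only compare sites that were called in both data sets.
    compared = (imputed != MISSING) & (array_calls != MISSING)
    n_compared = int(compared.sum())
    if n_compared == 0:
        raise ValueError("no sites called in both data sets")
    n_match = int((imputed[compared] == array_calls[compared]).sum())
    return 100.0 * n_match / n_compared


# Toy example with made-up genotypes for ten sites in one animal.
imputed = [0, 1, 2, 2, 0, 1, -1, 0, 1, 2]
array_calls = [0, 1, 2, 1, 0, 1, 2, 0, -1, 2]
concordance = genotype_concordance(imputed, array_calls)
print(f"{concordance:.1f}% concordance")
```

Here eight sites are called in both sets and seven of them agree, so the toy concordance is 87.5%. Stricter variants of this metric exist (for example, restricting to non-reference genotypes), so the exact definition matters when comparing published numbers.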

Overall, these results set the stage for low-pass sequencing (with >40 million genetic variants called on each sample) to enable improved genomic prediction and other expanded use cases for genomics across the cattle industry.

We have built analogous haplotype reference panels and analysis pipelines for a wide range of species (listed below) and are continuously developing new resources. If there are particular species or features that we’re missing, please reach out!
