Backwards-compatible genotyping array deliverables from sequencing data
This blog post was written by Joe Pickrell and Tomaz Berisa
Gencove’s low-pass sequencing SaaS can now generate deliverables that mimic Illumina GenomeStudio zip archives, which are commonly used as deliverables for genotyping arrays. By analyzing the design of an existing array, we can match both its format and its strand reporting (A/B, top/bottom, and forward/reverse alleles), and translate variant positions to any genome build supported by the Gencove platform.
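To make the strand-reporting point concrete, here is a minimal sketch of the simplest case, converting alleles between forward and reverse strand by complementing each base. This is an illustration, not Gencove's implementation: the function name `to_strand` and the `"+"`/`"-"` strand labels are assumptions made for the example, and Illumina's A/B and top/bottom designations involve additional rules that are not covered here.

```python
# Hypothetical helper (not part of any Gencove or Illumina API).
COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A"}


def to_strand(alleles, source_strand, target_strand):
    """Report single-base alleles on the target strand.

    Strands are labeled "+" (forward) and "-" (reverse); these labels
    are an assumption of this sketch.
    """
    for strand in (source_strand, target_strand):
        if strand not in ("+", "-"):
            raise ValueError(f"unknown strand: {strand!r}")
    if source_strand == target_strand:
        return tuple(alleles)
    # Switching strands means reporting the complementary base.
    return tuple(COMPLEMENT[allele] for allele in alleles)


forward_alleles = ("A", "G")
reverse_alleles = to_strand(forward_alleles, "+", "-")
```

In this sketch, an A/G variant on the forward strand is reported as T/C on the reverse strand, which is why matching the array's strand convention matters for downstream pipelines.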
The main motivation for releasing this feature was user feedback that highlighted the importance of backwards compatibility with existing analysis pipelines. Users were excited about adopting new sequencing technologies, analytics, and data formats, but they also valued the ability to use sequencing data as a drop-in replacement for genotyping arrays within existing processes.
As of today, users can generate these deliverables for all samples in a Gencove project or for a specified subset of samples. The feature is currently available via the Gencove API (/project-batch*) and Gencove CLI v2.0.23.
Initially, we’ve publicly released deliverables for two cattle arrays:
- Illumina BovineSNP50
- Illumina BovineHD
As part of our process for implementing an array, we compare the content of the array to the genetic variation in our haplotype reference panel. An example of such an analysis is below for the Illumina BovineHD array: over 95% of the variants on the array are present in our haplotype panel with the exact same alleles, while the remaining variants were either filtered out (for example, if the site is multi-allelic in our haplotype reference panel) or not identified (for example, if the probe sequence doesn’t map to the reference genome).
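The comparison described above can be sketched in a few lines of Python. This is a simplified illustration under stated assumptions, not our production analysis: the function names, the tuple layout of an array variant, and the dictionary representation of the panel are all hypothetical, and the toy data is made up for the example.

```python
from collections import Counter


def classify(array_variant, panel):
    """Classify one array variant against a haplotype reference panel.

    array_variant: (position, alleles), where position is a (chrom, pos)
        tuple, or None if the probe sequence did not map to the reference.
    panel: dict mapping (chrom, pos) to a frozenset of alleles seen there.
    Both layouts are assumptions of this sketch.
    """
    position, alleles = array_variant
    if position is None or position not in panel:
        return "not_identified"
    panel_alleles = panel[position]
    if len(panel_alleles) > 2:
        # Multi-allelic in the panel.
        return "filtered"
    if frozenset(alleles) == panel_alleles:
        return "matched"
    # Assumption: sites whose alleles disagree are also counted as filtered.
    return "filtered"


def summarize(array_variants, panel):
    """Count how many array variants fall into each category."""
    return Counter(classify(variant, panel) for variant in array_variants)


# Toy data, invented for illustration only.
panel = {
    ("1", 100): frozenset({"A", "G"}),
    ("1", 200): frozenset({"A", "C", "T"}),
}
array_variants = [
    (("1", 100), ("A", "G")),  # same alleles as the panel
    (("1", 200), ("A", "C")),  # multi-allelic site in the panel
    (None, ("C", "T")),        # probe did not map
]
summary = summarize(array_variants, panel)
```

Dividing the `matched` count by the total number of array variants gives the kind of overlap percentage quoted above.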
Supporting new arrays is straightforward, so if there is a genotyping array you’d like us to add for a species, reach out and let us know!