Understanding the Limitations and Potential of GWAS in Unraveling Disease Pathology

Freedom Preetham
Published in Meta Multiomics
5 min read · Jun 22, 2024

Genome-Wide Association Studies (GWAS) have become indispensable tools in genomics, offering significant insights into the genetic underpinnings of complex diseases. By analyzing the genomes of large populations, GWAS can identify single nucleotide polymorphisms (SNPs) associated with diseases. However, while GWAS can pinpoint genetic variants linked to disease risk, they do not elucidate how these variants affect gene function or contribute to disease pathology. This gap between association and mechanism presents a critical challenge in genomics.

Image: AI-generated illustration

GWAS: Pinpointing Disease-Related Genetic Variants

GWAS identify SNPs statistically. Each variant is tested for association by comparing its allele frequency between individuals with and without a particular disease, and variants that pass a stringent genome-wide significance threshold (conventionally p < 5 × 10^-8) are reported as associated. Many of these SNPs lie in noncoding regions of the genome, where they do not alter protein sequences directly but may affect regulatory elements that control gene expression.
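To make the case-versus-control comparison concrete, here is a minimal sketch of a single-SNP allelic test: a Pearson chi-square test on a 2x2 table of allele counts. The counts are made up, the function name is hypothetical, and real GWAS pipelines typically use regression models with covariates rather than this bare test.

```python
import math

def allelic_chi_square(case_alt, case_ref, ctrl_alt, ctrl_ref):
    """Pearson chi-square test (1 degree of freedom) on a 2x2 allele-count table.

    Returns (chi2, p_value, odds_ratio).
    """
    n = case_alt + case_ref + ctrl_alt + ctrl_ref
    margins = (
        (case_alt + case_ref)
        * (ctrl_alt + ctrl_ref)
        * (case_alt + ctrl_alt)
        * (case_ref + ctrl_ref)
    )
    if margins == 0:
        raise ValueError("every row and column of the table needs a nonzero total")
    # Shortcut formula for a 2x2 table: n * (ad - bc)^2 / (product of the margins)
    chi2 = n * (case_alt * ctrl_ref - case_ref * ctrl_alt) ** 2 / margins
    # Survival function of a chi-square distribution with 1 degree of freedom
    p_value = math.erfc(math.sqrt(chi2 / 2.0))
    denominator = case_ref * ctrl_alt
    odds_ratio = (case_alt * ctrl_ref) / denominator if denominator else math.inf
    return chi2, p_value, odds_ratio

# Made-up counts: 2,000 cases and 2,000 controls, so 4,000 alleles per group
GENOME_WIDE_THRESHOLD = 5e-8
chi2, p, odds = allelic_chi_square(1400, 2600, 1200, 2800)
print(f"chi2={chi2:.2f} p={p:.2e} OR={odds:.2f} significant={p < GENOME_WIDE_THRESHOLD}")
```

With these illustrative counts the alternate allele is clearly enriched in cases, yet the p-value still falls short of the genome-wide threshold, which is one reason GWAS need such large cohorts.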

The Role of Noncoding Variants

A large share of the disease-associated variants identified by GWAS lie in noncoding regions. These regions include enhancers, promoters, and other regulatory elements. Unlike coding variants, which directly alter amino acid sequences and protein function, noncoding variants may influence transcription factor binding or chromatin state, thereby modulating gene expression.

For example, an enhancer region might contain a SNP that affects the binding affinity of a transcription factor (TF). The binding of TFs to DNA can alter the transcriptional activity of a gene, potentially impacting the expression levels of genes involved in disease pathways. Enhancers are particularly interesting because they can act over long distances, looping through the 3D structure of the genome to contact promoters and other regulatory regions.
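One common way to reason about a SNP's effect on TF binding is to score the reference and alternate sequences against a motif matrix and compare the results. The sketch below does this with a log-odds score. The 4-bp motif, the sequences, and the uniform background are all invented for illustration and do not describe any real transcription factor.

```python
import math

# Hypothetical position probability matrix for a 4-bp motif (one dict per position).
# The values are invented for illustration only.
MOTIF = [
    {"A": 0.80, "C": 0.05, "G": 0.10, "T": 0.05},
    {"A": 0.05, "C": 0.05, "G": 0.85, "T": 0.05},
    {"A": 0.10, "C": 0.70, "G": 0.10, "T": 0.10},
    {"A": 0.05, "C": 0.05, "G": 0.05, "T": 0.85},
]
BACKGROUND = 0.25  # assumed uniform background base frequency

def log_odds_score(site):
    """Sum of log2(motif probability / background) across the site."""
    if len(site) != len(MOTIF):
        raise ValueError("site length must match motif length")
    return sum(math.log2(col[base] / BACKGROUND) for col, base in zip(MOTIF, site))

def best_window_score(seq):
    """Highest motif score over all windows of the sequence (forward strand only)."""
    width = len(MOTIF)
    if len(seq) < width:
        raise ValueError("sequence is shorter than the motif")
    return max(log_odds_score(seq[i:i + width]) for i in range(len(seq) - width + 1))

ref_seq = "TTAGCTAA"  # reference allele G at index 3
alt_seq = "TTATCTAA"  # alternate allele T at index 3
delta = best_window_score(alt_seq) - best_window_score(ref_seq)
print(f"ref={best_window_score(ref_seq):.2f} alt={best_window_score(alt_seq):.2f} delta={delta:.2f}")
```

A negative delta suggests the alternate allele weakens the predicted binding site. A motif score is only a proxy for affinity, so a result like this is a hypothesis to test rather than evidence of an effect.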

Functional Genomics: Bridging the Gap

To move beyond statistical associations, researchers employ functional genomics approaches such as expression quantitative trait locus (eQTL) mapping, which correlates genetic variants with gene expression levels. eQTL studies can reveal how SNPs affect the transcription of genes, providing insights into the regulatory mechanisms underlying disease.
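At its simplest, an eQTL test regresses a gene's expression on genotype dosage (0, 1, or 2 copies of the alternate allele). The sketch below fits that line with ordinary least squares on toy data. The function name and the numbers are hypothetical, and a real analysis would add covariates and correct for multiple testing.

```python
import math

def eqtl_regression(dosages, expression):
    """Ordinary least squares of expression on genotype dosage (0, 1, 2).

    Returns (slope, intercept, pearson_r).
    """
    n = len(dosages)
    if n != len(expression) or n < 3:
        raise ValueError("need matching vectors with at least 3 samples")
    mean_g = sum(dosages) / n
    mean_e = sum(expression) / n
    sxx = sum((g - mean_g) ** 2 for g in dosages)
    syy = sum((e - mean_e) ** 2 for e in expression)
    sxy = sum((g - mean_g) * (e - mean_e) for g, e in zip(dosages, expression))
    if sxx == 0 or syy == 0:
        raise ValueError("genotype and expression must both vary")
    slope = sxy / sxx                  # expression change per alternate allele
    intercept = mean_e - slope * mean_g
    r = sxy / math.sqrt(sxx * syy)     # Pearson correlation
    return slope, intercept, r

# Toy data: nine individuals, three per genotype class
dosages = [0, 0, 0, 1, 1, 1, 2, 2, 2]
expression = [5.1, 4.9, 5.0, 5.6, 5.4, 5.5, 6.1, 5.9, 6.0]
slope, intercept, r = eqtl_regression(dosages, expression)
print(f"slope={slope:.2f} intercept={intercept:.2f} r={r:.3f}")
```

The slope is the per-allele effect on expression, which is the quantity eQTL catalogs usually report as the effect size.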

Chromatin immunoprecipitation sequencing (ChIP-seq) and Assay for Transposase-Accessible Chromatin using sequencing (ATAC-seq) are other key techniques. ChIP-seq can identify binding sites of DNA-associated proteins, such as transcription factors, across the genome, providing a map of regulatory elements. ATAC-seq reveals regions of open chromatin, which are accessible for transcription factor binding and thus likely to be regulatory.
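A typical first use of ATAC-seq or ChIP-seq peaks is to ask whether a GWAS SNP falls inside one. Below is a minimal sketch of that lookup on a single chromosome. It assumes sorted, non-overlapping, half-open peak intervals, and the coordinates and function names are hypothetical.

```python
import bisect

def build_index(peaks):
    """Sort (start, end) peaks and return them with a list of starts for bisect.

    Assumes the peaks are half-open intervals that do not overlap each other.
    """
    ordered = sorted(peaks)
    return ordered, [start for start, _ in ordered]

def peak_containing(position, ordered, starts):
    """Return the peak that contains the position, or None if there is none."""
    i = bisect.bisect_right(starts, position) - 1  # last peak starting at or before position
    if i >= 0 and ordered[i][0] <= position < ordered[i][1]:
        return ordered[i]
    return None

# Hypothetical open-chromatin peaks and SNP positions on one chromosome
ordered, starts = build_index([(500, 650), (100, 200), (900, 1000)])
for snp in (150, 200, 700, 999):
    print(snp, peak_containing(snp, ordered, starts))
```

SNPs that land inside a peak are more plausible regulatory candidates, although an overlap alone does not show that the variant changes accessibility or binding.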

Challenges of eQTL

eQTL mapping is a valuable technique for linking genetic variants to gene expression levels, shedding light on the regulatory mechanisms behind complex traits and diseases. However, this method faces several significant challenges.

One major issue is the tissue and cell-type specificity of gene expression. eQTL effects can vary greatly between tissues and even among different cell types within a tissue. Most eQTL studies are conducted on easily accessible tissues like blood, which may not reflect the regulatory dynamics in disease-relevant tissues. Additionally, bulk RNA-seq averages expression across cell types, potentially masking cell-type-specific effects.
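A small worked example shows how bulk averaging can hide a real effect. It makes the simplifying assumption that bulk expression is a linear mixture of cell types, so the bulk effect size is the proportion-weighted average of the per-cell-type effects. The cell types, proportions, and effect sizes are hypothetical.

```python
import math

def bulk_effect(effects, proportions):
    """Proportion-weighted average of per-cell-type eQTL effect sizes.

    Simplifying assumption: bulk expression is a linear mixture of cell types.
    """
    if effects.keys() != proportions.keys():
        raise ValueError("effects and proportions must cover the same cell types")
    if not math.isclose(sum(proportions.values()), 1.0, abs_tol=1e-9):
        raise ValueError("proportions must sum to 1")
    return sum(effects[c] * proportions[c] for c in effects)

# A strong effect confined to a rare cell type is diluted in bulk tissue
rare = bulk_effect(
    {"microglia": 1.0, "neurons": 0.0, "astrocytes": 0.0},
    {"microglia": 0.05, "neurons": 0.60, "astrocytes": 0.35},
)
# Opposite effects in two equally abundant cell types cancel out
opposing = bulk_effect(
    {"type_1": 0.5, "type_2": -0.5},
    {"type_1": 0.5, "type_2": 0.5},
)
print(f"rare cell type: {rare:.2f}, opposing effects: {opposing:.2f}")
```

Under this assumption, a full-sized effect in a cell type that makes up 5% of the tissue appears in bulk at one twentieth of its size, and opposing effects can vanish entirely.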

Other challenges include the need for large sample sizes to achieve sufficient statistical power and the confounding effects of population structure, environmental influences, and batch effects, which can introduce biases and obscure true eQTL signals.

Gene expression’s dynamic nature and the complex regulatory interactions involving multiple elements further complicate eQTL analysis. The difficulty in pinpointing causal variants due to linkage disequilibrium and the lack of integration with other omics data layers also limit the method’s resolution and comprehensiveness. Addressing these challenges requires advanced experimental designs, larger and more diverse sample collections, and sophisticated computational methods to improve the accuracy and interpretability of eQTL studies.
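Linkage disequilibrium is usually summarized with r^2, the squared correlation between alleles at two sites. The sketch below computes it from phased haplotypes using the standard definition D = p_AB - p_A * p_B and r^2 = D^2 / (p_A (1 - p_A) p_B (1 - p_B)). The haplotype data and the function name are made up for illustration.

```python
def ld_r_squared(haplotypes):
    """r^2 between two biallelic SNPs from phased haplotypes.

    Each haplotype is a pair (a, b) with 0 = reference and 1 = alternate allele.
    """
    n = len(haplotypes)
    if n == 0:
        raise ValueError("need at least one haplotype")
    p_a = sum(a for a, _ in haplotypes) / n
    p_b = sum(b for _, b in haplotypes) / n
    p_ab = sum(1 for a, b in haplotypes if a == 1 and b == 1) / n
    d = p_ab - p_a * p_b  # deviation from independence
    denom = p_a * (1 - p_a) * p_b * (1 - p_b)
    if denom == 0:
        raise ValueError("both SNPs must be polymorphic")
    return d * d / denom

# Made-up haplotypes: the two alternate alleles usually travel together
partial = [(0, 0)] * 4 + [(0, 1)] + [(1, 0)] + [(1, 1)] * 4
print(f"r^2 = {ld_r_squared(partial):.2f}")
```

When r^2 between two SNPs is close to 1, their association signals are nearly interchangeable, so the statistics alone cannot say which of the two is causal.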

Integrating GWAS with Functional Data

Integrating GWAS findings with functional genomics data and computational models provides a comprehensive approach to understanding the functional impact of genetic variants. This integration involves linking SNPs to changes in gene expression, chromatin accessibility, and other regulatory mechanisms.

Consider a SNP in an enhancer region that influences the expression of a nearby gene. A deep learning model might predict that this SNP alters the binding affinity of a transcription factor, leading to changes in gene expression. This prediction can then be checked against eQTL data: if the SNP is also correlated with the gene's expression level in the relevant tissues, the two lines of evidence support each other.

Examples and Biological Insights

One illustrative example involves the FTO locus, which has been associated with obesity. GWAS identified SNPs in this region that were linked to increased body mass index (BMI). However, these SNPs are located in a noncoding region, making it unclear how they contribute to obesity. Further functional studies revealed that these SNPs affect the expression of the IRX3 and IRX5 genes by altering enhancer activity, thereby influencing adipocyte differentiation and metabolism.

The mechanism has been traced to a single variant within this locus, the SNP rs1421085. This SNP disrupts a conserved motif for the ARID5B repressor, reducing ARID5B binding. The result is derepression of IRX3 and IRX5, which promotes a shift from energy-dissipating beige adipocytes to energy-storing white adipocytes and thereby increases the risk of obesity.

Chromatin Structure and Regulatory Networks

The 3D structure of the genome plays a critical role in gene regulation. Chromatin interaction maps, such as those generated by Hi-C and related technologies, reveal how different regions of the genome physically interact. These interactions can bring enhancers into close proximity with their target promoters, facilitating gene regulation over long genomic distances.

Understanding these interactions is crucial for interpreting GWAS results. A SNP located far from any gene in the linear genome sequence might still affect gene expression by altering a long-range chromatin interaction. Integrating chromatin interaction data with GWAS findings helps identify these regulatory connections.
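One simple way to use chromatin interaction data is to ask which promoters sit at the other end of a loop anchored at the SNP. The sketch below does that for one chromosome with half-open intervals. The loop anchors, the gene names GENE_A and GENE_B, and all coordinates are hypothetical, and real Hi-C analyses work with binned contact matrices and statistical loop calls.

```python
def contains(interval, position):
    """True if a half-open (start, end) interval contains the position."""
    start, end = interval
    return start <= position < end

def overlaps(x, y):
    """True if two half-open intervals share at least one base."""
    return x[0] < y[1] and y[0] < x[1]

def genes_contacted(snp_pos, loops, promoters):
    """Genes whose promoter overlaps the far anchor of a loop anchored at the SNP."""
    hits = set()
    for anchor_a, anchor_b in loops:
        # A loop is symmetric, so check the SNP against both anchors
        for near, far in ((anchor_a, anchor_b), (anchor_b, anchor_a)):
            if contains(near, snp_pos):
                hits.update(g for g, prom in promoters.items() if overlaps(far, prom))
    return sorted(hits)

# Hypothetical loops (pairs of anchors) and promoter intervals
loops = [
    ((100_000, 105_000), (600_000, 605_000)),
    ((300_000, 305_000), (900_000, 905_000)),
]
promoters = {"GENE_A": (110_000, 112_000), "GENE_B": (601_000, 603_000)}
print(genes_contacted(102_500, loops, promoters))
```

In this toy setup GENE_A is the nearest gene to the SNP on the linear genome, but the loop connects the SNP to GENE_B roughly 500 kb away, which is the situation the nearest-gene heuristic gets wrong.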

The Cognit Approach

At Cognit, we are at the forefront of integrating advanced computational models with functional genomics to bridge the gap between GWAS findings and mechanistic insights. We are building foundational AI models from the ground up to predict the functional impact of noncoding variants on gene expression, and we employ single-cell genomic technologies to study gene regulation and variant effects at the cellular level.

By integrating chromatin interaction data, we map the 3D architecture of the genome, identifying long-range regulatory interactions that explain how distant SNPs influence gene expression. Through functional validation and translational genomics, we aim to develop novel therapeutic strategies and personalized medicine approaches for complex diseases. This integrated approach not only enhances our understanding of gene regulation and disease pathology but also paves the way for innovative treatments and improved health outcomes.
