Networks in Biology: an introduction to Gene Coexpression Networks
The development of high-throughput experiments has provided vast amounts of genomic and transcriptomic data. The availability of such large datasets has led to the use of network methods to construct gene expression networks.
Both gene expression and co-expression networks can be used to understand the relationships between genes and their functions and identify key players in biological processes. However, they differ in their construction and the information they provide.
Gene expression networks focus on how genes are expressed and how that expression is regulated. Gene coexpression networks, on the other hand, are constructed by identifying pairs of genes that are consistently coexpressed across several different samples or conditions.
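To make "consistently coexpressed across samples" concrete, here is a minimal Python sketch that scores every gene pair by Pearson correlation and keeps the pairs above a cutoff. The gene names, expression values, and the 0.9 cutoff are hypothetical and chosen only for illustration; real analyses use many more samples and a justified threshold.

```python
from itertools import combinations
from math import sqrt

def pearson(x, y):
    """Pearson correlation between two equal-length expression profiles."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / sqrt(var_x * var_y)

# Hypothetical toy data: expression of each gene across five samples.
expression = {
    "geneA": [1.0, 2.0, 3.0, 4.0, 5.0],
    "geneB": [2.1, 3.9, 6.2, 8.0, 9.8],  # rises with geneA
    "geneC": [5.0, 4.1, 2.9, 2.0, 1.1],  # falls as geneA rises
    "geneD": [3.0, 1.0, 4.0, 1.0, 5.0],  # no clear relationship
}

def coexpressed_pairs(data, cutoff=0.9):
    """Return (gene1, gene2, r) for pairs whose |r| meets the cutoff."""
    pairs = []
    for g1, g2 in combinations(sorted(data), 2):
        r = pearson(data[g1], data[g2])
        if abs(r) >= cutoff:
            pairs.append((g1, g2, round(r, 3)))
    return pairs

pairs = coexpressed_pairs(expression)
for g1, g2, r in pairs:
    print(g1, g2, r)
```

Using the absolute value of the correlation treats strongly negative pairs (geneA and geneC here) as coexpressed too; whether that is appropriate depends on the question being asked.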
Gene co-expression network analysis has been widely used to understand which genes are highly co-expressed in specific biological processes or differentially expressed under various conditions. It can also be used to identify potential functional relationships between genes, such as whether one gene might regulate the expression of another.
WGCNA (Weighted Gene Co-expression Network Analysis)
WGCNA is a method for constructing a gene co-expression network from high-throughput gene expression data. It is based on the concept that genes that are co-expressed across different experimental conditions are likely to have related functions or to be regulated by common pathways.
WGCNA has been widely used to identify clusters of co-expressed genes or “modules” and to analyze the relationships between gene expression and various traits of interest (such as disease status).
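One way to picture the link between a module and a trait is to summarize the module as a single profile across samples and correlate that profile with the trait. WGCNA summarizes a module by its module eigengene (the first principal component of the module's expression); the Python sketch below substitutes a simpler stand-in, the mean of the standardized expression values, so it is an approximation of the idea rather than the package's computation. All data values are hypothetical.

```python
from math import sqrt

def pearson(x, y):
    """Pearson correlation between two equal-length vectors."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / sqrt(var_x * var_y)

def standardize(profile):
    """Center a gene's profile to mean 0 and scale it to unit variance."""
    n = len(profile)
    mean = sum(profile) / n
    sd = sqrt(sum((v - mean) ** 2 for v in profile) / (n - 1))
    return [(v - mean) / sd for v in profile]

def module_summary(module_genes):
    """Average standardized expression per sample.

    Assumption: a simple stand-in for a module eigengene, not the
    principal-component summary that WGCNA computes.
    """
    z_scores = [standardize(profile) for profile in module_genes]
    return [sum(column) / len(column) for column in zip(*z_scores)]

# Hypothetical module of three genes measured in six samples.
module = [
    [2.0, 2.2, 2.1, 4.0, 4.3, 4.1],
    [1.0, 1.1, 0.9, 3.0, 3.2, 2.9],
    [5.0, 5.5, 5.2, 8.1, 8.0, 8.4],
]
# Hypothetical trait: 0 = control, 1 = disease.
trait = [0, 0, 0, 1, 1, 1]

summary = module_summary(module)
module_trait_r = pearson(summary, trait)
print(round(module_trait_r, 3))
```

A correlation near 1 or -1 suggests the module as a whole tracks the trait; with real data this would be accompanied by a significance test.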
Core Steps in WGCNA:
Step 1 (Data Preprocessing): cleaning and normalizing the gene expression data
Step 2 (Network Construction): calculating the pairwise correlations (Pearson or Spearman) between genes and using these correlations to construct a weighted network of coexpressed genes
Step 3 (Module Detection): identifying groups of strongly coexpressed genes (modules) within the network, using methods such as hierarchical clustering, k-means clustering, and dynamic tree cutting
Step 4 (Module Annotation): annotating the genes in each module with functional information (Gene Ontology terms or pathways)
Step 5 (Network Visualization): visualizing the network, with genes represented as nodes and coexpression weights represented as edges, by exporting the data to Cytoscape or Gephi
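The construction, module detection, and export steps above can be sketched in a few lines of plain Python. This is a simplified illustration, not the WGCNA package's implementation: it builds an unsigned soft-thresholded adjacency (|correlation| raised to a power beta), groups genes by average-linkage hierarchical clustering on 1 - adjacency, and produces an edge list that could be written to a delimited file for a network viewer. The data, beta = 6, the cut height, and the minimum edge weight are all assumed values, and clustering directly on 1 - adjacency is a simplification (WGCNA typically clusters on a topological-overlap-based dissimilarity).

```python
from itertools import combinations
from math import sqrt

def pearson(x, y):
    """Pearson correlation between two equal-length expression profiles."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    return cov / sqrt(sum((a - mx) ** 2 for a in x) * sum((b - my) ** 2 for b in y))

def adjacency(data, beta=6):
    """Soft-thresholded unsigned adjacency: |r| ** beta for each gene pair."""
    genes = sorted(data)
    adj = {(g1, g2): abs(pearson(data[g1], data[g2])) ** beta
           for g1, g2 in combinations(genes, 2)}
    return genes, adj

def average_linkage_modules(genes, adj, cut_height=0.5):
    """Merge the closest clusters until their distance exceeds cut_height."""
    def dissim(a, b):
        return 1.0 - adj[(a, b) if a < b else (b, a)]

    clusters = [[g] for g in genes]
    while len(clusters) > 1:
        best = None
        for i, j in combinations(range(len(clusters)), 2):
            total = sum(dissim(a, b) for a in clusters[i] for b in clusters[j])
            d = total / (len(clusters[i]) * len(clusters[j]))
            if best is None or d < best[0]:
                best = (d, i, j)
        d, i, j = best
        if d > cut_height:
            break
        clusters[i] = sorted(clusters[i] + clusters[j])
        del clusters[j]  # j > i, so index i is unaffected
    return sorted(clusters)

def edge_list(adj, min_weight=0.1):
    """Rows of (source, target, weight) for edges at or above min_weight."""
    return [(a, b, round(w, 3)) for (a, b), w in sorted(adj.items())
            if w >= min_weight]

# Hypothetical toy data: five genes measured in six samples.
expression = {
    "gene1": [1.0, 2.0, 3.0, 4.0, 5.0, 6.0],
    "gene2": [1.2, 1.9, 3.1, 4.2, 4.8, 6.1],
    "gene3": [2.0, 4.0, 6.5, 8.0, 10.0, 12.5],
    "gene4": [5.0, 1.0, 4.0, 2.0, 6.0, 3.0],
    "gene5": [5.2, 0.9, 4.1, 2.2, 5.8, 3.1],
}

genes, adj = adjacency(expression)
modules = average_linkage_modules(genes, adj)
edges = edge_list(adj)
print(modules)
for source, target, weight in edges:
    print(source, target, weight, sep="\t")
```

Raising the correlation to a power keeps every edge but shrinks weak correlations toward zero, which is why the network is called weighted rather than thresholded into present or absent edges.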
Gene coexpression networks are important because they can help to identify groups of genes that are coordinately regulated, which can provide insights into the mechanisms underlying biological processes and assist in identifying potential therapeutic targets for diseases. Additionally, gene coexpression networks can be used to predict the function of uncharacterized genes and to prioritize genes for further study.
The functions needed to perform WGCNA on high-dimensional data are available as an R package at https://cran.r-project.org/web/packages/WGCNA/index.html
An overview of the entire package is available at https://horvath.genetics.ucla.edu/html/CoexpressionNetwork/Rpackages/WGCNA/
RESOURCES: