GSoC’23 @ NRNB : Week 4 (Data Extraction and Preparation)
AIM : Data Extraction and Preparation with prepare_pc_data
Mentors : Guangchuang Yu, Augustin Luna
Week 4 : Jun 19 – Jun 25
Introduction:
Welcome to the fourth week of my GSoC journey! In the previous weeks, we covered the basics of Pathway Commons analysis. This week, we move on to a crucial step: extracting and preparing data for downstream analysis with the prepare_pc_data function.
Progress Made:
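This week, I dived into the details of prepare_pc_data. The function retrieves Pathway Commons data for a given source and key type through get_pc_data() and splits it into two data frames: PCID2GENE, which maps each pathway ID to its genes (one row per pathway-gene pair), and PCID2NAME, which maps each pathway ID to its pathway name (with duplicate rows removed). These two tables follow the TERM2GENE/TERM2NAME layout used for enrichment analysis and lay the foundation for the analyses ahead.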
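prepare_pc_data <- function(source, keyType) {
    # fetch the Pathway Commons table for this source/key type as a data.frame
    pc2gene <- get_pc_data(source, keyType, output = 'data.frame')
    ## TERM2GENE: pathway ID -> gene, one row per pair
    pcid2gene <- pc2gene[, c("id", "gene")]
    ## TERM2NAME: pathway ID -> pathway name, duplicate rows removed
    pcid2name <- unique(pc2gene[, c("id", "name")])
    list(PCID2GENE = pcid2gene,
         PCID2NAME = pcid2name)
}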
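To make the split concrete, here is a small sketch that runs the same column selection on a made-up table. Because get_pc_data() needs network access, it is swapped out for a hypothetical stub, mock_get_pc_data(), which simply returns a hand-written data.frame with the id, name and gene columns the function expects. The pathway IDs, names and genes are invented for illustration.

```r
# Hypothetical stand-in for get_pc_data(): returns a tiny, invented
# Pathway Commons-style table with one row per (pathway, gene) pair.
mock_get_pc_data <- function(source, keyType, output = "data.frame") {
  data.frame(
    id   = c("PC_1", "PC_1", "PC_1", "PC_2", "PC_2"),
    name = c("Pathway A", "Pathway A", "Pathway A", "Pathway B", "Pathway B"),
    gene = c("TP53", "MDM2", "CDKN1A", "EGFR", "GRB2"),
    stringsAsFactors = FALSE
  )
}

# Same logic as prepare_pc_data(), but calling the mock so it runs offline
prepare_pc_data_demo <- function(source, keyType) {
  pc2gene <- mock_get_pc_data(source, keyType, output = "data.frame")
  # TERM2GENE: every (pathway ID, gene) pair is kept
  pcid2gene <- pc2gene[, c("id", "gene")]
  # TERM2NAME: one row per pathway ID, duplicates removed by unique()
  pcid2name <- unique(pc2gene[, c("id", "name")])
  list(PCID2GENE = pcid2gene, PCID2NAME = pcid2name)
}

res <- prepare_pc_data_demo(source = "any", keyType = "any")
print(res$PCID2GENE)  # 5 rows: one per pathway-gene pair
print(res$PCID2NAME)  # 2 rows: PC_1 / Pathway A and PC_2 / Pathway B
```

The long PCID2GENE table has the two-column (term ID, gene) shape that clusterProfiler's enricher() accepts for its TERM2GENE argument, and PCID2NAME has the (term ID, term name) shape it accepts for TERM2NAME.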
Next Week Plan:
In the upcoming week, we’ll explore Over-Representation Analysis (ORA) using the data we’ve prepared.
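As a rough preview of where these tables end up, the sketch below runs a one-sided hypergeometric test, the statistic commonly used for ORA, for each pathway in a PCID2GENE-style table. The table, the input gene list and the 20-gene background universe are all invented, and ora_sketch() is a hypothetical helper written for illustration, not the clusterProfiler implementation.

```r
# Invented PCID2GENE-style table: pathway ID -> gene
pcid2gene <- data.frame(
  id   = c("PC_1", "PC_1", "PC_1", "PC_2", "PC_2"),
  gene = c("TP53", "MDM2", "CDKN1A", "EGFR", "GRB2"),
  stringsAsFactors = FALSE
)

# Hypothetical helper: hypergeometric ORA p-value for each pathway
ora_sketch <- function(genes, term2gene, universe) {
  genes <- intersect(genes, universe)
  sets <- split(term2gene$gene, term2gene$id)  # gene vector per pathway ID
  res <- lapply(names(sets), function(id) {
    set <- intersect(unique(sets[[id]]), universe)
    overlap <- length(intersect(genes, set))
    # P(X >= overlap) when drawing length(genes) genes from the universe,
    # of which length(set) belong to this pathway
    p <- phyper(overlap - 1, length(set), length(universe) - length(set),
                length(genes), lower.tail = FALSE)
    data.frame(id = id, overlap = overlap, set_size = length(set),
               pvalue = p, stringsAsFactors = FALSE)
  })
  do.call(rbind, res)
}

universe <- c(pcid2gene$gene, paste0("GENE", 1:15))  # 20 genes in total
out <- ora_sketch(c("TP53", "MDM2", "GENE1"), pcid2gene, universe)
print(out)
```

Here PC_1 shares two of its three genes with the three-gene input list, giving a p-value of 52/1140 (about 0.046), while PC_2 has no overlap and a p-value of 1. A real analysis would also adjust these p-values for multiple testing.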
Conclusion:
Preparing data is akin to laying the tracks for an insightful analysis journey. Join me next week as we navigate the pathways of Over-Representation Analysis using the groundwork we’ve established.
Repository :
https://github.com/YuLab-SMU/clusterProfiler/blob/devel/R/pathwayCommons.R