GSoC’23 @ NRNB : Getting the data
Project : Creating support for Pathway Commons in clusterProfiler
Mentors : Guangchuang Yu, Augustin Luna
Week — 2 : Jun 5 — Jun 11
Introduction :
Welcome to the blog post about my second week in the Google Summer of Code (GSoC) program. This week I learned how to use R to fetch data from Pathway Commons before reading it in for analysis. Let’s look at what was accomplished and the plans for the upcoming week.
Progress Made :
I started implementing get_pc_data(), a function that locates the GMT file at its URL and calls read.gmt.pc() to read the data and reshape it into the appropriate format.
- get_pc_data :
get_pc_data <- function(source, keyType, output = "data.frame") {
    # List the available GMT files and keep the one matching both the
    # data source and the identifier type
    gmtfile <- get_pc_gmtfile()
    gmtfile <- gmtfile[grepl(source, gmtfile) & grepl(keyType, gmtfile)]
    pcurl <- 'https://www.pathwaycommons.org/archives/PC2/v12/'
    url <- paste0(pcurl, gmtfile)
    # Alternative kept for reference: download to a temporary file first
    # f <- tempfile(fileext = ".gmt.gz")
    # dl <- mydownload(url, destfile = f)
    # if (is.null(dl)) {
    #     message("fail to download Pathway Commons data...")
    #     return(NULL)
    # }
    # read.gmt.pc(f, output = output)
    # Read and parse the GMT file directly from the URL
    read.gmt.pc(url, output = output)
}
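To make the file selection step concrete, here is a minimal sketch of how the two grepl() filters narrow a file listing down to a single URL. The file names and the helper select_gmtfile() are illustrative assumptions of mine, not part of clusterProfiler; in the real function the listing comes from get_pc_gmtfile().

```r
# Hypothetical file listing (assumption); the real names come from get_pc_gmtfile().
gmtfiles <- c(
  "PathwayCommons12.reactome.hgnc.gmt.gz",
  "PathwayCommons12.reactome.uniprot.gmt.gz",
  "PathwayCommons12.kegg.hgnc.gmt.gz"
)

# Hypothetical helper: keep the files whose names contain both the
# data source and the identifier type.
select_gmtfile <- function(files, source, keyType) {
  # fixed = TRUE matches the arguments as literal text, not as regular expressions
  hit <- grepl(source, files, fixed = TRUE) & grepl(keyType, files, fixed = TRUE)
  files[hit]
}

selected <- select_gmtfile(gmtfiles, source = "reactome", keyType = "hgnc")

# Build the download URL the same way get_pc_data() does
pcurl <- "https://www.pathwaycommons.org/archives/PC2/v12/"
url <- paste0(pcurl, selected)
print(url)
```

Using fixed = TRUE is a design choice of this sketch: it avoids surprises if a source name ever contains a character that is special in regular expressions. The original code uses the default regular-expression matching.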
Next Week Plan :
In the coming week, I intend to write the functions that read the Pathway Commons data according to its format and extract the columns relevant for analysis.
This would involve :
1. read.gmt.pc_internal
2. read.gmt.pc
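As a rough idea of what such a reader has to do, here is a minimal sketch of parsing GMT lines into a long data frame. GMT is a tab-separated format in which each line holds a set name, a description, and then the members of the set. The function parse_gmt_lines() and the example lines are hypothetical and only illustrate the general technique; they are not the planned read.gmt.pc_internal or read.gmt.pc.

```r
# Hypothetical helper (assumption): turn GMT lines into one row per (term, gene) pair.
parse_gmt_lines <- function(lines) {
  # Split every line on tabs: field 1 = term, field 2 = description, rest = genes
  fields <- strsplit(lines, "\t", fixed = TRUE)
  # Drop lines that carry no genes, since they cannot produce any rows
  fields <- fields[lengths(fields) > 2]
  rows <- lapply(fields, function(x) {
    data.frame(
      term = x[1],
      description = x[2],
      gene = x[-(1:2)],
      stringsAsFactors = FALSE
    )
  })
  do.call(rbind, rows)
}

# Made-up example lines, not real Pathway Commons records
example <- c(
  "set1\tfirst example set\tTP53\tBRCA1",
  "set2\tsecond example set\tEGFR",
  "set3\tempty example set"
)
gmt_df <- parse_gmt_lines(example)
print(gmt_df)
```

With a real file, the lines could be obtained with readLines() on the downloaded .gmt.gz file, since R decompresses gzip files transparently when reading them.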
The Repository :
https://github.com/YuLab-SMU/clusterProfiler/blob/devel/R/pathwayCommons.R
Conclusion :
The second week was well spent: the documentation and resources I consulted helped me understand the structure of the data and how to manipulate it in R.