GSoC’23 @ NRNB : Getting the data

Jigyasa Gupta
Jun 15, 2023

Project : Creating support for Pathway Commons in clusterProfiler

Mentors : Guangchuang Yu, Augustin Luna

Week — 2 : Jun 5 — Jun 11

Introduction :

Welcome to the blog post covering my second week in the Google Summer of Code (GSoC) program. This week I learned how to use R to fetch data from Pathway Commons before reading it in for analysis. Let’s look at what was accomplished and the plans for the upcoming week.

Progress Made :

I started implementing the function that builds the URL of the GMT file and then calls read.gmt.pc() to read the data and convert it into an appropriate format.

  1. get_pc_data :
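get_pc_data <- function(source, keyType, output = "data.frame") {
    # List the available GMT files, then keep those whose names match
    # both the data source and the identifier type
    gmtfile <- get_pc_gmtfile()
    gmtfile <- gmtfile[grepl(source, gmtfile) & grepl(keyType, gmtfile)]

    # Build the full URL inside the Pathway Commons v12 archive
    pcurl <- 'https://www.pathwaycommons.org/archives/PC2/v12/'
    url <- paste0(pcurl, gmtfile)

    # Disabled alternative: download to a temporary file first
    # f <- tempfile(fileext = ".gmt.gz")
    # dl <- mydownload(url, destfile = f)
    # if (is.null(f)) {
    #     message("fail to download Pathway Commons data...")
    #     return(NULL)
    # }
    # read.gmt.pc(f, output = output)

    # Read the GMT file directly from the URL
    read.gmt.pc(url, output = output)
}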
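
To make the file-selection step easier to follow, here is a minimal, self-contained sketch of the same idea. The file names are made up for illustration (the real list would come from get_pc_gmtfile()), and select_gmtfile is a hypothetical helper, not a function in clusterProfiler.

```r
# Hypothetical file names, invented for this example; the real list
# would be returned by get_pc_gmtfile().
gmtfiles <- c(
  "PathwayCommons12.reactome.hgnc.gmt.gz",
  "PathwayCommons12.reactome.uniprot.gmt.gz",
  "PathwayCommons12.kegg.hgnc.gmt.gz"
)

# Hypothetical helper: keep only the names that contain both the data
# source and the identifier type.
select_gmtfile <- function(files, source, keyType) {
  has_source <- grepl(source, files, fixed = TRUE)
  has_key <- grepl(keyType, files, fixed = TRUE)
  files[has_source & has_key]
}

# Same base URL as in get_pc_data
pcurl <- "https://www.pathwaycommons.org/archives/PC2/v12/"
selected <- select_gmtfile(gmtfiles, source = "reactome", keyType = "hgnc")
url <- paste0(pcurl, selected)
print(url)
```

The sketch passes fixed = TRUE so that the source and key type are matched as plain text rather than as regular expressions; get_pc_data above uses the default regular-expression matching.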

Next Week Plan :

In the coming week, I intend to write the functions that read the Pathway Commons data according to its format and extract the columns relevant for analyses.

This would involve :

1. read.gmt.pc_internal

2. read.gmt.pc
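
As a rough idea of what reading a GMT file involves, here is a minimal sketch assuming the general GMT layout: one gene set per line, with an identifier, a description, and then the genes, all tab-separated. The input lines and the function parse_gmt_lines are hypothetical and only for illustration; this is not the implementation of read.gmt.pc or read.gmt.pc_internal.

```r
# Two made-up GMT lines: identifier, description, then gene symbols,
# separated by tabs.
gmt_lines <- c(
  "set1\tname: Example pathway A\tTP53\tMDM2\tCDKN1A",
  "set2\tname: Example pathway B\tEGFR\tGRB2"
)

# Hypothetical parser: turn GMT lines into a long data.frame with one
# row per (gene set, gene) pair.
parse_gmt_lines <- function(lines) {
  fields <- strsplit(lines, "\t", fixed = TRUE)
  # Skip lines that have no genes after the first two columns
  fields <- fields[lengths(fields) >= 3]
  rows <- lapply(fields, function(x) {
    data.frame(
      term = x[1],
      description = x[2],
      gene = x[-(1:2)],
      stringsAsFactors = FALSE
    )
  })
  do.call(rbind, rows)
}

gmt_df <- parse_gmt_lines(gmt_lines)
print(gmt_df)
```

The long format, with one gene per row, is used here because it is easy to filter and join; other output shapes are equally possible.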

The Repository :

https://github.com/YuLab-SMU/clusterProfiler/blob/devel/R/pathwayCommons.R

Conclusion :

The second week was worthwhile thanks to the many documentation pages and resources I consulted to understand the structure of the data and how to manipulate it in R.
