Introducing pyBioPortal: a Python package for accessing cBioPortal data

Introduction

Jan 26, 2024

In bioinformatics, programmatic access to biological data is crucial for research and development. pyBioPortal is a Python package designed to streamline data retrieval from cBioPortal, an open-access, open-source resource for interactive exploration of multidimensional cancer genomics data sets. It provides a convenient interface and returns results mainly as pandas DataFrames.

Overview

The usefulness and functionality of cBioPortal are well known and documented in the scientific literature [1–3]:

cBioPortal provides a simple yet flexible interface to integrated data sets, intuitive visualization options, and a programmatic web interface, all of which can aid researchers in translating cancer genomic data into biologic insights and potential clinical applications.

The pyBioPortal package complements the data analysis capabilities provided by cBioPortal. It is aimed at experienced Python programmers and offers the flexibility to build customized analyses around specific research objectives.

With pyBioPortal, users can automate queries, run advanced analyses, and generate customized datasets in code. This gives full control over data manipulation, which is useful for in-depth research and for developing tailored analytical approaches to specific questions in cancer genetics.

The cBioPortal API is organized into categories, each dedicated to one type of data, which gives a clear structure for retrieving information.

pyBioPortal mirrors this structure: for each data category it provides a module whose functions query the API with GET or POST requests (as described in the API documentation).

Some examples of HTTP requests for data categories
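To make the GET/POST distinction concrete, here is a minimal sketch of how two such requests could be built with the Python standard library alone. The base URL and the two endpoint paths are assumptions taken from the public cBioPortal API, not from pyBioPortal's source, and the helper names are hypothetical. The requests are only constructed, not sent.

```python
import json
from urllib.request import Request

# Assumption: base URL of the public cBioPortal instance.
BASE_URL = "https://www.cbioportal.org/api"


def build_get_clinical_attributes(study_id):
    """Build a GET request for the clinical attributes of one study."""
    # Assumed endpoint path: /studies/{studyId}/clinical-attributes
    url = f"{BASE_URL}/studies/{study_id}/clinical-attributes"
    return Request(url, headers={"Accept": "application/json"}, method="GET")


def build_fetch_clinical_attributes(study_ids):
    """Build a POST request for the clinical attributes of several studies."""
    # Assumed endpoint path: /clinical-attributes/fetch
    # The study IDs travel in the JSON request body rather than in the URL.
    url = f"{BASE_URL}/clinical-attributes/fetch"
    body = json.dumps(list(study_ids)).encode("utf-8")
    headers = {"Accept": "application/json", "Content-Type": "application/json"}
    return Request(url, data=body, headers=headers, method="POST")


get_req = build_get_clinical_attributes("brca_tcga")
post_req = build_fetch_clinical_attributes(["brca_tcga", "brca_bccrc"])

print(get_req.get_method(), get_req.full_url)
print(post_req.get_method(), post_req.full_url, post_req.data.decode("utf-8"))
```

Passing either object to urllib.request.urlopen would perform the call. pyBioPortal wraps this kind of request and converts the JSON response into a DataFrame, so users normally never build requests by hand.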

Installation

Getting started with pyBioPortal is straightforward. The package is available on PyPI (the Python Package Index) and on Anaconda, so users can choose the installation method that suits their existing Python environment.

To install pyBioPortal using pip, run the following command in your terminal or command prompt:

pip install pybioportal

Alternatively, users of the Anaconda distribution can install pyBioPortal with the conda package manager:

conda install -c matteo.valerio pybioportal
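After either command, a quick way to confirm that the package is visible to the current interpreter is to ask the standard library for its installed version. This is a small sketch; the helper name installed_version is hypothetical, and the distribution name "pybioportal" is assumed to match the one used in the pip command above.

```python
from importlib import metadata


def installed_version(dist_name):
    """Return the installed version of a distribution, or None if it is absent."""
    try:
        return metadata.version(dist_name)
    except metadata.PackageNotFoundError:
        return None


version = installed_version("pybioportal")
print("pybioportal version:", version if version else "not installed")
```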

Examples

The following are some examples of using the package:

Retrieve clinical attributes from studies in cBioPortal.

from pybioportal import clinical_attributes as ca

df = ca.fetch_clinical_attributes(study_ids=["brca_tcga", "brca_bccrc"])

The returned DataFrame df contains the clinical attributes available for the two studies.

Retrieve patient survival clinical data from a specific study in cBioPortal.

from pybioportal import clinical_data as cd

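# Overall survival status and time, plus race
attribute_ids = ["OS_STATUS", "OS_MONTHS", "RACE"]

# Patient-level clinical data, returned in wide format
df = cd.fetch_all_clinical_data_in_study(study_id="brca_tcga",
                                         attribute_ids=attribute_ids,
                                         clinical_data_type="PATIENT",
                                         ret_format="WIDE")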

The returned DataFrame df contains the requested attributes for the patients in the study.

After suitable processing in Python, these data can be used to produce Kaplan-Meier plots.
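As an illustration of that processing step, the sketch below computes a Kaplan-Meier survival estimate in pure Python. It is not part of pyBioPortal. The function names are hypothetical, the toy values are invented for the example, and the status strings ("1:DECEASED", "0:LIVING") are an assumption about how OS_STATUS is encoded.

```python
def status_to_event(status):
    """Map an OS_STATUS-like string to 1 (death observed) or 0 (censored)."""
    # Assumption: deceased patients are labelled with a string containing "DECEASED".
    return 1 if "DECEASED" in str(status).upper() else 0


def kaplan_meier(durations, events):
    """Return [(time, survival_probability)] at each time with at least one event.

    durations: follow-up times (for example, months)
    events:    1 if the event was observed at that time, 0 if censored
    """
    pairs = sorted(zip(durations, events))
    n_at_risk = len(pairs)
    survival = 1.0
    curve = []
    i = 0
    while i < len(pairs):
        t = pairs[i][0]
        deaths = 0
        removed = 0
        # Group all subjects sharing the same follow-up time.
        while i < len(pairs) and pairs[i][0] == t:
            deaths += pairs[i][1]
            removed += 1
            i += 1
        if deaths > 0:
            # Product-limit step: multiply by the fraction surviving time t.
            survival *= 1.0 - deaths / n_at_risk
            curve.append((t, survival))
        # Both deaths and censored subjects leave the risk set after time t.
        n_at_risk -= removed
    return curve


# Toy data, invented for illustration only.
os_months = [5.0, 8.0, 8.0, 12.0, 20.0]
os_status = ["1:DECEASED", "1:DECEASED", "0:LIVING", "1:DECEASED", "0:LIVING"]

toy_events = [status_to_event(s) for s in os_status]
for time, prob in kaplan_meier(os_months, toy_events):
    print(f"S({time:g}) = {prob:.2f}")
```

With real data, the two input lists would come from the corresponding DataFrame columns, assuming the column names match the requested attribute IDs and after dropping rows with missing values. The resulting (time, probability) pairs can then be drawn as a step plot.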

Documentation

The package’s documentation is available online on Read the Docs.

The documentation provides comprehensive guidance on the modules that make up the pyBioPortal package, with examples demonstrating the usage of each module.

Conclusions

pyBioPortal is a useful tool for accessing and analyzing data from cBioPortal. It offers users, particularly experienced Python programmers, an effective way to automate queries and customize analyses to suit their research needs.

Future versions of pyBioPortal will add data analysis functions that aim to further simplify the analysis process for users.

Suggestions and contributions for improving the package are welcome and can be submitted through the GitHub repository.

[1] Cerami E, Gao J, Dogrusoz U, et al. The cBio cancer genomics portal: an open platform for exploring multidimensional cancer genomics data. Cancer Discov. 2012 May;2(5):401–4. (PubMed)

[2] Gao J, Aksoy BA, Dogrusoz U, et al. Integrative analysis of complex cancer genomics and clinical profiles using the cBioPortal. Sci Signal. 2013 Apr 2;6(269):pl1. (PubMed)

[3] de Bruijn I, Kundra R, Mastrogiacomo B, et al. Analysis and Visualization of Longitudinal Genomic and Clinical Data from the AACR Project GENIE Biopharma Collaborative in cBioPortal. Cancer Res. 2023 Dec 1;83(23):3861–7. (PubMed)