Kinase Activity Prediction from Phosphoproteomic Data

Deciphering the kinases associated with identified phosphorylation sites

Gitanjali Roy
The Computational Biology Magazine
Dec 4, 2020

Image from Cell Signaling Technology

After a phosphoproteomic analysis by mass spectrometry, the next step is to decipher which kinases are associated with the identified phosphorylation sites. This can be done experimentally or by computational prediction. Multiple computational tools are available to predict kinase activity, but each has drawbacks with respect to the organisms it supports, the number of kinases it can predict and the input files it requires. As a computational biologist, the most difficult task for me has always been preparing input files in the specific format required by each bioinformatic tool. Below is a list of tools for kinase prediction and their limitations:

Table 1:

Image by author

Among these tools, piNET and KSEA are my personal favorites, as they provide comprehensive resources and good visualizations for publication purposes.

piNET is available at http://www.pinet-server.org/pinet/peptideToProtein . It integrates several databases and resources for proteomic studies, including UniProt, PsiMOD, PROSITE, PhosphoSitePlus, iPTMnet, Signor, Enrichr, Reactome and DeepPhos, for mapping, annotation and analysis of post-translational modification sites and the modifying enzymes that target these sites. Because piNET is available as a web server, it can be used by biologists with no programming experience; it also offers API methods for programmatic access. piNET also connects proteomics profiles with transcriptional and proteomics signatures generated by the Library of Integrated Network-based Cellular Signatures (LINCS) project ( http://www.ilincs.org/ilincs/ ), which aims to systematically collect omics signatures of genetic and chemical perturbations, including signatures induced by genetic knock-downs or small-molecule inhibitors of modifying enzymes such as kinases. The exciting thing about piNET is that, in addition to kinases, it can map acetyltransferases and methyltransferases using iPTMnet and Signor. It provides nice visualizations of pathway enrichment, the kinase network and the perturbation network.

Figure 1: Pathway enrichment graph from piNET. (Image from piNET)
Figure 2: Kinase enrichment graph from piNET. (Image from piNET)
Figure 3: LINCS perturbation network from piNET. (Image from piNET)

A freely available Python implementation of KSEA was published by Wirbel et al. in 2018 and is available at http://github.com/saezlab/kinact/ . An R Shiny tool called KSEAapp, by Wiredja et al., is available at https://github.com/casecpb/KSEA/ . The KSEAapp provides a web-based application that can be used by researchers with little to no programming experience. The tool predicts the activity of a kinase from the collective phosphorylation changes of its identified substrates. Currently KSEA supports only human kinases, but mouse kinases can be predicted with some modifications to the code. The input format is simple: each entry needs the UniProt protein ID, gene name, peptide sequence, phosphorylated residues, p-value and fold change. The UniProt ID, peptide sequence and p-value are optional, though their columns must still be present.

Table 2:

Image by author
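
Since preparing the input file is usually the painful part, here is a minimal Python sketch of how such a file could be written. It rests on several assumptions that are not stated in this article: the column names (Protein, Gene, Peptide, Residue.Both, p, FC), the "NULL" placeholder for optional values, semicolon-separated residues and a non-log fold change are taken from my reading of the KSEAapp documentation, so please check them against the example dataset shipped with the package. The helper names and the sample row are purely illustrative.

```python
import csv
import io

# Assumed column names; verify against the KSEAapp example dataset.
KSEA_COLUMNS = ["Protein", "Gene", "Peptide", "Residue.Both", "p", "FC"]


def to_ksea_row(gene, residues, fold_change,
                protein="NULL", peptide="NULL", p_value="NULL"):
    """Build one input row (hypothetical helper).

    Optional fields default to the placeholder "NULL" so that the
    columns are still present, as the tool requires.
    """
    if fold_change <= 0:
        # Assumption: FC is a plain ratio, not a log-transformed value.
        raise ValueError("fold change must be a positive, non-log ratio")
    return {
        "Protein": protein,
        "Gene": gene,
        "Peptide": peptide,
        "Residue.Both": ";".join(residues),  # e.g. "T185;Y187"
        "p": p_value,
        "FC": fold_change,
    }


def write_ksea_csv(rows, handle):
    """Write rows to an open text handle with the assumed header."""
    writer = csv.DictWriter(handle, fieldnames=KSEA_COLUMNS)
    writer.writeheader()
    writer.writerows(rows)


# Illustrative usage with a made-up fold change.
buffer = io.StringIO()
write_ksea_csv([to_ksea_row("MAPK1", ["T185", "Y187"], 2.5)], buffer)
print(buffer.getvalue())
```

To write to disk instead, pass a file opened with open("ksea_input.csv", "w", newline="") in place of the StringIO buffer.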

The algorithm scores each kinase based on the relative hyperphosphorylation and dephosphorylation of its substrates, as identified from phosphosite-specific databases such as PhosphoSitePlus and NetworKIN. Positive and negative scores represent an increase or decrease, respectively, in kinase activity in response to drug treatment relative to control. The kinase is scored based on the formula:

z = (𝑠̅ − 𝑝̅) × √m / δ

where 𝑠̅ denotes the mean log2(FC) of known phosphosite substrates of the given kinase, 𝑝̅ represents the mean log2(FC) of all phosphosites in the dataset, m denotes the total number of phosphosite substrates identified from the experiment that annotate to the specified kinase, and δ denotes the standard deviation of the log2(FC) across all phosphosites in the dataset. This formula is based on a z-score transformation, and the resulting scores (denoted as ‘z-score’ in the KSEA App outputs) are assumed to be normally distributed. Subsequently, the p-value is determined by assessing the one-tailed probability of having a more extreme score than the one measured, followed by a Benjamini-Hochberg FDR correction for multiple hypothesis testing.

After the analysis is complete, the algorithm generates two CSV files (the kinase scores and the kinase-substrate links) and a bar plot that summarizes the KSEA results. The p-value cutoff determines which kinases are colored blue or red to mark those that reach statistical significance; kinases with non-significant scores are shown in black. Downregulated kinases are shown in blue at the bottom of the plot and upregulated kinases are shown in red at the top.
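
To make the scoring concrete, here is a small Python sketch of the calculation as I read it from the definitions above, assuming the score takes the form (𝑠̅ − 𝑝̅) × √m / δ. It is not the KSEAapp code itself: the function names and the toy data are hypothetical, the use of the sample standard deviation for δ is an assumption, and the default cutoff of 0.05 is only an example value.

```python
import math
from statistics import mean, stdev


def ksea_scores(site_log2fc, kinase_substrates):
    """Score kinases from substrate log2 fold changes (illustrative sketch).

    site_log2fc: dict mapping a phosphosite label to its log2(FC).
    kinase_substrates: dict mapping a kinase to a list of site labels.
    """
    all_values = list(site_log2fc.values())
    p_bar = mean(all_values)
    delta = stdev(all_values)  # assumption: sample standard deviation
    if delta == 0:
        raise ValueError("all log2(FC) values are identical")
    results = {}
    for kinase, sites in kinase_substrates.items():
        observed = [site_log2fc[s] for s in sites if s in site_log2fc]
        m = len(observed)
        if m == 0:
            continue  # no substrate of this kinase was measured
        z = (mean(observed) - p_bar) * math.sqrt(m) / delta
        # One-tailed probability of a more extreme score under N(0, 1).
        p = 0.5 * math.erfc(abs(z) / math.sqrt(2))
        results[kinase] = {"m": m, "z": z, "p": p}
    return results


def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg adjusted p-values in input order."""
    n = len(p_values)
    order = sorted(range(n), key=lambda i: p_values[i])
    adjusted = [0.0] * n
    running_min = 1.0
    # Walk from the largest p-value down, enforcing monotonicity.
    for rank in range(n, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, p_values[i] * n / rank)
        adjusted[i] = running_min
    return adjusted


def bar_color(z, p, cutoff=0.05):
    """Pick the bar color: red if up, blue if down, black if not significant."""
    if p >= cutoff or z == 0:
        return "black"
    return "red" if z > 0 else "blue"


# Toy data, made up purely for illustration.
toy_sites = {"A": 1.0, "B": 1.0, "C": -1.0, "D": -1.0}
toy_links = {"K1": ["A", "B"], "K2": ["C", "D"]}
toy_scores = ksea_scores(toy_sites, toy_links)
toy_fdr = benjamini_hochberg([v["p"] for v in toy_scores.values()])
for (kinase, v), q in zip(toy_scores.items(), toy_fdr):
    print(kinase, round(v["z"], 3), round(v["p"], 3), round(q, 3))
```

Note that a kinase with only a few measured substrates gets a small m and therefore a score close to zero, which is why sparsely covered kinases rarely reach significance in the bar plot.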

Figure 4: KSEA barplot (Image from KSEAapp)

The list shown here is not comprehensive, and many other tools are available for kinase prediction. The two resources discussed here, piNET and KSEA, are the easiest to use in my experience. Every computational tool has certain limitations, and it is up to the researcher to choose the tool that best complements their work.

Gitanjali Roy
The Computational Biology Magazine

I am a graduate student in Biochemistry working on Proteomics. I specifically search for PTMs in various biological samples and study PPI.