Protein Function Prediction — PIK3AP1

Simon Tse
Learn about Cancer with Code
2 min read · Apr 8, 2024

Introduction

In the last post, I described how a few protein motifs of 36 amino acids (AA) show a high degree of clustering. It is natural to hypothesise that the function(s) of a protein are determined by the combination of a few motifs. This post is about how I would approach this hypothesis.

Approach

To explore whether there is any underlying pattern that I can exploit, I chose to study one particular gene product from the PI3K family. Phosphoinositide 3-kinases (PI3Ks), also called phosphatidylinositol 3-kinases, are a family of enzymes involved in cellular functions such as cell growth, proliferation, differentiation, motility, survival and intracellular trafficking, which in turn are involved in cancer. [1]

I compared the CAFA data set against the KEGG database for a set of genes that fall under the PI3K/Akt/mTOR pathway.

Created and prepared by author

For illustration purposes, I will focus on PIK3AP1, as it spans only 5 different GO groups. That size is good for developing a toy model.
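The selection step above can be sketched as a simple count of distinct GO groups per pathway gene. This is only a minimal illustration, not the exact comparison I ran: the gene names, the GO labels and the `go_terms_per_gene` helper are all hypothetical placeholders, and real CAFA or KEGG rows would need to be loaded in their place.

```python
from collections import defaultdict

# Hypothetical (gene, GO label) pairs. These are placeholders,
# not real CAFA annotations.
annotations = [
    ("GENE_A", "GO:x1"),
    ("GENE_A", "GO:x2"),
    ("GENE_B", "GO:x1"),
    ("GENE_B", "GO:x3"),
    ("GENE_B", "GO:x4"),
    ("GENE_C", "GO:x2"),
]

# Hypothetical set of genes taken from a pathway listing (e.g. KEGG).
pathway_genes = {"GENE_A", "GENE_B"}


def go_terms_per_gene(pairs, genes):
    """Collect the distinct GO labels for each gene of interest."""
    terms = defaultdict(set)
    for gene, term in pairs:
        if gene in genes:
            terms[gene].add(term)
    return dict(terms)


terms = go_terms_per_gene(annotations, pathway_genes)

# Rank genes by how many GO groups they span, smallest first,
# to find a candidate small enough for a toy model.
ranked = sorted(terms, key=lambda gene: len(terms[gene]))

for gene in ranked:
    print(gene, len(terms[gene]), sorted(terms[gene]))
```

A gene that spans few GO groups comes out at the top of `ranked`, which is the property that made PIK3AP1 attractive here.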

Let’s see which 5 GO groups PIK3AP1 spans.

Created and prepared by author

And here is what these 5 GO groups represent.

Created and prepared by author

From the descriptions, it is apparent that PIK3AP1 binds to some molecular entity inside the cell cortex. The descriptions are rather general, and they do not say much about the characteristics of the proteins under the respective GO groups.

So let’s focus on PIK3AP1 itself.

Created and prepared by author

From the CAFA dataset, the 5 GO groups have 55,480 protein entries altogether. Based on these roughly 55K entries, I am going to break those proteins down into 36-AA-long motifs for analysis.
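Breaking a protein into 36-AA motifs could look something like the sketch below. The post does not fix whether the windows overlap, so the `stride` parameter is an assumption: `stride=36` gives non-overlapping windows and `stride=1` gives a fully sliding window. The `split_into_motifs` name and the example sequence are made up for illustration.

```python
def split_into_motifs(sequence, length=36, stride=36):
    """Cut a protein sequence into fixed-length windows.

    Trailing residues that do not fill a whole window are dropped.
    `stride` is an assumed parameter: 36 means no overlap, 1 means
    a sliding window that moves one residue at a time.
    """
    if length <= 0 or stride <= 0:
        raise ValueError("length and stride must be positive")
    last_start = len(sequence) - length
    return [
        sequence[start:start + length]
        for start in range(0, last_start + 1, stride)
    ]


# Synthetic 80-residue sequence built from the 20 standard amino acid
# letters. It is not a real protein.
example = "ACDEFGHIKLMNPQRSTVWY" * 4

non_overlapping = split_into_motifs(example)
sliding = split_into_motifs(example, stride=1)

print(len(non_overlapping), len(sliding))
```

Non-overlapping windows keep the number of motifs small, while a sliding window avoids missing a motif that straddles a window boundary, at the cost of many more entries to analyse.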

Intermission

By the next post, I should have finished assembling those 36-AA motifs into a graph for network analysis.

Stay tuned.
