Protein Function Prediction — PIK3AP1
Introduction
In my last post, I described how a few 36-amino-acid (AA) protein motifs show a high degree of clustering. It is natural to hypothesise that the function(s) of a protein are determined by a combination of a few motifs. This post describes how I would approach this hypothesis.
Approach
To explore whether there is any underlying pattern that I can exploit, I chose to study one particular gene product from the PI3K family. Phosphoinositide 3-kinases (PI3Ks), also called phosphatidylinositol 3-kinases, are a family of enzymes involved in cellular functions such as cell growth, proliferation, differentiation, motility, survival and intracellular trafficking, which in turn are involved in cancer. [1]
I compared the CAFA dataset against the KEGG database for a set of genes that fall under the PI3K/Akt/mTOR pathway.
For illustration purposes, I will focus on PIK3AP1, as it spans only 5 different GO groups; that size is well suited to developing a toy model.
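The comparison and selection step can be sketched in a few lines of Python. This is a minimal sketch, not the code behind this post: it assumes the CAFA annotations have already been parsed into (gene, GO term) pairs and that the KEGG pathway genes are available as a set of gene symbols. The function names and the toy data are hypothetical.

```python
from collections import defaultdict


def go_terms_per_gene(annotations, pathway_genes):
    """Map each pathway gene to the set of distinct GO terms it is annotated with.

    annotations: iterable of (gene, go_term) pairs (assumed pre-parsed from CAFA).
    pathway_genes: set of gene symbols (assumed taken from a KEGG pathway listing).
    """
    terms = defaultdict(set)
    for gene, go_term in annotations:
        if gene in pathway_genes:  # keep only genes that are in the pathway
            terms[gene].add(go_term)
    return dict(terms)


def gene_with_fewest_terms(terms):
    """Pick the gene spanning the fewest GO terms (ties broken alphabetically)."""
    return min(terms, key=lambda gene: (len(terms[gene]), gene))


# Toy usage with made-up gene names and placeholder GO terms.
toy_annotations = [
    ("GENE_A", "GO:t1"),
    ("GENE_A", "GO:t2"),
    ("GENE_B", "GO:t1"),
    ("GENE_C", "GO:t3"),  # not in the pathway set, so it is ignored
]
toy_terms = go_terms_per_gene(toy_annotations, {"GENE_A", "GENE_B"})
print(gene_with_fewest_terms(toy_terms))
```

Choosing the gene with the fewest GO terms is one simple way to get a small, tractable starting point for a toy model.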
Let’s see which 5 GO groups PIK3AP1 spans.
And what these 5 GO groups represent:
From the descriptions, it is apparent that PIK3AP1 binds to some molecular entity inside the cell cortex. The descriptions are rather general, though, and say little about the characteristics of the proteins annotated under each GO term.
So let’s focus on PIK3AP1
In the CAFA dataset, the 5 GO groups contain 55,480 protein entries altogether. Using these ~55K entries, I am going to break the proteins down into 36-AA motifs for analysis.
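Gathering the proteins under the 5 GO groups and cutting their sequences into 36-AA pieces can be sketched as below. This is a minimal sketch under stated assumptions: annotations are (protein ID, GO term) pairs, a protein annotated under several of the groups is counted once (a set union), and motifs are taken as sliding windows of length 36. Whether the windows overlap is not specified in this post, so the step size is left as a parameter. All names here are hypothetical.

```python
def proteins_in_go_groups(annotations, go_terms):
    """Return the set of protein IDs annotated with at least one of go_terms.

    A protein that appears under several GO terms is counted only once.
    """
    wanted = set(go_terms)
    return {protein for protein, term in annotations if term in wanted}


def motifs(sequence, k=36, step=1):
    """Split an amino-acid sequence into windows of length k.

    step=1 gives overlapping windows; step=k gives non-overlapping blocks.
    Sequences shorter than k yield no windows, and a trailing piece shorter
    than k is dropped.
    """
    return [sequence[i:i + k] for i in range(0, len(sequence) - k + 1, step)]


# Toy usage with placeholder protein IDs, GO terms and a 40-AA sequence.
toy_annotations = [("P1", "GO:t1"), ("P2", "GO:t2"), ("P1", "GO:t2"), ("P3", "GO:t9")]
selected = proteins_in_go_groups(toy_annotations, ["GO:t1", "GO:t2"])
toy_sequence = "ACDEFGHIKLMNPQRSTVWY" * 2
overlapping = motifs(toy_sequence)           # 5 overlapping windows
blocks = motifs(toy_sequence, step=36)       # 1 non-overlapping block
```

Overlapping windows preserve every possible 36-AA segment at the cost of many more motifs per protein; non-overlapping blocks are cheaper but depend on where each sequence happens to start.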
Intermission
In the next post, I plan to have assembled these 36-AA motifs into a graph for network analysis.
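One possible way to turn motifs into a graph is a co-occurrence network: each motif is a node, and two motifs are linked with a weight equal to the number of proteins that contain both. This is a hypothetical construction chosen to match the idea that a protein's function comes from a combination of motifs, not necessarily the graph the next post will use. It uses only the standard library; the resulting counts could also be loaded into a graph library such as NetworkX.

```python
from collections import Counter
from itertools import combinations


def motif_cooccurrence_graph(protein_motifs):
    """Build a weighted, undirected motif co-occurrence graph.

    protein_motifs: dict mapping protein ID -> iterable of motifs (e.g. 36-AA windows).
    Returns (node_counts, edge_weights):
      node_counts[m] = number of proteins containing motif m
      edge_weights[(a, b)], with a < b, = number of proteins containing both a and b
    """
    node_counts = Counter()
    edge_weights = Counter()
    for motif_list in protein_motifs.values():
        unique = sorted(set(motif_list))  # count each motif once per protein
        node_counts.update(unique)
        # combinations() of a sorted list yields each pair once, as (smaller, larger)
        edge_weights.update(combinations(unique, 2))
    return node_counts, edge_weights


# Toy usage with placeholder motif labels instead of real 36-AA strings.
toy_proteins = {"P1": ["M1", "M2", "M2"], "P2": ["M2", "M3"], "P3": ["M1", "M2"]}
nodes, edges = motif_cooccurrence_graph(toy_proteins)
```

The number of pairs per protein grows quadratically with its number of distinct motifs, so with overlapping 36-AA windows on long sequences this graph can become very dense; filtering rare motifs first is one way to keep it manageable.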
Stay tuned.
Reference: