Bioinformatics #6: STRINGdb — Protein-protein interaction network database

Michael Anekson
4 min read · Dec 25, 2022


This week, I am going to take a small detour and discuss STRINGdb, which I used for the MCL clustering algorithm last week. I think STRINGdb is as powerful as Metascape, and it has some advantages that Metascape does not have.

STRINGdb Objective

Figure 1. STRINGdb user interface. Source: STRINGdb.

According to their latest publication (Szklarczyk et al., 2021), STRINGdb aims to integrate all known and predicted functional and physical protein-protein interactions. It helps users check whether the proteins or genes in their list are associated with each other, because checking this manually would take a lot of research time. The input is the same as for Metascape: you only need a list of genes or proteins, and you have to select the species that you want to analyze. Pay attention to the species, because each species can give significantly different results. I have been in that situation myself.
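
The same two inputs, a gene list and a species, are also what a programmatic request needs. Below is a minimal sketch of how such a request could be put together for the STRING REST API. It assumes the `https://string-db.org/api/<format>/network` endpoint with the `identifiers`, `species`, and `caller_identity` parameters. The gene names and the `caller_identity` value are just examples I picked, and the species is given as an NCBI taxonomy ID (9606 for human).

```python
from urllib.parse import urlencode

STRING_API = "https://string-db.org/api"  # assumed base URL of the STRING REST API


def build_network_url(genes, species, output_format="tsv"):
    """Build a URL that asks STRING for the interaction network of a gene list.

    genes   -- list of gene or protein names
    species -- NCBI taxonomy ID (for example 9606 for human, 4932 for yeast)
    """
    params = {
        # STRING expects multiple identifiers separated by a carriage return
        "identifiers": "\r".join(genes),
        "species": species,
        # Hypothetical label that identifies the calling script
        "caller_identity": "stringdb_blog_example",
    }
    return f"{STRING_API}/{output_format}/network?{urlencode(params)}"


if __name__ == "__main__":
    # Example gene list chosen only for illustration
    print(build_network_url(["TP53", "MDM2", "CDKN1A"], species=9606))
```

Changing only the `species` argument sends the same names to a different organism, which is exactly where results can start to differ.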

Figure 2. The user interface after you enter the gene/protein list. Source: STRINGdb.

Once you enter the gene list, an interface like the one in Figure 2 appears. You don’t need to worry about this if you are a beginner. It is just an annotation check, which matters most when the list contains a gene or protein name that has multiple interpretations, such as the protein “OLE1”, which matches “OLE1”, “MGA2”, and “SPT23”. This usually happens because some gene names have different meanings in different animals, organs, cell types, or even conditions. The user just needs to tick the proteins that are relevant to their research. So the lesson from this part is that you have to know which gene you want to target, or at least know the proteins that commonly appear in your research topic.
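
The annotation check can also be pictured as a small table lookup: one query name, several candidate proteins. The sketch below groups candidates per query from a tab-separated table. The column names `queryItem`, `stringId`, and `preferredName` are my assumption about what the STRING `get_string_ids` output looks like, and the identifiers in the sample table are made-up placeholders, not real STRING IDs.

```python
import csv
import io
from collections import defaultdict


def group_matches(tsv_text):
    """Group candidate protein names by the query name that produced them."""
    reader = csv.DictReader(io.StringIO(tsv_text), delimiter="\t")
    matches = defaultdict(list)
    for row in reader:
        matches[row["queryItem"]].append(row["preferredName"])
    return dict(matches)


def ambiguous_queries(matches):
    """Keep only the queries that matched more than one protein."""
    return {query: names for query, names in matches.items() if len(names) > 1}


# Hypothetical sample table; the stringId values are placeholders.
SAMPLE_ROWS = [
    ["queryItem", "stringId", "preferredName"],
    ["OLE1", "4932.EXAMPLE1", "OLE1"],
    ["OLE1", "4932.EXAMPLE2", "MGA2"],
    ["OLE1", "4932.EXAMPLE3", "SPT23"],
    ["ACT1", "4932.EXAMPLE4", "ACT1"],
]
SAMPLE_TSV = "\n".join("\t".join(row) for row in SAMPLE_ROWS)

if __name__ == "__main__":
    print(ambiguous_queries(group_matches(SAMPLE_TSV)))
```

Only the queries returned by `ambiguous_queries` need a manual decision, which is the same job as ticking boxes in Figure 2.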

Figure 3. Protein-protein network analysis based on STRINGdb.

Once you click “continue” in Figure 2, the protein-protein interaction network is shown as in Figure 3. You can see which proteins interact and which do not, and you can also check a protein’s structure by looking inside its node. The edges come in several colors, and this is not purely for aesthetic purposes: the colors tell you whether an interaction is predicted or experimentally determined (Figure 4).

Figure 4. Meaning of the colored edges. Source: STRINGdb.

The legend in Figure 4 explains the meaning of the nodes and their interactions. You can check whether an interaction is already known: for example, a light blue edge marks an interaction taken from a curated database, and a pink edge marks one that was experimentally determined. Predicted interactions are divided into three types: gene neighborhood, gene fusions, and gene co-occurrence. Gene neighborhood means the genes sit close to each other on the genome, which often goes together with shared regulation (De & Babu, 2010). Gene fusion means that two genes have been found fused into a single gene in at least one genome. Co-occurrence means that the genes tend to be present, or absent, together across species. There is also an “Others” section, which covers textmining, co-expression, and protein homology. Textmining is based on text analysis of research articles, such as their titles and abstracts. Co-expression means the genes of the two proteins are expressed together, in the same or in other species. Protein homology is based on sequence similarity between the proteins.
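
To make the legend groups concrete, here is a small sketch that sorts the evidence behind one edge into the same three groups. The short column names (`escore`, `dscore`, `nscore`, `fscore`, `pscore`, `tscore`, `ascore`) are my assumption about how the per-channel scores are labeled in STRING's downloadable network table, and the example edge is invented for illustration. Protein homology is left out because I am not sure it has a score column of its own.

```python
# Assumed mapping from score column to (legend group, evidence channel)
CHANNELS = {
    "escore": ("known", "experimentally determined"),
    "dscore": ("known", "curated databases"),
    "nscore": ("predicted", "gene neighborhood"),
    "fscore": ("predicted", "gene fusions"),
    "pscore": ("predicted", "gene co-occurrence"),
    "tscore": ("others", "textmining"),
    "ascore": ("others", "co-expression"),
}


def evidence_for_edge(edge):
    """Return {legend group: [channels]} for every channel with a score above 0."""
    evidence = {}
    for column, (group, channel) in CHANNELS.items():
        # Scores may arrive as strings when read from a text table
        if float(edge.get(column, 0)) > 0:
            evidence.setdefault(group, []).append(channel)
    return evidence


if __name__ == "__main__":
    # Invented edge, only to show the shape of the result
    example_edge = {"escore": 0.9, "pscore": 0.1, "tscore": "0.5", "ascore": 0}
    print(evidence_for_edge(example_edge))
```

An edge drawn with several colors in Figure 3 would correspond to a result with several channels in it.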

Figure 5. Functional enrichment result. Source: STRINGdb.

Another result that you can get from this web application is the functional enrichment result. Like Metascape, STRINGdb draws on multiple databases, and you can see how many nodes contribute to each function. The real difference between Metascape and STRINGdb is that STRINGdb reports the false discovery rate as its adjusted p-value, while Metascape just uses the p-value score.
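
If you wonder what turning p-values into a false discovery rate involves, the sketch below shows one common way to do it, the Benjamini-Hochberg procedure. I am assuming this is the kind of correction behind the FDR column. The function is my own illustration, not code taken from STRINGdb, and the p-values are made up.

```python
def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg adjusted p-values in the original order."""
    n = len(p_values)
    # Indices of the p-values sorted from smallest to largest
    order = sorted(range(n), key=lambda i: p_values[i])
    adjusted = [0.0] * n
    running_min = 1.0
    # Walk from the largest p-value down so adjusted values never decrease with rank
    for rank in range(n, 0, -1):
        idx = order[rank - 1]
        running_min = min(running_min, p_values[idx] * n / rank)
        adjusted[idx] = running_min
    return adjusted


if __name__ == "__main__":
    # Made-up p-values for four enriched terms
    print(benjamini_hochberg([0.01, 0.04, 0.03, 0.005]))
```

The adjusted values are never smaller than the raw p-values, so a term that looks significant by p-value alone can stop being significant after the correction.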

Figure 6. MCL clustering. Source: STRINGdb.

For the last part of this article: you can apply MCL clustering to your PPI network just by ticking MCL clustering, adjusting the setting, and applying it. STRINGdb keeps this simple, and you can adjust the clustering method easily, which is different from Metascape. Metascape uses MCODE, but you will find it difficult to adjust the clustering settings there. However, if you are a beginner, you don’t have to worry about this.
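
For readers curious about what MCL does under the hood, here is a simplified sketch of the idea in plain Python: flow is repeatedly spread through the network (expansion) and then sharpened (inflation) until it settles. This is my own toy version, not STRINGdb's implementation. I am assuming that the `inflation` value is the kind of setting you adjust in the interface, and the small network at the bottom is invented.

```python
def normalize_columns(matrix):
    """Scale every column so that it sums to 1."""
    n = len(matrix)
    sums = [sum(matrix[i][j] for i in range(n)) for j in range(n)]
    return [[matrix[i][j] / sums[j] for j in range(n)] for i in range(n)]


def matmul(a, b):
    """Multiply two square matrices given as lists of lists."""
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]


def mcl(adjacency, inflation=2.0, max_iter=100, tol=1e-9):
    """Toy Markov clustering; returns clusters as lists of node indices."""
    n = len(adjacency)
    if n == 0:
        return []
    # Add self loops, then turn the network into column-wise flow probabilities
    flow = normalize_columns(
        [[float(adjacency[i][j]) + (1.0 if i == j else 0.0) for j in range(n)]
         for i in range(n)]
    )
    for _ in range(max_iter):
        expanded = matmul(flow, flow)  # expansion: let flow spread two steps
        inflated = normalize_columns(  # inflation: strengthen strong flow
            [[value ** inflation for value in row] for row in expanded]
        )
        change = max(abs(inflated[i][j] - flow[i][j])
                     for i in range(n) for j in range(n))
        flow = inflated
        if change < tol:
            break
    # Every row that still carries flow describes one cluster
    clusters = []
    for row in flow:
        members = [j for j, value in enumerate(row) if value > 1e-6]
        if members and members not in clusters:
            clusters.append(members)
    return clusters


if __name__ == "__main__":
    # Invented network: two triangles joined by a single edge between nodes 2 and 3
    bridged = [
        [0, 1, 1, 0, 0, 0],
        [1, 0, 1, 0, 0, 0],
        [1, 1, 0, 1, 0, 0],
        [0, 0, 1, 0, 1, 1],
        [0, 0, 0, 1, 0, 1],
        [0, 0, 0, 1, 1, 0],
    ]
    print(mcl(bridged, inflation=2.0))
```

In general, a higher inflation value tends to give more and smaller clusters, so it is worth trying a few values and checking whether the clusters still make biological sense.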

Overall, STRINGdb successfully shows that it is different and has several advantages over Metascape, from the way it presents its features to its fairly friendly user interface. However, I have to admit that Metascape has some advantages too, such as the variety of its databases, its strict clustering algorithms like MCODE, and a simple interface where one click is enough to show all the results.

References

  1. Szklarczyk, D., Gable, A. L., Nastou, K. C., Lyon, D., Kirsch, R., Pyysalo, S., Doncheva, N. T., Legeay, M., Fang, T., Bork, P., Jensen, L. J., & von Mering, C. (2021). The STRING database in 2021: customizable protein–protein networks, and functional characterization of user-uploaded gene/measurement sets. Nucleic Acids Research, 49(D1), D605–D612. https://doi.org/10.1093/nar/gkaa1074
  2. STRINGdb. https://string-db.org/cgi/input?sessionId=byqXPQmmoBSr&input_page_show_search=on
  3. De, S., & Babu, M. M. (2010). Genomic neighbourhood and the regulation of gene expression. Current Opinion in Cell Biology, 22(3), 326–333. https://doi.org/10.1016/j.ceb.2010.04.004
