Bioinformatics #2: Metascape, a simple but powerful functional enrichment tool

Michael Anekson
5 min read · Oct 29, 2022

Source: https://metascape.org/gp/index.html#/main/step1

In Bioinformatics #1, I introduced three bioinformatics applications, and Metascape was one of them. Now I want to go into more detail about Metascape and explain why it is such a well-known web application for functional enrichment. I call it well known because its most recent publication came out in 2019 and had already been cited 4,870 times when this article was written. That citation count suggests how important and powerful the web application is.

Input data

Figure 1. The input data can be in various formats.

The input for this web app is pretty simple: you provide your gene list either as an xls or txt file, or you type it directly into the white text box. You can use gene symbols (Cdx2, Sox2, Oct4), RefSeq accessions (NP_000024, XP_011514163), or Entrez Gene IDs (215, 1780, 6729, 154810), which makes the web app quite flexible about your input data. Maybe you are wondering why it accepts RefSeq accessions and Entrez Gene IDs, which look odd and are hard for a non-specialist to read. The answer is that results from high-throughput analyses such as microarray or RNA-seq do not always report gene symbols directly. Instead, they may use another identifier format such as RefSeq or Entrez Gene ID. That is why the Metascape authors support multiple input formats: to accommodate scientists from different backgrounds.
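To make the three identifier formats concrete, here is a minimal Python sketch that guesses which format each entry of a pasted gene list uses. The regular expressions and the function names (guess_id_type, parse_gene_list) are my own simplifying assumptions for illustration; they are not how Metascape itself recognizes identifiers.

```python
import re

# Assumption: RefSeq accessions look like a two-letter prefix, an underscore,
# digits, and an optional version suffix (for example NP_000024 or XP_011514163.1).
REFSEQ_RE = re.compile(r"^(NM|NR|NP|XM|XR|XP)_\d+(\.\d+)?$")
# Assumption: Entrez Gene IDs are plain integers.
ENTREZ_RE = re.compile(r"^\d+$")


def guess_id_type(identifier):
    """Return 'refseq', 'entrez', or 'symbol' for a single identifier."""
    token = identifier.strip()
    if REFSEQ_RE.match(token):
        return "refseq"
    if ENTREZ_RE.match(token):
        return "entrez"
    # Anything else is treated as a gene symbol in this sketch.
    return "symbol"


def parse_gene_list(text):
    """Split pasted text on commas and whitespace, then label each entry."""
    tokens = [t for t in re.split(r"[,\s]+", text) if t]
    return [(t, guess_id_type(t)) for t in tokens]


parsed = parse_gene_list("Cdx2, Sox2 NP_000024\n215,1780")
for identifier, id_type in parsed:
    print(identifier, id_type)
```

A check like this is handy before uploading, because a list that mixes formats by accident is easy to spot.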

Species feature

Figure 2. Species selection for the functional enrichment.

In biology, the same gene can be expressed differently across species; for example, it may not be expressed in the same organ or at the same moment in different species. To keep the functional enrichment result biologically meaningful, Metascape provides a species feature that lets you specify the species for the analysis. It may be a simple feature, but it is a powerful way to avoid misinterpretation and biologically irrelevant results.

Database

Figure 3. Metascape provides multiple databases for all kinds of users.

Functional enrichment always depends on a database, and I would say the database is the soul of functional enrichment. Without one, no functional enrichment program can analyze your input data. The next question is which databases you really need for your genes. Luckily, Metascape provides many databases, and you can choose among them by clicking “Custom analysis” on the previous page. My general advice for beginners is that the usual databases for functional enrichment are Gene Ontology (GO) and pathway databases such as KEGG and Reactome. Those databases are the basics of functional enrichment. Therefore, if you are not familiar with any database, just tick GO, KEGG, and Reactome.
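To see why the database matters, here is a minimal sketch of an over-representation test, which is a common way to score one database term against a gene list. It uses a one-sided hypergeometric test; I am not claiming this is exactly what Metascape runs internally. The gene names, set sizes, and the function name hypergeom_pvalue are hypothetical.

```python
from math import comb


def hypergeom_pvalue(background, in_term, in_list, overlap):
    """P(X >= overlap) when drawing in_list genes from a background
    of which in_term genes are annotated to the term."""
    total = comb(background, in_list)
    upper = min(in_term, in_list)
    tail = sum(
        comb(in_term, k) * comb(background - in_term, in_list - k)
        for k in range(overlap, upper + 1)
    )
    return tail / total


# Hypothetical toy data: a 10-gene background, one term with 3 member genes,
# and a 4-gene input list that shares 2 genes with the term.
term_genes = {"g1", "g2", "g3"}
gene_list = {"g1", "g2", "g8", "g9"}
background_size = 10

overlap = len(term_genes & gene_list)
p = hypergeom_pvalue(background_size, len(term_genes), len(gene_list), overlap)
print(f"overlap={overlap}, p={p:.3f}")  # 70/210, about 0.333
```

The term's gene set comes from the database, so without it there is nothing to compare your list against.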

Membership

Figure 4. Membership provides the prioritization that the user wants.

Membership is one of the good features in Metascape because it lets you prioritize or filter the pathways you really care about. In this case, as you can see in Figure 4, I search for anything related to cancer, so the result will later show only cancer-related functions. You can use this feature to find the results you are looking for.
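The idea behind a keyword-based membership search can be sketched in a few lines of Python. The tiny term database below is a hypothetical toy example, and the function name membership is my own; the sketch only illustrates matching a keyword such as “cancer” against term names and reporting which input genes belong to the matching terms.

```python
# Hypothetical toy database: term name -> member genes.
TERM_DB = {
    "Pathways in cancer": {"TP53", "KRAS", "EGFR"},
    "Cell cycle": {"TP53", "CDK1"},
    "Colorectal cancer": {"KRAS", "APC"},
}


def membership(genes, keyword, term_db=TERM_DB):
    """For each gene, list the terms whose name contains the keyword
    (case-insensitive) and that include the gene as a member."""
    keyword = keyword.lower()
    hits = {}
    for gene in genes:
        hits[gene] = sorted(
            term
            for term, members in term_db.items()
            if keyword in term.lower() and gene in members
        )
    return hits


result = membership(["TP53", "CDK1", "KRAS"], "cancer")
for gene, terms in result.items():
    print(gene, terms)
```

Genes with an empty list, such as CDK1 in this toy data, have no membership in any term that matches the keyword.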

Enrichment

Figure 5. Enrichment provides statistical settings that the user can adjust.

The Enrichment page exposes the statistical settings, which matter when the functional enrichment result is not good. Sometimes a gene list gives a poor result in which many terms are not statistically significant because their p-values are larger than the 0.01 cutoff. To adjust this, you can go to the Enrichment page and relax the cutoff to a p-value of less than 0.05, keeping in mind that a looser cutoff also lets in terms with weaker evidence. You can also edit the protein-protein interaction settings there. I will show what the protein-protein interaction output looks like in a later section.
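The effect of relaxing the cutoff from 0.01 to 0.05 can be illustrated with a small sketch. The term names and p-values below are made up for illustration, and significant_terms is a hypothetical helper, not part of Metascape.

```python
# Hypothetical enrichment output: (term name, p-value).
results = [
    ("term A", 0.0004),
    ("term B", 0.008),
    ("term C", 0.03),
    ("term D", 0.2),
]


def significant_terms(results, p_cutoff):
    """Keep only the terms whose p-value is below the cutoff."""
    return [name for name, p in results if p < p_cutoff]


strict = significant_terms(results, 0.01)
relaxed = significant_terms(results, 0.05)
print("p < 0.01:", strict)
print("p < 0.05:", relaxed)
```

In this toy data the relaxed cutoff adds one more term, which is the trade-off: more terms to look at, but the extra ones are supported by weaker evidence.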

Bar plot visualizes the significant functions affected by your gene list

Figure 6. Bar plot appearance in Metascape

Visualization is important in any functional enrichment application. Metascape provides a bar plot to show the significant functions affected by your gene list. I will explain the method behind it another time because it is too complicated to cover fully here. For now, the point is that you can use this bar plot to check the top 5 or top 10 most significant pathways.
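Picking the top terms for a bar plot usually comes down to ranking by significance. Here is a minimal sketch that ranks terms by -log10(p-value) and draws a plain-text bar for each one. The input values are hypothetical, and top_terms and text_bar_plot are my own helper names; Metascape's actual plot is produced by the web app, not by this code.

```python
from math import log10


def top_terms(results, n=5):
    """Rank (term, p-value) pairs by -log10(p), largest first, and keep n."""
    scored = [(name, -log10(p)) for name, p in results]
    scored.sort(key=lambda item: item[1], reverse=True)
    return scored[:n]


def text_bar_plot(scored, width=40):
    """Render one '#' bar per term, scaled so the top score spans `width`."""
    if not scored:
        return ""
    longest = max(score for _, score in scored)
    lines = []
    for name, score in scored:
        bar = "#" * max(1, round(width * score / longest))
        lines.append(f"{name:<10} {bar} {score:.2f}")
    return "\n".join(lines)


# Hypothetical enrichment output: (term name, p-value).
results = [("term A", 1e-2), ("term B", 1e-5), ("term C", 1e-3)]
top2 = top_terms(results, n=2)
print(text_bar_plot(top2))
```

A longer bar means a smaller p-value, so the most significant term sits at the top.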

Network plot for gene or protein interaction

Figure 7. Network plot in Metascape

Let’s say you already know which function is relevant to your research. Then you will have another question: how do the genes interact? To answer that question, Metascape provides a network plot that helps you interpret gene or protein interactions. With this visualization, you can understand the relationships between the molecules easily.

That’s all I have to explain about Metascape’s features. Overall, Metascape is a great application because of its simplicity: it is basically one click, because you just input your genes, click, and the result shows up. However, if you are an advanced user and want to adjust your result, you can do that with Custom analysis as well.

Reference

  1. Zhou, Y., Zhou, B., Pache, L., Chang, M., Khodabakhshi, A. H., Tanaseichuk, O., Benner, C., & Chanda, S. K. (2019). Metascape provides a biologist-oriented resource for the analysis of systems-level datasets. Nature Communications, 10(1), 1523. https://doi.org/10.1038/s41467-019-09234-6

Michael Anekson

A data analyst who cares about research publication and the scientist lifestyle