How to use Streamlit to convert misidentified date terms from Excel to updated gene names

A web tool, Gene Updater, built with Streamlit to help preserve scientific data integrity and reproducibility

Kuan Rong Chan, Ph.D.
Published in Omics Diary
2 min read · Jul 28, 2022

Gene Updater is a web tool built with Streamlit that converts terms auto-formatted by Excel back into updated gene names, which are more resilient to autoconversion. For more details, see our publication in Scientific Reports.

When gene expression data are opened in Excel, some gene symbols are automatically converted into dates. For instance, “SEPT1” (Septin 1) becomes “SEP-01” in Excel. This can affect downstream pathway analysis, because many pathway databases rely on gene symbols to detect pathway enrichment. To circumvent this problem, we used Streamlit to create a web tool called Gene Updater, which converts the date terms into the updated gene symbols recommended by the HUGO Gene Nomenclature Committee (HGNC). These updated symbols are more resilient to autoconversion by Excel. The web tool is hosted at: https://share.streamlit.io/kuanrongchan/date-to-gene-converter/main/date_gene_tool.py.
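The core idea can be sketched as a lookup from Excel-converted date terms back to current HGNC symbols. The snippet below is a minimal illustration, not the code in date_gene_tool.py: the dictionary name DATE_TO_GENE, its three entries and the upper-case “SEP-01” key style (taken from the example above) are assumptions, and a real mapping would need to cover every affected gene and whichever date format your Excel locale produces.

```python
# Hypothetical, partial lookup table: Excel-converted date term -> current HGNC symbol.
# Keys follow the "SEP-01" style used in the text; your Excel locale may differ.
DATE_TO_GENE = {
    "SEP-01": "SEPTIN1",
    "SEP-02": "SEPTIN2",
    "SEP-03": "SEPTIN3",
}


def restore_gene_names(symbols):
    """Return a new list where known date terms are replaced by gene symbols.

    Matching is case-insensitive and ignores surrounding whitespace; values
    that are not in the lookup table are returned unchanged.
    """
    restored = []
    for value in symbols:
        key = value.strip().upper()
        restored.append(DATE_TO_GENE.get(key, value))
    return restored


print(restore_gene_names(["SEP-01", "TP53", "sep-03 "]))
# ['SEPTIN1', 'TP53', 'SEPTIN3']
```

Mapping to the new symbols (SEPTIN1 rather than SEPT1) is what makes the fix stick: the updated names no longer look like dates, so reopening the file in Excel does not convert them again.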

Users can visit our GitHub repository and download the date_gene_tool.py file to see the Python code used to build the web tool. To run the web tool locally, users can download all the required Python packages and files from the GitHub repository or from https://zenodo.org/record/6845701#.YtvCYiX0rDs. After navigating in the terminal to the folder containing the downloaded files, users can simply type the following command:

streamlit run date_gene_tool.py

To illustrate the importance and utility of this web tool, we downloaded supplementary files published over the past month in several top journals. Of the 81 tables examined, 28 contained date terms in place of gene names, and 6 of these had no gene description columns, which makes the data difficult to interpret. Fortunately, most of these errors can simply be corrected with Gene Updater, ensuring consistency in data sharing and communication.
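A quick way to screen a supplementary table for this problem is to look for cells that match common Excel date patterns. The sketch below is a hypothetical heuristic, not the method used for the survey above: the regular expression only covers “SEP-01”-style (month first) and “1-Sep”-style (day first) values, the input is assumed to be CSV text, and find_date_terms is an illustrative name.

```python
import csv
import io
import re

# Hypothetical pattern for two common Excel date renderings of gene symbols,
# e.g. "SEP-01" / "Sep-1" (month first) and "1-Sep" / "01-SEP" (day first).
MONTHS = "JAN|FEB|MAR|APR|MAY|JUN|JUL|AUG|SEP|OCT|NOV|DEC"
DATE_TERM = re.compile(
    rf"^(?:(?:{MONTHS})-\d{{1,2}}|\d{{1,2}}-(?:{MONTHS}))$", re.IGNORECASE
)


def find_date_terms(csv_text, column):
    """Return (row_number, value) pairs whose `column` value looks like a date."""
    reader = csv.DictReader(io.StringIO(csv_text))
    hits = []
    # start=2 because row 1 of the file is the header.
    for row_number, row in enumerate(reader, start=2):
        value = (row.get(column) or "").strip()
        if DATE_TERM.match(value):
            hits.append((row_number, value))
    return hits


table = "gene,log2FC\nSEP-01,1.2\nTP53,-0.4\n1-Mar,0.8\n"
print(find_date_terms(table, "gene"))
# [(2, 'SEP-01'), (4, '1-Mar')]
```

A match only flags a candidate: a term such as “1-Mar” can come from more than one gene symbol (for example MARCH1 or MARC1), which is why tables without gene description columns are hard to repair.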

Beyond this application, users can adapt the code in the date_gene_tool.py file and specify the terms that need to be changed (see below for the section of code that can be edited to suit your needs). This allows other unintended formatting by Excel to be corrected as well.

Figure: The section of the date_gene_tool.py file (available on GitHub) that can be edited to create your own web tool for your specific needs
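Adapting the tool to other Excel conversions essentially means editing a table of terms to replace. As a minimal sketch of that idea, assuming CSV input and using only the Python standard library, the hypothetical CUSTOM_TERMS dictionary and replace_terms_in_csv function below (neither is taken from date_gene_tool.py) apply user-specified replacements to one column.

```python
import csv
import io

# Hypothetical user-editable table: add or remove entries to suit your data.
CUSTOM_TERMS = {
    "SEP-01": "SEPTIN1",
    "SEP-02": "SEPTIN2",
}


def replace_terms_in_csv(csv_text, column, terms):
    """Return CSV text with exact matches in `column` replaced via `terms`."""
    reader = csv.DictReader(io.StringIO(csv_text))
    if reader.fieldnames is None or column not in reader.fieldnames:
        raise ValueError(f"column {column!r} not found")
    out = io.StringIO()
    writer = csv.DictWriter(out, fieldnames=reader.fieldnames, lineterminator="\n")
    writer.writeheader()
    for row in reader:
        # Exact lookup: values not listed in `terms` are written back unchanged.
        row[column] = terms.get(row[column], row[column])
        writer.writerow(row)
    return out.getvalue()


print(replace_terms_in_csv("gene,score\nSEP-02,3\nTP53,1\n", "gene", CUSTOM_TERMS))
```

Exact matching is a deliberate choice here: it avoids rewriting legitimate values that merely contain one of the terms as a substring.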

For more information about the web tool, users can read the scientific article published in Scientific Reports. To find out more about our work, please visit kuanrongchan.com. This work was done at Duke-NUS Medical School. Special thanks to Clara Koh and Justin Ooi for creating and designing the web tool.

Kuan Rong Chan, PhD, Senior Principal Research Scientist at Duke-NUS Medical School. Virologist | Data Scientist | Loves mahjong | Website: kuanrongchan.com