Google Summer of Code 2018 @ Stemformatics, Report
A big thank you to my mentors, Chris Pacheco and Isha Nagpal, for guiding and supporting me throughout the project. Thanks also to Rowland Mosbergen for his guidance and for getting me started. It has been a pleasure working with Stemformatics and an enjoyable experience!
What is Stemformatics?
Stemformatics is an online portal, running on Pyramid, a Python-based web framework, which enables stem cell biologists to visualize, analyze and explore interesting datasets quickly and easily. It gives access to a collection of public experiments that describe human and mouse stem cells. Stem cells are cells that can differentiate into many different cell types. Stemformatics makes this data easy to search and to export.
The website offers another way to understand what Stemformatics is:
Stemformatics is not a substitute for good collaboration between bioinformaticians and stem cell biologists but is a stepping stone towards that collaboration.
The Project
Bitbucket Repository : https://bitbucket.org/stemformatics/s4m_pyramid/src/annotation_page_clean_up_master/
The primary goal of the project was to make it easier to enter biological data and to annotate it. This was done by improving features in the annotation table, integrating an ontology lookup service, and creating a summary table page where the data can be reviewed quickly and edited.
Project Progress
The project was divided into 3 parts. The first was to improve the annotation table and integrate the ontology lookup service, the second was to create a summary table page, and the third was to link the summary page to the annotation page.
Phase 1: Annotation Table Improvements https://gsoc2.stemformatics.org/admin/annotate_dataset?ds_id=3000
The annotation table is something like an Excel sheet where the data annotator can input data after conducting experiments. The table runs on Handsontable, a JavaScript spreadsheet library. As the version in use was old, my first task was to upgrade to the newer version, taking care that nothing broke in the process.
The next task was to fix a bug in the table. The annotation table has a large number of columns, and since navigating through them is difficult, a feature to show or hide columns had already been implemented. The bug was that, at certain times, hiding a column left the column headers out of step with the column data. After updating Handsontable to the latest version available, I used its afterRender hook to perform the hiding and to trigger the header checklist, which solved the bug.
Navigating through so much data is difficult. To make searching easier, I implemented a row filter for the table.
Data in certain columns needed to follow an ontology, so ontology list text files were created for these columns (parental cell type, final cell type, sex, organism) and autocomplete was implemented. When entering data, the annotator can choose an option from a dropdown, whose list is loaded from the text files when the Handsontable is loaded. If a new entry is typed instead, it is detected automatically and saved to the corresponding list text file.
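The autocomplete behaviour described above can be sketched roughly as follows. This is a minimal illustration in Python, not the project's actual code: the function names, the one-term-per-line file layout, the case-insensitive prefix matching and the `sex.txt` file name are all assumptions made for the example.

```python
import tempfile
from pathlib import Path


def load_terms(path: Path) -> list[str]:
    """Read one term per line from a list file, skipping blank lines."""
    if not path.exists():
        return []
    lines = path.read_text(encoding="utf-8").splitlines()
    return [line.strip() for line in lines if line.strip()]


def suggest(terms: list[str], typed: str) -> list[str]:
    """Return the terms that start with the typed text (case-insensitive)."""
    prefix = typed.strip().lower()
    return [term for term in terms if term.lower().startswith(prefix)]


def save_if_new(path: Path, entry: str) -> bool:
    """Append the entry to the list file unless it is empty or already known."""
    entry = entry.strip()
    if not entry:
        return False
    if entry.lower() in {term.lower() for term in load_terms(path)}:
        return False
    existing = path.read_text(encoding="utf-8") if path.exists() else ""
    # Make sure the new entry starts on its own line.
    separator = "" if existing == "" or existing.endswith("\n") else "\n"
    with path.open("a", encoding="utf-8") as handle:
        handle.write(separator + entry + "\n")
    return True


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        sex_list = Path(tmp) / "sex.txt"  # hypothetical list file name
        sex_list.write_text("male\nfemale\n", encoding="utf-8")
        print(suggest(load_terms(sex_list), "fe"))
        print(save_if_new(sex_list, "unknown"))
        print(load_terms(sex_list))
```

Keeping the lists in plain text files means a new term becomes available in the dropdown the next time the table is loaded, without a schema change.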
These were some of the main points I worked on in the first phase of the project.
Phase 2: Creation of Summary Table https://gsoc2.stemformatics.org/datasets/summary_table?ds_id=3000
The data displayed in the summary table is parsed from GEO (Gene Expression Omnibus) and stored in geo_biosamples_metadata, a table in the psql database. While the annotation table is where the annotator enters data, the summary table holds the data from GEO, which the annotator can refer to while filling in the annotation table. This data can also be edited by the annotator.
I implemented the following features in the summary table:
- Ungrouped Data Mode: gives an overview of the data and makes editing specific values easier.
- Grouped Data Mode: useful for editing data in bulk. Grouping is done based on metadata value.
- Filter: gets data for a particular metadata name. Once the form is submitted, it redirects to a new page where the summary table contains only data for the selected metadata name.
- Quick Search: filters rows quickly to get to a specific entry.
- Save: saves changes directly to the database via an AJAX request. Saves to the geo_biosamples_metadata table in the psql database.
- Save (Annotation Table): sends data from the summary table to the annotation table. Saves to the biosamples_metadata table in the psql database.
- CSV, JSON: downloads a copy of the table as it appears on the screen.
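To illustrate the idea behind Grouped Data Mode, here is a small Python sketch of grouping rows by metadata value and editing a whole group at once. The row layout `(gsm_id, metadata_name, metadata_value)` and the function names are hypothetical and chosen only for this example. They do not reflect the real table schema.

```python
from collections import defaultdict

# Hypothetical row layout: (gsm_id, metadata_name, metadata_value)
Row = tuple[str, str, str]


def group_by_value(rows: list[Row]) -> dict[tuple[str, str], list[str]]:
    """Collect the sample ids that share the same metadata name and value."""
    groups: dict[tuple[str, str], list[str]] = defaultdict(list)
    for gsm_id, name, value in rows:
        groups[(name, value)].append(gsm_id)
    return dict(groups)


def bulk_edit(rows: list[Row], name: str, old_value: str, new_value: str) -> list[Row]:
    """Return a copy of the rows with one group's value replaced everywhere."""
    return [
        (gsm_id, row_name, new_value if (row_name == name and value == old_value) else value)
        for gsm_id, row_name, value in rows
    ]


if __name__ == "__main__":
    rows = [
        ("GSM1", "Sex", "F"),
        ("GSM2", "Sex", "F"),
        ("GSM3", "Sex", "M"),
    ]
    print(group_by_value(rows))
    print(bulk_edit(rows, "Sex", "F", "female"))
```

Grouping on the value means a single edit can correct every sample that shares it, which is what makes the grouped mode convenient for bulk changes.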
Phase 3a: Implementation of Ontology Lookup Service https://gsoc2.stemformatics.org/admin/annotate_dataset?ds_id=3000
The Ontology Lookup Service (OLS), as the name suggests, is used to look up biomedical ontology terms. It is developed and maintained by EMBL-EBI. A really useful feature of OLS is the ols-treeview, which shows the biomedical ontologies for a query in tree format. Data annotators use this service while annotating, but repeatedly going to the OLS site is tedious. So I used the ols-treeview widget from biojs to implement the tree view on the annotation table page itself.
To fetch search results for a query, I used EBI's API. The results are then split across 3 pages using pagination. Clicking on any one of the search results shows the tree view of that term on the right.
I placed this entire section just above the annotation table, making it convenient for the data annotator. If more information is needed, the EBI search button runs the query directly and shows the results in a new tab on the EBI Ontology Lookup Service website.
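The pagination step can be sketched as below. This is only an illustration in Python under stated assumptions: the fetch from EBI's API is left out, the `paginate` helper is hypothetical, and splitting into near-equal pages is an assumption, since the page size actually used is not stated here.

```python
import math


def paginate(results: list, pages: int = 3) -> list[list]:
    """Split the results into at most `pages` consecutive chunks of near-equal size."""
    if pages < 1:
        raise ValueError("pages must be at least 1")
    if not results:
        return []
    size = math.ceil(len(results) / pages)
    return [results[start:start + size] for start in range(0, len(results), size)]


if __name__ == "__main__":
    # Placeholder labels standing in for search results.
    hits = [f"term_{i}" for i in range(10)]
    for number, page in enumerate(paginate(hits), start=1):
        print(number, page)
```

With few results the helper returns fewer than three pages rather than padding with empty ones.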
Phase 3b: Linking of certain data from summary table to the annotation table
Some of the data in the summary table needed to be imported into the annotation table, and copying it manually is not very efficient. Therefore, certain data (Age, Sex, Organism) is sent to the biosamples_metadata table in the database and is then reflected in the annotation table. The data is sent on clicking the Save to Annotation Table button on the summary table.
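The linking step amounts to selecting the rows for the chosen fields and writing them into the annotation store. The Python sketch below models that with plain dictionaries. The keys `gsm_id`, `metadata_name` and `metadata_value`, and the dictionary standing in for the biosamples_metadata table, are assumptions for illustration rather than the real schema.

```python
# Fields copied from the summary table, as listed in the text above.
LINKED_FIELDS = {"Age", "Sex", "Organism"}


def rows_to_copy(summary_rows: list[dict]) -> list[dict]:
    """Keep only the summary rows whose metadata name is one of the linked fields."""
    return [row for row in summary_rows if row["metadata_name"] in LINKED_FIELDS]


def save_to_annotation(annotation: dict, summary_rows: list[dict]) -> dict:
    """Insert or overwrite linked values in a dict standing in for biosamples_metadata."""
    for row in rows_to_copy(summary_rows):
        annotation[(row["gsm_id"], row["metadata_name"])] = row["metadata_value"]
    return annotation


if __name__ == "__main__":
    summary = [
        {"gsm_id": "GSM1", "metadata_name": "Sex", "metadata_value": "female"},
        {"gsm_id": "GSM1", "metadata_name": "Tissue", "metadata_value": "blood"},
        {"gsm_id": "GSM1", "metadata_name": "Organism", "metadata_value": "Homo sapiens"},
    ]
    print(save_to_annotation({}, summary))
```

Restricting the copy to a fixed set of field names keeps the annotation table from being filled with metadata it has no column for.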
Pull Requests (Reviewed and merged to annotation_page_clean_up_master branch)
- Autocomplete and search filter implementation in annotation table, solved column shuffling bug, added 4 columns for the annotation table
- Summary Table
- Summary Table (Save feature complete)
- OLS Implementation on the annotation table
- On click nodes in treeview display data
- OLS for Annotation Table Page
- Ontology Links and OLS search pagination
- Addition of gsm_ids in summary table and Faster OLS Search
- Save to annotation table
- Save new entry in annotation table to existing autocomplete lists
- Added scripts and implemented addition of parental cell type to db
Future Work
- Copy all the data attributes from the summary table to the annotation table.
- Add the ability to pull data from ArrayExpress into the summary table (similar to what is done for GEO).
- Port the PHP validator code to Pyramid (within Stemformatics).
Conclusion
GSoC has been a great experience for me. I’m really grateful that I got to be part of such an amazing organisation. I enjoyed the last 3 months of writing and debugging code. Knowing that the code I have written is going to make things easier for people is indeed fulfilling.