GSoC 2018 — InterMine Data Browser: Progress so far (Part I)

It has been a long time since my last post, and I apologize for that. To make up for it, I'll introduce some of the new features we have been developing as the second evaluation phase approaches. Because we have implemented so many features, I'll split the explanations across several posts, since one very long post would be boring for most of you.

As a brief reminder, my project's goal is to build a faceted search tool that displays data from InterMine databases, allowing users to search easily across the different mines available in the InterMine ecosystem without an extensive knowledge of the data model. If you want to try the browser yourself while reading this post, you can access it at the following URL: http://im-browser-prototype.herokuapp.com/ ;).

InterMine ontology concepts to be used as filters

As you can see in the GitHub issues for milestone 1 and milestone 2 (which is how we have been organizing the work), one of the first features was to add filters whose fields are filled in automatically. To do so, we looked at the InterMine ontology, which defines the terminological concepts used to model the entities stored in InterMine databases (see OWL for more info). As a result, the concepts shown in the figure on the left were selected to create filters in the browser.

Basically, the filters to add were:

  1. Filter by ontology term from GO annotations
  2. Filter by dataset name
  3. Filter by location
  4. Filter by pathway
  5. Filter by protein domain name

These filters must work as an OR constraint among the values selected within the same filter, and as an AND constraint between different filters. So if a user filters by 2 GO annotations (namely A and B) and a dataset name (namely C), the data should be filtered with the constraint (A OR B) AND C.
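To make the combination rule concrete, here is a minimal Python sketch that evaluates it over in-memory rows. It is not the browser's actual code: the row shape (each field mapped to a set of values, since a gene can carry several GO annotations), the field names `go` and `dataset`, and the function `matches` are hypothetical, and treating an empty selection as "no constraint" is an assumption.

```python
from typing import Dict, Set

# Hypothetical row shape: field name -> set of values the row has.
Row = Dict[str, Set[str]]


def matches(row: Row, filters: Dict[str, Set[str]]) -> bool:
    """AND across filters; within one filter, OR across its selected values.

    A filter with nothing selected is treated as no constraint (assumption).
    """
    return all(
        not selected or bool(row.get(field, set()) & selected)
        for field, selected in filters.items()
    )


rows = [
    {"go": {"A"}, "dataset": {"C"}},
    {"go": {"B", "D"}, "dataset": {"C"}},
    {"go": {"A"}, "dataset": {"E"}},
    {"go": {"D"}, "dataset": {"C"}},
]
# (A OR B) AND C
filters = {"go": {"A", "B"}, "dataset": {"C"}}
result = [r for r in rows if matches(r, filters)]
```

With these selections only the first two rows pass: the third fails the dataset constraint C, and the fourth has neither A nor B.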

For the first filter, we implemented an input box with typeahead, where the user can search for GO annotations and click the desired one. Again, this helps users without an extensive knowledge of the data model to query the desired data correctly.
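A typeahead box essentially narrows a list of candidate terms as the user types. The Python sketch below assumes the candidate GO term names are already available in memory and matches them case-insensitively, listing names that start with the query before names that merely contain it. The function `typeahead`, the `limit` parameter and the ranking rule are illustrative assumptions; the real browser may fetch its suggestions from the mine instead.

```python
from typing import List


def typeahead(query: str, terms: List[str], limit: int = 5) -> List[str]:
    """Suggest up to `limit` terms containing `query` (case-insensitive).

    Terms that start with the query are ranked before other matches.
    """
    q = query.strip().lower()
    if not q:
        return []  # nothing typed yet: no suggestions
    prefix = [t for t in terms if t.lower().startswith(q)]
    inner = [t for t in terms if q in t.lower() and not t.lower().startswith(q)]
    return (prefix + inner)[:limit]


terms = ["DNA repair", "DNA replication", "mitochondrial DNA repair", "cell cycle"]
print(typeahead("dna rep", terms))
# ['DNA repair', 'DNA replication', 'mitochondrial DNA repair']
```

The same helper would serve the pathway and protein domain name filters, since they only differ in the list of candidate names.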

GO Annotation filter

Next, for the datasets, we implemented a multiple-checkbox filter, since the number of different datasets is not that large. Also, to avoid taking up unnecessary space initially, a "View more" button below the datasets reveals the remaining ones when the user clicks it.
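The checkbox list and its "View more" button come down to two small pieces of state: which datasets are checked and whether the list is expanded. The Python sketch below models them; `INITIAL_VISIBLE`, `visible_datasets`, `toggle` and the placeholder dataset names are hypothetical, and the real component may store this state differently.

```python
from typing import FrozenSet, List

INITIAL_VISIBLE = 5  # hypothetical: how many datasets are shown before "View more"


def visible_datasets(datasets: List[str], expanded: bool) -> List[str]:
    """Datasets to render: the first few until "View more" is clicked, then all."""
    return datasets if expanded else datasets[:INITIAL_VISIBLE]


def toggle(selected: FrozenSet[str], name: str) -> FrozenSet[str]:
    """Flip one checkbox; the checked names form the OR group of this filter."""
    return selected - {name} if name in selected else selected | {name}


datasets = [f"dataset {i}" for i in range(1, 9)]  # placeholder names
selected: FrozenSet[str] = frozenset()
selected = toggle(selected, "dataset 2")
selected = toggle(selected, "dataset 7")
selected = toggle(selected, "dataset 2")  # unchecking removes it again
```

Keeping the selection as an immutable set makes each click produce a new value, which is easy to compare when deciding whether the table needs to be refreshed.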

Dataset name filter

Furthermore, for the location filter, the user can specify a chromosome, so the resulting rows must lie on that chromosome, along with a start and an end position. The start position is applied as a ≥ constraint, whereas the end position is applied as a ≤ constraint.
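The location constraints can be written as a single predicate. The Python sketch below assumes the ≥ and ≤ bounds are compared against each row's own start and end coordinates, and that any field the user leaves empty is simply not checked; the `Location` record, `location_matches` and the sample coordinates are hypothetical.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Location:
    """Hypothetical location of one row: chromosome plus coordinates."""
    chromosome: str
    start: int
    end: int


def location_matches(loc: Location, chromosome: Optional[str] = None,
                     start: Optional[int] = None, end: Optional[int] = None) -> bool:
    """Apply the location filter; a bound left as None is skipped (assumption)."""
    if chromosome is not None and loc.chromosome != chromosome:
        return False
    if start is not None and not loc.start >= start:  # start position: >=
        return False
    if end is not None and not loc.end <= end:  # end position: <=
        return False
    return True


features = [Location("2L", 100, 500), Location("2L", 400, 900), Location("X", 150, 300)]
hits = [f for f in features if location_matches(f, chromosome="2L", start=50, end=600)]
```

Under this reading only features lying entirely inside the window pass; the second feature overlaps the window but ends past position 600, so it is excluded.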

Locations filter

Moreover, for the pathways filter, we followed the same approach as for the GO annotation filter, letting the user search (with typeahead) for a certain pathway to filter the table. The Protein Domain Name filter works in the same way.

Pathways filter
Protein Domain Name filter

In addition, we have implemented support for combining multiple filters: each time the user chooses a filter, it is added to a list (see the green box) with an 'x' button on its right (see the orange box) that removes it.
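The list of applied filters can be modeled as an ordered collection of (filter type, value) pairs, where the 'x' button removes one pair. If the browser builds an InterMine path query from this list (the post does not say how the query is built), grouping the pairs by filter type also yields a constraint-logic string over lettered constraint codes in the style of `(A or B) and C`. The class `ActiveFilters`, its methods and the example values are hypothetical.

```python
from string import ascii_uppercase
from typing import Dict, List, Tuple


class ActiveFilters:
    """Hypothetical model of the list of applied filters shown in the UI."""

    def __init__(self) -> None:
        self._items: List[Tuple[str, str]] = []  # (filter type, value), in order added

    def add(self, kind: str, value: str) -> None:
        if (kind, value) not in self._items:  # ignore duplicate selections
            self._items.append((kind, value))

    def remove(self, kind: str, value: str) -> None:
        # What clicking the 'x' next to an applied filter would trigger.
        if (kind, value) in self._items:
            self._items.remove((kind, value))

    def grouped(self) -> Dict[str, List[str]]:
        """Values grouped by filter type, in the order each type first appeared."""
        groups: Dict[str, List[str]] = {}
        for kind, value in self._items:
            groups.setdefault(kind, []).append(value)
        return groups

    def constraint_logic(self) -> str:
        """Codes OR-ed inside each filter type and AND-ed across types."""
        letters = iter(ascii_uppercase)  # this sketch supports at most 26 values
        clauses = []
        for values in self.grouped().values():
            codes = [next(letters) for _ in values]
            clause = " or ".join(codes)
            clauses.append(f"({clause})" if len(codes) > 1 else clause)
        return " and ".join(clauses)


active = ActiveFilters()
active.add("go", "DNA repair")
active.add("dataset", "example dataset")
active.add("go", "DNA replication")
print(active.constraint_logic())  # (A or B) and C
active.remove("go", "DNA replication")
print(active.constraint_logic())  # A and B
```

Codes are assigned per filter type rather than in click order, so values of the same filter always end up in the same OR group.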


And that’s all for this part! Stay tuned for the next one (this week, hopefully), where more features will be described ;).

If you want to give some feedback, you can open a new issue in the GitHub repo or just email me at adrianrodriguezbazaga@gmail.com. I'll be happy to receive new ideas or suggestions!