Exploring the COVID-19 network of scientific research with SciSight

By Tom Hope, Jon Borchardt, Jason Portenoy, Kishore Vasan, and Jevin West

May 6, 2020 · 6 min read
Using SciSight from AI2 for exploratory search and network visualization of the research groups working on COVID-19.

Three months into the coronavirus pandemic, the world’s scientific knowledge of the SARS-CoV-2 virus is rapidly expanding. Reports of potential vaccines and treatments sprout up almost daily. Thousands of papers have been pouring into Semantic Scholar’s COVID-19 Open Research Dataset (CORD-19), a collection of nearly 60,000 scientific publications of potential relevance to the topic, both historical and cutting-edge.

Artificial intelligence researchers have also mobilized, building a flurry of search engines and knowledge bases. Standing out according to recent news is BenevolentAI, which built a biomedical knowledge graph based in part on the scientific literature, and used it to discover a potential drug to repurpose for COVID-19 treatment. But the AI could only go so far, and human intelligence was needed to get across the finish line. “It is not like we have this giant button and we just smack it and stuff comes out the other end,” an engineer was quoted as saying. The team, including a pharmacological expert, used visualization tools such as a “multicolored map” and “a kind of digital flow chart” to identify the relevant genes and drugs, eventually landing at a drug previously used to treat arthritis.

Despite the natural human inclination toward visual information, the predominant way scientists search and consume research is through lists of articles in academic search engines, with scarce visualization features. While search engines are a powerful tool for quickly finding documents relevant to a query, they are mostly not geared for exploration and discovery — investigating a knowledge base, finding out the unknown unknowns. A recent review of scholarly visualization tools states that they are “applied relatively rarely” in information retrieval, and that useful and user-friendly science mapping systems are difficult to design.

Exploratory search and visualization for science

To help accelerate scientific discovery with visualization, last month we launched SciSight, a framework of exploratory search and visualization tools for the COVID-19 literature. The first version of SciSight supported exploring associations between biomedical concepts appearing in the literature. In preliminary user interviews, the tool was found helpful in discovery-oriented search. We now release two important updates of SciSight.

SciSight exploratory faceted paper search. Selecting a facet automatically suggests new search refinements. Here, we search for Intervention: Ribavirin (an anti-viral drug), together with Patient Characteristic: Immunocompromised.

The first is an exploratory faceted paper search. When a user searches for a topic or an author, SciSight suggests refinements based on the top co-associations with the initial search. This feature also helps prevent fixation on an initial topic and facilitates associative exploration: navigating an area of interest by viewing associated topics (or authors), which is especially helpful when users do not have a pinpointed query in mind. In addition, the number of relevant publications is shown over time, possibly revealing trends for specific facets (e.g., a topic, author, organization, or a combination). Users can adjust the time range, and the papers and facets displayed update accordingly.
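To make the idea of co-association suggestions concrete, here is a minimal sketch in Python. It is not SciSight's implementation: the toy records, the facet names, and the functions `matching_papers`, `suggest_refinements`, and `papers_per_year` are all hypothetical, and ranking by raw co-occurrence count is an assumption made for illustration.

```python
from collections import Counter

# Hypothetical toy records: each paper carries a year and a set of
# (facet type, value) tags. These are illustrative, not CORD-19 data.
PAPERS = [
    {"year": 2018, "facets": {("intervention", "ribavirin"), ("condition", "mers")}},
    {"year": 2019, "facets": {("intervention", "ribavirin"), ("patient", "immunocompromised")}},
    {"year": 2020, "facets": {("intervention", "ribavirin"), ("patient", "immunocompromised"),
                              ("condition", "covid-19")}},
    {"year": 2020, "facets": {("intervention", "remdesivir"), ("condition", "covid-19")}},
]


def matching_papers(papers, selected, year_range=None):
    """Return papers tagged with every selected facet, optionally within a year range."""
    hits = [p for p in papers if selected <= p["facets"]]
    if year_range is not None:
        first, last = year_range
        hits = [p for p in hits if first <= p["year"] <= last]
    return hits


def suggest_refinements(papers, selected, top_n=3, year_range=None):
    """Rank facets not yet selected by how often they co-occur with the selection."""
    counts = Counter()
    for paper in matching_papers(papers, selected, year_range):
        counts.update(paper["facets"] - selected)
    # Sort by descending count, then alphabetically so ties are deterministic.
    ranked = sorted(counts.items(), key=lambda item: (-item[1], item[0]))
    return ranked[:top_n]


def papers_per_year(papers, selected, year_range=None):
    """Count matching papers per year (the data behind a publications-over-time chart)."""
    years = Counter(p["year"] for p in matching_papers(papers, selected, year_range))
    return dict(sorted(years.items()))


if __name__ == "__main__":
    selection = {("intervention", "ribavirin")}
    print(suggest_refinements(PAPERS, selection))
    print(papers_per_year(PAPERS, selection, year_range=(2019, 2020)))
```

In this toy data, selecting Intervention: Ribavirin ranks Patient: Immunocompromised first, because it co-occurs with ribavirin in two papers. Narrowing the year range changes both the counts and the suggestions, which mirrors the time-range behavior described above.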

Exploring the network of science

On the face of it, the science around COVID-19 is all about biomedical knowledge on genes, proteins, drugs, diseases, vaccinations, etc. But science is of course very much a human endeavor, with sociological undercurrents that have tremendous effects that are studied in science of science research. We want to leverage this social structure to bridge groups and disciplines, potentially facilitating new collaborations and discovery of methods, problems and directions other scientists are working on.

Silos of knowledge exist throughout the literature, and the sheer number of papers coming out every year in biomedicine makes it hard to keep track, let alone find out about new groups and new fields. This lack of visibility beyond one’s immediate sphere also leads to redundant work. During times of emergency, when knowledge-sharing and collaboration are vital, these problems are even more pronounced.

To facilitate exploratory search within the COVID research community, AI2 is working with the University of Washington’s DataLab to build an interactive visualization to help researchers make connections to other groups, methods and ideas in the literature. Unlike most bibliographic visualizations, we focus on groups and their connections, and integrate this social graph with exploratory faceted search.

Users can find what groups are working on what directions by searching topics, affiliations, or authors. As with the faceted search interface, each selected search query automatically prompts new suggested queries that are associated with the original, to suggest more groups and topics to explore. For example, a virologist may be interested in searching for groups connected to the University of Oxford. When searching for Oxford, the Ebola virus topic moves up the list of associated concepts. The scientist can select it to further refine the search.

Search suggestions. Searching for an affiliation (Oxford) moves Ebola Virus up the list of top associated topics.

The notion of a research group is somewhat elusive — scientists can belong to several groups depending on who they work with and on what. Here, we extract groups automatically from the co-authorship network of Semantic Scholar’s CORD-19 papers with graph-mining, in this case using an overlapping community detection approach so that authors can belong to multiple groups. We initially focus on authors who have been recently active — having at least one paper since 2017.
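As a rough illustration of how overlapping groups can be mined from a co-authorship network, the sketch below uses a deliberately simple stand-in technique, not the community detection method SciSight uses (the post does not name it). Each paper's author list is treated as a team, teams that share at least `min_shared` authors are merged, and an author who bridges two otherwise separate teams ends up in both groups. The toy data, the function names, and the `min_shared` threshold are all assumptions.

```python
from collections import defaultdict

# Hypothetical toy data: (publication year, set of author names) per paper.
PAPERS = [
    (2018, {"A", "B", "C"}),
    (2019, {"B", "C", "D"}),
    (2020, {"D", "E", "F"}),
    (2020, {"E", "F", "G"}),
    (2015, {"X", "Y"}),
]


def recently_active(papers, since=2017):
    """Authors with at least one paper published in or after `since`."""
    return {author for year, authors in papers if year >= since for author in authors}


def overlapping_groups(papers, since=2017, min_shared=2):
    """Merge paper teams that share >= min_shared authors; authors may land in several groups."""
    active = recently_active(papers, since)
    # Keep only recently active authors, and drop teams too small to form a group.
    teams = [authors & active for _, authors in papers]
    teams = [team for team in teams if len(team) >= 2]

    # Union-find over teams: connected teams end up with the same root.
    parent = list(range(len(teams)))

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving
            i = parent[i]
        return i

    for i in range(len(teams)):
        for j in range(i + 1, len(teams)):
            if len(teams[i] & teams[j]) >= min_shared:
                parent[find(i)] = find(j)

    groups = defaultdict(set)
    for i, team in enumerate(teams):
        groups[find(i)] |= team
    return sorted(sorted(group) for group in groups.values())


if __name__ == "__main__":
    for group in overlapping_groups(PAPERS):
        print(group)
```

Here author D appears in both resulting groups, because D shares only a single co-author link with each side, while X and Y are dropped because their only paper predates 2017. A production system would more likely rely on an established overlapping community detection algorithm over the full graph.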

Given a query, we retrieve relevant groups and their connections. Each group is visualized as a “box”, each box showing salient authors, affiliations, and topics in a group. Boxes are color-coded to reflect relevance to the initial query — aiming to strike a balance between the relevance and diversity of the results shown. Users may select how many groups to view, zoom in/out, click a group and scroll down to see more detailed information and the group’s papers. Continuing our example, the virologist is now able to see not only groups directly affiliated with Oxford and Ebola, but also “neighboring” groups based on shared authors or shared topics.

The network of research groups working on related concepts, as discovered by SciSight.
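One common way to balance relevance against diversity when choosing which groups to display is greedy maximal marginal relevance (MMR). The post does not say which method SciSight uses, so the sketch below is only an assumed illustration: the group names, the topic sets, the Jaccard similarity, and the `trade_off` parameter are all hypothetical.

```python
# Hypothetical toy data: each group is summarized by a set of topics.
GROUPS = {
    "g1": {"ebola", "vaccine", "trial"},
    "g2": {"ebola", "vaccine", "trial", "safety"},
    "g3": {"ebola", "bats"},
}


def jaccard(a, b):
    """Jaccard similarity between two sets (0.0 when both are empty)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def select_groups(groups, query_topics, k=2, trade_off=0.5):
    """Greedy MMR: repeatedly pick the group that is relevant to the query
    but not too similar to the groups already chosen.

    trade_off=1.0 ranks purely by relevance; lower values favor diversity.
    """
    remaining = dict(groups)
    chosen = []
    while remaining and len(chosen) < k:

        def score(name):
            relevance = jaccard(remaining[name], query_topics)
            redundancy = max(
                (jaccard(remaining[name], groups[picked]) for picked in chosen),
                default=0.0,
            )
            return trade_off * relevance - (1 - trade_off) * redundancy

        # Iterate in sorted order so ties are broken deterministically.
        best = max(sorted(remaining), key=score)
        chosen.append(best)
        del remaining[best]
    return chosen


if __name__ == "__main__":
    query = {"ebola", "vaccine"}
    print(select_groups(GROUPS, query, trade_off=0.5))
    print(select_groups(GROUPS, query, trade_off=1.0))
```

With an even trade-off, the second slot goes to g3 rather than g2: g2 is more relevant, but it nearly duplicates g1, which was already selected. Setting `trade_off=1.0` falls back to a plain relevance ranking.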

Finally, links between groups represent topical affinity and social affinity. Hovering over a green link shows the top shared authors between two groups, and hovering over a purple link shows the top shared topics. To find out which researchers work in both groups, or which major topics of interest they share, our virologist simply hovers over the corresponding edge; hovering over a purple edge, for example, reveals that two groups are both focusing on MERS.

Exploring the connection between research groups with SciSight.
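The two kinds of links described above can be sketched as set intersections between groups. This is an illustrative assumption about the underlying data, not SciSight's code: the group records, the `group_links` function, and the "social" and "topical" labels (standing in for the green and purple links) are hypothetical.

```python
from itertools import combinations

# Hypothetical toy data: authors and topics per group.
GROUPS = {
    "g1": {"authors": {"A", "B", "C"}, "topics": {"mers", "sars"}},
    "g2": {"authors": {"C", "D"}, "topics": {"mers", "vaccine"}},
    "g3": {"authors": {"E"}, "topics": {"ebola"}},
}


def group_links(groups):
    """Return (group_a, group_b, kind, shared_items) for every pair of groups
    that shares authors ("social" link) or topics ("topical" link)."""
    links = []
    for a, b in combinations(sorted(groups), 2):
        shared_authors = groups[a]["authors"] & groups[b]["authors"]
        shared_topics = groups[a]["topics"] & groups[b]["topics"]
        if shared_authors:
            links.append((a, b, "social", sorted(shared_authors)))   # green link in the text
        if shared_topics:
            links.append((a, b, "topical", sorted(shared_topics)))   # purple link in the text
    return links


if __name__ == "__main__":
    for link in group_links(GROUPS):
        print(link)
```

In this toy example, g1 and g2 are connected twice: once through the shared author C and once through the shared topic MERS, while g3 has no links at all.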

Discovering different views

Laura Doepker, a postdoctoral research fellow at the Fred Hutchinson Institute who works in virology and immunology, said in a user interview that she was able to discover several groups of associated authors and papers she hadn’t heard of. “I was able to find a new perspective on the subject that I would not have found otherwise”, she said after searching for groups focusing on a certain coronavirus gene. In one case, Dr. Doepker was interested in finding different or competing groups in the same domain as a leading virologist with whom she was already familiar. Using SciSight, she discovered a group of Chinese authors that had published a paper in a local journal she hadn’t heard of, and judging by the title she found the paper interesting and worth reading. “I wouldn’t have come across this journal and paper otherwise, and it’s a very different approach. I should definitely read this”, she said.

Finally, demonstrating the versatility of the tools in SciSight, when Dr. Doepker wanted to study a certain antibody (CR3022) and its associations in depth, she used the co-association visualization feature, finding “very relevant associations” and also “two potentially surprising and interesting publications. I’m going to look into those papers.”

Follow @semanticscholar on Twitter for the latest CORD-19 updates. Follow @allen_ai and subscribe to the AI2 Newsletter to stay current on news and research coming out of AI2.
