Exploring the COVID-19 network of scientific research with SciSight
By Tom Hope, Jon Borchardt, Jason Portenoy, Kishore Vasan, and Jevin West
Three months into the coronavirus pandemic, the world’s scientific knowledge of the SARS-CoV-2 virus is rapidly expanding. Reports of potential vaccines and treatments sprout up almost daily. Thousands of papers have been pouring into Semantic Scholar’s COVID-19 Open Research Dataset (CORD-19), a collection of nearly 60,000 scientific publications of potential relevance to the topic, both historical and cutting-edge.
Artificial intelligence researchers have also mobilized, building a flurry of search engines and knowledge bases. One effort that has stood out in recent news coverage is BenevolentAI, which built a biomedical knowledge graph based in part on the scientific literature, and used it to discover a potential drug to repurpose for COVID-19 treatment. But the AI could only go so far, and human intelligence was needed to get across the finish line. “It is not like we have this giant button and we just smack it and stuff comes out the other end,” an engineer was quoted as saying. The team, including a pharmacological expert, used visualization tools such as a “multicolored map” and “a kind of digital flow chart” to identify the relevant genes and drugs, eventually landing on a drug previously used to treat arthritis.
Despite the natural human inclination toward visual information, the predominant way scientists search and consume research is through lists of articles in academic search engines, with scarce visualization features. While search engines are a powerful tool for quickly finding documents relevant to a query, they are mostly not geared for exploration and discovery — investigating a knowledge base, finding out the unknown unknowns. A recent review of scholarly visualization tools states that they are “applied relatively rarely” in information retrieval, and that useful and user-friendly science mapping systems are difficult to design.
Exploratory search and visualization for science
To help accelerate scientific discovery with visualization, last month we launched SciSight, a framework of exploratory search and visualization tools for the COVID-19 literature. The first version of SciSight supported exploring associations between biomedical concepts appearing in the literature. In preliminary user interviews, the tool was found helpful in discovery-oriented search. We now release two important updates of SciSight.
The first is an exploratory faceted paper search. When a user searches for a topic or an author, SciSight suggests further facets to refine the search, based on the top co-associations with the initial query. This feature also helps prevent fixation on an initial topic and facilitates associative exploration: navigating an area of interest by viewing associated topics (or authors), which is especially helpful when users do not have a pinpointed query in mind. In addition, the number of relevant publications is shown over time, which can reveal trends for specific facets (e.g., a topic, author, organization, or a combination). Users can adjust the time range, and the papers and facets displayed update accordingly.
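To make the idea of co-association suggestions concrete, here is a minimal Python sketch. It is only an illustration under stated assumptions, not SciSight's actual implementation: the toy records, the field names (`year`, `topics`), and the helper names are all hypothetical, and raw co-occurrence counts stand in for whatever association measure the tool really uses.

```python
from collections import Counter

# Hypothetical toy records; the field names are illustrative, not the CORD-19 schema.
PAPERS = [
    {"year": 2014, "topics": {"ebola virus", "vaccine"}},
    {"year": 2019, "topics": {"mers", "vaccine"}},
    {"year": 2020, "topics": {"sars-cov-2", "vaccine", "antibody"}},
    {"year": 2020, "topics": {"sars-cov-2", "antibody"}},
    {"year": 2020, "topics": {"sars-cov-2", "remdesivir"}},
]

def matching(papers, selected, start=None, end=None):
    """Papers that contain every selected facet, within an optional year range."""
    return [
        p for p in papers
        if selected <= p["topics"]
        and (start is None or p["year"] >= start)
        and (end is None or p["year"] <= end)
    ]

def suggest_facets(papers, selected, top_n=3, **years):
    """Rank the other facets by how often they co-occur with the selection."""
    counts = Counter()
    for p in matching(papers, selected, **years):
        counts.update(p["topics"] - selected)
    # Sort by count (descending), then by name, so ties are deterministic.
    return sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))[:top_n]

def papers_per_year(papers, selected, **years):
    """Publication counts over time for the current selection."""
    counts = Counter(p["year"] for p in matching(papers, selected, **years))
    return dict(sorted(counts.items()))

print(suggest_facets(PAPERS, {"sars-cov-2"}))
print(papers_per_year(PAPERS, {"vaccine"}, start=2019))
```

Narrowing the year range changes both the suggestions and the per-year counts, which mirrors how the displayed papers and facets update together in the interface.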
Exploring the network of science
On the face of it, the science around COVID-19 is all about biomedical knowledge on genes, proteins, drugs, diseases, vaccinations, etc. But science is of course very much a human endeavor, with sociological undercurrents whose tremendous effects are studied in the field known as the science of science. We want to leverage this social structure to bridge groups and disciplines, potentially facilitating new collaborations and discovery of methods, problems and directions other scientists are working on.
Silos of knowledge exist throughout the literature, and the sheer number of papers coming out every year in biomedicine makes it hard to keep track, let alone find out about new groups and new fields. This lack of visibility beyond one’s immediate sphere also leads to redundant work. These problems are even more pronounced during times of emergency, when knowledge-sharing and collaboration are vital.
To facilitate exploratory search within the COVID research community, AI2 is working with the University of Washington’s DataLab to build an interactive visualization to help researchers make connections to other groups, methods and ideas in the literature. Unlike most bibliographic visualizations, we focus on groups and their connections, and integrate this social graph with exploratory faceted search.
Users can find out which groups are working in which directions by searching topics, affiliations, or authors. As with the faceted search interface, each selected query automatically prompts new suggested queries associated with it, pointing to more groups and topics to explore. For example, a virologist may be interested in searching for groups connected to the University of Oxford. When searching for Oxford, the Ebola virus topic moves up the list of associated concepts. The scientist can select it to further refine the search.
The notion of a research group is somewhat elusive — scientists can belong to several groups depending on who they work with and on what. Here, we extract groups automatically from the co-authorship network of Semantic Scholar’s CORD-19 papers with graph-mining, in this case using an overlapping community detection approach so that authors can belong to multiple groups. We initially focus on authors who have been recently active — having at least one paper since 2017.
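The article does not say which overlapping community detection algorithm is used, so the following Python sketch swaps in one well-known option, clique percolation with k = 3, purely as an illustration. The toy papers, author names, and function names are hypothetical. Triangles of co-authors that share an edge are merged into one group, and an author who sits in two separate triangles ends up in two groups.

```python
from collections import defaultdict
from itertools import combinations

# Hypothetical toy data: (publication year, author list) for each paper.
PAPERS = [
    (2015, ["Ana", "Bo", "Cy"]),
    (2018, ["Ana", "Bo", "Cy"]),
    (2019, ["Cy", "Dee", "Eli"]),
    (2020, ["Dee", "Eli", "Cy"]),
    (2012, ["Fay", "Gus", "Hal"]),  # none of these authors is active since 2017
]

def active_authors(papers, since=2017):
    """Authors with at least one paper published in `since` or later."""
    return {a for year, authors in papers if year >= since for a in authors}

def coauthor_graph(papers, keep):
    """Undirected co-authorship graph restricted to the authors in `keep`."""
    adj = defaultdict(set)
    for _, authors in papers:
        for a, b in combinations(sorted(set(authors) & keep), 2):
            adj[a].add(b)
            adj[b].add(a)
    return adj

def overlapping_groups(adj):
    """Clique percolation with k=3 (an assumed stand-in, not SciSight's method)."""
    triangles = [
        frozenset((a, b, c))
        for a in adj for b in adj[a] if a < b
        for c in adj[a] & adj[b] if b < c
    ]
    parent = list(range(len(triangles)))

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving
            i = parent[i]
        return i

    edge_owner = {}  # first triangle seen for each edge
    for i, tri in enumerate(triangles):
        for edge in combinations(sorted(tri), 2):
            if edge in edge_owner:
                parent[find(i)] = find(edge_owner[edge])  # merge adjacent triangles
            else:
                edge_owner[edge] = i

    groups = defaultdict(set)
    for i, tri in enumerate(triangles):
        groups[find(i)] |= tri
    return sorted(sorted(g) for g in groups.values())

graph = coauthor_graph(PAPERS, active_authors(PAPERS))
print(overlapping_groups(graph))
```

In this toy example the author Cy belongs to both groups. One limitation of this particular stand-in is that authors who are not part of any co-author triangle are left ungrouped.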
Given a query, we retrieve relevant groups and their connections. Each group is visualized as a “box”, each box showing salient authors, affiliations, and topics in a group. Boxes are color-coded to reflect relevance to the initial query — aiming to strike a balance between the relevance and diversity of the results shown. Users may select how many groups to view, zoom in/out, click a group and scroll down to see more detailed information and the group’s papers. Continuing our example, the virologist is now able to see not only groups directly affiliated with Oxford and Ebola, but also “neighboring” groups based on shared authors or shared topics.
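How relevance and diversity are balanced is not described here. One common way to trade them off is a greedy selection in the style of maximal marginal relevance (MMR), sketched below in Python as an assumption rather than as SciSight's actual method. The group names, relevance scores, topic sets, and the `trade_off` parameter are all hypothetical.

```python
# Hypothetical groups: name -> (relevance to the query, set of topics).
GROUPS = {
    "A": (0.9, {"ebola", "vaccine"}),
    "B": (0.8, {"ebola", "vaccine"}),    # near-duplicate of A
    "C": (0.5, {"mers", "antiviral"}),   # less relevant, but different
}

def jaccard(a, b):
    """Topic overlap between two groups, from 0 (disjoint) to 1 (identical)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0

def select_groups(groups, k, trade_off=0.5):
    """Greedily pick k groups; trade_off=1.0 ranks purely by relevance."""
    chosen = []
    remaining = sorted(groups)
    while remaining and len(chosen) < k:
        def score(name):
            relevance, topics = groups[name]
            # Penalize similarity to the most similar group already chosen.
            redundancy = max(
                (jaccard(topics, groups[c][1]) for c in chosen), default=0.0
            )
            return trade_off * relevance - (1 - trade_off) * redundancy
        best = max(remaining, key=score)
        chosen.append(best)
        remaining.remove(best)
    return chosen

print(select_groups(GROUPS, 2))                 # balanced
print(select_groups(GROUPS, 2, trade_off=1.0))  # relevance only
```

With the balanced setting, the near-duplicate group B is skipped in favor of the less relevant but different group C; with relevance only, B is chosen second.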
Finally, links between groups represent topical affinity and social affinity. Hovering over a green link shows the top shared authors between two groups, and hovering over a purple link shows the top shared topics. To find out which researchers work in both groups, or which major topics of interest they share, our virologist simply hovers over the corresponding edge; hovering over a purple edge, for example, reveals that two groups are both focusing on MERS.
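The two kinds of links can be sketched in a few lines of Python. This is a hypothetical illustration, not the tool's code: each group is represented as lists of author and topic mentions across its papers, and the "top shared" items are ranked by the smaller of their counts in the two groups, which is an assumed ranking rule.

```python
from collections import Counter

# Hypothetical groups: author and topic mentions across each group's papers.
GROUP_1 = {"authors": ["Ana", "Ana", "Cy", "Bo"], "topics": ["mers", "mers", "vaccine"]}
GROUP_2 = {"authors": ["Cy", "Cy", "Dee"], "topics": ["mers", "sars-cov-2", "mers"]}

def top_shared(items_a, items_b, n=3):
    """Items present in both lists, ranked by the smaller of their two counts."""
    shared = Counter(items_a) & Counter(items_b)  # multiset intersection
    ranked = sorted(shared.items(), key=lambda kv: (-kv[1], kv[0]))
    return [item for item, _ in ranked[:n]]

def group_links(a, b):
    """Social link (shared authors) and topical link (shared topics)."""
    return {
        "social": top_shared(a["authors"], b["authors"]),
        "topical": top_shared(a["topics"], b["topics"]),
    }

print(group_links(GROUP_1, GROUP_2))
```

Here the social link would list Cy as the shared author, and the topical link would list MERS as the shared topic.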
Discovering different views
Laura Doepker, a postdoctoral Research Fellow at the Fred Hutchinson Institute who works in virology and immunology, said in a user interview that she was able to discover several groups of associated authors and papers she hadn’t heard of. “I was able to find a new perspective on the subject that I would not have found otherwise,” she said after searching for groups focusing on a certain coronavirus gene. In one case, Dr. Doepker was interested in finding different or competing groups in the same domain as that of a leading virologist with whom she was already familiar. Using SciSight, she discovered a group of Chinese authors that had published a paper in a local journal she hadn’t heard of, and she found the paper to be interesting and worth reading based on the title. “I wouldn’t have come across this journal and paper otherwise, and it’s a very different approach. I should definitely read this,” she said.
Finally, demonstrating the versatility of the tools in SciSight, when Dr. Doepker wanted to study a certain antibody (CR3022) and its associations in depth, she used the co-association visualization feature, finding “very relevant associations” and also “two potentially surprising and interesting publications. I’m going to look into those papers.”