Measuring research impact using AI: opportunities and challenges

Erika Deal
Semantic Scholar
Oct 25, 2017

Recently, Science magazine published an article with a ranking of top biomedical researchers based on citation data provided by Semantic Scholar. The list rightly sparked a discussion on social media about citation data and bias in algorithms and academia more generally.

After reflection, we realize that publishing this data was a mistake, both because of the data’s quality and because the ranking does not benefit the scientific community. It is also a good opportunity to talk about the challenges and limitations of analyzing researcher impact with algorithms.

How can bias show up in research impact analysis?

Bias can come into play in two major ways when analyzing citation data:

  1. The data that we select for analysis
  2. The assumptions built into the model that performs the analysis

In producing the list of top scientists, we now realize there were flaws in both steps. Here’s what happened and why this kind of analysis is hard to do right.

Bias in the data: garbage in, garbage out

Researchers like to keep tabs: who’s famous, who’s on the rise, whose lab is ahead. Scientific research is highly competitive and it stands to reason that researchers might want to see the citation record sliced and ranked in new ways to show who’s on top.

But many recent studies have exposed problems with citations as a measure of scientific impact. Large-scale analyses of available citation data have surfaced issues ranging from uneven coverage of the literature to field-specific citation norms and outright gaming of the metric.

All of this adds up to increasing awareness in the academic community that citations, while critical for tracing and measuring scientific work, are a flawed or at least insufficient way to measure impact.

Multiple studies have found evidence that citation metrics favor some groups over others. Some possible causes of unequal distribution and benefit of citations include:

  • Self-citations. As citation counts and h-index increasingly become part of tenure considerations, researchers are incentivized to pad and otherwise game the system, and this happens at different rates among different groups. We filter out self-citations in our metrics for this reason; a minimal sketch of that kind of filtering follows this list.
  • Institutionalized bias in fields of study. Particularly in the sciences, women have long struggled with lower visibility and the vicious cycle of less prestigious positions, lower output, and lower citation rates. Women are also underrepresented in the peer review process, even after correcting for numerical differences in representation.
  • Researcher bias. This may take the form of gender and racial bias, but it may also include hard-to-quantify preferences for scholars who work in the same country or with whom the researcher has a professional relationship.
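
Here is the self-citation filter sketch promised above. It is a minimal illustration rather than our production code, and the record shape and author identifiers are hypothetical; it simply drops any citation where the citing and cited papers share an author before counting.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Citation:
    """One citation edge: a citing paper pointing at a cited paper."""
    citing_paper_id: str
    cited_paper_id: str
    citing_author_ids: frozenset  # author identifiers on the citing paper
    cited_author_ids: frozenset   # author identifiers on the cited paper


def is_self_citation(citation: Citation) -> bool:
    """Treat a citation as a self-citation if the two papers share any author."""
    return bool(citation.citing_author_ids & citation.cited_author_ids)


def citation_count_excluding_self(citations) -> int:
    """Count incoming citations for a paper, ignoring self-citations."""
    return sum(1 for c in citations if not is_self_citation(c))
```

Even this toy filter depends on reliable author disambiguation, which is a hard problem in its own right, and it does nothing about indirect self-citation through frequent collaborators.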

Even though the deck is stacked against underrepresented groups, citation-based metrics remain a primary measure of achievement for researchers. As more citation data becomes publicly available, however, Semantic Scholar and others are starting to experiment with alternative ways to analyze and measure researcher impact.

Bias in AI models

When we take that biased data and feed it into an algorithm, we’re subjecting it to an additional layer of assumptions as we try to produce an analysis. These assumptions may not account for complications like:

  • A “good” citation count may look different from one field of study to the next (a simple field-normalization sketch follows this list).
  • Self-citations can greatly inflate citation counts, even indirectly.
  • Researchers may change their publishing name or have names that don’t conform to the “firstname lastname” convention.
  • Innovative research is not always published in top journals.
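
As flagged in the first bullet above, what counts as a “good” citation count depends heavily on the field. One common response is to normalize a paper’s citation count against the average for papers in the same field and publication year. The sketch below shows that idea under hypothetical data shapes and field labels; real implementations also have to decide how to assign fields and how to handle small or heavily skewed samples.

```python
from collections import defaultdict
from statistics import mean


def field_normalized_scores(papers):
    """Score each paper by its citation count relative to the mean count
    for papers in the same (field, year) group.

    `papers` is a list of dicts with hypothetical keys:
    'id', 'field', 'year', 'citation_count'.
    """
    groups = defaultdict(list)
    for p in papers:
        groups[(p["field"], p["year"])].append(p["citation_count"])

    # Baseline: average citation count for each field-year group.
    baselines = {key: mean(counts) for key, counts in groups.items()}

    scores = {}
    for p in papers:
        baseline = baselines[(p["field"], p["year"])]
        # A score of 1.0 means "cited about as much as its field-year peers";
        # guard against groups where nothing has been cited yet.
        scores[p["id"]] = p["citation_count"] / baseline if baseline else 0.0
    return scores
```

Even this simple normalization bakes in assumptions, for example that a paper belongs to exactly one field and that the mean is a sensible baseline despite heavily skewed citation distributions.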

Relying on community consensus is an easy starting point for the issues above, but it runs the risk of amplifying and perpetuating community biases under the banner of objectivity.

On the Semantic Scholar team, we have attempted to address these issues in several ways, the most important of which is our measure of “highly influential citations.” This metric is an attempt to identify citations that indicate one article was strongly impacted by another. It also excludes self-citations; in other words, we try to capture citations that more closely represent “scientific impact” and weren’t just added for the sake of the review process.

However, we realize that this metric remains imperfect. It is still fundamentally based on cumulative citation counts, vulnerable to coverage gaps, and makes assumptions about how scientists discuss influential papers that likely need closer examination. Even as we try to correct for known issues with the data, there are challenges that we have yet to account for.
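
To make the intuition behind “highly influential citations” concrete, here is a toy heuristic in that spirit. It is an illustration only, not Semantic Scholar’s actual model; the data shape, key names, and threshold are hypothetical.

```python
def is_influential_citation(citing_paper, cited_paper_id, cited_author_ids,
                            min_mentions=3):
    """Toy heuristic, not Semantic Scholar's model: count a citation as
    'influential' only if the cited work is discussed repeatedly in the
    citing paper's body text and the two papers share no authors.

    `citing_paper` is a hypothetical dict with keys:
      'author_ids'    -- identifiers for the citing paper's authors
      'body_mentions' -- mapping from cited paper id to in-text mention count
    """
    # Exclude self-citations: any shared author disqualifies the citation.
    if set(citing_paper["author_ids"]) & set(cited_author_ids):
        return False
    # Require the cited work to be mentioned several times in the body text.
    mentions = citing_paper["body_mentions"].get(cited_paper_id, 0)
    return mentions >= min_mentions
```

A real system would learn these signals from labeled examples rather than hard-coding a threshold, and it would still inherit the coverage gaps and field-specific writing conventions described above.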

Since we know there will be issues with the data and our ability to analyze it, any analysis of research impact we make will either amplify or expose bias in the scientific community — it’s up to us to decide which route to take.

What can we do better?

The difference between amplifying and exposing bias starts with contextualizing the data. This was a major lesson for us at Semantic Scholar: when you know you have flawed data, it becomes all the more important to clarify what conclusions can and cannot reasonably be drawn from it. And when you see flaws that come from bias, you have a responsibility to try to address them.

As of today, here is how we’re thinking about appropriate use and communication of citation data:

Be careful about using AI to rank individuals and groups of people. AI models are only as good as the data that goes into them, and they are a lot less accurate and deterministic than many people assume.

Consider the public good when releasing citation data. What are we adding to the research community when we talk about citations as an impact indicator for researchers, papers, and journals? If a ranking or score does not assist researchers in their work, consider not publishing it.

Be transparent about training data and assumptions built into the models. And if we can’t explain those assumptions and how we chose the data very clearly, we have some work to do before we can release our findings. This is the standard we hold peer-reviewed research to, and we should do the same for automated analyses of that research.

Make sure flaws in the data can be corrected. This is another big lesson for the Semantic Scholar team: our output will never be as good as we want it to be if we can’t improve the underlying data. This is also an important principle of user-centered design and something we hope to address in 2018.

Open and ongoing research

Finding more accurate ways to measure impact is an open area of research. Several researchers are now focusing specifically on finding ways to combat bias when measuring scholarly impact.

At the University of Washington, Jevin West and Carl Bergstrom are digging into citation data to uncover new ways to measure impact. The Eigenfactor project explores problems of gender bias, journal ranking, and structural issues in science while providing alternate metrics for measuring the impact of scientific research.

Altmetric and Impactstory attempt to analyze additional sources of information that capture scientific impact: discussions, open reviews, and more. As more and more scholarly output moves online, altmetrics provide an interesting avenue for new measurements of research impact.

In biomedicine, researchers are investigating interventions and supporting large-scale initiatives to improve scientists’ willingness to participate in bias training, ensure more equal access to funding and other professional opportunities, and fight institutional barriers to ensuring that everyone can benefit from medical advances.

Of course, this just scratches the surface. Normalizing citation metrics is an open research problem, and breaking down the issues that can lead to unequal research impact analysis will take time and experimentation. We hope that the spirit of scientific inquiry helps us have more open, more self-critical conversations and find more effective ways to equalize the access, reporting, and impact of science over time.

Did we miss relevant research we should share here? Please comment below!

Thanks to Miles Crawford for contributions to this post and several members of the Semantic Scholar team for reading and editing.
