ScholarInsight

A Chrome Web Browser Extension for Visualizing Google Scholar Profiles.

Yuexi (Tracy) Chen
VisUMD
5 min read · Oct 25, 2021


When I was a naive Ph.D. student (I still am), I had a conversation with one of my senior Ph.D. friends. It went as follows:

“I just found a highly productive research group at the University of XXX: Ph.D. students in that group published 10+ papers way before they graduate!”

“How many are first-author papers?” My friend calmly asked.

“I ignored that. I also noticed that the professor who leads that research group submitted tens of papers to a single conference.”

“How many are last-author papers?” My friend calmly asked again.

“Umm, I also ignored that. Why do you ask?”

My friend explained that because authorship order in our research area reflects contribution, people usually care only about first-author papers for student researchers and last-author papers for faculty researchers. I dug into this problem further and found a report in Science revealing how some researchers managed to publish more than 70 papers a year through courtesy authorship.

Limitations of Google Scholar

Since then, whenever I look at a researcher’s Google Scholar profile, I pay more attention to authorship order, especially for highly cited papers. However, Google Scholar can’t filter papers by authorship order, so I have to sift through the author names manually.
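The manual check above is easy to automate once a paper’s author list is available. Below is a minimal TypeScript sketch, not ScholarInsight’s actual code: it assumes the author list is complete (not truncated) and that the profile owner’s name is written the same way in the list as on the profile, up to case, periods, and spacing. `normalizeName` and `classifyAuthorship` are hypothetical names.

```typescript
// Hypothetical sketch: label a paper by where the profile owner appears
// in its author list. Assumes a complete, untruncated author list.
type Authorship = "first" | "last" | "other";

// Lowercase, drop periods, collapse runs of whitespace: "J.  Smith" -> "j smith".
function normalizeName(name: string): string {
  return name.toLowerCase().replace(/\./g, "").replace(/\s+/g, " ").trim();
}

function classifyAuthorship(authors: string[], owner: string): Authorship {
  const target = normalizeName(owner);
  const names = authors.map(normalizeName);
  // A single-author paper counts as first-author.
  if (names.length > 0 && names[0] === target) return "first";
  if (names.length > 1 && names[names.length - 1] === target) return "last";
  return "other";
}
```

Exact string matching is the weak point here: a researcher listed under different initials on different papers would need a fuzzier comparison.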

Despite its popularity, Google Scholar offers limited interaction on a scholar’s profile page: papers can only be sorted by year or by citations. While it’s easy to identify the most cited or most recent papers, it’s hard to spot papers published a few years ago that have gained more attention than their peers. Google Scholar also provides a bar chart of aggregated citations per year; however, it treats citations to all papers equally, regardless of the author’s position on them.

ScholarInsight

I decided to do my data visualization course project on this. At first, I thought about creating offline data visualizations based on existing datasets. However, because citations accumulate every day, I gravitated toward a more general tool that visualizes citation data on demand.

Therefore, I created a Chrome browser extension called ScholarInsight. Because it runs as an extension, it works on any Google Scholar profile page. (With the wonderful D3.js, I was also able to create customized data visualizations.)

Here is what ScholarInsight looks like:

ScholarInsight adds a side panel to any Google Scholar profile page. The primary view is a scatter plot whose x-axis is the publication year and whose y-axis is the number of citations. Each paper is drawn as a colored dot: orange for first-author papers, blue for last-author papers, and grey for all other papers. ScholarInsight also computes the median citation count of the author’s papers for each year and connects these medians with a dashed line.
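The dashed median line comes from grouping papers by publication year and taking the median citation count within each group. Here is a minimal sketch, assuming each paper is reduced to a hypothetical `{ year, citations }` record. It is not taken from the ScholarInsight source, and averaging the two middle values for even-sized years is my assumption.

```typescript
// Hypothetical record shape: one entry per paper on the profile page.
interface Paper {
  year: number;
  citations: number;
}

// Median of a non-empty list; averages the two middle values for even lengths.
function median(values: number[]): number {
  const sorted = values.slice().sort((a, b) => a - b);
  const mid = Math.floor(sorted.length / 2);
  return sorted.length % 2 === 1
    ? sorted[mid]
    : (sorted[mid - 1] + sorted[mid]) / 2;
}

// Groups papers by year and returns [year, median] pairs in ascending
// year order, i.e. the vertices of the dashed line.
function medianByYear(papers: Paper[]): Array<[number, number]> {
  const byYear: Record<number, number[]> = {};
  for (const p of papers) {
    if (!byYear[p.year]) byYear[p.year] = [];
    byYear[p.year].push(p.citations);
  }
  return Object.keys(byYear)
    .map(Number)
    .sort((a, b) => a - b)
    .map((year): [number, number] => [year, median(byYear[year])]);
}

const line = medianByYear([
  { year: 2018, citations: 10 },
  { year: 2018, citations: 30 },
  { year: 2018, citations: 20 },
  { year: 2019, citations: 5 },
  { year: 2019, citations: 7 },
]);
// line is [[2018, 20], [2019, 6]]
```

A median rather than a mean keeps one runaway paper from dragging the whole year’s reference line upward.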

ScholarInsight supports several interactions:

  1. Filter: users can filter papers by checking or unchecking the paper-type boxes; ScholarInsight currently supports first-author, last-author, and other papers.
  2. Scale transform: users can choose a linear or log scale for citations.
  3. Zoom/pan: users can click the zoom buttons or use the mouse wheel to zoom, and can pan the visualization by dragging it.
  4. Tooltips: users can hover over a dot and click it to show detailed information about that paper.
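The type filter boils down to keeping the papers whose type box is checked. A minimal sketch, assuming each paper has already been labeled first, last, or other; the `TypedPaper` shape and `filterByType` are hypothetical names, not ScholarInsight’s API.

```typescript
type Authorship = "first" | "last" | "other";

// Hypothetical paper record after authorship classification.
interface TypedPaper {
  title: string;
  year: number;
  citations: number;
  type: Authorship;
}

// `checked` mirrors the state of the three checkboxes in the side panel.
function filterByType(
  papers: TypedPaper[],
  checked: Record<Authorship, boolean>,
): TypedPaper[] {
  return papers.filter((p) => checked[p.type]);
}
```

Typing `checked` as `Record<Authorship, boolean>` makes the compiler reject a checkbox state that forgets one of the three types.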

Let’s zoom in and take a look at recent first-author publications of this researcher:

We can see a paper published in 2018 that sits far above the median citation line (the dot is enlarged because we are hovering over it). Clicking it reveals that it’s “Data Illustrator.” The paper won a best paper award at the CHI (Computer-Human Interaction) conference, and Data Illustrator is also a popular data visualization authoring tool: it has its own Twitter account, which has attracted 1,200+ followers so far.

Sometimes a linear scale doesn’t work well. For example, consider this researcher’s profile:

At first glance, something seems wrong: why is there only one dot? That’s because one paper has more than 17k citations, so all the other papers are squashed together at the bottom of the plot. In this case, we can switch to the log scale.
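Why the log scale rescues this profile can be seen by mapping citation counts to pixel heights by hand. This is a minimal sketch, assuming a 400 px tall plot and the 17k-citation maximum from the example; `makeYScale` is a hypothetical helper, and using log(1 + c) so that uncited papers stay finite is my assumption, not necessarily what ScholarInsight does.

```typescript
type ScaleKind = "linear" | "log";

// Builds a function mapping a citation count to an SVG y coordinate,
// where y = heightPx is the bottom of the plot and y = 0 is the top.
function makeYScale(
  maxCitations: number,
  heightPx: number,
  kind: ScaleKind,
): (citations: number) => number {
  // log(1 + c) keeps 0-citation papers finite; the log base cancels out
  // in the ratio below, so the natural log is enough.
  const transform = kind === "log"
    ? (c: number) => Math.log(1 + c)
    : (c: number) => c;
  const top = transform(maxCitations);
  return (citations: number) =>
    top === 0 ? heightPx : heightPx - (transform(citations) / top) * heightPx;
}

const linearY = makeYScale(17000, 400, "linear");
const logY = makeYScale(17000, 400, "log");
// A 100-citation paper sits about 2 px above the baseline on the linear
// scale, but a little under half-way up on the log scale.
console.log(400 - linearY(100), 400 - logY(100));
```

In D3 itself, `d3.scaleLog` cannot include 0 in its domain, so papers with zero citations need either an offset like the one above or `d3.scaleSymlog`.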

If we look only at first-author and last-author papers, we can see a rough boundary: this researcher started publishing last-author papers after 2009. What happened? As you might guess, he became a professor around that time.

For productive authors, it’s common for several papers published in the same year to have exactly the same number of citations, so ScholarInsight also jitters the data to keep those dots from overlapping.
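One way to jitter ties is to find papers that share both year and citation count and spread each group evenly along the x-axis. The sketch below assumes a deterministic, symmetric spread; ScholarInsight may well use random jitter instead. `jitterTies` and its `spread` parameter (measured in fractions of a year) are hypothetical.

```typescript
// Hypothetical point shape: a paper's position before jitter.
interface Point {
  year: number;
  citations: number;
}

// Returns an x offset for each point, in the same order as `points`.
// Only papers that tie on (year, citations) are moved.
function jitterTies(points: Point[], spread = 0.3): number[] {
  const groups: Record<string, number[]> = {};
  points.forEach((p, i) => {
    const key = `${p.year}:${p.citations}`;
    if (!groups[key]) groups[key] = [];
    groups[key].push(i);
  });

  const offsets = points.map(() => 0);
  Object.keys(groups).forEach((key) => {
    const members = groups[key];
    const n = members.length;
    if (n < 2) return; // a lone paper stays exactly on its year
    members.forEach((pointIndex, k) => {
      // Evenly space n tied papers across [-spread/2, +spread/2].
      offsets[pointIndex] = -spread / 2 + (spread * k) / (n - 1);
    });
  });
  return offsets;
}
```

The plotted x position would then be `year + offset`. A deterministic spread keeps dots from jumping around each time the panel redraws, at the cost of looking more regular than random jitter.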

Here is another interesting finding from ScholarInsight:

Above is the profile of Andrew Ng, a renowned machine learning researcher. Why does he have so many first-author papers in recent years when he’s already a tenured professor? It turned out that he was spending more time introducing artificial intelligence to a broader audience, so he started publishing less technical papers, like this one: “Artificial intelligence is the new electricity.”

Discussion

I conducted informal user study sessions with three female computer science Ph.D. students, whose research fields also order authors by contribution. Each of them applied ScholarInsight to a researcher she was interested in and came away with some insights. For example, they quickly identified the most-cited first- and last-author papers. One student was disappointed to find that although one researcher had hundreds of citations, he didn’t have any first- or last-author papers.

I expect ScholarInsight to help novice researchers and students applying to graduate school, who may need to look through the Google Scholar profiles of tens of potential advisors.

Demo: https://youtu.be/7-C5sBFzqNc

Download ScholarInsight and try it yourself: https://github.com/TracyYXChen/ScholarInsight

(ScholarInsight is currently under review for the Chrome Web Store.)

Computer science PhD student at the University of Maryland, working on Human-Computer Interaction (HCI)