Designing for Truth: Google Scholar Concept

Imagine you wake up one morning to an article emailed by your aunt, warning you not to vaccinate your newborn son. Worried, you open the link and read that a study published in The Lancet (one of the most reputable scientific journals in the world) has conclusively found that vaccines cause autism. The PDF of the actual study seems legitimate, but your lack of medical expertise prevents you from understanding or easily discrediting it. After all, it’s been published by The Lancet, has a dozen citations, and the article makes a reasonable argument. You may choose to research further, but others may simply be convinced and re-share it on Facebook to warn other parents of the dangers of vaccination.

This is how misinformation happens.

The study in question is in fact real: it was authored by Andrew Wakefield and published in The Lancet in 1998. But what your aunt’s article left out is that the study was fully retracted in 2010 due to data manipulation, and Wakefield’s medical license was revoked. The incident has come to be widely regarded as a major catalyst of the anti-vaccination movement, which is responsible for outbreaks of previously controlled diseases like measles and mumps, leading to many deaths.

Truth Compass

We’re in the midst of an epistemological crisis, the result of rampant misinformation that we have neither the time nor the expertise to verify. Although not perfect, science is the best tool we have to arrive at truth, as it produces the fundamental facts on which opinions, beliefs and world politics are based. However, there are many challenges to identifying credible research:

  1. Out of reach- technical jargon, dense text, and frequent paywalls make scientific research virtually inaccessible to the masses, leaving us reliant on the media for translation.
  2. Exaggerated- the media often over-simplifies and sensationalizes findings in order to generate ad revenue, further diluting the truth.
  3. Dynamic world, static info- most info published today is set in stone unless manually updated. This is especially dangerous for science, as it’s always evolving- what’s proved today may be disproved tomorrow and vice versa.
  4. Citation rabbit hole- studies often cite 20–50 other studies. If one of those studies is retracted, shouldn’t it affect the credibility of any study citing it?

These challenges leave readers with questions that are hard to answer:

  • How are we to know if a study cited in an article has been revised or retracted?
  • How credible is a study that cites retracted or outdated studies?
  • Was a study published by a private organization with a conflict of interest?
  • Does the author have a documented history of fraud?
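
The citation question can be made concrete with a small graph traversal. The sketch below is only an illustration: the `citations` mapping, the study names, and the function `tainted_by_retraction` are all hypothetical, and nothing here reflects how Google Scholar or Retraction Watch actually store their data. It flags every study that cites a retracted study, either directly or through a chain of citations, and records how many steps away the retraction is.

```python
from collections import deque

def tainted_by_retraction(citations, retracted):
    """Return {study: distance} for studies that cite a retracted study,
    directly (distance 1) or through a chain of citations (distance > 1)."""
    # Invert the graph: for each study, which studies cite it?
    cited_by = {}
    for study, refs in citations.items():
        for ref in refs:
            cited_by.setdefault(ref, set()).add(study)

    # Breadth-first search outward from every retracted study.
    distance = {}
    queue = deque((r, 0) for r in retracted)
    while queue:
        study, d = queue.popleft()
        for citer in cited_by.get(study, ()):
            if citer not in distance and citer not in retracted:
                distance[citer] = d + 1
                queue.append((citer, d + 1))
    return distance

# Hypothetical example data: study -> studies it cites.
citations = {
    "study_A": ["study_B", "study_C"],
    "study_B": ["retracted_X"],
    "study_C": [],
    "study_D": ["study_A"],
}
flags = tainted_by_retraction(citations, {"retracted_X"})
```

Here `study_B` is flagged at distance 1, `study_A` at distance 2, and `study_D` at distance 3, while `study_C` is untouched. How much a distant retraction should actually lower a study’s credibility is a judgment call this sketch does not try to answer.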

Current efforts like RetractionWatch.com keep track of retracted papers, along with the most highly cited retractions and a leaderboard of authors with the highest retraction count. Although this is a step in the right direction, visiting a site every time you want to verify a study or an author is burdensome and doesn’t scale.

Google Scholar Browser Extension

Perhaps one of Google’s more underrated projects is Google Scholar: a free database of roughly 150 million documents, including peer-reviewed journal articles, books, conference papers, theses, dissertations, and even court opinions and patents.

I envision a Google Scholar browser extension that detects retracted or outdated documents, warns when a study’s citation is no longer valid, flags authors with documented fraud, and points out privately funded studies, in hopes of helping readers judge credibility.
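
As a rough illustration of the detection step, the sketch below scans a page’s HTML for DOIs and checks them against a list of known retractions. It is written in Python for brevity rather than as real extension code, and everything in it is an assumption: the `RETRACTED` table, the example DOIs, and the function `find_flagged_dois` are hypothetical, and the pattern is a simplified match for the common `10.<registrant>/<suffix>` DOI shape, not a complete one.

```python
import re

# Simplified pattern for DOIs of the form "10.<registrant>/<suffix>".
DOI_PATTERN = re.compile(r'10\.\d{4,9}/[^\s"<>]+')

# Hypothetical retraction records; a real extension would query a
# maintained source instead of a hard-coded table.
RETRACTED = {"10.1234/example.1998.001": "Retracted in 2010"}

def find_flagged_dois(html):
    """Return (doi, reason) pairs for each known-retracted DOI on the page."""
    found = []
    for match in DOI_PATTERN.finditer(html):
        # Trim trailing punctuation picked up from surrounding prose.
        doi = match.group(0).rstrip(".,;").lower()
        if doi in RETRACTED and all(doi != seen for seen, _ in found):
            found.append((doi, RETRACTED[doi]))
    return found

# Hypothetical article markup with one retracted and one ordinary link.
page = (
    '<p>See <a href="https://doi.org/10.1234/example.1998.001">the study</a> '
    'and <a href="https://doi.org/10.5678/other.2">another</a>.</p>'
)
warnings = find_flagged_dois(page)
```

Only the first link is flagged, since the second DOI is not in the table. Articles that mention a study without linking its DOI would need a different matching strategy, such as searching by title.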

Let’s revisit your aunt’s article from earlier, with the Google Scholar extension installed:

When you open the article, your Google Scholar browser extension warns you it’s detected a bad study in the article:

As you start reading, you notice a link to the study in question is highlighted:

Clicking “View Details” or the extension icon brings up information cards with more details:

Info Cards consist of the following sections:

  • Warning (if applicable)
  • Document Type: name and link of document
  • Publisher: publisher name, date, source
  • Author: name, title, institution, co-authors
  • Institution: name, private or public
  • Info bar: cited-by count, related articles, and download (if applicable)
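
One way to picture the data behind an Info Card is as a small record with an optional warning. The Python sketch below is a guess at a reasonable shape, not a specification: the class name `InfoCard`, its field names, and the example values are all hypothetical.

```python
from dataclasses import dataclass, field
from typing import List, Optional

@dataclass
class InfoCard:
    """One card per detected document; field names are illustrative."""
    document_type: str              # e.g. "Journal article"
    title: str
    url: str
    publisher: str
    published: str                  # publication date as text
    authors: List[str] = field(default_factory=list)
    institution: str = ""
    institution_is_private: bool = False
    cited_by: int = 0
    warning: Optional[str] = None   # e.g. "Retracted"; None if nothing is wrong

    def sections(self) -> List[str]:
        """Section headings to render; Warning appears only when applicable."""
        names = ["Document Type", "Publisher", "Author", "Institution", "Info bar"]
        return (["Warning"] if self.warning else []) + names

# Hypothetical card for a retracted study.
card = InfoCard(
    document_type="Journal article",
    title="Example study",
    url="https://example.org/study",
    publisher="Example Journal",
    published="1998",
    authors=["A. Author"],
    institution="Example Hospital",
    warning="Retracted",
)
```

Keeping `warning` optional mirrors the "if applicable" note in the list: a card for a healthy document simply omits that section.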

There are various severity states for each UI element of the extension:

In case of multiple documents, the extension icon displays the number of documents at the highest severity present (for example, if there are 2 retracted and 4 nonclassified documents, the icon will be red with the number 2).
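
The icon rule can be sketched in a few lines of Python. Only the "retracted means red" pairing comes from the concept; the intermediate "outdated" level, the other colors, and the names `SEVERITY_ORDER`, `COLORS`, and `badge` are assumptions made for illustration.

```python
from collections import Counter

# Severity levels from most to least severe (ordering is an assumption).
SEVERITY_ORDER = ["retracted", "outdated", "nonclassified"]

# Red for retracted follows the concept; the other colors are placeholders.
COLORS = {"retracted": "red", "outdated": "orange", "nonclassified": "gray"}

def badge(statuses):
    """Return the icon color and count for the most severe status present,
    or None when the page contains no recognized documents."""
    counts = Counter(statuses)
    for level in SEVERITY_ORDER:
        if counts[level]:
            return {"color": COLORS[level], "count": counts[level]}
    return None

# The example from the text: 2 retracted and 4 nonclassified documents.
icon = badge(["retracted"] * 2 + ["nonclassified"] * 4)
```

With this input, `icon` is a red badge with a count of 2; the four nonclassified documents do not affect the number because a more severe level is present.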


Info Card links lead to Google Scholar’s corresponding pages:



Research

I started by researching how information was disseminated and consulted a PhD friend of mine. I mapped out the relationships between the major players in the scientific news ecosystem:

  • Authors (professors) perform scientific research
  • Institutions fund those authors
  • Journals publish the quality findings
  • Media Sites and blogs report on those findings
  • Readers learn about scientific research findings through media sites

After some digging, it became obvious that nearly all players are incentivized to act in their own self-interest:

  • Authors- valuable research findings = career advancement
  • Institutions- more valuable authors employed = better reputation / $
  • Journals- more valuable research published = better reputation / $
  • Media- more interesting research published = more ad money generated

These findings further reinforce the need for such an extension.

Next up, I mapped out the current navigation of Google Scholar:

Afterwards, I planned out the functionality I envisioned and navigation between pages:

Future Steps

The downside of browser extensions is that they don’t work on mobile. A possible solution would be a dedicated mobile browser, or baking the functionality into Google Chrome (Android/iOS). Additionally, there are some interesting future possibilities:

  • A.I.- machine learning could eventually parse semantics and create meaningful relationships across Google Scholar’s database that could lead to further insight. AI could eventually be the ultimate peer reviewer and B.S. detector, as it makes sense of all academic literature by detecting previously unnoticed patterns.
  • Tapping the crowd- sites like PubPeer.com allow academics to perform post-publication peer review, which has highlighted shortcomings in several high-profile papers and even led to retractions. It would be interesting to explore commenting or even voting by verified scientists.
  • Decentralization- I suppose no project is complete until the blockchain is involved… all jokes aside, I could see a science journal DApp on Ethereum that stores and notarizes studies on the blockchain. Verified scientists who submit peer-reviewed studies would be rewarded with crypto tokens for quality contributions- there might even be a clever crowdfunding model for funding future studies. At that point Google Scholar would no longer belong to Google, as it would be an independent service, immune to central control or censorship.

My long-term vision is a truth mechanism that extends beyond science to news articles, blogs, tweets, sites, and even ebooks, by incorporating services like FactCheck, Snopes, FiB, B.S. Detector, and MediaBias.

Russian influence on the 2016 US presidential election is a testament to the power of weaponized information. In an age where social media has become a primary news source for many, reliability and credibility have faded, and in many cases disappeared outright. False information has become a major force shaping our world, as it’s cheaper and easier than ever to manufacture and propagate. This concept is just one approach to a much broader and more complex problem; we’re in desperate need of tools to guard us against misinformation.

PS: Several days after finalizing this article, the Chan-Zuckerberg Initiative announced the Computable Knowledge Project, which aims to comprehensively connect and comprehend scientific papers using AI.