
AI & Academic Search Engines

Artificial intelligence is changing the way research is performed

Alex Moltzau
Published in DataSeries
6 min read · Jun 2, 2020

What will searching for academic information look like for the child peering out from the fence of 2020?

More importantly, for now perhaps, how is research changing this year?

That child sees a world of books, smartphones and computers while protests over the killing of George Floyd fill the streets.

These events bring society's inequalities to mind, and much of what they reveal is clear and horrible.

It has been made clear by activists, researchers, politicians, artists and a wide variety of civic and business engagement.

Yet, although many aspects are becoming clear, there is much we cannot immediately see, like search patterns in research.

Research is changing, and we need to consider carefully how.

I say this without arguing that any immediate inequalities are being generated, only that the question could be worth exploring.

Search engines in research

It is curious to think that Google Scholar was launched on 20 November 2004. It is now the 3rd of June 2020, and it has come quite a way.

While Google does not publish the size of Google Scholar’s database, scientometric researchers estimated in January 2018 that it contained roughly 389 million documents, including articles, citations and patents, making it the world’s largest academic search engine.

A 2018 article published by Springer lists a series of academic search engines and bibliographic databases (ASEBDs).

From this it is clear that several platforms focus on being search engines, while others focus more on aggregating scientific information.

There is additionally a great variety in the way that search is being performed.

If you look elsewhere, there is also an excellent Wikipedia page that lists about 147 academic databases and search engines.

That is legitimately a lot of academic search engines.

Some of the key players in this quest for the best content-curation tools are Google, Microsoft and AI2.

AI2, the Allen Institute for AI, is perhaps one of the most interesting.

They seek to achieve scientific breakthroughs by constructing AI systems with reasoning, learning, and reading capabilities.

They built Semantic Scholar, but they are attempting to go far beyond this!
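To make this concrete, here is a minimal sketch of how a researcher might query Semantic Scholar programmatically. The endpoint and parameter names are assumptions based on its publicly documented Graph API; verify them against the current documentation before relying on this.

```python
from urllib.parse import urlencode

# Assumed endpoint for Semantic Scholar's public Graph API;
# check the current API docs before use.
BASE_URL = "https://api.semanticscholar.org/graph/v1/paper/search"

def build_search_url(query: str,
                     fields=("title", "year", "citationCount"),
                     limit: int = 10) -> str:
    """Build a paper-search URL without performing any network request."""
    params = urlencode({
        "query": query,
        "fields": ",".join(fields),
        "limit": limit,
    })
    return f"{BASE_URL}?{params}"

print(build_search_url("common sense reasoning"))
```

Building the URL separately from fetching it keeps the example runnable offline; in practice you would pass the result to any HTTP client.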

Three reasons why AI2 is one to watch

This is no grand reasoning, simply a list of three projects at AI2.

  1. Aristo: where the goal is to design an artificially intelligent system that can successfully read, learn, and reason from texts and ultimately demonstrate its knowledge by providing explainable question answering.
  2. PRIOR: A ground-breaking research project on visual knowledge extraction that capitalizes on the wealth of information available in images. PRIOR aims to create knowledge bases composed entirely of information derived from images, both static and video.
  3. MOSAIC: the Mosaic project is focused on defining and building common sense knowledge and reasoning for AI systems.

In this manner they are focused on the semantic, yes, but they also go beyond it to understand different data types.

The interesting part is that AI2 is a research institute.

It was founded by the late Microsoft co-founder Paul Allen.

By their own account, they are very much focused on AI for the common good.

They are very inspiring indeed, and could be an important player should they choose to engage in mapping how AI influences research methodologies.

…And they have more projects that, combined, make them highly relevant here.

Most of these projects are interesting for academic search.

Still, there needs to be transparency in these processes.

Microsoft Academic and search transparency

How does one of the largest technology companies, measured by market capitalisation, measure up?

At the time of writing (the 3rd of June) these were the stats presented on their page.

In addition, they seem to make their data available for offline processing.

They also appear to run a blog about how they work with their search methods, which is certainly more transparent than most other engines I have seen.

From this short overview, we can see that plenty of large and small technology companies are interested in academic search.

Artificial intelligence in academic search

What can be discussed to a greater extent is how AI influences research.

If research is both increasingly personalised and global, then it would make sense for the researcher to understand how their search was influenced, at least to some extent.

However, can that be done?

If it can be done, can it be done in a responsible manner?

If every researcher needs to file a data request for their research, things might get messy.

On the one hand it might be difficult; on the other, there is a methodological question that could be of interest.

Then again, is it that hard?

GDPR makes it easier to request your own data:

“The process for data access under GDPR will be mostly the same as it was under the Data Protection Act of 1998, but with a few slight differences. For starters, a person will need to file a subject access request (SAR) that, as noted by the Guardian, is simply ‘an email, fax or letter asking for their personal data.’”

If it is this easy then surely a scientist can send a SAR to Google Scholar?

A scientist can send a SAR to Microsoft Academic.

… and Semantic Scholar, and so on so forth.
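The steps above amount to sending the same short request to each provider. A trivial sketch of drafting such letters follows; the wording and the provider list are illustrative assumptions, and a real SAR should go to each service's official data-protection contact.

```python
# Illustrative GDPR subject access request (SAR) template.
# The exact wording is an assumption for demonstration purposes.
SAR_TEMPLATE = (
    "Dear {provider} data protection team,\n\n"
    "Under Article 15 of the GDPR, I request a copy of the personal data "
    "you hold about my account and my search activity on your service.\n\n"
    "Kind regards"
)

def draft_sar(provider: str) -> str:
    """Fill the template for one search provider."""
    return SAR_TEMPLATE.format(provider=provider)

for provider in ("Google Scholar", "Microsoft Academic", "Semantic Scholar"):
    print(draft_sar(provider))
    print("---")
```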

Is this good research practice?

Increased personalisation may mean researchers are living in their own echo chambers of what they ‘like’ or ‘cite’ online.

Citation circles or citation cartels have already been an issue, one that is becoming clearer with these digital tools.

It also seems that citation patterns are changing due to these new platforms.

Most research is now influenced by search engines.

Machine learning is only one component of a good search engine, but it seems to be playing an increasingly large role.
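To illustrate how a learned, personalised component can sit alongside plain relevance, here is a toy ranking sketch. It is not any real engine's algorithm: it simply boosts papers on topics the user has clicked before, which shows how personalisation can reinforce the echo-chamber effect described above.

```python
# Toy ranking sketch (an assumption for illustration, not a real
# search engine's method): combine keyword overlap with a boost for
# topics the user has previously clicked.
def keyword_score(query: str, title: str) -> float:
    """Fraction of query words that appear in the title."""
    q, t = set(query.lower().split()), set(title.lower().split())
    return len(q & t) / len(q) if q else 0.0

def rank(query, papers, clicked_topics, boost=0.5):
    """Sort papers by keyword relevance plus a personalisation boost."""
    def score(p):
        base = keyword_score(query, p["title"])
        personal = boost if p["topic"] in clicked_topics else 0.0
        return base + personal
    return sorted(papers, key=score, reverse=True)

papers = [
    {"title": "deep learning for search", "topic": "ml"},
    {"title": "search engine history", "topic": "history"},
]

# A user who always clicks ML papers sees them float to the top,
# even when keyword relevance is identical.
print(rank("search", papers, clicked_topics={"ml"}))
```

With an empty click history both papers score equally and keep their original order; one clicked topic is enough to reorder the results, which is the feedback loop in miniature.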

So it might be important to keep an eye on AI & academic search engines.

If not, perhaps simply do a subject access request (SAR) based on how you have been conducting your research on specific search engines.

Attempting to understand from a critical perspective how your search was influenced by other things you search for elsewhere (particularly in the case of Google) may be beneficial.

I hope you found this interesting, and although I am mainly writing about artificial intelligence within the context of hardware and the climate crisis, you may understand why this applies to all of the above.

Currently it could be argued that AI partly influences most research.

If there is to be transparency, reflexivity, and accountability in research we may need to be more aware of what that influence is.

This is #500daysofAI and you are reading article 364. I am writing one new article about or related to artificial intelligence every day for 500 days. My focus for day 300–400 is about AI, hardware and the climate crisis.



Policy Officer at the European AI Office in the European Commission. This is a personal Blog and not the views of the European Commission.