Using large language models like GPT to do Q&A over papers (II) — using Perplexity.ai (free) over CORE, Semantic Scholar and other domains

Aaron Tay
Academic librarians and open access
16 min read · Feb 28, 2023


Summary: In this blog post, I delve into the use of Perplexity.ai, a startup that combines search engine results with GPT-3 results, similar to Bing’s new chatbot. But instead of letting it get results from all over the net, I test with a few examples what happens if you restrict the results to domains that host only academic content, such as Google Scholar (scholar.google.com), CORE (core.ac.uk), Semantic Scholar (semanticscholar.org), scite (scite.ai) and Google Books (books.google.com).

Effectively, you are now querying papers directly for answers using the power of state-of-the-art large language models (LLMs). The results are amazing! Among other things, this technique is able to find seminal papers, answer direct questions on which paper first coined a certain term, etc.

Interestingly, two of these domains, scite.ai and semanticscholar.org, have specialized tools that already apply LLMs to their content. In the case of scite.ai there is a beta “Ask a question” feature, while Elicit does the same over Semantic Scholar data.

When the results from these specialized tools are compared against the simplistic idea of restricting results to a specific domain in Perplexity, Perplexity shockingly does a lot better in these few examples! I speculate in the blog post on why this is so.

All in all, I am nearly sold; it isn’t perfect, but Perplexity does amazingly well at a task it wasn’t designed for. And this isn’t just about Perplexity: I expect that with a bit more fine-tuning and tweaking, this way of using LLMs will become the standard for search in the next few years.


In part I, I shared how the latest large language models are being combined with search engine technology to create real Q&A systems.

Unlike past promises of semantic search, this seems like the real deal. The latest generation of language models, particularly transformer-based models like BERT and GPT, which use tricks like self-attention and masked self-attention, really seem to be able to “understand” text.

They combine the state-of-the-art capabilities of LLMs to do natural language processing with the information retrieval capabilities of traditional search engines.

Roughly speaking, when you run a query, the system looks up the top N promising documents, then tries to see which parts of those documents best match the query (embeddings are used!). The top passages deemed most likely to answer the query are then passed to the LLM (e.g. ChatGPT, GPT-3.5 davinci, OPT, BLOOM), which uses these passages to answer the question.
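The retrieve-then-read pipeline just described can be sketched in a few lines. This is a toy illustration only: the `score` function stands in for embedding similarity using simple word overlap, the documents are made up, and the final LLM call is replaced by assembling the prompt that a real system would send.

```python
# Toy sketch of the "retrieve then read" pipeline described above.
# Real systems use neural embeddings and a hosted LLM; here word
# overlap stands in for embedding similarity and the LLM call is a stub.

def score(passage: str, query: str) -> float:
    """Crude stand-in for embedding similarity: fraction of query words present."""
    q = set(query.lower().split())
    p = set(passage.lower().split())
    return len(q & p) / len(q)

def top_passages(documents: list[str], query: str, n: int = 2) -> list[str]:
    """Rank candidate passages and keep the n most promising ones."""
    return sorted(documents, key=lambda d: score(d, query), reverse=True)[:n]

def build_prompt(query: str, documents: list[str]) -> str:
    """Assemble the prompt; a real system would send this to an LLM."""
    context = "\n".join(top_passages(documents, query))
    return f"Answer '{query}' using only:\n{context}"

docs = [
    "Bronze OA was coined by Piwowar et al. in 2018.",
    "Deep learning uses neural networks with many layers.",
    "Libraries provide access to academic databases.",
]
prompt = build_prompt("which paper coined the term bronze OA", docs)
```

Note the design: retrieval narrows millions of documents down to a handful of passages, so the LLM only ever has to read text that is likely to contain the answer.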

If you are interested in knowing more about how LLM technology is combined with traditional search technology, please refer to my last post.

Examples of search engines that currently work in this way include general search systems like Microsoft’s Bing+chatbot, the so-called Sydney (limited release at the time of writing), Google’s Bard (not released at the time of writing), and other commercial systems like Perplexity.ai and You.com.

I’m going to stick my neck out and predict that in 3–5 years’ time, such search engines will be the norm!

These systems will find results from the general web, but for academic research you might prefer one that works more like Google Scholar and extracts results only from academic papers.

Academic search + Large Language models

Take the example below.

While the example above looks good, you will notice the sources are domains that aren’t the best sites you want your evidence to come from.

While OpenAI’s WebGPT (unreleased) was trained to learn which websites are reputable and search accordingly (the model has full access to a Bing API which it can use to send queries and extract content from webpages), this might not be part of the current Bing model.

One obvious idea in the case of doing academic queries is to do this search only over academic websites/papers.

In practice, if you are looking for health-type evidence, it might even be better to run the query only over systematic reviews and meta-analyses!

If so, your choices include tools like Elicit, Consensus and Scispace.

Below is an example of a query that extracts answers from papers in the Semantic Scholar corpus.

Elicit query: can you use Google Scholar alone for systematic reviews?

See my post on Q&A academic systems — Elicit, Scispace, Consensus and Galactica.

Rolling your own search + large language model

Or you can consider building your own! Several guides are available, but all of this requires a bit of know-how, and it either costs some money (if you use OpenAI and other paid APIs for state-of-the-art embeddings) or needs a fairly powerful computer to run open-source LLMs like T5.

In this blog post, I will show you how, using Perplexity.ai or You.com, you can get a feel for how such a system would work for your use case without paying a cent or writing a line of code.

Obviously, Bing+chatbot might work as well, but as of the time of writing I don’t have access to it.

As noted in the last blog post, the trick is simply restricting the queries to a specific domain. In my example, I wanted to do Q&A over my library website, so I simply did site:<my library’s domain>* <query>

*I assume it honors the site operator….
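The same trick can be scripted. Below is a minimal sketch assuming the engine honors Google-style `site:` syntax (as the footnote above notes, that is an assumption); the domain list mirrors the academic sites tested later in this post.

```python
# Build site-restricted queries for the academic domains tested in this post.
# Assumes the search engine honors the Google-style site: operator.
ACADEMIC_DOMAINS = {
    "Google Scholar": "scholar.google.com",
    "CORE": "core.ac.uk",
    "Semantic Scholar": "semanticscholar.org",
    "scite": "scite.ai",
    "Google Books": "books.google.com",
}

def site_restricted(query: str, domain: str) -> str:
    """Prefix a query with the site: operator so results come from one domain."""
    return f"site:{domain} {query}"

q = site_restricted("which paper coined the term bronze OA",
                    ACADEMIC_DOMAINS["CORE"])
# q == "site:core.ac.uk which paper coined the term bronze OA"
```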

Perplexity search — restricted to my University site.

You can get similar results using the Perplexity Chrome extension by going to a specific domain (e.g. your website) and selecting “This domain”.

Using “this domain” option with Perplexity Chrome extension

You get similar results with You.com.

Using You.com on my university domain (it got the answer wrong!)

The results aren’t always perfect; for example, it can get confused by tables, but this is something that can be worked around by parsing the table, etc.

What happens if you do it over only academic content?

I was toying with the idea of it answering research questions as a library chatbot, so I shot it the question: what are good datasets for CEO remuneration?

And I was amazed it gave an excellent answer.

Asking Perplexity a research question restricted to my university domain

How did it do that? A lot of it was simply the fact that our institutional repository hosts a lot of papers in finance and accounting on the same domain, and Perplexity was extracting the answer from there! (Our ResearchGuides on the same domain sometimes help too.)

This led me to think: could I do the same trick over domains with a ton of papers? Sure, I could do it on the domains of preprint servers, but that would be subject-specific. Publisher sites might work, but that would be too publisher-specific.

In the end I thought of the following

  1. Google Scholar (scholar.google.com)
  2. CORE (core.ac.uk)
  3. Semantic Scholar (semanticscholar.org)
  4. scite (scite.ai)
  5. Google Books (books.google.com)

After some testing, Perplexity.ai seems to have a larger index than You.com, so I will show results only from Perplexity.

1. Restricting to Google Scholar (scholar.google.com)

A natural idea was to try this on the largest single source of academic content — Google Scholar.

In my first query I asked about the result from one of my old published papers: is the accuracy of Wikipedia articles correlated with edit age?

Perplexity did very well and correctly found the paper — Improving Wikipedia’s accuracy: Is edit age a solution? — that had the result. The last sentence might not be correct, though.

What is interesting is that the sources given link to the Google Scholar profiles of the two authors, including mine. I suspect it is drawing on the brief record with title and abstract, like this page, as Google Scholar does not itself host full text.

You certainly can ask questions about metadata

Here’s some evidence that it is only using title/abstract and not full text when asked: what are seminal papers on deep learning?

You might be interested in my recent blog post on methods for finding seminal papers

The response was as follows

The papers suggested seemed odd to me, even though I’m not a deep learning specialist. But when I looked at one of the suggested papers, you can see it is picking it up from the abstract, where it mentions “seminal segmentation”.

Similarly asking which paper coined the term bronze OA gets zero answers.

Preliminary Conclusion — In general, this technique works only if the content you are looking for is in the title, abstract or other metadata. Can we do better with domains that host full text?

2. Restricting to the CORE (core.ac.uk) domain

But what if I wanted to search full-text of papers for answers? Unfortunately, we don’t quite live in a world of full open-access.

That said, one of the largest aggregators of open academic content I know of is the CORE service at core.ac.uk.

Core Service

More importantly, unlike other aggregators like BASE (Bielefeld Academic Search Engine), CORE actually harvests and hosts the full text of papers on its own domain, so hopefully the full text can be used by Perplexity to answer queries.

First let’s try asking the question on seminal papers on deep learning

I’m not expert enough to judge if this answer is good, but checking the sources, you can see it is picking up some answers not just from the abstract but also from the full text (e.g. source 5).

Perplexity picks up seminal papers using full-text of paper indexed by CORE — source 5

After some testing, it seems asking for “Seminal works” instead of papers might give better results.

Here are other attempts to find seminal papers. The results look fairly decent (though not perfect particularly as it goes on) to my untrained eyes.

Again, changing the query to ask about seminal works instead of seminal papers makes a difference. This shows how sensitive these searches are to different queries, which is characteristic of LLMs.

Let’s try asking the question that Perplexity failed at when restricted to Google Scholar — which paper first coined the term bronze OA?

As far as I know this answer is correct. At the very least, it was extracted from source 1, which believes this is so and cites Piwowar et al. (2018).

Preliminary Conclusion — In general, this technique looks to be really powerful, and it really seems to be looking into the full text of papers! But like any LLM prompt, it can be sensitive to the phrasing you use.

3. Restricting to Semantic Scholar (semanticscholar.org)

The other major source of open access data (both metadata and full text) is Semantic Scholar. In fact, it is the source used by Elicit and many other tools that need a source of scholarly paper data.

You might be better off using Elicit here, since it uses the same source but applies LLMs in a far more refined manner, but I was curious to see the difference.

Let’s try to find seminal works.

The results don’t look as good if I ask for “seminal papers” rather than “seminal works”, for unknown reasons.

For comparison, you might want to try the same queries on Elicit, which also uses Semantic Scholar data with large language models to extract answers.

Here’s a comparison when I use Elicit to ask for seminal works.

Using Elicit to ask for seminal papers

Interestingly, the answers look worse in Elicit. I speculate that because Elicit is trying to rank papers to find seminal works, the top 4 papers it uses to generate answers are quite old and do not tend to mention other seminal works?

Perplexity restricted to Semantic Scholar also passes the test when I ask which paper first coined the term bronze OA, as it is able to look into the full text of papers.

When we try the same query in Elicit, it totally fails.

Elicit query on which paper first coined the term bronze OA

Here you can see the obvious problem. Elicit’s semantic search has totally failed to find the right papers at all. My guess is it’s ignoring the quotes around “bronze OA” and gets into trouble.

Changing the query to say “bronze open access” gets slightly better results but is still nowhere near as good.

Preliminary Conclusion — In general, restricting Perplexity to answers from semanticscholar.org gives similar results to restricting answers to the core.ac.uk domain, and both seem able to extract answers from full text.

Given that Elicit is using Semantic Scholar data with LLM technology, it is natural to compare its results with this technique.

Interestingly, for the few queries I tested, Perplexity restricted to the Semantic Scholar domain trounces Elicit.

My theory is as follows. Elicit generates results from the top 4 results, while Perplexity uses the top 10. But more importantly, Perplexity is probably using a “dumber” keyword search, while Elicit is doing a smarter semantic-type search to find documents.

Ironically, this hurts certain types of searches. For example, when I use Perplexity, it probably just does a keyword search (which matches full text), looks for documents that mention paper XYZ is seminal, and surfaces those. Elicit is probably smarter and tries to actually find seminal papers….

For whatever reason, for the few use cases I tried, Elicit’s more semantic searches are surfacing far worse results than Perplexity’s, and given the worse results, the LLM is unable to extract good answers.
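The speculated difference can be made concrete with a toy example (the passages below are made up for illustration). A literal keyword match surfaces passages that explicitly *say* a work is seminal, which happens to be exactly what a “which works are seminal” question needs.

```python
# Toy illustration of why a "dumb" literal keyword match can win on
# this particular question: it surfaces passages that explicitly
# describe a work as seminal. Passages are invented for illustration.

docs = [
    "LeCun et al. (1998) is widely regarded as a seminal work in deep learning.",
    "We propose a new segmentation method based on convolutional networks.",
    "A seminal paper by Hinton introduced deep belief networks.",
]

def keyword_search(documents: list[str], term: str) -> list[str]:
    """Return only the documents containing the literal term."""
    return [d for d in documents if term.lower() in d.lower()]

hits = keyword_search(docs, "seminal")
# Two of the three passages explicitly mention "seminal".
```

A semantic search, by contrast, might rank the old foundational papers themselves highest, and those papers rarely describe *other* works as seminal, leaving the LLM with little to extract.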

4. Restricting to scite (scite.ai)

The next domain I tried with Perplexity is scite.ai. This is an interesting search index that doesn’t have full text. What they have, besides the usual metadata, is citation statements (citances).

They have recently launched in beta an “Ask a question” feature that essentially uses LLMs to extract answers… but not from full text; instead, they use citation statements. If you think about it, these are just a special part of the full text, i.e. the sentences around citations.

Are the results better? Let’s try again with Perplexity limited to scite.ai.

Here’s a similar search asking for seminal works on deep learning using scite’s beta feature

As mentioned above, scite has its own beta “Ask a question” feature that uses GPT-3 to extract answers. How does it match up on the same query?

Not very well in my opinion.

Now let’s try Perplexity on the question — which paper first coined the term bronze OA? As before, this is restricted to the scite.ai domain.

So it failed for the specific question on which paper first coined the term bronze OA. It’s unclear why, since the same type of query worked when restricting to the CORE and Semantic Scholar domains. Maybe the papers available were different?

For what it’s worth, scite’s beta “Ask a question” feature also did not do well.

Preliminary Conclusion — It’s hard to say what’s going on. It seems to be able to identify seminal works to some extent, but in my few examples, restricting to the scite.ai domain doesn’t seem to do as well as, say, core.ac.uk.

I’m guessing that because each paper comes with citation statements, Perplexity can see how other papers describe the paper itself. For example, a seminal paper like Coase (1937) will have a page listing not just the metadata of the paper but also citation statements from other papers saying it is a seminal work, and this is picked up by Perplexity.

Why it can’t find Piwowar et al. (2018) as the paper that coined bronze OA is unclear to me, as there is a scite page for it. Is it simply the lack of a citation statement in scite saying so?

Similar to the Semantic Scholar case, for some reason the results I get using Perplexity restricted to the scite domain seem superior to those from the specialized search systems that use these sources. In this case, Perplexity restricted to scite.ai gives better results than scite’s built-in “Ask a question” feature!

This feels surprising, because you would expect a specialized tool like scite or Elicit to work better than a tool like Perplexity that was never designed for this use case.

Then again, scite’s feature is still in beta, and while Elicit has been working with OpenAI for almost as long as Perplexity (1 year+), it has focused more on extraction of article properties (e.g. population sample, sample size, region of study, etc.), and the generative answer part was added fairly recently.

5. Restricting to Google Books (books.google.com)

Let’s now switch to books; in particular, let’s restrict results to books.google.com.

In my first test query, I see the sources it points to are Google Books record pages.

Some of these books have preview versions available. Is Perplexity able to “see” them?

Probably. Take the query below.

You can see the sentences found by Perplexity in the sources, and when you check Google Books, you will notice they are from within the preview versions of books.

In the example below, I looked at the first source and searched within the book for “It is clear that Darwin did not sail directly from Christian orthodoxy to atheistic materialism,” and you can see the hit.

All this seems very good, but when I try to work the other way around, searching for something that is available in a Google Books preview, Perplexity can’t seem to find the answer.

I suspect this has to do with the way Perplexity works. My guess is that for Google Books it doesn’t have all the previews indexed… but if you hit the right pages, it can extract the results?

Perplexity — how it works

Perplexity has shared precious little information on how it works. We do know it uses (or used) OpenAI’s GPT-3, similar to Elicit, but we don’t know how it matches or ranks pages.

The first question I wondered about is this: like most search engines, I assume that when you search, you are matching against an index of content their crawlers have seen earlier.

But does it then extract the information from the indexed version, or does it scrape the result on the fly? I believe it scrapes the page on the fly.

This is because when I search for the current date, it is able to give me the right date and time, so it is definitely looking at the current contents of the pages found and not an indexed, cached version.

One other feature I did not mention is that you can actually edit sources by adding or removing them.

For example, when I do a search restricted to core.ac.uk, it is unable to find the right answer because the paper isn’t on core.ac.uk. You can actually edit sources to improve the results.

The feature to edit sources is quite hidden. Click on the three dots button and then “Edit sources”.

Removing sources is nice, particularly if you do a full internet search and want to remove results from less trustworthy sources.

Still, in this particular case, Perplexity is unable to find the right answer, as the paper with the right answer simply isn’t on core.ac.uk because it isn’t open access. However, by simply adding a URL to a page with the right paper’s title and abstract, the problem should be fixed. (See the earlier example using Perplexity restricted to the Google Scholar domain.)

However, while there is an “add more sources” button, disappointingly, it does not allow you to add URLs directly; it just gives you the next highest-ranked URLs.

In fact, I would love for Perplexity to allow me to define a list of domains as a whitelist (only show results from these domains; I would add, say, all preprint and publisher domains for research papers) or alternatively a blacklist (exclude trash domains).
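Such a whitelist could be implemented as a simple post-filter on result URLs. The sketch below is of the wished-for feature, not anything Perplexity actually offers; the domain list and example URLs are made up for illustration.

```python
# Sketch of a domain whitelist as a post-filter on result URLs.
# The allowlist and example URLs are illustrative only.
from urllib.parse import urlparse

TRUSTED = {"core.ac.uk", "semanticscholar.org", "scholar.google.com"}

def from_trusted_domain(url: str, allowlist: set[str]) -> bool:
    """Keep a result only if its hostname is (a subdomain of) an allowed domain."""
    host = urlparse(url).hostname or ""
    return any(host == d or host.endswith("." + d) for d in allowlist)

results = [
    "https://core.ac.uk/download/12345.pdf",
    "https://contentfarm.example.com/seo-spam",
    "https://www.semanticscholar.org/paper/abcd",
]
kept = [u for u in results if from_trusted_domain(u, TRUSTED)]
```

A blacklist would be the same filter with the condition inverted: drop a result whenever its hostname matches a blocked domain.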


Perplexity restricted to domains with papers seems unreasonably effective! And this is despite not being designed for this use case.

I have long talked about the power of machine learning and deep learning, coupled with open access, to change the game. This appears to be one aspect of it.

Related: Are we undervaluing Open Access by not correctly factoring in the potentially huge impacts of Machine learning? — An academic librarian’s view

More specifically, back in 2020 during the height of the pandemic, I mused about how CORD-19 (the COVID-19 Open Research Dataset) — a huge project in which publishers, discovery vendors and others partnered to aggregate all papers on COVID (including full text) and harmonize the data so researchers could do text mining — was an interesting “grand experiment” on the power of such techniques.

It seems that two years on, the combination of search technology and even more powerful LLMs will kick off yet another, larger wave of experimentation.

From my point of view, I am nearly sold; it seems inevitable to me that the future of search belongs to this class of systems that combine search results with LLM extraction!

Post-edit note: I just got access to the new Bing+chat feature, and the results are out of this world. Most of the time, I don’t even need to do site restriction. Some examples below.




A Librarian from Singapore Management University. Into social media, bibliometrics, library technology and above all libraries.