Turn GPT-4 Into Your Personal Literature Review Bot
👋 Hey Friends,
Getting into a new field of research requires reading dozens of landmark and seminal papers to understand that field’s foundations. This usually means manually searching keywords on Google Scholar, finding papers, and polling friends, advisors, colleagues, and other experts for the most important ones. It’s an ad hoc process that can take months.
Even more time-intensive is conducting a proper, methodical literature review. Establishing a method, identifying the correct keywords, finding the right databases: it’s difficult but vital work.
Lately, I’ve been thinking about how I can use ChatGPT to get researchers a summary of a field’s most important papers as quickly and digestibly as possible. That brings me to the first tool in The Academic’s ToolKit, the Scholar Scraper.
All the code can be found here: https://github.com/TheAcademicsFieldGuideToWritingCode/ScholarScraper
Scholar Scraper
academics_scholar_scraper is a Python package that quickly searches for and summarizes the most cited articles for a keyword. It uses the Elsevier Scopus API and GPT-4 to produce a CSV of the top articles with the following data:
- Title: The title of the paper.
- Authors: The authors of the paper.
- Publication Name: The name of the publication where the paper was published.
- Publication Date: The date when the paper was published.
- DOI: The DOI of the paper.
- Summary: A summary of the paper generated by the GPT-4 model.
- Hypotheses: Hypotheses in the paper as interpreted by the GPT-4 model.
- Methods: Methods used in the paper as interpreted by the GPT-4 model.
- Findings: Findings in the paper as interpreted by the GPT-4 model.
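To make the search step a bit more concrete, here is a minimal sketch of how a "most cited papers for a keyword" request to Scopus could be assembled. This is not taken from the package's source. The function name build_scopus_request is hypothetical, and the endpoint, the TITLE-ABS-KEY query syntax, and the query, count, sort, and subj parameters are my assumptions based on the public Scopus Search API. The sketch only builds the URL and headers; it does not send anything.

```python
from urllib.parse import urlencode

# Assumption: the public Scopus Search endpoint; the package may call it differently.
SCOPUS_SEARCH_URL = "https://api.elsevier.com/content/search/scopus"


def build_scopus_request(keyword, api_key, num_papers=10, subject=None):
    """Return (url, headers) for a keyword search sorted by citation count."""
    params = {
        # Search the keyword in titles, abstracts, and author keywords.
        "query": f'TITLE-ABS-KEY("{keyword}")',
        "count": num_papers,
        # Assumption: a leading minus asks for descending order (most cited first).
        "sort": "-citedby-count",
    }
    if subject:
        params["subj"] = subject  # e.g. "COMP" or "PSYC"
    headers = {"X-ELS-APIKey": api_key, "Accept": "application/json"}
    return f"{SCOPUS_SEARCH_URL}?{urlencode(params)}", headers


if __name__ == "__main__":
    url, _ = build_scopus_request(
        "machine learning", "your_elsevier_api_key", subject="COMP"
    )
    print(url)
```

Keeping the request building separate from the network call makes it easy to inspect exactly what would be sent before spending any API quota.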
Convinced? Let’s get into how to use it!
Installation
To install the Scholar Scraper, there are three steps:
- Install Python on your system if you haven't already.
- Install the academics_scholar_scraper package using pip.
- Set up your Elsevier API key and OpenAI API key as environment variables.
Let’s go through them together 😄
Heads up! I’m still getting acquainted with what level of detail is required for different audiences. If you have any questions about any of the steps laid out here I’d love the opportunity to answer them and help however I can.
Installing the academics_scholar_scraper package
- Make sure you have Python installed on your system. You can check whether Python is installed by running the following command in your terminal:
python --version
If Python is not installed, you can download it from the official website at https://www.python.org/downloads/. Alternatively, you can install it with your operating system’s package manager.
- Open a terminal or command prompt and run the following command to install the academics_scholar_scraper package:
pip install academics_scholar_scraper
This command will download and install the package and its dependencies.
Setting up the Elsevier API key
1. To use the Elsevier API, you need to obtain an API key from the Elsevier Developer Portal. If you don't have an account, you can create one for free at https://dev.elsevier.com/.
2. Once you have an account, log in to the Elsevier Developer Portal and navigate to the "API Key Generator" page. Select the "SCOPUS" product and generate a new API key.
3. Copy the API key and paste it into a text editor or note-taking app for safekeeping.
4. In your terminal or command prompt, run the following command to set the ELSEVIER_API_KEY environment variable:
export ELSEVIER_API_KEY=your_elsevier_api_key
5. Replace your_elsevier_api_key with the API key you generated in step 2.
6. To verify that the environment variable is set correctly, run the following command:
echo $ELSEVIER_API_KEY
This should output your Elsevier API key.
Setting up the OpenAI API key
1. To use the OpenAI GPT API, you need to obtain an API key from the OpenAI website. If you don't have an account, you can create one for free at https://beta.openai.com/signup/.
2. Once you have an account, log in to the OpenAI website and navigate to the "API Keys" page. Generate a new API key for the GPT model.
3. Copy the API key and paste it into a text editor or note-taking app for safekeeping.
4. In your terminal or command prompt, run the following command to set the OPENAI_API_KEY environment variable:
export OPENAI_API_KEY=your_openai_api_key
5. Replace your_openai_api_key with the API key you generated in step 2.
6. To verify that the environment variable is set correctly, run the following command:
echo $OPENAI_API_KEY
This should output your OpenAI API key.
That's it! You have now installed the academics_scholar_scraper package and set up the necessary API keys to use it.
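If you'd rather check both keys in one go, here is a small Python sketch that reports which of the two environment variables are missing. The helper name missing_keys is my own and is not part of the package; the variable names are the ones set above.

```python
import os

# The two environment variables set in the steps above.
REQUIRED_KEYS = ("ELSEVIER_API_KEY", "OPENAI_API_KEY")


def missing_keys(env=None):
    """Return the names of required API keys that are unset or blank."""
    env = os.environ if env is None else env
    return [name for name in REQUIRED_KEYS if not env.get(name, "").strip()]


if __name__ == "__main__":
    missing = missing_keys()
    if missing:
        print("Missing environment variables:", ", ".join(missing))
    else:
        print("Both API keys are set.")
```

Run it in the same terminal session where you ran the export commands, since variables set with export only last for that session.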
Usage
To run the script from the command line, use the following arguments:
- keyword: The keyword to search for in the articles.
- -n, --num_papers: The number of papers to retrieve (default: 10).
- -o, --output: The output CSV file (default: papers.csv).
- -s, --subject: The subject area (e.g., AGRI, ARTS, BIOC, etc.) (optional).
For example, if you want to search for ten papers related to machine learning in the computer science subject area and save the summaries in a file called "results.csv", use the following command:
academics_scholar_scraper "machine learning" -n 10 -o results.csv -s COMP
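If you're curious how a command line interface like this maps onto Python, here is a rough sketch using the standard argparse module. It mirrors the arguments and defaults described above, but it is my own reconstruction and not necessarily how the package defines them.

```python
import argparse


def build_parser():
    """Build a parser that mirrors the arguments described in this post."""
    parser = argparse.ArgumentParser(prog="academics_scholar_scraper")
    # Required positional argument.
    parser.add_argument("keyword", help="Keyword to search for in the articles")
    # Optional arguments with the defaults listed above.
    parser.add_argument("-n", "--num_papers", type=int, default=10,
                        help="Number of papers to retrieve")
    parser.add_argument("-o", "--output", default="papers.csv",
                        help="Output CSV file")
    parser.add_argument("-s", "--subject", default=None,
                        help="Subject area code, e.g. COMP")
    return parser


if __name__ == "__main__":
    # Parse the same arguments as the example command above.
    args = build_parser().parse_args(
        ["machine learning", "-n", "10", "-o", "results.csv", "-s", "COMP"]
    )
    print(args.keyword, args.num_papers, args.output, args.subject)
```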
Here's a step-by-step guide on how to use the academics_scholar_scraper package to retrieve academic papers from the command line:
1. Open a command prompt or terminal window on your computer. You can typically do this by searching for "Command Prompt" or "Terminal" in your computer's search bar.
2. Navigate to the directory where the main.py file is located using the cd command. For example, if the main.py file is located in a folder called my_project, you can navigate to that folder using the following command:
cd path/to/my_project
Replace path/to/my_project with the actual path to the my_project folder on your computer.
3. Once you're in the correct directory, you can run the academics_scholar_scraper package with the appropriate arguments. Here's a breakdown of the available arguments:
- keyword: The keyword to search for in the articles. This argument is required.
- -n, --num_papers: The number of papers to retrieve (default: 10).
- -o, --output: The output CSV file (default: papers.csv).
- -s, --subject: The subject area (e.g., AGRI, ARTS, BIOC, etc.) (optional).
Here's a list of subject areas that you can use with the --subject argument:
- AGRI: Agriculture and Biological Sciences
- ARTS: Arts and Humanities
- BIOC: Biochemistry, Genetics and Molecular Biology
- BUSI: Business, Management and Accounting
- CHEM: Chemistry
- COMP: Computer Science
- DEC: Decision Sciences
- DENT: Dentistry
- EART: Earth and Planetary Sciences
- ECON: Economics, Econometrics and Finance
- ENGI: Engineering
- ENVI: Environmental Science
- HEAL: Health Professions
- IMMU: Immunology and Microbiology
- MATE: Materials Science
- MATH: Mathematics
- MED: Medicine
- NEUR: Neuroscience
- NURS: Nursing
- PHAR: Pharmacology, Toxicology and Pharmaceutical Science
- PHYS: Physics and Astronomy
- PSYC: Psychology
- SOCI: Social Sciences
- VET: Veterinary Science and Veterinary Medicine
4. To run the package with the appropriate arguments, use the following command structure:
academics_scholar_scraper "keyword" -n num_papers -o output_file -s subject
Replace keyword with the keyword you want to search for, num_papers with the number of papers you want to retrieve (if different from the default of 10), output_file with the name of the output CSV file you want to create (if different from the default of "papers.csv"), and subject with the subject area you want to search in (if applicable).
For example, if you want to search for ten papers related to machine learning in the computer science subject area and save the summaries in a file called "results.csv", use the following command:
academics_scholar_scraper "machine learning" -n 10 -o results.csv -s COMP
5. Once you've entered the appropriate command, press Enter to run the script. The script will retrieve the specified number of papers related to the specified keyword and subject area, and save the summaries to the specified output file in CSV format.
That's all there is to it!
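If you want to run several keywords in a row, a small script can assemble the commands for you. The sketch below is my own helper (build_command is a hypothetical name, not part of the package). It checks the subject code against the list in this post and builds the same command shown above. It only prints the commands; pass each list to subprocess.run if you want to execute them.

```python
import shlex

# Subject codes copied from the list in this post.
SUBJECT_CODES = {
    "AGRI", "ARTS", "BIOC", "BUSI", "CHEM", "COMP", "DEC", "DENT",
    "EART", "ECON", "ENGI", "ENVI", "HEAL", "IMMU", "MATE", "MATH",
    "MED", "NEUR", "NURS", "PHAR", "PHYS", "PSYC", "SOCI", "VET",
}


def build_command(keyword, num_papers=10, output="papers.csv", subject=None):
    """Return the scraper command as a list of arguments."""
    if subject is not None and subject not in SUBJECT_CODES:
        raise ValueError(f"Unknown subject code: {subject}")
    cmd = ["academics_scholar_scraper", keyword,
           "-n", str(num_papers), "-o", output]
    if subject is not None:
        cmd += ["-s", subject]
    return cmd


if __name__ == "__main__":
    for keyword in ["empathy", "working memory"]:
        # One output file per keyword, e.g. working_memory.csv
        output = keyword.replace(" ", "_") + ".csv"
        cmd = build_command(keyword, output=output, subject="PSYC")
        # shlex.join quotes arguments so the line can be pasted into a shell.
        print(shlex.join(cmd))
```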
Output
Here’s an example run of the package I did for a friend in psychology:
academics_scholar_scraper 'empathy' -n 10 -o test2.csv -s PSYC
{
"Title": "Measuring individual differences in empathy: Evidence for a multidimensional approach",
"Authors": "Davis M.",
"Publication Name": "Journal of Personality and Social Psychology",
"Publication Date": "1983-01-01",
"DOI": "10.1037/0022-3514.44.1.113",
"Summary": "This article explores the concept of empathy as a multidimensional construct and proposes a new method for measuring individual differences in empathic abilities.",
"Hypotheses": "The author hypothesizes that empathy is a multidimensional construct and that it can be effectively measured using a multidimensional approach.",
"Methods": "Davis developed the Interpersonal Reactivity Index (IRI), a self-report questionnaire designed to assess four dimensions of empathy (perspective-taking, empathic concern, personal distress, and fantasy), and tested its validity using various samples.",
"Findings": "The results indicate that the IRI is a reliable and valid measure of individual differences in empathy, supporting the idea of a multidimensional approach to empathy assessment."
}
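Once you have the CSV, you can skim it in Python as well as in a spreadsheet. Here is a minimal sketch using the standard csv module. I'm assuming the column headers in the file match the field names shown above (Title, Publication Date, Findings, and so on); the function names are my own.

```python
import csv


def load_papers(path):
    """Read the scraper's CSV output into a list of dicts, one per paper."""
    with open(path, newline="", encoding="utf-8") as handle:
        return list(csv.DictReader(handle))


def overview_lines(papers):
    """Return one short line per paper: title, date, and findings."""
    lines = []
    for paper in papers:
        # Assumption: headers match the field names shown in this post.
        title = paper.get("Title", "?")
        date = paper.get("Publication Date", "?")
        findings = paper.get("Findings", "?")
        lines.append(f"{title} ({date}): {findings}")
    return lines
```

For the run above, calling overview_lines(load_papers("test2.csv")) would give you one line per paper to print or paste into your notes.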
❗ Disclaimer
It is important to note that the script uses GPT-4 to generate summaries, which may not always be perfectly accurate. The generated summaries should be used as a starting point for further investigation, and you should always refer to the original articles for accurate information.
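To make checking the original articles a little easier, you can turn each DOI from the output into a clickable link. The sketch below assumes the DOI column holds a bare DOI like the one in the example above; doi_url is my own helper name.

```python
def doi_url(doi):
    """Turn a DOI into a doi.org link, tolerating common prefixes."""
    doi = doi.strip()
    for prefix in ("https://doi.org/", "http://doi.org/", "doi:"):
        if doi.lower().startswith(prefix):
            # Drop the prefix so it is not duplicated in the link.
            doi = doi[len(prefix):].strip()
            break
    return f"https://doi.org/{doi}"


if __name__ == "__main__":
    print(doi_url("10.1037/0022-3514.44.1.113"))
```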
Conclusion
The ScholarScraper tool is a powerful and convenient way to search for and summarize scholarly articles. By using the Elsevier Scopus API and OpenAI's GPT-4 model, it can help you quickly find relevant articles and get a high-level overview of their content. Give it a try and see how it can enhance your research process!
If a YouTube tutorial would be useful, let us know! Also, if there’s interest, I’ll do a follow-up blog post on how the code works, how to extend it, and any new features we at the Academic’s Field Guide to Writing Code implement along the way.
Cheers,
Nathan Laundry