Scraping All Your Conversations with ChatGPT Made Easy with GPT_Scraper!

Rodolflying
4 min read · Mar 25, 2023


Midjourney — Imagine a chatbot being scraped by a freelancer

I developed a powerful tool called GPT_scraper, which is a collection of scripts that can scrape conversations with ChatGPT without the need for an API_KEY!

The GPT_scraper repository currently has three main tools: the Backend API Scraper, the Selenium Scraper, and Store a New Conversation. With these tools, you can work with ChatGPT programmatically without spending API credits, and get more value out of the language model.

To use GPT_scraper, you need to have Python 3.x, Postman or Insomnia, Google Chrome (version 111 or later), and ChromeDriver installed. You also need to install the required Python packages listed in the README file.

1) Scrape with the hidden API (api_scraper.py)

For a full explanation of this script, see the dedicated article:

https://medium.com/@rodolfo.antonio.sep/chatgpt-api-magic-leveraging-frontend-endpoints-for-advanced-data-extraction-fab5d520a0fc

To use the Backend API Scraper, first log in to ChatGPT in your regular Google Chrome browser. Then follow the instructions in the README file to obtain the required request headers, and run the api_scraper.py script from your terminal. The program fetches conversations one at a time, pausing 2–5 seconds between conversations to avoid getting blocked. The results are saved in the “outputs” folder in either JSON or CSV format.
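The fetch-and-pause loop described above can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the code in api_scraper.py: `fetch_all`, `polite_delay` and the `fetch_one` callback are hypothetical names, and `fetch_one` stands in for whatever HTTP request you build from the headers copied from your browser (the actual endpoints are covered in the README and the linked article). Drawing each pause at random from a range, rather than sleeping a fixed time, is an assumed design choice.

```python
import random
import time
from typing import Callable, Dict, Iterable, List


def polite_delay(min_s: float = 2.0, max_s: float = 5.0) -> float:
    """Return a pause length in seconds drawn uniformly from [min_s, max_s]."""
    return random.uniform(min_s, max_s)


def fetch_all(
    conversation_ids: Iterable[str],
    fetch_one: Callable[[str], Dict],  # hypothetical: wraps the real HTTP request
    min_s: float = 2.0,
    max_s: float = 5.0,
    sleep: Callable[[float], None] = time.sleep,
) -> List[Dict]:
    """Fetch each conversation in turn, pausing between consecutive requests."""
    results = []
    for i, conv_id in enumerate(conversation_ids):
        if i > 0:
            # Random pause so requests do not arrive at a fixed, bot-like rhythm.
            sleep(polite_delay(min_s, max_s))
        results.append(fetch_one(conv_id))
    return results
```

For the Selenium scraper's slower pace, the same loop would simply be called with `min_s=6, max_s=10`.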

  • No need for an API_KEY — saves credits and reduces hassle
  • Fetches conversations with a pause between requests to avoid getting blocked
  • Results can be saved in either JSON or CSV format for easy analysis
  • Provides conversation_id, creation_time, and title for each conversation fetched
  • Requires more setup up front, but it is well worth it!
  • Much faster than the Selenium scraper
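Writing the fetched metadata (conversation_id, creation_time, title) to the “outputs” folder as JSON or CSV could look roughly like the sketch below. The function `save_conversations` and the file names conversations.json and conversations.csv are hypothetical; the repository's own output layout may differ.

```python
import csv
import json
from pathlib import Path
from typing import Dict, List

# Columns written to CSV; any other keys in a record are ignored.
FIELDS = ["conversation_id", "creation_time", "title"]


def save_conversations(records: List[Dict], out_dir: str = "outputs", fmt: str = "json") -> Path:
    """Write conversation metadata to <out_dir>/conversations.<fmt> and return the path."""
    folder = Path(out_dir)
    folder.mkdir(parents=True, exist_ok=True)
    if fmt == "json":
        path = folder / "conversations.json"
        # JSON keeps every field of every record.
        path.write_text(json.dumps(records, indent=2, ensure_ascii=False), encoding="utf-8")
    elif fmt == "csv":
        path = folder / "conversations.csv"
        with path.open("w", newline="", encoding="utf-8") as f:
            writer = csv.DictWriter(f, fieldnames=FIELDS, extrasaction="ignore")
            writer.writeheader()
            writer.writerows(records)
    else:
        raise ValueError(f"unsupported format: {fmt!r}")
    return path
```

JSON suits nested data, while CSV opens directly in a spreadsheet, which is why offering both is convenient for later analysis.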

2) Scrape with Selenium (scraper.py)

Alternatively, you can use the Selenium Scraper by running the scraper.py script in your terminal. This tool relies on your Chrome browsing history, so make sure to close all other Chrome windows before running the script. The results will be saved in the “outputs” folder in either JSON or CSV format. The program pauses 6–10 seconds between conversations to avoid getting blocked (yes, I got blocked once I hit the request limit). If you do get blocked, delete the browser history for the last hour or the last day, depending on when the block happened.

  • Easy to use — simply run the script and wait for the results
  • Results can be saved in either JSON or CSV format for easy analysis
  • Relies on the user’s browsing history, so no additional API keys or credentials are needed
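Since this tool relies on browsing history, here is one possible way to pull ChatGPT conversation links out of Chrome's local History database using only Python's standard library. This is an illustrative assumption, not necessarily what scraper.py does: it assumes the History file is a SQLite database with a `urls` table (`url`, `title`, and `last_visit_time` in microseconds since 1601-01-01 UTC) and that conversation pages have URLs of the form `https://chat.openai.com/c/<uuid>`, as they did in early 2023. `conversations_from_history` and `webkit_to_datetime` are hypothetical names.

```python
import re
import shutil
import sqlite3
import tempfile
from datetime import datetime, timedelta, timezone
from pathlib import Path
from typing import List, Tuple

# Assumed URL shape of a ChatGPT conversation page (early 2023).
CONV_URL = re.compile(r"^https://chat\.openai\.com/c/([0-9a-f-]{36})")
WEBKIT_EPOCH = datetime(1601, 1, 1, tzinfo=timezone.utc)


def webkit_to_datetime(us: int) -> datetime:
    """Chrome stores last_visit_time as microseconds since 1601-01-01 UTC."""
    return WEBKIT_EPOCH + timedelta(microseconds=us)


def conversations_from_history(history_db: str) -> List[Tuple[str, str, datetime]]:
    """Return (conversation_id, title, last_visit) for each ChatGPT conversation URL."""
    # Work on a copy: a running Chrome keeps the live History file locked.
    with tempfile.TemporaryDirectory() as tmp:
        copy = Path(tmp) / "History"
        shutil.copyfile(history_db, copy)
        con = sqlite3.connect(copy)
        try:
            rows = con.execute(
                "SELECT url, title, last_visit_time FROM urls ORDER BY last_visit_time DESC"
            ).fetchall()
        finally:
            con.close()
    seen, found = set(), []
    for url, title, visited in rows:
        m = CONV_URL.match(url)
        # Keep only the most recent entry per conversation id.
        if m and m.group(1) not in seen:
            seen.add(m.group(1))
            found.append((m.group(1), title, webkit_to_datetime(visited)))
    return found
```

On Linux the default profile's database is usually `~/.config/google-chrome/Default/History`; on Windows it is under `%LOCALAPPDATA%\Google\Chrome\User Data\Default\History`.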

3) Store a new conversation (conversations.py)

If you want to store a new conversation with ChatGPT, use the conversations.py script. Follow the instructions in the README file to run the script, type your prompts for ChatGPT, and close the conversation with one of the available options. The conversation is saved as JSON in the project folder, labelled with its date. For a more detailed explanation of this option, see my earlier article: https://medium.com/@rodolfo.antonio.sep/streamline-your-chatgpt-experience-with-gpt-scraper-eabf30643b44

  • Easily stores new conversations with ChatGPT in JSON format
  • Simple and user-friendly — just prompt your questions and close the conversation with one of the available options
  • Results are easy to find in the project folder, labelled with the date of the conversation
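Saving a finished conversation to a date-stamped JSON file, as described above, might look like the following sketch. The file-name pattern, the `{"role", "content"}` message layout and the function `store_conversation` are hypothetical choices for illustration, not the actual format written by conversations.py.

```python
import json
from datetime import datetime
from pathlib import Path
from typing import Dict, List, Optional


def store_conversation(
    messages: List[Dict[str, str]],
    folder: str = ".",
    when: Optional[datetime] = None,
) -> Path:
    """Save a list of {"role": ..., "content": ...} turns to a date-stamped JSON file."""
    when = when or datetime.now()
    # Hypothetical naming scheme: the timestamp keeps files unique and sortable.
    path = Path(folder) / f"conversation_{when:%Y-%m-%d_%H-%M-%S}.json"
    payload = {"date": when.isoformat(timespec="seconds"), "messages": messages}
    path.write_text(json.dumps(payload, indent=2, ensure_ascii=False), encoding="utf-8")
    return path
```

Putting the timestamp in the file name means a plain directory listing already sorts conversations chronologically.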

Conclusion:

With GPT_scraper, you can easily scrape conversations with ChatGPT without the need for an API_KEY. This tool is powerful, easy to use, and can save you a lot of credits. So go ahead, give it a try, and see how it can help you in your language-related projects.

Overall, the main benefit may be that you can look up past answers instead of asking again. The stored data, and further analysis of it, can reveal a lot about your own interests, among other fun things! … and it could be material for more scripts and articles ;)
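As an example of looking up past answers instead of asking again, a simple case-insensitive keyword search over saved JSON files could look like this. It assumes a hypothetical layout in which each file holds an object with a `messages` list of `{"role": ..., "content": ...}` entries; `search_answers` is an illustrative name, and the keys should be adapted to whatever format your exports actually use.

```python
import json
from pathlib import Path
from typing import List, Tuple


def search_answers(folder: str, keyword: str) -> List[Tuple[str, str]]:
    """Return (file name, message text) for every saved message containing keyword."""
    hits = []
    needle = keyword.lower()
    for path in sorted(Path(folder).glob("*.json")):
        data = json.loads(path.read_text(encoding="utf-8"))
        if not isinstance(data, dict):
            continue  # skip files that do not match the assumed layout
        for msg in data.get("messages", []):
            text = msg.get("content", "")
            if needle in text.lower():
                hits.append((path.name, text))
    return hits
```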

This is a summary article for the repo; I’m going to cover each script in more technical detail in a series of articles. If you have ideas for features in future versions, you are welcome to comment on the post, contact me, or collaborate on GitHub!

Midjourney Art


Rodolflying

Industrial Engineer. I find inspiration in data science and technology to solve real-life problems. https://www.linkedin.com/in/rodolfo-sepulveda-847532135/