Web Scraping with ChatGPT Code Interpreter Using Only One Prompt (Step-by-Step Tutorial)

DatHero
12 min read · Jul 28, 2023

Web Scraping may seem like a complicated and demanding task, whether or not you have programming knowledge. However, ChatGPT with the Code Interpreter plugin saves us many lines of code and headaches, since it can extract information from a saved web page in seconds with a single prompt.

Next, through three examples, we will see how to use ChatGPT to perform Web Scraping in a simple and practical way, explained step by step.

Let’s start…

1) Walmart

We are going to use the “Shop all Back to School” section of the Walmart online store. I am providing the direct link below:

Step 1: Define the Fields to Extract

We need to define the information we wish to extract. This is very important, as it will help us later construct our prompt in ChatGPT

In this case, we will scrape the product name and price

Step 2: Inspect Code

Here we need to find the HTML code of one product (as an example to paste into ChatGPT later)

But before we do that, keep the following in mind:

To access the inspect element feature in Chrome, there are two keyboard shortcut options if you’re using Windows:

a) Ctrl + Shift + c

or

b) Ctrl + Shift + i

If you’re using macOS, use:

a) Option + Command + c

or

b) Option + Command + i

With that in mind, we can now inspect the Walmart website. Let’s review the sections:

i) Product Name

In this case, we need to locate the product name within the code to scrape

Let’s copy it and then include it in our prompt. To copy the span tag, we hover over the section, right-click, and the following will appear:

Now we just copy it, and for practical purposes we’ll keep it handy to include in the prompt later

<span data-automation-id="product-title" class="normal dark-gray mb0 mt1 lh-title f6 f5-l lh-copy">Nintendo Kids Super Mario Bros. Mario World 17" Laptop Backpack</span>

ii) Price

We will do the same for the price field

We’ll keep the copied element of the price field for later use

<div class="mr1 mr2-xl b black lh-copy f5 f4-l" aria-hidden="true">$14.92</div>

If you need to extract more sections from the web page, you should repeat the same steps we performed for the product name and price

Tip: To quickly locate the field to inspect within the code area, simply position your mouse over the field, right-click, and the inspect option will be enabled

Step 3: Save the HTML File

Since we are going to work with the Code Interpreter, we need to attach a file to it. So what we will do is save the page we want to scrape as an HTML file.

Go back to the page and use the keyboard shortcut Ctrl + S (Windows) or Command + S (macOS)

Next, save the file in HTML format in a local folder

Step 4: Upload HTML File + Generate Prompt

Now that we have defined the fields to scrape and their code on the web, let’s construct the prompt in ChatGPT

If you haven’t activated the Code Interpreter, let’s follow some instructions. Otherwise, I recommend you skip this part and go directly to constructing the prompt

i) Settings

ii) Turn on Code Interpreter

After activating the Code Interpreter in ChatGPT, let’s upload the HTML file that we saved in Step 3

Now let’s construct the prompt, taking into account the product name and price, as well as the code for each of these sections (if in doubt, review Step 2)

Prompt: from the HTML file, extract the name of product and price, Put the data on a table and export it to a CSV file

Here is the element of one product:
<span data-automation-id="product-title" class="normal dark-gray mb0 mt1 lh-title f6 f5-l lh-copy">Nintendo Kids Super Mario Bros. Mario World 17" Laptop Backpack</span>

Here is the element of the price:
<div class="mr1 mr2-xl b black lh-copy f5 f4-l" aria-hidden="true">$14.92</div>

In case the price of the product is missing, leave that price as a null data

The prompt has four parts.

In the first paragraph, I specify that I have loaded an HTML file and ask it to scrape the product name and price. After doing this, I request it to export the data into a CSV file

In the second and third paragraphs, I provide ChatGPT with an example of each corresponding structure for the product name and price fields. We see that each product is a span tag and the price is a div tag

In the last paragraph, I ask it to leave the price as null whenever a product has no price

It’s important to keep this prompt in mind, as the upcoming examples will have the same structure and will only change the fields and their codes
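For readers curious about what happens behind the prompt, here is a minimal sketch of the kind of Python script Code Interpreter might write for it. It is not the code ChatGPT actually generated for this example. Code Interpreter would typically reach for BeautifulSoup; this sketch swaps in the standard library's `html.parser` so that it runs without installing anything. The names `ProductParser`, `PRICE_CLASSES`, and `html_to_csv` are hypothetical, and the pairing rule (each price belongs to the most recent product title that appears before it in the file) is an assumption about the page layout that you should check against your own saved HTML.

```python
import csv
from html.parser import HTMLParser

# Assumption: these classes, taken from the example price element, identify a price.
PRICE_CLASSES = {"b", "black", "lh-copy"}


class ProductParser(HTMLParser):
    """Collect one {"name", "price"} row per product title found in the page."""

    def __init__(self):
        super().__init__()
        self.rows = []      # one dict per product
        self._field = None  # field currently being read ("name" or "price")
        self._tag = None    # tag that opened the current field
        self._depth = 0     # nested tags of the same name inside the field
        self._text = []

    def _start(self, field, tag):
        self._field, self._tag, self._depth, self._text = field, tag, 0, []

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if self._field is not None:
            if tag == self._tag:
                self._depth += 1  # e.g. a <span> nested inside the price <span>
            return
        classes = set((attrs.get("class") or "").split())
        if tag == "span" and attrs.get("data-automation-id") == "product-title":
            self._start("name", tag)
        elif (tag == "div" and attrs.get("aria-hidden") == "true"
              and PRICE_CLASSES <= classes):
            self._start("price", tag)

    def handle_data(self, data):
        if self._field is not None:
            self._text.append(data)

    def handle_endtag(self, tag):
        if self._field is None or tag != self._tag:
            return
        if self._depth:
            self._depth -= 1
            return
        text = "".join(self._text).strip()
        if self._field == "name":
            # Every product starts with a missing (null) price.
            self.rows.append({"name": text, "price": None})
        elif self.rows and self.rows[-1]["price"] is None:
            self.rows[-1]["price"] = text
        self._field = None


def html_to_csv(html_path, csv_path):
    """Parse a saved HTML page and write the rows to a CSV file."""
    parser = ProductParser()
    with open(html_path, encoding="utf-8") as f:
        parser.feed(f.read())
    parser.close()
    with open(csv_path, "w", newline="", encoding="utf-8") as f:
        writer = csv.DictWriter(f, fieldnames=["name", "price"])
        writer.writeheader()
        writer.writerows(parser.rows)  # None is written as an empty cell
    return parser.rows
```

With the page saved under a hypothetical name such as `walmart_page1.html`, calling `html_to_csv("walmart_page1.html", "walmart_page1.csv")` would produce a two-column file. The point of the prompt is that ChatGPT writes and runs something along these lines for us.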

Results:

Download and open the CSV file

Finally, we have successfully performed web scraping for the products and their respective prices, which were then exported to a CSV file as shown in the table image. Note that the product we used as an example is included!
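Because ChatGPT can occasionally misread the tags, a quick check of the downloaded file is worthwhile before using the data. The sketch below counts the rows and the rows with an empty price so you can compare them with what you see on the page. The function name `summarize_csv` and the column name `price` are assumptions; use whatever header ChatGPT put in your CSV.

```python
import csv


def summarize_csv(csv_path, price_column="price"):
    """Return (row_count, rows_with_missing_price) for an exported CSV."""
    with open(csv_path, newline="", encoding="utf-8") as f:
        rows = list(csv.DictReader(f))
    # A price counts as missing when the cell is absent, empty, or whitespace.
    missing = sum(1 for row in rows if not (row.get(price_column) or "").strip())
    return len(rows), missing
```

If the row count is far below the number of products shown on the page, or almost every price is missing, the extraction probably matched the wrong elements and the prompt is worth rerunning.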

Bonus

The previous steps let us scrape the first page of the Walmart listing. To extract data from the second page, we repeat the same steps, remembering to pick a product from the new page and include it in the prompt as the example.

Page 2 of the Back to School section on the Walmart website

i) Product name

<span data-automation-id="product-title" class="normal dark-gray mb0 mt1 lh-title f6 f5-l lh-copy">Minecraft Boys Cliff Goats Graphic T-Shirt, 2-Pack, Sizes 4–18</span>

ii) Price

<div class="mr1 mr2-xl b black lh-copy f5 f4-l" aria-hidden="true">$13.96</div>

Just like with the first page, we need to save this second page as an HTML file (if in doubt, review Step 3)

Prompt

from the HTML file, extract the name of product and price, Put the data on a table and export it to a CSV file.

Here is the element of one product:
<span data-automation-id="product-title" class="normal dark-gray mb0 mt1 lh-title f6 f5-l lh-copy">Minecraft Boys Cliff Goats Graphic T-Shirt, 2-Pack, Sizes 4–18</span>

Here is the element of the price:
<div class="mr1 mr2-xl b black lh-copy f5 f4-l" aria-hidden="true">$13.96</div>

In case the price of the product is missing, leave that price as a null data

If you wish to merge both tables into one, you can ask ChatGPT to do the following:
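For reference, outside of ChatGPT the same merge is a simple concatenation of the two files. This is a minimal sketch, assuming both CSVs share the same header row; the function name `merge_csvs` and any file names are hypothetical, and this is not the code ChatGPT produced.

```python
import csv


def merge_csvs(csv_paths, merged_path):
    """Concatenate CSV files that share the same header; return the row count."""
    header = None
    rows = []
    for path in csv_paths:
        with open(path, newline="", encoding="utf-8") as f:
            reader = csv.reader(f)
            file_header = next(reader, None)
            if file_header is None:
                continue  # skip empty files
            if header is None:
                header = file_header
            elif file_header != header:
                raise ValueError(f"{path} has a different header: {file_header}")
            rows.extend(reader)
    with open(merged_path, "w", newline="", encoding="utf-8") as f:
        writer = csv.writer(f)
        if header is not None:
            writer.writerow(header)
        writer.writerows(rows)
    return len(rows)
```

Raising an error on mismatched headers is a deliberate choice here: if ChatGPT named the columns differently for page 1 and page 2, it is better to notice that than to silently mix columns.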

2) Target

In this second example, we will perform Web Scraping on the cell phone section of the Target website. We will move quickly; if in doubt, refer to the steps from the first example (Walmart).

Here is the direct link:

Step 1: Let’s determine the fields to extract

a) Product
b) Brand
c) Price

Now, let’s inspect the code of each of our target fields (review Step 2 of the Walmart example)

Keyboard shortcut to inspect: Ctrl + Shift + c (Windows) or Option + Command + c (macOS)

Step 2: Inspect Code

i) Product

We locate the code and tags. We copy and keep the code to later incorporate it into the ChatGPT prompt (if in doubt, review Step 2 of the first Walmart example)

<a href="/p/tracfone-prepaid-apple-iphone-se-2nd-gen-64gb-cdma-black/-/A-82040163#lnk=sametab" aria-label="Tracfone Prepaid Apple iPhone SE 2nd Gen (64GB) CDMA - Black" class="styles__StyledLink-sc-vpsldm-0 styles__StyledTitleLink-sc-14ktig2-1 fajhWk gkIDAW h-display-block h-text-bold h-text-bs" data-test="product-title">Tracfone Prepaid Apple iPhone SE 2nd Gen (64GB) CDMA - Black</a>

ii) Brand

<a href="/b/apple/-/N-5y3ej" class="styles__StyledLink-sc-vpsldm-0 lnixiM h-text-sm h-text-grayDark" data-test="@web/ProductCard/ProductCardBrandAndRibbonMessage/brand">Apple</a>

iii) Price

<div class="h-padding-r-tiny"><span class="" data-test="current-price"><span>$189.99</span></span></div>

Step 3: Save the HTML File

Save the page to be scraped as an HTML file (review Step 3 from the Walmart example)

Step 4: Upload HTML File + Generate Prompt

We are going to construct the prompt, but unlike the previous example, we will include the cellphone brand field (see Step 4 of the Walmart example).

Load the HTML file and add the code for each of the fields to be scraped (product name, brand and price)

Prompt:
from the HTML file, extract the name of product, brand, price, Put the data on a table and export it to a CSV file. Extract all products

Here is the element of one product:
<a href="/p/tracfone-prepaid-apple-iphone-se-2nd-gen-64gb-cdma-black/-/A-82040163#lnk=sametab" aria-label="Tracfone Prepaid Apple iPhone SE 2nd Gen (64GB) CDMA - Black" class="styles__StyledLink-sc-vpsldm-0 styles__StyledTitleLink-sc-14ktig2-1 fajhWk gkIDAW h-display-block h-text-bold h-text-bs" data-test="product-title">Tracfone Prepaid Apple iPhone SE 2nd Gen (64GB) CDMA - Black</a>

Here is the element of the brand:
<a href="/b/apple/-/N-5y3ej" class="styles__StyledLink-sc-vpsldm-0 lnixiM h-text-sm h-text-grayDark" data-test="@web/ProductCard/ProductCardBrandAndRibbonMessage/brand">Apple</a>

Here is the element of the price:
<div class="h-padding-r-tiny"><span class="" data-test="current-price"><span>$189.99</span></span></div>

In case the price of the product is missing, leave that price as a null data

Results

Download and open the CSV file

And the results were great: we were able to scrape all the data from the Target website

3) Amazon

In this final example, we will perform Web Scraping for Kindle books. This can be useful for seeing which books are most popular and then using ChatGPT to create stories around trending themes.

Here’s the link:

Step 1: Let’s determine the Fields to Extract

a) Product or Title
b) Author
c) Price

Step 2: Inspect Code

i) Product or Title:

We locate the code and tags. We copy and keep the code to later incorporate it into the ChatGPT prompt (if in doubt, review Step 2 of the first Walmart example)

The keyboard shortcut to inspect is Ctrl + Shift + c (Windows) or Option + Command + c (macOS). You can refer to Step 2 of the Walmart example for more details

<span class="a-size-base-plus a-color-base a-text-normal">Lessons in Chemistry: A Novel</span>

ii) Author

<a class="a-size-base a-link-normal s-underline-text s-underline-link-text s-link-style" href="/Bonnie-Garmus/e/B09964CPY4?ref=sr_ntt_srch_lnk_1&amp;qid=1690568130&amp;sr=8-1">Bonnie Garmus</a>

iii) Price

Let’s note that we are only going to extract the integer part of the price for this example

<span class="a-price-whole">14<span class="a-price-decimal">.</span></span>
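Note that the text inside this element is `14.` (the nested span contributes the trailing dot), so the value needs a little cleaning before it becomes a number. A minimal sketch of that cleanup, assuming US-style prices with a comma as the thousands separator; the function name `whole_price` is hypothetical:

```python
import re


def whole_price(text):
    """Return the integer part of a price string, or None if no digits are found."""
    match = re.search(r"\d[\d,]*", text or "")
    if match is None:
        return None  # missing price stays null
    return int(match.group().replace(",", ""))
```

Returning `None` for an empty string keeps this consistent with the instruction in the prompt to leave missing prices as null.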

Step 3: Save HTML File

We save the web page to be scraped as an HTML file. To do this, we use the shortcut Ctrl + S (Command + S on macOS) on the page we want to save. Let’s not forget to save the file in HTML format (check the details in Step 3 of the Walmart example)

Step 4: Upload HTML file + Generate Prompt

Now, let’s construct the prompt based on the fields we want to extract from the Amazon webpage, specifically from their Kindle books section. In this case, we want to extract the title, author, and prices.

Next, we load the HTML file and add the code to scrape each of the desired fields (title, author and price)

Prompt:
from the HTML file, extract the name of product, author and price, Put the data on a table and export it to a CSV file.

Here is the element of one product:
<span class="a-size-base-plus a-color-base a-text-normal">Lessons in Chemistry: A Novel</span>

Here is the element of the author:
<a class="a-size-base a-link-normal s-underline-text s-underline-link-text s-link-style" href="/Bonnie-Garmus/e/B09964CPY4?ref=sr_ntt_srch_lnk_1&amp;qid=1690568130&amp;sr=8-1">Bonnie Garmus</a>

Here is the element of price:
<span class="a-price-whole">14<span class="a-price-decimal">.</span></span>

In case the price of the product is missing, leave that price as a null data

Note that the prompt has the same structure in all three examples

Results

We download the CSV file

And we have succeeded!

Summary and Recommendations

  1. If we try to directly put the URL into ChatGPT, even with Code Interpreter activated, it won’t be able to perform Web Scraping. For that reason, we download the page to be scraped in HTML
  2. ChatGPT may not initially recognize the tags of the fields to extract and it may give us erroneous information. At that point, I recommend opening another chat and running the prompt again
  3. We should keep in mind that Code Interpreter uses Python and libraries such as BeautifulSoup for Web Scraping
  4. This method does not aim to replace traditional Web Scraping; however, it will save us time and lines of code
  5. The three examples in this story are aimed both at people who work in programming and at people who have little or no knowledge of the field
  6. There is a lot we can accomplish through Web Scraping: as I mentioned above, we could focus on dropshipping, create Kindle books taking into account the best-selling books, analyze competitors’ prices, track certain products, and much more

This complete guide is intended for people who want an alternative way of doing Web Scraping using ChatGPT. It’s not necessary to have prior programming knowledge, just curiosity and patience. See you in the next story, blessings!

DatHero

Content Creator and Passionate about Teaching How to Use Technologies in Our Everyday Life | Data Scientist