Web Scraping with ChatGPT Code Interpreter Using Only One Prompt (Step-by-Step Tutorial)
Web Scraping can seem like a complicated and demanding task, whether or not you have programming knowledge. However, ChatGPT with Code Interpreter can save us many lines of code and headaches, since it can extract information from a saved web page in seconds with just a single prompt.
Next, we will see, through three examples, how we can use ChatGPT to perform Web Scraping in a simple and practical way, all explained step by step.
Let’s start…
1) Walmart
We are going to use the “Shop all Back to School” section of the Walmart online store. I am providing the direct link below:
Step 1: Define the Fields to Extract
We need to define the information we wish to extract. This is very important, as it will help us later construct our prompt in ChatGPT
In this case, we will scrape the product name and price
Step 2: Inspect Code
Here we need to find the HTML code for one product, which we will later give to ChatGPT as an example
But before we do that, keep the following in mind:
To access the inspect element feature in Chrome, there are two keyboard shortcut options if you’re using Windows:
a) Ctrl + Shift + C
or
b) Ctrl + Shift + I
If you’re using macOS, use:
a) Option + Command + C
or
b) Option + Command + I
With that in mind, we can now inspect the Walmart website. Let’s review the sections:
i) Product Name
In this case, we need to locate the product name within the code to scrape
Let’s copy it and then include it in our prompt. To copy the span tag, we hover over the section, right-click, and the following will appear:
Now we just copy it, and for practical purposes we’ll keep it handy to include in the prompt later
<span data-automation-id="product-title" class="normal dark-gray mb0 mt1 lh-title f6 f5-l lh-copy">Nintendo Kids Super Mario Bros. Mario World 17" Laptop Backpack</span>
ii) Price
We will do the same for the price field
We’ll keep the copied element of the price field for later use
<div class="mr1 mr2-xl b black lh-copy f5 f4-l" aria-hidden="true">$14.92</div>
If you need to extract more sections from the web page, you should repeat the same steps we performed for the product name and price
Tip: To quickly locate the field to inspect within the code area, simply position your mouse over the field, right-click, and the inspect option will be enabled
Step 3: Save the HTML File
Since we are going to work with the Code Interpreter, we need to attach a file to it. So what we will do is save the page we want to scrape as an HTML file.
Go back to the page and use the keyboard shortcut Ctrl + S on Windows or Command + S on macOS
Next, save the file in HTML format in a local folder
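Before uploading, it can help to confirm that the saved file really contains the elements we identified in Step 2. The sketch below is only an illustration of that check: the function names and the file name walmart.html are hypothetical, and the small sample string stands in for a real saved page.

```python
from pathlib import Path


def count_marker(html_text: str, marker: str) -> int:
    """Return how many times `marker` occurs in the page's HTML text."""
    return html_text.count(marker)


def check_saved_page(path: str, marker: str) -> int:
    """Read a saved HTML file from disk and count occurrences of `marker`."""
    text = Path(path).read_text(encoding="utf-8", errors="replace")
    return count_marker(text, marker)


# Tiny in-memory stand-in for a saved page (made up for illustration).
sample = (
    '<span data-automation-id="product-title">Backpack</span>'
    '<span data-automation-id="product-title">Notebook</span>'
)
print(count_marker(sample, 'data-automation-id="product-title"'))

# With a real file (hypothetical name), the call would look like:
# check_saved_page("walmart.html", 'data-automation-id="product-title"')
```

If the count is zero, the file was probably saved without the product list, and it is worth saving the page again before building the prompt.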
Step 4: Upload HTML File + Generate Prompt
Now that we have defined the fields to scrape and their code on the web, let’s construct the prompt in ChatGPT
If you haven’t activated Code Interpreter yet, follow the two steps below. Otherwise, I recommend you skip this part and go directly to constructing the prompt
i) Settings
ii) Turn on Code Interpreter
After activating the Code Interpreter in ChatGPT, let’s upload the HTML file that we saved in Step 3
Now let’s construct the prompt, taking into account the product name and price, as well as the code for each of these sections (if in doubt, review Step 2)
Prompt: from the HTML file, extract the name of product and price, Put the data on a table and export it to a CSV file
Here is the element of one product:
<span data-automation-id="product-title" class="normal dark-gray mb0 mt1 lh-title f6 f5-l lh-copy">Nintendo Kids Super Mario Bros. Mario World 17" Laptop Backpack</span>
Here is the element of the price:
<div class="mr1 mr2-xl b black lh-copy f5 f4-l" aria-hidden="true">$14.92</div>
In case the price of the product is missing, leave that price as a null data
The prompt has four parts.
In the first paragraph, I specify that I have uploaded an HTML file and ask it to scrape the product name and price. I then ask it to export the data to a CSV file
In the second and third paragraphs, I give ChatGPT one example element for each field. Note that each product name is in a span tag and each price is in a div tag
In the last paragraph, I ask it to leave the price as null whenever a product has no price
It’s important to keep this prompt in mind, as the upcoming examples use the same structure and only change the fields and their code
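To give an idea of what happens behind the scenes, here is a minimal sketch of the kind of Python script a prompt like this could translate into. It is not the code ChatGPT actually produced: Code Interpreter would more likely rely on BeautifulSoup, while this sketch uses only Python’s standard html.parser so that it runs without extra installs. The class name ProductParser, the column names, and the second sample product are made up for illustration, and the sketch assumes each product title appears before its price in the HTML.

```python
import csv
import io
from html.parser import HTMLParser


class ProductParser(HTMLParser):
    """Collect [name, price] pairs from product-title spans and price divs."""

    def __init__(self):
        super().__init__()
        self.rows = []       # one [name, price-or-None] entry per product
        self._field = None   # index of the field the next text belongs to

    def handle_starttag(self, tag, attrs):
        a = dict(attrs)
        classes = (a.get("class") or "").split()
        if tag == "span" and a.get("data-automation-id") == "product-title":
            self.rows.append([None, None])   # a new product starts here
            self._field = 0
        elif (tag == "div" and a.get("aria-hidden") == "true"
              and "lh-copy" in classes and self.rows):
            self._field = 1                  # price of the latest product

    def handle_endtag(self, tag):
        self._field = None                   # stop capturing at any close tag

    def handle_data(self, data):
        text = data.strip()
        if self._field is not None and text:
            if self.rows[-1][self._field] is None:
                self.rows[-1][self._field] = text
            self._field = None


def to_csv(rows):
    """Serialize rows to CSV text; a missing price (None) becomes an empty cell."""
    buf = io.StringIO()
    writer = csv.writer(buf)
    writer.writerow(["product_name", "price"])
    writer.writerows(rows)
    return buf.getvalue()


# First product: the example from Step 2. Second product: invented, no price.
sample_html = """
<div><span data-automation-id="product-title" class="normal dark-gray mb0 mt1 lh-title f6 f5-l lh-copy">Nintendo Kids Super Mario Bros. Mario World 17" Laptop Backpack</span>
<div class="mr1 mr2-xl b black lh-copy f5 f4-l" aria-hidden="true">$14.92</div></div>
<div><span data-automation-id="product-title" class="normal dark-gray">Sample item without a price</span></div>
"""

parser = ProductParser()
parser.feed(sample_html)
parser.close()
print(to_csv(parser.rows))
```

The two example elements in the prompt play the same role as the two conditions in handle_starttag: they tell the scraper which tag and which attributes identify each field.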
Results:
Download and open the CSV file
Finally, we have successfully performed web scraping for the products and their respective prices, which were then exported to a CSV file as shown in the table image. Note that the product we used as an example is included!
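If you prefer to double-check the downloaded file with code instead of by eye, a small sketch like the one below can confirm that the example product is present and count how many prices were left empty. The column names product_name and price are assumptions (ChatGPT chooses the actual headers), and the sample CSV text is a made-up stand-in for the real download.

```python
import csv
import io


def csv_contains(csv_text, column, value):
    """Return True if any row has exactly `value` in the given column."""
    reader = csv.DictReader(io.StringIO(csv_text))
    return any(row.get(column) == value for row in reader)


def count_missing(csv_text, column):
    """Count rows whose `column` is empty (the null prices from the prompt)."""
    reader = csv.DictReader(io.StringIO(csv_text))
    return sum(1 for row in reader if not row.get(column))


# Stand-in for the downloaded file; the second product is invented.
sample_csv = '''product_name,price
"Nintendo Kids Super Mario Bros. Mario World 17"" Laptop Backpack",$14.92
Sample item without a price,
'''

example = 'Nintendo Kids Super Mario Bros. Mario World 17" Laptop Backpack'
print(csv_contains(sample_csv, "product_name", example))
print(count_missing(sample_csv, "price"))
```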
Bonus
The previous steps allowed us to perform web scraping on the first page of the Walmart section. If we want to extract data from the second page, we repeat the same steps, but we must remember to pick a product from this new page and include it in the prompt as the example
Page 2 of the Back to School section on the Walmart website
i) Product name
<span data-automation-id="product-title" class="normal dark-gray mb0 mt1 lh-title f6 f5-l lh-copy">Minecraft Boys Cliff Goats Graphic T-Shirt, 2-Pack, Sizes 4–18</span>
ii) Price
<div class="mr1 mr2-xl b black lh-copy f5 f4-l" aria-hidden="true">$13.96</div>
Just like with the first page, we need to save this second page as an HTML file (if you have any doubts, review Step 3)
Prompt:
from the HTML file, extract the name of product and price, Put the data on a table and export it to a CSV file.
Here is the element of one product:
<span data-automation-id="product-title" class="normal dark-gray mb0 mt1 lh-title f6 f5-l lh-copy">Minecraft Boys Cliff Goats Graphic T-Shirt, 2-Pack, Sizes 4–18</span>
Here is the element of the price:
<div class="mr1 mr2-xl b black lh-copy f5 f4-l" aria-hidden="true">$13.96</div>
In case the price of the product is missing, leave that price as a null data
If you wish to merge both tables into one, you can ask ChatGPT to do the following:
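For reference, a merge like this amounts to stacking the rows of the two CSV files under a single header. The following is a rough sketch of that idea, not the code ChatGPT runs: the function name merge_csv and the column names are hypothetical, and the two small tables reuse the example products from pages 1 and 2.

```python
import csv
import io


def merge_csv(*csv_texts):
    """Concatenate CSV tables that share the same header row."""
    header, merged = None, []
    for text in csv_texts:
        rows = list(csv.reader(io.StringIO(text)))
        if not rows:
            continue                      # skip empty inputs
        if header is None:
            header = rows[0]              # first table defines the columns
        elif rows[0] != header:
            raise ValueError("CSV files have different columns")
        merged.extend(rows[1:])           # keep data rows, drop repeated header
    out = io.StringIO()
    writer = csv.writer(out, lineterminator="\n")
    if header:
        writer.writerow(header)
    writer.writerows(merged)
    return out.getvalue()


page1 = '''product_name,price
"Nintendo Kids Super Mario Bros. Mario World 17"" Laptop Backpack",$14.92
'''
page2 = '''product_name,price
"Minecraft Boys Cliff Goats Graphic T-Shirt, 2-Pack, Sizes 4–18",$13.96
'''

merged = merge_csv(page1, page2)
print(merged)
```

The check on the header is there because the merge only makes sense when both pages were exported with the same columns.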
2) Target
In this second example, we will perform Web Scraping on the cell phone section of the Target website. We will move more quickly this time, so refer back to the steps of the Walmart example if anything is unclear
Here is the direct link:
Step 1: Let’s determine the fields to extract
a) Product
b) Brand
c) Price
Now, let’s inspect the code for each of our target fields (review Step 2 of the Walmart example)
Keyboard shortcut to inspect: Ctrl + Shift + C (Windows) or Option + Command + I (macOS)
Step 2: Inspect Code
i) Product
We locate the code and tags. We copy and keep the code to later incorporate it into the ChatGPT prompt (if in doubt, review Step 2 of the first Walmart example)
<a href="/p/tracfone-prepaid-apple-iphone-se-2nd-gen-64gb-cdma-black/-/A-82040163#lnk=sametab" aria-label="Tracfone Prepaid Apple iPhone SE 2nd Gen (64GB) CDMA - Black" class="styles__StyledLink-sc-vpsldm-0 styles__StyledTitleLink-sc-14ktig2-1 fajhWk gkIDAW h-display-block h-text-bold h-text-bs" data-test="product-title">Tracfone Prepaid Apple iPhone SE 2nd Gen (64GB) CDMA - Black</a>
ii) Brand
<a href="/b/apple/-/N-5y3ej" class="styles__StyledLink-sc-vpsldm-0 lnixiM h-text-sm h-text-grayDark" data-test="@web/ProductCard/ProductCardBrandAndRibbonMessage/brand">Apple</a>
iii) Price
<div class="h-padding-r-tiny"><span class="" data-test="current-price"><span>$189.99</span></span></div>
Step 3: Save the HTML File
Save the page to be scraped as an HTML file (review Step 3 from the Walmart example)
Step 4: Upload HTML File + Generate Prompt
We are going to construct the prompt, but unlike the previous example, we will include the cellphone brand field (see Step 4 of the Walmart example).
Load the HTML file and add the code for each of the fields to be scraped (product name, brand and price)
Prompt:
from the HTML file, extract the name of product, brand, price, Put the data on a table and export it to a CSV file. Extract all products
Here is the element of one product:
<a href="/p/tracfone-prepaid-apple-iphone-se-2nd-gen-64gb-cdma-black/-/A-82040163#lnk=sametab" aria-label="Tracfone Prepaid Apple iPhone SE 2nd Gen (64GB) CDMA - Black" class="styles__StyledLink-sc-vpsldm-0 styles__StyledTitleLink-sc-14ktig2-1 fajhWk gkIDAW h-display-block h-text-bold h-text-bs" data-test="product-title">Tracfone Prepaid Apple iPhone SE 2nd Gen (64GB) CDMA - Black</a>
Here is the element of the brand:
<a href="/b/apple/-/N-5y3ej" class="styles__StyledLink-sc-vpsldm-0 lnixiM h-text-sm h-text-grayDark" data-test="@web/ProductCard/ProductCardBrandAndRibbonMessage/brand">Apple</a>
Here is the element of the price:
<div class="h-padding-r-tiny"><span class="" data-test="current-price"><span>$189.99</span></span></div>
In case the price of the product is missing, leave that price as a null data
Results
Download and open the CSV file
The results were great: we were able to scrape all the data from the Target page
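The Target elements are convenient because each field carries a data-test attribute, so a script can key on that attribute instead of the long generated class names. Below is a minimal sketch of that idea using Python’s standard html.parser (Code Interpreter itself would more likely use BeautifulSoup). The class name TargetCardParser, the trimmed attributes in the sample, and the second product are made up for illustration, and the sketch assumes the title of each product comes before its brand and price in the HTML.

```python
from html.parser import HTMLParser


class TargetCardParser(HTMLParser):
    """Collect product, brand and price using the data-test attributes."""

    def __init__(self):
        super().__init__()
        self.rows = []       # one dict per product
        self._field = None   # field waiting for its text

    def handle_starttag(self, tag, attrs):
        marker = dict(attrs).get("data-test") or ""
        if marker == "product-title":
            self.rows.append({"product": None, "brand": None, "price": None})
            self._field = "product"
        elif marker.endswith("/brand") and self.rows:
            self._field = "brand"
        elif marker == "current-price" and self.rows:
            self._field = "price"
        # Any other tag, such as the inner <span> that wraps the price,
        # leaves _field untouched so the nested text is still captured.

    def handle_data(self, data):
        text = data.strip()
        if self._field and text:
            if self.rows[-1][self._field] is None:
                self.rows[-1][self._field] = text
            self._field = None


# Attributes other than data-test are trimmed; the second card is invented.
sample_html = """
<a data-test="product-title">Tracfone Prepaid Apple iPhone SE 2nd Gen (64GB) CDMA - Black</a>
<a data-test="@web/ProductCard/ProductCardBrandAndRibbonMessage/brand">Apple</a>
<div class="h-padding-r-tiny"><span class="" data-test="current-price"><span>$189.99</span></span></div>
<a data-test="product-title">Sample phone without a listed price</a>
<a data-test="@web/ProductCard/ProductCardBrandAndRibbonMessage/brand">SampleBrand</a>
"""

parser = TargetCardParser()
parser.feed(sample_html)
parser.close()
for row in parser.rows:
    print(row)
```

Products without a price keep None in that field, which matches the last line of the prompt.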
3) Amazon
In this final example, we will perform web scraping for Kindle books. This could be useful for seeing which books are most popular and then creating stories on trending themes with ChatGPT
Here’s the link:
Step 1: Let’s determine the Fields to Extract
a) Product or Title
b) Author
c) Price
Step 2: Inspect Code
i) Product or Title
We locate the code and tags. We copy and keep the code to later incorporate it into the ChatGPT prompt (if in doubt, review Step 2 of the first Walmart example)
The keyboard shortcut to inspect is: Ctrl + Shift + C (Windows) or Option + Command + I (macOS). You can refer to Step 2 for more details
<span class="a-size-base-plus a-color-base a-text-normal">Lessons in Chemistry: A Novel</span>
ii) Author
<a class="a-size-base a-link-normal s-underline-text s-underline-link-text s-link-style" href="/Bonnie-Garmus/e/B09964CPY4?ref=sr_ntt_srch_lnk_1&qid=1690568130&sr=8-1">Bonnie Garmus</a>
iii) Price
Let’s note that we are only going to extract the integer part of the price for this example
<span class="a-price-whole">14<span class="a-price-decimal">.</span></span>
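Notice that the integer part and the decimal point live in nested span tags, so the raw text of the element is "14." rather than "14". As a small illustration of how a script could turn that into a clean number, here is a sketch based on Python’s standard html.parser. The names WholePriceParser and to_int are hypothetical, and dropping every non-digit character (which also removes thousands separators) is an assumption of this sketch.

```python
from html.parser import HTMLParser


def to_int(text):
    """Keep only the digits of `text`; return None when there are none."""
    digits = "".join(ch for ch in text if ch.isdigit())
    return int(digits) if digits else None


class WholePriceParser(HTMLParser):
    """Collect the integer part of each <span class="a-price-whole">."""

    def __init__(self):
        super().__init__()
        self.prices = []
        self._depth = 0      # greater than 0 while inside an a-price-whole span
        self._buffer = []    # text pieces seen inside the current span

    def handle_starttag(self, tag, attrs):
        classes = (dict(attrs).get("class") or "").split()
        if self._depth:
            self._depth += 1             # nested tag, e.g. a-price-decimal
        elif tag == "span" and "a-price-whole" in classes:
            self._depth = 1
            self._buffer = []

    def handle_endtag(self, tag):
        if self._depth:
            self._depth -= 1
            if self._depth == 0:         # the outer span just closed
                self.prices.append(to_int("".join(self._buffer)))

    def handle_data(self, data):
        if self._depth:
            self._buffer.append(data)


parser = WholePriceParser()
parser.feed('<span class="a-price-whole">14<span class="a-price-decimal">.</span></span>')
parser.close()
print(parser.prices)
```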
Step 3: Save HTML File
We save the web page to be scraped as an HTML file. To do this, we use the shortcut Ctrl + S (Command + S on macOS) on the page we want to save. Let’s not forget to save the file in HTML format (check the details in Step 3 of the Walmart example)
Step 4: Upload HTML file + Generate Prompt
Now, let’s construct the prompt based on the fields we want to extract from the Amazon webpage, specifically from their Kindle books section. In this case, we want to extract the title, author, and prices.
Next, we load the HTML file and add the code to scrape each of the desired fields (title, author and price)
Prompt:
from the HTML file, extract the name of product, author and price, Put the data on a table and export it to a CSV file.
Here is the element of one product:
<span class="a-size-base-plus a-color-base a-text-normal">Lessons in Chemistry: A Novel</span>
Here is the element of the author:
<a class="a-size-base a-link-normal s-underline-text s-underline-link-text s-link-style" href="/Bonnie-Garmus/e/B09964CPY4?ref=sr_ntt_srch_lnk_1&qid=1690568130&sr=8-1">Bonnie Garmus</a>
Here is the element of price:
<span class="a-price-whole">14<span class="a-price-decimal">.</span></span>
In case the price of the product is missing, leave that price as a null data
Notice that the prompt has the same structure in all three examples
Results
We download the CSV file
And we have succeeded!
Summary and Recommendations
- If we paste the URL directly into ChatGPT, even with Code Interpreter activated, it won’t be able to perform Web Scraping. That is why we first save the page to be scraped as an HTML file
- ChatGPT may not recognize the tags of the fields on the first try and may return erroneous information. If that happens, I recommend opening a new chat and running the prompt again
- We should keep in mind that Code Interpreter uses Python and libraries such as BeautifulSoup for Web Scraping
- This method does not aim to replace traditional Web Scraping. However, it will save us time and lines of code
- The three examples in this story are aimed both at people who work in programming and at people who have little or no knowledge of the field
- There is a lot we can accomplish with Web Scraping. As I mentioned above, we could focus on dropshipping, create Kindle books informed by the best-selling titles, analyze competitors’ prices, track certain products, and much more
This complete guide is intended for people who want an alternative way of doing Web Scraping using ChatGPT. No prior programming knowledge is necessary, just curiosity and patience. See you in the next story, blessings!