Scraping a Dynamic Website, Without Selenium, Part-I

Irfan Ahmad
Published in Geek Culture · 5 min read · Jul 20, 2022

We web scrapers know to use Selenium for scraping dynamic websites. But after some experience and exploration, we find that Selenium is not always necessary.

It is simple to understand. After all, websites load and send their data through APIs or webhooks: users interact with a site through GET, POST or similar requests, and the site fetches the data it presents from some database through those APIs. As web scrapers we just want that data, so all we have to do is find those requests, mimic them, and save the response.
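For instance, once the browser's developer tools show the request a page fires for its data, that request can usually be replayed directly with the requests library. Here is a minimal sketch of the idea; the endpoint and headers below are placeholders, not ReserveBar's actual API.

import requests

# hypothetical endpoint and headers, copied from the Network tab in developer tools
url = "https://example.com/api/products?page=1"
headers = {
    "user-agent": "Mozilla/5.0",
    "accept": "application/json",
}

response = requests.get(url, headers=headers)
response.raise_for_status()
data = response.json()  # the same JSON payload the page itself consumes
print(data)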

A few days back, I tried this method. I had to collect product data from ReserveBar, and the challenge was that I had to collect it all for a particular location.

On the home page, we can see categories in the header and an ‘Enter Delivery Address’ field just below the title.

www2.reservebar.com home page

So we have four categories, from spirits to gifts and accessories. To find products for a particular location, we have to click the location field and enter an address; the site then shows a list of addresses, and we have to click one to fill in the field and then click 'save'. This saves the location, and the site tags each product to show whether or not it is available there.

www2.reservebar.com category page showing show more button

When we open a category, it does not show all of the products; it shows only some of them and a 'show more' button. To get more products we have to keep clicking this button until everything is loaded.

Of course, this can be done with Selenium very easily.

But I did it without Selenium.

Solution without Selenium

Here is how I managed it without Selenium.

For our purpose, getting all the products from the four categories in the header was enough. So I just created a list of those four categories.

collections = ['spirits', 'wine', 'discover', 'gifts-accessories']

For the URL of each category, I just appended its name to the base link:

https://www2.reservebar.com/collections/{category}

This way I could request each category directly rather than clicking its link on the home page.

Using developer tools I was able to find the request that was triggered by clicking the 'show more' button. So I copied that request along with its headers and cookies; the cookies helped set the location.

Using the request

So, here is how I clicked the ‘show more’ button without actually opening a browser. Let me break the whole script into parts.

1- The following lines set up the imports, the root URL and the request headers, including the cookies

import requests
from bs4 import BeautifulSoup as bs
import pandas as pd

root_url = "https://www2.reservebar.com"  # products' addresses will be appended to this root URL
payload = {}  # an empty data payload

# headers for our request
headers = {
    'authority': 'www2.reservebar.com',
    'accept': 'text/html,application/xhtml+xml,application/xml;q=0.9,image/avif,image/webp,image/apng,*/*;q=0.8,application/signed-exchange;v=b3;q=0.9',
    'accept-language': 'en-US,en;q=0.9',
    'cache-control': 'max-age=0',
    'sec-ch-ua': '"Chromium";v="103", ".Not/A)Brand";v="99"',
    'sec-ch-ua-mobile': '?0',
    'sec-ch-ua-platform': '"Linux"',
    'sec-fetch-dest': 'document',
    'sec-fetch-mode': 'navigate',
    'sec-fetch-site': 'none',
    'sec-fetch-user': '?1',
    'upgrade-insecure-requests': '1',
    'user-agent': 'Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/103.0.5060.53 Safari/537.36',
    'Cookie': '__cq_dnt=1; dw_dnt=1; dwanonymous_a40f731d0711af5eb64499a73962349d=acufdphn0DuqzTqpqehaZp0DTc; dwsid=__EYAfXIo7BoEcYW1pQyOh0NcUmbp9wkncr3rR5ZPH6TknOGh6nHxPMEspRdXYKdgSkuZdsQqkrhWpEOf8P0NA==; sid=P67PVuk3mQcAmmYLM16Xkt1Sph6HWOaT9UE',
}

The cookies in these headers are what set the location.
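As an aside, requests can also keep these cookies on a Session instead of a raw 'Cookie' header. A small sketch of that alternative, with the cookie values shortened to placeholders:

import requests

session = requests.Session()
# cookie names taken from the copied request; values shortened here to placeholders
session.cookies.update({
    "dwsid": "<dwsid value copied from developer tools>",
    "sid": "<sid value copied from developer tools>",
})
session.headers.update({"user-agent": "Mozilla/5.0 (X11; Linux x86_64) ..."})

# every request made through this session now carries the location cookies
resp = session.get("https://www2.reservebar.com/collections/spirits")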

2- The lines given below define the list of categories and a list for storing the products' links

# list of categories
collections = ['spirits', 'wine', 'discover', 'gifts-accessories']
# list to store products' links
products_links = []

3- All the lines below are inside a for loop

# for each category in the categories' list
for cat in collections:
    # set its URL
    cat_url = f"https://www2.reservebar.com/collections/{cat}"
    # send a GET request to the URL
    cat_res = requests.request("GET", cat_url, headers=headers, data=payload)
    # find the total number of results/products in that category
    soup = bs(cat_res.text, 'html.parser')
    more_results = soup.find('span', class_='results-span').get_text().strip().split(' ')[0].split(',')
    results = int(''.join(more_results))
    # print the total number of products in that category
    print(f"link: {cat_url}", " results: ", results)

First I used the following line to get all the products at once.

show_more_link = f"https://www2.reservebar.com/on/demandware.store/Sites-reservebarus-Site/default/Search-UpdateGrid?cgid={cat}&start={start}&sz={size}&selectedUrl=https%3A%2F%2Fwww2.reservebar.com%2Fon%2Fdemandware.store%2FSites-reservebarus-Site%2Fdefault%2FSearch-UpdateGrid%3Fcgid%3D{cat}%26start%3D1%26sz%3D{results}"

But it was still unable to load the full source. So I sent the request in parts: rather than loading all the products at once, I loaded them in bigger chunks than the page's own script does, 100 products per request, until all the products were loaded. For this purpose I used a nested 'while loop' inside the parent 'for loop'.
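As an aside, the selectedUrl parameter is just the same Search-UpdateGrid link percent-encoded, so instead of hand-writing the %3D and %26 escapes one could let urllib build it. A small sketch, assuming the same cgid, start and sz parameters:

from urllib.parse import urlencode

base = "https://www2.reservebar.com/on/demandware.store/Sites-reservebarus-Site/default/Search-UpdateGrid"
cat, start, size = "spirits", 1, 100  # same variables as in the loop below
params = {"cgid": cat, "start": start, "sz": size}
inner = f"{base}?{urlencode(params)}"  # the plain grid URL
# urlencode percent-encodes the nested URL, which is what selectedUrl expects
show_more_link = f"{base}?{urlencode({**params, 'selectedUrl': inner})}"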

The following lines are inside the main 'for loop'.

    # counter for the product position to start loading from
    start = 1
    # number of products to load in a single request
    size = 100
    # the while loop
    while True:
        # if the start position reaches or passes the total number of products,
        # break the while loop and move on to the next category
        if start >= results:
            break
        # the request to show more products, with variable parameters indicated inside curly brackets
        show_more_link = f"https://www2.reservebar.com/on/demandware.store/Sites-reservebarus-Site/default/Search-UpdateGrid?cgid={cat}&start={start}&sz={size}&selectedUrl=https%3A%2F%2Fwww2.reservebar.com%2Fon%2Fdemandware.store%2FSites-reservebarus-Site%2Fdefault%2FSearch-UpdateGrid%3Fcgid%3D{cat}%26start%3D{start}%26sz%3D{size}"
        # send the GET request to load more products
        response = requests.request("GET", show_more_link, headers=headers, data=payload)
        # from the response, keep only the products that are not tagged as unavailable at the location
        soup = bs(response.text, 'html.parser')
        all_products = soup.find_all('div', class_="product-tile")

        for prod in all_products:
            if "Not available in IL" in prod.get_text():
                # skip products marked as not available at the selected location
                continue
            a = prod.find('a')
            products_links.append(root_url + a['href'])
            # print(a['href'])
        # advance the start position by the 100 products just loaded
        start += 100
    print("number of products' links: ", len(products_links))

Here ends the 'for loop', and with it the scraping of the products' links.

In this way I scraped nearly all the products of these categories and then saved their links in a .csv file for record keeping and later use.
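Saving the collected links is a one-liner with pandas (imported at the top of the script); the file name here is just an example:

import pandas as pd

# write the collected links to a CSV file for later use; the file name is arbitrary
pd.DataFrame({"product_link": products_links}).to_csv("reservebar_product_links.csv", index=False)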

Here is the complete code of this application.

Script to perform click events and to scrape ReserveBar products without using Selenium, by Irfan Ahmad

Later, I did not have time to look for more requests, so I just used Selenium to scrape each product's information for the location. The product pages used several different kinds of requests in combination, so to save time from that point on I simply applied what I already knew.
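For completeness, here is a minimal sketch of that Selenium fallback; it assumes a working Chrome/chromedriver setup and leaves the page-specific parsing out, since the exact selectors depend on the product page.

from selenium import webdriver
from bs4 import BeautifulSoup as bs

driver = webdriver.Chrome()  # assumes Chrome and a matching chromedriver are installed
product_data = []

for link in products_links:
    driver.get(link)
    soup = bs(driver.page_source, 'html.parser')
    # the page-specific parsing (name, price, availability at the location) would go here
    title = soup.title.get_text(strip=True) if soup.title else ''
    product_data.append({'link': link, 'title': title})

driver.quit()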

Conclusion

We can monitor a website's requests for certain events from the developer tools and copy them to get the data, rather than using Selenium for clicks or other such request-based events. But it is not always applicable.

Learning never ends, so I shall continue to learn, experiment and share.

If you are new, I have also written on scraping dynamic websites with Selenium: Part-I, Part-II, Part-III, Part-IV and Part-V.

Irfan Ahmad
Geek Culture

A freelance Python programmer, web developer and web scraper, and a data science and bioinformatics student.