Web Scraping Booking.com with Beautiful Soup for Hotel Data Analysis
From HTML to CSV: Mastering Web Scraping with Beautiful Soup
Booking.com is one of the most popular online travel agencies in the world, providing travelers with access to millions of hotel and accommodation options across the globe.
Project Objective: This project collects hotel data from Booking.com using web scraping techniques. The main goal is to extract data related to hotels, including prices, ratings, reviews, amenities, and locations. This data can later be used to identify patterns and trends in customer behavior, such as popular destinations, preferred amenities, and booking patterns.
Importing Libraries
- BeautifulSoup (bs4) is used to extract information from HTML documents
- requests is used to send HTTP requests and receive responses
- pandas is used for data manipulation and analysis
from bs4 import BeautifulSoup
import requests
import pandas as pd
Overview of HTML structure
Understanding the HTML structure of a website is essential for web scraping, as it helps to identify the specific elements that need to be extracted.
In this project, I am using London as the destination.
To inspect HTML elements on a web page, you can use the browser’s built-in developer tools. Here’s how to do it in Google Chrome:
- Open Google Chrome and navigate to the web page you want to inspect.
- Right-click on the element you want to inspect and select “Inspect”, or press “Ctrl + Shift + I” (Windows/Linux) or “Cmd + Option + I” (Mac) to open the Developer Tools panel.
- In the Developer Tools panel, you’ll see the HTML source code of the web page. The element you right-clicked on should be highlighted in the Elements tab.
- You can use the Elements tab to navigate the HTML tree and select any element you want to inspect. When you select an element, its HTML is highlighted in the panel. You can edit its attributes directly in the tree and inspect its CSS in the Styles and Computed tabs.
Using the browser’s developer tools makes it easy to inspect and analyze the HTML structure of a web page, which is useful for web scraping projects.
Getting HTML from a website
To get the HTML from a website, you can use Python’s requests library to send an HTTP request to the website’s server and retrieve the HTML content.
url = 'https://www.booking.com/searchresults.html?ss=London&ssne=London&ssne_untouched=London&label=gog235jc-1DCAEoggI46AdICVgDaFCIAQGYAQm4ARfIAQzYAQPoAQH4AQKIAgGoAgO4ArDuuaEGwAIB0gIkZmJhYjE4YzAtNDdhMy00MmY1LTk2NWItN2UzOTgyNTk1OWEx2AIE4AIB&aid=397594&lang=en-us&sb=1&src_elem=sb&src=searchresults&dest_id=-2601889&dest_type=city&checkin=2023-05-06&checkout=2023-05-07&ltfd=6%3A1%3A5-2023%3A&group_adults=2&no_rooms=1&group_children=0&sb_travel_purpose=leisure&selected_currency=USD&soz=1&lang_changed=1'
headers = {
    'User-Agent': 'Mozilla/5.0 (X11; CrOS x86_64 8172.45.0) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/51.0.2704.64 Safari/537.36',
    'Accept-Language': 'en-US, en;q=0.5'
}
response = requests.get(url, headers=headers)
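The request above can also be written with a parameter dictionary and an explicit error check. The sketch below is a minimal illustration, not the exact request used in this article: the shortened parameter set is an assumption (the real URL carries more tracking fields), and `fetch_html` is a hypothetical helper name. Preparing the request first lets you inspect the final URL without sending anything.

```python
import requests

BASE_URL = "https://www.booking.com/searchresults.html"

# A subset of the query parameters from the URL used in this article.
# Assumption: this shortened set is enough for illustration purposes.
params = {
    "ss": "London",
    "checkin": "2023-05-06",
    "checkout": "2023-05-07",
    "group_adults": 2,
    "no_rooms": 1,
    "group_children": 0,
    "selected_currency": "USD",
}

headers = {
    "User-Agent": "Mozilla/5.0 (X11; CrOS x86_64 8172.45.0) AppleWebKit/537.36 "
                  "(KHTML, like Gecko) Chrome/51.0.2704.64 Safari/537.36",
    "Accept-Language": "en-US, en;q=0.5",
}

# Build the request without sending it, so the encoded URL can be checked.
prepared = requests.Request(
    "GET", BASE_URL, params=params, headers=headers
).prepare()
print(prepared.url)


def fetch_html(prepared_request, timeout=10):
    """Send a prepared request and return the page HTML (hypothetical helper)."""
    with requests.Session() as session:
        response = session.send(prepared_request, timeout=timeout)
    # Raise requests.HTTPError on 4xx/5xx instead of parsing an error page.
    response.raise_for_status()
    return response.text
```

Calling `fetch_html(prepared)` would then return the HTML string to pass to BeautifulSoup. The timeout keeps the script from hanging if the server does not answer.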
After retrieving the page, we create a BeautifulSoup object by passing the HTML content and the desired parser (in this case, ‘html.parser’, the parser built into Python’s standard library).
soup = BeautifulSoup(response.text, 'html.parser')
The resulting soup object can be used to navigate the HTML tree and extract data from the web page.
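To make the navigation concrete, here is a small sketch that runs on a made-up HTML snippet instead of the live page. The markup is a simplified assumption that only mimics the `data-testid` attributes used later in this article. The real page is far larger.

```python
from bs4 import BeautifulSoup

# Hypothetical, simplified markup for illustration only.
sample_html = """
<div data-testid="property-card">
  <div data-testid="title">Example Hotel</div>
  <span data-testid="address">Westminster, London</span>
</div>
"""

sample_soup = BeautifulSoup(sample_html, "html.parser")

# Look up an element by tag name and attribute value.
card = sample_soup.find("div", attrs={"data-testid": "property-card"})
title = card.find("div", attrs={"data-testid": "title"}).get_text(strip=True)

# The same kind of lookup written as a CSS selector.
address = sample_soup.select_one('[data-testid="address"]').get_text(strip=True)

print(title)
print(address)
```

Both `find` and `select_one` return the first match, or `None` when nothing matches.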
From a list of hotels, we will retrieve the following information:
- Hotel name
- Location
- Price
- Rating
Data Extraction
# Find all the hotel elements in the HTML document
hotels = soup.find_all('div', {'data-testid': 'property-card'})
hotels_data = []
# Loop over the hotel elements and extract the desired data
for hotel in hotels:
    # Extract the hotel name
    name_element = hotel.find('div', {'data-testid': 'title'})
    name = name_element.text.strip()
    # Extract the hotel location
    location_element = hotel.find('span', {'data-testid': 'address'})
    location = location_element.text.strip()
    # Extract the hotel price
    price_element = hotel.find('span', {'data-testid': 'price-and-discounted-price'})
    price = price_element.text.strip()
    # Extract the hotel rating (generated class names like these may change over time)
    rating_element = hotel.find('div', {'class': 'b5cd09854e d10a6220b4'})
    rating = rating_element.text.strip()
    # Append the hotel's info to hotels_data
    hotels_data.append({
        'name': name,
        'location': location,
        'price': price,
        'rating': rating
    })
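The loop above assumes every card contains all four elements. If one is missing, `find` returns `None` and the `.text` call raises an `AttributeError`. One possible way to guard against that is sketched below. `text_or_none` is a hypothetical helper name, and the sample card is made-up markup with no price element.

```python
from bs4 import BeautifulSoup


def text_or_none(parent, tag, attrs):
    """Return the stripped text of the first matching child, or None if absent."""
    element = parent.find(tag, attrs=attrs)
    return element.get_text(strip=True) if element is not None else None


# Hypothetical card that deliberately lacks a price element.
sample_card_html = """
<div data-testid="property-card">
  <div data-testid="title">Example Hotel</div>
  <span data-testid="address">Camden, London</span>
</div>
"""

card = BeautifulSoup(sample_card_html, "html.parser").find(
    "div", attrs={"data-testid": "property-card"}
)

row = {
    "name": text_or_none(card, "div", {"data-testid": "title"}),
    "location": text_or_none(card, "span", {"data-testid": "address"}),
    "price": text_or_none(card, "span", {"data-testid": "price-and-discounted-price"}),
}
print(row)
```

Missing values then show up as empty cells in the DataFrame instead of stopping the whole scrape.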
Creating a DataFrame
Once you have extracted the desired data from a hotel list using Beautiful Soup, you can create a pandas DataFrame to store and manipulate the data.
hotels = pd.DataFrame(hotels_data)
hotels.head()
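Because the scraped price and rating are stored as text, a cleaning step is usually needed before any analysis. The sketch below uses made-up rows. The `US$1,250` price format and the new column names `price_usd` and `rating_num` are assumptions for illustration, not values taken from the real page.

```python
import pandas as pd

# Made-up rows in the same shape as hotels_data.
sample = pd.DataFrame(
    [
        {"name": "Hotel A", "location": "Camden, London",
         "price": "US$1,250", "rating": "8.5"},
        {"name": "Hotel B", "location": "Westminster, London",
         "price": "US$180", "rating": None},
    ]
)

# Strip everything except digits and the decimal point, then convert.
sample["price_usd"] = pd.to_numeric(
    sample["price"].str.replace(r"[^\d.]", "", regex=True),
    errors="coerce",
)

# Values that cannot be parsed (or are missing) become NaN.
sample["rating_num"] = pd.to_numeric(sample["rating"], errors="coerce")

print(sample[["name", "price_usd", "rating_num"]])
```

With numeric columns in place, calls such as `sample["price_usd"].mean()` or sorting by rating work as expected.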
Creating a CSV file
hotels.to_csv('hotels.csv', header=True, index=False)
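A quick way to check the export is to read the file back. The sketch below writes a made-up row to a temporary directory (an assumption, so that no real file is overwritten) and confirms that a location containing a comma survives the round trip.

```python
import os
import tempfile

import pandas as pd

# Made-up row in the same shape as the scraped data.
sample = pd.DataFrame(
    [{"name": "Hotel A", "location": "Camden, London",
      "price": "US$180", "rating": "8.5"}]
)

with tempfile.TemporaryDirectory() as tmp_dir:
    csv_path = os.path.join(tmp_dir, "hotels.csv")
    # index=False leaves out the DataFrame's row index column.
    sample.to_csv(csv_path, index=False, encoding="utf-8")
    restored = pd.read_csv(csv_path)

print(restored.columns.tolist())
print(restored.loc[0, "location"])
```

pandas quotes fields that contain commas when writing, so the location is read back as a single value.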
Done!
To summarize, web scraping with Python and Beautiful Soup is a useful technique for extracting data from websites. In this project, I covered how to extract hotel data from Booking.com and create a CSV dataset from it.
Thank you for reading this article! I hope it was helpful.