Scraping Hotel Listings From Booking.com With Python and BeautifulSoup

Mohan Ganesan
Jul 16, 2020 · 4 min read

One of the most common applications of web scraping is collecting hotel listings from various sites. This could be to monitor prices, create an aggregator, or provide better UX on top of existing hotel booking websites.

Here is a simple script that does that. We will use BeautifulSoup to help us extract information and we will retrieve hotel information on Booking.com.

To start with, this is the boilerplate code we need to get the Booking.com search results page and set up BeautifulSoup to help us use CSS selectors to query the page for meaningful data.

# -*- coding: utf-8 -*-
from bs4 import BeautifulSoup
import requests
headers = {'User-Agent':'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_11_2) AppleWebKit/601.3.9 (KHTML, like Gecko) Version/9.0.2 Safari/601.3.9'}
url = 'https://www.booking.com/searchresults.html?label=gen173nr-1FCAEoggI46AdIM1gEaGyIAQGYATG4AQfIAQzYAQHoAQH4AQKIAgGoAgO4AvTIm_IFwAIB;sid=7101b3fb6caa095b7b974488df1521d2;city=-2109472;from_idr=1&;dr_ps=IDR;ilp=1;d_dcp=1'
response = requests.get(url, headers=headers)
# the 'lxml' parser requires the lxml package to be installed
soup = BeautifulSoup(response.content, 'lxml')

We are also passing a User-Agent header to simulate a browser call so we don't get blocked.

Now let’s analyze the Booking.com search results for a destination we want. This is how it looks.

When we inspect the page, we find that each item's HTML is encapsulated in a tag with the class sr_property_block.

We could just use this to break the HTML document into these cards which contain individual item information like this:

# -*- coding: utf-8 -*-
from bs4 import BeautifulSoup
import requests
headers = {'User-Agent':'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_11_2) AppleWebKit/601.3.9 (KHTML, like Gecko) Version/9.0.2 Safari/601.3.9'}
url = 'https://www.booking.com/searchresults.html?label=gen173nr-1FCAEoggI46AdIM1gEaGyIAQGYATG4AQfIAQzYAQHoAQH4AQKIAgGoAgO4AvTIm_IFwAIB&sid=eae1a774e77c394c5e69703d37e033a3&sb=1&src=searchresults&src_elem=sb&error_url=https://www.booking.com/searchresults.html?label=gen173nr-1FCAEoggI46AdIM1gEaGyIAQGYATG4AQfIAQzYAQHoAQH4AQKIAgGoAgO4AvTIm_IFwAIB;sid=eae1a774e77c394c5e69703d37e033a3;tmpl=searchresults;city=-2109472;class_interval=1;dest_id=-2109472;dest_type=city;dr_ps=IDR;dtdisc=0;from_idr=1;ilp=1;inac=0;index_postcard=0;label_click=undef;offset=0;postcard=0;room1=A%2CA;sb_price_type=total;shw_aparth=1;slp_r_match=0;srpvid=7df1609ef03a0103;ss_all=0;ssb=empty;sshis=0;top_ufis=1&;&sr_autoscroll=1&ss=Rishīkesh&is_ski_area=0&ssne=Rishīkesh&ssne_untouched=Rishīkesh&city=-2109472&checkin_year=2020&checkin_month=3&checkin_monthday=4&checkout_year=2020&checkout_month=3&checkout_monthday=5&group_adults=2&group_children=0&no_rooms=1&from_sf=1'
response = requests.get(url, headers=headers)
soup = BeautifulSoup(response.content, 'lxml')

# Each search result card is wrapped in an element with the class sr_property_block
for item in soup.select('.sr_property_block'):
    try:
        print('----------------------------------------')
        print(item)  # the raw HTML of this card, which is what the text below describes
        print('----------------------------------------')
    except Exception as e:
        # raise e
        print('')

And when you run it…

python3 scrapeBooking.py

You can tell that the code is isolating each card's HTML.
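The long search URL in the script above carries the destination, dates, and occupancy as ordinary query parameters (ss, checkin_year, checkin_month, checkin_monthday, and so on). As a rough sketch, you could assemble such a URL from a handful of values instead of pasting it from the browser. The helper name build_search_url is made up for this example, and it is an assumption that Booking.com will accept a URL that carries only these parameters and omits the session-specific ones (label, sid, srpvid).

```python
from urllib.parse import urlencode

BASE_URL = 'https://www.booking.com/searchresults.html'


def build_search_url(destination, checkin, checkout, adults=2, children=0, rooms=1):
    """Assemble a search URL from parameter names seen in the article's URL.

    checkin and checkout are (year, month, day) tuples.
    Hypothetical helper: the site may require more parameters than these.
    """
    params = {
        'ss': destination,
        'checkin_year': checkin[0],
        'checkin_month': checkin[1],
        'checkin_monthday': checkin[2],
        'checkout_year': checkout[0],
        'checkout_month': checkout[1],
        'checkout_monthday': checkout[2],
        'group_adults': adults,
        'group_children': children,
        'no_rooms': rooms,
    }
    # urlencode percent-encodes non-ASCII characters such as the "ī" in Rishīkesh
    return BASE_URL + '?' + urlencode(params)


search_url = build_search_url('Rishīkesh', (2020, 3, 4), (2020, 3, 5))
print(search_url)
```

The resulting string can be passed to requests.get in place of the hard-coded url variable.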

On further inspection, you can see that the name of the hotel always has the sr-hotel__name class. Let's also get the number of reviews, pricing, and ratings while we are at it.

for item in soup.select('.sr_property_block'):
    try:
        print('----------------------------------------')
        print(item.select('.sr-hotel__name')[0].get_text().strip())  # hotel name
        print(item.select('.hotel_name_link')[0]['href'])  # link to the hotel page
        print(item.select('.bui-review-score__badge')[0].get_text().strip())  # rating
        print(item.select('.bui-review-score__text')[0].get_text().strip())  # number of reviews
        print(item.select('.bui-review-score__title')[0].get_text().strip())  # rating label
        print(item.select('.hotel_image')[0]['data-highres'])  # image URL
        print(item.select('.bui-price-display__value')[0].get_text().strip())  # price
    except Exception as e:
        # a card that lacks any of these elements ends up here
        print('')

We have also tried to get the hotel image and link, both crucial pieces of information.

The whole code looks like this:

# -*- coding: utf-8 -*-
from bs4 import BeautifulSoup
import requests
headers = {'User-Agent':'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_11_2) AppleWebKit/601.3.9 (KHTML, like Gecko) Version/9.0.2 Safari/601.3.9'}
url = 'https://www.booking.com/searchresults.html?label=gen173nr-1FCAEoggI46AdIM1gEaGyIAQGYATG4AQfIAQzYAQHoAQH4AQKIAgGoAgO4AvTIm_IFwAIB&sid=eae1a774e77c394c5e69703d37e033a3&sb=1&src=searchresults&src_elem=sb&error_url=https://www.booking.com/searchresults.html?label=gen173nr-1FCAEoggI46AdIM1gEaGyIAQGYATG4AQfIAQzYAQHoAQH4AQKIAgGoAgO4AvTIm_IFwAIB;sid=eae1a774e77c394c5e69703d37e033a3;tmpl=searchresults;city=-2109472;class_interval=1;dest_id=-2109472;dest_type=city;dr_ps=IDR;dtdisc=0;from_idr=1;ilp=1;inac=0;index_postcard=0;label_click=undef;offset=0;postcard=0;room1=A%2CA;sb_price_type=total;shw_aparth=1;slp_r_match=0;srpvid=7df1609ef03a0103;ss_all=0;ssb=empty;sshis=0;top_ufis=1&;&sr_autoscroll=1&ss=Rishīkesh&is_ski_area=0&ssne=Rishīkesh&ssne_untouched=Rishīkesh&city=-2109472&checkin_year=2020&checkin_month=3&checkin_monthday=4&checkout_year=2020&checkout_month=3&checkout_monthday=5&group_adults=2&group_children=0&no_rooms=1&from_sf=1'
response = requests.get(url, headers=headers)
soup = BeautifulSoup(response.content, 'lxml')

for item in soup.select('.sr_property_block'):
    try:
        print('----------------------------------------')
        print(item.select('.sr-hotel__name')[0].get_text().strip())
        print(item.select('.hotel_name_link')[0]['href'])
        print(item.select('.bui-review-score__badge')[0].get_text().strip())
        print(item.select('.bui-review-score__text')[0].get_text().strip())
        print(item.select('.bui-review-score__title')[0].get_text().strip())
        print(item.select('.hotel_image')[0]['data-highres'])
        print(item.select('.bui-price-display__value')[0].get_text().strip())
        print('----------------------------------------')
    except Exception as e:
        # raise e
        print('')

And when run, it produces all the info we need.
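One caveat with indexing select(...)[0] inside a single try block: if a card lacks any one element (say, a hotel with no reviews yet), an IndexError is raised and the remaining fields of that card are never printed. A minimal sketch of a more forgiving variant uses BeautifulSoup's select_one, which returns None when nothing matches. The helper names first_text and first_attr and the sample markup are invented for illustration; only the class names come from the script above. The sketch uses the built-in 'html.parser' so it runs without lxml.

```python
from bs4 import BeautifulSoup

# Hand-written sample markup for illustration only; the real page is far larger.
SAMPLE_HTML = '''
<div class="sr_property_block">
  <a class="hotel_name_link" href="/hotel/in/example.html">
    <span class="sr-hotel__name"> Example Hotel </span>
  </a>
  <div class="bui-review-score__badge"> 8.1 </div>
</div>
<div class="sr_property_block">
  <span class="sr-hotel__name">Hotel Without Reviews</span>
</div>
'''


def first_text(item, selector):
    """Return the stripped text of the first match, or None if nothing matches."""
    node = item.select_one(selector)
    return node.get_text().strip() if node is not None else None


def first_attr(item, selector, attr):
    """Return an attribute of the first match, or None if it is missing."""
    node = item.select_one(selector)
    return node.get(attr) if node is not None else None


soup = BeautifulSoup(SAMPLE_HTML, 'html.parser')

hotels = []
for item in soup.select('.sr_property_block'):
    hotels.append({
        'name': first_text(item, '.sr-hotel__name'),
        'link': first_attr(item, '.hotel_name_link', 'href'),
        'score': first_text(item, '.bui-review-score__badge'),
    })

for hotel in hotels:
    print(hotel)
```

Collecting each card into a dictionary also makes it easier to store or compare the results later, instead of only printing them.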

In more advanced implementations, you will even need to rotate the User-Agent string so Booking.com can't tell it's the same browser!
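As a rough sketch of what rotating the User-Agent could look like, you can pick a string at random for every request. The first entry below is the one used in this article; the other two are example strings added for illustration, and in practice you would maintain your own up-to-date list. The function name random_headers is hypothetical.

```python
import random

# The first string is the one used in the script above; the others are
# illustrative examples and should be replaced with current browser strings.
USER_AGENTS = [
    'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_11_2) AppleWebKit/601.3.9 '
    '(KHTML, like Gecko) Version/9.0.2 Safari/601.3.9',
    'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 '
    '(KHTML, like Gecko) Chrome/83.0.4103.116 Safari/537.36',
    'Mozilla/5.0 (X11; Linux x86_64; rv:78.0) Gecko/20100101 Firefox/78.0',
]


def random_headers(rng=random):
    """Return a headers dict with a randomly chosen User-Agent."""
    return {'User-Agent': rng.choice(USER_AGENTS)}


headers = random_headers()
print(headers['User-Agent'])
```

Calling random_headers() before each requests.get(url, headers=...) call gives every request a chance of presenting a different browser string.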

As you get a little more advanced, you will realize that Booking.com can simply block your IP, ignoring all your other tricks. This is a bummer, and it is where most web crawling projects fail.

Overcoming IP Blocks

Investing in a private rotating proxy service like Proxies API can most of the time make the difference between a successful and headache-free web scraping project which gets the job done consistently and one that never really works.

Plus, with the running offer of 1000 free API calls, you have almost nothing to lose by using our rotating proxy and comparing notes. It only takes one line of integration, so it's hardly disruptive.

Our rotating proxy server Proxies API provides a simple API that can solve all IP Blocking problems instantly.

  • With millions of high speed rotating proxies located all over the world,

Hundreds of our customers have successfully solved the headache of IP blocks with a simple API.

The whole thing can be accessed by a simple API like below in any programming language.

curl "http://api.proxiesapi.com/?key=API_KEY&url=https://example.com"
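For the Python script in this article, the same call could be sketched as below. The endpoint and the key and url parameters are taken from the curl example; the helper name proxied_url is made up, and percent-encoding the target URL is an assumption on my part (it keeps a target that has its own query string, like the Booking.com search URL, from being mixed up with the API's parameters).

```python
from urllib.parse import urlencode

PROXY_ENDPOINT = 'http://api.proxiesapi.com/'


def proxied_url(target_url, api_key):
    """Build the proxy API URL for a target page (hypothetical helper).

    The target URL is percent-encoded so its own '&' and '?' characters
    are not read as parameters of the proxy API call.
    """
    return PROXY_ENDPOINT + '?' + urlencode({'key': api_key, 'url': target_url})


print(proxied_url('https://example.com', 'API_KEY'))
```

In the scraper, this would replace the direct request, for example requests.get(proxied_url(url, 'API_KEY'), headers=headers), with API_KEY standing in for your own key.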

We have a running offer of 1000 API calls completely free. Register and get your free API Key here.

The blog was originally posted at: https://www.proxiesapi.com/blog/scraping-hotel-listings-from-booking-com-with-python.html.php
