Automate booking.com search using Python — Seamless Cloud Blog

By Andrey, published in Seamless Cloud on Aug 7, 2020 (7 min read)

Hey!

In this post, you will learn how to use Python to automate the routine work of searching for hotels on booking.com. We will help you get the best deal for your next vacation. I’ve recently created this script to help pick the best option for my summer trip, so I decided to share it with the community.

This article is useful for you if:

  1. You’re learning Python and want to apply your skills to some real-world problems.
  2. You already know Python and want boilerplate code to monitor booking.com so you don’t have to write it yourself.

The script we’re going to write works best if it runs automatically on a schedule, without you triggering it manually. There are multiple ways to achieve this (you can set up a server and create a cron job, for example). I recommend seamlesscloud.io, a tool I’m currently developing that is built specifically for this purpose.

We’re going to write a script that finds the three cheapest hotels with a 9+ rating and shows us the price for two rooms, four people total (because I was traveling with friends). Okay, for those of you who just want the code, here it is:

import datetime
import urllib.parse

import requests
from bs4 import BeautifulSoup

session = requests.Session()

REQUEST_HEADER = {
    "User-Agent": "Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/50.0.2661.75 "
                  "Safari/537.36"}
BOOKING_URL = 'https://www.booking.com'

# https://core.telegram.org/bots
BOT_API_KEY = 'your-api-key'
CHANNEL_NAME = '@booking_monitoring'


class Hotel:
    raw_html = None
    name = None
    score = None
    price = None
    link = None
    details = None

    def __init__(self, raw_html):
        self.raw_html = raw_html
        self.name = get_hotel_name(raw_html)
        self.score = get_hotel_score(raw_html)
        self.price = get_hotel_price(raw_html)
        self.link = get_hotel_detail_link(raw_html)

    def get_details(self):
        # Only fetch the detail page if the search result had a link to it
        if self.link:
            self.details = HotelDetails(self.link)


class HotelDetails:
    latitude = None
    longitude = None

    def __init__(self, details_link):
        detail_page_response = session.get(BOOKING_URL + details_link, headers=REQUEST_HEADER)
        soup_detail = BeautifulSoup(detail_page_response.text, "lxml")
        # get_coordinates returns [latitude, longitude] as strings
        self.latitude, self.longitude = get_coordinates(soup_detail)


def create_url(people, country, city, date_in, date_out, rooms, score_filter):
    url = f"https://www.booking.com/searchresults.en-gb.html?selected_currency=USD&checkin_month={date_in.month}" \
          f"&checkin_monthday={date_in.day}&checkin_year={date_in.year}&checkout_month={date_out.month}" \
          f"&checkout_monthday={date_out.day}&checkout_year={date_out.year}&group_adults={people}" \
          f"&group_children=0&order=price&ss={city}%2C%20{country}" \
          f"&no_rooms={rooms}"
    if score_filter:
        if score_filter == '9+':
            url += '&nflt=review_score%3D90%3B'
        elif score_filter == '8+':
            url += '&nflt=review_score%3D80%3B'
        elif score_filter == '7+':
            url += '&nflt=review_score%3D70%3B'
        elif score_filter == '6+':
            url += '&nflt=review_score%3D60%3B'
    return url


def get_search_result(people, country, city, date_in, date_out, rooms, score_filter):
    result = []
    data_url = create_url(people, country, city, date_in, date_out, rooms, score_filter)
    response = session.get(data_url, headers=REQUEST_HEADER)
    soup = BeautifulSoup(response.text, "lxml")
    hotels = soup.select("#hotellist_inner div.sr_item.sr_item_new")
    for hotel in hotels:
        result.append(Hotel(hotel))
    session.close()
    return result


def get_hotel_name(hotel):
    identifier = "span.sr-hotel__name"
    if hotel.select_one(identifier) is None:
        return ''
    else:
        return hotel.select_one(identifier).text.strip()


def get_hotel_score(hotel):
    identifier = "div.bui-review-score__badge"
    if hotel.select_one(identifier) is None:
        return ''
    else:
        return hotel.select_one(identifier).text.strip()


def get_hotel_price(hotel):
    identifier = "div.bui-price-display__value.prco-text-nowrap-helper.prco-inline-block-maker-helper"
    if hotel.select_one(identifier) is None:
        return ''
    else:
        return hotel.select_one(identifier).text.strip()[2:]


def get_hotel_detail_link(hotel):
    identifier = ".txp-cta.bui-button.bui-button--primary.sr_cta_button"
    if hotel.select_one(identifier) is None:
        return ''
    else:
        return hotel.select_one(identifier)['href']


def get_coordinates(soup_detail):
    coordinates = []
    if soup_detail.select_one("#hotel_sidebar_static_map") is None:
        coordinates.append('')
        coordinates.append('')
    else:
        coordinates.append(soup_detail.select_one("#hotel_sidebar_static_map")["data-atlas-latlng"].split(",")[0])
        coordinates.append(soup_detail.select_one("#hotel_sidebar_static_map")["data-atlas-latlng"].split(",")[1])
    return coordinates


def send_message(html):
    resp = requests.get(f'https://api.telegram.org/bot{BOT_API_KEY}/sendMessage?parse_mode=HTML&'
                        f'chat_id={CHANNEL_NAME}&'
                        f'text={urllib.parse.quote_plus(html)}')
    resp.raise_for_status()


def send_location(latitude, longitude):
    resp = requests.get(f'https://api.telegram.org/bot{BOT_API_KEY}/sendlocation?'
                        f'chat_id={CHANNEL_NAME}&'
                        f'latitude={latitude}&longitude={longitude}')
    resp.raise_for_status()


def main():
    search_params = {
        'people': 4,
        'rooms': 2,
        'country': 'United States',
        'city': 'New York',
        'date_in': datetime.datetime(2020, 8, 31).date(),
        'date_out': datetime.datetime(2020, 9, 2).date(),
        'score_filter': '9+'
    }

    print(f"Searching hotels using parameters: {search_params}")
    result = get_search_result(**search_params)
    top_3 = result[:3]
    send_message(
        f'Here are your search results for {search_params["people"]} people, {search_params["rooms"]} rooms in '
        f'{search_params["city"]}, {search_params["country"]} for dates from {search_params["date_in"]} to '
        f'{search_params["date_out"]} with {search_params.get("score_filter", "any")} rating')
    for hotel in top_3:
        send_message(f'<a href="{BOOKING_URL}{hotel.link}">{hotel.name} </a> ({hotel.score})\n'
                     f'Total price: {hotel.price}')
        hotel.get_details()
        # Skip the location if there was no detail link or no map on the detail page
        if hotel.details and hotel.details.latitude:
            send_location(hotel.details.latitude, hotel.details.longitude)
    print('Notifications were sent successfully')


if __name__ == '__main__':
    main()

You can also find the full code here.

Now for those of you who would like some explanations, let’s dive into details.

Website scraping

Let’s look into the get_search_result function. The first thing it does is build the search URL with create_url:

def create_url(people, country, city, date_in, date_out, rooms, score_filter):
    url = f"https://www.booking.com/searchresults.en-gb.html?selected_currency=USD&checkin_month={date_in.month}" \
          f"&checkin_monthday={date_in.day}&checkin_year={date_in.year}&checkout_month={date_out.month}" \
          f"&checkout_monthday={date_out.day}&checkout_year={date_out.year}&group_adults={people}" \
          f"&group_children=0&order=price&ss={city}%2C%20{country}" \
          f"&no_rooms={rooms}"
    if score_filter:
        if score_filter == '9+':
            url += '&nflt=review_score%3D90%3B'
        elif score_filter == '8+':
            url += '&nflt=review_score%3D80%3B'
        elif score_filter == '7+':
            url += '&nflt=review_score%3D70%3B'
        elif score_filter == '6+':
            url += '&nflt=review_score%3D60%3B'
    return url

This is the same URL you would see in your browser if you searched for hotels manually. We simply insert the filters programmatically and generate the URL in code; that’s it. Note the order=price parameter: it asks booking.com to sort the results by price, cheapest first.
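
The if/elif chain and the hand-escaped %2C%20 can also be expressed with the standard library’s urllib.parse.urlencode, which escapes every value for you. This is a sketch, assuming booking.com keeps accepting the same query parameters as create_url above; SCORE_FILTERS and build_search_url are illustrative names, not part of the original script. One visible difference: urlencode encodes spaces as + rather than %20, which is the standard form encoding of a space in a query string.

```python
import datetime
from urllib.parse import urlencode

SEARCH_URL = "https://www.booking.com/searchresults.en-gb.html"

# Illustrative mapping from the article's score labels to the review_score
# values that create_url puts into the nflt parameter.
SCORE_FILTERS = {"9+": 90, "8+": 80, "7+": 70, "6+": 60}


def build_search_url(people, country, city, date_in, date_out, rooms, score_filter=None):
    """Build the same search URL as create_url, letting urlencode do the escaping."""
    params = {
        "selected_currency": "USD",
        "checkin_month": date_in.month,
        "checkin_monthday": date_in.day,
        "checkin_year": date_in.year,
        "checkout_month": date_out.month,
        "checkout_monthday": date_out.day,
        "checkout_year": date_out.year,
        "group_adults": people,
        "group_children": 0,
        "order": "price",             # sort results by price
        "ss": f"{city}, {country}",   # free-text destination
        "no_rooms": rooms,
    }
    if score_filter in SCORE_FILTERS:
        # urlencode turns "=" into %3D and ";" into %3B, matching the original URL
        params["nflt"] = f"review_score={SCORE_FILTERS[score_filter]};"
    return f"{SEARCH_URL}?{urlencode(params)}"


if __name__ == "__main__":
    print(build_search_url(4, "United States", "New York",
                           datetime.date(2020, 8, 31), datetime.date(2020, 9, 2), 2, "9+"))
```

An unknown label such as "5+" simply adds no filter, which mirrors the behavior of the original if/elif chain.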

Next, we simply make a GET request to the URL and receive the result.

response = session.get(data_url, headers=REQUEST_HEADER)
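
This request goes out once, with no timeout and no retry, so when the script runs on a schedule a single transient error can make the whole run fail. Below is a sketch of a session that sets the User-Agent header once and retries rate-limit and server errors, using requests’ HTTPAdapter with urllib3’s Retry. The make_session name and the retry numbers are assumptions for illustration, not values booking.com documents.

```python
import requests
from requests.adapters import HTTPAdapter
from urllib3.util.retry import Retry

USER_AGENT = (
    "Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) "
    "Chrome/50.0.2661.75 Safari/537.36"
)


def make_session(retries=3, backoff=1.0):
    """Create a Session that sends a browser-like User-Agent and retries transient errors."""
    session = requests.Session()
    # Every request made through this session carries this header
    session.headers.update({"User-Agent": USER_AGENT})
    retry = Retry(
        total=retries,
        backoff_factor=backoff,                       # wait longer after each failed attempt
        status_forcelist=(429, 500, 502, 503, 504),   # rate limiting and server errors
    )
    adapter = HTTPAdapter(max_retries=retry)
    session.mount("https://", adapter)
    session.mount("http://", adapter)
    return session
```

You would then call, for example, make_session().get(data_url, timeout=10) followed by response.raise_for_status(). requests has no session-wide default timeout, so the timeout has to be passed on each call.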

Then we use the BeautifulSoup library to parse the response.

soup = BeautifulSoup(response.text, "lxml")

BeautifulSoup is one of the most popular Python libraries for making sense of web pages (in our case, a booking.com search results page). It converts the text representation of the page into an object with attributes and search methods that you can use in your code. The "lxml" argument tells it to use the lxml parser, which is a separate package you need to install.

This is how you can get the list of hotels from the page:

hotels = soup.select("#hotellist_inner div.sr_item.sr_item_new")

What does this mean? What is this weird string we pass to select? It is a CSS selector. If you open the URL we generate in the browser and use the developer tools (I use Chrome), you can see this:

hotellist_inner is the id of an HTML element (the # prefix in a CSS selector matches an element by its id). It is highlighted in my browser, and I can see that it corresponds to the list of hotels in the search results.

div.sr_item.sr_item_new means a div element that has both the sr_item and sr_item_new classes.

And this is an example of such an element. We’re effectively selecting all div elements that have the classes sr_item and sr_item_new and are located inside the element with id hotellist_inner.
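
To see what the selector matches without hitting booking.com, you can run it against a tiny hand-written page. The markup below is invented for illustration and only mimics the id and class names mentioned above; the real page is much larger. It uses Python’s built-in "html.parser" so it runs even without lxml installed.

```python
from bs4 import BeautifulSoup

# Invented stand-in for the search results page, reusing the article's id and class names
SAMPLE_HTML = """
<div id="hotellist_inner">
  <div class="sr_item sr_item_new"><span class="sr-hotel__name"> Hotel A </span></div>
  <div class="sr_item"><span class="sr-hotel__name"> Missing a class </span></div>
  <div class="sr_item sr_item_new"><span class="sr-hotel__name"> Hotel B </span></div>
</div>
<div class="sr_item sr_item_new"><span class="sr-hotel__name"> Outside the list </span></div>
"""

soup = BeautifulSoup(SAMPLE_HTML, "html.parser")
# Matches divs that carry BOTH classes and sit anywhere inside #hotellist_inner
hotels = soup.select("#hotellist_inner div.sr_item.sr_item_new")
names = [hotel.select_one("span.sr-hotel__name").text.strip() for hotel in hotels]
print(names)  # ['Hotel A', 'Hotel B']
```

The div with only sr_item and the div outside #hotellist_inner are both skipped, which is exactly the filtering the real script relies on.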

Our next step is to iterate over hotels and parse each one of them individually.

for hotel in hotels:
    result.append(Hotel(hotel))

If you look into the __init__ method of the Hotel class, you'll see that we use a bunch of functions to get various information from the hotel's element on the web page. I won't go into details here, but they work in a similar way to the logic that selects a hotel element, which I described above.
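
One of those functions, get_hotel_price, returns the displayed price as text and drops its first two characters, which assumes a fixed currency prefix. If you want to compare or sort prices, turning them into numbers helps. Here is a sketch of a hypothetical parse_price helper, assuming prices look like "US$1,234" with a comma as the thousands separator:

```python
import re
from decimal import Decimal

# First number in the string: digits with optional thousands commas and decimals.
# Assumes "," is a thousands separator, as on the en-gb USD pages used here.
_NUMBER = re.compile(r"\d[\d,]*(?:\.\d+)?")


def parse_price(text):
    """Return the first amount found in text as a Decimal, or None if there is none."""
    match = _NUMBER.search(text or "")
    if match is None:
        return None
    return Decimal(match.group().replace(",", ""))
```

Decimal is used instead of float so that amounts like 98.50 are stored exactly.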

Sending messages to Telegram

After we find hotels, we need some way to be notified. In this section, I will explain how the code that sends information to my Telegram app works. Telegram is the messenger I use; you may use a different one, and it should still be possible to send a message to it from Python, since most messengers nowadays have an API for bots. You can read more about Telegram bots here.

Please follow the documentation and create your bot. The easiest way to receive messages from the bot is to create a public channel and then add the bot to this channel as an administrator. Why public? Because then the bot only needs the name of the channel to send messages there. There are ways to send messages to a private channel or directly to you, but they require more steps, so for our purposes a public channel is the best approach.

Now, to send a message to this channel, all I need is to declare two constants:

BOT_API_KEY = 'your-api-key'
CHANNEL_NAME = '@booking_monitoring'

You’ll get BOT_API_KEY (the bot token) from BotFather when you create your bot. CHANNEL_NAME is simply the name of the public channel, prefixed with @. Please use the name of the channel you’ve created.
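
Hard-coding the bot token in the script makes it easy to leak when you share the code. Here is a small sketch that reads both values from environment variables instead; the variable names simply reuse the constant names, and load_telegram_config is a hypothetical helper, not part of the original script.

```python
import os


def load_telegram_config(env=os.environ):
    """Read the bot token and channel name from environment variables."""
    try:
        token = env["BOT_API_KEY"]
    except KeyError:
        raise RuntimeError("Set the BOT_API_KEY environment variable to your bot token") from None
    # Fall back to the article's example channel if none is configured
    channel = env.get("CHANNEL_NAME", "@booking_monitoring")
    if not channel.startswith("@"):
        raise ValueError("CHANNEL_NAME must be a public channel username such as @my_channel")
    return token, channel
```

Usage would be BOT_API_KEY, CHANNEL_NAME = load_telegram_config() at the top of the script; most schedulers, cron included, let you set environment variables for a job.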

The code to send a message is a simple GET request to the Telegram Bot API.

def send_message(html):
    resp = requests.get(f'https://api.telegram.org/bot{BOT_API_KEY}/sendMessage?parse_mode=HTML&'
                        f'chat_id={CHANNEL_NAME}&text={urllib.parse.quote_plus(html)}')
    resp.raise_for_status()
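
Building the query string by hand means you have to remember to escape each value yourself. Below is a sketch of a hypothetical telegram_url helper that lets urllib.parse.urlencode escape all parameters at once; requests can do the same if you pass params={...} to requests.get.

```python
from urllib.parse import urlencode

TELEGRAM_API = "https://api.telegram.org"


def telegram_url(token, method, **params):
    """Build a Bot API URL such as .../bot<token>/sendMessage?chat_id=...&text=..."""
    # urlencode escapes every value, so HTML tags, "&" and newlines in the text are safe
    return f"{TELEGRAM_API}/bot{token}/{method}?{urlencode(params)}"
```

For example, requests.get(telegram_url(BOT_API_KEY, "sendMessage", chat_id=CHANNEL_NAME, parse_mode="HTML", text=html)) sends the same request as send_message.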

For each hotel I also send its location like this:

def send_location(latitude, longitude):
    resp = requests.get(f'https://api.telegram.org/bot{BOT_API_KEY}/sendlocation?'
                        f'chat_id={CHANNEL_NAME}&latitude={latitude}&longitude={longitude}')
    resp.raise_for_status()
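
The coordinates come from get_coordinates, which returns two empty strings when the detail page has no map, so send_location could end up sending empty values. Here is a sketch of a hypothetical parse_latlng helper that converts the data-atlas-latlng value into floats and rejects anything outside the valid latitude and longitude ranges; you could skip send_location whenever it returns None.

```python
def parse_latlng(value):
    """Turn a 'lat,lng' string into a (lat, lng) float pair, or None if it is unusable."""
    try:
        lat_text, lng_text = value.split(",")
        lat, lng = float(lat_text), float(lng_text)
    except (AttributeError, ValueError):
        # None, an empty string, a wrong number of parts, or non-numeric text
        return None
    # Latitude must be within [-90, 90] and longitude within [-180, 180]
    if not (-90 <= lat <= 90 and -180 <= lng <= 180):
        return None
    return lat, lng
```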

As a result, this is what I get in my Telegram app:

Pretty convenient: I get the general information right in my messenger, along with the hotel’s location that I can explore right away and a link to booking.com with all the other details.

Putting it all together

So, in the end, we have a script that gets the three cheapest hotels matching the filters you define (sorting by price is specified in the function that creates the URL).

result = get_search_result(**search_params)
top_3 = result[:3]
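
This relies on booking.com’s order=price sorting and simply takes the first three results. If you would rather not depend on the page order (for example, when some results come back without a price), you can sort on your side. Below is a sketch that assumes prices have already been converted to numbers; Offer and cheapest are illustrative names, not part of the original script.

```python
import heapq
from typing import NamedTuple, Optional


class Offer(NamedTuple):
    # Minimal illustrative record; in the article's code these values live on Hotel
    name: str
    price: Optional[float]  # None when the price could not be read


def cheapest(offers, n=3):
    """Return the n offers with the lowest known price, cheapest first."""
    priced = [offer for offer in offers if offer.price is not None]
    return heapq.nsmallest(n, priced, key=lambda offer: offer.price)
```

heapq.nsmallest avoids sorting the whole list when you only need a few items, though for a single results page a plain sorted(...)[:n] would work just as well.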

Then for each of the 3 hotels, we send a message:

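for hotel in top_3:
    send_message(f'<a href="{BOOKING_URL}{hotel.link}">{hotel.name} </a> ({hotel.score})\n'
                 f'Total price: {hotel.price}')
    hotel.get_details()
    # Skip the location if there was no detail link or no map on the detail page
    if hotel.details and hotel.details.latitude:
        send_location(hotel.details.latitude, hotel.details.longitude)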
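
Because send_message uses parse_mode=HTML, a hotel name containing & or < can make Telegram reject the message, since its HTML mode expects such characters to be written as HTML entities. Here is a sketch of a hypothetical format_hotel_message helper that escapes the scraped values with html.escape before building the same message:

```python
import html


def format_hotel_message(base_url, link, name, score, price):
    """Build the Telegram HTML message for one hotel, escaping text scraped from the page."""
    # Escape the URL for use inside the href attribute (quote=True also escapes quotes)
    href = html.escape(base_url + link, quote=True)
    return (
        f'<a href="{href}">{html.escape(name)}</a> ({html.escape(score)})\n'
        f"Total price: {html.escape(price)}"
    )
```

In the loop above, send_message(format_hotel_message(BOOKING_URL, hotel.link, hotel.name, hotel.score, hotel.price)) would replace the inline f-string.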

Making our script work in the background

If this script does not run on a schedule in the background, it’s not very useful: you are better off just visiting booking.com in the browser and searching manually, without Python. There are many ways to put your script to work; cron is probably the most popular one.

One thing to keep in mind is that you don’t want to run this script on your local machine: when you close your laptop or turn off your desktop, it stops, and you won’t get any notifications. You can spin up a server somewhere and use cron there. Or you can…
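
On a server, cron is the usual choice. If you would rather keep everything in Python, a minimal alternative is a loop that runs the job and then sleeps. Here is a sketch with run_every as a hypothetical helper; unlike cron, it stops if the process dies or the server restarts, so you would still need something that keeps the process running.

```python
import time


def run_every(job, interval_seconds, iterations=None, sleep=time.sleep):
    """Call job() every interval_seconds; stop after `iterations` runs if given.

    A failed run is reported and the loop keeps going, so one bad scrape
    does not stop future checks.
    """
    runs = 0
    while iterations is None or runs < iterations:
        try:
            job()
        except Exception as exc:  # keep the loop alive on network or parsing errors
            print(f"Run failed: {exc!r}")
        runs += 1
        # Don't sleep after the final run when a run limit is set
        if iterations is None or runs < iterations:
            sleep(interval_seconds)
    return runs
```

For example, run_every(main, 6 * 60 * 60) would check for deals every six hours. The sleep parameter exists so the loop can be tested without actually waiting.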

Warning, advertisement below. Sorry.

Or you can use seamlesscloud.io, a tool I’m developing with a couple of other engineers. It does one thing well: run Python scripts on a schedule. Feel free to check it out.

If you have any issues, please shoot me an email, and I’ll try to help you. You can also find the full code used in this post here.

You can read more of our blog here.

Originally published at https://blog.seamlesscloud.io on August 7, 2020.


Andrey
Seamless Cloud

Software Engineer with a passion for automating routine tasks.