Tutorial: Web Scraping Hotel Prices using Selenium and Python

Everyone would like to pay the least amount of money for the best hotel room. Simple, isn’t it?

In this tutorial we will show you how to build your own small price-tracking scraper for Hotels.com so that you can snag the room you want at the lowest rate. All you need to do is change the city and the check-in and check-out dates, then run it on a schedule.

The idea and the need are simple, so let’s jump straight to the code.

Feel free to copy and modify it to your needs; that is the best way to learn! You can download the code directly from here.

Prerequisites

Below are the frameworks used:

  1. Selenium Web Driver: a framework that is widely used for automating routines in web browsers for scraping and testing purposes. Selenium receives commands from the scraper, such as load a page or click a location or button. We can also read what is being rendered in the browser.
     Learn how to install Selenium here: http://www.seleniumhq.org/download and install the Python bindings for Selenium here: http://selenium-python.readthedocs.io/installation.html
  2. LXML for extracting data from the page source HTML. LXML lets you parse an HTML / XML tree structure using XPaths. Read more on XPaths in “XPaths and their relevance in Web Scraping”. Learn how to install LXML here: http://lxml.de/installation.html
  3. Python 2.7, available here: https://www.python.org/downloads/
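To give a feel for how LXML and XPaths fit together, here is a minimal sketch. The HTML snippet, the class names and the extract_hotels helper are made up for illustration; the real Hotels.com markup will differ, so treat the XPaths as placeholders you would replace after inspecting the page.

```python
from lxml import html

# Hypothetical markup standing in for a rendered search-results page.
# The real Hotels.com structure will differ.
SAMPLE_PAGE = """
<ol>
  <li class="hotel"><h3>Comet Motel</h3><span class="price">$124</span></li>
  <li class="hotel"><h3>Antonio Hotel</h3><span class="price">$241</span></li>
</ol>
"""


def extract_hotels(page_source):
    """Return a list of {'hotelName', 'price'} dicts from the sample markup."""
    tree = html.fromstring(page_source)
    hotels = []
    # One <li class="hotel"> per result in this made-up layout.
    for node in tree.xpath('//li[@class="hotel"]'):
        name = node.xpath('./h3/text()')
        price = node.xpath('./span[@class="price"]/text()')
        hotels.append({
            "hotelName": name[0].strip() if name else None,
            "price": price[0].strip() if price else None,
        })
    return hotels


if __name__ == "__main__":
    print(extract_hotels(SAMPLE_PAGE))
```

In a real scraper the string passed to extract_hotels would be the page source read from the Selenium-driven browser rather than a hard-coded sample.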

The Code

Open your favorite text editor and modify the lines below with the city name, check-in date and check-out date, and you’ll get the top 5 cheapest hotels to stay in.

def parse(url):
    searchKey = "Las Vegas"  # Change this to your city
    checkInDate = '27/08/2016'  # Format %d/%m/%Y - replace date here
    checkOutDate = '29/08/2016'  # Format %d/%m/%Y - replace date here
    response = webdriver.Firefox()
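If you plan to run the scraper on a schedule, you may not want to edit the dates by hand each time. The sketch below is one possible way to generate them in the same %d/%m/%Y format; the stay_dates helper and its parameters are our own illustration and are not part of the downloadable script.

```python
from datetime import date, timedelta

DATE_FORMAT = "%d/%m/%Y"  # same format as checkInDate / checkOutDate


def stay_dates(days_ahead, nights, today=None):
    """Return (check-in, check-out) strings relative to `today`."""
    if today is None:
        today = date.today()
    check_in = today + timedelta(days=days_ahead)
    check_out = check_in + timedelta(days=nights)
    return check_in.strftime(DATE_FORMAT), check_out.strftime(DATE_FORMAT)


if __name__ == "__main__":
    checkInDate, checkOutDate = stay_dates(days_ahead=30, nights=2)
    print("%s to %s" % (checkInDate, checkOutDate))
```

The optional today argument is only there so the function can be checked against a fixed date.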

Then run it from the command prompt like this (if you name the file hotels_scraper.py):

python hotels_scraper.py

This should print the results in the command prompt as a list of Python dictionaries.

The output looks like this (note that this sample shows hotels in Los Angeles, not Las Vegas):

[
  {
    "countryName": "United States",
    "rating": 2.0,
    "hotelName": "Antonio Hotel",
    "address": "229 N Soto Street",
    "postalCode": "90033",
    "price": "$241",
    "locality": "Los Angeles",
    "region": "CA"
  },
  {
    "countryName": "United States",
    "rating": 2.0,
    "hotelName": "Comet Motel",
    "address": "10808 Avalon Boulevard",
    "postalCode": "90061",
    "price": "$124",
    "locality": "Los Angeles",
    "region": "CA"
  },
  {
    "countryName": "United States",
    "rating": 2.5,
    "hotelName": "Arthur Emery",
    "address": "907 W 17th Street",
    "postalCode": "90015",
    "price": "$298",
    "locality": "Los Angeles",
    "region": "CA"
  },
  {
    "countryName": "United States",
    "rating": 2.5,
    "hotelName": "LA Ramona Motel",
    "address": "3211 W. Jefferson Blvd",
    "postalCode": "90018",
    "price": "$125",
    "locality": "Los Angeles",
    "region": "CA"
  },
  {
    "countryName": "United States",
    "rating": 2.0,
    "hotelName": "Central Inn Motel",
    "address": "954 E 88th St",
    "postalCode": "90002",
    "price": "$185",
    "locality": "Los Angeles",
    "region": "CA"
  }
]
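The prices come back as strings such as "$241", so ordering the results means converting them to numbers first. Here is a small sketch of how you might do that; the price_to_number and cheapest helpers are hypothetical names of our own, and the code assumes every price is a dollar amount like the ones in the sample.

```python
def price_to_number(price):
    """Convert a price string such as "$1,241" to a float.

    Assumes a leading dollar sign and optional thousands separators.
    """
    return float(price.replace("$", "").replace(",", ""))


def cheapest(hotels, count=5):
    """Return up to `count` hotels ordered from lowest to highest price."""
    return sorted(hotels, key=lambda h: price_to_number(h["price"]))[:count]


if __name__ == "__main__":
    sample = [
        {"hotelName": "Antonio Hotel", "price": "$241"},
        {"hotelName": "Comet Motel", "price": "$124"},
        {"hotelName": "LA Ramona Motel", "price": "$125"},
    ]
    for hotel in cheapest(sample, count=2):
        print("%s: %s" % (hotel["hotelName"], hotel["price"]))
```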

You can modify this code a bit and connect it to chatbots in Slack or Facebook, or to email, to find the cheapest room rates.
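Whichever channel you pick, the first step is usually turning the list of dictionaries into a short text message. The sketch below shows one way to do that; format_summary is a hypothetical helper, and the actual delivery (a chatbot integration or an email) is deliberately left out because it depends on the service you choose.

```python
def format_summary(hotels):
    """Build a plain-text summary, one line per hotel."""
    lines = []
    for hotel in hotels:
        lines.append("%s (%s stars): %s" % (
            hotel["hotelName"], hotel["rating"], hotel["price"]))
    return "\n".join(lines)


if __name__ == "__main__":
    sample = [
        {"hotelName": "Comet Motel", "rating": 2.0, "price": "$124"},
        {"hotelName": "LA Ramona Motel", "rating": 2.5, "price": "$125"},
    ]
    print(format_summary(sample))
```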

The code above is good for small-scale scraping for fun. If you want to scrape hotel pricing details from thousands of pages, you should read “Scalable do-it-yourself scraping: How to build and run scrapers on a large scale”.

If you need help with complex web scraping projects, let us know and we’ll be glad to help.


Originally published at learn.scrapehero.com on July 28, 2016.