How to Scrape Flight Prices with Python using Selenium

In this post, you will learn to scrape Expedia flight prices with Selenium and ChromeDriver

Offir Inbar
Towards Data Science
4 min read · Apr 19, 2020


Photo by Author (Lukla, Nepal)

This article was written by Roee Freund and Offir Inbar.

Background

Stuck at home during the COVID-19 outbreak? Do you love traveling and data science?

You can combine both of them to buy cheap flights. This tutorial will teach you how to create a web scraper that will work for you to find flight prices for any destination you wish.

In this tutorial, you will scrape Expedia, one of the biggest online travel agencies (OTAs) in the world. It’s owned and operated by Expedia Group, which ranks first on the list of top-earning travel companies.

If you want to see how the COVID-19 outbreak influenced flight prices, take a look at our Medium post.

It’s important to note that web scraping is against most websites’ terms of service, so your IP address may be banned from the website.

Environment setup

For this scraper, make sure you are using Python 3.5 or newer. Confirm that your environment contains the following packages and drivers:

  • Selenium package: a popular web browser automation tool
  • ChromeDriver: lets Selenium open a Chrome browser and perform tasks the way a human would
  • pandas package
  • datetime module (part of the Python standard library, so there is nothing to install)

This TDS post is a great introduction to Selenium.
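
A quick way to confirm the setup before running the scraper is to check the interpreter version and whether each package can be imported. The sketch below is our own illustration, not part of the original code: the helper name `check_environment` is hypothetical, and it only checks that the modules are importable. It does not verify that ChromeDriver matches your Chrome version.

```python
import importlib.util
import sys

# Minimum version stated in this post; recent Selenium releases may require more.
MIN_PYTHON = (3, 5)
# selenium and pandas come from pip; datetime ships with Python itself.
REQUIRED_MODULES = ("selenium", "pandas", "datetime")


def check_environment(modules=REQUIRED_MODULES, min_python=MIN_PYTHON):
    """Return a dict mapping each check name to True (ok) or False (missing)."""
    report = {"python": sys.version_info[:2] >= min_python}
    for name in modules:
        # find_spec returns None when a top-level module cannot be located.
        report[name] = importlib.util.find_spec(name) is not None
    return report


for name, ok in check_environment().items():
    print("{:<10} {}".format(name, "ok" if ok else "MISSING"))
```

Anything reported as MISSING can usually be installed with `pip install selenium pandas`.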

Scraping Strategy

Before getting into the code, let’s briefly describe the scraping strategy:

  • Fill a CSV file with the exact routes and dates you want to scrape. You can add as many routes as you want, but it’s important to use these column names. The scraper works only for round trips.
CSV routes file. dep = departure, arr = arrival.
  • Run the full code.
  • The output for each flight is a CSV file. Its file name will be the date and time that the scraping was performed.
  • All flights of the same route will automatically be located by the scraper in the appropriate folder (the name of the route).
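
To make the first step concrete, here is a minimal sketch of reading and validating the routes file with the standard library. The column names `dep`, `arr`, `dep_date` and `return_date` are assumptions made for illustration (the real ones are those shown in the routes file), and `read_routes` is a hypothetical helper rather than a function from the original code.

```python
import csv
import io

# Hypothetical column names; replace them with the ones in your routes file.
REQUIRED_COLUMNS = ("dep", "arr", "dep_date", "return_date")


def read_routes(csv_file):
    """Read round-trip routes from an open CSV file into a list of dicts."""
    reader = csv.DictReader(csv_file)
    header = reader.fieldnames or []
    missing = [col for col in REQUIRED_COLUMNS if col not in header]
    if missing:
        raise ValueError("routes file is missing columns: {}".format(missing))
    routes = []
    for row in reader:
        # Strip stray whitespace; short rows yield None for absent cells.
        cleaned = {col: (row[col] or "").strip() for col in REQUIRED_COLUMNS}
        if any(cleaned.values()):  # skip rows with no data at all
            routes.append(cleaned)
    return routes


# In practice you would pass open("routes.csv", newline="") instead.
sample = io.StringIO(
    "dep,arr,dep_date,return_date\n"
    "Athens,Abu Dhabi,2020-06-01,2020-06-10\n"
)
for route in read_routes(sample):
    print(route["dep"], "->", route["arr"], route["dep_date"], route["return_date"])
```

Failing early on a wrong header is a deliberate choice here: a misspelled column is much easier to fix before the browser starts than halfway through a scraping run.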

Sounds complicated? It’s not! Let’s go over an example.

These are the scraped routes as defined in the CSV file (the scraper creates those folders automatically):

Scraped routes

Here you can see the multiple scraped dates for the Athens to Abu Dhabi route:

Scraped dates: Athens — Abu-Dhabi

The screenshot below shows a single CSV file for each scraping run on the Athens to Abu Dhabi route. Each file name is the date and time at which the scraper was executed.

Single CSV output files for specific dates

We hope the process is clear!

Output Features

The scraper output will include these features:

  • Departure time
  • Arrival time
  • Airline
  • Duration (flight duration)
  • Stops (number of stops)
  • Layovers
  • Price
  • Airplane type
  • Departure Coach
  • Arrival Coach
  • Departure airport name
  • Arrival airport name
  • The exact time the scraping was performed

If it’s not a direct flight, the scraper will give you additional data (airport, airline name, etc.) for each connection.

Code

In this section, we will go over the main parts of the code. You can find the full code here.

First, as usual, import the relevant libraries, define the Chrome driver, and set the trip type to round trip.
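
A minimal sketch of this setup step might look as follows. It assumes Selenium 4 and a recent Chrome, and it is not the original code: `TRIP_TYPE`, `chrome_arguments` and `make_driver` are hypothetical names, and the window size is an arbitrary choice.

```python
TRIP_TYPE = "roundtrip"  # the scraper described here handles round trips only


def chrome_arguments(headless=False):
    """Command-line switches passed to Chrome."""
    args = ["--window-size=1920,1080"]
    if headless:
        # Headless mode switch understood by recent Chrome versions.
        args.append("--headless=new")
    return args


def make_driver(headless=False):
    """Start Chrome through Selenium and return the WebDriver."""
    # Imported here so this file can be loaded without Selenium installed.
    from selenium import webdriver

    options = webdriver.ChromeOptions()
    for arg in chrome_arguments(headless):
        options.add_argument(arg)
    # Recent Selenium releases can locate ChromeDriver on their own;
    # with older ones the chromedriver executable must be on your PATH.
    return webdriver.Chrome(options=options)


# Typical use:
#     driver = make_driver()
#     try:
#         driver.get("https://www.expedia.com/")
#     finally:
#         driver.quit()  # always close the browser, even after an error
```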

In the next step, you will define a few functions that use Selenium tools to find various features on the webpage. The function names indicate their roles.
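
The helpers below illustrate the kind of functions this step refers to. They are a sketch under our own assumptions, not the functions from the full code: the names `wait_for`, `first_text` and `all_texts` are hypothetical, and the `by` and `value` arguments are ordinary Selenium locators such as `By.CSS_SELECTOR` with a selector string.

```python
def wait_for(driver, by, value, timeout=20):
    """Block until an element is present in the DOM, then return it."""
    # Imported here so the text helpers below work without Selenium installed.
    from selenium.webdriver.support import expected_conditions as EC
    from selenium.webdriver.support.ui import WebDriverWait

    return WebDriverWait(driver, timeout).until(
        EC.presence_of_element_located((by, value))
    )


def first_text(parent, by, value, default=""):
    """Text of the first matching descendant, or `default` when none match."""
    # find_elements returns an empty list instead of raising when nothing matches.
    matches = parent.find_elements(by, value)
    return matches[0].text.strip() if matches else default


def all_texts(parent, by, value):
    """Stripped, non-empty texts of every matching descendant."""
    texts = (element.text.strip() for element in parent.find_elements(by, value))
    return [text for text in texts if text]
```

Returning a default instead of raising keeps one missing field, for example an airplane type that is not listed, from discarding the whole flight.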

Each row (flight) in the CSV routes file goes through the following process:
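
As a rough sketch of that outer loop, the function below runs one scraping callable per row and keeps going when a single route fails. Both `scrape_all` and the `scrape_route` callable are hypothetical names; what happens inside `scrape_route` (opening the search page, entering the airports and dates, waiting for the results, reading each flight) depends on the page and is assumed here rather than shown.

```python
def scrape_all(routes, scrape_route):
    """Call `scrape_route(route)` for every row; return (results, failures)."""
    results, failures = [], []
    for route in routes:
        try:
            flights = scrape_route(route)
        except Exception as exc:
            # One broken page should not stop the remaining routes.
            failures.append((route, repr(exc)))
        else:
            results.append((route, flights))
    return results, failures
```

Collecting the failures separately makes it easy to rerun only the routes that did not load.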

It’s time to collect the data from the web and insert it into the pandas DataFrame.
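
One way to sketch this step is to normalise each scraped flight into a row with a fixed set of columns and then hand the rows to pandas. The column names below are our own snake_case renderings of the output features listed earlier, and `build_rows` and `to_dataframe` are hypothetical helpers, so treat this as an illustration rather than the original implementation.

```python
from datetime import datetime

# Assumed column names, derived from the output features listed in this post.
COLUMNS = [
    "departure_time", "arrival_time", "airline", "duration", "stops",
    "layovers", "price", "airplane_type", "departure_coach", "arrival_coach",
    "departure_airport", "arrival_airport", "scraped_at",
]


def build_rows(flights, scraped_at=None):
    """Give every flight dict the same columns and stamp it with the scrape time."""
    stamp = (scraped_at or datetime.now()).strftime("%Y-%m-%d %H:%M:%S")
    rows = []
    for flight in flights:
        # Fields the page did not provide become empty strings.
        row = {col: flight.get(col, "") for col in COLUMNS}
        row["scraped_at"] = stamp
        rows.append(row)
    return rows


def to_dataframe(rows):
    """Turn the rows into a pandas DataFrame with a fixed column order."""
    import pandas as pd  # imported here; only this step needs pandas

    return pd.DataFrame(rows, columns=COLUMNS)
```

Fixing the column order up front means every output file has the same layout, which makes it simpler to combine files from different scraping runs later.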

Finally, export the data to a CSV file directly in the desired folder.
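
The export could be sketched like this. The folder pattern (`<departure>_<arrival>`) and the timestamp format in the file name are assumptions for illustration, and `output_path` and `export_flights` are hypothetical names; `DataFrame.to_csv` is the pandas call that writes the file.

```python
from datetime import datetime
from pathlib import Path


def output_path(base_dir, dep, arr, scraped_at):
    """Build <base_dir>/<dep>_<arr>/<timestamp>.csv (assumed naming pattern)."""
    folder = Path(base_dir) / "{}_{}".format(dep, arr)
    # Use '-' rather than ':' in the time, since Windows forbids ':' in file names.
    return folder / (scraped_at.strftime("%Y-%m-%d_%H-%M-%S") + ".csv")


def export_flights(df, base_dir, dep, arr, scraped_at=None):
    """Write the DataFrame into its route folder and return the file path."""
    path = output_path(base_dir, dep, arr, scraped_at or datetime.now())
    path.parent.mkdir(parents=True, exist_ok=True)  # create the route folder if needed
    df.to_csv(path, index=False)  # index=False leaves out pandas' row-number column
    return path
```

For example, `export_flights(df, "flights", "Athens", "Abu Dhabi")` would write a new timestamped file under `flights/Athens_Abu Dhabi/` each time the scraper runs.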

Conclusion

In summary, you learned how to scrape Expedia flight prices with the Selenium package. Once you understand the basics, you will be able to build your own scraping tool for other websites.

As enthusiastic travelers, we love to use data science to find great deals to amazing destinations around the globe.

Don’t forget to connect with Roee and Offir on LinkedIn if you have any questions, comments or concerns.

Good luck!

