Optimizing Your Airbnb Trip (I) | Defining the Problem and Gathering the Right Tools

Kai Lu
Jul 2 · 7 min read
Taken on trip to Cancún

The purpose of this post is to give an initial overview of the problem I will be trying to solve over the next few weeks, as well as the tools and methods I would like to use. I will try to split the project into several bite-sized posts that are easy to digest. The project is not finished yet, and I would love any feedback and advice!

The Problem

I remember browsing through Airbnb listings in March in preparation for a Spring Break trip with some friends. It was a time-consuming process as it was difficult to pick the “optimal” listing from 17 pages of listings where the average rating was 4.7. Though we knew what price range we expected to pay, it ended up being quite a frustrating process to reach the final decision. If I recall correctly, it went something like this:

  1. Compile a list of “promising” listings on a Google Doc
  2. Plan a time and location to meet (which is way more difficult than it seems)
  3. Argue over which listing was the “best”
  4. Decide that it is probably better to narrow down the list than to argue
  5. Repeat Steps 3 and 4

You get the idea. It took us ages to reach an agreement, and we spent much of our time comparing listings picture by picture, which is subjective and at times inconsistent (some listings may have hired professional photographers).

What if we could create a model that, after applying our budget and logistical constraints (e.g. price, number of bedrooms), ranks listings by a holistic score composed of the key metrics we all care about?

Now a tool like that would speed up our decision process by quite a bit! At the very least, it would provide us an already curated and filtered list to have a polite and civil discourse about!

The Process

1. Collecting the data

Before we can build a model, it seems that we may need some data first. So let’s see if we can get our data through an API from Airbnb…

How unfortunate!

It looked to me that the only way to get our much needed data is through scraping it directly from the listings. Now, this turned out to be a more complicated task than I anticipated.

Since Airbnb renders its pages with JavaScript, common Python web-scraping libraries like Requests, BeautifulSoup, or Scrapy on its own don’t work, as they cannot execute JavaScript. You may have heard about using Selenium for JavaScript-rendered sites, but according to Ahmed Rafik (instructor of the highest-rated web-scraping course on Udemy), that approach is incorrect.

“More than often I see students search for something called Selenium to scrape websites that uses JavaScript to render their content. And that’s totally wrong because Selenium was not created to scrape websites. Instead, it’s an automation framework.” — Ahmed

The solution he proposed was to use Splash, a JavaScript rendering service created by the Scrapy team that integrates nicely with Scrapy (through the scrapy-splash plugin). This is what I used for my Scrapy spider. With Splash and Scrapy combined, I was able to crawl Airbnb listings quickly and systematically.

Note: When web-scraping, it’s best to always obey the robots.txt file and to avoid hitting the website too frequently. You can configure both in the settings.py file of every Scrapy project.
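As a rough sketch of what that looks like, here is an excerpt of a settings.py using Scrapy's built-in politeness settings. The setting names are Scrapy's own, but the values are illustrative assumptions, not the ones from my project.

```python
# settings.py (excerpt). Values are illustrative assumptions, not the project's.

# Respect the rules the site publishes in its robots.txt file.
ROBOTSTXT_OBEY = True

# Wait this many seconds between consecutive requests to the same site.
DOWNLOAD_DELAY = 2

# Send at most one request at a time to any single domain.
CONCURRENT_REQUESTS_PER_DOMAIN = 1

# Let Scrapy adjust the delay based on how quickly the server responds.
AUTOTHROTTLE_ENABLED = True
AUTOTHROTTLE_START_DELAY = 2
AUTOTHROTTLE_MAX_DELAY = 30
```

Slower settings make a crawl take longer, but they reduce the load on the site and the chance of getting blocked.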

Scrapy’s speed with i3 processor and 8GB RAM

I will be going in depth with building a scrapy spider to scrape Airbnb in a separate post, which is now here. Note that this is more of an introductory tutorial.

You can also check out the spider (more complex) I made specifically for my project here:

It’s important to note that web-scraping is not a reliable way of getting data. Since Airbnb changes the layout/architecture of its site frequently, I cannot guarantee that the spider will always be working but I’ll try my best.

2. Cleaning the Data & Building the model

After concatenating the data files from multiple scrapes and removing duplicate listings, I needed to do some data cleaning and preparation before building the model. This included removing unnecessary, low-variance, and highly correlated columns and dealing with missing values. I will go into more detail in my next post.
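To make those steps concrete, here is a minimal sketch using only the standard library. The field names (listing_id, price, rating, instant_book), the thresholds, and the median imputation are all assumptions for illustration, and the correlation check is left out to keep it short.

```python
from statistics import median, pvariance


def clean_listings(rows, id_key="listing_id", max_missing=0.5, min_variance=1e-9):
    """Deduplicate listings, drop uninformative columns, impute missing values."""
    # 1. Keep the first occurrence of each listing across scrapes.
    unique = {}
    for row in rows:
        unique.setdefault(row[id_key], row)
    rows = list(unique.values())

    cleaned = [{id_key: row[id_key]} for row in rows]
    columns = [c for c in rows[0] if c != id_key]
    for col in columns:
        present = [row[col] for row in rows if row.get(col) is not None]
        # 2. Drop columns that are mostly missing.
        if len(present) < (1 - max_missing) * len(rows):
            continue
        # 3. Drop near-constant columns (they cannot separate listings).
        if pvariance(present) <= min_variance:
            continue
        # 4. Fill the remaining gaps with the column median.
        fill = median(present)
        for out, row in zip(cleaned, rows):
            value = row.get(col)
            out[col] = fill if value is None else value
    return cleaned


# Hypothetical rows from two scrapes; listing 2 appears in both.
scrape_a = [
    {"listing_id": 1, "price": 250, "rating": 4.8, "instant_book": 1},
    {"listing_id": 2, "price": 280, "rating": None, "instant_book": 1},
]
scrape_b = [
    {"listing_id": 2, "price": 280, "rating": None, "instant_book": 1},
    {"listing_id": 3, "price": 220, "rating": 4.6, "instant_book": 1},
]
listings = clean_listings(scrape_a + scrape_b)
```

Here the duplicate of listing 2 is dropped, the constant instant_book column is removed, and the missing rating is filled with the median of the observed ratings.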

As for building the model, this is unlike a typical supervised machine learning problem. There is in fact no target variable to predict! Additionally, I have no way to measure the performance of the model, since we don’t have feedback from people using it.

Even so, the basics of optimization still hold. Like any optimization problem, this one has three basic elements:

  • Variables: These are the parameters (e.g. price, # of reviews, rating) that I will use to build my holistic score.
  • Constraints: These are the bounds my variables/parameters need to stay within (e.g. 200 < price < 300, bedrooms ≥ 3).
  • Objective Function: This will be the function that needs to be minimized or maximized.

Since performance cannot be measured, I believe it is highly preferable to create an easily interpreted model. Right now, one of my ideas is to adopt a logical structure similar to how maximization and minimization problems are set up in Economics. This structure was used heavily in the Intermediate Microeconomics course I took last Fall. Essentially, we minimize or maximize the objective function under certain constraints, such as income, to determine the optimal strategy (usually by solving the Lagrangian).

Consider this problem:

Suppose there are two goods, 1 and 2, and denote by xᵢ the quantity of good i = 1, 2 purchased for consumption. The utility function of our agent is U(x₁, x₂) = 16x₁⁴x₂⁸. If pᵢ is the price per unit of good i, this agent’s utility maximization problem is (where I is the agent’s income):

Example taken from Prof. Rakesh Vohra’s Lecture Notes

This problem can be solved using the method of Lagrange multipliers, which yields the agent’s optimal buying strategy for maximizing utility under the budget constraint. Though the math is interesting, I won’t be going over it in this post.
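For the curious, the outline fits in a few lines. This is my own sketch, assuming the budget constraint takes the usual form p₁x₁ + p₂x₂ ≤ I and that it binds at the optimum (utility is increasing in both goods); the exact statement is in the lecture notes.

```latex
\begin{aligned}
&\max_{x_1, x_2 \ge 0} \; 16\,x_1^{4} x_2^{8}
  \quad \text{s.t.} \quad p_1 x_1 + p_2 x_2 \le I \\
&\mathcal{L} = 16\,x_1^{4} x_2^{8} + \lambda\,(I - p_1 x_1 - p_2 x_2) \\
&\frac{\partial \mathcal{L}}{\partial x_1} = 64\,x_1^{3} x_2^{8} - \lambda p_1 = 0,
  \qquad
  \frac{\partial \mathcal{L}}{\partial x_2} = 128\,x_1^{4} x_2^{7} - \lambda p_2 = 0 \\
&\Rightarrow \; \frac{x_2}{2 x_1} = \frac{p_1}{p_2}
  \;\Rightarrow\; p_2 x_2 = 2\,p_1 x_1 \\
&\Rightarrow \; x_1^{*} = \frac{I}{3 p_1}, \qquad x_2^{*} = \frac{2I}{3 p_2}
\end{aligned}
```

In words, under these assumptions the agent spends one third of the budget on good 1 and two thirds on good 2, matching the 4:8 ratio of the exponents.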

Likewise, for our problem of determining the best listings, we can define some utility function U(x₁, …, xₙ) for the party, subject to multiple constraints.

Here the constraints simply act as filters, much like the filters on Airbnb’s website that narrow the listings down to your party’s needs. After creating a list of acceptable listings, we can rank them by their utility, which is determined by U(x₁, …, xₙ).
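The filter-then-rank idea can be sketched in a few lines of Python. The listings, field names, and the placeholder utility (the overall rating alone) are hypothetical; the real utility function is still to be designed.

```python
def rank_listings(listings, constraints, utility):
    """Keep listings that satisfy every constraint, then sort by utility (best first)."""
    feasible = [item for item in listings if all(check(item) for check in constraints)]
    return sorted(feasible, key=utility, reverse=True)


# Hypothetical listings and field names, for illustration only.
listings = [
    {"name": "A", "price": 250, "bedrooms": 3, "rating": 4.9},
    {"name": "B", "price": 320, "bedrooms": 4, "rating": 5.0},  # over budget
    {"name": "C", "price": 220, "bedrooms": 3, "rating": 4.6},
    {"name": "D", "price": 240, "bedrooms": 2, "rating": 4.8},  # too few bedrooms
]

# Example constraints: 200 < price < 300 and at least 3 bedrooms.
constraints = [
    lambda item: 200 < item["price"] < 300,
    lambda item: item["bedrooms"] >= 3,
]

# Placeholder utility: the overall rating alone.
ranked = rank_listings(listings, constraints, utility=lambda item: item["rating"])
print([item["name"] for item in ranked])  # ['A', 'C']
```

Note that listing B has the highest rating but never reaches the ranking step, because the constraints act purely as filters.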

Now, creating this utility function is the biggest challenge of this project. My initial thought is to create a normalized weighting of metrics/variables that are indicative of a better Airbnb experience. The criteria for choosing them are still in the exploration phase. Some of the variables that easily come to mind and that I will consider are response rate, number of reviews, cleanliness, accuracy, and overall rating. The idea is to combine a hand-picked list of variables indicating success (a good experience) into a holistic score that I can rank listings by. This is a task I still need to put more thought into over the next few weeks. Any recommendations or advice are welcome!
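One possible shape for such a score, as a sketch only: scale each metric to [0, 1] with min-max normalization, then take a weighted average. The metrics, weights, and sample values are placeholders I made up for illustration, not a final design.

```python
def min_max(values):
    """Scale values to [0, 1]; a constant column maps to all zeros."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]


def holistic_scores(listings, weights):
    """Weighted average of min-max scaled metrics; weights are renormalized to sum to 1."""
    total = sum(weights.values())
    scores = [0.0] * len(listings)
    for metric, weight in weights.items():
        scaled = min_max([item[metric] for item in listings])
        for i, value in enumerate(scaled):
            scores[i] += (weight / total) * value
    return scores


# Hypothetical listings and weights, for illustration only.
listings = [
    {"name": "A", "response_rate": 0.9, "num_reviews": 120, "rating": 4.9},
    {"name": "B", "response_rate": 1.0, "num_reviews": 20, "rating": 4.7},
    {"name": "C", "response_rate": 0.8, "num_reviews": 60, "rating": 4.8},
]
weights = {"rating": 0.5, "num_reviews": 0.3, "response_rate": 0.2}

scores = holistic_scores(listings, weights)
ranking = [
    item["name"]
    for _, item in sorted(zip(scores, listings), key=lambda pair: pair[0], reverse=True)
]
```

Because every metric is scaled to the same range, each weight can be read directly as that metric's share of the final score, which keeps the model easy to interpret. With these made-up numbers, the ranking comes out as A, C, B.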

3. Automating the Process

Automation! After creating the spider and the model to rank the listings, I would like to go from entering my constraints to getting a ranked list of optimal listings in a single click. I am interested in looking into Flask to see how I can deploy the model behind a front end, and possibly even learning Airflow to manage my workflow. There is just so much to learn! More on this later.

Closing Remarks

I am super excited to be working on this project for the next few weeks! I plan to release the next two posts on a weekly basis. Currently, my plan looks like this:

How to Scrape Airbnb Listings with Scrapy and Splash → July 8th [Done: July 13th]

(II) Data Cleaning and Model Creation → July 15th [In Progress]

(III) Automating the Whole Process → Hopefully by end of July!

That’s it! Thanks for reading and Happy Canada Day! 🇨🇦

Written by

Kai Lu

Studying Mathematical Economics and Statistics at UPenn | Data Science Intern @Shopify