Pricing Strategy Development: Airbnb Market Analysis with Python

Blake Mullen
Published in CodeX · 7 min read · Aug 22, 2021

How do successful vacation rental companies develop a pricing strategy when planning on entering a new market or adding an additional property to an existing portfolio?

Image Captured by Author

Data analysis of course!

In this article I will walk through:

  1. How to obtain raw and complete market data from Airbnb using Python
  2. How to clean the data for manipulation with Pandas
  3. How to establish competitive unit pricing based on market-rate percentiles

See the project code on my Github here.

Market Analysis

Knowing how to accurately position your product within a market is imperative to running a successful business. Prior to establishing a pricing strategy, the rental company will want to conduct a market analysis to see what other industry members are charging. This allows them to gain an understanding of the pricing landscape in their territory and to create a data-driven strategy.

Oftentimes, property managers will purchase data via third-party vendors. While it can be advantageous to outsource data acquisition in some respects, the upfront cost can be a barrier to entry for some individuals or companies, and the figures provided might not always be as reliable as asserted by the vendor. Who knows if the data is up-to-date, contains errors, or is inaccurately skewed by irrelevant outliers?

When possible, go straight to the source.

Data Acquisition

In an effort to access curated, raw data from Airbnb on-demand, I decided to build a web scraper using Python that would allow full autonomy over data acquisition.

I was in France while writing the scraper and was curious about the total number of one-bedroom rentals available in my area, Deauville, between 12/23/21 and 01/02/22, and their pricing quartiles. A simple search on Airbnb returned 300 listings, but intuition told me there were probably far more than that.

Separating the search results by price range allowed me to work around the 300-listing display cap. I found the price ranges (i.e. ≤60, 60–75, …, 180–290, ≥290) that would each return slightly fewer than 300 results, so that no data points were missed in any query.
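The banding idea can be sketched in plain Python. Note that only the band edges 60, 75, 180, and 290 come from the article; the intermediate edges and the query-parameter names below are illustrative stand-ins:

```python
# Price-band edges chosen by trial and error so that each Airbnb search
# returns fewer than 300 listings (the cap on visible results).
# Only 60, 75, 180, and 290 are from the study; the rest are illustrative.
band_edges = [0, 60, 75, 90, 110, 140, 180, 290, 10_000]

# Build one (price_min, price_max) query per band; the parameter names
# mirror Airbnb's search filters but are illustrative here.
queries = [
    {"price_min": lo, "price_max": hi}
    for lo, hi in zip(band_edges, band_edges[1:])
]

print(len(queries))  # 8 bands -> 8 scraper runs -> 8 JSON files
```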

I then created a web scraper in Python using a Scrapy spider and ran it once per price range to acquire data on all available listings. This produced eight JSON files, each containing at most 300 listings.

See the source code for the spider here.

The eight .json files.

Data Cleaning

So now that we have the data, we will want to get it into a format where it can be easily manipulated. See the full notebook here.

Each of the eight JSON files is read into its own dataframe, the dataframes are collected into a list called “dfs”, and using pd.concat, drop_duplicates(), and reset_index(drop=True) I create a dataframe called “listings” with 2209 unique observations.
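A minimal sketch of that concatenation step, with two tiny hand-made frames standing in for the eight scraped files (in the real notebook each frame comes from pd.read_json):

```python
import pandas as pd

# Stand-ins for two of the eight scraped JSON files; listings near a
# band edge can appear in more than one file, creating duplicates.
df1 = pd.DataFrame({"name": ["Sea View Flat", "Old Town Studio"],
                    "price": ["€120", "€55"]})
df2 = pd.DataFrame({"name": ["Old Town Studio", "Harbour Loft"],
                    "price": ["€55", "€210"]})

dfs = [df1, df2]  # in the study: one dataframe per JSON file

# Stack the frames, drop listings caught by more than one price band,
# and renumber the index so it runs 0..n-1.
listings = pd.concat(dfs).drop_duplicates().reset_index(drop=True)
print(len(listings))  # 3 unique listings
```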

In order to facilitate aggregations, we must convert the columns to numeric types. I start with the price variable, removing the “€” with a regular expression and casting the remaining value to an integer using “astype(int)”.

Moving on to the bedrooms variable, I replace “Studio” with “0”, remove “bedroom(s)” using regular expressions, and change the remaining value into an integer.
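The two cleaning steps above can be sketched like this, with a few stand-in rows in place of the scraped frame:

```python
import pandas as pd

# Stand-in rows; in the study these come from the scraped listings frame.
listings = pd.DataFrame({
    "price": ["€120", "€55", "€210"],
    "bedrooms": ["1 bedroom", "Studio", "2 bedrooms"],
})

# Strip the euro sign and cast what remains to an integer.
listings["price"] = (
    listings["price"].str.replace("€", "", regex=False).astype(int)
)

# Map "Studio" to 0, strip the "bedroom(s)" label with a regular
# expression, and cast to an integer.
listings["bedrooms"] = (
    listings["bedrooms"]
    .str.replace("Studio", "0", regex=False)
    .str.replace(r"\s*bedrooms?", "", regex=True)
    .astype(int)
)
```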

Now if we were interested in other variables such as guests, bathrooms, or beds, we could perform the same actions on them to easily change their types into integers or floats. But we are only interested in the bedrooms and price variables in this study.

So now that the price and bedrooms variables have been cleaned up, we can create a boxplot to see the price spread of one-bedrooms in Deauville.
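A minimal sketch of the filter-and-plot step, assuming stand-in data (the real one_br frame is the one-bedroom slice of the 2209-row listings frame):

```python
import matplotlib
matplotlib.use("Agg")  # render off-screen; no display needed
import matplotlib.pyplot as plt
import pandas as pd

# Stand-in cleaned listings; in the study this is the full scraped frame.
listings = pd.DataFrame({
    "bedrooms": [1, 1, 0, 2, 1, 1],
    "price": [85, 110, 55, 210, 140, 450],
})

# Keep only one-bedroom units, then plot their price spread.
one_br = listings[listings["bedrooms"] == 1]

fig, ax = plt.subplots()
ax.boxplot(one_br["price"], vert=False)
ax.set_xlabel("Nightly price (€)")
ax.set_title("Price spread of one-bedroom units in Deauville")
fig.savefig("one_br_boxplot.png")
```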

We can see that there are more than a few outliers. In order to get the most out of our data, we are going to inspect what is going on with listings that have a price greater than 300 €.

We can see that while they may have a relatively high listing price, the units at index locations [2165, 2168, 2204] appear to be normal listings, so we will include them in our data. However, the units at index locations [2197, 2201, 2206] seem to be listings that rent out multiple rooms in a hotel, which tells us that these listings are in fact not one-bedrooms and should be excluded from our study. We use “.drop” to remove them.
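That inspect-then-drop step looks roughly like this; the frame below is a stand-in whose index labels mimic the study's high-priced rows:

```python
import pandas as pd

# Stand-in one-bedroom frame; the large index labels mimic the study's rows.
one_br = pd.DataFrame(
    {"price": [90, 110, 320, 380, 900, 1_100]},
    index=[0, 1, 2165, 2168, 2197, 2201],
)

# Inspect everything priced above 300 EUR before deciding what to keep.
suspects = one_br[one_br["price"] > 300]
print(suspects)

# After manual review, drop the multi-room hotel listings by index label;
# the remaining high-priced rows are legitimate one-bedrooms and stay.
one_br = one_br.drop([2197, 2201])
print(one_br.shape)  # rows remaining after removing the hotel listings
```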

Taking another look at our data through a boxplot after removing the irrelevant outliers, we can see that our spread is a bit more normalized.

Boxplot of price spread of 1 bedroom units in Deauville.

Now we utilize one_br.shape to see how many unique one-bedroom observations we have in our “one_br” dataframe.

We can see that there are 939 unique observations of one-bedroom units.

Market Percentiles

As a short-term rental property investor, knowing the market rate percentiles of rental units within your category gives you an incredible advantage. You don’t want to be selling yourself short by leaving money on the table or overcharging and pricing yourself out of the market.

Not all products are created equal and having an understanding of how a product’s value compares with the value of similar products in a market is the main concept to understand when developing a pricing strategy.

We can easily find any percentile of our price data using “numpy.percentile”. This function allows the user to compute any “q-th” percentile of the data along a specified axis. In this study, we will use “np.percentile” to find the 25th (Q1), 50th (Q2/median), and 75th (Q3) percentiles and set them equal to variables “one_br_25”, “one_br_50”, and “one_br_75” respectively.
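In code, with a short stand-in array in place of the study's 939 cleaned prices:

```python
import numpy as np

# Stand-in nightly prices; in the study these are one_br["price"].
prices = np.array([55, 70, 85, 90, 110, 140, 180, 250])

# q-th percentiles of the price distribution (linear interpolation
# between data points, NumPy's default method).
one_br_25 = np.percentile(prices, 25)  # Q1
one_br_50 = np.percentile(prices, 50)  # Q2 / median
one_br_75 = np.percentile(prices, 75)  # Q3

print(one_br_25, one_br_50, one_br_75)
```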

From here we will create a KDE (Kernel Density Estimate) in Seaborn with axis lines representing the different percentiles. Similar to a histogram, the KDE plot is a method for visualizing the distribution of observations in a dataset. However, unlike a histogram which uses discrete bins, a KDE plot smooths the observations with a Gaussian kernel, producing a continuous density.

Now that we have our market percentiles chart, we can conduct a qualitative analysis of each price range, including sentiment analysis of the guest reviews. Where are the listings located? How many bathrooms do they have? What are the amenities? What are guests saying in the reviews? This can help us create a profile for each price range.

Generally:
● Lower quartile (<25%): low-cost provider; lowest price, inconvenient location, possibly new to the market, possibly a shared bedroom/bathroom, poor amenities/services.
● 25%–50%: focused low-cost provider / integrated low-cost differentiator; value price, convenient location, modest amenities/services.
● 50%–75%: differentiator; above-median price, great location, higher-quality amenities/perks/services.
● Upper quartile (>75%): focused differentiator; highest price, luxury units, best location/view, best amenities/perks/services.

Realistically, quartiles may not always be the best pricing model for every market. Some markets may fit better within quintiles, octiles, or even deciles. This is where your judgment as a subject matter expert of your market comes into play.

Conclusion

It’s important to focus on alternative offerings within your property’s unique category rather than only similar-sized units. Not all units are created equal. Each unit profile has its own segment of customers, with its own level of price sensitivity.

In order to get an understanding of the seasonality of the market, you can perform this study on different date ranges: holidays, off-season, regular season, events, etc. Also, be aware that many companies utilize dynamic pricing tools, which change the prices of their properties based on days until arrival and occupancy levels.

Finding market percentiles is just one of the elements of developing a successful pricing strategy. No matter where you establish your initial price, it’s likely that you will need to make real-time adjustments based on the results of the feedback loop and market conditions.

It should be noted here that Airbnb does not provide an official API to the public and that excessive scraping of the site may violate their terms of service which could result in getting blocked. If you attempt to utilize the code I have provided, please understand that you do so at your own risk. I highly encourage you to refer to the Scrapy Documentation on Download Delay prior to implementing your own project.

Further Resources

Development of a Web Scraper:

Brad Traversy’s “Intro To Web Crawlers & Scraping With Scrapy”

Thibaud Lamothe’s Scraping with Python 🐍 four-part series:

  1. A Gentle Introduction to Web Data Extraction | Scraping With 🐍
  2. How to Parse a Webpage using Selectors | Scraping with 🐍
  3. Let’s Discover the Wonderful World of Scrapy | Scraping with 🐍
  4. Scraping Tutorial on Airbnb’s Website (with Scrapy) | Scraping with 🐍
