Seattle AirBnb Price Predictions

Predicting AirBnb listing price and other insights from AirBnb Seattle data

Akhil Anurag
5 min read · Jul 22, 2019


Photo by Roberto Nickson on Unsplash

Overview:

AirBnb is a global technology company that offers lodging, homestays and tourism experiences. Since 2008, millions of hosts and travelers have chosen AirBnb to list and book unique accommodations anywhere in the world.

In this article, I analyze Seattle listings data provided by AirBnb to answer questions about listing prices. Specifically, I explore:

  1. Can we predict listing price using listing information such as host and property details?
  2. Can we identify the important drivers of listing price?
  3. Do the factors that influence price also influence customer reviews? For example, if price is driven by neighborhood, are reviews driven by neighborhood too?

Answering these questions will help new hosts decide on a listing price, and could help AirBnb benchmark different properties on price.

Introduction to Data:

The data covers 3,818 AirBnb listings in Seattle. It includes information on prices, review scores, hosts, availability, amenities and locations for the year 2016.

Along with the listings data, the dataset also includes calendar and review information.

Putting the Data Together:

Because the listings data contains a lot of features, the first task is to reduce it to the relevant information. I started by looking at the missing values in each variable and kept only the variables with less than 20% missing values. I also dropped variables with either very low variance (mostly ID variables and constant-valued features) or very high cardinality (categorical variables with a large number of levels).
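The column screening described above can be sketched with pandas. This is a minimal sketch, assuming the listings are already loaded into a DataFrame; the 20% cut-off comes from the text, while `select_columns`, `max_levels` (how many categories count as "a lot of levels") and `id_cols` are hypothetical names and parameters, since the article does not give them.

```python
import pandas as pd


def select_columns(df, max_missing=0.2, max_levels=50, id_cols=()):
    """Drop sparse, constant, ID and high-cardinality columns from a listings table."""
    # Explicitly listed ID columns (hypothetical names) are dropped first.
    df = df.drop(columns=[c for c in id_cols if c in df.columns])

    # Keep only columns with less than `max_missing` (20%) missing values.
    df = df.loc[:, df.isna().mean() < max_missing]

    # Drop constant columns: a single distinct non-null value carries no information.
    df = df.loc[:, df.nunique(dropna=True) > 1]

    # Drop non-numeric columns with too many levels (free text, URLs, names, ...).
    too_many_levels = [
        c for c in df.columns
        if not pd.api.types.is_numeric_dtype(df[c]) and df[c].nunique() > max_levels
    ]
    return df.drop(columns=too_many_levels)


# Toy data for illustration only, not the Seattle listings.
listings = pd.DataFrame({
    "id": [1, 2, 3, 4],
    "city": ["Seattle"] * 4,                   # constant -> dropped
    "square_feet": [None, None, None, 700.0],  # 75% missing -> dropped
    "bedrooms": [1, 2, 1, 3],
    "room_type": ["Entire home/apt", "Private room", "Private room", "Entire home/apt"],
    "name": ["Loft", "Cabin", "Studio", "Suite"],  # 4 levels > max_levels=3 -> dropped
})
kept = select_columns(listings, max_levels=3, id_cols=["id"])
print(kept.columns.tolist())  # ['bedrooms', 'room_type']
```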

Since we are interested in studying price variation, let's look at its distribution.

Rental prices span a wide range. Some owners charge as much as $1,000 a day, but the majority charge between $50 and $200.
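Summaries like the $50 to $200 band require the price column to be numeric first. A minimal sketch, assuming (the article does not say so) that raw prices are stored as text such as "$1,000.00"; `parse_price` and `share_in_range` are hypothetical helpers.

```python
import pandas as pd


def parse_price(prices):
    """Convert strings such as "$1,000.00" into floats (assumed raw format)."""
    # Strip the dollar sign and thousands separators, then cast to float.
    return prices.str.replace(r"[$,]", "", regex=True).astype(float)


def share_in_range(prices, low=50, high=200):
    """Fraction of listings priced between `low` and `high`, both inclusive."""
    return float(prices.between(low, high).mean())


raw = pd.Series(["$85.00", "$150.00", "$1,000.00", "$40.00", "$200.00"])
price = parse_price(raw)
print(price.max())            # 1000.0
print(share_in_range(price))  # 0.6
```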

Part 1: Can we predict listing price using listing information such as host and property details?

Before training any models, we clean the numeric and categorical variables separately: filling in missing values, scaling the numeric variables and encoding the categorical ones.
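One way to express this cleaning step is a scikit-learn `ColumnTransformer`. This is a sketch under stated assumptions: the article does not name the imputation strategy, scaler or encoder, so median imputation, `StandardScaler` and one-hot encoding are swapped-in choices, and `make_preprocessor` is a hypothetical helper.

```python
import numpy as np
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.impute import SimpleImputer
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import OneHotEncoder, StandardScaler


def make_preprocessor(numeric_cols, categorical_cols):
    """Impute + scale numeric columns; impute + one-hot encode categorical columns."""
    numeric = Pipeline([
        ("impute", SimpleImputer(strategy="median")),  # assumed strategy
        ("scale", StandardScaler()),
    ])
    categorical = Pipeline([
        ("impute", SimpleImputer(strategy="most_frequent")),  # assumed strategy
        ("encode", OneHotEncoder(handle_unknown="ignore")),
    ])
    return ColumnTransformer([
        ("num", numeric, numeric_cols),
        ("cat", categorical, categorical_cols),
    ])


# Toy rows with missing values in both a numeric and a categorical column.
X = pd.DataFrame({
    "bedrooms": [1, 2, np.nan, 3],
    "accommodates": [2, 4, 2, 6],
    "room_type": ["Private room", "Entire home/apt", np.nan, "Entire home/apt"],
})
pre = make_preprocessor(["bedrooms", "accommodates"], ["room_type"])
X_ready = pre.fit_transform(X)
print(X_ready.shape)  # (4, 4): 2 scaled numeric columns + 2 one-hot columns
```

Fitting the transformer on the training split only, and reusing it on the test split, keeps test statistics from leaking into the scaling.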

The correlation plot of the numeric variables shows that price is strongly related to property size, for example bedrooms and accommodates.
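The numbers behind such a correlation plot can be listed directly. A minimal sketch, assuming a numeric `price` column; `price_correlations` is a hypothetical helper and the toy values below are made up, not the Seattle data.

```python
import pandas as pd


def price_correlations(df, target="price"):
    """Pearson correlation of each numeric column with `target`, strongest first."""
    corr = df.select_dtypes("number").corr()[target].drop(target)
    # Rank by absolute strength so strong negative correlations also surface.
    return corr.reindex(corr.abs().sort_values(ascending=False).index)


toy = pd.DataFrame({
    "price": [60, 90, 150, 220, 300],
    "accommodates": [1, 2, 4, 5, 8],
    "bedrooms": [1, 1, 2, 2, 3],
    "number_of_reviews": [30, 5, 12, 40, 8],
})
result = price_correlations(toy)
print(result)
```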

The model uses only host and property information. I excluded all rating variables, because I don't want ratings to act as predictors of price: a listing's rating should ideally depend on all the factors, eventually including price itself.
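Excluding the ratings can be as simple as filtering columns by name. This assumes (it is not stated in the article) that rating columns share a `review_scores_` prefix; `drop_rating_columns` is a hypothetical helper.

```python
import pandas as pd

# Assumed naming convention for the rating columns; adjust to the actual schema.
RATING_PREFIX = "review_scores_"


def drop_rating_columns(df):
    """Remove rating columns so they cannot be used as predictors of price."""
    rating_cols = [c for c in df.columns if c.startswith(RATING_PREFIX)]
    return df.drop(columns=rating_cols)


features = pd.DataFrame(columns=[
    "host_listings_count", "accommodates",
    "review_scores_rating", "review_scores_cleanliness",
])
remaining = drop_rating_columns(features).columns.tolist()
print(remaining)  # ['host_listings_count', 'accommodates']
```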

We trained Linear, Support Vector and Random Forest regressors on the prepared data to see whether any of them could predict price. Of the three, Random Forest performed best, explaining 63% of the variance in price on the test data (an R² of about 0.63).
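A comparison like this can be sketched by fitting the three regressors on the same train/test split and scoring each with R², the share of variance explained. This is a sketch, not the author's code: the 70/30 split, `n_estimators=200`, default SVR settings, the `compare_models` helper and the `neighborhood_group` column name are all assumptions, and the data below are synthetic with no missing values, so imputation is omitted.

```python
import numpy as np
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.ensemble import RandomForestRegressor
from sklearn.linear_model import LinearRegression
from sklearn.metrics import r2_score
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import OneHotEncoder, StandardScaler
from sklearn.svm import SVR


def compare_models(X, y, numeric_cols, categorical_cols, seed=42):
    """Fit three regressors on one split and return each model's test-set R^2."""
    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=0.3, random_state=seed)
    models = {
        "linear": LinearRegression(),
        "svr": SVR(),
        "random_forest": RandomForestRegressor(n_estimators=200, random_state=seed),
    }
    scores = {}
    for name, model in models.items():
        # A fresh preprocessor per model, fitted on the training split only.
        pre = ColumnTransformer([
            ("num", StandardScaler(), numeric_cols),
            ("cat", OneHotEncoder(handle_unknown="ignore"), categorical_cols),
        ])
        pipe = make_pipeline(pre, model)
        pipe.fit(X_train, y_train)
        scores[name] = r2_score(y_test, pipe.predict(X_test))
    return scores


# Synthetic listings for illustration only.
rng = np.random.default_rng(0)
n = 400
X = pd.DataFrame({
    "accommodates": rng.integers(1, 9, n),
    "bedrooms": rng.integers(0, 5, n),
    "neighborhood_group": rng.choice(["Downtown", "Capitol Hill", "Ballard"], n),
})
premium = X["neighborhood_group"].map({"Downtown": 80, "Capitol Hill": 40, "Ballard": 10})
y = 30 * X["accommodates"] + 25 * X["bedrooms"] + premium + rng.normal(0, 20, n)
scores = compare_models(X, y, ["accommodates", "bedrooms"], ["neighborhood_group"])
print(scores)
```

Default `SVR` settings are sensitive to the scale of the target, so an untuned SVR can score poorly on raw prices; tuning `C` and `epsilon` (or scaling `y`) is usually needed before comparing it fairly.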

Even though Random Forest is by far the best model, it could be improved by:

  • Combining existing variables into new, more meaningful features.
  • Including some of the variables excluded from this analysis.

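As an illustration of constructing new features, the sketch below derives two hypothetical ones: a count of listed amenities and guests per bedroom. Neither comes from the article. It assumes amenities are stored as text like `{TV,"Cable TV",Internet}`, and the simple comma split would miscount any amenity name that itself contains a comma.

```python
import pandas as pd


def count_amenities(amenities):
    """Count items in strings like '{TV,"Cable TV",Internet}' (assumed raw format)."""
    def _count(text):
        if not isinstance(text, str):
            return 0  # missing value -> no amenities listed
        items = [item.strip().strip('"') for item in text.strip("{}").split(",")]
        return sum(1 for item in items if item)
    return amenities.map(_count)


def add_features(df):
    """Return a copy of `df` with two hypothetical engineered features."""
    out = df.copy()
    out["amenity_count"] = count_amenities(out["amenities"])
    # clip(lower=1) avoids division by zero for studios listed with 0 bedrooms.
    out["guests_per_bedroom"] = out["accommodates"] / out["bedrooms"].clip(lower=1)
    return out


listings = pd.DataFrame({
    "amenities": ['{TV,"Cable TV",Internet}', "{}", None],
    "accommodates": [4, 2, 3],
    "bedrooms": [2, 0, 1],
})
out = add_features(listings)
print(out[["amenity_count", "guests_per_bedroom"]])
```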
Part 2: Can we identify the important drivers of listing price?

Using the model built in Part 1, we can identify the key drivers of listing price.

The top five drivers of listing price among the host and property features are:

  1. Host Listing Count: This could be a proxy for a credible, experienced host. Hosts with more listings likely have the experience to set the best listing price.
  2. Neighborhood Group: Prices are also driven by neighborhood location, which is intuitive.
  3. Accommodates: This is property information; as the space increases, so does the price, all else being equal.
  4. Bedrooms: Similar to the previous one.
  5. Guests Included / Extra People: Similar to the previous one.

Part 3: Do the factors influencing price also influence customer reviews?

In this section, we explore the relationship between customer reviews and the factors identified as important for price, such as neighborhood and accommodates.

The idea is to see whether customers with positive comments or sentiments are concentrated in a particular price segment.

We mapped the reviews data to the listing information and derived polarity scores (high for positive sentiment, low for negative sentiment) from the comments provided by customers.
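The article does not name the sentiment tool behind the polarity scores. The sketch below swaps in a deliberately tiny, hypothetical word list so it runs without extra downloads; a real analysis would use an established sentiment library in place of `polarity`. It also assumes the reviews table has `listing_id` and `comments` columns that join to the listings `id`.

```python
import re

import pandas as pd

# Tiny illustrative lexicon, a stand-in for a real sentiment tool.
POSITIVE = {"great", "clean", "lovely", "comfortable", "friendly", "perfect"}
NEGATIVE = {"dirty", "noisy", "rude", "broken", "small", "bad"}


def polarity(text):
    """Score in [-1, 1]: (positive hits - negative hits) / total hits, 0 if none."""
    if not isinstance(text, str):
        return 0.0
    words = re.findall(r"[a-z']+", text.lower())
    pos = sum(w in POSITIVE for w in words)
    neg = sum(w in NEGATIVE for w in words)
    return 0.0 if pos + neg == 0 else (pos - neg) / (pos + neg)


def score_reviews(reviews, listings):
    """Attach a polarity score to each review, then join listing attributes."""
    scored = reviews.assign(polarity=reviews["comments"].map(polarity))
    return scored.merge(listings, left_on="listing_id", right_on="id", how="inner")


reviews = pd.DataFrame({
    "listing_id": [1, 1, 2],
    "comments": ["Great place, very clean!", "A bit noisy but friendly host", "Dirty and small"],
})
listings = pd.DataFrame({
    "id": [1, 2],
    "neighborhood_group": ["Downtown", "Ballard"],
    "accommodates": [4, 2],
})
merged = score_reviews(reviews, listings)
print(merged[["listing_id", "polarity", "neighborhood_group"]])
```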

The relationship between polarity scores and the different drivers is not very clear. For example, the polarity scores of customer comments appear somewhat related to property information but not at all related to neighborhood group.
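One way to make "related" more concrete is the correlation ratio η², the share of polarity variance explained by group membership: 0 means every group has the same mean polarity, and 1 means the grouping explains all of the variation. This is a swapped-in technique, not necessarily what the original analysis used; `polarity_by_group` and `eta_squared` are hypothetical helpers, and the numbers below are made up, not the Seattle data.

```python
import pandas as pd


def polarity_by_group(df, group_col, score_col="polarity"):
    """Mean polarity and review count for each level of a driver."""
    return df.groupby(group_col)[score_col].agg(["mean", "count"])


def eta_squared(df, group_col, score_col="polarity"):
    """Between-group sum of squares divided by total sum of squares."""
    overall = df[score_col].mean()
    groups = df.groupby(group_col)[score_col]
    between = (groups.count() * (groups.mean() - overall) ** 2).sum()
    total = ((df[score_col] - overall) ** 2).sum()
    return float(between / total) if total > 0 else 0.0


scored = pd.DataFrame({
    "neighborhood_group": ["Downtown", "Downtown", "Ballard", "Ballard"],
    "accommodates": [2, 6, 2, 6],
    "polarity": [0.2, 0.8, 0.2, 0.8],
})
print(polarity_by_group(scored, "neighborhood_group"))
for driver in ["neighborhood_group", "accommodates"]:
    print(driver, round(eta_squared(scored, driver), 3))
# neighborhood_group 0.0
# accommodates 1.0
```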

Conclusion:

Whenever a host lists a property, the listing price plays a very important role in whether the host earns a return on their investment.

Analyzing the Seattle data, we can conclude:

  1. AirBnb listing price can be predicted using host and property features. Although we built a model that predicts listing price, there is room to improve it further to help hosts and homeowners decide on the best price.
  2. Neighborhood groups play an important role in determining price, along with property size.
  3. Customer comments on a property are not determined by the factors that determine listing price.

For more details on the analysis, see my GitHub repository here.
