Data-driven insights into Seattle AirBnB listings

Chaitanya Krishna Kasaraneni
Published in The Startup
7 min read · Apr 20, 2020
AirBnB Logo (Source: Internet)

Introduction

AirBnB is an online marketplace for lodging, primarily b&b (bed and breakfast) stays. The company does not own any of the listings on the application; it acts as a broker and receives a commission from each booking. Founded in 2008, the company is based in San Francisco, California, US.

The company was conceived after its founders put an air mattress in their living room, effectively turning their apartment into a bed and breakfast, in order to offset the high cost of rent in San Francisco. AirBnB is a shortened version of its original name, AirBedandBreakfast.com.

AirBnB’s market share has risen dramatically since 2010. 2019 statistics estimate that AirBnB now accounts for up to 20% of the vacation rental industry as a whole. The adaptable nature of listings on AirBnB, from an apartment in a metropolitan city to a getaway place in the redwoods, makes it a huge competitor in the industry.

Fascinated with AirBnB’s growing usage, in 2016 Murray Cox, an independent digital storyteller, community activist and technologist, started an investigatory website named Inside AirBnB, which reports and visualizes listings data scraped from AirBnB.

In this post, we will analyze the AirBnB listings in Seattle. This data can be downloaded from Kaggle. It is only a small part of the Inside AirBnB data. The original data can be found here.

Seattle Downtown (Source: Flickr)

There are three files in this dataset.

  • listings.csv: includes full descriptions and average review score
  • calendar.csv: includes listing id and the price and availability for that day
  • reviews.csv: includes unique id for each reviewer and detailed comments

We’ll be using all three files individually for analysis.

The aim of this project is to analyze the AirBnB data and answer the following questions, which I formulated after an initial assessment of the data:

  1. “What factors highly influence the prices of listings in Seattle?”
  2. “What is the seasonal pattern of prices?”
  3. “What is the relationship of reviews with price?”

To learn more about this analysis, see the link to my GitHub.

Part 1: Factors that Influence the prices of listings

After looking at the data and the prices of each listing, I was interested in finding out the parameters in the dataset that have an effect on the prices of the listings. For this part, listings.csv data is used.

  1. How does property type affect prices?

Here you can see the chart that shows the frequency of property types in the listings.

Property Type Frequency

The chart shows that hosts are more inclined to list their entire property than to post private or shared rooms. Property type also plays an important role. Not surprisingly, apartments and houses make up an overwhelming majority of all listings, although we do see a few instances of condominiums and townhouses.
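A frequency chart like this one can be built from simple counts. The following is a minimal sketch with pandas, not the code behind the post: the column names property_type and room_type are assumptions about listings.csv, and the rows are made up for illustration.

```python
import pandas as pd

# Tiny stand-in for listings.csv. The column names and the rows are
# illustrative assumptions, not values taken from the real dataset.
listings = pd.DataFrame({
    "property_type": ["House", "Apartment", "Apartment",
                      "Condominium", "House", "Apartment"],
    "room_type": ["Entire home/apt", "Entire home/apt", "Private room",
                  "Entire home/apt", "Private room", "Shared room"],
})

# Number of listings per property type, most common first.
property_counts = listings["property_type"].value_counts()

# Fraction of all listings that falls into each room type.
room_share = listings["room_type"].value_counts(normalize=True)

print(property_counts)
print(room_share.round(2))
```

Calling property_counts.plot(kind="bar") would turn the counts into a bar chart, assuming matplotlib is installed.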

Below is a chart showing prices of listings broken down by property type. This gives us a much better understanding of the price breakdown in Seattle based on property and room types.

Price Distribution over property type vs room type

The chart shows that, for almost all property types, prices are highest for the Entire home/apartment room type.
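One way to check this kind of statement numerically is a pivot table of mean prices. This is a sketch under assumptions: the column names are guesses at the listings.csv schema, the prices are invented, and the mean is used as a summary even though the chart in the post shows a distribution.

```python
import pandas as pd

# Illustrative rows only; column names are assumed, prices are made up.
listings = pd.DataFrame({
    "property_type": ["House", "House", "Apartment", "Apartment", "Apartment"],
    "room_type": ["Entire home/apt", "Private room", "Entire home/apt",
                  "Private room", "Entire home/apt"],
    "price": [200.0, 80.0, 150.0, 70.0, 130.0],
})

# Mean price for every (property type, room type) combination.
price_table = listings.pivot_table(
    index="property_type", columns="room_type", values="price", aggfunc="mean"
)

# For each property type, the room type with the highest mean price.
priciest_room_type = price_table.idxmax(axis=1)

print(price_table)
print(priciest_room_type)
```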

  2. Neighborhood and number of rooms

In this step, we’ll look at the prices of listings based on the number of bedrooms and the neighborhoods these listings are located in. I plotted a heatmap that shows listing prices by neighborhood and number of bedrooms.

HeatMap for variation of prices with number of bedrooms for listings

From the heatmap, it can be observed that the price of a listing increases with the number of bedrooms. The size of the increase depends on the neighborhood as well.
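The grid behind such a heatmap is a neighborhood-by-bedrooms table. A minimal sketch of how it could be assembled follows. The column names neighbourhood and bedrooms, the use of the mean as the cell value, and all of the numbers are assumptions made for illustration.

```python
import pandas as pd

# Made-up listings; column names are assumptions about listings.csv.
listings = pd.DataFrame({
    "neighbourhood": ["Downtown", "Downtown", "Ballard", "Ballard", "Ballard"],
    "bedrooms": [1, 2, 1, 2, 2],
    "price": [150.0, 250.0, 90.0, 140.0, 160.0],
})

# Rows are neighborhoods, columns are bedroom counts, cells are mean prices.
price_by_area = listings.pivot_table(
    index="neighbourhood", columns="bedrooms", values="price", aggfunc="mean"
)

print(price_by_area)
```

A table in this shape can be passed directly to a plotting function such as seaborn.heatmap(price_by_area, annot=True), assuming seaborn is available.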

These prices also vary with the time of year.

Part 2: Seasonal Pattern Analysis of Prices

In this part we’ll look at how prices vary at different times of the year and which time is better for a traveler to visit Seattle. For this part, I used calendar.csv data. The first few rows of calendar.csv look like this:

   listing_id        date available   price
0      241032  2016-01-04         t  $85.00
1      241032  2016-01-05         t  $85.00
2      241032  2016-01-06         f     NaN
3      241032  2016-01-07         f     NaN
4      241032  2016-01-08         f     NaN
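Because price is stored as text such as $85.00 and available as t or f, both columns need converting before any averaging. The sketch below shows one possible cleaning step with pandas. It is an assumption about the preprocessing, not the post's actual code, and the last input row is invented to show a thousands separator.

```python
import io
import pandas as pd

# The first three rows mirror the sample above. The last row, with the
# hypothetical listing id 999999, is invented to show a price with a comma.
raw = io.StringIO(
    "listing_id,date,available,price\n"
    "241032,2016-01-04,t,$85.00\n"
    "241032,2016-01-05,t,$85.00\n"
    "241032,2016-01-06,f,\n"
    '999999,2016-01-07,t,"$1,250.00"\n'
)
calendar_df = pd.read_csv(raw, parse_dates=["date"])

# Turn the t/f flags into real booleans.
calendar_df["available"] = calendar_df["available"].eq("t")

# Strip the dollar sign and thousands separators, then convert to float.
# Missing prices stay NaN.
calendar_df["price"] = (
    calendar_df["price"].str.replace(r"[$,]", "", regex=True).astype(float)
)

print(calendar_df)
```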

To get a better understanding of how prices vary during a year, I plotted a bar graph of average prices for each month.

Seattle AirBnB price trends over one year (2016–2017)
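The monthly averages behind a bar graph like this can be computed with a group-by on the month. This is a minimal sketch that assumes the price column has already been converted to numbers. The rows are invented for illustration.

```python
import pandas as pd

# Invented calendar rows with numeric prices. None marks an unavailable day.
calendar_df = pd.DataFrame({
    "date": pd.to_datetime(["2016-01-04", "2016-01-05", "2016-07-02",
                            "2016-07-03", "2016-08-10"]),
    "price": [85.0, None, 150.0, 170.0, 140.0],
})

# Average price per calendar month. Missing prices are skipped by mean().
monthly_avg = calendar_df.groupby(calendar_df["date"].dt.month)["price"].mean()

print(monthly_avg)
```

Grouping by month number alone merges the same month from different years. If the data covers more than one January, grouping by year and month keeps them apart.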

Clearly, prices were highest during the summer months of July and August. To analyze the data further, I extracted the day of the week from each date and checked whether that day was a holiday and, if so, which holiday it was. To get the holiday data, I used the ‘holidays’ Python library.
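As a rough illustration of the holiday lookup, the sketch below labels a date with its weekday and, where applicable, its US holiday name. It assumes the holidays package is used through holidays.US(years=2016) and falls back to a one-entry table when the package is not installed. The helper name describe_day is hypothetical.

```python
from datetime import date

try:
    import holidays  # third-party package named in the post
    us_holidays = holidays.US(years=2016)
except ImportError:
    # Fallback so the sketch still runs without the package.
    # This is an illustrative subset, not a full holiday calendar.
    us_holidays = {date(2016, 7, 4): "Independence Day"}

WEEKDAYS = ["Monday", "Tuesday", "Wednesday", "Thursday",
            "Friday", "Saturday", "Sunday"]


def describe_day(day):
    """Return (weekday name, holiday name or None) for a date."""
    return WEEKDAYS[day.weekday()], us_holidays.get(day)


print(describe_day(date(2016, 7, 4)))
print(describe_day(date(2016, 7, 6)))
```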

Next, I calculated the average price for each day of the week, shown in the following plot.

Seattle AirBnB price trends for each day in a week

It can be seen that prices are noticeably higher on weekends than on weekdays.
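A weekday comparison of this kind can be sketched as follows. The prices are invented, and treating Saturday and Sunday as the weekend is an assumption, since the post does not say how the weekend was defined.

```python
import pandas as pd

# Invented prices for a few dates in July 2016.
calendar_df = pd.DataFrame({
    "date": pd.to_datetime(["2016-07-04", "2016-07-08", "2016-07-09",
                            "2016-07-10", "2016-07-11"]),
    "price": [120.0, 140.0, 150.0, 130.0, 110.0],
})

# Average price for each day of the week.
calendar_df["weekday"] = calendar_df["date"].dt.day_name()
weekday_avg = calendar_df.groupby("weekday")["price"].mean()

# Assumption: the weekend is Saturday and Sunday (dayofweek 5 and 6).
calendar_df["is_weekend"] = calendar_df["date"].dt.dayofweek >= 5
weekend_mean = calendar_df.loc[calendar_df["is_weekend"], "price"].mean()
weekday_mean = calendar_df.loc[~calendar_df["is_weekend"], "price"].mean()

print(weekday_avg)
print(weekend_mean, round(weekday_mean, 2))
```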

Now let's dig into the July 2016 and August 2016 data to find the reason behind the increase in average prices. I plotted the holidays and the average listing prices in descending order.

Holidays and average price plot

From the graph above, it can be observed that prices were high in July because of US Independence Day (July 4th), which created a long weekend that month.

Price Plot from July 4, 2016 to July 13, 2016

Hence it can be seen that the price of a listing is higher over weekends than on weekdays.

Part 3: Relationship between customer reviews and prices

Many factors contribute to the price of a listing on AirBnB. We already have a few conclusions about how prices depend on various attributes of a listing. Now let’s analyze whether the price of a listing is related to its number of reviews and, if such a relationship exists, how it affects prices.

For this part, I used reviews.csv data. As a preprocessing step, I removed the comments that are not in English, because I used a built-in analyzer in the NLTK Python library to assign a polarity score to each comment, and this analyzer doesn’t interpret languages other than English.

After calculating the polarity and plotting it on a graph, I found that most of the comments have a negative polarity of 0, i.e. most of the comments are either neutral or positive.

Graph for Polarity Range vs Number of comments
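Once every comment has a polarity score, counting comments per polarity range is a binning step. The sketch below assumes each comment already carries a negative-polarity score between 0 and 1, for example the neg value from NLTK's VADER analyzer. That choice is an assumption, since the post does not name the analyzer. The scores and bin edges are made up.

```python
import pandas as pd

# Made-up negative-polarity scores, one per comment, each between 0 and 1.
neg_scores = pd.Series([0.0, 0.0, 0.0, 0.05, 0.12, 0.0, 0.31])

# Illustrative bin edges. include_lowest=True keeps exact zeros in the
# first bin.
bins = [0.0, 0.1, 0.2, 0.3, 1.0]
polarity_range = pd.cut(neg_scores, bins=bins, include_lowest=True)

# Number of comments in each range, ordered from lowest to highest range.
comments_per_range = polarity_range.value_counts().sort_index()

# Fraction of comments with no negative polarity at all.
share_zero_negative = (neg_scores == 0).mean()

print(comments_per_range)
print(round(share_zero_negative, 2))
```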

Number of reviews versus Price

Out of curiosity, I explored whether there is any relationship between the number of reviews for a listing and its price.

No. of Reviews vs Price
  • From the graph, most reviews were observed for listings priced at around 100 to 300. The number of reviews quickly declines as the price goes up.
  • This shows that an expensive listing does not necessarily have more reviews. Hence, a higher price does not go along with a higher number of reviews.
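One way to put a number on this observation is to count reviews per listing and correlate the counts with price. This is a sketch of that check, not the method used in the post. The column names listing_id, id and price are assumptions, and the data is invented.

```python
import pandas as pd

# Invented stand-ins: one row per review, and one row per listing.
reviews = pd.DataFrame({"listing_id": [1, 1, 1, 2, 2, 3, 4, 4, 4, 4]})
listings = pd.DataFrame({
    "id": [1, 2, 3, 4, 5],
    "price": [95.0, 120.0, 450.0, 150.0, 300.0],
})

# Number of reviews per listing.
review_counts = reviews.groupby("listing_id").size()

# Attach the counts to the listings. Listings without reviews get 0.
listings["n_reviews"] = (
    listings["id"].map(review_counts).fillna(0).astype(int)
)

# Pearson correlation between price and review count.
correlation = listings["price"].corr(listings["n_reviews"])

print(listings)
print(round(correlation, 2))
```

A coefficient near zero or below zero would be consistent with the bullets above. A scatter plot remains useful because a single correlation value can hide a non-linear pattern.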

Most used words in reviews

Here you can see the words customers used most when reviewing a listing. I used the wordcloud Python library to plot these words.

Words and phrases like “great host”, “definitely recommended”, “comfortable”, and “everything needed” were used most by customers. These reviews and comments play a big role in attracting the attention of travelers.
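Phrases such as “great host” are two-word combinations, so a simple way to see which ones dominate is to count word pairs. The sketch below uses only the standard library. The comments are invented, and the helper name bigrams is hypothetical.

```python
import re
from collections import Counter

# Invented review comments for illustration.
comments = [
    "Great host and a comfortable room.",
    "Comfortable bed, great host!",
    "Would definitely recommend. Great location.",
]


def bigrams(text):
    """Return consecutive word pairs from lowercased text.

    Punctuation is ignored, so pairs can span sentence boundaries.
    """
    words = re.findall(r"[a-z']+", text.lower())
    return [" ".join(pair) for pair in zip(words, words[1:])]


# Count every two-word phrase across all comments.
phrase_counts = Counter(
    phrase for comment in comments for phrase in bigrams(comment)
)

print(phrase_counts.most_common(3))
```

Counts in this form can be handed to the wordcloud package through WordCloud().generate_from_frequencies(phrase_counts), assuming that package is installed.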

Conclusion

In this post, we analyzed Seattle AirBnB listings data from January 2016 to January 2017.

  1. We gathered some of the factors that influence the prices of listings, which showed that the number of rooms, the neighborhood, and the type of listing play a major role.
  2. We then looked at the seasonal patterns in prices. This showed that prices of listings go up during weekends and especially in July and August.
  3. Finally, we looked at the relationship between customer reviews and prices. This analysis showed that the number of reviews does not increase with price, but these reviews and comments play a big role in attracting the attention of travelers.

This analysis has a lot of scope for extension, including:

  • What other factors boost the popularity of the listings?
  • Time series forecasting of the prices
  • Host analysis and recommending prices to the owners
  • Recommending better places for users to invest in to obtain maximum revenue

To learn more about this analysis, see the link to my GitHub.
