Data driven insights of Seattle AirBnB listings
AirBnB is an online marketplace for lodging, primarily bed and breakfast (B&B) style stays. The company does not own any of the listings on the platform; it acts as a broker and receives a commission from each booking. Founded in 2008, the company is based in San Francisco, California, US.
The company was conceived after its founders put an air mattress in their living room, effectively turning their apartment into a bed and breakfast, to offset the high cost of rent in San Francisco. AirBnB is a shortened version of its original name, AirBedandBreakfast.com.
AirBnB’s market share has risen dramatically since 2010. 2019 statistics estimate that AirBnB now accounts for up to 20% of the vacation rental industry as a whole. The variety of listings on AirBnB, from an apartment in a metropolitan city to a getaway in the redwoods, makes it a major competitor in the industry.
Fascinated with AirBnB’s growing usage, in 2016 Murray Cox, an independent digital storyteller, community activist and technologist, started an investigatory website named Inside AirBnB, which reports and visualizes listings data scraped from AirBnB.
In this post, we will analyze the AirBnB listings in Seattle. The data can be downloaded from Kaggle. It is only a small part of the Inside AirBnB data. The original data can be found here.
There are three files in this dataset.
- listings.csv — includes full descriptions and average review score
- calendar.csv — includes listing id and the price and availability for that day
- reviews.csv — includes unique id for each reviewer and detailed comments
We’ll be using all three files individually for analysis.
The aim of this project is to perform analyses on AirBnB data and answer the following questions which I constructed after initial assessment of the data:
- “What factors highly influence the prices of listings in Seattle?”
- “What is the seasonal pattern of prices?”
- “What is the relationship of reviews with price?”
To learn more about this analysis, see the link to my Github.
Part 1: Factors that Influence the prices of listings
After looking at the data and the prices of each listing, I was interested in finding out the parameters in the dataset that have an effect on the prices of the listings. For this part, listings.csv data is used.
1. How does property type affect prices?
Here you can see the chart that shows the frequency of property types in the listings.
It can be seen that people are more inclined to list their entire property than to offer private or shared rooms. Property type also plays an important role. Not surprisingly, apartments and houses make up an overwhelming majority of all listings, although we do see a few condominiums and townhouses.
Below is a chart showing prices of listings broken down by property type. This gives us a much better understanding of the price breakdown in Seattle based on property and room types.
For almost all property types, prices are highest for the Entire home/apartment room type.
2. Neighborhood and number of rooms
In this step, we’ll look at listing prices by number of bedrooms and by the neighborhood each listing is located in. I plotted a heatmap that shows listing prices for each combination of neighborhood and number of bedrooms.
From the heatmap, it can be observed that the price of a listing increases with the number of bedrooms. The size of the increase depends on the neighborhood as well.
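The grid behind a heatmap like this can be sketched as a mean price per (neighborhood, number of bedrooms) pair. This is a minimal sketch using only the Python standard library; the original analysis may well have used a different approach. The rows, field order, and the helper name `price_grid` are assumptions made for illustration, and the prices are made up rather than taken from listings.csv.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical sample rows: (neighborhood, bedrooms, nightly price in USD).
# These values are illustrative only, not taken from listings.csv.
listings = [
    ("Ballard", 1, 90.0),
    ("Ballard", 1, 110.0),
    ("Ballard", 2, 150.0),
    ("Queen Anne", 1, 120.0),
    ("Queen Anne", 2, 200.0),
    ("Queen Anne", 2, 220.0),
]


def price_grid(rows):
    """Return {neighborhood: {bedrooms: mean price}}, the data behind a heatmap."""
    # Collect all prices that fall into the same heatmap cell.
    cells = defaultdict(list)
    for neighborhood, bedrooms, price in rows:
        cells[(neighborhood, bedrooms)].append(price)

    # Average each cell and nest the result by neighborhood.
    grid = defaultdict(dict)
    for (neighborhood, bedrooms), prices in cells.items():
        grid[neighborhood][bedrooms] = mean(prices)
    return dict(grid)


grid = price_grid(listings)
for neighborhood in sorted(grid):
    print(neighborhood, grid[neighborhood])
```

Each inner dictionary corresponds to one row of the heatmap, so reading across a row shows how price changes with bedroom count within a single neighborhood.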
Depending on what time of the year it is, these prices vary.
Part 2: Seasonal Pattern Analysis of Prices
In this part we’ll be looking at variation of prices at different times of the year and what time is better for a traveler to visit Seattle. For this part, I used calendar.csv data. The data in calendar.csv looks as below:
   listing_id        date  available   price
0      241032  2016-01-04          t  $85.00
1      241032  2016-01-05          t  $85.00
2      241032  2016-01-06          f     NaN
3      241032  2016-01-07          f     NaN
4      241032  2016-01-08          f     NaN
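The price column is stored as text such as $85.00 and is empty on days when a listing is unavailable, so it has to be converted to a number before any averaging. Below is a minimal sketch of that cleaning step followed by a per-month mean, using only the standard library. The helper names `parse_price` and `monthly_average` are hypothetical, the comma handling is a precaution for larger amounts, and the July rows are made up for illustration.

```python
from collections import defaultdict
from datetime import date
from statistics import mean

# Rows in the shape of calendar.csv: (listing_id, date, available, price).
# The January rows mirror the sample above; the July rows are hypothetical.
rows = [
    ("241032", "2016-01-04", "t", "$85.00"),
    ("241032", "2016-01-05", "t", "$85.00"),
    ("241032", "2016-01-06", "f", ""),  # unavailable days carry no price
    ("241032", "2016-07-02", "t", "$115.00"),
    ("241032", "2016-07-03", "t", "$125.00"),
]


def parse_price(text):
    """Turn '$85.00' or '$1,250.00' into a float; return None for empty text."""
    cleaned = text.replace("$", "").replace(",", "").strip()
    return float(cleaned) if cleaned else None


def monthly_average(rows):
    """Return {month number: mean price}, skipping rows without a price."""
    by_month = defaultdict(list)
    for _listing_id, day, _available, price_text in rows:
        price = parse_price(price_text)
        if price is None:
            continue
        by_month[date.fromisoformat(day).month].append(price)
    return {month: mean(prices) for month, prices in sorted(by_month.items())}


monthly = monthly_average(rows)
print(monthly)
```

Skipping the unpriced rows, rather than treating them as zero, keeps unavailable days from dragging the monthly average down.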
To get a better understanding of how prices vary during a year, I plotted a bar graph of average prices for each month.
We can clearly see that prices were highest during the summer months of July and August. To analyze the data further, I extracted the day of the week from each date and checked whether that day was a holiday and, if so, which one. To get the holiday data, I used the ‘holidays’ Python library.
Next I calculated the average price for each day in a week and it can be seen in the following plot.
It can be seen that prices are noticeably higher on weekends than on weekdays.
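The day-of-week averages can be sketched as follows, again with only the standard library. The dates are real 2016 calendar days, but the prices attached to them are hypothetical and only illustrate the shape of the computation; `average_by_weekday` is a name chosen for this example.

```python
from collections import defaultdict
from datetime import date
from statistics import mean

# date.weekday() returns 0 for Monday through 6 for Sunday.
WEEKDAYS = ["Monday", "Tuesday", "Wednesday", "Thursday",
            "Friday", "Saturday", "Sunday"]

# Hypothetical (date, price) pairs; the prices are illustrative only.
daily_prices = [
    ("2016-01-04", 85.0),   # Monday
    ("2016-01-05", 85.0),   # Tuesday
    ("2016-01-08", 95.0),   # Friday
    ("2016-01-09", 99.0),   # Saturday
    ("2016-01-11", 87.0),   # Monday
    ("2016-01-16", 101.0),  # Saturday
]


def average_by_weekday(pairs):
    """Return {weekday name: mean price} for the weekdays present in pairs."""
    by_day = defaultdict(list)
    for day, price in pairs:
        name = WEEKDAYS[date.fromisoformat(day).weekday()]
        by_day[name].append(price)
    return {name: mean(prices) for name, prices in by_day.items()}


weekday_avg = average_by_weekday(daily_prices)
for name in WEEKDAYS:
    if name in weekday_avg:
        print(name, weekday_avg[name])
```

Looking the name up from a fixed list, instead of formatting the date, keeps the labels in English regardless of the machine's locale.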
Now let's dig into July 2016 and August 2016 data to find the reason behind increase in average prices. I plotted the list of holidays and the average prices for listings in descending order.
From the graph above, it can be observed that prices were high in July because of US Independence Day (July 4th): the holiday created a long weekend that month, which pushed prices up.
Hence, the price of a listing is higher on weekends than on weekdays.
Part 3: Relationship between customer reviews and prices
Many factors contribute to the price of a listing on AirBnB. We already have a few conclusions about how price depends on various attributes of a listing. Now let’s analyze whether the price of a listing depends on its number of reviews and, if such a relationship exists, how it affects prices.
For this part, I used reviews.csv data. As a preprocessing step, I removed the comments that are not in English, because I used the built-in analyzer in the NLTK Python library to assign a polarity score to each comment, and this analyzer doesn’t interpret languages other than English.
After calculating polarity and plotting it, I found that most of the comments have a negative polarity score of 0, i.e. most of the comments are either neutral or positive.
Number of reviews versus Price
Out of curiosity, I explored whether there is any relationship between the number of reviews for a listing and its price.
- From the graph, most reviews belong to listings priced at around $100–300. The number of reviews quickly declines as the price goes up.
- An expensive listing does not necessarily attract more reviews. Hence, a higher price does not go hand in hand with a higher number of reviews.
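One way to put a number on an observation like this is a Pearson correlation coefficient between price and review count. This is a swapped-in technique for illustration; the conclusions above were read off a plot, not computed this way. The sample values are hypothetical, and `pearson` is a small helper written for this sketch.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient of two equally long number sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Covariance term and the two spread terms (the shared 1/n factors cancel).
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = sqrt(sum((x - mean_x) ** 2 for x in xs))
    spread_y = sqrt(sum((y - mean_y) ** 2 for y in ys))
    return cov / (spread_x * spread_y)


# Hypothetical listings: nightly price in USD and number of reviews.
prices = [80, 120, 150, 250, 400, 600]
review_counts = [40, 95, 60, 30, 8, 3]

r = pearson(prices, review_counts)
print(f"correlation between price and review count: {r:.2f}")
```

A value near +1 would mean pricier listings collect more reviews, a value near -1 the opposite, and a value near 0 no linear relationship. Pearson correlation only captures linear trends, so it complements the plot and does not replace it.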
Most used words in reviews
Here you can see the words customers used most when reviewing a listing. I used the wordcloud Python library to plot these words.
Words and phrases like “great host”, “definitely recommended”, “comfortable”, and “everything needed” were used most by customers. These reviews and comments play a big role in attracting the attention of travelers.
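A word cloud is essentially a picture of phrase frequencies, so the counts behind it can be sketched with the standard library alone. The comments below are invented for illustration and are not taken from reviews.csv; `top_phrases` is a hypothetical helper, and its simple tokenization is an assumption, not the wordcloud library's own logic.

```python
import re
from collections import Counter

# Hypothetical review comments, written for this example.
comments = [
    "Great host and a very comfortable room.",
    "Great host! Everything needed was provided.",
    "Comfortable stay, definitely recommended.",
]


def top_phrases(texts, n=2, top=3):
    """Return the `top` most frequent n-word phrases across all texts."""
    counts = Counter()
    for text in texts:
        # Lowercase and keep only runs of letters and apostrophes.
        words = re.findall(r"[a-z']+", text.lower())
        # Slide a window of n words over the comment. Punctuation is dropped,
        # so a phrase may span a sentence boundary within one comment.
        counts.update(
            " ".join(words[i:i + n]) for i in range(len(words) - n + 1)
        )
    return counts.most_common(top)


print(top_phrases(comments, n=2, top=1))
```

Setting `n=1` counts single words instead of two-word phrases, which is closer to what a basic word cloud displays.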
In this post, we took a look at analysis of Seattle AirBnB listings data from January 2016 to January 2017.
- We gathered some of the factors that influence the prices of listings, which showed that number of rooms, neighborhood locality, and type of listing played a major role.
- We then looked at the seasonal patterns in the prices. This showed that listing prices go up on weekends and especially in July and August.
- Finally, we looked at the relationship between customer reviews and prices. This analysis showed that the number of reviews is not related to price, but the reviews and comments themselves play a big role in attracting the attention of travelers.
This analysis has a lot of scope for extension including:
- What other factors boost the popularity of the listings?
- Time series forecasting of the prices.
- Host analysis and recommending prices to the owners
- Recommending better places for users to invest in to obtain maximum revenue
To learn more about this analysis, see the link to my Github.