Analyzing Seattle Airbnb Data to find hidden traits

Helpful tips for Airbnb hosts and guests, based on a regression analysis of Airbnb Seattle data from 2016.

Shreyas Matade
May 10, 2019
https://www.shutterstock.com/image-photo/seattle-skyline-night-172981529

Introduction:

I have been living in the United States since 2011. I like to travel, and the one thing that holds me back is the cost of hotels and flights. On top of that, even if you find a decent hotel at a moderate price, you miss some basic amenities such as a kitchen, laundry or cable TV (or you have to pay an excessive amount of money to get them).

Airbnb has definitely changed the hotel industry and the way people travel. Now I am more inclined to drive to my holiday destination and rent an Airbnb, getting all the comforts of home while saving some money :) Whenever I plan a vacation, I consider several things according to my needs: the type of property (single room, house/apt), its distance from the attractions I am visiting, and most importantly its price and reviews.

In this post, I analyse Airbnb Seattle data from 2016 to help renters (guests) and home owners (hosts) get the most out of their Airbnb experience. I look for answers to the following questions:

  1. What are the major factors in getting a high review rating?
  2. What drives the listing price?
  3. What causes a property to go unrented?
  4. What factors help in becoming a superhost?

I am trying to satisfy my curiosity by looking at Airbnb Seattle data from 2016. It might not be entirely relevant today because of the time gap, but it should give you some ideas and tips if you are a host or a guest trying to make the most of your Airbnb experience. So let's get started…

Part I-Review Rating

Intro: For hosts, getting high reviews definitely boosts overall revenue, as better reviews attract more guests. When I analysed the data, I found some interesting points that can help hosts gain a better review rating.

For review rating, after cleaning up the data, I trained a linear regression model with review rating as the target variable. This model gave me a coefficient for each feature. From the model, it looked like zip code, property type, review score value and bed type have the largest effect on the review rating.

It looks like zip code 98122 has the highest mean review rating. Also, hosts with a Yurt, Dorm or Cabin get better ratings than those with a tree house or boat. Review score value has a strictly linear relation with review rating, which is expected. The most interesting finding is bed type:

Bed Type vs Mean Review Rating

As the chart above shows, properties with an Airbed or Couch get significantly lower ratings than those with conventional bed types.
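A chart like this boils down to a group-by mean. Here is a minimal sketch of that computation in plain Python; it is not the code from my repo, and the rows and the field names `bed_type` and `review_scores_rating` are made-up placeholders rather than values from the Seattle dataset.

```python
from collections import defaultdict

# Hypothetical rows for illustration only; not taken from the real dataset.
listings = [
    {"bed_type": "Real Bed", "review_scores_rating": 96.0},
    {"bed_type": "Real Bed", "review_scores_rating": 94.0},
    {"bed_type": "Airbed", "review_scores_rating": 88.0},
    {"bed_type": "Couch", "review_scores_rating": 90.0},
    {"bed_type": "Airbed", "review_scores_rating": None},  # missing rating
]


def mean_by_group(rows, group_key, value_key):
    """Return {group: mean(value)}, skipping rows with a missing value."""
    totals = defaultdict(float)
    counts = defaultdict(int)
    for row in rows:
        value = row[value_key]
        if value is None:
            continue
        totals[row[group_key]] += value
        counts[row[group_key]] += 1
    return {group: totals[group] / counts[group] for group in totals}


mean_rating_by_bed = mean_by_group(listings, "bed_type", "review_scores_rating")
# Print from lowest to highest mean rating.
for bed_type, rating in sorted(mean_rating_by_bed.items(), key=lambda kv: kv[1]):
    print(f"{bed_type}: {rating:.1f}")
```

The same helper would work for any categorical column, for example a zip code or property type field, if the data is loaded as a list of dictionaries.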

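The regression model's score was ~68%. Assuming this is the R² score (coefficient of determination) rather than an RMSE, it means, in layman's terms, that the model explains about 68% of the variation in review rating, which is not quite the same as predicting with 68% accuracy. As per the model,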

Properties in zip 98122 tend to get better review ratings. One possible inference is that guests have enjoyed their stay in this area.

Zip code 98122

Airbed, Couch and Futon bed types can lead to poor ratings. Dorm rooms normally get better ratings; by contrast, Boats and Chalets tend to get lower ratings. A clean property is also essential for a better rating.
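A note on the model score quoted above: a percentage score is more naturally read as R² (share of variance explained) than as RMSE (an error measured in rating points). The sketch below contrasts the two on made-up numbers. It assumes the score is R², which is the default regression score in many libraries, and the ratings are illustrative only.

```python
import math

# Illustrative actual vs. predicted review ratings (made-up values).
actual = [90.0, 95.0, 100.0, 85.0, 80.0]
predicted = [92.0, 94.0, 97.0, 88.0, 84.0]


def r_squared(y_true, y_pred):
    """Share of variance in y_true explained by y_pred (1.0 is a perfect fit)."""
    mean_true = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


def rmse(y_true, y_pred):
    """Typical prediction error, in the same units as the rating itself."""
    mse = sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)
    return math.sqrt(mse)


r2 = r_squared(actual, predicted)
err = rmse(actual, predicted)
print(f"R^2 = {r2:.3f}, RMSE = {err:.2f} rating points")
```

R² is unitless and at most 1, while RMSE is in rating points and has no fixed upper bound, so only the former reads sensibly as a percentage.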

Part II-Super-Host

Airbnb has laid out guidelines for becoming a superhost.
Airbnb requirements for superhost:

  • Completed at least 10 trips OR successfully completed 3 reservations that total at least 100 nights
  • Maintained a 50% review rate or higher
  • Maintained a 90% response rate or higher
  • Zero cancellations, with exceptions made for those that fall under our Extenuating Circumstances policy
  • Maintain a 4.8 overall rating
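These requirements can be read as a set of threshold checks. A minimal sketch, assuming a hypothetical `HostStats` record whose field names are my own invention (not Airbnb's API or the dataset's columns), and treating "a 4.8 overall rating" as "4.8 or higher":

```python
from dataclasses import dataclass


@dataclass
class HostStats:
    """Hypothetical container for one host's stats; field names are assumptions."""
    trips: int
    reservations: int
    nights: int
    review_rate: float      # share of stays that were reviewed, 0.0 to 1.0
    response_rate: float    # 0.0 to 1.0
    cancellations: int      # excluding extenuating circumstances
    overall_rating: float   # on a 5-point scale


def meets_superhost_criteria(host: HostStats) -> bool:
    """Apply the five listed requirements as simple threshold checks."""
    enough_stays = host.trips >= 10 or (
        host.reservations >= 3 and host.nights >= 100
    )
    return (
        enough_stays
        and host.review_rate >= 0.50
        and host.response_rate >= 0.90
        and host.cancellations == 0
        and host.overall_rating >= 4.8  # assumption: 4.8 is a minimum
    )


experienced = HostStats(12, 12, 40, 0.8, 0.95, 0, 4.9)
slow_responder = HostStats(12, 12, 40, 0.8, 0.70, 0, 4.9)
print(meets_superhost_criteria(experienced))     # True
print(meets_superhost_criteria(slow_responder))  # False
```

Because every condition is a hard threshold, a host who misses just one of them (like the slow responder here) does not qualify, however strong the other numbers are.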

However, I wanted to explore any hidden trends beyond the factors above by analysing the Airbnb data. Unfortunately, after analyzing the data, I could not find many statistically significant factors that reveal hidden traits.

Facilities and host experience have a positive correlation with superhost status: the higher the value, the better the chance of getting the superhost title, which is in line with Airbnb's requirements. As for number of reviews, superhosts have a mean of 42 and non-superhosts a mean of 17. No surprises here either.

The logistic regression model's score was 77%. As expected, a higher number of reviews, host acceptance rate, host response rate and host experience yield better chances of getting superhost status, and these parameters are closely tied to Airbnb's superhost requirements. As the distributions below show, in line with the Airbnb conditions, the mean number of reviews for superhosts is significantly higher than that of non-superhosts.

Number of reviews distribution plot for Super and Non-Super Hosts

Interestingly, apart from the expected factors, my logistic regression model shows that the more facilities you have at your property, the better your chances of receiving superhost status.
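To make the facilities finding concrete: in a logistic regression, a positive coefficient means each extra unit of the feature multiplies the odds of the positive class by exp(coefficient). The sketch below uses a made-up intercept and coefficient, not the fitted values from this analysis, purely to show the mechanics.

```python
import math


def superhost_probability(n_facilities, intercept=-2.0, coef=0.05):
    """Logistic model: P = 1 / (1 + exp(-(intercept + coef * n_facilities))).

    The default intercept and coefficient are hypothetical placeholders.
    """
    log_odds = intercept + coef * n_facilities
    return 1.0 / (1.0 + math.exp(-log_odds))


# With a positive coefficient, probability rises with the facility count.
for n in (5, 15, 30):
    print(f"{n} facilities -> P(superhost) = {superhost_probability(n):.3f}")

# Each extra facility multiplies the odds by exp(coef), whatever the start point.
odds_ratio = math.exp(0.05)
print(f"odds multiplier per extra facility: {odds_ratio:.3f}")
```

The odds multiplier is constant, but the change in probability is not: it is largest when the probability is near 50% and flattens out near 0% or 100%.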

Part III-Property Listing Prices

For most guests, this might be the deciding factor before renting a property. To find out which factors affect listing price, I ran a regression model and got some interesting results. I also looked at the seasonal effect on listing price using the Airbnb calendar data.

The regression model scores ~62%, and from the regression coefficients it was clear that property type, room type and location (zip code) dictate the price more than any other factor. Boats seem to have the highest mean listing price. Dormitories seem to be cheaper, along with Shared rooms. Also, zip codes 98177 and 98178 indicate lower prices, and properties in zip code 98134 normally cost more.

Cheaper area in Seattle for Airbnb rentals.
Costlier area in Seattle for Airbnb rentals.

Location effect on list price:

The graph below shows how mean listing prices differed by area in Seattle in 2016. The mean list price in area 98134 was significantly higher than in other areas. This area is close to the shore and includes an island, and it also has lots of boats, which might have raised the mean since boats are priced higher than other property types. By contrast, areas 98125, 98146, 98177 and 98178 have significantly lower mean listing prices.

Mean Listing price by Zipcodes

Seasonal effect on list price:

In 2016, Airbnb prices were at their lowest in January and February. They then seem to increase gradually, reaching a mid-year peak in June. After June we see a slight drop in prices until November. December seems to have the highest mean price.

Mean Listing Price by Month

Along with location, property type and room type, listing prices are also affected by the month of the year. Guests can certainly save some money if they plan their travel for January or February.
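The monthly view comes from averaging the calendar prices per month. A rough sketch of that step in plain Python follows. The rows, the column names `date` and `price`, and the "$1,150.00" price format are assumptions for illustration; the real calendar file may differ.

```python
from collections import defaultdict
from datetime import date

# Hypothetical calendar rows; the real file's columns and formats may differ.
calendar_rows = [
    {"date": "2016-01-04", "price": "$85.00"},
    {"date": "2016-01-20", "price": "$95.00"},
    {"date": "2016-06-11", "price": "$1,150.00"},
    {"date": "2016-06-12", "price": "$150.00"},
    {"date": "2016-12-24", "price": ""},  # no price listed for this day
]


def parse_price(text):
    """Turn '$1,150.00' into 1150.0; return None for blank prices."""
    cleaned = text.replace("$", "").replace(",", "").strip()
    return float(cleaned) if cleaned else None


def mean_price_by_month(rows):
    """Average the listed price per calendar month (1 = January)."""
    prices = defaultdict(list)
    for row in rows:
        price = parse_price(row["price"])
        if price is not None:
            month = date.fromisoformat(row["date"]).month
            prices[month].append(price)
    return {month: sum(vals) / len(vals) for month, vals in sorted(prices.items())}


monthly = mean_price_by_month(calendar_rows)
cheapest = min(monthly, key=monthly.get)
print(monthly, "cheapest month:", cheapest)
```

Days without a listed price are skipped rather than counted as zero, so they do not drag the monthly mean down.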

Part IV-Lost Days

Lost days, for hosts, are the number of days a property was not rented by anyone. Hosts should try to minimize lost days in order to maximize their Airbnb revenue. The fewer the lost days, the higher the revenue!
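One way lost days could be counted from per-day calendar data is sketched below. This is not necessarily how my notebook does it; the rows and the column names `listing_id` and `available` are hypothetical, and it assumes that `available == "t"` marks a day that was open but not booked.

```python
from collections import Counter

# Hypothetical calendar rows: one row per listing per day.
# Assumption: available == "t" means the day was open but nobody booked it.
calendar_rows = [
    {"listing_id": 101, "date": "2016-03-01", "available": "t"},
    {"listing_id": 101, "date": "2016-03-02", "available": "f"},
    {"listing_id": 101, "date": "2016-03-03", "available": "t"},
    {"listing_id": 202, "date": "2016-03-01", "available": "f"},
    {"listing_id": 202, "date": "2016-03-02", "available": "f"},
]


def lost_days_per_listing(rows):
    """Count, per listing, the days that stayed available (i.e. not rented)."""
    lost = Counter()
    for row in rows:
        # Adding 0 still registers the listing, so fully booked ones show up.
        lost[row["listing_id"]] += 1 if row["available"] == "t" else 0
    return dict(lost)


lost = lost_days_per_listing(calendar_rows)
print(lost)
```

One caveat with this definition: an unavailable day could also be one the host blocked for personal use, so a count like this is only a proxy for days actually rented.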

I tried to find which factors cause lost days to increase. Unfortunately, not many variables were statistically significant; however, I was able to find one variable that can help in reducing lost days: property type.

Dormitory and chalet property types show significantly fewer lost days than other types in 2016.

Lost days by property type

Prediction: The predictive model performs really poorly for lost days, giving a score of ~11%, which makes any kind of prediction hard.

Even though the predictive model is poor, the data suggest that Dormitories and Chalets had fewer lost days in 2016. So hosts could consider renting out Dorms or Chalets to achieve more revenue.

Conclusion:

After analyzing the data, I believe I found the answers I was looking for:

Review rating varies significantly by location, which was interesting to find out. Among property types, dormitories normally get better ratings. Bed type certainly affects review rating.

For superhost status, interestingly, the higher the number of facilities (the total number of facilities provided in the property), the greater the chance of getting superhost status. However, the results were mostly in line with Airbnb's guidelines for achieving superhost.

For lost days, dormitories and chalets have the fewest lost days, so hosts may earn more revenue by renting out these types of property.

For listing price, area, property type, room type and month of the year are the major factors that decide the list price.

In conclusion, since I analysed Airbnb data from 2016, it might not be entirely relevant to the current year. Having said that, I identified a few traits that might be helpful for hosts as well as guests, and which I believe will remain significant irrespective of timelines. Below are some tips to maximize the Airbnb experience:

Tips for hosts:

  1. For better review ratings, avoid Futons, Airbeds or Couches in your property. Conventional beds are recommended.
  2. The area of the property is important, so if you are buying property for Airbnb, consider zip code 98134 and avoid zip codes 98125, 98177, 98178 and 98146.
  3. A Yurt might not be a renter's first choice and might lead to more lost days. By contrast, Dorms and Chalets get rented a lot.
  4. Boats tend to get higher listing prices and poorer ratings, so if you have a boat, there is an opportunity to improve ratings and earn more revenue.

Tips for guests:

  1. To save money, it is better to travel in January or February, when prices are low.
  2. Zip 98122 has more satisfied guests than any other area.
  3. Zip codes 98125, 98177, 98178 and 98146 may have properties priced below average.

References:

Git Repo : https://github.com/shreyasmatade/airbnb-seattle

https://www.thestreet.com/lifestyle/travel/how-does-airbnb-work-14714337
