Using data to understand the market for AirBnB rentals in Seattle

I looked at AirBnB data to see how homeowners could offer a great guest experience and maximize their revenues

Syuen Loh
Udacity Inc
Sep 10, 2018


Image credit: allycatfoxtrot

Introduction

Before AirBnB, it would have been a nerve-wracking prospect to let strangers stay in your home. AirBnB has changed this, with a mission that “connects people with places to stay and things to do around the world.”

The company has transformed the relationship between the homeowner and the renter. Most of us are familiar with the experience as guests, renting homes to stay in on AirBnB. But I was interested in the perspective of a homeowner.

AirBnB holds out the prospect of earning money by renting out your home on the platform. But homeowners, especially those renting out their homes for the first time, may have many questions: What price should I set for my home? Can I trust guests with my home? How can I ensure I get a good rating?

This dataset of Seattle listings, generously provided by AirBnB on Kaggle, has the potential to shed some light on these questions. It covers 3,818 AirBnB listings in Seattle and provides information on home features, review scores, and availability throughout 2016 and the very first days of 2017. Because we have only two days’ worth of data for 2017, this analysis focuses on the 2016 data.

Ultimately, as a Seattle homeowner considering AirBnB, my main objective would be to offer guests a great experience while maximizing revenue. To this end, I dove into the data.

Part I: What could be the main factors driving higher ratings?

The review scoring system is the main way in which guests can leave feedback on their hosts, and vice versa. As a Seattle homeowner, I would therefore want to know what drives guests to leave a high rating. A high rating also creates a positive feedback loop: highly rated homes tend to attract more guests, which keeps the home generating rentals.

To identify the drivers of higher ratings, I used a multiple linear regression model. The math behind this model is beyond the scope of this article, but the general idea is to measure how strongly each feature of a home tends to move together with (in math terms, is correlated with) that home’s rating.

If the relationship is a positive correlation, then the more of a feature a home has, the higher its rating tends to be. The model also assigns each feature a ‘weight’ that reflects its effect on the rating: the larger a feature’s weight, the bigger its effect on the overall rating.
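The article does not show its modelling code, so the sketch below only illustrates what a multiple linear regression does: given a few numeric features per home, it finds an intercept and one weight per feature that minimise the squared gap between predicted and actual ratings. It uses plain Python with a hand-written solver so it runs without extra packages; the function name is hypothetical and the feature values and ratings are invented toy numbers, not AirBnB data. In practice a library such as scikit-learn’s LinearRegression would normally do this job.

```python
# Minimal multiple linear regression (ordinary least squares) using only the
# Python standard library. It solves the normal equations (X^T X) w = X^T y
# by Gauss-Jordan elimination. Illustrative sketch, not the author's code.

def fit_linear_regression(rows, targets):
    """Return [intercept, w1, w2, ...] that minimise the squared error.

    rows: one list of numeric features per home.
    targets: the rating of each home, in the same order.
    """
    # Prepend a constant 1.0 so the first weight acts as the intercept.
    X = [[1.0] + [float(v) for v in row] for row in rows]
    n = len(X[0])

    # Build the augmented matrix [X^T X | X^T y].
    A = [[0.0] * (n + 1) for _ in range(n)]
    for x, y in zip(X, targets):
        for i in range(n):
            for j in range(n):
                A[i][j] += x[i] * x[j]
            A[i][n] += x[i] * y

    # Gauss-Jordan elimination with partial pivoting.
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(A[r][col]))
        if abs(A[pivot][col]) < 1e-12:
            raise ValueError("features are linearly dependent")
        A[col], A[pivot] = A[pivot], A[col]
        for r in range(n):
            if r != col:
                factor = A[r][col] / A[col][col]
                for c in range(col, n + 1):
                    A[r][c] -= factor * A[col][c]

    # A is now diagonal, so each weight is its right-hand side over its pivot.
    return [A[i][n] / A[i][i] for i in range(n)]


# Made-up toy data: two hypothetical features per home (say, a host's listing
# count and a value score) with ratings built as 80 + 2*x1 + 0.5*x2.
rows = [[1, 10], [2, 8], [3, 9], [4, 5], [5, 7]]
ratings = [87.0, 88.0, 90.5, 90.5, 93.5]
weights = fit_linear_regression(rows, ratings)
print([round(w, 6) for w in weights])  # approximately [80.0, 2.0, 0.5]
```

Because the toy ratings were generated from an exact linear rule, the fit recovers that rule; real ratings are noisier, which is where the strength of the relationship comes in.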

Ultimately, I found that the correlation was only slightly positive. In other words, the features in the data explain only part of what leads to a higher rating, so I cannot say with certainty that they are the main drivers. The relationship between these factors and higher ratings is thus inconclusive.
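The article does not say which statistic it used to judge the strength of the relationship. One common choice for a regression is the coefficient of determination, R², sketched below under that assumption (the function name is hypothetical and the numbers are made up). For an ordinary least squares model with an intercept, R² equals the square of the correlation between predicted and actual ratings, so a weak correlation shows up as a small R².

```python
def r_squared(actual, predicted):
    """Coefficient of determination: the share of the variance in the actual
    ratings that the predictions explain (1.0 = perfect, 0.0 = no better than
    always predicting the mean)."""
    mean_y = sum(actual) / len(actual)
    ss_res = sum((y - p) ** 2 for y, p in zip(actual, predicted))
    ss_tot = sum((y - mean_y) ** 2 for y in actual)
    if ss_tot == 0:
        raise ValueError("actual ratings have no variance")
    return 1.0 - ss_res / ss_tot


# Made-up ratings and model predictions, for illustration only.
actual = [90, 95, 85, 100]
predicted = [92, 93, 88, 96]
print(round(r_squared(actual, predicted), 3))  # 0.736
```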

With that caveat in mind, the top five factors by model weight (the ‘strongest’ factors, meaning those with the highest tendency to lead to higher ratings) are as follows:

  1. Count of host listings: the number of listings a host has
  2. Value review score: did guests feel the listing provided good value for the price?
  3. Cleanliness review score: did guests feel the space was clean and tidy?
  4. Accuracy review score: how accurately did the listing page represent the space?
  5. Communication review score: how well did the host communicate with guests before and during their stay?
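Ranking factors by weight amounts to sorting the fitted coefficients. A small hypothetical helper, assuming the weights are held in a dict keyed by feature name; the names and values in the example are placeholders, not the article’s results. One caution: raw weights are only directly comparable when features are on similar scales (for example after standardising each feature to zero mean and unit variance), since a count of listings and a review score are measured in different units.

```python
def top_features(weights, k=5):
    """Return the names of the k features with the largest (most positive)
    weights, i.e. those most associated with higher ratings in the model."""
    ranked = sorted(weights.items(), key=lambda item: item[1], reverse=True)
    return [name for name, _ in ranked[:k]]


# Hypothetical weights with placeholder names, not the article's results.
example = {"feature_a": 0.4, "feature_b": -1.2, "feature_c": 0.9}
print(top_features(example, k=2))  # ['feature_c', 'feature_a']
```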

The highest-weighted feature, the count of host listings, could be a proxy for credible and experienced hosts. It is therefore unsurprising that more experienced hosts, who tend to know better what guests appreciate in a home, receive higher ratings.

The other four features are the category-level review scores that guests can give on AirBnB in addition to an overall rating. It is no surprise that the model finds guests appreciate good value for money during their stay.

In practice, if previous guests have been giving low value scores, one possible action is to reduce the price, since low scores suggest the home might not be worth the price set. Another option is to add amenities or extra services that increase guests’ perception of the value the home offers.

A good rating is also correlated with high cleanliness and accuracy scores. As a host, I would therefore make sure my home is clean and that its pictures and description are truthful and reflect the home accurately.

Lastly, good communication also carries a high weight. A Seattle homeowner should therefore be responsive to guests, as this contributes to guests rating their overall experience highly.

It is worth noting that these weights come from the same model that finds the overall correlation to be weak, even if positive. Still, even though the relationship is not conclusive, the factors above are actionable good practices that a prospective Seattle AirBnB host should adopt; neglecting them may hurt their overall rating.

Part II: When are the most popular times of the year for rentals of Seattle homes?

For a Seattle homeowner considering renting out a home on AirBnB, it would be useful to know the most popular times of the year for rentals. This would help them time any preparation of the home for the peak season and schedule maintenance or upkeep work for less popular months.

Using the data, I generated the chart below, which shows the number of Seattle listings that were unavailable on each day throughout 2016 and the first two days of 2017.

Clear spikes in the number of unavailable listings at certain times of the month in 2016

The chart shows clear spikes at certain times of the month. The highest number of unavailable listings in the year occurred in January 2016. As we only have the first two days of data from January 2017, it remains to be seen whether the pattern holds in January 2017.
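Counting unavailable listings per day is a group-and-count over the calendar data. The sketch below assumes a CSV laid out like the Kaggle Seattle calendar.csv, with listing_id, date, available (t/f) and price columns; treat that layout as an assumption to check against the actual file. The function name is hypothetical and the sample rows are made up.

```python
import csv
import io
from collections import Counter


def unavailable_per_day(calendar_file):
    """Count how many listings are marked unavailable ('f') on each date.

    calendar_file: a file-like object with columns listing_id, date,
    available and price (assumed layout).
    """
    counts = Counter()
    for row in csv.DictReader(calendar_file):
        if row["available"] == "f":
            counts[row["date"]] += 1
    # Sort by date so the result is ready to plot as a time series.
    return dict(sorted(counts.items()))


# Made-up sample rows in the assumed calendar layout.
SAMPLE = """listing_id,date,available,price
1,2016-01-04,f,
2,2016-01-04,f,
3,2016-01-04,t,$85.00
1,2016-01-05,t,$90.00
2,2016-01-05,f,
"""

daily_unavailable = unavailable_per_day(io.StringIO(SAMPLE))
print(daily_unavailable)  # {'2016-01-04': 2, '2016-01-05': 1}
```

With the real file, you would pass open("calendar.csv", newline="") instead of the in-memory sample.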

Part III: When and where are the highest revenue-generating times of the year for Seattle homeowners?

As a Seattle homeowner, my motivation for renting out my home is to earn revenue from rentals, so it is worth using the data to identify the highest revenue-generating times of the year. If the data could show me the average prices at peak times, I would be able to set competitive prices and maximize my revenue during those periods.

When analyzing the data, I assumed that if a listing is unavailable on a given date, it was rented at that date’s price, which therefore represents the market price, since a guest was willing to pay it. Conversely, if a listing is still available on a date, I took this to mean that the market did not accept its price.

This assumption follows the real estate definition of a home’s market price as the price agreed upon by a willing renter and a willing homeowner (from here).

These are simplifying assumptions, but given the motivation of the analysis, I decided they were acceptable. As a Seattle homeowner, I am mainly interested in the days of the year when guests are willing to pay higher rent for my home, so that I can list a competitive price at those times. Understanding every other factor that may push prices up at a particular time is less of a priority.
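Under the assumption above, the daily market price is the average price of listings that are unavailable (booked) on that date. A sketch using the same assumed calendar layout (listing_id, date, available, price); the function names are hypothetical and the rows are made up. Prices are assumed to be stored as strings such as $1,250.00, so they are stripped of $ and commas before averaging, and rows with a blank price are skipped in case the field is empty.

```python
import csv
import io
from collections import defaultdict


def parse_price(text):
    """Turn a price string such as '$1,250.00' into 1250.0; None if blank."""
    cleaned = text.strip().replace("$", "").replace(",", "")
    return float(cleaned) if cleaned else None


def avg_booked_price_per_day(calendar_file):
    """Average price of listings treated as booked (available == 'f') per date."""
    totals = defaultdict(float)
    counts = defaultdict(int)
    for row in csv.DictReader(calendar_file):
        if row["available"] != "f":
            continue  # only booked nights count as market prices here
        price = parse_price(row["price"])
        if price is None:
            continue  # no recorded price, nothing to average
        totals[row["date"]] += price
        counts[row["date"]] += 1
    return {day: totals[day] / counts[day] for day in sorted(totals)}


# Made-up sample rows in the assumed calendar layout.
SAMPLE = """listing_id,date,available,price
1,2016-03-01,f,$120.00
2,2016-03-01,f,"$1,000.00"
3,2016-03-01,t,$80.00
1,2016-03-02,f,$150.00
2,2016-03-02,f,
"""

daily_avg = avg_booked_price_per_day(io.StringIO(SAMPLE))
print(daily_avg)  # {'2016-03-01': 560.0, '2016-03-02': 150.0}
```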

The chart below shows the average price of all unavailable listings per day from 2016 through the first two days of 2017. Average prices peaked around March 2016 and then declined towards 2017.

The average price for unavailable listings spikes highest in March 2016 and slowly decreases towards the end of the year
Average prices for unavailable listings ranged from $138 to $140 in March 2016

Pulling out the daily average prices for March 2016, we can see that they ranged from $138 to $140. However, as we have no data for March 2017, we cannot check whether a similar range holds in the following year.
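Pulling out a single month’s range from the daily averages is a filter followed by a min and max. A minimal sketch, assuming the daily averages are held in a dict keyed by ISO date strings (YYYY-MM-DD); the function name is hypothetical and the values are invented. Asking for a month with no data, such as March 2017 in this example, raises an error rather than returning a misleading range.

```python
def month_range(daily_avg, month):
    """Return (lowest, highest) daily average for one month, e.g. '2016-03'.

    daily_avg: dict mapping ISO dates ('YYYY-MM-DD') to average prices.
    """
    values = [price for day, price in daily_avg.items() if day.startswith(month + "-")]
    if not values:
        raise ValueError(f"no data for {month}")
    return min(values), max(values)


# Invented daily averages, for illustration only.
daily_avg = {
    "2016-02-29": 100.0,
    "2016-03-01": 110.0,
    "2016-03-15": 120.0,
    "2016-04-01": 90.0,
}
print(month_range(daily_avg, "2016-03"))  # (110.0, 120.0)
```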

We would need more data to confirm whether March is a seasonal peak for AirBnB rentals in Seattle, but this is a good start for a Seattle homeowner exploring the peak and off-peak seasons for home rentals.

To identify the areas that would generate the most revenue for a Seattle homeowner, I combined the analyses from Parts II and III. The highest-priced area does not necessarily give a homeowner the most revenue, as that location may not be popular with guests. By combining prices with the popularity measure from Part II (how often listings are rented), I can identify the highest revenue-generating areas.

In 2016, the Broadway neighbourhood generated the highest revenue, based on the number of rentals × the price of a listing

After combining the two analyses, I found that in 2016 Broadway was clearly the top revenue-generating neighbourhood for Seattle homeowners, with Belltown a distant second. The other neighbourhoods generated about 60% less revenue than Broadway.
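The revenue comparison multiplies how often listings are rented by what they charge, which amounts to summing the price of every booked night in each neighbourhood. A sketch assuming two inputs: a list of (listing id, nightly price) pairs, one per booked night, and a dict mapping listing ids to neighbourhood names. The second helper expresses each neighbourhood’s revenue as a shortfall relative to the top one, which is one way to compute a figure like “about 60% less”. All function names, neighbourhood names and numbers here are hypothetical.

```python
from collections import defaultdict


def revenue_by_neighbourhood(booked_nights, neighbourhood_of):
    """Total revenue per neighbourhood, highest first.

    booked_nights: iterable of (listing_id, nightly_price), one per booked night;
    summing them equals rentals x price when a listing's price is fixed.
    neighbourhood_of: dict mapping listing_id to a neighbourhood name.
    """
    revenue = defaultdict(float)
    for listing_id, price in booked_nights:
        revenue[neighbourhood_of[listing_id]] += price
    return dict(sorted(revenue.items(), key=lambda kv: kv[1], reverse=True))


def shortfall_vs_top(revenue):
    """Fraction by which each neighbourhood's revenue falls short of the top one."""
    top = max(revenue.values())
    return {name: 1.0 - value / top for name, value in revenue.items()}


# Hypothetical listings and booked nights, not real Seattle data.
neighbourhood_of = {1: "Area A", 2: "Area A", 3: "Area B", 4: "Area C"}
booked_nights = [(1, 100.0), (1, 100.0), (2, 200.0), (3, 150.0), (4, 50.0), (4, 50.0)]

revenue = revenue_by_neighbourhood(booked_nights, neighbourhood_of)
print(revenue)                    # {'Area A': 400.0, 'Area B': 150.0, 'Area C': 100.0}
print(shortfall_vs_top(revenue))  # {'Area A': 0.0, 'Area B': 0.625, 'Area C': 0.75}
```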

An actionable next step for a prospective AirBnB host is therefore to check which neighbourhood their home is in, how popular that neighbourhood is, and what competitive prices look like for their home at that time of year.

That said, an important caveat is that the data covers only one year of listings, so the trends might be different in later years such as 2018.

Conclusion

In this article, we explored how Seattle homeowners might best position themselves to provide a great experience for guests while maximizing revenue from renting out their homes on AirBnB. To do this, we used AirBnB data on Seattle listings from 2016 through the first two days of 2017, with the following findings:

  1. We tested whether we could identify the features of homes that lead guests to give higher overall ratings. Although the features of a home are only slightly positively correlated with its overall rating, there could be merit in signalling that you are a credible and experienced host while providing good value for the price, high cleanliness, accurate descriptions and pictures of the home, and responsive communication with guests.
  2. We then explored the most popular times of the year for Seattle home rentals in 2016, with very limited data for 2017. January 2016 had the highest number of home rentals, but this number decreased rapidly as the year progressed, and the decline continued into the first two days of 2017.
  3. Finally, we looked at the neighbourhoods and times of the year where average listing prices were highest, in order to identify a homeowner’s best revenue-generating opportunities. The Broadway neighbourhood generated the highest overall revenue thanks to its combination of popularity and average price. March 2016 had the highest average listing price of the year, with no data available for March 2017.

An important caveat for this analysis is that I only had data for 2016 and the first two days of 2017. Any trends or findings here are therefore limited to the year covered by the data, and further analysis of seasonal trends would require data from other years.

So, if you are a Seattle homeowner considering renting out your home on AirBnB, you could weigh the factors above when deciding when to rent out your home and what price to set.

If you would like to see my code and analysis in more detail, the link to my GitHub is available here.

Syuen Loh produced this blog, based on the requirements of a project from Term 2 of Udacity’s Data Scientist Nanodegree program. In it, students develop real-world data skills and learn by doing. They choose a dataset, identify three questions, and analyze the data to find the answers to these questions. They then write a blog post to share their findings.

Editor’s Note: This article is the sole work of the author. Its views, findings, and conclusions do not represent those of Udacity.
