Budget Holidays for Little Money in the Best Areas

An analysis of the Seattle, WA, USA Airbnb dataset

E Neuburg
Geek Culture
8 min read · Jun 21, 2021


Photo by Leon LEE on Unsplash

Introduction

Even though many online platforms compete on service and price, finding the desired accommodation at good value for money is still difficult and time-consuming.

The aim of this paper is to analyse listing prices to show how tourists can spend their holidays in Seattle’s best areas for little money, using Airbnb data from the USA as a case study.

Driven by economic growth, globalisation and the curiosity that motivates people to travel, learn about new cultures and broaden their horizons, travel both abroad and within one’s own country has increased sharply in recent years.

This paper explores the possibilities of staying in the best areas for little money, hence the term budget holiday.

The question is: have you, too, spent hours torn between a hotel and a holiday home, trying to work out which offers better value for money? Hours spent browsing websites for different accommodations, comparing prices and asking yourself: should I pay more for an especially good area near all the tourist attractions, or choose something cheaper far away from the city centre?

To explore the potential economic costs and benefits, I used Airbnb data to take a deep dive into the world of prices and accommodations in Seattle, WA, USA.

“What factors most strongly influence the prices of holiday homes in Seattle?”

1. How does the type of property influence the price?

2. How does the neighbourhood affect the price?

3. How well can the price of premises be predicted?

Part 1: How does the type of property influence the price?

Before taking a closer look at the prices per rental type, it is necessary to investigate the overall distribution of prices in Seattle.

The figure shows that most prices cluster at the lower end of the range, with a few expensive outliers forming a long right tail. The table below shows that prices range from a minimum of $20 to a maximum of $1,000. The mean is $128, but the standard deviation of $90 indicates a wide spread around it.
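Before statistics like those in the table can be computed, the prices have to be numeric. The following is a minimal pandas sketch, assuming the `price` column holds text such as "$1,000.00" (as in the public listings export); `parse_price` is a hypothetical helper and the five rows are made-up toy values, not the Seattle data.

```python
import pandas as pd


def parse_price(series: pd.Series) -> pd.Series:
    """Convert strings such as '$1,000.00' into floats (assumed text format)."""
    return (
        series.astype(str)
        .str.replace("$", "", regex=False)   # drop the currency sign
        .str.replace(",", "", regex=False)   # drop thousands separators
        .astype(float)
    )


# Hypothetical toy rows; in practice this could be pd.read_csv("listings.csv")
listings = pd.DataFrame({"price": ["$20.00", "$85.00", "$99.00", "$150.00", "$1,000.00"]})
listings["price"] = parse_price(listings["price"])

# Summary statistics of the kind reported in the table (min, max, mean, std)
summary = listings["price"].describe()
print(summary[["min", "max", "mean", "std"]])
```

On the real file, the same two steps (parse, then `describe()`) give the minimum, maximum, mean and standard deviation discussed above.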

This analysis shows that prices indeed vary widely and that there is something for every budget. However, it does not show what kind of accommodation you are actually paying for. Could it be a tent? While you would really like to save money, you do not want to do so at any cost.

Turning to the figure above, we can see that almost all property types have a wide price range, from very low to considerably high. However, apartments and houses show the greatest difference between minimum and maximum price. One possible explanation is that apartments and houses are also the most popular rental types, so there are many more listings of them across all price levels than of other property types.

Because of the outliers, it is difficult to judge the typical prices of the different rental types in Seattle. For that reason, the table below uses the mode (the most frequent price for each property type), which shows that typical prices remain “moderate” overall, even for the more expensive property types such as apartments and houses.
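The per-type comparison (minimum, maximum and mode) can be reproduced with a single pandas group-by. The sketch below is a minimal illustration on made-up toy prices, assuming numeric `price` and `property_type` columns; `most_common` is a hypothetical helper, needed because `Series.mode()` may return several values when there is a tie.

```python
import pandas as pd

# Hypothetical toy listings (prices already numeric); column names assumed
listings = pd.DataFrame({
    "property_type": ["Apartment", "Apartment", "Apartment", "House", "House", "House", "Tent"],
    "price": [60.0, 60.0, 450.0, 95.0, 95.0, 900.0, 40.0],
})


def most_common(prices: pd.Series) -> float:
    # Series.mode() returns all most frequent values on a tie; keep the smallest
    return prices.mode().min()


# Minimum, maximum and mode per property type, plus the gap between min and max
by_type = listings.groupby("property_type")["price"].agg(["min", "max", most_common])
by_type["spread"] = by_type["max"] - by_type["min"]
print(by_type.sort_values("spread", ascending=False))
```

Sorting by `spread` puts the property types with the widest gap between their cheapest and most expensive listing first, while the `most_common` column is far less affected by a single extreme listing than the maximum is.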

Part 2: How does the neighbourhood affect the price?

The figure below (“Listing Price in Neighbourhood”), which shows the price distribution in different areas, makes it clear that the price range in most areas is enormously wide. Especially noticeable are areas such as Portage Bay, Southeast Magnolia and Westlake, which show the widest ranges.

In the next stage of this analysis, for the sake of simplicity, this paper concentrates on the ten most expensive areas (ranked by mean price) and on two rental types, apartments and houses, because they are the most popular.
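Selecting the ten most expensive areas by mean price and keeping only apartments and houses can be sketched as follows. The `neighbourhood` and `property_type` column names are assumptions (the public listings file may name the neighbourhood column differently), `top_areas` is a hypothetical helper, and the prices are made up; `n=2` is used only because the toy data has four areas.

```python
import pandas as pd


def top_areas(df: pd.DataFrame, n: int = 10) -> pd.Index:
    """Return the n neighbourhoods with the highest mean price."""
    mean_price = df.groupby("neighbourhood")["price"].mean()
    return mean_price.nlargest(n).index


# Hypothetical toy data: neighbourhood names from the article, prices made up
listings = pd.DataFrame({
    "neighbourhood": ["Westlake", "Westlake", "Windermere", "Windermere", "Briarcliff", "Fremont"],
    "property_type": ["Apartment", "House", "Apartment", "Condominium", "House", "Apartment"],
    "price": [150.0, 300.0, 200.0, 180.0, 260.0, 90.0],
})

areas = top_areas(listings, n=2)  # n=10 on the real dataset
# Keep only apartments and houses located in the selected areas
subset = listings[
    listings["neighbourhood"].isin(areas)
    & listings["property_type"].isin(["Apartment", "House"])
]
print(subset)
```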

The two graphs show the minimum and maximum prices for houses and apartments in the ten most expensive areas. Comparing the minimum prices, it is very clear that houses start at a much higher price than apartments. Interestingly, the maximum prices show that houses also remain more expensive than apartments in most neighbourhoods.

The maximum-price figures below highlight the cases where apartments are more expensive than houses, e.g. Windermere and Briarcliff.
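The minimum/maximum comparison per area, and the check for areas where apartments reach higher prices than houses, can be expressed with a pivot table. This is a sketch on hypothetical data (the area names come from the article, the prices do not), assuming the frame already contains only apartments and houses in the selected areas.

```python
import pandas as pd

# Hypothetical apartment/house subset of the top-priced areas; prices made up
subset = pd.DataFrame({
    "neighbourhood": ["Westlake", "Westlake", "Westlake", "Briarcliff", "Briarcliff", "Briarcliff"],
    "property_type": ["Apartment", "Apartment", "House", "Apartment", "House", "House"],
    "price": [120.0, 250.0, 300.0, 400.0, 180.0, 350.0],
})

# One row per neighbourhood; columns such as ("min", "Apartment") or ("max", "House")
table = subset.pivot_table(index="neighbourhood", columns="property_type",
                           values="price", aggfunc=["min", "max"])

# Areas where the most expensive apartment costs more than the most expensive house
apartments_top = table[("max", "Apartment")] > table[("max", "House")]
apt_areas = apartments_top[apartments_top].index.tolist()
print(table)
print("Apartments' max above houses' max in:", apt_areas)
```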

The answer for budget holidaymakers is yes: it is possible to stay in the best areas for less money than you might think, for example by choosing an apartment rather than a house in the same area.

Part 3: How well can we predict the price of premises?

Many factors influence the price of premises. Probably the most important question, however, is how this information can be used and what advantages a business can gain from it. As a host, you are most likely interested in questions such as: Is my price adequate? Am I below or well above the usual prices for equivalent properties?

Prediction is one way to estimate an appropriate price while taking different factors into account automatically. For that reason, I will build a Linear Regression model to find the dependencies between price and the other factors that may affect it.

Before we start building a model, we need to decide which features to use for training. Using all of them is an option, but it does not guarantee the best model. Why? Because it increases both the dimensionality of the data and the training time.

We therefore start by exploring the numerical variables in the dataset. Good practice here is to look at a correlation heatmap.

Since price is the focus of this project, we look at the relationship between price and the other features in the dataset. The matrix shows that price is highly correlated with accommodates, bathrooms, bedrooms, beds and square feet. This seems natural: you pay more for more space and extras.
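The correlations behind such a heatmap can be computed directly with pandas. The sketch below uses made-up values and assumed column names modelled on the listings file; the heatmap call is only a comment because it needs seaborn, which may not be installed.

```python
import pandas as pd

# Hypothetical numeric columns (names assumed to mirror the Airbnb listings file)
listings = pd.DataFrame({
    "price":             [60.0, 85.0, 120.0, 180.0, 250.0],
    "accommodates":      [1, 2, 4, 4, 6],
    "bedrooms":          [1, 1, 2, 2, 3],
    "number_of_reviews": [90, 40, 30, 10, 5],
})

corr = listings.corr()  # Pearson correlation matrix; all columns here are numeric
# Correlation of every other feature with price, strongest positive first
print(corr["price"].drop("price").sort_values(ascending=False))

# For the heatmap itself, e.g. with seaborn (if installed):
#   import seaborn as sns; sns.heatmap(corr, annot=True, cmap="coolwarm")
```

On the real data, the frame would first need to be restricted to numeric columns, e.g. with `select_dtypes("number")`, before calling `corr()`.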

The reviews, on the other hand, are negatively correlated with price. One way to investigate this further would be to apply sentiment analysis to determine whether the reviews are positive or negative; this is beyond the scope of this project.

The feature selection is therefore based on the heatmap as well as on the analysis above. Unfortunately, 75% of the properties have no information about square feet, so it is not advisable to use this feature for training.
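Dropping features that are mostly empty, as done here for square feet, can be made explicit by computing the share of missing values per column. A minimal sketch on made-up data; the 50% cut-off `MAX_MISSING` is an assumption for illustration, not a threshold stated in the article.

```python
import numpy as np
import pandas as pd

# Hypothetical candidate features; square_feet is mostly missing, as in the article
features = pd.DataFrame({
    "accommodates": [2, 4, 3, 6],
    "bathrooms":    [1.0, 1.5, np.nan, 2.0],
    "bedrooms":     [1, 2, 1, 3],
    "beds":         [1, 2, 2, 3],
    "square_feet":  [np.nan, np.nan, np.nan, 900.0],
})

missing_share = features.isna().mean()  # fraction of missing values per column
MAX_MISSING = 0.5                        # assumed cut-off, not taken from the article
selected = missing_share[missing_share <= MAX_MISSING].index.tolist()
print(missing_share)
print("Selected features:", selected)
```

Columns that survive the cut-off may still contain a few gaps (here `bathrooms`), which would have to be filled or dropped before fitting a linear regression.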

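To build and then analyse the model properly, the dataset has to be split into training and test data (70% and 30%). For the performance evaluation I will use the R squared and MSE metrics. R squared measures the goodness of fit of a linear regression: it is the share of the variance in price that the model explains, usually between 0 and 1 (1 means a perfect fit, 0 means the model does no better than always predicting the mean price). Additionally, I will use the MSE (Mean Squared Error), the average squared difference between predicted and actual prices; comparing it between training and test data helps to assess the bias and variance of the model.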
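For reference, the two metrics have the standard definitions below, where y_i is the actual price of listing i, ŷ_i the predicted price, ȳ the mean actual price and n the number of listings in the set being evaluated:

```latex
\mathrm{MSE} = \frac{1}{n}\sum_{i=1}^{n}\left(y_i - \hat{y}_i\right)^2
\qquad
R^2 = 1 - \frac{\sum_{i=1}^{n}\left(y_i - \hat{y}_i\right)^2}{\sum_{i=1}^{n}\left(y_i - \bar{y}\right)^2}
```

Dividing the numerator and denominator of the R squared fraction by n gives R² = 1 − MSE / σ²_y, where σ²_y is the variance of the actual prices in that set. A set whose prices are more spread out can therefore show a higher MSE and a higher R squared at the same time. Because the MSE is measured in squared dollars, its square root (RMSE) is easier to read as a typical prediction error in dollars.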

Turning to the model, the test set slightly outperforms the training set, with R squared values of 0.57 and 0.56 respectively. The training set contains the $1,000 listing, an extreme outlier that is hard to fit, which may explain the difference.

However, the MSE is higher for the test data (3,636, i.e. an RMSE of about $60) than for the training data (3,425, about $58.5), which suggests that the model may slightly overfit the training data.
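The split, fit and evaluation steps described above can be sketched with scikit-learn. Because the cleaned Seattle data is not reproduced here, the sketch generates synthetic listings in which price depends linearly on two assumed features plus noise, so its scores will not match the numbers reported above; the 70/30 split follows the article, while the `random_state` values are arbitrary.

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error, r2_score
from sklearn.model_selection import train_test_split

# Synthetic stand-in for the cleaned listings: price grows with size, plus noise
rng = np.random.default_rng(42)
n = 500
X = pd.DataFrame({
    "accommodates": rng.integers(1, 9, n),
    "bedrooms": rng.integers(0, 5, n),
})
y = 30 + 20 * X["accommodates"] + 15 * X["bedrooms"] + rng.normal(0, 40, n)

# 70 % training / 30 % test split, as in the article
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=42)

model = LinearRegression().fit(X_train, y_train)

# Report R squared and MSE on both parts to compare fit and generalisation
for name, X_part, y_part in [("train", X_train, y_train), ("test", X_test, y_test)]:
    pred = model.predict(X_part)
    print(f"{name}: R2={r2_score(y_part, pred):.2f}  MSE={mean_squared_error(y_part, pred):.0f}")
```

Replacing the synthetic `X` and `y` with the selected feature columns and the parsed price column of the real listings would give the kind of train/test comparison discussed above.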

In summary, future work should experiment with other types of regression, e.g. RANSAC (RANdom SAmple Consensus), as well as with further data wrangling and feature selection.
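RANSAC, one of the alternatives named above, fits the regression on random subsets of the data and keeps the model supported by the largest set of inliers, which makes it less sensitive to extreme prices such as the $1,000 listing. The sketch below uses scikit-learn’s `RANSACRegressor` with its default `LinearRegression` estimator on synthetic data with a few injected outliers; it only illustrates the technique and is not a result on the Seattle data.

```python
import numpy as np
from sklearn.linear_model import LinearRegression, RANSACRegressor

# Synthetic data: a clean linear trend plus a few extreme "luxury" outliers
rng = np.random.default_rng(0)
X = rng.uniform(1, 8, size=(200, 1))        # e.g. number of guests
y = 40 + 20 * X[:, 0] + rng.normal(0, 10, 200)
y[:5] = 1000.0                              # a handful of extreme prices

ols = LinearRegression().fit(X, y)
# RANSAC repeatedly fits LinearRegression (its default estimator) on random
# minimal subsets, keeps the candidate with the most inliers and refits on them
ransac = RANSACRegressor(random_state=0).fit(X, y)

print("OLS slope:   ", ols.coef_[0])
print("RANSAC slope:", ransac.estimator_.coef_[0])
print("Outliers flagged:", int((~ransac.inlier_mask_).sum()))
```

By default the inlier threshold is the median absolute deviation of the target, so on real prices it may need tuning via the `residual_threshold` parameter.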

Conclusion

In this article, we took a closer look at how prices in Seattle vary with rental type and neighbourhood.

1. We looked at the prices of different property types and found that every rental type offers something for every budget.

2. We then investigated combinations of price, rental type and neighbourhood and found that houses are almost always more expensive than apartments. Even so, it is possible to find a holiday rental that offers “value for money” in Seattle’s best areas.

3. Finally, we developed a Linear Regression model to predict prices. In a nutshell, the model works reasonably well for lower prices but struggles to predict higher prices, so further experiments and investigations are needed to improve it.

Beyond the above findings, other factors such as the size of the property and its amenities were not included in this study. The analysis therefore needs to be extended to examine the potential economic costs and benefits for holiday homeowners, who want not only to attract tourists to their vacation rentals but also to offer them “value for money”.

Since the above analysis does not cover every aspect of holiday rentals, more questions remain to be answered, such as:

Why are some properties so outrageously expensive?

To see more of the analysis, feel free to explore it via the link to my GitHub here.


Data Scientist / Machine Learning/ Deep Learning / Natural Language Processing/ Project Management