Seattle and Boston Airbnb Listings Analysis

As part of Udacity’s Data Scientist Nanodegree Program, the first project is to analyze the Seattle and Boston Airbnb datasets from Kaggle.

This project follows the CRISP-DM process. First, we need to understand the business and the data. I performed an initial analysis to see which variables each of the Airbnb datasets contains, in order to come up with useful business questions and insights. The raw data is read in and examined using basic statistical analysis and visualizations. I then cleaned, wrangled, and prepared the data before answering the questions or building a model. Finally, I tried to answer three questions:

Question 1: Which neighborhood has the most rental listings in each city?

Question 2: Between Boston and Seattle, which city has higher listing prices on average?

Question 3: How can I predict listing prices in Boston and Seattle?

For the last question, we need to use machine learning algorithms.
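The initial exploration step (reading the raw data and checking which variables exist and how complete they are) can be sketched with pandas. This is a minimal illustration, not the author's notebook code: the helper `summarize` is hypothetical, and the small CSV below is made-up toy data standing in for a Kaggle `listings.csv` file. In practice you would pass the real file path to `pd.read_csv`.

```python
import io

import pandas as pd


def summarize(df: pd.DataFrame) -> pd.DataFrame:
    """Return one row per column with its dtype and fraction of missing values."""
    summary = pd.DataFrame({
        "dtype": df.dtypes.astype(str),
        "missing_frac": df.isna().mean(),
    })
    # Most incomplete columns first: candidates for dropping or imputing during cleaning.
    return summary.sort_values("missing_frac", ascending=False)


# Toy stand-in for listings.csv (hypothetical rows; the real file has many more columns).
csv_text = """id,neighbourhood,price,bedrooms,square_feet
1,Capitol Hill,$85.00,1,
2,Ballard,$150.00,2,
3,Capitol Hill,$975.00,,900
"""
listings = pd.read_csv(io.StringIO(csv_text))
print(listings.shape)
print(summarize(listings))
```

Sorting by missing fraction puts the columns that most need a cleaning decision at the top of the report.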

Question 1: Which neighborhood has the most rental listings in each city?

As we can see, listings are spread across many different neighborhoods in Seattle and Boston. In Boston, “Allston-Brighton” has the most rental listings, and in Seattle, Capitol Hill does.
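Finding the neighborhood with the most listings amounts to counting rows per neighborhood. The sketch below assumes a `neighbourhood` column like the one in the Kaggle listings files; the helper `top_neighbourhood` is hypothetical and the rows are toy data, so the counts do not reflect the real datasets.

```python
import pandas as pd


def top_neighbourhood(listings: pd.DataFrame, col: str = "neighbourhood") -> tuple[str, int]:
    """Return the neighbourhood with the most listings and its listing count."""
    counts = listings[col].value_counts()  # sorted descending; missing values excluded
    return counts.index[0], int(counts.iloc[0])


# Toy rows for illustration only; real counts come from each city's listings data.
seattle = pd.DataFrame({"neighbourhood": ["Capitol Hill", "Ballard", "Capitol Hill", "Fremont"]})
boston = pd.DataFrame({"neighbourhood": ["Allston-Brighton", "Back Bay", "Allston-Brighton"]})

for city, df in {"Seattle": seattle, "Boston": boston}.items():
    name, n = top_neighbourhood(df)
    print(f"{city}: {name} ({n} listings)")
```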

Question 2: Between Boston and Seattle, which city has higher listing prices on average?

Prices are significantly right-skewed in both Seattle and Boston. Beyond the low end, above $500, Boston's listings are more spread out across the range as a whole, whereas Seattle has a huge spike around $100. I re-plotted the distributions showing only prices below $1,000 to make the differences between the two cities clearer.
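Before plotting, the price column has to be converted to numbers, because prices in the Kaggle files are stored as strings such as `"$1,250.00"`. The sketch below shows that conversion, a skewness check, and the `< 1000` filter used for the re-plotted histograms. The helper `parse_price` is hypothetical and the values are made up.

```python
import pandas as pd


def parse_price(prices: pd.Series) -> pd.Series:
    """Convert strings such as '$1,250.00' to floats; unparseable values become NaN."""
    cleaned = prices.str.replace("$", "", regex=False).str.replace(",", "", regex=False)
    return pd.to_numeric(cleaned, errors="coerce")


raw = pd.Series(["$85.00", "$100.00", "$95.00", "$1,250.00", "$4,000.00"])  # toy values
price = parse_price(raw)
print(price.skew())  # a positive value indicates a long right tail

# Keep only prices below $1,000 so the bulk of the distribution is visible.
under_1000 = price[price < 1000]
print(under_1000.tolist())
```

With matplotlib installed, `under_1000.plot.hist(bins=50)` draws the truncated histogram for one city.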

Answer: Based on the calendar data, these plots show that Boston has higher average listing prices than Seattle.
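The comparison can be reduced to one number per city: the mean nightly price across calendar rows. This sketch assumes calendar files with `listing_id`, `available`, and `price` columns, where `price` is a dollar string and is empty on unavailable nights. The helper `mean_calendar_price` is hypothetical, and the rows below are toy data, not the real results.

```python
import pandas as pd


def mean_calendar_price(calendar: pd.DataFrame) -> float:
    """Average nightly price over calendar rows that carry a price."""
    price = pd.to_numeric(
        calendar["price"].str.replace("$", "", regex=False).str.replace(",", "", regex=False),
        errors="coerce",
    )
    return float(price.mean())  # mean() skips NaN, i.e. nights with no price


# Toy calendars; the real files have one row per listing per day.
seattle_cal = pd.DataFrame({
    "listing_id": [1, 1, 2],
    "available": ["t", "f", "t"],
    "price": ["$85.00", None, "$115.00"],
})
boston_cal = pd.DataFrame({
    "listing_id": [7, 8, 8],
    "available": ["t", "t", "f"],
    "price": ["$150.00", "$250.00", None],
})

averages = {"Seattle": mean_calendar_price(seattle_cal), "Boston": mean_calendar_price(boston_cal)}
print(averages)
print("Higher on average:", max(averages, key=averages.get))
```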

Question 3: How can I predict listing prices in Boston and Seattle?

For the Seattle dataset, the Random Forest model reaches an R² of 0.951 on the training set and 0.540 on the test set. With XGBoost, R² is 0.998 on the training set and 0.608 on the test set. The large gap between training and test scores suggests that both models overfit, so the model needs improvement to raise the R² on the test set. XGBoost achieves the higher test R².
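The train/test R² comparison can be reproduced in outline with scikit-learn. This is a sketch on synthetic data: the feature names and the price formula are hypothetical stand-ins for the cleaned Seattle features, so the scores it prints are not the ones reported above. The point is the procedure: split once, fit on the training part, and compare R² on both parts to gauge overfitting.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.metrics import r2_score
from sklearn.model_selection import train_test_split

# Synthetic stand-in for cleaned listing features (hypothetical columns).
rng = np.random.default_rng(0)
n = 500
bedrooms = rng.integers(0, 5, n)
accommodates = bedrooms * 2 + rng.integers(1, 3, n)
review_score = rng.uniform(80, 100, n)
X = np.column_stack([bedrooms, accommodates, review_score])
y = 50 + 40 * bedrooms + 15 * accommodates + rng.normal(0, 30, n)

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=42)

model = RandomForestRegressor(n_estimators=200, random_state=42)
model.fit(X_train, y_train)

r2_train = r2_score(y_train, model.predict(X_train))
r2_test = r2_score(y_test, model.predict(X_test))
# A large train-minus-test gap is the usual sign of overfitting.
print(f"train R2={r2_train:.3f}, test R2={r2_test:.3f}, gap={r2_train - r2_test:.3f}")
```

`xgboost.XGBRegressor` follows the same `fit`/`predict` interface and can be swapped in for the comparison. Limiting tree growth (for example with `max_depth` or `min_samples_leaf` on `RandomForestRegressor`) is one common way to try to narrow the gap.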
The features that most affect price differ slightly between Random Forest and XGBoost, but the two models share many of them. In both, bedrooms and accommodates (the number of guests a listing can host) are the most important features influencing price.
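A fitted Random Forest exposes a ranking of features through its `feature_importances_` attribute. The sketch below uses synthetic data in which price depends on hypothetical `bedrooms` and `accommodates` columns while `number_of_reviews` is pure noise, so the expected ranking is known in advance; it does not reproduce the author's actual importances. XGBoost's `XGBRegressor` also exposes `feature_importances_`, but the two libraries compute importance differently, which is one plausible reason the rankings can differ slightly.

```python
import numpy as np
import pandas as pd
from sklearn.ensemble import RandomForestRegressor

rng = np.random.default_rng(1)
n = 400
X = pd.DataFrame({
    "bedrooms": rng.integers(0, 5, n),
    "accommodates": rng.integers(1, 9, n),
    "number_of_reviews": rng.integers(0, 200, n),
})
# Price driven by bedrooms and accommodates; number_of_reviews carries no signal here.
y = 60 + 45 * X["bedrooms"] + 20 * X["accommodates"] + rng.normal(0, 20, n)

model = RandomForestRegressor(n_estimators=200, random_state=0).fit(X, y)

# Impurity-based importances, highest first.
importances = pd.Series(model.feature_importances_, index=X.columns).sort_values(ascending=False)
print(importances)
```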

We can repeat steps 1 to 8 for Boston to create a model.

Azadeh Iranmehr