# Seattle and Boston Airbnb Listings Analysis

As part of Udacity’s Data Scientist Nanodegree Program, the first project is to analyze the Seattle and Boston Airbnb datasets from Kaggle.

This project follows the CRISP-DM process. First, we need to understand the business and the data. An initial analysis was performed to see which variables each of the Airbnb datasets contains, so that we could formulate useful business-related questions and insights. The raw data is read in and examined with basic statistical analysis and visualizations. I cleaned, wrangled, and prepared the data before answering the questions or building models. I then tried to answer three questions:

Question 1: Which neighborhood has the most rental listings in each city?

Question 2: Between Boston and Seattle, which city has higher listing prices on average?

Question 3: How can we predict listing prices in Boston and Seattle?

For the last question, we need to use machine learning algorithms.

# Question 1: Which neighborhood has the most rental listings in each city?

As we can see, the neighborhood with the most listings differs between the two cities: in Boston it is Allston-Brighton, and in Seattle it is Capitol Hill.
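A minimal sketch of how the per-city neighborhood counts could be computed, assuming each listing row carries `city` and `neighbourhood` fields (the Kaggle `listings.csv` files use these column names). The helper name and the sample rows are hypothetical and made up for illustration; they are not taken from the datasets or from the project's notebook.

```python
from collections import Counter, defaultdict


def top_neighbourhood_by_city(listings):
    """Return {city: (neighbourhood, listing_count)} for the busiest neighbourhood per city."""
    counts = defaultdict(Counter)
    for row in listings:
        # Skip rows with a missing neighbourhood instead of counting them under "".
        if row.get("neighbourhood"):
            counts[row["city"]][row["neighbourhood"]] += 1
    # most_common(1) returns a one-element list [(name, count)].
    return {city: c.most_common(1)[0] for city, c in counts.items()}


# Hypothetical sample rows; the real data would come from listings.csv.
sample = [
    {"city": "Boston", "neighbourhood": "Allston-Brighton"},
    {"city": "Boston", "neighbourhood": "Allston-Brighton"},
    {"city": "Boston", "neighbourhood": "Back Bay"},
    {"city": "Seattle", "neighbourhood": "Capitol Hill"},
    {"city": "Seattle", "neighbourhood": "Ballard"},
    {"city": "Seattle", "neighbourhood": "Capitol Hill"},
    {"city": "Seattle", "neighbourhood": ""},
]

print(top_neighbourhood_by_city(sample))
# → {'Boston': ('Allston-Brighton', 2), 'Seattle': ('Capitol Hill', 2)}
```

When working with the real files, tagging each row with the dataset it came from can be more reliable than the raw `city` column, which may contain spelling variants.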

# Question 2: Between Boston and Seattle, which city has higher listing prices on average?

The price distribution is significantly skewed in both Seattle and Boston. In Boston, listings above \$500 are spread more widely across the range, whereas Seattle has a sharp spike around \$100. I replotted the histograms above using only prices below \$1,000 to show the differences more clearly.
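Before plotting, the price column has to be converted to numbers. A minimal stdlib sketch, assuming prices are stored as strings such as `"$1,250.00"`; the helper names and the \$100 bin width are illustrative choices, not the project's code.

```python
def parse_price(text):
    """Convert a price string such as "$1,250.00" to a float; return None if empty."""
    text = text.strip()
    if not text:
        return None
    return float(text.replace("$", "").replace(",", ""))


def price_histogram(prices, upper=1000, bin_width=100):
    """Count prices in [0, upper) in fixed-width bins; returns {bin_start: count}."""
    bins = {start: 0 for start in range(0, upper, bin_width)}
    for p in prices:
        # Drop missing prices and everything at or above the cut-off.
        if p is not None and 0 <= p < upper:
            bins[int(p // bin_width) * bin_width] += 1
    return bins


raw = ["$85.00", "$100.00", "$99.00", "$1,250.00", "", "$450.00"]
prices = [parse_price(s) for s in raw]
hist = price_histogram(prices)
print({start: n for start, n in hist.items() if n})  # → {0: 2, 100: 1, 400: 1}
```

Cutting at \$1,000 keeps the long right tail from compressing the bars where most listings sit.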

Answer: Based on the calendar data, these plots show that Boston has higher average listing prices than Seattle.
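The comparison of averages can be sketched as below, assuming the calendar rows have already been reduced to `(city, price)` pairs and that dates marked unavailable carry no price (represented here as `None`). The function name and the numbers are hypothetical. Because the distributions are skewed, the sketch reports the median alongside the mean.

```python
from statistics import mean, median


def summarize_prices(calendar_rows):
    """Return {city: (mean_price, median_price)} over rows that have a price."""
    by_city = {}
    for city, price in calendar_rows:
        if price is not None:  # unavailable dates have no price
            by_city.setdefault(city, []).append(price)
    return {city: (mean(p), median(p)) for city, p in by_city.items()}


# Hypothetical rows, not real calendar data.
rows = [
    ("Boston", 150.0), ("Boston", 200.0), ("Boston", None), ("Boston", 250.0),
    ("Seattle", 90.0), ("Seattle", 100.0), ("Seattle", 110.0), ("Seattle", 300.0),
]
print(summarize_prices(rows))
# → {'Boston': (200.0, 200.0), 'Seattle': (150.0, 105.0)}
```

In the toy Seattle data, one expensive listing pulls the mean well above the median, which is the effect the skewed histograms point to.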

# Question 3: How can we predict listing prices in Boston and Seattle?

For the Seattle prediction model, the Random Forest algorithm gives R² scores of 0.951 on the training set and 0.540 on the test set, while XGBoost gives 0.998 on train and 0.608 on test. The large gap between train and test R² (about 0.41 for Random Forest and 0.39 for XGBoost) indicates that both models overfit the training data, so the model needs further improvement to raise the test R². XGBoost achieves the higher test R².
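For reference, here is a stdlib sketch of the R² metric, defined as 1 - SS_res / SS_tot (the same definition scikit-learn's `r2_score` uses for a single target), together with the train/test gap computed from the scores reported above. The function is a hand-written illustration, not the project's code.

```python
def r2_score(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    if len(y_true) != len(y_pred) or len(y_true) < 2:
        raise ValueError("need two equal-length sequences with at least 2 values")
    y_mean = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))  # model's squared error
    ss_tot = sum((t - y_mean) ** 2 for t in y_true)  # error of always predicting the mean
    if ss_tot == 0:
        raise ValueError("R² is undefined when all true values are equal")
    return 1 - ss_res / ss_tot


# Train/test R² for Seattle as reported in the text.
reported = {"Random Forest": (0.951, 0.540), "XGBoost": (0.998, 0.608)}
gaps = {name: round(train - test, 3) for name, (train, test) in reported.items()}
print(gaps)
```

A gap of roughly 0.4 between train and test R² is the usual sign of overfitting; limiting tree depth or adding regularization are common ways to try to narrow it.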
The features that most influence price in the Random Forest model differ slightly from those in the XGBoost model, but the two share many of them. In both, the number of bedrooms and the number of guests a listing accommodates (`accommodates`) are the most important features.

We can repeat steps 1 to 8 for Boston to create a model.
