Boston AirBnb Data Analysis

Akshay Jaitly
Jun 7, 2020


AirBnb has taken the world by storm, making it exciting to plan a vacation in a new city. AirBnb has over 150 million users worldwide, and 6 users check in to an AirBnb every second. Isn’t that amazing?

So, what better way to enhance this amazing service than by analyzing its data and key trends?

CRISP-DM is a strategy that helps us analyze the data better. Now, you might ask: what is CRISP-DM?

CRISP-DM stands for Cross-Industry Standard Process for Data Mining. It breaks a project into six phases: business understanding, data understanding, data preparation, modeling, evaluation, and deployment.

Let’s dive in and understand the data.

Motivation:

My motivation for analyzing this data is to dive deeper into the data science field after around 2 years of experience as a software engineer, a goal further strengthened by the Data Science Nanodegree I am pursuing at Udacity.

Data Understanding:

The data (available on Kaggle) consists of 3 files:

  • listings: Full descriptions and average review score
  • calendar: Listing id and the price and availability for that day
  • reviews: Unique id for each reviewer and detailed comments
Listing data

The listing data is the primary dataset over which I will perform my data analysis.

It has 3585 rows and 95 columns.
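Loading the primary file takes a single pandas call. This is a minimal sketch, assuming the Kaggle download is saved as `listings.csv`; the helper name `load_listings` is hypothetical, and the tiny in-memory sample only stands in for the real file so the call can be shown end to end:

```python
import io

import pandas as pd


def load_listings(source="listings.csv"):
    """Read the listings data into a DataFrame.

    `source` may be a file path or any file-like object; the default file
    name is an assumption about how the Kaggle download was saved.
    """
    return pd.read_csv(source)


# Tiny in-memory stand-in for the real file (toy values, not the Kaggle data).
sample = io.StringIO("id,price,room_type\n1,$150.00,Private room\n2,$85.00,Shared room\n")
print(load_listings(sample).shape)  # (2, 3)

# On the real file, the article reports 3585 rows and 95 columns:
# listings = load_listings()
# print(listings.shape)
```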

Business Understanding:

These are the questions that anyone would logically ask, whether as a new AirBnb host looking to make a profit or as a user of the service.

We will answer the following questions to better analyze the service:

  • What are the most common listing prices?
  • What is the relation between price and property type?
  • Which room types in each neighbourhood have high prices?
  • What are the top 5 amenities?

Q1. What are the most common listing prices?

Data cleaning for price: making it a float type

Data preparation:

To make the analysis possible, we cleaned the price field of the listings data frame and converted it to a float.
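The cleaning step can be sketched as follows. This assumes the price column is named `price` and stores text such as `"$1,250.00"`, with a leading dollar sign and comma thousands separators; `clean_price` is a hypothetical helper, and the toy rows are not taken from the dataset:

```python
import pandas as pd


def clean_price(price):
    """Convert price strings such as "$1,250.00" to floats.

    Assumes a leading "$" and "," thousands separators; both are stripped
    with a regular expression before casting to float.
    """
    return price.str.replace(r"[$,]", "", regex=True).astype(float)


listings = pd.DataFrame({"price": ["$250.00", "$1,250.00", "$65.00"]})  # toy rows
listings["price"] = clean_price(listings["price"])
print(listings["price"].tolist())  # [250.0, 1250.0, 65.0]
```

Without removing the comma, values of 1000 dollars or more would fail the `astype(float)` cast.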

Price vs Number of AirBnbs

As we can see, most AirBnbs are priced below 700 dollars, and the bulk of the listings fall in the 0–500 dollar range.

Highest price is 4000 dollars

This plot shows that the highest price is 4000 dollars and that only a small number of premium AirBnbs are priced above 700 dollars.
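Figures like the maximum price and the number of listings above the 700-dollar mark can be read off the cleaned column directly. A sketch, assuming `price` is already a float column; `price_summary` is a hypothetical helper and the toy prices are illustrative only:

```python
import pandas as pd


def price_summary(price, threshold=700.0):
    """Summarize a cleaned (float) price Series around a premium threshold."""
    return {
        "max_price": float(price.max()),
        # Fraction of listings priced strictly below the threshold.
        "share_below": float((price < threshold).mean()),
        # Number of listings priced strictly above the threshold.
        "count_above": int((price > threshold).sum()),
    }


prices = pd.Series([65.0, 120.0, 150.0, 250.0, 900.0, 4000.0])  # toy values
print(price_summary(prices))
```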

Breakdown of prices below 700 dollars

This plot shows that most AirBnbs are priced in the 50–200 dollar range, so this is where the most common listing prices lie.
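A breakdown like this can be produced by binning the prices below 700 dollars. This sketch assumes 50-dollar-wide bins (the bin width is an assumption, not stated in the post), a cleaned float `price` column, and a hypothetical helper `price_breakdown`:

```python
import pandas as pd


def price_breakdown(price, upper=700, width=50):
    """Count listings in fixed-width price bins from 0 up to (not including) `upper`."""
    below = price[price < upper]
    edges = list(range(0, upper + width, width))  # 0, 50, ..., 700
    # right=False makes each bin [low, high), e.g. [50, 100).
    return pd.cut(below, bins=edges, right=False).value_counts().sort_index()


prices = pd.Series([45.0, 60.0, 75.0, 110.0, 150.0, 180.0, 199.0, 320.0, 650.0, 900.0])  # toy values
counts = price_breakdown(prices)
print(counts[counts > 0])  # non-empty bins only
```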

Results: This analysis shows the range of available price points: the highest is 4000 dollars, and the most common listing prices fall in the 50–200 USD range.

Q2. What is the relation between price and property type?

Heatmap for Property type

The following heatmap shows the average price for each property type, broken down by room type. Shared rooms are mostly the cheapest room type, while Bed & Breakfast is generally the cheapest property type.
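One way to compute the numbers behind such a heatmap is a pivot table of mean price by property type and room type. A sketch, assuming columns named `property_type`, `room_type` and a cleaned float `price`; the helper name and toy rows are hypothetical, and the seaborn call is shown only as a comment:

```python
import pandas as pd


def price_by_property_and_room(listings):
    """Mean price per (property type, room type) pair; NaN where no listing exists."""
    return listings.pivot_table(index="property_type", columns="room_type",
                                values="price", aggfunc="mean")


listings = pd.DataFrame({  # toy rows, not the Kaggle data
    "property_type": ["Apartment", "Apartment", "House", "Bed & Breakfast", "House"],
    "room_type": ["Entire home/apt", "Shared room", "Private room", "Private room", "Entire home/apt"],
    "price": [200.0, 40.0, 80.0, 90.0, 300.0],
})
table = price_by_property_and_room(listings)
print(table.round(1))
# To draw the heatmap (requires seaborn): sns.heatmap(table, annot=True, fmt=".0f")
```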

Q3. Which room types in each neighbourhood have high prices?

Average prices per neighbourhood

I calculated the mean price for each neighbourhood and sorted the results to identify the most expensive neighbourhoods. South Boston Waterfront and Bay Village are the 2 most expensive neighbourhoods, while Mattapan and Dorchester are the cheapest.
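The neighbourhood ranking reduces to a group-by and a sort. This sketch assumes the neighbourhood is stored in a column called `neighbourhood_cleansed` (an assumption about the Kaggle file) and uses made-up prices, so the ranking it prints says nothing about Boston itself; `mean_price_by_neighbourhood` is a hypothetical helper:

```python
import pandas as pd


def mean_price_by_neighbourhood(listings, column="neighbourhood_cleansed"):
    """Mean price per neighbourhood, most expensive first."""
    return listings.groupby(column)["price"].mean().sort_values(ascending=False)


listings = pd.DataFrame({  # toy rows, not the Kaggle data
    "neighbourhood_cleansed": ["Bay Village", "Bay Village", "Mattapan", "Dorchester", "Dorchester"],
    "price": [250.0, 210.0, 75.0, 90.0, 70.0],
})
ranked = mean_price_by_neighbourhood(listings)
print(ranked)
```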

Average price per neighbourhood

Now for the key question: which room types command higher prices in each neighbourhood? Let us look at the following scatterplot.

Price per room type in each neighbourhood

The general trend is that in each neighbourhood, private rooms are the cheapest while entire apartments are the most expensive.
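The data behind a plot like this can be tabulated by grouping on both neighbourhood and room type. A sketch, assuming columns named `neighbourhood_cleansed`, `room_type` and a cleaned float `price`; the helper and toy rows are hypothetical, and `idxmax` picks the most expensive room type in each neighbourhood:

```python
import pandas as pd


def price_by_neighbourhood_and_room(listings, column="neighbourhood_cleansed"):
    """Mean price with one row per neighbourhood and one column per room type."""
    return listings.groupby([column, "room_type"])["price"].mean().unstack("room_type")


listings = pd.DataFrame({  # toy rows, not the Kaggle data
    "neighbourhood_cleansed": ["Back Bay", "Back Bay", "Back Bay",
                               "Dorchester", "Dorchester", "Dorchester"],
    "room_type": ["Entire home/apt", "Entire home/apt", "Private room",
                  "Entire home/apt", "Private room", "Private room"],
    "price": [300.0, 260.0, 120.0, 150.0, 60.0, 70.0],
})
table = price_by_neighbourhood_and_room(listings)
print(table)
print(table.idxmax(axis=1))  # most expensive room type in each neighbourhood
# Scatterplot sketch (requires seaborn):
# sns.scatterplot(data=listings, x="price", y="neighbourhood_cleansed", hue="room_type")
```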

Q4. What are the top 5 amenities?

Data preparation:

Cleansed the amenities column

I cleaned the amenities column so that it can be used in the analysis.
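A minimal cleaning sketch, assuming each amenities value is a string in a braces-and-quotes form such as `{TV,"Wireless Internet",Kitchen}`; `parse_amenities` is a hypothetical helper. The naive split on commas would break if an amenity name itself contained a comma:

```python
import pandas as pd


def parse_amenities(raw):
    """Turn a string like '{TV,"Wireless Internet",Kitchen}' into a list of names.

    Missing values become an empty list; empty entries are dropped.
    """
    if not isinstance(raw, str):
        return []
    items = raw.strip("{}").split(",")
    return [item.strip().strip('"') for item in items if item.strip().strip('"')]


listings = pd.DataFrame({"amenities": ['{TV,"Wireless Internet",Kitchen}', "{}", None]})  # toy rows
listings["amenities_list"] = listings["amenities"].apply(parse_amenities)
print(listings["amenities_list"].tolist())
```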

Multi Label Binarizer to find top 5 amenities

I used scikit-learn’s MultiLabelBinarizer to one-hot encode the amenities and find the 5 most frequent ones.
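Counting amenities with scikit-learn’s `MultiLabelBinarizer` might look like this. The sketch assumes each listing’s amenities are already a list of strings; summing the one-hot columns gives how many listings offer each amenity. `top_amenities` and the toy lists are hypothetical:

```python
import pandas as pd
from sklearn.preprocessing import MultiLabelBinarizer


def top_amenities(amenity_lists, n=5):
    """Return the n amenities offered by the most listings, with their counts."""
    mlb = MultiLabelBinarizer()
    onehot = mlb.fit_transform(amenity_lists)  # one row per listing, one column per amenity
    counts = pd.Series(onehot.sum(axis=0), index=mlb.classes_)
    return counts.sort_values(ascending=False).head(n)


amenity_lists = [  # toy data, not the Kaggle listings
    ["Wireless Internet", "Heating", "Kitchen"],
    ["Wireless Internet", "Heating", "Essentials"],
    ["Wireless Internet", "Kitchen", "Smoke Detector"],
    ["Wireless Internet", "Heating", "TV"],
]
print(top_amenities(amenity_lists, n=3))
```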

Top 5 amenities

This graph shows that Wireless Internet, Heating, Kitchen, Essentials and Smoke Detector are the most common amenities, offered by most of the AirBnbs.

Deployment

I used Google Colab to run my code; you can also use Jupyter Notebook. The libraries used and a detailed breakdown of the code are available on GitHub and explained in the README.

This is the GitHub Link: https://github.com/AkshayJaitly/Boston-AirBnb-Analysis.

What’s next:

  • Use predictive analytics to forecast future prices.
  • Determine seasonal effects on pricing.
  • Understand more about Superhosts and what makes them special.

Feel free to follow me on GitHub.

Akshay Jaitly

Software Engineer; Master’s in Computer Science from New Jersey Institute of Technology