Relationship Between Popular and Unpopular Neighborhoods for Airbnb listings in Los Angeles, and comparisons using price and review activity

My Tran
INST414: Data Science Techniques
5 min read · Feb 12, 2024

The question at hand is this: what are the most popular neighborhoods for Airbnb listings in Los Angeles, and how do they compare in terms of average price and average reviews? Property investors and Airbnb hosts are the main stakeholders here, because they benefit from insight into neighborhood popularity and how it relates to pricing and review engagement. This analysis is meant to inform their decisions about property investment and pricing strategy.

To answer this question, I will use a dataset with information on Airbnb listings in Los Angeles as of December 03, 2023. The dataset includes fields such as ‘neighbourhood_group’, ‘neighbourhood’, ‘room_type’, ‘price’, and ‘number_of_reviews’. These are the fields I will use for exploratory data analysis. They describe the neighborhoods within Los Angeles, the types of rooms available for booking, the prices of the listings, and total review activity, which lets me analyze neighborhood popularity based on listing count, average price, and reviews.

This dataset is publicly available on InsideAirbnb.com, a project “sourced from publicly available information from the Airbnb site. The data has been analyzed, cleansed and aggregated to facilitate public discussion.” (InsideAirbnb). The data is provided in CSV format, and I will load it into a Pandas dataframe for analysis.
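As a minimal sketch of that loading step, the snippet below reads only the five fields named above. The sample rows are made up for illustration, and the helper name `load_listings` is hypothetical; with the real download you would pass the path of the listings CSV instead of the in-memory string.

```python
import io

import pandas as pd

# Made-up sample rows standing in for the real InsideAirbnb listings file.
SAMPLE_CSV = """id,neighbourhood_group,neighbourhood,room_type,price,number_of_reviews
1,City of Los Angeles,Hollywood,Entire home/apt,150,40
2,City of Los Angeles,Venice,Private room,95,12
3,Other Cities,Beverly Hills,Entire home/apt,900,3
"""

# The fields this analysis relies on.
COLUMNS = [
    "neighbourhood_group",
    "neighbourhood",
    "room_type",
    "price",
    "number_of_reviews",
]


def load_listings(source):
    """Read only the analysis columns from a CSV path or file-like object."""
    return pd.read_csv(source, usecols=COLUMNS)


listings = load_listings(io.StringIO(SAMPLE_CSV))
print(listings.dtypes)
```

Restricting `usecols` at read time keeps the dataframe small and makes it obvious which fields the rest of the analysis depends on.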

Before going into this analysis, I have the following assumptions as well as predictions for results.

  1. Popular neighborhoods will have more listings.
  2. The above does not necessarily mean that those listings will, on average, have higher prices than listings in areas with fewer listings.
  3. The total number of reviews reflects total consumer activity. This also means that if a listing has fewer reviews, fewer people have booked that listing.
  4. Price and number of reviews are probably related: the higher the price, the fewer people would want to book the listing (which would be reflected in fewer total reviews).

First, I cleaned the data. This involved removing columns that were not needed for this analysis and dropping rows (listings) with missing values. After cleaning, I calculated simple summary statistics for price and number of reviews, grouped by neighborhood group and by neighborhood. This provides an overview of how the data is distributed and helps identify any anomalies. The following are my findings:
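The cleaning and summary steps described here could look roughly like the sketch below. The rows are made up, and the `host_name` column simply stands in for an unneeded field; the actual columns dropped in my notebook may differ.

```python
import pandas as pd

# Made-up rows; "host_name" stands in for a column the analysis does not need.
raw = pd.DataFrame(
    {
        "host_name": ["A", "B", "C", "D", "E"],
        "neighbourhood": ["Hollywood", "Hollywood", "Venice", "Venice", "Venice"],
        "price": [100.0, 200.0, 80.0, None, 120.0],
        "number_of_reviews": [10, 30, 5, 7, 15],
    }
)

# Keep only the needed columns, then drop listings with missing values.
keep = ["neighbourhood", "price", "number_of_reviews"]
clean = raw[keep].dropna()

# Per-neighborhood summary: listing count, mean and median price, mean reviews.
summary = clean.groupby("neighbourhood").agg(
    listings=("price", "size"),
    mean_price=("price", "mean"),
    median_price=("price", "median"),
    mean_reviews=("number_of_reviews", "mean"),
)
print(summary)
```

Reporting the median next to the mean is a cheap way to spot neighborhoods where a few very expensive listings pull the average up.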

Top 10 Median Airbnb Neighborhoods by Price
Top 10 Airbnb Los Angeles Neighborhoods by Price
Top 10 Airbnb Los Angeles Neighborhoods by Reviews

At a glance, you can see that neighborhoods with high prices tend to have fewer total reviews, while the places with the most reviews have prices in the range of 80–350 dollars.
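One way to put a number on this impression is a rank correlation between price and review count. This is not something I computed in the original analysis; it is a sketch on made-up values, using the fact that Spearman's coefficient equals the Pearson correlation of the ranks.

```python
import pandas as pd

# Made-up listing-level values, for illustration only.
df = pd.DataFrame(
    {
        "price": [60, 90, 120, 250, 400, 900],
        "number_of_reviews": [80, 95, 60, 30, 12, 2],
    }
)

# Spearman rank correlation, computed as the Pearson correlation of the ranks.
rho = df["price"].rank().corr(df["number_of_reviews"].rank())
print(f"Spearman rho between price and reviews: {rho:.3f}")
```

A value near -1 would support the idea that pricier listings collect fewer reviews, while a value near 0 would suggest no consistent ordering. Ranks are used instead of raw values so that a handful of extremely expensive listings does not dominate the result.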

Next, we will analyze neighborhood popularity by counting the number of listings in each neighborhood group and neighborhood. Here is that data at a glance:
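Counting listings per neighborhood can be sketched with `value_counts`, which sorts from most to fewest listings by default. The neighborhood counts below are made up, and `n` is set to 2 only to keep the example small (the figure uses 5).

```python
import pandas as pd

# Made-up neighbourhood column: one entry per listing.
neighbourhoods = pd.Series(
    ["Hollywood"] * 4 + ["Venice"] * 3 + ["Long Beach"] * 2 + ["Topanga"],
    name="neighbourhood",
)

# Listings per neighborhood, sorted from most to fewest.
counts = neighbourhoods.value_counts()

n = 2  # the post uses the top 5 and bottom 5
top = counts.head(n)
bottom = counts.tail(n)

print("Most listings:\n", top)
print("Fewest listings:\n", bottom)
```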

Top 5 and bottom 5 Los Angeles Neighborhoods by number of listings

Additionally, I will compare the average listing prices between the above neighborhoods:

Average price of listings for specified neighborhoods sorted from highest to lowest prices.
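A comparison like this one can be sketched by filtering to the neighborhoods of interest, averaging price per neighborhood, and sorting from highest to lowest. The prices and the `selected` list below are made up for illustration.

```python
import pandas as pd

# Made-up listings, for illustration only.
listings = pd.DataFrame(
    {
        "neighbourhood": [
            "Hollywood",
            "Hollywood",
            "Beverly Hills",
            "Beverly Hills",
            "Topanga",
        ],
        "price": [150, 250, 700, 900, 120],
    }
)

# Hypothetical selection; in the post this is the top 5 and bottom 5 by listing count.
selected = ["Hollywood", "Beverly Hills", "Topanga"]

avg_price = (
    listings[listings["neighbourhood"].isin(selected)]
    .groupby("neighbourhood")["price"]
    .mean()
    .sort_values(ascending=False)
)
print(avg_price)
```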

For the most part, the areas with the most listings have higher average prices than the areas with few listings.

Overall, I would say that the findings somewhat support my previously stated assumptions, but they leave questions as well. More listings did not necessarily mean higher prices: the price gap between Hollywood and Beverly Hills is significant even though the two are relatively close in number of listings. However, all of the neighborhoods with few listings sit at the bottom of the average price list, which points the other way. As for the relation between price and number of reviews, in a broad sense there were fewer reviews in high-price neighborhoods than in more reasonably priced ones. On a closer look, though, this is harder to confirm: in the breakdown of top neighborhoods by reviews, there is a bump in price for some neighborhoods even though their review counts are high.

There are a number of limitations to my analysis. The dataset may not capture all Airbnb listings in Los Angeles, potentially introducing sampling bias. It only represents listings at the time of data collection (December), so listings added or removed after such a busy time of year are not included. Additionally, the data relies on information self-reported by hosts and guests, which may be subject to inaccuracies or biases: some fields are entered on the site by the hosts, who can change them over time. External factors such as seasonality or events may also influence neighborhood popularity and pricing trends, and these are not accounted for in the analysis.

Limitations of the analysis itself include the lack of weighting and consideration for other factors that may influence both the price and the popularity of a neighborhood. I may also not have cleaned the data adequately: while using averages, I noticed that the top results for price were very high. This may have occurred because of extreme outliers in listing prices, so checking the data for those outliers and excluding them may have improved the analysis.
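One common way to screen for such outliers is the 1.5 × IQR rule. I did not apply it in this analysis; the sketch below uses made-up prices to show how a single extreme listing can distort an average, and the 1.5 multiplier is a convention rather than a tuned choice.

```python
import pandas as pd

# Made-up prices with one extreme listing.
prices = pd.Series([80, 95, 110, 120, 150, 180, 10000], name="price")

# Interquartile range and the conventional 1.5 * IQR fences.
q1, q3 = prices.quantile([0.25, 0.75])
iqr = q3 - q1
lower, upper = q1 - 1.5 * iqr, q3 + 1.5 * iqr

# Keep only the prices that fall inside the fences.
trimmed = prices[prices.between(lower, upper)]

print("Mean before trimming:", round(prices.mean(), 2))
print("Mean after trimming:", round(trimmed.mean(), 2))
```

Applying the rule within each neighborhood rather than across the whole city may be fairer, since a price that is extreme in one area can be ordinary in another. Using the median instead of the mean is a simpler alternative that does not discard any listings.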

I would also like to note that the assumptions I listed have to be valid in order for the analysis to be valid as well. If any of those assumptions is wrong, it may throw my analysis out the window. In the future, it would be a good idea to make sure those assumptions are clearly stated and checked.

In the future, I would also like to include visualizations, such as heatmaps, that use the other columns available in the data. This might make it easier to draw conclusions, especially for stakeholders.

The code I used for this analysis can be found here: https://github.com/vitamyon/INST414.git
