An Exploratory Data Analysis using MySQL to find the perfect Airbnb

Follow this guide to discover how you can save up to €500 while still booking a top-reviewed Airbnb

Noshin Nawar Neha
Published in Learning Data
6 min read · Aug 15, 2024

When planning a family trip, it is important to manage the accommodation budget. After all, accommodation is one of the largest expenses, and striking the right balance between comfort and cost is essential for enjoying the trip without breaking the bank.

This is where Airbnb comes in as a popular choice for families. Unlike traditional hotels, Airbnb listings often offer a kitchen, multiple bedrooms, and a living space, so families can enjoy the comforts of home while exploring a new city. The variety of listings also makes it easier for families to find a place that fits their needs and wallets.

Objective

Meet Mr. Harry and his family. They live in Malaysia. Although he has traveled to Singapore and Hong Kong before, he has always stayed with relatives or friends on those trips. Now he is planning to visit Berlin for the first time and wants to find an Airbnb that suits his plans perfectly.

In my latest data analysis project, my goal is to help Mr. Harry and his family find an Airbnb that suits their preferences and budget by analyzing factors such as price, location, and reviews. I want to simplify his search so that he does not have to worry about finding a suitable place to stay.

Requirements

The family has a few requirements for the search:

  1. The family wants to rent an Airbnb that is within their budget and well-reviewed.
  2. They would like to see the average prices first and, depending on those averages, are open to adjusting their budget of €120 per night.
  3. They prefer renting an entire apartment or a hotel room as a family.
  4. They need the Airbnb to be available for a continuous 10-day stay.
  5. They are particularly interested in popular neighborhoods like Mitte, Friedrichshain-Kreuzberg, Pankow, or Neukölln.

Dataset

The Inside Airbnb website offers seven different files for each city or region, and the datasets are updated periodically, which makes it a good source of up-to-date data for projects. The version I used was last updated on June 22, 2024. After reviewing the project requirements, I started with the ‘listings.csv’ file, which contains information and key metrics for listings in Berlin, Germany. It has 13,759 rows and 18 columns, and it comes with a data dictionary.

Data Preprocessing

The CSV file is already clean and well formatted, so it needed no further cleaning. However, I ran into problems while importing the data from my local drive into MySQL Workbench. Several Python packages are available for this: MySQLdb, MySQL Connector/Python, and PyMySQL.

I first tried MySQLdb and kept getting errors. I then searched for the problem and found a YouTube video suggesting that MySQL Connector/Python would solve it, but that did not work either. In the end, the PyMySQL package solved the problem. I am not saying this is the best package; try them and see what works for you.

First, create a database in MySQL. I named it ‘airbnb_analysis’. You can use any name of your choice. This name will be used in the code as well.

CSV transfer from local drive to MySQL
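
The original transfer script is not reproduced in this text, so here is a minimal sketch of how such a load might look. It makes several assumptions: the column names follow the Inside Airbnb summary listings file, only a subset of the 18 columns is loaded, the sample rows are made up, and the helper `load_listings` is a hypothetical name. To keep the sketch runnable anywhere, it uses an in-memory SQLite database as a stand-in for MySQL.

```python
import csv
import io
import sqlite3

# Tiny made-up CSV standing in for listings.csv (not real Inside Airbnb rows).
SAMPLE_CSV = """id,name,neighbourhood_group,room_type,price,number_of_reviews,availability_365
1,Flat A,Mitte,Entire home/apt,95,310,120
2,Room B,Pankow,Private room,40,12,30
3,Hotel C,Neukölln,Hotel room,,5,200
"""

# Assumed subset of the columns in the summary listings file.
COLUMNS = ["id", "name", "neighbourhood_group", "room_type",
           "price", "number_of_reviews", "availability_365"]


def load_listings(conn, csv_file, placeholder="%s"):
    """Create a `listings` table and bulk-insert rows from an open CSV file.

    `placeholder` is the DB-API parameter marker: "%s" for PyMySQL,
    "?" for sqlite3.
    """
    cur = conn.cursor()
    cur.execute("DROP TABLE IF EXISTS listings")
    cur.execute(
        "CREATE TABLE listings ("
        "id BIGINT PRIMARY KEY, name VARCHAR(255), "
        "neighbourhood_group VARCHAR(100), room_type VARCHAR(50), "
        "price DECIMAL(10, 2), number_of_reviews INT, availability_365 INT)"
    )
    reader = csv.DictReader(csv_file)
    # Empty CSV cells become NULL instead of empty strings.
    rows = [tuple(row[col] or None for col in COLUMNS) for row in reader]
    marks = ", ".join([placeholder] * len(COLUMNS))
    cur.executemany(
        f"INSERT INTO listings ({', '.join(COLUMNS)}) VALUES ({marks})", rows
    )
    conn.commit()
    return len(rows)


# Stand-in run against in-memory SQLite so the sketch executes without a server.
conn = sqlite3.connect(":memory:")
inserted = load_listings(conn, io.StringIO(SAMPLE_CSV), placeholder="?")
print(inserted, "rows loaded")
```

Against a real MySQL server, the same function should work with a connection from `pymysql.connect(host=..., user=..., password=..., database="airbnb_analysis")`, the default `"%s"` placeholder, and the file opened with `open("listings.csv", newline="", encoding="utf-8")`. The credentials and file path are yours to fill in.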

Analysis Begins

Number of Listings and Average Prices by Room Type in Popular Neighborhoods

Here I counted the listings and calculated their average prices in the popular neighborhoods, broken down by room type. This gives Mr. Harry an idea of the prices and helps fix the budget for the rest of the analysis.

Number of Listings and Average Prices by Room Type in Popular Neighborhoods
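
A query of this kind could be sketched as follows. This is not the author's exact query: the column names `neighbourhood_group`, `room_type`, and `price` are assumed from the Inside Airbnb summary listings file, the rows are made up, and in-memory SQLite stands in for MySQL. The SQL itself uses only constructs that MySQL also supports.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE listings (id INTEGER, neighbourhood_group TEXT, "
    "room_type TEXT, price REAL)"
)
# Made-up rows for illustration only.
conn.executemany("INSERT INTO listings VALUES (?, ?, ?, ?)", [
    (1, "Mitte", "Entire home/apt", 100.0),
    (2, "Mitte", "Entire home/apt", 140.0),
    (3, "Pankow", "Private room", 50.0),
    (4, "Spandau", "Entire home/apt", 80.0),  # outside the preferred areas
    (5, "Neukölln", "Hotel room", None),       # no price: skipped by AVG
])

# Count listings and average the nightly price per neighborhood and room type.
QUERY = """
SELECT neighbourhood_group,
       room_type,
       COUNT(*)             AS listings,
       ROUND(AVG(price), 2) AS avg_price
FROM listings
WHERE neighbourhood_group IN
      ('Mitte', 'Friedrichshain-Kreuzberg', 'Pankow', 'Neukölln')
GROUP BY neighbourhood_group, room_type
ORDER BY neighbourhood_group, room_type
"""
summary = conn.execute(QUERY).fetchall()
for row in summary:
    print(row)
```

Note that `COUNT(*)` still counts listings without a price, while `AVG(price)` ignores them, so the two figures can be based on slightly different sets of rows.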

Recommendation

Because many listings are priced around €120, I suggested keeping the initial budget of €120 per night.

Total Number of Apartments and Hotel Rooms Available Under €120 in Preferred Neighborhoods

Mr. Harry prefers apartments and hotel rooms, so I wanted to see how many of each are available for under €120 per night. The number was surprisingly high.

Total Number of Apartments and Hotel Rooms Available Under €120 in Preferred Neighborhoods
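
As a rough sketch of this step, the count can be expressed as a filtered `GROUP BY`. I am assuming here that apartments appear as `'Entire home/apt'` and hotel rooms as `'Hotel room'` in the `room_type` column. The rows are made up, and SQLite again stands in for MySQL.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE listings (id INTEGER, neighbourhood_group TEXT, "
    "room_type TEXT, price REAL)"
)
# Made-up rows for illustration only.
conn.executemany("INSERT INTO listings VALUES (?, ?, ?, ?)", [
    (1, "Mitte", "Entire home/apt", 90.0),
    (2, "Pankow", "Entire home/apt", 119.0),
    (3, "Neukölln", "Entire home/apt", 150.0),  # over budget
    (4, "Mitte", "Hotel room", 110.0),
    (5, "Mitte", "Private room", 45.0),         # wrong room type
    (6, "Spandau", "Entire home/apt", 70.0),    # outside the preferred areas
])

# Count apartments and hotel rooms priced under EUR 120 per night.
QUERY = """
SELECT room_type, COUNT(*) AS listings_under_budget
FROM listings
WHERE room_type IN ('Entire home/apt', 'Hotel room')
  AND price < 120
  AND neighbourhood_group IN
      ('Mitte', 'Friedrichshain-Kreuzberg', 'Pankow', 'Neukölln')
GROUP BY room_type
"""
counts = dict(conn.execute(QUERY).fetchall())
print(counts)
```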

Recommendation

Because there are many more apartments than hotel rooms, I advised him to focus only on apartments for a better selection within the budget.

Total Number of Listings by 10-Day Price Range (Available for at Least 50 Days in the Next Year)

From here on, the analysis is limited to apartments under €120/night (€1200 for 10 days). Although Mr. Harry only needs a continuous 10-day stay, I focused on apartments that are available for at least 50 days between June 22, 2024 (the date the dataset was last updated) and June 22, 2025.

Total Number of Listings by 10-Day Price Range in Preferred Neighborhoods
Number of Listings by 10-Day Price Range
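
One way to bucket listings by their 10-day price is a `CASE` expression over `price * 10`. The sketch below rests on assumptions: the bucket boundaries are illustrative (only the €260–€700 range appears in this article), `availability_365` is assumed to hold the number of available days in the coming year, and the rows are made up. SQLite stands in for MySQL.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE listings (id INTEGER, neighbourhood_group TEXT, "
    "room_type TEXT, price REAL, availability_365 INTEGER)"
)
# Made-up rows for illustration only.
conn.executemany("INSERT INTO listings VALUES (?, ?, ?, ?, ?)", [
    (1, "Mitte", "Entire home/apt", 20.0, 200),      # 10 days = 200
    (2, "Mitte", "Entire home/apt", 45.0, 90),       # 10 days = 450
    (3, "Pankow", "Entire home/apt", 70.0, 365),     # 10 days = 700
    (4, "Neukölln", "Entire home/apt", 110.0, 60),   # 10 days = 1100
    (5, "Pankow", "Entire home/apt", 60.0, 10),      # available < 50 days
    (6, "Mitte", "Entire home/apt", 130.0, 300),     # over EUR 120 per night
    (7, "Mitte", "Hotel room", 50.0, 100),           # not an apartment
])

# Group apartments by the cost of a 10-day stay (illustrative buckets).
QUERY = """
SELECT CASE
           WHEN price * 10 < 260  THEN 'under 260'
           WHEN price * 10 <= 700 THEN '260-700'
           ELSE 'over 700'
       END      AS price_range,
       COUNT(*) AS listings
FROM listings
WHERE room_type = 'Entire home/apt'
  AND price < 120
  AND availability_365 >= 50
  AND neighbourhood_group IN
      ('Mitte', 'Friedrichshain-Kreuzberg', 'Pankow', 'Neukölln')
GROUP BY price_range
"""
by_range = dict(conn.execute(QUERY).fetchall())
print(by_range)
```

Keep in mind that a yearly availability count does not by itself show that 10 of those days are consecutive; that still has to be checked per listing.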

Recommendation

Although the budget of €1200 for 10 days was fixed initially, I recommended considering the €260–€700 range instead, as it also offers a lot of options and allows for better cost management.

Curated List of Highly Reviewed Apartments in Berlin within the €260–€700 Range

This is the final and most anticipated part of the analysis. I did it in two steps:

  1. First, I found the highest number of reviews among listings in the €260–€700 price range. The maximum was 520, which I rounded up to 600 as the upper bound. Taking the top 15 review counts then gives a review range of 260–600.
  2. With the review range set and all the client requirements applied, I got a list of 15 available apartments.

Review range and Curated List of Highly Reviewed Apartments
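
The two steps can be sketched roughly like this. It is an illustration under assumptions, not the author's query: the review count is assumed to live in `number_of_reviews`, the shared `FILTERS` string is a hypothetical convenience, the rows are made up, and SQLite stands in for MySQL.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE listings (id INTEGER, name TEXT, neighbourhood_group TEXT, "
    "room_type TEXT, price REAL, number_of_reviews INTEGER, "
    "availability_365 INTEGER)"
)
# Made-up rows for illustration only.
conn.executemany("INSERT INTO listings VALUES (?, ?, ?, ?, ?, ?, ?)", [
    (1, "Loft A", "Mitte", "Entire home/apt", 60.0, 520, 200),
    (2, "Flat B", "Pankow", "Entire home/apt", 45.0, 300, 90),
    (3, "Flat C", "Neukölln", "Entire home/apt", 50.0, 120, 150),  # few reviews
    (4, "Flat D", "Mitte", "Entire home/apt", 110.0, 580, 300),    # too expensive
    (5, "Flat E", "Friedrichshain-Kreuzberg", "Entire home/apt", 65.0, 410, 20),
])

# Client requirements shared by both steps.
FILTERS = """
    room_type = 'Entire home/apt'
    AND price * 10 BETWEEN 260 AND 700
    AND availability_365 >= 50
    AND neighbourhood_group IN
        ('Mitte', 'Friedrichshain-Kreuzberg', 'Pankow', 'Neukölln')
"""

# Step 1: highest review count within the price range.
max_reviews = conn.execute(
    f"SELECT MAX(number_of_reviews) FROM listings WHERE {FILTERS}"
).fetchone()[0]

# Step 2: up to 15 listings within the chosen review range, most reviewed first.
curated = conn.execute(f"""
    SELECT id, name, neighbourhood_group,
           price * 10 AS ten_day_price, number_of_reviews
    FROM listings
    WHERE {FILTERS}
      AND number_of_reviews BETWEEN 260 AND 600
    ORDER BY number_of_reviews DESC
    LIMIT 15
""").fetchall()

print(max_reviews)
for row in curated:
    print(row)
```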

To ensure a smooth and enjoyable trip, it is important to select accommodation that matches Mr. Harry's budget, preferences, and availability. This analysis narrowed 13,759 listings down to just 15 while bringing the cost down by up to €500 from the initial budget.

In Closing

Thank you for taking the time to read through this analysis. All the code is also available on my GitHub.


Noshin Nawar Neha
Learning Data

I am not a robot, I am just a data science enthusiast. LinkedIn -> https://www.linkedin.com/in/noshin-nawar-neha/ Let's connect...