Exploratory Data Analysis on Airbnb Properties in London
In this post, i am going to try to answer that Where should I buy new property for investment in London? This is my first project at Istanbul Data Science Academy about Exploratory Data Analysis (EDA).
At the beginning of my project, I would like to explain What is Exploratory Data Analysis(EDA)?
What is Exploratory Data Analysis
“Exploratory Data Analysis is a detective work. EDA can never be the whole story, but nothing else can serve as the foundation stone — as the first step ” –John W. Tukey, 1970
Exploratory Data Analysis is a crucial step for all data science projects. In real world problems, data never ever used directly to your data science projects. Collecting datasets, cleaning, organising and wrangling data is critical process to develop your model. In this project, I spent nearly %80 of my time about collecting, cleaning, organising and wrangling data at my project.
Exploratory Data Analysis helps you that,
- understanding the structure of data ,
- handling missing values and outliers,
- recognising relationship beetween variables,
Briefly, Exploratory Data Analysis tell us almost everything about data.
Introduction to Project
Every data science project starts with a problem / question. In my project, my business need scenario is:
Finding best location to buy a new property that will be hosted on Airbnb for investment. Constraints regarding this need are as follows:
Total Budget : £750.000
Preferred Location: Inner London
My solution is : Using Airbnb London data, House Prices Index (House Price, Sales Volume, Yearly Increment) data, Council Tax data, GIS Boundary data and analyse to find best location for buying new property.
My objectives about this project : Answering these questions for each borough;
- Number of properties that host on Airbnb
- Average price per night
- Estimate Occupancy rate
- Calculate annual income, annual expense, annual return, initial expense (tax, convenyacing,furniture etc.)
- Calculate average house price, annual increment and sales volume
- Calculate return of investment rate and total profit
You can see my workflow that I apply in this project below;
Data Sources and Tools
I obtained 2 types of data sources. One is Inside Airbnb data for detail property listing data. I decided to use the data for the 2019 to avoid being affected by the epidemic (covid-19). This dataset also include GeoJSON file of neighbourhoods of the city and airbnb properties location (latitude and longitude). So after my finishing analysing, I will be create some thematic maps.
Other data source is obtained from Mayor of London. London Data Store has hundreds of dataset group by 18 topics. My focus dataset is “UK House Price Index” located under housing topic.
My tools set is shown below;
I used;
- Jupyter Notebook on Anaconda distribution with Python Programming Language.
- Pandas and Scikit Learn libraries to data acquiring, cleaning and wrangling,
- Seaborn and Matplotlib libraries to data visualisation.
- QGIS to creating thematic map and spatial analysis.
- Tableau Public to analytics platform.
Data Wrangling and Analysing
Data Wrangling is a road from raw data to insights, report and visual graphs.
When you explore your data, you can use several inbuilt method that you need in Python such as info, shape, isnull, unique, count, describe, groupby, sort_values, pivot, merge etc.
Brief insights from data is shown below:
The following fields were derived and added to the airbnb listing dataset:
average: Average house price. Data obtained from UK_House_Price_Index and calculated mean values from monthly for each year, based on borough includes from 1996 to 2019.
est_night/year: Airbnb dataset didn’t include occupancy rate, so you need to estimated occupancy rate because this is a very critical metric to calculate annual income. Anyone who stayed at Airbnb Property, may not write review after their accommodation, so it would be appropriate to multiply by a factor. I assume that ; minimum_nights * reviews_per_month * 12 * factor
factor: When I estimate of occupancy rate for each borough, I use factor (coefficient) value. For Inner London’s Borough, factor is 1.50 and Outer London’s Borough, factor is 1.25.
occ_rate: The rate of occupancy per year. est_night/year / 365
annual_income: price * estimated night / year
avg_increment: average yearly increment for house price.Data obtained from UK_House_Price_Index and calculated for each month compared to the previous month with pct_change() method.
inner_outer: Borough’s locotion. This columns set manually for these boroughs [‘Kensington and Chelsea’, ‘Westminster’, ’Hammersmith and Fulham’, ‘Wandsworth’, ’Hackney’, ‘Southwark’, ’Tower Hamlets’, ‘Camden’, ’Lambeth’, ‘Greenwich’, ‘City of London’,’Islington’] to inner and others are outer.
avg_council_tax: Average council tax calculated from 8 different band’s (Band A, Band B,Band C, Band D, Band E, Band F, Band G, Band H) mean value belong to 2020–21 year
montly_fixed_cost: This value includes for monthly expensed such as insurance, electric, gas, water, internet, maintenance and other.
annual_expense: Annual expenses for your property. (montly_fixed_cost * occ_rate * 12) + (avg_council_tax)
annual_return: annual_income-anual_expense
sd_cost: Stamp Duty cost. If you buy a property at over £125.000 you have to pay tax (Stamp Duty Land Tax). I calculated for each boroug’s tax regard on average house price. The calculation of stam duty land tax shown below;
other_initial_cost: If you buy a new property that host on airbnb, you need to buy furniture also and you need to pay for other cost such as conveyancing, removal etc. I assume that £10.000 is needed for initial cost only once.
total_initial: sd_cost + other_initial_cost
total_budget: When you buy a new property you need to pay your house price + total initial (stamp duty land tax ant other initial cost). For example; average house price for Wandsworth is £588613 but you need budget £618043 because of total initial expenses.
avg_volume/year: average sales volume per year. Data obtained from UK_House_Price_Index
avg_volume/year_scale: average sales volume per year normalized beetween 0 and 1.
avg_h_pri_after1: Average house price after 1 year. If you need to sell your house after 1 year, it is important to estimate your house price.
(average) + ((average * avg_increment * avg_volume/year_scale) / 100)
ROI: Return of Investment. This metric calculated like that;
(annual_return / total_budget)*100
profit_aft1: Net Profit value after 1 year. If you need to sell your house after 1 year, your net profit calculated shown below ;
avg_h_pri_after1 + annual_income-total_budget-annual_expenses-(avg_h_pri_aft1*1.00/100)
I assumed that when you sell your property you can pay some cost nearly 1% of your sales price.
Some Graphs
Conclusion
At the end of the analysing, I used normalised value (between 0 and 1) for all metrics . 0 shows unfavourable (negative) for investor and 1 shows favourable (positive) for investor.
After analysing all data ragards on my client’s constraints (total budget and preferred location) I suggested on 2 different topic.
According to the Return of Investment Rate:
Tower Hamlets, Southwark and Wandsworth are most suitable boroughs for buying new property. Because of metrics that shown below:
- Price/Night
- Occupancy
- Annual Return (Income — Expense)
- House Price (buying)
- Initial Expenses
According to the Net Profit (After 1 year)
Wandsworth, Lambeth and Soutwark are most suitable boroughs for buying new property. Because of metrics that shown below:
- Price/Night
- Occupancy
- Annual Return (Income — Expense)
- House Price (buying)
- Initial Expenses
- Annual Increment
- Sales Volume
- House Price (selling)
Future Work
This project can be extended further by;
- Analysing UK House Price Index based on zip code at Wandsworth , Tower Hamlets, Lambeth and Southwark.
- Using booking.com data for average accommodation price.
- Using reviews data (airbnb, booking) & social media (Instagram, Twitter, Foursquare) and acquire comments about location.
I have used fundamental skills and tools required by a data scientist. This was a great way to wrap up my first project at Istanbul Data Science Academy!
If you interested Python Notebook Code and additional data you can reach my GitHub repo.
If you would like to explore my Vizzes you can look at my Tableau Public profile.
You can access the presentation of project at my SlideShare.
Thanks for reading my post. Feel free to contact me if you have any questions or if you’d like to share your comments.