Exploratory Data Analysis Of Istanbul Rental Flat Market FEB 2023 via Web Scraping

Hikmet Emre Guler
Feb 21, 2023


Hello everyone,
Today we will talk about one of the most beautiful cities on earth: Istanbul.

Istanbul is a vibrant and bustling city located at the crossroads of Europe and Asia. As the largest city in Turkey, Istanbul is home to over 15 million people and is a hub of culture, commerce, and tourism. Its real estate market has seen significant growth in recent years, with the rental flat market being particularly dynamic. With its unique blend of history and modernity, Istanbul is a fascinating subject for data analysis in the real estate sector.

Unfortunately, the economic situation in Turkey is not going very well. Over the last year, Turkey has had an inflation rate of 85.51%, and there is still no economic stability. Analyses have to be up to date!

I decided to do a data analysis of the Istanbul rental flat market. To do this, I had to create my own dataset from the web.

Web scraping is a brilliant way to create an up-to-date dataset. My data source is hepsiemlak.

Via web scraping, I collected data on 11,143 rental flats. After the EDA process, we have some interesting numbers and visualizations…

Required Tools For Data Analysis

A Short HTML Introduction

HTML (Hypertext Markup Language) is a markup language used to create and structure content for the web. It contains necessary data for web scraping, such as text content, images, links, and other elements that make up a webpage. With the right tools and techniques, web scrapers can extract and parse this data from HTML files to build datasets, analyze trends, and inform business decisions. Understanding HTML and its structure is a crucial step in successful web scraping.

HTML attributes provide additional information about an HTML element. They are used to modify the behaviour and appearance of HTML elements and can be added to opening tags. Attributes can be used to specify the size, colour, font, and other features of an element, or to provide links, alt text, and other metadata. Attributes can also be used to define classes and IDs, which are important for styling and script targeting. Understanding how to use HTML attributes is essential for creating well-structured, accessible, and responsive web pages.

  1. class attribute: This is used to define a class for an HTML element, which can then be used for CSS styling or JavaScript targeting. Example: <div class="container">
  2. id attribute: This is used to define a unique identifier for an HTML element, which can be used for JavaScript targeting or as a fragment identifier in a URL. Example: <h1 id="main-title">
  3. src attribute: This is used to define the source URL for an image or media element. Example: <img src="image.jpg">
  4. href attribute: This is used to define the destination URL for a link element. Example: <a href="https://www.example.com/">Link</a>
  5. alt attribute: This is used to provide alternative text for an image element, which is displayed if the image cannot be loaded or for accessibility purposes. Example: <img src="image.jpg" alt="Alternative text">
  6. title attribute: This is used to provide a title or tooltip for an HTML element. Example: <a href="https://www.example.com/" title="Visit Example">Link</a>
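To make the list concrete, here is a minimal sketch of reading those attributes with BeautifulSoup, the parser used later in this article. The HTML string is a made-up snippet assembled from the example tags above, not markup from hepsiemlak.

```python
from bs4 import BeautifulSoup

# Made-up snippet built from the example tags in the list above
html = """
<div class="container">
  <h1 id="main-title">Rental flats</h1>
  <img src="image.jpg" alt="Alternative text">
  <a href="https://www.example.com/" title="Visit Example">Link</a>
</div>
"""

soup = BeautifulSoup(html, "html.parser")

div = soup.find("div")
print(div["class"])  # class can hold several names, so BeautifulSoup returns a list

print(soup.find("h1")["id"])

img = soup.find("img")
print(img["src"], img.get("alt"))

link = soup.find("a")
print(link.get("href"), link.get("title"))

# .get() returns None instead of raising KeyError when the attribute is missing
print(img.get("title"))
```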

We first have to discover the website’s HTML elements. To do this I use the Google Chrome browser: just right-click on your target element and click Inspect.

When web scraping, we usually need href links and class names to reach the necessary data. We will see a good example of this below!

So, let’s start web scraping!
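Here is a small sketch of that pattern on a hypothetical listing-page fragment: select the anchors by class name, read their relative `href`, and turn them into absolute URLs. It uses `urljoin` from the standard library instead of the string concatenation in the scraper, and the markup and paths are invented for illustration.

```python
from urllib.parse import urljoin

from bs4 import BeautifulSoup

BASE_URL = "https://www.hepsiemlak.com"

# Hypothetical fragment: two listing cards and one unrelated navigation link
html = """
<a class="card-link" href="/istanbul-example-kiralik/daire/1-1">Flat 1</a>
<a class="card-link" href="/istanbul-example-kiralik/daire/1-2">Flat 2</a>
<a class="nav-link" href="/about">About</a>
"""

soup = BeautifulSoup(html, "html.parser")

# class_ has a trailing underscore because "class" is a reserved word in Python
links = [urljoin(BASE_URL, a["href"]) for a in soup.find_all("a", class_="card-link")]
print(links)
```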

### Necessary Libraries ###

import re
import time

import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
import requests
import seaborn as sns
from bs4 import BeautifulSoup as bts

### Don't forget to keep the browser version in the User-Agent up to date! ###

def getAndParseURL(url):
    # Send a browser-like User-Agent; some sites reject the default one sent by requests
    result = requests.get(url, headers={"User-Agent": "Chrome/110.0.5481.78"}, timeout=30)
    soup = bts(result.text, "html.parser")
    return soup

### Define iteration for collecting website pages. ###
# Page 1 has no "?page=" parameter; pages 2..465 do (465 pages in total)
pages = ["https://www.hepsiemlak.com/istanbul-kiralik"]
for page in range(2, 466):
    pages.append("https://www.hepsiemlak.com/istanbul-kiralik?page=" + str(page))

pages

['https://www.hepsiemlak.com/istanbul-kiralik',
'https://www.hepsiemlak.com/istanbul-kiralik?page=2',
'https://www.hepsiemlak.com/istanbul-kiralik?page=3',
'https://www.hepsiemlak.com/istanbul-kiralik?page=4',
'https://www.hepsiemlak.com/istanbul-kiralik?page=5',
'https://www.hepsiemlak.com/istanbul-kiralik?page=6',
'https://www.hepsiemlak.com/istanbul-kiralik?page=7',
'https://www.hepsiemlak.com/istanbul-kiralik?page=8',
'https://www.hepsiemlak.com/istanbul-kiralik?page=9',
'https://www.hepsiemlak.com/istanbul-kiralik?page=10',
'https://www.hepsiemlak.com/istanbul-kiralik?page=11',
'https://www.hepsiemlak.com/istanbul-kiralik?page=12',
'https://www.hepsiemlak.com/istanbul-kiralik?page=13',
'https://www.hepsiemlak.com/istanbul-kiralik?page=14',
'https://www.hepsiemlak.com/istanbul-kiralik?page=15',
'https://www.hepsiemlak.com/istanbul-kiralik?page=16',
'https://www.hepsiemlak.com/istanbul-kiralik?page=17',
'https://www.hepsiemlak.com/istanbul-kiralik?page=18',
....]

Finding href links from HTML

### Define iteration for collecting all rental flat links ###

links = []
for page in pages:
    html = getAndParseURL(page)
    # Each listing card is an <a class="card-link"> with a relative href ("sonuc" = "result")
    for sonuc in html.find_all("a", {"class": "card-link"}):
        links.append("https://www.hepsiemlak.com" + sonuc.get("href"))

links

['https://www.hepsiemlak.com/istanbul-sariyer-kumkoy-kiralik/villa/81759-413',
'https://www.hepsiemlak.com/istanbul-besiktas-levazim-kiralik/daire/119257-27',
'https://www.hepsiemlak.com/istanbul-besiktas-levazim-kiralik/daire/119257-33',
'https://www.hepsiemlak.com/istanbul-besiktas-levazim-kiralik/daire/119257-32',
'https://www.hepsiemlak.com/istanbul-besiktas-levazim-kiralik/daire/119257-30',
'https://www.hepsiemlak.com/istanbul-besiktas-levazim-kiralik/daire/119257-20',
'https://www.hepsiemlak.com/istanbul-besiktas-levazim-kiralik/daire/119257-34',
'https://www.hepsiemlak.com/istanbul-uskudar-kucuksu-kiralik/daire/135125-24',
'https://www.hepsiemlak.com/istanbul-kagithane-ortabayir-kiralik/daire/23180-5622',
'https://www.hepsiemlak.com/istanbul-kagithane-ortabayir-kiralik/daire/23180-5623',
'https://www.hepsiemlak.com/istanbul-umraniye-cakmak-kiralik/daire/135686-15',
'https://www.hepsiemlak.com/istanbul-kagithane-ortabayir-kiralik/daire/23180-5621',
'https://www.hepsiemlak.com/istanbul-besiktas-etiler-kiralik/daire/5575-12603',
'https://www.hepsiemlak.com/istanbul-beykoz-acarlar-kiralik/daire/1045-4481',
'https://www.hepsiemlak.com/istanbul-basaksehir-bahcesehir-2-kisim-kiralik/daire/128883-154',
'https://www.hepsiemlak.com/istanbul-besiktas-nisbetiye-kiralik/residence/23180-5620',
'https://www.hepsiemlak.com/istanbul-besiktas-nisbetiye-kiralik/residence/23180-5619',
'https://www.hepsiemlak.com/istanbul-fatih-sehremini-kiralik/daire/114578-1110',
'https://www.hepsiemlak.com/istanbul-fatih-sehremini-kiralik/daire/114578-1097',
'https://www.hepsiemlak.com/istanbul-fatih-sehremini-kiralik/daire/114578-1109',
'https://www.hepsiemlak.com/istanbul-sisli-tesvikiye-kiralik/residence/129096-2',
'https://www.hepsiemlak.com/istanbul-zeytinburnu-gokalp-kiralik/daire/3448-14022',
'https://www.hepsiemlak.com/istanbul-basaksehir-bahcesehir-2-kisim-kiralik/daire/128883-157',
'https://www.hepsiemlak.com/istanbul-fatih-mevlanakapi-kiralik/daire/128930-325',
'https://www.hepsiemlak.com/istanbul-beyoglu-cihangir-kiralik/daire/33012-5989',
'https://www.hepsiemlak.com/istanbul-sisli-halaskargazi-kiralik/daire/15308-811',
'https://www.hepsiemlak.com/istanbul-zeytinburnu-seyitnizam-kiralik/daire/52328-327',
.......................]
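While the scraper walks through 465 result pages, new ads can push older ones onto the next page, so the same listing may be collected twice. Whether that happened in this run is an assumption, but an order-preserving way to guard against it is `dict.fromkeys`. The URLs below are shortened placeholders.

```python
# Hypothetical scraped links; the second URL appears twice
links = [
    "https://www.hepsiemlak.com/a/daire/1-1",
    "https://www.hepsiemlak.com/b/daire/2-2",
    "https://www.hepsiemlak.com/b/daire/2-2",
    "https://www.hepsiemlak.com/c/daire/3-3",
]

# dict keys are unique and keep insertion order (Python 3.7+),
# so this drops repeats while preserving the scraping order
unique_links = list(dict.fromkeys(links))

print(len(links), len(unique_links))  # 4 3
```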

Getting Rental Flat Data From HTML

We need all the information about each rental flat, such as heat type, rent, age of building, safety deposit, dues, district, etc.

First, we have to find where these data are on the page!
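On the detail pages, each value sits next to a Turkish label (for example "Bina Yaşı", building age). The scraper finds the label text with a regular expression and then takes the tag that follows it. Here is a minimal sketch of that lookup on a made-up fragment; the real page structure may differ and can change at any time. The helper name `value_after_label` is hypothetical, and it passes `"span"` to `find_next` so the result is guaranteed to be a tag in this toy markup.

```python
import re

from bs4 import BeautifulSoup

# Made-up detail-page fragment: each label <span> is followed by its value <span>
html = """
<div class="container det-container">
  <ul>
    <li><span>Bina Yaşı</span><span>5</span></li>
    <li><span>Aidat</span><span>1.250 TL</span></li>
  </ul>
</div>
"""

soup = BeautifulSoup(html, "html.parser")
details = soup.find("div", {"class": "container det-container"})


def value_after_label(container, label):
    """Return the text of the tag after the label, or None if the label is absent."""
    label_node = container.find(string=re.compile(label))
    if label_node is None:
        return None
    value_tag = label_node.find_next("span")
    return value_tag.get_text(strip=True) if value_tag is not None else None


print(value_after_label(details, "Bina Yaşı"))  # 5
print(value_after_label(details, "Aidat"))      # 1.250 TL
print(value_after_label(details, "Depozito"))   # None
```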

### Define iteration for collecting all necessary rental flat data ###
result = []
for sonuc in links:
    html = getAndParseURL(sonuc)
    # Most details sit in this container: find a Turkish label, then take the tag after it
    details = html.find("div", {"class": "container det-container"})

    try:
        # The district name follows the "İstanbul" text in the title area
        district = html.find("div", {"class": "det-title-bottom"}).find(string=re.compile("İstanbul")).find_next().text.strip()
    except AttributeError:  # raised when an element is missing and .find() returned None
        district = np.nan
    try:
        rent = html.find("p", {"class": "fontRB fz24 price"}).text.replace("TL", "").replace(".", "").strip()
    except AttributeError:
        rent = np.nan
    try:
        # "Brüt / Net M2" = gross / net area in m2
        net_area = details.find(string=re.compile("Brüt / Net M2")).find_next().text.replace("m2", "").strip()
    except AttributeError:
        net_area = np.nan
    try:
        # "Daire" = flat; the tag after it holds the number of rooms
        rooms = html.find("ul", {"class": "short-info-list"}).find(string=re.compile("Daire")).find_next().text.replace(" ", "").replace("\n", "")
    except AttributeError:
        rooms = np.nan
    try:
        bath = details.find(string=re.compile("Banyo Sayısı")).find_next().text.strip()  # number of bathrooms
    except AttributeError:
        bath = np.nan
    try:
        number_of_floors = details.find(string=re.compile("Kat Sayısı")).find_next().text.strip()  # floors in the building
    except AttributeError:
        number_of_floors = np.nan
    try:
        floor = details.find(string=re.compile("Bulunduğu Kat")).find_next().text.strip()  # floor the flat is on
    except AttributeError:
        floor = np.nan
    try:
        heat_type = details.find(string=re.compile("Yakıt Tipi")).find_next().text.strip()  # fuel / heating type
    except AttributeError:
        heat_type = np.nan
    try:
        age_of_flat = details.find(string=re.compile("Bina Yaşı")).find_next().text.strip()  # building age
    except AttributeError:
        age_of_flat = np.nan
    try:
        dues = details.find(string=re.compile("Aidat")).find_next().text.replace("TL", "").replace(".", "").strip()  # monthly building fee
    except AttributeError:
        dues = np.nan
    try:
        deposit = details.find(string=re.compile("Depozito")).find_next().text.replace("TL", "").replace(".", "").strip()  # safety deposit
    except AttributeError:
        deposit = np.nan

    # Pause between requests so the site is not flooded
    time.sleep(2)

    ### Collect this flat's values as one row ###
    result.append([district, rent, net_area, rooms, bath, number_of_floors, floor, heat_type, age_of_flat, dues, deposit])

### Define a dataframe ###
# Column names match those used in the analysis below; "Age" is an assumed name (that column is not used later)
df = pd.DataFrame(result, columns=["District", "Rent", "Net Area", "Rooms", "Bath", "NOF", "FLOOR",
                                   "Heat Type", "Age", "Dues", "Deposit"])

### Save as a csv file, then read it back for the analysis ###
df.to_csv("./hepsirent_df.csv", index=False)
df = pd.read_csv("./hepsirent_df.csv")
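The scraper turns amounts such as "25.000 TL" into numbers by stripping the currency and the dot used as the thousands separator. A slightly more defensive sketch of that cleaning step follows. It assumes Turkish number formatting ("." for thousands, "," for decimals), drops any decimal part, and returns None when there is no number at all; the name `parse_tl_amount` is hypothetical.

```python
import re


def parse_tl_amount(text):
    """Turn a Turkish-formatted amount such as '25.000 TL' into an int.

    Assumes '.' is the thousands separator and ',' starts the decimal part,
    which is dropped. Returns None if the text contains no digits.
    """
    if text is None:
        return None
    integer_part = text.split(",")[0]         # drop the decimal part (kuruş)
    digits = re.sub(r"\D", "", integer_part)  # remove dots, spaces and "TL"
    return int(digits) if digits else None


print(parse_tl_amount("25.000 TL"))    # 25000
print(parse_tl_amount("1.250,50 TL"))  # 1250
print(parse_tl_amount("-"))            # None
```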

Exploratory Data Analysis

Alright, now we have a dataset that contains 11,143 rental flats in Istanbul. It holds a lot of information about each flat: its location, heat type, price, dues, safety deposit, etc.

### Getting rid of spaces in column names ###

df.columns = df.columns.str.strip()

### Handling NaN values ###

# Categorical columns: fill missing values with a placeholder
df = df.fillna({"District": "Unknown", "Heat Type": "Unknown", "Rooms": "Unknown",
                "NOF": "Unknown", "FLOOR": "Unknown"})
# Assume one bathroom when the listing does not say
df = df.fillna({"Bath": 1})

# Numeric columns: convert to numbers (unparseable text becomes NaN), then fill with the column mean
for col in ["Net Area", "Rent", "Deposit", "Dues"]:
    df[col] = pd.to_numeric(df[col], errors="coerce")
    df[col] = df[col].fillna(df[col].mean())

### Filtering Rent,Deposit and Dues Values. ###

df= df.loc[(df["Rent"] >= 5000) & (df["Rent"] <= 80000),:]

df= df.loc[(df["Deposit"] >= 5000) & (df["Deposit"] <= 100000),:]

df= df.loc[(df["Dues"] >= 50) & (df["Dues"] <= 5000),:]
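To see what the imputation and range filters do, here is a toy run on invented numbers (not scraped data). `Series.between` is inclusive on both ends by default, so it is equivalent to the pair of `>=`/`<=` comparisons used above.

```python
import numpy as np
import pandas as pd

# Invented toy values; "n/a" mimics a non-numeric scraped string
toy = pd.DataFrame({
    "Rent": [12000, 250000, np.nan, 18000],
    "Deposit": ["24000", "n/a", "30000", "36000"],
})

# Non-numeric strings become NaN, then NaN is replaced by the column mean
toy["Deposit"] = pd.to_numeric(toy["Deposit"], errors="coerce")
toy["Deposit"] = toy["Deposit"].fillna(toy["Deposit"].mean())
toy["Rent"] = toy["Rent"].fillna(toy["Rent"].mean())

# between() keeps 5000 <= Rent <= 80000
filtered = toy[toy["Rent"].between(5000, 80000)]
print(filtered)
```

In this toy run the 250,000 outlier pulls the imputed rent up to about 93,333, so the imputed row is filtered out as well. Filtering outliers before imputing, or imputing with the median, are alternatives worth weighing.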
### Grouping Data For Seeing various distributions by District ###

grouped=df.groupby("District")["Rent"].mean().sort_values(ascending=False)
gropued2=df.groupby("District")["Deposit"].mean().sort_values(ascending=False)
groped3=df.groupby("District")["Dues"].mean().sort_values(ascending=False)
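A district average can rest on only a handful of listings. As a sketch on invented numbers with placeholder district names, `agg` can return the mean next to the number of listings per district, which makes thin groups easy to spot before ranking them. The `MIN_LISTINGS` threshold is an illustrative assumption.

```python
import pandas as pd

# Invented listings with placeholder district names
toy = pd.DataFrame({
    "District": ["District A", "District A", "District B", "District B", "District B", "District C"],
    "Rent": [30000, 40000, 8000, 9000, 10000, 60000],
})

summary = (
    toy.groupby("District")["Rent"]
    .agg(["mean", "count"])
    .sort_values("mean", ascending=False)
)
print(summary)

# Keep only districts with enough listings (threshold chosen for illustration)
MIN_LISTINGS = 2
print(summary[summary["count"] >= MIN_LISTINGS])
```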
### Visualization of "Average Cost Of Rents by Region Istanbul" ###
sns.set_style("whitegrid") # set the style
fig, ax = plt.subplots(figsize=(10,8))
sns.barplot(x=grouped.values, y=grouped.index, palette='Blues_d', orient='h', ax=ax)

ax.set_xlabel('Rent Means', fontsize=14,weight="bold")
ax.set_ylabel('Region', fontsize=14,weight="bold")

plt.xticks(rotation=45, fontsize=12,weight="bold")
plt.yticks(fontsize=12,weight="bold")

# Show x-axis values as Turkish lira with thousands separators
ax.xaxis.set_major_formatter('₺{x:,.0f}')

ax.set_title('Average Cost Of Rents by Region Istanbul', fontsize=16,weight="bold")

plt.show()
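The six bar charts in this section repeat the same styling lines. One way to keep them consistent is a small helper; the sketch below uses plain matplotlib `barh` instead of seaborn, and the name `plot_district_bars` and the demo values are hypothetical. The `matplotlib.use("Agg")` line only lets it run without a display; drop it in a notebook.

```python
import matplotlib

matplotlib.use("Agg")  # headless backend; remove this line in a notebook
import matplotlib.pyplot as plt
import pandas as pd


def plot_district_bars(series, title, xlabel, color="steelblue"):
    """Horizontal bar chart of a district-indexed Series, largest value on top."""
    ordered = series.sort_values()  # barh draws the first item at the bottom
    fig, ax = plt.subplots(figsize=(10, 8))
    ax.barh(ordered.index, ordered.values, color=color)
    ax.set_xlabel(xlabel, fontsize=14, weight="bold")
    ax.set_ylabel("Region", fontsize=14, weight="bold")
    ax.set_title(title, fontsize=16, weight="bold")
    ax.xaxis.set_major_formatter("₺{x:,.0f}")  # lira with thousands separators
    ax.tick_params(axis="x", labelrotation=45)
    fig.tight_layout()
    return fig, ax


# Invented averages, only to exercise the helper
demo = pd.Series({"District A": 35000, "District B": 9000, "District C": 60000})
fig, ax = plot_district_bars(demo, "Average Cost Of Rents by Region Istanbul", "Rent Means")
```

Each chart then becomes one call, for example `plot_district_bars(gropued2, "Average Cost Of DEPOSIT by Region Istanbul", "Deposit Means")` followed by `plt.show()`.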
### For Looking to specific range for Rental Flats ###

top_5 = grouped.sort_values(ascending=False).head(5)  # five most expensive districts
last_5 = grouped.sort_values(ascending=True).head(5)  # five cheapest districts
filtered = grouped[(grouped >= 10000) & (grouped <= 16000)]  # average rent between ₺10,000 and ₺16,000
sns.set_style("whitegrid")
fig, ax1 = plt.subplots(figsize=(8,5))
sns.barplot(x=top_5.values, y=top_5.index, palette='Blues_d', orient='h', ax=ax1)

ax1.set_xlabel('Rent Means', fontsize=14,weight="bold")
ax1.set_ylabel('Region', fontsize=14,weight="bold")

plt.xticks(rotation=45, fontsize=12,weight="bold")
plt.yticks(fontsize=12,weight="bold")

ax1.xaxis.set_major_formatter('₺{x:,.0f}')

ax1.set_title('HIGHEST COST OF RENT BY REGION', fontsize=16,weight="bold")

plt.show()
sns.set_style("whitegrid")
fig, ax2 = plt.subplots(figsize=(8,5))
sns.barplot(x=last_5.values, y=last_5.index, palette='Blues_d', orient='h', ax=ax2)

ax2.set_xlabel('Rent Means', fontsize=14,weight="bold")
ax2.set_ylabel('Region', fontsize=14,weight="bold")

plt.xticks(rotation=45, fontsize=12,weight="bold")
plt.yticks(fontsize=12,weight="bold")

ax2.xaxis.set_major_formatter('₺{x:,.0f}')

ax2.set_title('LOWEST COST OF RENT BY REGION', fontsize=16,weight="bold")

plt.show()
sns.set_style("whitegrid")
fig, ax2 = plt.subplots(figsize=(8,5))
sns.barplot(x=filtered.values, y=filtered.index, palette='Blues_d', orient='h', ax=ax2)

ax2.set_xlabel('Rent Means', fontsize=14,weight="bold")
ax2.set_ylabel('Region', fontsize=14,weight="bold")

plt.xticks(rotation=45, fontsize=12,weight="bold")
plt.yticks(fontsize=12,weight="bold")

ax2.xaxis.set_major_formatter('₺{x:,.0f}')

ax2.set_title('Budget Friendly Regions', fontsize=16,weight="bold")

plt.show()
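Because `grouped` is already sorted, `head(5)` works here, but pandas also provides `nlargest`/`nsmallest`, which do not depend on a previous sort, and `between` for the budget band. A sketch on invented district averages (three districts instead of five, to keep it short):

```python
import pandas as pd

# Invented district averages (not the scraped results)
grouped = pd.Series({
    "District A": 42000, "District B": 15000, "District C": 9000,
    "District D": 12500, "District E": 27000, "District F": 7000,
})

top_3 = grouped.nlargest(3)    # highest averages, largest first
last_3 = grouped.nsmallest(3)  # lowest averages, smallest first
budget = grouped[grouped.between(10000, 16000)]  # inclusive on both ends

print(top_3.index.tolist())   # ['District A', 'District E', 'District B']
print(last_3.index.tolist())  # ['District F', 'District C', 'District D']
print(budget.index.tolist())  # ['District B', 'District D']
```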
### Distribution of Average Deposit by Region ###
sns.set_style("whitegrid")
fig, ax3 = plt.subplots(figsize=(10,8))
sns.barplot(x=gropued2.values, y=gropued2.index, palette='Set2', orient='h', ax=ax3)

ax3.set_xlabel('Deposit Means', fontsize=14,weight="bold")
ax3.set_ylabel('Region', fontsize=14,weight="bold")

plt.xticks(rotation=45, fontsize=12,weight="bold")
plt.yticks(fontsize=12,weight="bold")

ax3.xaxis.set_major_formatter('₺{x:,.0f}')

ax3.set_title('Average Cost Of DEPOSIT by Region Istanbul', fontsize=16,weight="bold")

plt.show()
### Distribution of Average Dues by Region ###
sns.set_style("whitegrid")
fig, ax4 = plt.subplots(figsize=(10,8))
sns.barplot(x=groped3.values, y=groped3.index, palette='Set1', orient='h', ax=ax4)

ax4.set_xlabel('Due Means', fontsize=14,weight="bold")
ax4.set_ylabel('Region', fontsize=14,weight="bold")

plt.xticks(rotation=45, fontsize=12,weight="bold")
plt.yticks(fontsize=12,weight="bold")

ax4.xaxis.set_major_formatter('₺{x:,.0f}')

ax4.set_title('Average Cost Of DUE by Region Istanbul', fontsize=16,weight="bold")

plt.show()
### Distribution of Number of Rooms by Region ###

# Count listings for every (District, Rooms) pair; one column per room type
room_counts = df.groupby(['District', 'Rooms']).size().unstack(fill_value=0)

room_counts.plot(kind='bar', stacked=True, figsize=(12,8))

plt.title('Count of Rooms by District',weight="bold",fontsize=14)
plt.xlabel('District',weight="bold",fontsize=14)
plt.ylabel('Count',weight="bold",fontsize=14)

plt.xticks(fontsize=12,weight="bold")
plt.yticks(fontsize=12,weight="bold")

# Place the legend outside the plot area, to the right
plt.legend(title='Rooms', bbox_to_anchor=(1, 1), loc='upper left')

plt.show();
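Raw counts mostly reflect how many listings each district has. `pd.crosstab(..., normalize="index")` turns each district's row into shares that sum to 1, which makes room mixes comparable across districts of different size. A sketch on invented rows:

```python
import pandas as pd

# Invented listings
toy = pd.DataFrame({
    "District": ["District A"] * 4 + ["District B"] * 2,
    "Rooms": ["2+1", "2+1", "3+1", "1+1", "3+1", "3+1"],
})

counts = pd.crosstab(toy["District"], toy["Rooms"])  # same table as groupby().size().unstack()
shares = pd.crosstab(toy["District"], toy["Rooms"], normalize="index")
print(counts)
print(shares.round(2))

# Stacked bars of shares all reach 1.0, so districts line up:
# shares.plot(kind="bar", stacked=True, figsize=(12, 8))
```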
room_counts = df['Rooms'].value_counts()
# Keep only room types that make up more than 1% of listings
filtered_room_counts = room_counts[room_counts/len(df) > 0.01]

plt.figure(figsize=(8, 8))
plt.pie(filtered_room_counts, labels=filtered_room_counts.index, autopct='%1.1f%%')

plt.title('Distribution of Istanbul Rental Flat Rooms',weight="bold")

plt.show()
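One caveat with dropping the small categories before `plt.pie`: the pie normalises whatever is left, so every remaining slice is shown slightly larger than its true share of all listings. A sketch on invented counts that folds the rare categories into an "Other" slice instead, so the percentages refer to the full dataset:

```python
import pandas as pd

# Invented room-type counts for 100 listings
counts = pd.Series({"2+1": 50, "3+1": 30, "1+1": 17, "4+1": 2, "5+2": 1})

THRESHOLD = 0.02  # same rule as the heat-type pie: keep categories above 2%
shares = counts / counts.sum()

kept = counts[shares > THRESHOLD]
other = counts[shares <= THRESHOLD].sum()
pie_counts = pd.concat([kept, pd.Series({"Other": other})])

print((kept / kept.sum()).round(3).to_dict())  # inflated shares when rare types are dropped
print((pie_counts / pie_counts.sum()).to_dict())  # true shares with an "Other" slice

# plt.pie(pie_counts, labels=pie_counts.index, autopct="%1.1f%%")
```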

### Average Net Area of Rental Flats ###

### Limit For Rental Flat Net Area ###
df= df.loc[(df["Net Area"] >= 30) & (df["Net Area"] <= 200),:]

district_net_area = df.groupby('District')['Net Area'].mean().sort_values(ascending=False)


plt.figure(figsize=(10,8))
sns.barplot(x=district_net_area.values, y=district_net_area.index, palette='viridis')


plt.xlabel('Average Net Area',weight="bold")
plt.ylabel("District",weight="bold",fontsize=12)
plt.xticks(rotation=45, fontsize=12,weight="bold")
plt.yticks(fontsize=12,weight="bold")


plt.title('Average Net Area by District (107 m2 Average for Istanbul)',weight="bold")


plt.show();
### Average Rent by District and Heat Type ###

df_grouped2 = df.groupby(['District', 'Heat Type'])['Rent'].mean().unstack()

# Side-by-side bars: stacking would add averages together, which has no meaning
ax8 = df_grouped2.plot(kind='bar', stacked=False, figsize=(12, 8), width=0.8)

ax8.set_title('Average Rent by District and Heat Type')
ax8.set_xlabel('District')
ax8.set_ylabel('Average Rent')

ax8.legend(title='Heat Type', loc='upper left', bbox_to_anchor=(1.0, 1.0))

plt.show();
### Distribution of Heat Type ###

heat_counts = df['Heat Type'].value_counts()

# keep only heat types that make up more than 2% of listings
filtered_heat_counts = heat_counts[heat_counts/len(df) > 0.02]


plt.figure(figsize=(8, 8))
plt.pie(filtered_heat_counts, labels=filtered_heat_counts.index, autopct='%1.1f%%')

plt.title('Percentage of Heat Types Istanbul Rental Flats ')

plt.show();

IN ESSENCE!

Istanbul is a unique city. It is still the centre of business, entertainment and diversity. The rental flat market is going up every single day! Prepare yourself before you arrive in Istanbul.

Thank you for your time
