Exploratory Data Analysis of Metro Istanbul

Hikmet Emre Guler
The Modern Scientist
7 min readJan 23, 2023
image source =”https://data.ibb.gov.tr/uploads/group/2022-04-04-072550.4874577660eceb-7c95-4790-9369-e78e6559bf37.JPG

Perhaps, you didn’t hear yet but Istanbul Municipality has a Dataset platform! It has many datasets in very different categories.

Some of the categories ;
- Mobility
- Energy
- Economy
- Transportation
- Security
- Environment

Good news is it’s a open source database!

Metro Istanbul Ridership analysis by Stations and Months

Mostly datasets are storage as csv or xlsx files. For this project my data source was a PDF file. So, my first duty is converting PDF file to Pandas DataFrame.

The link = “https://www.metro.istanbul/yolcuhizmetleri/yolcuistatistikleri

Importing necessary Libraries

import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
from matplotlib.ticker import FuncFormatter
from matplotlib.font_manager import FontProperties
import dataframe_image as dfi
import tabula as tb

Converting PDF File to Pandas DataFrame

The df_1 “output” is on the below. With the help of Tabula Library we took the tables inside of PDF file and stored as List. Every element of this list equals to one year’s table.

pdf_path="./metro_table.pdf"

df_1=tb.read_pdf(pdf_path,pages="all")

df_1

[ Yıl Hatlarımız Ocak Şubat Mart Nisan Mayıs \
0 NaN NaN 4.878.475 5.278.766 7.409.209 5.529.247 3.927.121
1 NaN NaN 4.283.353 4.709.625 6.910.277 4.825.054 3.271.264
2 NaN NaN 1.327.764 1.442.069 1.827.432 1.279.304 678.106
3 NaN NaN 2.148.173 2.421.369 3.611.627 2.528.357 1.740.839
4 NaN NaN 2.171.272 2.457.110 3.715.022 2.581.446 1.807.516
5 NaN NaN 94.862 102.151 180.735 106.835 54.966
6 NaN NaN 1.578.726 1.758.439 2.547.936 1.507.433 919.143
7 2021.0 NaN NaN NaN NaN NaN NaN
8 NaN NaN NaN NaN NaN NaN 11.644
9 NaN NaN 4.898.401 5.376.885 7.827.141 5.987.374 4.247.853
10 NaN NaN 3.623 18.324 37.258 24.878 13.448
11 NaN NaN 2.296.862 2.476.467 3.503.290 2.708.009 2.048.770
12 NaN NaN 132.549 110.824 200.118 169.323 135.599
13 NaN NaN 66.356 71.644 127.933 89.303 52.920
14 NaN NaN 4.955 5.992 9.606 7.398 5.190
15 NaN NaN 16.321 17.553 40.061 22.708 15.691

Haziran Temmuz Ağustos Eylül Ekim Kasım \
0 7.509.615 8.456.815 8.530.566 9.426.612 9.561.601 10.911.200
1 7.182.204 7.578.449 8.304.300 9.819.379 10.866.680 12.767.272
2 1.097.854 1.061.883 1.156.997 1.309.948 1.257.136 1.451.721
3 3.754.903 3.843.918 4.178.818 5.034.415 5.437.079 6.280.909
4 3.803.206 3.926.289 4.107.134 5.146.567 5.342.762 6.298.792
5 165.530 165.568 206.295 294.375 365.426 420.825
6 1.576.932 1.514.550 1.629.739 1.976.646 2.035.352 2.411.453
7 NaN NaN NaN NaN NaN NaN
8 263.303 289.447 301.068 336.003 351.515 416.937
9 8.452.460 9.567.566 9.747.200 10.441.056 10.483.639 11.476.966
10 39.306 52.485 55.204 62.813 61.986 70.178
11 3.566.329 3.613.829 3.641.133 4.260.338 4.237.688 4.781.522
12 291.585 361.027 308.782 421.573 486.151 559.712
13 182.511 320.287 133.807 313.474 388.150 415.225
14 10.217 15.426 15.555 18.105 30.257 34.365
15 62.917 137.842 135.469 119.625 111.114 109.297

Aralık Toplam
0 10.975.974 92.395.201
1 13.153.933 93.671.790
2 1.751.001 15.641.215
3 6.620.437 47.600.844
4 6.714.625 48.071.741
5 448.067 2.605.635
6 3.085.951 22.542.300
7 NaN NaN
8 445.166 2.415.083
9 11.113.283 99.619.824
10 75.599 515.102
11 5.024.962 42.159.199
12 558.454 3.735.697
13 339.447 2.501.057
14 29.538 186.604
15 88.409 877.007 ,
Yıl Hatlarımız Ocak Şubat Mart Nisan \
0 NaN NaN 12.150.118 12.083.527 7.887.978 1.441.190
1 NaN NaN 14.374.039 13.918.491 7.975.273 847.118
2 NaN NaN 2.097.934 2.056.329 1.495.111 417.754
3 NaN NaN 7.556.741 7.261.688 4.204.623 492.611
4 NaN NaN 6.249.010 6.103.077 3.762.304 500.965
5 NaN NaN 507.327 590.195 308.772 21.766
6 2020.0 NaN NaN NaN NaN NaN
7 NaN NaN 11.417.759 11.008.299 6.958.495 1.141.567
8 NaN NaN 103.115 106.267 58.753 436
9 NaN NaN 5.117.675 4.991.473 3.362.238 759.692
10 NaN NaN 465.987 399.411 197.537 1.526
11 NaN NaN 19.186 28.298 18.104 1
12 NaN NaN 113.991 102.192 57.473 7

Mayıs Haziran Temmuz Ağustos Eylül Ekim \
0 1.833.621 5.819.072 7.452.001 7.611.430 7.378.377 7.856.622
1 1.112.014 4.094.515 5.381.247 5.734.316 6.152.757 6.875.580
2 453.410 1.203.306 1.427.459 1.413.647 1.503.893 1.595.124
3 625.126 2.324.910 3.111.000 3.297.862 3.399.425 3.787.061
4 652.340 2.378.715 3.193.923 3.374.478 3.446.433 3.852.124
5 24.158 103.552 135.061 145.649 160.868 181.850
6 NaN NaN NaN NaN NaN 278.321
7 1.698.580 5.152.649 6.765.495 7.533.423 7.423.260 8.284.385
8 NaN NaN 29.155 41.155 42.800 48.479
9 924.179 2.558.486 3.163.192 3.244.762 3.266.270 3.551.321
10 6.401 64.413 119.069 160.274 162.217 195.848
11 4 1 NaN 8.133 11.053 12.036
12 3 1 298 61.954 69.386 52.994
.
.
.
.
.

Defining and Looking into to our Data Frames.

df_2021=df_1[0]
df_2020=df_1[1]
df_2019=df_1[2]
df_2018=df_1[3]

df_2021

Modifying Dataframe

We have one unnecessary column and one miss column. The “Yıl” column is describe for Year we already know it is 2021 table, we can get rid of it.
The “Hatlarımız” column is describe for “Stations” but it has only NaN values.

We have another problem which is all the value types are “object”. We must convert them to “integer” or “float” for doing mathematical operations. So we need to get rid of special characters in numbers.

Right, we have to go back to PDF File and create manually a column which is “Stations”.

del df_2021["Yıl"]
del df_2021["Hatlarımız"]

df_2021.rename(columns={'Ocak': 'January', 'Şubat': 'February',"Mart":"March","Nisan":"April",
"Mayıs":"May","Haziran":"June","Temmuz":"July","Ağustos":"August","Eylül":"September",
"Ekim":"October","Kasım":"November","Aralık":"December","Toplam":"Total Rider"}, inplace=True)

df_2021.replace(np.nan,0,inplace=True)
df_2021.drop(index=df_2021.index[7],axis=0,inplace=True)
df_2021 = df_2021.replace(to_replace='[.,"]', value='', regex=True)

stations_2021=["M1","M2","M3","M4","M5","M6","M7","M9","T1","T3","T4","T5","F1","TF1","TF2"]
df_2021["Stations"]=stations_2021

Grouping By Stations

grouped=df_2021.groupby(["Stations"]).sum("Total Rider")
.sort_values("Total Rider",ascending=False)

display(grouped)

For Showing Distribution Ridership Number by Months

df_2021_months=df_2021.drop(columns=["Total Rider","Stations"])

months_sum=df_2021_months.sum().sort_values()

df_2021_months

Data visualization of “Ridership Distribution of Metro Istanbul by Stations For Each Year”

First, we are defining a function for make our values more readable!

def human_format(num, pos):
magnitude = 0
while abs(num) >= 1000:
magnitude += 1
num /= 1000.0

return '%.0f%s' % (num, ['', 'K', 'M', 'G', 'T', 'P'][magnitude])
font = FontProperties()
font.set_weight('bold')
ax.set_xticklabels(ax.get_xticklabels(), fontproperties=font)

### Ridership 2021###
ax=grouped.plot(kind="bar", y="Total Rider",ylabel="Total Ridership Number of Istanbul Metro 2021",figsize=(12,7),color="Orange")
ax.yaxis.set_major_formatter(FuncFormatter(human_format))
ax.set_xticklabels(ax.get_xticklabels(), fontproperties=font)
ax.set_yticklabels(ax.get_yticklabels(), fontproperties=font)
plt.xticks(rotation=45)
plt.title("Ridership distribution of Istanbul Metro by stations ",weight="bold");

### Ridership 2020###
ax3=grouped2.plot(kind="bar", y="Total Rider",ylabel="Total Ridership Number of Istanbul Metro 2020",figsize=(12,7),color="Red")
ax3.yaxis.set_major_formatter(FuncFormatter(human_format))
ax3.set_xticklabels(ax3.get_xticklabels(), fontproperties=font)
ax3.set_yticklabels(ax3.get_yticklabels(), fontproperties=font)
plt.xticks(rotation=45)
plt.title("Ridership distribution of Istanbul Metro by stations ",weight="bold");

### Ridership 2019###
ax5=grouped3.plot(kind="bar", y="Total Rider",ylabel="Total Ridership Number of Istanbul Metro 2019",figsize=(12,7),color="Cyan")
ax5.yaxis.set_major_formatter(FuncFormatter(human_format))
ax5.set_xticklabels(ax5.get_xticklabels(), fontproperties=font)
ax5.set_yticklabels(ax5.get_yticklabels(), fontproperties=font)
plt.xticks(rotation=45)
plt.title("Ridership distribution of Istanbul Metro by stations ",weight="bold");

### Ridership 2018###
ax7=grouped4.plot(kind="bar", y="Total Rider",ylabel="Total Ridership Number of Istanbul Metro 2018",figsize=(12,7),color="Purple")
ax7.yaxis.set_major_formatter(FuncFormatter(human_format))
ax7.set_xticklabels(ax7.get_xticklabels(), fontproperties=font)
ax7.set_yticklabels(ax7.get_yticklabels(), fontproperties=font)
plt.xticks(rotation=45)
plt.title("Ridership distribution of Istanbul Metro by stations ",weight="bold");

Data visualization of “Ridership Distribution of Metro Istanbul by Months For Each Year”

### Months 2021###
months_sum=df_2021_months.sum().sort_values()
months_sum.plot(kind='bar', xlabel='Months of 2021', ylabel='Total Ridership Number of Istanbul Metro 2021')

ax2=months_sum.plot(kind='bar', xlabel='Months of 2021', ylabel='Total Ridership Number of Istanbul Metro 2021')
ax2.yaxis.set_major_formatter(FuncFormatter(human_format))
ax2.set_xticklabels(ax2.get_xticklabels(), fontproperties=font)
ax2.set_yticklabels(ax2.get_yticklabels(), fontproperties=font)
plt.xticks(rotation=45)
plt.title("Ridership distribution of Istanbul Metro by Months ",weight="bold");
plt.show()

### Months 2020###
months_sum2=df_2020_months.sum().sort_values()
months_sum2.plot(kind='bar', xlabel='Months of 2020', ylabel='Total Ridership Number of Istanbul Metro 2020')

ax4=months_sum2.plot(kind='bar', xlabel='Months of 2020', ylabel='Total Ridership Number of Istanbul Metro 2020')
ax4.yaxis.set_major_formatter(FuncFormatter(human_format))
ax4.set_xticklabels(ax4.get_xticklabels(), fontproperties=font)
ax4.set_yticklabels(ax4.get_yticklabels(), fontproperties=font)
plt.xticks(rotation=45)
plt.title("Ridership distribution of Istanbul Metro by Months ",weight="bold");
plt.show()

### Months 2019###
months_sum3=df_2019_months.sum().sort_values()
months_sum3.plot(kind='bar', xlabel='Months of 2019', ylabel='Total Ridership Number of Istanbul Metro 2019')

ax6=months_sum3.plot(kind='bar', xlabel='Months of 2019', ylabel='Total Ridership Number of Istanbul Metro 2019')
ax6.yaxis.set_major_formatter(FuncFormatter(human_format))
ax6.set_xticklabels(ax6.get_xticklabels(), fontproperties=font)
ax6.set_yticklabels(ax6.get_yticklabels(), fontproperties=font)
plt.xticks(rotation=45)
plt.title("Ridership distribution of Istanbul Metro by Months ",weight="bold")
plt.show();

### Months 2018###
months_sum4=df_2018_months.sum().sort_values()
months_sum4.plot(kind='bar', xlabel='Months of 2018', ylabel='Total Ridership Number of Istanbul Metro 2018')

ax8=months_sum4.plot(kind='bar', xlabel='Months of 2018', ylabel='Total Ridership Number of Istanbul Metro 2018')
ax8.yaxis.set_major_formatter(FuncFormatter(human_format))
ax8.set_xticklabels(ax8.get_xticklabels(), fontproperties=font)
ax8.set_yticklabels(ax8.get_yticklabels(), fontproperties=font)
plt.xticks(rotation=45)
plt.title("Ridership distribution of Istanbul Metro by Months ",weight="bold")
plt.show();

In Conclusion

The Busiest Metro Lines of Metro Istanbul

  • M1
  • M2
  • T1

Top Busiest Lines are on the European Side!

“Yenikapı” is a Transfer Centre and and Busiest Place!

The Busiest Metro Months of Metro Istanbul
(Except for pandemic restrictions)

  • October
  • December
  • November

Pandemic effect on Ridership Number of Metro Istanbul

  • Pre-Pandemic term Number of Ridership is likely 2 times more than Pandemic term.
  • In 2021 Number of Ridership is still approximately 35% less than the pre-Pandemic period.
  • Except for pandemic restrictions, top busiest months are same.

Thank you for having time!

--

--