Hasan Sajedi
sajedi
8 min read, Sep 30, 2018

Data Science project (911 Calls Capstone Project)

This is my first complete data science project: the 911 Calls Capstone Project, solved with Python (numpy, pandas, matplotlib and seaborn).

While taking the online course “Python for Data Science and Machine Learning Bootcamp”, I had to solve a capstone project set by the course author. My solution is described below.

The main capstone dataset is available [here](https://www.kaggle.com/mchirico/montcoalert).

The online course is available [here](https://www.udemy.com/python-for-data-science-and-machine-learning-bootcamp/).

The file has the following columns:

  • lat: String variable, Latitude
  • lng: String variable, Longitude
  • desc: String variable, Description of the Emergency Call
  • zip: String variable, Zipcode
  • title: String variable, Title
  • timeStamp: String variable, YYYY-MM-DD HH:MM:SS
  • twp: String variable, Township
  • addr: String variable, Address
  • e: String variable, Dummy variable (always 1)
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
sns.set_style('whitegrid')
df = pd.read_csv('911.csv')
df.info()
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 99492 entries, 0 to 99491
Data columns (total 9 columns):
lat 99492 non-null float64
lng 99492 non-null float64
desc 99492 non-null object
zip 86637 non-null float64
title 99492 non-null object
timeStamp 99492 non-null object
twp 99449 non-null object
addr 98973 non-null object
e 99492 non-null int64
dtypes: float64(3), int64(1), object(5)
memory usage: 6.8+ MB
df.head()
# Output
         lat        lng                                               desc  \
0  40.297876 -75.581294  REINDEER CT & DEAD END; NEW HANOVER; Station ...
1  40.258061 -75.264680  BRIAR PATH & WHITEMARSH LN; HATFIELD TOWNSHIP...
2  40.121182 -75.351975  HAWS AVE; NORRISTOWN; 2015-12-10 @ 14:39:21-St...
3  40.116153 -75.343513  AIRY ST & SWEDE ST; NORRISTOWN; Station 308A;...
4  40.251492 -75.603350  CHERRYWOOD CT & DEAD END; LOWER POTTSGROVE; S...

       zip                    title            timeStamp                twp  \
0  19525.0   EMS: BACK PAINS/INJURY  2015-12-10 17:40:00        NEW HANOVER
1  19446.0  EMS: DIABETIC EMERGENCY  2015-12-10 17:40:00  HATFIELD TOWNSHIP
2  19401.0      Fire: GAS-ODOR/LEAK  2015-12-10 17:40:00         NORRISTOWN
3  19401.0   EMS: CARDIAC EMERGENCY  2015-12-10 17:40:01         NORRISTOWN
4      NaN           EMS: DIZZINESS  2015-12-10 17:40:01   LOWER POTTSGROVE

                         addr  e
0      REINDEER CT & DEAD END  1
1  BRIAR PATH & WHITEMARSH LN  1
2                    HAWS AVE  1
3          AIRY ST & SWEDE ST  1
4    CHERRYWOOD CT & DEAD END  1

The top 5 zipcodes for 911 calls

df['zip'].value_counts().head(5)
# Output
19401.0 6979
19464.0 6643
19403.0 4854
19446.0 4748
19406.0 3174
Name: zip, dtype: int64
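
The zip codes print as 19401.0 because the column contains missing values, so pandas reads it as float64. Below is a minimal sketch of counting the codes as integers while keeping track of the missing ones. The five-row `sample` frame is a hypothetical stand-in for illustration, not rows taken from 911.csv.

```python
import numpy as np
import pandas as pd

# Hypothetical sample standing in for df['zip'] (not rows from 911.csv)
sample = pd.DataFrame({'zip': [19401.0, 19401.0, 19464.0, np.nan, 19446.0]})

# The nullable integer dtype keeps missing values while dropping the ".0"
zip_codes = sample['zip'].astype('Int64')

# value_counts() skips missing values unless dropna=False is passed
top_zips = zip_codes.value_counts().head(5)
missing = int(zip_codes.isna().sum())

print(top_zips)
print('rows without a zip code:', missing)
```

On the real data, the df.info() output suggests about 12,855 rows (99492 minus 86637) have no zip code, so the top 5 is computed over the remaining rows only.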

The top 5 townships for 911 calls

df['twp'].value_counts().head(5)
# Output
LOWER MERION 8443
ABINGTON 5977
NORRISTOWN 5890
UPPER MERION 5227
CHELTENHAM 4575
Name: twp, dtype: int64

The most common reason for 911 calls

df['title'].nunique()
# Output
110
df['title'].unique()
# Output
['EMS: BACK PAINS/INJURY' 'EMS: DIABETIC EMERGENCY' 'Fire: GAS-ODOR/LEAK'
'EMS: CARDIAC EMERGENCY' 'EMS: DIZZINESS' 'EMS: HEAD INJURY'
'EMS: NAUSEA/VOMITING' 'EMS: RESPIRATORY EMERGENCY'
'EMS: SYNCOPAL EPISODE' 'Traffic: VEHICLE ACCIDENT -'
'EMS: VEHICLE ACCIDENT' 'Traffic: DISABLED VEHICLE -'
'Fire: APPLIANCE FIRE' 'EMS: GENERAL WEAKNESS'
'Fire: CARBON MONOXIDE DETECTOR' 'EMS: UNKNOWN MEDICAL EMERGENCY'
'EMS: UNRESPONSIVE SUBJECT' 'Fire: VEHICLE ACCIDENT'
'EMS: ALTERED MENTAL STATUS' 'Fire: FIRE ALARM' 'EMS: CVA/STROKE'
'Traffic: ROAD OBSTRUCTION -' 'EMS: SUBJECT IN PAIN' 'EMS: HEMORRHAGING'
'EMS: FALL VICTIM' 'EMS: ASSAULT VICTIM' 'EMS: SEIZURES'
'EMS: MEDICAL ALERT ALARM' 'EMS: ABDOMINAL PAINS' 'Fire: PUMP DETAIL'
'Fire: FIRE INVESTIGATION' 'EMS: OVERDOSE' 'EMS: MATERNITY'
'EMS: UNCONSCIOUS SUBJECT' 'EMS: CHOKING' 'EMS: LACERATIONS'
'Fire: TRASH/DUMPSTER FIRE' 'Fire: UNKNOWN TYPE FIRE'
'Fire: BUILDING FIRE' 'Fire: ELECTRICAL FIRE OUTSIDE'
'Fire: DEBRIS/FLUIDS ON HIGHWAY' 'Traffic: DEBRIS/FLUIDS ON HIGHWAY -'
'EMS: FEVER' 'EMS: ALLERGIC REACTION' 'Traffic: VEHICLE LEAKING FUEL -'
'EMS: FRACTURE' 'Fire: BURN VICTIM' 'EMS: BURN VICTIM'
'Fire: RESCUE - GENERAL' 'Fire: WOODS/FIELD FIRE' 'EMS: RESCUE - GENERAL'
'Fire: FIRE SPECIAL SERVICE' 'Fire: VEHICLE FIRE'
'Traffic: VEHICLE FIRE -' 'EMS: WARRANT SERVICE'
'Fire: S/B AT HELICOPTER LANDING' 'EMS: EMS SPECIAL SERVICE'
'Traffic: HAZARDOUS ROAD CONDITIONS -' 'Fire: RESCUE - ELEVATOR'
'EMS: FIRE SPECIAL SERVICE' 'EMS: DEHYDRATION'
'EMS: CARBON MONOXIDE DETECTOR' 'EMS: BUILDING FIRE'
'EMS: APPLIANCE FIRE' 'EMS: SHOOTING' 'EMS: POISONING'
'Fire: TRANSFERRED CALL' 'Fire: RESCUE - TECHNICAL'
'EMS: RESCUE - TECHNICAL' 'Fire: VEHICLE LEAKING FUEL' 'EMS: EYE INJURY'
'EMS: ELECTROCUTION' 'EMS: STABBING' 'Fire: FIRE POLICE NEEDED'
'EMS: AMPUTATION' 'EMS: ANIMAL BITE' 'EMS: FIRE ALARM'
'EMS: VEHICLE FIRE' 'EMS: HAZARDOUS MATERIALS INCIDENT'
'EMS: RESCUE - ELEVATOR' 'EMS: FIRE INVESTIGATION'
'Fire: MEDICAL ALERT ALARM' 'EMS: UNKNOWN TYPE FIRE' 'EMS: GAS-ODOR/LEAK'
'Fire: TRAIN CRASH' 'Fire: HAZARDOUS MATERIALS INCIDENT'
'EMS: TRANSFERRED CALL' 'EMS: TRAIN CRASH' 'EMS: RESCUE - WATER'
'EMS: S/B AT HELICOPTER LANDING' 'Fire: UNKNOWN MEDICAL EMERGENCY'
'Fire: RESCUE - WATER' 'EMS: CARDIAC ARREST' 'EMS: PLANE CRASH'
'Fire: PLANE CRASH' 'EMS: WOODS/FIELD FIRE' 'Fire: CARDIAC ARREST'
'Fire: EMS SPECIAL SERVICE' 'Fire: UNCONSCIOUS SUBJECT'
'EMS: HEAT EXHAUSTION' 'EMS: DEBRIS/FLUIDS ON HIGHWAY'
'EMS: ACTIVE SHOOTER' 'EMS: DISABLED VEHICLE' 'Fire: POLICE INFORMATION'
'Fire: DIABETIC EMERGENCY' 'EMS: BOMB DEVICE FOUND'
'Fire: SYNCOPAL EPISODE' 'EMS: INDUSTRIAL ACCIDENT' 'EMS: DROWNING'
'EMS: SUSPICIOUS']

Create a “Reason” column that holds the part of each “title” value before the colon

df['Reason'] = df['title'].apply(lambda title: title.split(':')[0])
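
The same split can also be written with the pandas string accessor instead of a lambda. This is only a sketch of an equivalent form on a hypothetical three-row `sample` frame; the titles are ones that appear in the dataset.

```python
import pandas as pd

# Hypothetical sample using titles that appear in the dataset
sample = pd.DataFrame({'title': ['EMS: BACK PAINS/INJURY',
                                 'Fire: GAS-ODOR/LEAK',
                                 'Traffic: VEHICLE ACCIDENT -']})

# Same result as apply(lambda title: title.split(':')[0]),
# written with the vectorized .str accessor
sample['Reason'] = sample['title'].str.split(':').str[0]

print(sample['Reason'].tolist())
```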

Show most common Reason for a 911 call

df['Reason'].value_counts()
# Output
EMS 48877
Traffic 35695
Fire 14920
Name: Reason, dtype: int64

Plot the 911 calls by Reason

sns.countplot(data=df, x='Reason', palette='viridis')
Output for the 911 calls by Reason

Plot the number of 911 calls for each day of the week

print(type(df['timeStamp'].iloc[0]))  # str before the conversion
df['timeStamp'] = pd.to_datetime(df['timeStamp'])
print(type(df['timeStamp'].iloc[0]))  # Timestamp after the conversion
print(df['timeStamp'].iloc[0])
df['Hour'] = df['timeStamp'].apply(lambda time: time.hour)
df['Month'] = df['timeStamp'].apply(lambda time: time.month)
# dayofweek gives Monday=0 ... Sunday=6, which matches the keys of dmap below
df['Day of Week'] = df['timeStamp'].apply(lambda time: time.dayofweek)
print(df['Hour'].iloc[0])
print(df['Month'].iloc[0])
print(df['Day of Week'].iloc[0])

dmap = {0: 'Mon', 1: 'Tue', 2: 'Wed', 3: 'Thu', 4: 'Fri', 5: 'Sat', 6: 'Sun'}
df['Day of Week'] = df['Day of Week'].map(dmap)

# Now create a countplot of the Day of Week
sns.countplot(data=df, x='Day of Week', hue='Reason')
Output for the 911 calls by Reason during a week
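
The keys of dmap run from 0 to 6, so they are meant for the weekday number and not for the day of the month. A small sketch of the difference, using the first timestamp from the df.head() output above (the variable names are only for illustration):

```python
import pandas as pd

# First timestamp from the df.head() output above
ts = pd.to_datetime(pd.Series(['2015-12-10 17:40:00']))

day_of_month = ts.dt.day        # calendar day of the month, 1 to 31
day_of_week = ts.dt.dayofweek   # Monday=0 ... Sunday=6

dmap = {0: 'Mon', 1: 'Tue', 2: 'Wed', 3: 'Thu', 4: 'Fri', 5: 'Sat', 6: 'Sun'}
day_name = day_of_week.map(dmap)

print(day_of_month.iloc[0], day_of_week.iloc[0], day_name.iloc[0])
```

Mapping the day of the month through dmap would turn every value from 7 to 31 into NaN, so only the weekday number gives a complete "Day of Week" column.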

Now do the same for Month

sns.countplot(x='Month', data=df, hue='Reason', palette='viridis')
plt.legend(bbox_to_anchor=(1.05, 1), loc=2, borderaxespad=0.)
Three months (9, 10, and 11) are missing from this plot.
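
One way to check for such gaps without reading them off a chart is to compare the months present against the full range 1 to 12. This is a rough sketch on a hypothetical `sample` frame (the timestamps are made up for illustration):

```python
import pandas as pd

# Hypothetical sample: timestamps from only a few months
sample = pd.DataFrame({'timeStamp': pd.to_datetime([
    '2015-12-10 17:40:00', '2016-01-05 08:15:00',
    '2016-02-11 12:00:00', '2016-08-24 11:02:00'])})
sample['Month'] = sample['timeStamp'].dt.month

# Months of the year with no calls at all in the sample
missing_months = sorted(set(range(1, 13)) - set(sample['Month'].tolist()))

# reindex() makes the gaps explicit as zero counts
calls_per_month = (sample['Month'].value_counts()
                   .reindex(range(1, 13), fill_value=0))

print(missing_months)
```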

A line plot fills in the missing months. It shows that the number of 911 calls peaks in July and decreases over the rest of the year:

byMonth = df.groupby('Month').count()
byMonth.head()
# Output
         lat    lng   desc    zip  title  timeStamp    twp   addr      e  \
Month
1      13205  13205  13205  11527  13205      13205  13203  13096  13205
2      11467  11467  11467   9930  11467      11467  11465  11396  11467
3      11101  11101  11101   9755  11101      11101  11092  11059  11101
4      11326  11326  11326   9895  11326      11326  11323  11283  11326
5      11423  11423  11423   9946  11423      11423  11420  11378  11423

       Reason   Hour  Day of Week
Month
1       13205  13205         2206
2       11467  11467         2396
3       11101  11101         2127
4       11326  11326         2562
5       11423  11423         1963
# Could be any column
byMonth['twp'].plot()
Simple plot of the dataframe indicating the count of calls per month.

Create a linear fit on the number of calls per month

Linear regression plot to see its trend and confidence interval
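
To put a number on the trend, a straight line can also be fitted directly. The sketch below uses numpy.polyfit as a stand-in (seaborn's lmplot is what the complete code at the end uses for the plot), and it only covers months 1 to 5, because those are the counts shown in the byMonth.head() output above.

```python
import numpy as np

# Call counts for months 1 to 5, copied from the byMonth.head() output above
months = np.array([1, 2, 3, 4, 5])
calls = np.array([13205, 11467, 11101, 11326, 11423])

# Degree-1 least-squares fit: calls is approximately slope * month + intercept
slope, intercept = np.polyfit(months, calls, 1)

print(f'slope: {slope:.1f} calls per month')
```

A negative slope here only describes these five months; the full-year fit depends on the remaining rows of byMonth.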
# Create a new column called 'Date'
df['Date'] = df['timeStamp'].apply(lambda time: time.date())
Now group by this Date column with the count() aggregate and plot the number of 911 calls per day.
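
A minimal sketch of that grouping step, on a hypothetical four-row `sample` frame (the timestamps are made up for illustration):

```python
import pandas as pd

# Hypothetical sample timestamps (not rows from 911.csv)
sample = pd.DataFrame({'timeStamp': pd.to_datetime([
    '2015-12-10 17:40:00', '2015-12-10 17:40:01',
    '2015-12-11 09:12:30', '2015-12-12 23:59:59'])})

# .dt.date is the vectorized form of apply(lambda time: time.date())
sample['Date'] = sample['timeStamp'].dt.date

# size() counts rows per date regardless of missing values in other columns
calls_per_day = sample.groupby('Date').size()

print(calls_per_day)
```

Calling .plot() on calls_per_day would draw the line chart. Note that count()['twp'], as used in the complete code, counts only rows with a township, while size() counts every row.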

Now recreate this plot but create 3 separate plots with each plot representing a Reason for the 911 call

Traffic
Fire
EMS
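
The three series behind these plots can be built in one pass by grouping on Reason first. This is only a sketch with a hypothetical `sample` frame and a hypothetical `daily_by_reason` dictionary:

```python
import pandas as pd

# Hypothetical sample (not rows from 911.csv)
sample = pd.DataFrame({
    'Reason': ['EMS', 'EMS', 'Fire', 'Traffic', 'Traffic'],
    'Date': pd.to_datetime(['2015-12-10', '2015-12-11', '2015-12-10',
                            '2015-12-10', '2015-12-10']).date,
})

# One daily-count Series per Reason; each could be plotted on its own figure
daily_by_reason = {
    reason: group.groupby('Date').size()
    for reason, group in sample.groupby('Reason')
}

for reason, counts in daily_by_reason.items():
    print(reason, counts.tolist())
```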

Heatmap of calls by day of the week and hour

The heatmap function of seaborn shows the data as a colored grid.

It makes it easy to see which hours of which days are the busiest (hot) and which are the quietest (cool).

dayHour = df.groupby(by=['Day of Week', 'Hour']).count()['Reason'].unstack()
dayHour.head()
plt.figure(figsize=(12, 6))
sns.heatmap(dayHour, cmap='viridis')
Per hour at the day of week
Per day of week
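
The table behind the heatmap is easier to read when the rows follow the calendar. groupby sorts the day names alphabetically (Fri, Mon, Sat, ...), so the sketch below reorders them with reindex. The `sample` frame and the `order` list are hypothetical and only there for illustration.

```python
import pandas as pd

# Hypothetical sample with 'Day of Week' and 'Hour' columns like those built earlier
sample = pd.DataFrame({
    'Day of Week': ['Mon', 'Mon', 'Fri', 'Sun', 'Fri', 'Mon'],
    'Hour': [8, 8, 17, 23, 17, 9],
})

# Rows: day of week, columns: hour, values: number of calls
day_hour = sample.groupby(['Day of Week', 'Hour']).size().unstack(fill_value=0)

# groupby sorts the day names alphabetically; reindex restores calendar order
order = ['Mon', 'Tue', 'Wed', 'Thu', 'Fri', 'Sat', 'Sun']
day_hour = day_hour.reindex([d for d in order if d in day_hour.index])

print(day_hour)
```

Passing a table like day_hour to sns.heatmap would then draw the rows from Monday to Sunday.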
The complete code:

import numpy as np
import pandas as pd

import matplotlib.pyplot as plt
import seaborn as sns

sns.set_style('whitegrid')

df = pd.read_csv('911.csv')
df.info()
print(df.head())

# What are the top 5 zipcodes for 911 calls?
df['zip'].value_counts().head(5)

# What are the top 5 townships (twp) for 911 calls?
df['twp'].value_counts().head(5)

# Take a look at the 'title' column, how many unique title codes are there?
print(df['title'].nunique())
print(df['title'].unique())

'''
In the titles column there are "Reasons/Departments" specified before the title code. These are EMS, Fire, and Traffic. Use .apply() with a custom lambda expression to create a new column called "Reason" that contains this string value.
For example, if the title column value is EMS: BACK PAINS/INJURY , the Reason column value would be EMS.
'''
df['Reason'] = df['title'].apply(lambda title: title.split(':')[0])

# What is the most common Reason for a 911 call based off of this new column?
print(df['Reason'].value_counts())

# Now use seaborn to create a countplot of 911 calls by Reason.
sns.countplot(data=df, x='Reason', palette='viridis')

print(type(df['timeStamp'].iloc[0]))  # str before the conversion
df['timeStamp'] = pd.to_datetime(df['timeStamp'])
print(type(df['timeStamp'].iloc[0]))  # Timestamp after the conversion
print(df['timeStamp'].iloc[0])
df['Hour'] = df['timeStamp'].apply(lambda time: time.hour)
df['Month'] = df['timeStamp'].apply(lambda time: time.month)
# dayofweek gives Monday=0 ... Sunday=6, which matches the keys of dmap below
df['Day of Week'] = df['timeStamp'].apply(lambda time: time.dayofweek)
print(df['Hour'].iloc[0])
print(df['Month'].iloc[0])
print(df['Day of Week'].iloc[0])

dmap = {0: 'Mon', 1: 'Tue', 2: 'Wed', 3: 'Thu', 4: 'Fri', 5: 'Sat', 6: 'Sun'}
df['Day of Week'] = df['Day of Week'].map(dmap)

'''
** Now use seaborn to create a countplot of the Day of Week column with the hue based off of the Reason column. **
'''
sns.countplot(data=df, x='Day of Week', hue='Reason', palette='viridis')

'''
**Now do the same for Month:**
'''
sns.countplot(x='Month', data=df, hue='Reason', palette='viridis')
# To relocate the legend
plt.legend(bbox_to_anchor=(1.05, 1), loc=2, borderaxespad=0.)
# It is missing some months! 9,10, and 11 are not there.

'''
** You should have noticed it was missing some Months,
let's see if we can maybe fill in this information by plotting the information in another way,
possibly a simple line plot that fills in the missing months,
in order to do this, we'll need to do some work with pandas...**

** Now create a groupby object called byMonth,
where you group the DataFrame by the month column and use the count() method for aggregation.
Use the head() method on this returned DataFrame. **
'''
byMonth = df.groupby('Month').count()
byMonth.head()

'''
** Now create a simple plot off of the dataframe indicating the count of calls per month. **
'''
# Could be any column
byMonth['twp'].plot()

'''
** Now see if you can use seaborn's lmplot() to create a linear fit
on the number of calls per month.
Keep in mind you may need to reset the index to a column. **
'''
sns.lmplot(x='Month', y='twp', data=byMonth.reset_index())

'''
Create a new column called 'Date' that contains the date from the timeStamp column.
You'll need to use apply along with the .date() method.
'''
df['Date'] = df['timeStamp'].apply(lambda t: t.date())

'''
Now groupby this Date column with the count() aggregate and create a plot of counts of 911 calls.
'''
df.groupby('Date').count()['twp'].plot()
plt.tight_layout()

'''
Now recreate this plot but create 3 separate plots with each plot representing a Reason for the 911 call
'''
# Start a new figure for each Reason so the lines are not drawn on top of each other
plt.figure()
df[df['Reason'] == 'Traffic'].groupby('Date').count()['twp'].plot()
plt.title('Traffic')
plt.tight_layout()

plt.figure()
df[df['Reason'] == 'Fire'].groupby('Date').count()['twp'].plot()
plt.title('Fire')
plt.tight_layout()

plt.figure()
df[df['Reason'] == 'EMS'].groupby('Date').count()['twp'].plot()
plt.title('EMS')
plt.tight_layout()

'''
Now let's move on to creating heatmaps with seaborn and our data.
We'll first need to restructure the dataframe so that the columns become the Hours and the
Index becomes the Day of the Week. There are lots of ways to do this,
but I would recommend trying to combine groupby with an
[unstack](http://pandas.pydata.org/pandas-docs/stable/generated/pandas.DataFrame.unstack.html) method.
Reference the solutions if you get stuck on this!
'''
dayHour = df.groupby(by=['Day of Week', 'Hour']).count()['Reason'].unstack()
dayHour.head()
plt.figure(figsize=(12, 6))
sns.heatmap(dayHour, cmap='viridis')

sns.clustermap(dayHour, cmap='viridis')
#
dayMonth = df.groupby(by=['Day of Week', 'Month']).count()['Reason'].unstack()
dayMonth.head()
plt.figure(figsize=(12, 6))
sns.heatmap(dayMonth, cmap='viridis')
#
sns.clustermap(dayMonth,cmap='viridis')

plt.show()

Thanks to:

[Jose Portilla](https://www.udemy.com/user/joseportilla/)

[Udemy](https://udemy.com)
