sajedi

Sep 30, 2018

Data Science project (911 Calls Capstone Project)

This is my first complete problem-solving project: the 911 Calls Capstone Project (using Python with numpy, pandas, matplotlib and seaborn).

When I took the online course “Python for Data Science and Machine Learning Bootcamp”, its author set a capstone project that I had to solve, as described below:

The main capstone project is available [here](https://www.kaggle.com/mchirico/montcoalert).

Online course:

[here](https://www.udemy.com/python-for-data-science-and-machine-learning-bootcamp/).

The columns of the file, as described with the dataset:

lat: String variable, Latitude
lng: String variable, Longitude
desc: String variable, Description of the Emergency Call
zip: String variable, Zipcode
title: String variable, Title
timeStamp: String variable, YYYY-MM-DD HH:MM:SS
twp: String variable, Township
addr: String variable, Address
e: String variable, Dummy variable (always 1)

Note that pandas infers different types on load: the df.info() output below reports lat, lng and zip as float64 and e as int64.

import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns

sns.set_style('whitegrid')

df = pd.read_csv('911.csv')
df.info()
# Output
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 99492 entries, 0 to 99491
Data columns (total 9 columns):
lat          99492 non-null float64
lng          99492 non-null float64
desc         99492 non-null object
zip          86637 non-null float64
title        99492 non-null object
timeStamp    99492 non-null object
twp          99449 non-null object
addr         98973 non-null object
e            99492 non-null int64
dtypes: float64(3), int64(1), object(5)
memory usage: 6.8+ MB

df.head()
# Output
         lat        lng                                               desc  \
0  40.297876 -75.581294  REINDEER CT & DEAD END;  NEW HANOVER; Station ...   
1  40.258061 -75.264680  BRIAR PATH & WHITEMARSH LN;  HATFIELD TOWNSHIP...   
2  40.121182 -75.351975  HAWS AVE; NORRISTOWN; 2015-12-10 @ 14:39:21-St...   
3  40.116153 -75.343513  AIRY ST & SWEDE ST;  NORRISTOWN; Station 308A;...   
4  40.251492 -75.603350  CHERRYWOOD CT & DEAD END;  LOWER POTTSGROVE; S...   

       zip                    title            timeStamp                twp  \
0  19525.0   EMS: BACK PAINS/INJURY  2015-12-10 17:40:00        NEW HANOVER   
1  19446.0  EMS: DIABETIC EMERGENCY  2015-12-10 17:40:00  HATFIELD TOWNSHIP   
2  19401.0      Fire: GAS-ODOR/LEAK  2015-12-10 17:40:00         NORRISTOWN   
3  19401.0   EMS: CARDIAC EMERGENCY  2015-12-10 17:40:01         NORRISTOWN   
4      NaN           EMS: DIZZINESS  2015-12-10 17:40:01   LOWER POTTSGROVE   

                         addr  e  
0      REINDEER CT & DEAD END  1  
1  BRIAR PATH & WHITEMARSH LN  1  
2                    HAWS AVE  1  
3          AIRY ST & SWEDE ST  1  
4    CHERRYWOOD CT & DEAD END  1

The top 5 zipcodes for 911 calls

df['zip'].value_counts().head(5)
# Output
19401.0    6979
19464.0    6643
19403.0    4854
19446.0    4748
19406.0    3174
Name: zip, dtype: int64
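The zipcodes above print as 19401.0 because a column containing missing values is read as float64. As a minimal sketch (the small `zips` Series below is a hypothetical stand-in for `df['zip']`, not the real data), pandas' nullable integer dtype can keep the missing entries while dropping the decimal part:

```python
import numpy as np
import pandas as pd

# Hypothetical miniature of the 'zip' column, including one missing value
zips = pd.Series([19525.0, 19446.0, 19401.0, 19401.0, np.nan], name='zip')

# 'Int64' (capital I) is pandas' nullable integer dtype:
# the missing entry becomes <NA> and the values lose their '.0'
zips_int = zips.astype('Int64')

# value_counts() skips missing values by default
top_zips = zips_int.value_counts().head(2)
print(top_zips)
```

Whether this is worth doing depends on the later steps; for counting alone, the float column gives the same ranking.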

The top 5 townships for 911 calls

df['twp'].value_counts().head(5)
# Output
LOWER MERION    8443
ABINGTON        5977
NORRISTOWN      5890
UPPER MERION    5227
CHELTENHAM      4575
Name: twp, dtype: int64

The most common reason for 911 calls

df['title'].nunique()
# Output
110

df['title'].unique()
# Output
['EMS: BACK PAINS/INJURY' 'EMS: DIABETIC EMERGENCY' 'Fire: GAS-ODOR/LEAK'
 'EMS: CARDIAC EMERGENCY' 'EMS: DIZZINESS' 'EMS: HEAD INJURY'
 'EMS: NAUSEA/VOMITING' 'EMS: RESPIRATORY EMERGENCY'
 'EMS: SYNCOPAL EPISODE' 'Traffic: VEHICLE ACCIDENT -'
 'EMS: VEHICLE ACCIDENT' 'Traffic: DISABLED VEHICLE -'
 'Fire: APPLIANCE FIRE' 'EMS: GENERAL WEAKNESS'
 'Fire: CARBON MONOXIDE DETECTOR' 'EMS: UNKNOWN MEDICAL EMERGENCY'
 'EMS: UNRESPONSIVE SUBJECT' 'Fire: VEHICLE ACCIDENT'
 'EMS: ALTERED MENTAL STATUS' 'Fire: FIRE ALARM' 'EMS: CVA/STROKE'
 'Traffic: ROAD OBSTRUCTION -' 'EMS: SUBJECT IN PAIN' 'EMS: HEMORRHAGING'
 'EMS: FALL VICTIM' 'EMS: ASSAULT VICTIM' 'EMS: SEIZURES'
 'EMS: MEDICAL ALERT ALARM' 'EMS: ABDOMINAL PAINS' 'Fire: PUMP DETAIL'
 'Fire: FIRE INVESTIGATION' 'EMS: OVERDOSE' 'EMS: MATERNITY'
 'EMS: UNCONSCIOUS SUBJECT' 'EMS: CHOKING' 'EMS: LACERATIONS'
 'Fire: TRASH/DUMPSTER FIRE' 'Fire: UNKNOWN TYPE FIRE'
 'Fire: BUILDING FIRE' 'Fire: ELECTRICAL FIRE OUTSIDE'
 'Fire: DEBRIS/FLUIDS ON HIGHWAY' 'Traffic: DEBRIS/FLUIDS ON HIGHWAY -'
 'EMS: FEVER' 'EMS: ALLERGIC REACTION' 'Traffic: VEHICLE LEAKING FUEL -'
 'EMS: FRACTURE' 'Fire: BURN VICTIM' 'EMS: BURN VICTIM'
 'Fire: RESCUE - GENERAL' 'Fire: WOODS/FIELD FIRE' 'EMS: RESCUE - GENERAL'
 'Fire: FIRE SPECIAL SERVICE' 'Fire: VEHICLE FIRE'
 'Traffic: VEHICLE FIRE -' 'EMS: WARRANT SERVICE'
 'Fire: S/B AT HELICOPTER LANDING' 'EMS: EMS SPECIAL SERVICE'
 'Traffic: HAZARDOUS ROAD CONDITIONS -' 'Fire: RESCUE - ELEVATOR'
 'EMS: FIRE SPECIAL SERVICE' 'EMS: DEHYDRATION'
 'EMS: CARBON MONOXIDE DETECTOR' 'EMS: BUILDING FIRE'
 'EMS: APPLIANCE FIRE' 'EMS: SHOOTING' 'EMS: POISONING'
 'Fire: TRANSFERRED CALL' 'Fire: RESCUE - TECHNICAL'
 'EMS: RESCUE - TECHNICAL' 'Fire: VEHICLE LEAKING FUEL' 'EMS: EYE INJURY'
 'EMS: ELECTROCUTION' 'EMS: STABBING' 'Fire: FIRE POLICE NEEDED'
 'EMS: AMPUTATION' 'EMS: ANIMAL BITE' 'EMS: FIRE ALARM'
 'EMS: VEHICLE FIRE' 'EMS: HAZARDOUS MATERIALS INCIDENT'
 'EMS: RESCUE - ELEVATOR' 'EMS: FIRE INVESTIGATION'
 'Fire: MEDICAL ALERT ALARM' 'EMS: UNKNOWN TYPE FIRE' 'EMS: GAS-ODOR/LEAK'
 'Fire: TRAIN CRASH' 'Fire: HAZARDOUS MATERIALS INCIDENT'
 'EMS: TRANSFERRED CALL' 'EMS: TRAIN CRASH' 'EMS: RESCUE - WATER'
 'EMS: S/B AT HELICOPTER LANDING' 'Fire: UNKNOWN MEDICAL EMERGENCY'
 'Fire: RESCUE - WATER' 'EMS: CARDIAC ARREST' 'EMS: PLANE CRASH'
 'Fire: PLANE CRASH' 'EMS: WOODS/FIELD FIRE' 'Fire: CARDIAC ARREST'
 'Fire: EMS SPECIAL SERVICE' 'Fire: UNCONSCIOUS SUBJECT'
 'EMS: HEAT EXHAUSTION' 'EMS: DEBRIS/FLUIDS ON HIGHWAY'
 'EMS: ACTIVE SHOOTER' 'EMS: DISABLED VEHICLE' 'Fire: POLICE INFORMATION'
 'Fire: DIABETIC EMERGENCY' 'EMS: BOMB DEVICE FOUND'
 'Fire: SYNCOPAL EPISODE' 'EMS: INDUSTRIAL ACCIDENT' 'EMS: DROWNING'
 'EMS: SUSPICIOUS']

Create a new column named “Reason” whose value is the part of the “title” column before the colon

df['Reason'] = df['title'].apply(lambda title: title.split(':')[0])
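The same split can also be written with pandas' vectorized string methods instead of `.apply()` with a lambda. This is only an alternative sketch; the three titles are taken from the output above and the variable names are illustrative:

```python
import pandas as pd

# A few title values taken from the unique() output above
titles = pd.Series([
    'EMS: BACK PAINS/INJURY',
    'Fire: GAS-ODOR/LEAK',
    'Traffic: VEHICLE ACCIDENT -',
])

# Split each title on ':' and keep the first piece (the department)
reason = titles.str.split(':').str[0]
print(reason.tolist())
```

On the full DataFrame this would read `df['title'].str.split(':').str[0]`.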

Show most common Reason for a 911 call

df['Reason'].value_counts()
# Output
EMS        48877
Traffic    35695
Fire       14920
Name: Reason, dtype: int64

Create a graph of 911 calls by Reason

sns.countplot(data=df, x='Reason', palette='viridis')

Create a graph of the number of 911 calls during a week

print(type(df['timeStamp'].iloc[0]))  # str before the conversion
df['timeStamp'] = pd.to_datetime(df['timeStamp'])
print(type(df['timeStamp'].iloc[0]))  # pandas Timestamp after the conversion
print(df['timeStamp'].iloc[0])
df['Hour'] = df['timeStamp'].apply(lambda time: time.hour)
df['Month'] = df['timeStamp'].apply(lambda time: time.month)
# dayofweek is 0 (Monday) to 6 (Sunday), which matches the keys of dmap below;
# time.day would be the day of the month (1-31)
df['Day of Week'] = df['timeStamp'].apply(lambda time: time.dayofweek)
print(df['Hour'].iloc[0])
print(df['Month'].iloc[0])
print(df['Day of Week'].iloc[0])

dmap = {0: 'Mon', 1: 'Tue', 2: 'Wed', 3: 'Thu', 4: 'Fri', 5: 'Sat', 6: 'Sun'}
df['Day of Week'] = df['Day of Week'].map(dmap)

# Now create a countplot of the Day of Week
sns.countplot(data=df, x='Day of Week', hue='Reason')
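The weekday step is easy to get wrong, because a Timestamp exposes both the day of the month and the day of the week. A small sketch of the difference, using the first timestamp from the data plus one made-up timestamp (the second value is an assumption for illustration):

```python
import pandas as pd

# First value is from the dataset; the second is a hypothetical example
ts = pd.to_datetime(pd.Series(['2015-12-10 17:40:00', '2015-12-12 09:15:00']))

dmap = {0: 'Mon', 1: 'Tue', 2: 'Wed', 3: 'Thu', 4: 'Fri', 5: 'Sat', 6: 'Sun'}

day_of_month = ts.dt.day      # day of the month, 1-31
day_num = ts.dt.dayofweek     # weekday number, Monday=0 ... Sunday=6
day_abbr = day_num.map(dmap)  # weekday number to a short name

print(day_of_month.tolist(), day_num.tolist(), day_abbr.tolist())
```

Only the weekday numbers fall inside the keys of `dmap`; mapping days of the month would leave every value above 6 as NaN.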

Output for the *911 calls by Reason during a week*

Now do the same for Month and create graph

sns.countplot(x='Month', data=df, hue='Reason', palette='viridis')
plt.legend(bbox_to_anchor=(1.05, 1), loc=2, borderaxespad=0.)

In the line plot, the number of 911 calls reaches a summer peak in July and declines after it:

byMonth = df.groupby('Month').count()
byMonth.head()
# Output
         lat    lng   desc    zip  title  timeStamp    twp   addr      e  \
Month                                                                      
1      13205  13205  13205  11527  13205      13205  13203  13096  13205   
2      11467  11467  11467   9930  11467      11467  11465  11396  11467   
3      11101  11101  11101   9755  11101      11101  11092  11059  11101   
4      11326  11326  11326   9895  11326      11326  11323  11283  11326   
5      11423  11423  11423   9946  11423      11423  11420  11378  11423   

       Reason   Hour  Day of Week  
Month                              
1       13205  13205         2206  
2       11467  11467         2396  
3       11101  11101         2127  
4       11326  11326         2562  
5       11423  11423         1963  

# count() tallies non-null values per column, so columns with missing
# entries (zip, twp, addr and, in this run, Day of Week) show lower counts.
# Any column can be used for the plot; twp is used here.
byMonth['twp'].plot()

Simple plot of the dataframe indicating the count of calls per month.

Create a linear fit on the number of calls per month

Linear regression plot to see its trend and confidence interval
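To make the idea of a linear fit concrete without plotting, here is a rough sketch that fits a straight line with `numpy.polyfit` (a plain least-squares fit, swapped in here for illustration and without the confidence interval). It uses only the five monthly counts visible in the `byMonth.head()` output above, so the resulting slope says nothing reliable about the full year:

```python
import numpy as np

# Months 1-5 and their call counts, copied from the byMonth.head() output
months = np.array([1, 2, 3, 4, 5])
calls = np.array([13205, 11467, 11101, 11326, 11423])

# Degree-1 least-squares fit; polyfit returns the highest power first
slope, intercept = np.polyfit(months, calls, deg=1)

# Values of the fitted line at each month
fitted = slope * months + intercept

print(round(slope, 1), round(intercept, 1))
```

A negative slope here only reflects the drop from January within these five months.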

# Create a new column called 'Date'
df['Date'] = df['timeStamp'].apply(lambda time: time.date())

Now groupby this Date column with the count() aggregate and create a plot of counts of 911 calls.
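As a small sketch of this grouping step, the example below counts calls per date on a hypothetical three-row sample (the first two timestamps appear in the data, the third is made up). It uses `size()`, which counts rows, whereas `count()` counts non-null values per column:

```python
import pandas as pd

# Hypothetical miniature of the timeStamp column
sample = pd.DataFrame({
    'timeStamp': pd.to_datetime([
        '2015-12-10 17:40:00',
        '2015-12-10 17:40:01',
        '2015-12-11 08:00:00',
    ])
})

# Keep only the calendar date of each timestamp
sample['Date'] = sample['timeStamp'].dt.date

# One row per date, holding the number of calls on that date
calls_per_day = sample.groupby('Date').size()
print(calls_per_day)
```

Calling `.plot()` on a Series like `calls_per_day` draws the daily counts as a line.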

Now recreate this plot but create 3 separate plots with each plot representing a Reason for the 911 call
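One possible way to prepare the data for the three plots is to group by both Date and Reason and unstack, which gives one column per Reason. This is a sketch on a hypothetical five-row sample, not the approach used in the course solution:

```python
import pandas as pd

# Hypothetical miniature with the two columns needed for this step
sample = pd.DataFrame({
    'Date': pd.to_datetime([
        '2015-12-10', '2015-12-10', '2015-12-10',
        '2015-12-11', '2015-12-11',
    ]),
    'Reason': ['EMS', 'EMS', 'Fire', 'Traffic', 'EMS'],
})

# Rows are dates, columns are reasons, values are call counts;
# fill_value=0 covers date/reason pairs with no calls
per_reason = sample.groupby(['Date', 'Reason']).size().unstack(fill_value=0)
print(per_reason)
```

With a table in this shape, `per_reason.plot(subplots=True)` would draw each Reason in its own panel.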

Heatmap of days of the week and hours

Use seaborn's heatmap to show the data.

The heatmap makes it easy to see at which times call volume is high (hot) or low (cool).

dayHour = df.groupby(by=['Day of Week', 'Hour']).count()['Reason'].unstack()
dayHour.head()
plt.figure(figsize=(12, 6))
sns.heatmap(dayHour, cmap='viridis')
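One detail worth knowing about the groupby/unstack table: groupby sorts the day names alphabetically, so the heatmap rows do not come out in calendar order. A sketch of reordering the rows before plotting, on a hypothetical five-row sample (`day_order` and the sample values are assumptions for illustration):

```python
import pandas as pd

# Hypothetical miniature of the two columns used for the heatmap
sample = pd.DataFrame({
    'Day of Week': ['Thu', 'Thu', 'Mon', 'Sun', 'Mon'],
    'Hour': [17, 17, 8, 8, 23],
})

# Rows are days, columns are hours, values are call counts
day_hour = sample.groupby(['Day of Week', 'Hour']).size().unstack(fill_value=0)
alphabetical = list(day_hour.index)  # order produced by groupby

# Put the rows in calendar order, keeping only days present in the data
day_order = ['Mon', 'Tue', 'Wed', 'Thu', 'Fri', 'Sat', 'Sun']
day_hour = day_hour.reindex([d for d in day_order if d in day_hour.index])
print(day_hour)
```

The reordered table can be passed to `sns.heatmap` in the same way as `dayHour`.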

The complete script:

import numpy as np
import pandas as pd

import matplotlib.pyplot as plt
import seaborn as sns

sns.set_style('whitegrid')

df = pd.read_csv('911.csv')
df.info()
print(df.head())

# What are the top 5 zipcodes for 911 calls?
print(df['zip'].value_counts().head(5))

# What are the top 5 townships (twp) for 911 calls?
print(df['twp'].value_counts().head(5))

# Take a look at the 'title' column, how many unique title codes are there?
print(df['title'].nunique())
print(df['title'].unique())

'''
In the titles column there are "Reasons/Departments" specified before the title code. These are EMS, Fire, and Traffic. Use .apply() with a custom lambda expression to create a new column called "Reason" that contains this string value. 
For example, if the title column value is EMS: BACK PAINS/INJURY , the Reason column value would be EMS.
'''
df['Reason'] = df['title'].apply(lambda title: title.split(':')[0])

# What is the most common Reason for a 911 call based off of this new column?
print(df['Reason'].value_counts())

# Now use seaborn to create a countplot of 911 calls by Reason.
sns.countplot(data=df, x='Reason', palette='viridis')

print(type(df['timeStamp'].iloc[0]))  # str before the conversion
df['timeStamp'] = pd.to_datetime(df['timeStamp'])
print(type(df['timeStamp'].iloc[0]))  # pandas Timestamp after the conversion
print(df['timeStamp'].iloc[0])
df['Hour'] = df['timeStamp'].apply(lambda time: time.hour)
df['Month'] = df['timeStamp'].apply(lambda time: time.month)
# dayofweek is 0 (Monday) to 6 (Sunday), which matches the keys of dmap below;
# time.day would be the day of the month (1-31)
df['Day of Week'] = df['timeStamp'].apply(lambda time: time.dayofweek)
print(df['Hour'].iloc[0])
print(df['Month'].iloc[0])
print(df['Day of Week'].iloc[0])

dmap = {0: 'Mon', 1: 'Tue', 2: 'Wed', 3: 'Thu', 4: 'Fri', 5: 'Sat', 6: 'Sun'}
df['Day of Week'] = df['Day of Week'].map(dmap)

'''
Now use seaborn to create a countplot of the Day of Week column with the hue based off of the Reason column.
'''
plt.figure()  # new figure, so this plot does not draw over the previous one
sns.countplot(data=df, x='Day of Week', hue='Reason', palette='viridis')

'''
Now do the same for Month:
'''
plt.figure()
sns.countplot(x='Month', data=df, hue='Reason', palette='viridis')
# To relocate the legend
plt.legend(bbox_to_anchor=(1.05, 1), loc=2, borderaxespad=0.)
# Some months are missing: 9, 10 and 11 are not there.

'''
You should have noticed it was missing some Months,
let's see if we can maybe fill in this information by plotting the information in another way,
possibly a simple line plot that fills in the missing months,
in order to do this, we'll need to do some work with pandas...

Now create a groupby object called byMonth,
where you group the DataFrame by the month column and use the count() method for aggregation.
Use the head() method on this returned DataFrame.
'''
byMonth = df.groupby('Month').count()
print(byMonth.head())

'''
Now create a simple plot off of the dataframe indicating the count of calls per month.
'''
# Could be any column
plt.figure()  # new figure, so the line is not drawn on the Month countplot
byMonth['twp'].plot()

'''
Now see if you can use seaborn's lmplot() to create a linear fit
on the number of calls per month.
Keep in mind you may need to reset the index to a column.
'''
sns.lmplot(x='Month', y='twp', data=byMonth.reset_index())

'''
Create a new column called 'Date' that contains the date from the timeStamp column. 
You'll need to use apply along with the .date() method. 
'''
df['Date'] = df['timeStamp'].apply(lambda t: t.date())

'''
Now groupby this Date column with the count() aggregate and create a plot of counts of 911 calls.
'''
plt.figure()  # lmplot created its own figure; start a fresh one here
df.groupby('Date').count()['twp'].plot()
plt.tight_layout()

'''
Now recreate this plot but create 3 separate plots with each plot representing a Reason for the 911 call
'''
# Each Reason gets its own figure; without plt.figure() the three lines
# would be drawn on the same axes when run as a script
plt.figure()
df[df['Reason'] == 'Traffic'].groupby('Date').count()['twp'].plot()
plt.title('Traffic')
plt.tight_layout()

plt.figure()
df[df['Reason'] == 'Fire'].groupby('Date').count()['twp'].plot()
plt.title('Fire')
plt.tight_layout()

plt.figure()
df[df['Reason'] == 'EMS'].groupby('Date').count()['twp'].plot()
plt.title('EMS')
plt.tight_layout()

'''
Now let's move on to creating heatmaps with seaborn and our data.
We'll first need to restructure the dataframe so that the columns become the Hours and the
Index becomes the Day of the Week. There are lots of ways to do this,
but I would recommend trying to combine groupby with an
[unstack](http://pandas.pydata.org/pandas-docs/stable/generated/pandas.DataFrame.unstack.html) method.
Reference the solutions if you get stuck on this!
'''
dayHour = df.groupby(by=['Day of Week', 'Hour']).count()['Reason'].unstack()
dayHour.head()
plt.figure(figsize=(12, 6))
sns.heatmap(dayHour, cmap='viridis')

# Clustered version of the same day/hour table
sns.clustermap(dayHour, cmap='viridis')

# Repeat the heatmap and clustermap with Month as the columns
dayMonth = df.groupby(by=['Day of Week', 'Month']).count()['Reason'].unstack()
dayMonth.head()
plt.figure(figsize=(12, 6))
sns.heatmap(dayMonth, cmap='viridis')

sns.clustermap(dayMonth, cmap='viridis')

plt.show()

Thanks to:

[Jose Portilla](https://www.udemy.com/user/joseportilla/)

[Udemy](https://udemy.com)