Calgary Fire Emergency Response Calls 2010–2018 EDA

Exploring the Seaborn library for EDA of an emergency response system.

Alejandro Coy
Alejandro-DataScience Journey
8 min read · Jan 8, 2019


Today I want to share my exploratory analysis using the seaborn library. This work is based on the capstone project for the Python and machine learning bootcamp course I’ve been working on. In the course, a generic Kaggle dataset was used, but I thought it would be more interesting to analyze data from my city.

Project Description:

The city of Calgary has a population of around 1.3 million people, and the allocation of resources is critical in the management of the city. In the case of emergency response, the fire department has kept a record of daily incidents since 2010, classified by type of incident.

With the steady increase in population, it would be great to be able to predict the direction of the demand for the city’s emergency response and the number of resources that would need to be invested in this area.

For this study, the incidents reported to the city over the last 8 years were analyzed. The data was retrieved from the city’s open data system and covers incidents, demographic data, and response times.

The objective of this work? To explore the data visualization library Seaborn and practice the exploratory data analysis methodology: establish questions of interest, gather and clean the data, and finally use visualization tools to answer the initial questions and raise new ones.

Questions of interest:

· Are the emergency calls increasing through the years?

· Is there any incident type more common than others?

· Is there a relationship between the time of the year or the day of the week and the number of incidents?

· Is there a relationship between incidents and population growth?

Gathering, Importing and Data Cleaning

For this project, the two data sets were obtained from the city’s database. The data sets were imported with pandas’ built-in read functions, creating three initial data frames.
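As a minimal sketch of this step (the file names below are placeholders for the exports from Calgary’s open data portal):

```python
import pandas as pd

# Placeholder file names for the open data exports.
incidents = pd.read_csv('fire_emergency_response_calls.csv')
census = pd.read_csv('civic_census_by_year.csv')

print(incidents.shape)
print(census.shape)
```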

The data cleaning consisted of the following steps:

Response call data set

· Sort the incidents by year, month and day.

· Modify the date field, since the day and month were inverted. This was needed in order to use pandas’ datetime functions.

· Modify the datatype of the date column to date.

· Using the weekday() function, the day column was generated. Then the name of the day was obtained using pandas’ map function (a rough sketch of these steps follows this list).
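A rough sketch of the cleaning above, assuming the raw date column is called 'Date' and arrives in day/month order:

```python
# 'Date' is an assumed column name; dayfirst=True handles the inverted day/month order.
incidents['Date'] = pd.to_datetime(incidents['Date'], dayfirst=True)

# Sort chronologically (year, month and day in one go).
incidents = incidents.sort_values('Date')

# Numeric weekday (Monday = 0) and its name via a mapping.
day_names = {0: 'Monday', 1: 'Tuesday', 2: 'Wednesday', 3: 'Thursday',
             4: 'Friday', 5: 'Saturday', 6: 'Sunday'}
incidents['Weekday'] = incidents['Date'].dt.weekday
incidents['Day Name'] = incidents['Weekday'].map(day_names)
```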

Census data set

· Modify the datatype of the year column to datetime and index the data frame with this column.

· Create a new numeric column by extracting just the year from the index using the .year property of datetime objects (see the sketch below).
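One possible version of these two steps; the 'Year' column name and the new 'year_num' column are assumptions for illustration:

```python
# Convert the census year to datetime and use it as the index.
census['Year'] = pd.to_datetime(census['Year'].astype(str), format='%Y')
census = census.set_index('Year')

# Numeric year extracted from the datetime index.
census['year_num'] = census.index.year
```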

Basic data insights

Now that we have a tidy data set, we can start to answer basic questions about the data. For example, using describe() on the ‘Major Incident Type’ column, we find that there are 8 unique types of incidents, with the most frequent being False Alarm.

However, we have to be careful, since the type that appears most frequently as a value is not necessarily the one with the highest total incident count.
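The summary itself is a one-liner; the caveat above comes from the fact that describe() on a text column reports the most common value by row, not a weighted total:

```python
# Count, unique, top and freq for a non-numeric column.
print(incidents['Major Incident Type'].describe())
```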

The next two questions are the total incident count and the average number of incidents per day:

Total Incidents: 501801

Average Incidents per Day: 154.305
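Assuming each row carries an 'Incident Count' for a given date and incident type (which would also explain the caveat above), these two numbers could be computed as follows:

```python
# 'Incident Count' is an assumed column name.
total = incidents['Incident Count'].sum()

# Daily totals across all incident types.
daily = incidents.groupby('Date')['Incident Count'].sum()

print('Total Incidents:', total)
print('Average Incidents per Day:', round(daily.mean(), 3))
```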

Incidents Through the Years

The first step is to plot the total incidents through the years in order to identify any trend and spot possible outliers.

From the plot, we can see one clear outlier and another three or four peaks which indicate days with more incidents. To identify these points, we filter the dates with incident counts higher than 400.
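One way to build the plot and pull out those peaks, reusing the daily totals from the sketch above:

```python
import matplotlib.pyplot as plt
import seaborn as sns

# Daily incident totals over the full period.
fig, ax = plt.subplots(figsize=(12, 4))
daily.plot(ax=ax)
ax.set_ylabel('Incidents per day')
plt.show()

# Dates with unusually high activity.
print(daily[daily > 400].sort_values(ascending=False))
```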

We could match all of these dates with significant events that occurred in the city in the last eight years. However, 09/10/2014 seems quite high, so we studied it in more detail.

Although the numbers are significantly higher than reported in similar storms, the data seems to correspond with the snowstorm event, with 1252 incident counts just for hazardous conditions and 698 severe weather incidents. This indicates that it is very unlikely that there is a mistake in the data source.

After these results, I was curious to explore how the latest big snowstorm I could recall compared with previous storms. On October 2nd, 2018, the city of Calgary had a “record-breaking” October snowfall, with more than 32 cm of snow in a single day.

However, in terms of emergency calls, it wasn’t as dramatic as past storms. In total, 221 incidents were reported, which puts this day in the 65th position in terms of incident counts over the last 8 years. When compared with the rest of the month, the incidents on the day of the storm were only slightly higher.

Type of Incidents

The most common type of incident is by far Medical/Rescue, accounting for more than 51% of incidents. A distant second place, with 15.3%, was the false alarm incident, which was a surprise for me. Imagine how many resources could be saved if that number of incidents could be reduced.
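These shares can be computed directly from the counts; the column names follow the same assumptions used above:

```python
# Total incidents per type and their share of the overall total.
by_type = (incidents.groupby('Major Incident Type')['Incident Count']
           .sum()
           .sort_values(ascending=False))
print((by_type / by_type.sum() * 100).round(1))
```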

We could use another visualization to analyze the number of events by type for each individual year.

Or we could plot each type of event individually across all 8 years.

The first visualization presents a lot of information per plot, and it is difficult to find any trend except for the outliers. On the contrary, the second visualization is clearer and allows us to identify the growing trend in medical/rescue incidents, while the other incident types seem stable through the years.
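One way to produce the two views, building yearly totals per type from the cleaned frame (again a sketch under the same column-name assumptions):

```python
# Yearly totals per incident type.
incidents['Year'] = incidents['Date'].dt.year
by_type_year = (incidents.groupby(['Year', 'Major Incident Type'])['Incident Count']
                .sum()
                .reset_index())

# View 1: all types together, grouped by year.
sns.barplot(data=by_type_year, x='Year', y='Incident Count',
            hue='Major Incident Type')
plt.show()

# View 2: one small panel per incident type across the years.
sns.catplot(data=by_type_year, x='Year', y='Incident Count',
            col='Major Incident Type', col_wrap=4, kind='bar')
plt.show()
```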

Distribution of Incidents by Month and Day of the Week

For the allocation of resources, it would be of great interest to know whether there is any relationship between the day of the week or the month of the year and how common incidents are. For this purpose, we will be using different visualization tools from the seaborn library, so let’s start:

Using a simple barplot and one line of code per graph, we can plot incidents per month and per day of the week for the last 8 years, including standard deviations. What a powerful tool seaborn is!
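A sketch of those two calls, using the daily totals so that the error bars reflect the day-to-day spread:

```python
# Daily totals with month and weekday labels.
daily_df = daily.reset_index(name='Incidents')
daily_df['Month'] = daily_df['Date'].dt.month
daily_df['Day Name'] = daily_df['Date'].dt.day_name()

# Average incidents per day, by month and by weekday, with standard deviation bars.
sns.barplot(data=daily_df, x='Month', y='Incidents', ci='sd')
plt.show()

sns.barplot(data=daily_df, x='Day Name', y='Incidents', ci='sd')
plt.show()
```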

From these two plots, it can be inferred that Fridays and the summer months are when more incidents occur in Calgary. Although the Friday result would be expected, I was expecting the winter months to be the busiest, due to the frigid temperatures and the snow that Calgary is known for.

Another interesting visualization tool that can be used for establishing this kind of relationship is the heat map. The technical definition is “a graphical representation of data where the individual values contained in a matrix are represented as colors” (https://www.r-graph-gallery.com/heatmap/). The only trick to using this representation is that the data frame has to be arranged in matrix form. For example:
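A matrix like that can be built with a pivot table, for instance with years as rows and months as columns:

```python
# Rearrange the daily totals into a year-by-month matrix of incident counts.
daily_df['Year'] = daily_df['Date'].dt.year
matrix = daily_df.pivot_table(index='Year', columns='Month',
                              values='Incidents', aggfunc='sum')
print(matrix)
```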

Once the information is in matrix form, the implementation in seaborn is again one line of code. For this study, we will look at the relationships between the year, the month, and the day of the week.
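Using the year-by-month matrix from the sketch above, the call itself could look like this:

```python
# One line of code; with this palette, lighter colours indicate higher counts.
sns.heatmap(matrix, cmap='viridis')
plt.show()
```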

In the heat map, the lighter colors indicate a higher number of incidents.

The plot shows an increase in incidents over the years and confirms that most incidents are reported in the summer months.

The second heat map confirms what the bar plots above showed: Fridays and the summer are when more incidents are reported. The big advantage of heat maps is that, with one graph, the association between these three variables is very clear.

City Growth and Its Relationship with Emergency Calls

As we could see from our previous analysis, response calls have been increasing through the years. The first possible cause that comes to mind is that Calgary is a relatively young city with continuous population growth.

Now let’s plot the population growth and the incident counts in the same graph.
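A sketch of that overlay, assuming the census frame has a total population column (the name 'Population' below is a placeholder):

```python
# Yearly incident totals.
yearly = daily_df.groupby('Year')['Incidents'].sum()

fig, ax1 = plt.subplots()
ax1.plot(yearly.index, yearly.values, color='tab:red')
ax1.set_ylabel('Incidents per year')

# 'Population' is a placeholder for the census total column.
ax2 = ax1.twinx()
ax2.plot(census['year_num'], census['Population'], color='tab:blue')
ax2.set_ylabel('Population')
plt.show()
```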

The plot shows a positive relationship between incidents and the number of inhabitants of the city. As the city grows, it is fair to say that the number of incidents will increase. But by how much?

Linear Regression

Linear regression is the simplest tool for studying the relationship between two variables. It is not the intention of this project to dig deep into the topic, since I will be working on a project exclusively around it. However, Seaborn allows us to create a linear regression plot with a 95% confidence interval.
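Seaborn’s regplot draws the fitted line with a 95% confidence band by default; a minimal sketch, assuming the yearly incident totals are joined to the census population (the 'Population' column remains a placeholder):

```python
# Align yearly incident totals with the yearly population figures.
merged = pd.DataFrame({
    'Population': census.set_index('year_num')['Population'],
    'Incidents': yearly,
}).dropna()

# Fitted line plus 95% confidence band.
sns.regplot(data=merged, x='Population', y='Incidents')
plt.show()
```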

The positive relationship between the number of residents and the number of incidents is now more evident. We could calculate the regression parameters using one of multiple methods, such as ordinary least squares (OLS), but that will be the topic of the next post.

With this, we will wrap up this EDA project.

Thanks for reading.

• All the code for the project can be found here

• If you have any questions or are interested in any other analysis, leave me a comment or send me an email: acoydata@gmail.com
