Exploring Toronto cleared building permits data

A visual approach to data analysis

Carlos Hernandez
Published in Open Data Toronto
9 min read · Jan 22, 2020


This is an oldie — first published in the Toronto Open Data portal on 28/08/2017 before we decided to get on Medium. This was the prototype “data story” concept we have been working on at Open Data.

The City of Toronto publishes data on building permits going as far back as 2000. Excited about this, we at the Open Data Team asked ourselves: how might we learn from the Cleared Building Permits dataset? Could we use it for improving our understanding of Toronto and how we deliver services to the community? In this data story, we walk through our process and share the materials created, from data preparation to visualization, for you to use as a starting point in your own analysis or to follow along.

I specifically wanted to limit the analysis to visual means because, despite how cool they are, sophisticated analysis tools like machine learning and neural nets are not the answer to every analysis challenge. In this case, the data was small and what we wanted to find out was descriptive (i.e. what happened?) rather than predictive (i.e. what will happen?) or prescriptive (i.e. how do we make something happen?).

Background: About the data

A building permit is formal permission from the city to begin construction, demolition, addition, or renovation on your property. Permits move through five stages:

  1. Application: drawings, documents, and forms are submitted, depending on the type of permit
  2. Review: City Building staff review plans to ensure compliance; applicants may have to revise plans until they are compliant
  3. Issue: plans are approved and construction can begin
  4. Inspection: Toronto Buildings staff inspect the project to ensure adherence to the permit submitted; construction is deemed compliant after inspections are completed and passed
  5. Close: the applicant contacts the city, inspection results are confirmed, and the permit is considered completed

Permits are considered Active until closed. Both Active and Closed permits are available in the open data catalogue as separate datasets. However, the scope of this story is limited to Closed permits, so permits still going through the process are not included.

Finally, note that in the portal “Closed Permits” are referred to as “Cleared Permits”. The goal is to standardize this in the future.

1. Define research questions

“If you don’t know where you are going you might not get there.”
– Yogi Berra

Data analysis projects must begin by defining at least one question you hope to answer with the data.

It is normal to revise these research questions throughout the process, as they primarily serve as a launch pad from which to begin, but without them it is easy to get caught in an endless cycle of analysis without synthesizing it into a practical application.

Given their open-ended nature, formulating these questions can be quite difficult. We followed a two-pronged approach consisting of personal interests and the data itself.

First, initial ideas were derived from the team’s personal interests in Toronto housing-related trends. For instance, questions revolved around types of permits issued, availability of housing over time or place, and how trends compare to the cost of real estate.

Then, we familiarized ourselves with the data to refine our questions and come up with completely new ones. In-depth understanding of the data was not needed at this stage so, instead, we used the column descriptions available in the portal. This is known as the metadata.

A better understanding of the metadata helped us determine if the initial questions could be answered by the data and come up with new ideas.

Here are some example questions derived from the metadata alone

Essentially, this two-pronged approach enabled us to balance what we want to learn (e.g. personal interests) with what we think we can learn given the data and analysis toolkit (i.e. visualization in this case).

After this, we narrowed the scope of analysis to the following research questions:

  1. How has the distribution of permit types issued changed over time?
  2. Has inspection period length improved over time?
  3. Has review period length improved over time?
  4. How has the number of units created changed over time?

2. Prepare data for analysis

Past experience has taught me that the vast majority of time in analysis projects is spent on preparing data for analysis and this time was no exception. This usually involves tasks such as:

  • Cleaning, e.g. identifying and addressing errors in the data
  • Reshaping, e.g. breaking up datasets into multiple datasets
  • Transforming, e.g. calculating number of days from two date fields

Consolidation

Cleared building permits are available in multiple comma-separated values (.csv) files, one per year from 2000 to 2017 (year to date).

These yearly files were consolidated for analysis via a Python script.
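The consolidation script itself is not reproduced here, so the following is only a minimal sketch of how yearly files could be stacked using the Python standard library. The function name `consolidate_csv`, the `SOURCE_FILE` column, and the file pattern are assumptions made for illustration, not the names used in the original script.

```python
import csv
from pathlib import Path


def consolidate_csv(folder, output_path, pattern="*.csv"):
    """Stack every CSV matching `pattern` in `folder` into one file.

    Yearly files may not share identical columns, so the output uses the
    union of all headers and leaves missing values blank.
    Returns the number of data rows written.
    """
    output_path = Path(output_path)
    # Skip the output file itself in case it lives in the same folder.
    paths = sorted(
        p for p in Path(folder).glob(pattern)
        if p.resolve() != output_path.resolve()
    )

    # First pass: collect the union of column names, preserving order.
    columns = []
    for path in paths:
        with path.open(newline="", encoding="utf-8") as f:
            for name in next(csv.reader(f), []):
                if name not in columns:
                    columns.append(name)

    # Second pass: write all rows, tagging each with its source file
    # (SOURCE_FILE is a hypothetical helper column).
    total = 0
    with output_path.open("w", newline="", encoding="utf-8") as out:
        writer = csv.DictWriter(
            out,
            fieldnames=columns + ["SOURCE_FILE"],
            restval="",
            extrasaction="ignore",
        )
        writer.writeheader()
        for path in paths:
            with path.open(newline="", encoding="utf-8") as f:
                for row in csv.DictReader(f):
                    row["SOURCE_FILE"] = path.name
                    writer.writerow(row)
                    total += 1
    return total
```

Keeping the source file name on each row makes it easier to trace an odd value back to the yearly file it came from.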

Transformation

Work on the dataset beyond file consolidation was necessary to be able to analyze the data. Various issues were addressed during transformation, including but not limited to:

  1. Ensuring columns contain the correct data types so math can be performed on numerical fields and time operations on date fields
  2. Calculating time intervals between date columns to derive review and inspection periods
  3. Sorting permits to calculate time between permit revisions, which must be calculated across rows

Data was transformed through another Python script.
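To make the three transformation steps above concrete, here is a small sketch under stated assumptions. The column names (`PERMIT_NUM`, `APPLICATION_DATE`, `ISSUED_DATE`, `COMPLETED_DATE`), the date format, and the definitions of the periods (review as application to issue, inspection as issue to completion) are all illustrative guesses, not the actual schema or the logic of the original script.

```python
from datetime import date, datetime


def parse_date(text, fmt="%Y-%m-%d"):
    """Step 1: coerce a text field to a date; blank or malformed -> None."""
    try:
        return datetime.strptime(text.strip(), fmt).date()
    except (AttributeError, ValueError):
        return None


def days_between(start, end):
    """Whole days from start to end, or None if either date is missing."""
    if start is None or end is None:
        return None
    return (end - start).days


def add_periods(rows):
    """Step 2: column-wise intervals, computed within each row."""
    for row in rows:
        applied = parse_date(row.get("APPLICATION_DATE"))
        issued = parse_date(row.get("ISSUED_DATE"))
        completed = parse_date(row.get("COMPLETED_DATE"))
        row["REVIEW_DAYS"] = days_between(applied, issued)
        row["INSPECTION_DAYS"] = days_between(issued, completed)
    return rows


def add_days_since_previous_revision(rows):
    """Step 3: row-wise interval between revisions of the same permit."""
    # Sort so revisions of one permit sit together in date order;
    # rows without a date sort first and get no interval.
    ordered = sorted(
        rows,
        key=lambda r: (r["PERMIT_NUM"],
                       parse_date(r.get("ISSUED_DATE")) or date.min),
    )
    last_seen = {}  # permit number -> issue date of its previous revision
    for row in ordered:
        issued = parse_date(row.get("ISSUED_DATE"))
        row["DAYS_SINCE_PREV_REVISION"] = days_between(
            last_seen.get(row["PERMIT_NUM"]), issued
        )
        if issued is not None:
            last_seen[row["PERMIT_NUM"]] = issued
    return ordered
```

The difference between the last two functions is the point of the list above: review and inspection periods only need values from a single row, while time between revisions needs the rows sorted and compared with their neighbours.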

3. Visualize data for exploration

Next, we visualized the data to explore it and, hopefully, generate some insight. Although the raw data could not have been used directly due to issues like the ones above, it was ready for consumption by data visualization software after the transformations performed in Step 2.

Experts in this dataset revealed to us that there are two distinct periods in the data: permits issued up to 2005 and permits issued after 2005, due to several changes to rules around the data, such as classification. We visualized only the permits issued after 2005 because they are more relevant and because the changes made the two periods incomparable.

We built an interactive dashboard in Tableau Public, the free version of Tableau Software’s leading data visualization software, for “slicing and dicing” the data. Feel free to explore it yourself below.

An example dashboard created in Tableau, primarily for exploration purposes — to slice and dice the data, guided first by the research questions. Best viewed in full screen.

4. Analyze patterns and outliers

Data exploration revealed six insights. Of these, four help answer the research questions posed at the beginning and two were observations uncovered while attempting to answer the questions, perhaps not directly related but valuable still.

Insights reached from visual analysis

These insights are covered in detail (and, I think, more engagingly) in the visualization dashboard above, but the highlights are below.

Question 1: Active permits are needed for an accurate view of permits issued over time

An accurate representation of the number of permits issued over time requires both active and closed permits. Closed permits are underrepresented in more recent years, as the complex ones have not had enough time to close.

Question 2: Inspection periods have improved, though closed permits are only part of the picture

Since permits can remain active after inspections, looking at closed permits alone paints an incomplete picture. With this in mind, the number of permits issued between 2013 and 2016 is, on average, higher than between 2006 and 2013, which points to an overall increase in issuances. However, we would need the active permits to validate this theory.

Question 3: Review periods may have improved throughout the years

Once again, for a full picture we really need to complement this data with active permits. That being said, a visualization of the distribution of review periods indicates the median may have improved slightly over the years, although the change is not clear enough to be certain of without actual statistical analysis.

We can see from the visualization that review period durations are closer together in later years, which could mean improvement in terms of greater consistency (essentially, fewer outliers). Again, however, one must look at active permits for a more complete picture because this pattern could be an artifact of data selection: complex permits, which require longer review periods, have not yet closed in more recent years.
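For readers who want numbers to go with the visual impression, one option is to compute the median and the interquartile range (IQR) of review periods per year: a falling median suggests shorter reviews, and a shrinking IQR suggests more consistency. This is only a sketch. The function name and the `(issue_year, review_days)` input shape are assumptions, and it uses `statistics.quantiles`, which requires Python 3.8 or later.

```python
from collections import defaultdict
from statistics import median, quantiles


def review_summary_by_year(records):
    """Summarize review periods per year.

    `records` is an iterable of (issue_year, review_days) pairs;
    pairs whose review_days is None are skipped.
    """
    by_year = defaultdict(list)
    for year, days in records:
        if days is not None:
            by_year[year].append(days)

    summary = {}
    for year, values in sorted(by_year.items()):
        if len(values) >= 2:
            # Quartile cut points; IQR = spread of the middle 50%.
            q1, _, q3 = quantiles(values, n=4)
            iqr = q3 - q1
        else:
            iqr = None  # not enough data to estimate spread
        summary[year] = {
            "n": len(values),
            "median": median(values),
            "iqr": iqr,
        }
    return summary
```

These summaries inherit the same selection caveat as the chart: with closed permits only, recent years are missing the long-running permits, so both the median and the IQR will look better than they really are.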

Question 4: Dwelling units increased; however, data collection caveats make this visual unreliable

Although the number of dwelling units created far outstripped the number of units lost, this figure is a net across permits and, again, it is incomplete without considering the active permits.
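As a small illustration of what "net across permits" means, the sketch below sums units created and lost per year and subtracts one from the other. The field names `year`, `units_created`, and `units_lost` are hypothetical stand-ins for whatever the dataset actually calls them.

```python
from collections import defaultdict


def net_units_by_year(permits):
    """Total dwelling units created, lost, and net, per year.

    `permits` is an iterable of dicts with hypothetical keys
    "year", "units_created", and "units_lost"; blank values count as 0.
    """
    totals = defaultdict(lambda: {"created": 0, "lost": 0})
    for permit in permits:
        year_total = totals[permit["year"]]
        year_total["created"] += permit.get("units_created") or 0
        year_total["lost"] += permit.get("units_lost") or 0

    return {
        year: {**t, "net": t["created"] - t["lost"]}
        for year, t in sorted(totals.items())
    }
```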

Bonus Observation 1: Most permits fall in a small number of types

After removing common permit types that are bundled into more complex permit requests, such as Plumbing and Mechanical (often several of each), we found that over 80% of permits in our data are Small Residential Projects, Building Additions/Alterations, or Drain and Site Service.
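A share like this can be checked with a simple count: drop the bundled types, count what remains, and accumulate shares from the most common type down. The sketch below assumes the permit type is available as a plain string and that the bundled types are labelled exactly "Plumbing" and "Mechanical"; the real labels in the dataset may differ.

```python
from collections import Counter


def permit_type_shares(permit_types, exclude=("Plumbing", "Mechanical")):
    """Return (type, share, cumulative_share) tuples, most common first.

    Types listed in `exclude` are removed before shares are computed.
    """
    counts = Counter(t for t in permit_types if t not in exclude)
    total = sum(counts.values())
    if total == 0:
        return []

    shares = []
    running = 0
    for permit_type, n in counts.most_common():
        running += n
        shares.append((permit_type, n / total, running / total))
    return shares
```

Reading down the cumulative column shows how few types are needed to cover most of the permits.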

Bonus Observation 2: Spike in Drain and Site Service permits in 2013 is attributed to flooding prevention subsidies

There was a four-fold increase in Drain and Site Service permits from 2012 to 2013, which caught my attention for being remarkably outside the norm. Discussions with the Buildings team (and some Googling) revealed an interesting bit of history: in 2013 a storm caused massive damage, which in turn led to the Basement Flooding Protection Subsidy Program to help reduce these risks.
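A jump like this is easy to spot on a chart, but it can also be flagged with a year-over-year ratio. The sketch below is an illustration only: the function name and the `(issue_year, permit_type)` input shape are assumptions.

```python
from collections import Counter


def year_over_year_ratio(permits, permit_type):
    """Ratio of each year's permit count to the previous year's.

    `permits` is an iterable of (issue_year, permit_type) pairs.
    Years with no permits in the preceding year are left out.
    """
    counts = Counter(
        year for year, ptype in permits if ptype == permit_type
    )
    ratios = {}
    for year in sorted(counts):
        previous = counts.get(year - 1)
        if previous:
            ratios[year] = counts[year] / previous
    return ratios
```

A ratio near 1 means little change from the year before, so a value around 4 would stand out immediately when scanning the results for each permit type.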

5. Engage with data owners

The next step was chatting with internal subject matter experts to review findings, get clarification on anomalies identified, and uncover other insights. Knowledge sharing from the experts proved essential towards validating the process and findings and providing much-needed context. Indeed, that insight enabled a much deeper understanding of the data and thus a higher quality analysis.

A great deal was learned from the experts. Although there is too much to list, the major lessons were:

  • The relationship between closed and active permits, and why considering both is key for understanding the whole building permits story. Having only one is inherently limiting, primarily because issued permits can remain active for long time periods until the final step is taken by the applicant.
  • The relationship between Mechanical and Plumbing permits and other permit types, and how several of these can be issued with other permits (a new building, for example, may entail several of these).
  • The spike in Drain and Site Service permits from 2013 onwards was due to the floods and the prevention subsidy program.

Every lesson learned enhanced the quality of the analysis. My advice is to always speak with the experts on the data, as their knowledge adds an invaluable dimension to an analytics project.

Next Steps: Ideas

Sometimes analytics projects pose more questions than they answer, maybe because the more you know, the more you realize you don’t know. And to a degree this held true in this case.

This analysis can certainly be improved in multiple ways and taken in many directions. Personally, I think the most promising would be including Active Permits in the data being analyzed, which would provide a much more holistic picture of permits across the city. Understanding the relationships between the various permit types could enable normalization and allow more accurate comparison between them. Finally, joining this data to other datasets would lead to new opportunities for insight; I am particularly interested in exploring what can be learned from mapping permits by address.

Depending on the team’s priorities around data stories, these enhancements may follow in a Part II. More rewarding for me would be if someone in the community used this story as a starting point and built on it! We invite you to do so and share with us what you find.

We are always seeking to improve data stories — please share your thoughts on how to make them better. Feedback is welcomed via the comments, email or Twitter.
