Hortons with Henrik: In Search of Data

Published in Budgetpedia · 5 min read · Nov 29, 2016

Chats in Tim Hortons with Budgetpedia Lead, Henrik Bechmann — part one of three.

Walking hurriedly towards the Tim Hortons near Bloor and Dufferin, I saw Henrik sitting against the back wall with his wide-brimmed outdoor research cap perched beside him. He looked ever the avid explorer, and today we would be discussing his foray into the uncharted territory of the Toronto municipal budget. Like an Amazonian jungle dotted with pockets of humanity, beautiful but largely untamed, the Toronto municipal budget data is messy and challenging to wrangle. It does, however, contain valuable nuggets. These nuggets of data, visualized in the right way, are useful not just for budget enthusiasts, but for anyone who wishes to see themselves and their city grow. The budget is a great place to start if you want to see change. Looked at across the years, it maps how money is allocated and spent; it tells you what was valued in each year.

As a curious but largely technically obtuse individual, I brought along my roommate, who is well versed in information systems management and backend development, to act as a sort of translator: a safety net to ensure we were all speaking the same language and that I was asking the right questions. Interestingly, the language used for a particular service or expenditure often changes from one year’s data set to the next. Tracking and normalizing the data therefore means recognizing patterns and the various ways in which the same thing can be said. We will delve deeper into Henrik’s data normalizing process in part two of this series.

This question of language and translation, of making sure everyone is on the same page and the right questions are being asked, is pertinent to the budget project. You can have all the data cleaned and normalized, but without the right questions, the data doesn’t say very much.

Over the course of a year and several months, Henrik and a group of over sixty volunteers (who weaved in and out of the project) brought to life an explorer tool for the Toronto budget. One of the first steps in exploring this largely uncharted territory was locating the data sources. Recounting the adventure, Henrik noted that there are some continuity issues with the data sets that the city releases, of which there are four:

i. Annual Summary Budgets

ii. Audited Financial Statements

iii. Budget Open Data Sets

iv. Financial Information Returns (FIR)

The Annual Summary Budget dataset identifies the approved and recommended annual operating budget summary by expenditure category in each program or division starting from 2011. A new budgetary file is published annually in this dataset. Similarly, the Consolidated (Audited) Financial Statements are intended to provide Council, the public, the City’s debenture holders, and other stakeholders, an overview of the state of the City’s finances at the end of the fiscal year and indicate revenues, expenses and funding for the year. Management is responsible for the preparation, content and accuracy of the Consolidated Financial Statements and all other information included in the financial report.

The Toronto Budget Open Data sets are available under the city’s Open Data Catalogue: Budget — Capital Budget & Plan By Ward (10 Year Recommended), Budget — Capital Budget & Plan By Ward (10 yr Approved), Budget — Operating Budget Program Summary by Expenditure Category. Note that ‘recommended’ refers to the budget that was initially proposed, while ‘approved’ refers to the budget expenditures that were accepted for that year.

Unlike the City of Toronto reports and data sets, the FIR is the main data collection tool used by the Ministry of Municipal Affairs to collect financial and statistical information on municipalities. The FIR is a standard document composed of a number of Schedules, which are updated each year to comply with current legislation and reporting requirements. It is designed to include many automatic calculations, including formulas which carry data forward from one schedule to another. To further assist municipalities with completing the FIR, a series of data verification checks is built in. These checks verify information as it is entered by the municipality, and help ensure that accurate data is submitted. Additionally, before a municipality begins completing the FIR, several data points from the previous year’s FIR are pre-loaded into the current report. This ensures that closing balances from the previous year match opening balances for the current year.
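The carry-forward rule described above (last year's closing balances should equal this year's opening balances) is easy to check independently once the figures are in hand. The following is a minimal sketch only, not the FIR's own verification logic: the function name, the account labels, the tolerance and the figures are all hypothetical placeholders.

```python
from typing import Dict, List, Tuple


def find_balance_mismatches(
    prior_closing: Dict[str, float],
    current_opening: Dict[str, float],
    tolerance: float = 0.005,
) -> List[Tuple[str, float, float]]:
    """Return (account, closing, opening) for each account whose balances differ.

    An account missing from one side is treated as having a balance of 0.0.
    """
    mismatches = []
    for account in sorted(set(prior_closing) | set(current_opening)):
        closing = prior_closing.get(account, 0.0)
        opening = current_opening.get(account, 0.0)
        if abs(closing - opening) > tolerance:
            mismatches.append((account, closing, opening))
    return mismatches


# Placeholder figures for illustration only; these are not real FIR values.
closing_prev = {"reserve_funds": 1200.0, "long_term_debt": 300.0}
opening_curr = {"reserve_funds": 1200.0, "long_term_debt": 310.0}

print(find_balance_mismatches(closing_prev, opening_curr))
```

With these placeholder figures, only the hypothetical `long_term_debt` account is reported, because its opening balance differs from the prior closing balance.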

The FIR is an interesting data set because each municipality in Ontario files its own report, and the province releases them. The reports are released individually as Excel files, and are also amalgamated into a CSV and an R data frame. This could allow for future iterations of Budgetpedia that not only explore Toronto’s data, but also compare and contrast budgets Ontario-wide.

Henrik was careful to mention that many of the unique IDs do not match across the sets, creating reconciliation issues. Unique IDs had to be created for several expenditure buckets. As well, continuity measures were applied to account for historical blips in the data. As an example, there was an issue with the Youth Employment Program, as its description and its start and end dates fluctuate with funding cuts. Prior to 2014, the program was titled the Youth Employment Toronto (YET) program, and it ran under that name until federal funding for the initiative ended. On June 23rd, 2015, the Economic Development Committee recommended that the Budget Committee reinstate the program, entitled the Toronto Youth Employment Program (TYEP). In this instance, research into code and description changes was required, and an educated assumption addressed the continuity issue.
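One common way to handle this kind of renaming is a crosswalk: a lookup table that maps every label a program has carried to a single stable ID. The sketch below illustrates the idea only and is not a description of Budgetpedia's actual code. The stable ID, the helper names, the years and the amounts are all hypothetical placeholders; only the two program titles come from the example above.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple

# Hypothetical crosswalk: each historical label maps to one made-up stable ID.
CROSSWALK: Dict[str, str] = {
    "youth employment toronto (yet)": "youth-employment",
    "toronto youth employment program (tyep)": "youth-employment",
}


def normalize(label: str) -> str:
    """Lower-case a label and collapse runs of whitespace before lookup."""
    return " ".join(label.lower().split())


def build_series(
    rows: Iterable[Tuple[int, str, float]],
) -> Tuple[Dict[str, Dict[int, float]], List[str]]:
    """Group (year, label, amount) rows by stable ID.

    Labels missing from the crosswalk are returned separately so that a
    person can research them instead of having them silently dropped.
    """
    series: Dict[str, Dict[int, float]] = defaultdict(dict)
    unmatched: List[str] = []
    for year, label, amount in rows:
        stable_id = CROSSWALK.get(normalize(label))
        if stable_id is None:
            unmatched.append(label)
            continue
        series[stable_id][year] = series[stable_id].get(year, 0.0) + amount
    return dict(series), unmatched


# Placeholder years and amounts for illustration only; not real budget figures.
rows = [
    (2013, "Youth Employment Toronto (YET)", 1.0),
    (2016, "Toronto  Youth Employment Program (TYEP)", 2.0),
    (2016, "Some Unmapped Program", 3.0),
]
series, unmatched = build_series(rows)
print(series)
print(unmatched)
```

Returning the unmatched labels keeps the "educated assumption" step visible: a new mapping is added to the crosswalk only after someone has checked that two labels really describe the same program.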

In addition to the four data sets released by the city, Henrik is also interested in (1) the Sunshine List and (2) budget data with geocoded cost centres. These data sets would allow for comparative analysis of individual incomes within the municipal budget, and the geographical location of specific cost centres around the city, respectively. As one example, geocoded cost centres would allow analytical reporters to visualize the hot spots for expenditures within the city.

The goal of Budgetpedia is to make the budget process more transparent and accessible, and the purpose of this three-part series is to assist with the transparency of the project. As the project lead, Henrik has a wealth of knowledge to share regarding the data sets used (part one), the normalizing process of the data sets (part two), and the types of technical tools used to build the explorer (part three). The Budgetpedia project is releasing version 0.1 on Tuesday, November 29th, which is an exciting culmination for the weary, intrepid explorers. To keep the momentum that Henrik has built, fresh faces can and should be tagged in to assist with new adventures in the data sets, bringing informational nuggets to the layperson in the most engaging way possible.

Written by Kira McCutcheon