Mapping Billions in NYC Capital Spending

Chris Whong
Published in qri.io
9 min read · Dec 4, 2020

Update: 2:26pm EST on December 4th. We noticed an error in the scraper code that resulted in major issues with the total spending amounts used in this analysis. We are currently working to fix these errors and update the visualizations and numbers used in this blog post.

Update: 4:19pm EST on December 4th. The errors have been corrected, and all numbers and visuals in this blog post have been updated. An update to the dataset has also been pushed to qri.cloud, and an update to the scraper code has been pushed to GitHub.

If you follow the exciting world of municipal budgets, you may know that New York City’s operating budget is close to $90 Billion for the current fiscal year. This is fairly cut and dried: money is raised from taxes and other sources, and spent on salaries, rent, and other operating costs. Repeat annually.

The capital budget, however, is a more elusive beast. Capital projects may span years or even decades, and the enormity of the numbers and the complexity of the process make it quite difficult to figure out where the money goes, and when.

A Quick Primer on Capital Budget Information

If you want to inspect the capital budget, here’s the official capital budget document for Fiscal Year 2021 approved by the city council and mayor. It’s 604 pages long, and is mostly screen dumps from a government computer system. If you scan these pages, which are already cryptic and confusing, you may see some specific project-related descriptions for major projects like bridges, but most of the lines have something like:

CONSTRUCTION, RECONSTRUCTION, ACQUISITION OR INSTALLATION OF A NON-CITY OWNED PHYSICAL PUBLIC BETTERMENT OR IMPROVEMENT WITH A CITY PURPOSE, WHICH WOULD BE CLASSIFIED AS A CAPITAL ASSET UNDER GENERALLY ACCEPTED ACCOUNTING PRINCIPLES FOR MUNICIPALITIES; FOR THE GRAND STREET SETTLEMENT.

Sounds pretty vague, right? This is because the Capital Budget deals in appropriations to “budget lines”, not projects. These can be thought of as buckets of money reserved for a specific type of use. To get down to the project level, where money turns into real things that humans need or use, we must look to the Capital Commitment Plan.

The Capital Commitment Plan is a 4,000-page document of more computer system screen dumps, and is so large that it has to be published in 4 parts. Take a look at one of the PDFs over at https://www1.nyc.gov/site/omb/publications/fy20-accp.page, and you will find screens that list projects under each budget line, along with cryptic information about milestones, planned completion dates, and expected costs over time.

A page from the capital commitment plan, detailing several projects that receive funding from a single budget line.

So, we’ve got thousands of pages of PDFs to pore over, or we can go look at high-level numbers, but how am I as a citizen supposed to figure out what is happening in my neighborhood? How is an analyst supposed to figure out if these funds are being allocated in a fair and equitable way? There’s hope in a new set of publications that the Office of Management and Budget recently started publishing.

A New Hope: Capital Project Detail Data PDFs

Recently, a new set of budget publications appeared on the OMB website, immediately following the capital commitment plan PDFs. They are known as Capital Project Detail Data, and are a wonderful improvement, making all of the above information more digestible by normal humans.

They include a nicer layout with a single page of detailed information for every capital project in the city. Here’s the page for the carousel improvement in Prospect Park, Brooklyn.

A page from the new “Capital Project Detail Data” pdfs, where each capital project gets its own full page of details

From this page, we can see:

  • They have spent $90k on this project so far
  • They plan to spend $78k in FY 2020, and $422k in FY 2021
  • The total spent + planned is $590k, which is higher than the original budget of $500k
  • This project will serve Brooklyn Community Districts 6, 7, 8, 9, and 14 (aka 306, 307, 308, 309, and 314 in the data)
  • There’s actually an error in this PDF: the community districts are prefixed with the wrong borough code. It should be “3” for Brooklyn.
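
As a quick check on the arithmetic in the list above, here is a minimal sketch that adds up the figures read off the carousel page. The variable names are mine, not fields from the PDF or the dataset.

```python
# Figures from the Prospect Park carousel page, in thousands of dollars.
# Variable names are illustrative and do not come from the dataset.
spent_to_date = 90
planned_by_fiscal_year = {"FY2020": 78, "FY2021": 422}
original_budget = 500

# Total commitment is prior actual spending plus all planned spending.
total = spent_to_date + sum(planned_by_fiscal_year.values())
overrun = total - original_budget

print(f"spent + planned: ${total}k")
print(f"over original budget by: ${overrun}k ({overrun / original_budget:.0%})")
```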

We also have milestones with original vs current start and end dates, to get an idea of the timeliness/delays of the project. There are also some goodies in here that are not included in any of the other PDFs, such as “Project Location” and “Scope Summary”, all of which help to move this project out of paper and into the real world of things that New Yorkers need and use.

PDFs Are Where Data Goes to Die

This is an old saying in the circles I run in (civic open data nerd circles, to be precise). There’s a ton of data in these pdfs that’s been turned into page layouts for human eyes, but in the process is made non-machine-readable, so we cannot use it for analysis purposes. Wouldn’t it be great to do a simple sum() on the total spend on all projects in my community district, or to group all projects in Brooklyn by their category and get a simple run-down of how we’re spending all this money?
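
To make that wish concrete, here is a minimal sketch of the kind of sum and group-by that becomes trivial once the data is machine-readable. The rows are made up for illustration, the `borough` field name is an assumption, and I am assuming `community_districts_served` is stored as a comma-separated string, which may not match the published table.

```python
from collections import defaultdict

# Hypothetical rows; dollar figures and the "borough" field are assumptions.
projects = [
    {"borough": "Brooklyn", "community_districts_served": "306,307",
     "ten_year_plan_category": "Neighborhood parks and playgrounds",
     "combined_total": 590_000},
    {"borough": "Brooklyn", "community_districts_served": "306",
     "ten_year_plan_category": "Garages and facilities",
     "combined_total": 1_000_000},
    {"borough": "Brooklyn", "community_districts_served": "314",
     "ten_year_plan_category": "Neighborhood parks and playgrounds",
     "combined_total": 250_000},
    {"borough": "Bronx", "community_districts_served": "204",
     "ten_year_plan_category": "Neighborhood parks and playgrounds",
     "combined_total": 400_000},
]


def total_for_district(rows, district_id):
    """Sum combined_total over every project that lists the district."""
    return sum(
        row["combined_total"]
        for row in rows
        if district_id in row["community_districts_served"].split(",")
    )


def totals_by_category(rows, borough):
    """Group one borough's projects by category and sum their totals."""
    totals = defaultdict(int)
    for row in rows:
        if row["borough"] == borough:
            totals[row["ten_year_plan_category"]] += row["combined_total"]
    return dict(totals)


print(total_for_district(projects, "306"))
print(totals_by_category(projects, "Brooklyn"))
```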

Better yet, can we map where these projects are happening and see where we’re spending all this money?

OMB has actually published two datasets that represent the Capital Project Detail Data PDFs: Capital Project Detail Data - Dollars, and Capital Project Detail Data - Milestones.

A noble effort and an excellent commitment to transparency, but unfortunately at the moment they are both stale and incomplete. Specifically, I wanted to do analysis of the community_districts_served for all these projects, so I had to take matters into my own hands. The data from these PDFs can be scraped!

Scraping the Data

I won’t bore you with the details (the code is on GitHub), but we scraped the Capital Project Detail Data PDFs and published the data on qri.cloud, so anyone can have access to granular NYC Capital Project Data. We published two tables, one for projects and one for milestones.

The meat is in the first table, where we have one row for each of the 5,200 discrete capital projects currently happening in New York City! The second table contains the milestones, which have a many-to-one relationship with projects.

Some top-level highlights:

  • The sum of combined_total for all rows is $105.8B ($40.9B prior actual spending/$64.9B planned spending)
  • While many projects are assignable to one of the five boroughs, we see that the largest chunk of this spending ($36.4B) is for citywide projects. We can also see a detailed borough breakdown of the non-citywide projects.

Every project is assigned to a borough, or is designated as citywide.

Citywide vs. Boroughwide vs. Local Projects

Below the borough level, things get trickier. As we saw in the PDFs, a single project can serve multiple community districts (for example, it looks like most of the projects in Prospect Park are listed as serving all of the surrounding districts). The data also includes some nonexistent community district ids meant to describe projects with borough-wide benefit. Community district ids usually contain a single digit for the borough and two more for the district number, so Bronx Community District 4 = ‘204’. The data include district ids such as 200 and 299, which are inferred to mean the benefits are borough-wide (someone please correct me if I am wrong on this and 00 and 99 have different implications).
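
The id scheme described above can be sketched as a small parser. This is an illustration rather than the scraper's actual code: the function name is hypothetical, the borough codes follow the standard NYC numbering (the post itself confirms only 2 for the Bronx and 3 for Brooklyn), and treating 00 and 99 as borough-wide is the inference described above, not a documented rule.

```python
# Standard NYC borough codes; only "2" and "3" are confirmed in the post.
BOROUGHS = {
    "1": "Manhattan",
    "2": "Bronx",
    "3": "Brooklyn",
    "4": "Queens",
    "5": "Staten Island",
}

# Suffixes inferred (not documented) to mean "borough-wide".
BOROUGHWIDE_SUFFIXES = {"00", "99"}


def parse_district_id(district_id):
    """Split an id like '204' into (borough, district number, is_boroughwide)."""
    if len(district_id) != 3 or not district_id.isdigit():
        raise ValueError(f"expected a three-digit id, got {district_id!r}")
    borough_code, suffix = district_id[0], district_id[1:]
    if borough_code not in BOROUGHS:
        raise ValueError(f"unknown borough code in {district_id!r}")
    return BOROUGHS[borough_code], int(suffix), suffix in BOROUGHWIDE_SUFFIXES


print(parse_district_id("204"))
print(parse_district_id("299"))
```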

To get a basic understanding of how many projects have “local” benefits, I did some simple grouping: projects with at least one real community district listed are treated as local, and all others are considered borough-wide.

We can see that the ratio varies quite a bit from borough to borough, with Brooklyn having nearly double the amount of local spending compared to boroughwide spending. Collectively, about 48% of all borough-specific project spending is local under this definition.
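
The local vs. boroughwide grouping could be sketched roughly like this. The rows and dollar amounts are invented for illustration, and the classification rule is the one described above (at least one real district means local), with 00 and 99 assumed to be the only placeholder suffixes.

```python
from collections import defaultdict

# Suffixes inferred in this post to mean "boroughwide"; an assumption.
PLACEHOLDER_SUFFIXES = {"00", "99"}


def classify(districts_served):
    """Return 'local' if any listed id is a real community district."""
    has_real = any(d[1:] not in PLACEHOLDER_SUFFIXES for d in districts_served)
    return "local" if has_real else "boroughwide"


# Hypothetical (borough, districts served, combined_total) rows.
rows = [
    ("Brooklyn", ["306", "307"], 600),
    ("Brooklyn", ["300"], 300),
    ("Bronx", ["204", "299"], 100),
    ("Bronx", ["299"], 400),
]

# Sum spending per borough, split into local and boroughwide buckets.
totals = defaultdict(lambda: {"local": 0, "boroughwide": 0})
for borough, districts, amount in rows:
    totals[borough][classify(districts)] += amount

local = sum(t["local"] for t in totals.values())
overall = sum(t["local"] + t["boroughwide"] for t in totals.values())
local_share = local / overall

for borough, split in totals.items():
    print(borough, split)
print(f"local share of borough-specific spending: {local_share:.0%}")
```

Note that a project listing both a real district and a placeholder (like the hypothetical 204 plus 299 row) counts as local under this rule.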

Local Projects by Category

Now, the moment you’ve been waiting for and the reason you came to read this blog post. With community districts and categories assigned to each project, we can map out where local projects for each category are happening around the city! A few caveats and notes before we dive in:

  • The categories shown come from the ten_year_plan_category field in the dataset
  • Projects attributable to more than one community district are represented more than once in the calculations used to generate the maps
  • All of the spending shown below represents only non-citywide and non-boroughwide projects, or about 40% of the $105.8B total spending outlined in the dataset
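
The second caveat above is worth seeing in miniature. In this sketch (hypothetical rows, and an assumption that each district is credited with a project's full total), a project serving two districts is counted once per district, so the per-district figures add up to more than the sum of the project totals.

```python
from collections import defaultdict

# Hypothetical projects; ids, categories and amounts are made up.
projects = [
    {"id": "A", "ten_year_plan_category": "Neighborhood parks and playgrounds",
     "community_districts_served": ["306", "307"], "combined_total": 500},
    {"id": "B", "ten_year_plan_category": "Garages and facilities",
     "community_districts_served": ["306", "399"], "combined_total": 200},
]


def totals_by_district_and_category(rows):
    """Credit each real district with the full total of every project serving it."""
    totals = defaultdict(int)
    for row in rows:
        for district in row["community_districts_served"]:
            if district[1:] in {"00", "99"}:
                continue  # skip ids inferred to be boroughwide placeholders
            key = (district, row["ten_year_plan_category"])
            totals[key] += row["combined_total"]
    return dict(totals)


mapped = totals_by_district_and_category(projects)
sum_of_projects = sum(p["combined_total"] for p in projects)
sum_of_map_cells = sum(mapped.values())

print(mapped)
print(sum_of_projects, sum_of_map_cells)  # the second number is larger
```

This is why the per-district maps are useful for comparing coverage, but should not be summed to recover a citywide total.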

Drumroll, please 🥁…

Right off the bat, several categories stand out as having wide coverage around the city:

  • Large, major, and regional park reconstruction
  • Neighborhood parks and playgrounds
  • Essential reconstruction of facilities
  • Garages and facilities

Parks, as we might expect, are spread throughout the city and need plenty of capital investment for improvements. “Facilities” is vague, and there are many other categories that also describe facilities, so it’s not really clear what we’re seeing here. Each tiny map on this grid could probably be its own blog post. As a viewer, I immediately want to drill down into each category and see the individual projects. An interactive UI for exploring Capital Spending would go a long way, and I’d love to build one someday (again)!

Mapping a Single Community District’s Local Projects

Another useful application of this rich, project-level data is discrete mapping of individual projects. Of course, we don’t have latitude/longitude coordinates in the data, but we can’t let that stop us.

I manually looked up each of the 46 projects that serve my home community district, Brooklyn 6, assigning a point coordinate to each one. The location, combined with the total spending and project description, makes for a fascinating visual of what’s going on in my area ($480.9M of actual + planned spending).

Manual Geocoding of Local Capital Projects makes it possible to visualize the spending of hundreds of millions of dollars on a map
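
For anyone who wants to try the same manual geocoding approach, one way to structure it is a hand-maintained lookup of coordinates keyed by project id, joined to the project rows and written out as GeoJSON. This is a sketch under assumptions, not the workflow used for the map above: the project id is hypothetical, the coordinates are approximate, and only the description and total come from this post.

```python
import json

# Hypothetical project id; the total is the carousel figure from this post.
projects = {
    "HYPOTHETICAL-001": {
        "description": "Prospect Park carousel improvement",
        "combined_total": 590_000,
    },
}

# Hand-entered (latitude, longitude) pairs; approximate, for illustration.
manual_points = {
    "HYPOTHETICAL-001": (40.6637, -73.9641),
}


def to_feature_collection(projects, points):
    """Join hand-entered points to project rows as a GeoJSON FeatureCollection."""
    features = []
    for project_id, (lat, lon) in points.items():
        project = projects.get(project_id)
        if project is None:
            continue  # a point with no matching project is skipped
        features.append({
            "type": "Feature",
            # GeoJSON positions are ordered [longitude, latitude].
            "geometry": {"type": "Point", "coordinates": [lon, lat]},
            "properties": {"project_id": project_id, **project},
        })
    return {"type": "FeatureCollection", "features": features}


geojson = to_feature_collection(projects, manual_points)
print(json.dumps(geojson, indent=2))
```

Keeping the coordinates in their own lookup, separate from the scraped rows, means the hand-entered points survive a re-scrape of the PDFs.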

The big ones include water and sewer reconstruction along with reconstruction of the drawbridge over the Gowanus Canal. It’s also interesting to see the taxpayers contribute to the cruise terminal. The projects in Prospect Park generally list all of the surrounding districts as served, so we can see some interesting projects like renovations to the carousel, a comfort station, and the LeFrak Center (I thought this was complete but I guess it’s still an active project).

What does $3.7M for ADA accessibility at Brooklyn’s smallest library branch (Pacific Street) entail?

Why does HVAC replacement at the Carroll Gardens Library cost $4M?

What does $15.9M of ball field improvements in Red Hook actually look like?

This dataset probably raises more questions than it answers, but I think that’s the point. Now we can see, sum, map, chart, and otherwise have more visibility and informed discussions about how we spend our capital dollars.

Next Steps

Mapping capital spending has been a white whale of mine for a while (I actually worked on this during my time at the Department of City Planning but the project never saw the light of day). Obviously, we need to assign latitude/longitude coordinates to the other 4,000 or so capital projects that serve local community districts, so we can expand the map above to the whole city. On top of that, there’s so much rich data about each project that static visuals don’t do it justice. An interactive tool where we can combine the mappable projects with the citywide projects and allow the user to “surf the budget” is in order, and would go a long way towards making this information useful and actionable to community groups, politicians, and anyone else with a stake in how we invest in our city.

If you’d like to help out with this effort, we’d love to hear from you. Of course, we’ll be managing the data in qri, our cutting-edge version control system for datasets, so every contribution will be trackable and public for the world to see.

Thanks for reading! Let us know what you found exploring the maps in this blog post by giving us a yell on Twitter.

Chris Whong: Urbanist, Mapmaker, & Data Junkie. Outreach Engineer at Qri.io