Welcome to the world, DCP’s Capital Projects Database

Amanda Doyle
Published in NYC Planning Tech
8 min read · Jun 21, 2022

“Hello world.” — the Capital Projects Database

The New York City Department of City Planning’s (DCP) Capital Planning team built DCP’s Capital Projects Database (CPDB) to fulfill the team’s desire to know where capital projects are happening or are planned to happen in NYC, in order to inform future capital investment decisions. Initially, CPDB was only available to planners in DCP and other City agencies. In December 2021, DCP released the Capital Planning Explorer to the public, giving everyone access to CPDB. Currently, CPDB is built and maintained by DCP’s Data Engineering team, but DCP’s Capital Planning team remains the product owner and key user of this core data product. We’re all excited to introduce CPDB to you.

What is the Capital Projects Database?
A new way to explore capital projects in NYC’s Capital Commitment Plan and capital spending.

What is a capital project?
A “capital project” involves the construction, reconstruction, acquisition, or installation of a physical public improvement with a value of $35,000 or more and a “useful life” of at least five years. This encompasses spending on physical public works projects, such as roads, sewers, and bridges, as well as investments in core information and technology infrastructure, and critical equipment, like fire trucks. It does not include spending on programs, such as after school funding or community engagement projects.

What is the Capital Commitment Plan?
The Capital Commitment Plan is a multi-volume document published by the Office of Management and Budget (OMB) three times a year as a series of PDF files, generally in January, April, and September, as part of the publication of the Preliminary, Executive, and Adopted Capital Budgets. This document presents an agency’s capital program and provides the anticipated implementation schedule for projects in the current fiscal year and the next three years. Unfortunately, despite containing valuable information that is used by NYC capital agencies and beyond, the layout of a Capital Commitment Plan document is not user friendly, as illustrated in the screenshot below.

A screenshot of one page of a Capital Commitment Plan.

These non-searchable PDFs capture information on individual future commitments allocated to a project by a specific budget. One project can be funded by multiple budgets, and one project often has multiple commitments. A budget is a bucket of monies funded by a city agency to sponsor particular projects. A commitment is an individual contribution to fund a portion of a project. A project is a discrete capital investment, and is defined as a record that has a unique Financial Management Service (FMS) ID, which is composed of the three-digit managing agency code followed by the alphanumeric project ID.
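To make the structure of an FMS ID concrete, here is a minimal Python sketch that splits one into its two parts. The function and class names are ours for illustration and are not part of CPDB's codebase; the only rule assumed is the one stated above (three digits of managing agency code, then the project ID). The sample ID is the Chrystie Street project discussed in this post.

```python
from typing import NamedTuple


class FmsId(NamedTuple):
    managing_agency_code: str  # three-digit code of the managing agency
    project_id: str            # alphanumeric project ID


def parse_fms_id(fms_id: str) -> FmsId:
    """Split an FMS ID into its managing agency code and project ID."""
    code, project_id = fms_id[:3], fms_id[3:]
    # Hypothetical validation: exactly three leading digits and a non-empty project ID.
    if not (len(code) == 3 and code.isdigit() and project_id):
        raise ValueError(f"not a valid FMS ID: {fms_id!r}")
    return FmsId(code, project_id)


parsed = parse_fms_id("096HR25CHRYC")
print(parsed.managing_agency_code, parsed.project_id)  # prints: 096 HR25CHRYC
```

Note that the agency code identifies the managing agency only; the sponsor agency comes from the budget, not from the FMS ID.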

The managing agency is the agency overseeing the construction or implementation of a project. The sponsor agency is the agency funding the project, and is associated with the budget of a project. The managing agency and sponsor agency are not always the same for a project. For example, the Department of Environmental Protection (DEP) can fund a project for sewer reconstruction, but the Department of Design and Construction (DDC) may coordinate and manage the construction work.

An example of how to parse the information from the PDF is highlighted below. Project 096HR25CHRYC (the FMS ID) is the construction of a project along Chrystie Street, and has $1.165 million allocated for construction work in June 2024 from budget HR-25, which is dedicated to funding the improvements of structures used by the Department of Social Services.

An annotated screenshot of one page of a Capital Commitment Plan.

As you can see, you need a degree in OMB speak to understand the information in this document. Even if you were a user who could effectively read this document, the information it contains is locked in a PDF. The data is not machine readable, and despite much of the information being place-specific, it is not mappable. This is where Data Engineering comes in.

How does DCP add value to the Capital Commitment Plan and create the Capital Projects Database?

First, CPDB makes the data available as machine readable relational data tables that report data at the project, budget, or commitment level. Now, in a single row of a table a user can know how much money is allocated to a single project, or use a table to look up all projects managed by a specific agency. Additionally, these data are made available in the Capital Planning Explorer in an easily searchable interface.

The project table view of the Capital Projects Database in the Capital Planning Explorer.
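As a rough illustration of what "project, budget, or commitment level" means in practice, the sketch below rolls commitment-level rows up to coarser levels. The field names and the `999DEMO0001` rows are made up for this example and do not reflect CPDB's actual schema; the first row reuses the Chrystie Street figures quoted earlier.

```python
from collections import defaultdict

# Illustrative commitment-level rows: one row per individual commitment.
commitments = [
    {"fms_id": "096HR25CHRYC", "budget": "HR-25", "planned_amount": 1_165_000},
    {"fms_id": "999DEMO0001", "budget": "XX-1", "planned_amount": 250_000},  # made up
    {"fms_id": "999DEMO0001", "budget": "XX-2", "planned_amount": 400_000},  # made up
]


def roll_up(rows, key):
    """Sum planned amounts by the given column (e.g. 'fms_id' or 'budget')."""
    totals = defaultdict(int)
    for row in rows:
        totals[row[key]] += row["planned_amount"]
    return dict(totals)


def projects_managed_by(rows, agency_code):
    """List the distinct projects whose FMS ID starts with the managing agency code."""
    return sorted({r["fms_id"] for r in rows if r["fms_id"].startswith(agency_code)})


project_level = roll_up(commitments, "fms_id")   # one total per project
budget_level = roll_up(commitments, "budget")    # one total per budget
print(project_level["999DEMO0001"])              # prints: 650000
print(projects_managed_by(commitments, "096"))   # prints: ['096HR25CHRYC']
```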

Second, and perhaps most importantly, spatial data is added to projects in CPDB, which enables us to map a subset of projects. Only a subset of projects have a geometry because not all projects are “mappable.” A project may be a lump sum that will be drawn upon to fund other discrete projects, or a project may not be place-specific or may not have a direct impact on the surrounding area. For example, a project may involve the installation of IT equipment, such as a server, which may be housed in a specific building but have uses beyond the walls where it is located. Additionally, funds may be for vehicles, which can be moved between locations during their lifespan.

Using keywords in the project description, DCP categorizes each project in CPDB as either a “Lump Sum” (e.g. Surveys for sewers in Bronx), “ITT, Vehicles, and Equipment” (e.g. Urban Resource Institute: DV Shelter Vans), or “Fixed Asset” (e.g. Storm Swr ext 119 ave b/t 192 & 195 st, QNS). DCP focuses on mapping fixed assets, or projects that are place-specific and have an impact on the surrounding area, visible or not, such as park improvements or sewer reconstruction.

The map view of the Capital Projects Database in the Capital Planning Explorer.
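For a sense of how keyword-based categorization can work, here is a small Python sketch. The three category names come from CPDB, but the keyword lists, the order of the checks, and the fallback to “Fixed Asset” are assumptions made for this example, not DCP's actual rules.

```python
import re

# Hypothetical keyword lists; the real lists used by DCP are not shown in this post.
LUMP_SUM_KEYWORDS = {"surveys", "citywide", "various"}
EQUIPMENT_KEYWORDS = {"vans", "vehicles", "trucks", "server", "servers", "equipment"}


def categorize(description: str) -> str:
    """Assign a project description to one of the three CPDB type categories."""
    words = set(re.findall(r"[a-z]+", description.lower()))
    if words & LUMP_SUM_KEYWORDS:
        return "Lump Sum"
    if words & EQUIPMENT_KEYWORDS:
        return "ITT, Vehicles, and Equipment"
    # Assumed fallback: anything without a keyword hit is treated as a fixed asset.
    return "Fixed Asset"


for text in [
    "Surveys for sewers in Bronx",
    "Urban Resource Institute: DV Shelter Vans",
    "Storm Swr ext 119 ave b/t 192 & 195 st, QNS",
]:
    print(f"{categorize(text)}: {text}")
```

Matching on whole words rather than substrings avoids accidental hits, such as a keyword like “van” matching inside an unrelated word.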

How is spatial data added onto projects in the Capital Projects Database?

  1. Join on geometries from spatial datasets of capital projects provided by agency partners or made available on Open Data. DDC, the Economic Development Corporation (EDC), and the Department of Transportation (DOT) all provide spatial datasets of their capital projects to DCP. Additionally, DOT and the Department of Parks and Recreation (DPR) publish spatial datasets of their capital projects on Open Data. Given that all of these datasets have an FMS ID, which is the unique ID in CPDB, we’re able to easily join on the spatial data.
  2. Use fuzzy string matching. Often the description of a project is telling of where it is located. For example, the description of a project may be “St. Mary’s Plgd West — ADA CS” or “Brower Park Library Fit Out.” Therefore, we can join on spatial data from DCP’s Facilities Database and DPR’s Parks Properties dataset by matching the place named in the project description to the name of an NYC park or facility. Abbreviation variations are taken into account for common words (e.g. Playground vs Plgrd and Fort vs Ft.) to help facilitate more matches between records. Additionally, the managing and sponsor agencies of a project are considered to improve the accuracy of matches between these datasets. For example, we know that if the New York Public Library is the sponsor agency of a project, the project should only be able to match records that are categorized as a library in the Facilities Database.
  3. Manually map projects. Despite our best efforts, we cannot programmatically map all projects that are site-specific. Therefore, in 2017 an extensive effort was made to map as many fixed asset projects as possible by reading the project description and assigning the project to a Building Identification Number (BIN); Borough, Block, and Lot (BBL); or a manually created geometry.
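The fuzzy matching step can be sketched in a few lines of Python. This is not CPDB's implementation: the word-level similarity score, the 0.85 and 0.9 thresholds, the abbreviation table, and the facility records are all assumptions chosen for illustration, and the standard library's `difflib` stands in for whatever matcher the real pipeline uses.

```python
import re
from difflib import SequenceMatcher

# Assumed abbreviation table, based on the Playground and Fort examples above.
ABBREVIATIONS = {"plgd": "playground", "plgrd": "playground", "ft": "fort"}

# Illustrative facility records, not real rows from the Facilities Database.
FACILITIES = [
    {"name": "Brower Park", "type": "park"},
    {"name": "Brower Park Library", "type": "library"},
    {"name": "St. Mary's Playground West", "type": "park"},
]


def tokens(text):
    """Lowercase, split into words, and expand known abbreviations."""
    words = re.findall(r"[a-z0-9]+", text.lower())
    return [ABBREVIATIONS.get(word, word) for word in words]


def similarity(description, facility_name):
    """Share of facility-name words that closely match some description word."""
    desc_words, name_words = tokens(description), tokens(facility_name)
    if not name_words:
        return 0.0
    hits = sum(
        1 for word in name_words
        if any(SequenceMatcher(None, word, d).ratio() >= 0.85 for d in desc_words)
    )
    return hits / len(name_words)


def match_facility(description, facilities, allowed_types=None, threshold=0.9):
    """Return the best-scoring facility at or above the threshold, or None."""
    scored = [
        (similarity(description, f["name"]), f)
        for f in facilities
        if allowed_types is None or f["type"] in allowed_types
    ]
    scored = [pair for pair in scored if pair[0] >= threshold]
    return max(scored, key=lambda pair: pair[0])[1] if scored else None


# A library-sponsored project may only match library records.
print(match_facility("Brower Park Library Fit Out", FACILITIES, {"library"}))
print(match_facility("St. Mary's Plgd West - ADA CS", FACILITIES))
```

Without the `allowed_types` filter, “Brower Park” and “Brower Park Library” both score perfectly against the first description, which is the kind of ambiguity that the agency check described above helps resolve.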

We recognize that our work is not error proof; therefore, we remove faulty geometries by reviewing mapped projects and deleting geometries that are inaccurate or incorrect. Furthermore, we know our mapping efforts are not comprehensive and not all projects that “can” be mapped are mapped. Generally, about 40% of all capital projects in a Capital Commitment Plan are mapped.

The last way DCP adds value to the data in the Capital Commitment Plan is by joining on data from Checkbook NYC so that a user can know how much money has already been spent on a project. Luckily, Checkbook NYC uses FMS ID as its identifier for a project. Therefore, using the Checkbook NYC API we sum up the value of all checks by FMS ID and incorporate this data into CPDB.
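The aggregation itself is simple once the check records are in hand. The sketch below deliberately skips the API call and starts from records that have already been fetched; the record fields, amounts, and function names are hypothetical and are not Checkbook NYC's actual response format.

```python
from collections import Counter

# Hypothetical check records, as if already retrieved from Checkbook NYC.
checks = [
    {"fms_id": "096HR25CHRYC", "amount": 120_000},
    {"fms_id": "096HR25CHRYC", "amount": 80_000},
    {"fms_id": "999DEMO0001", "amount": 5_000},
]

# Hypothetical project-level rows with planned commitments.
projects = [
    {"fms_id": "096HR25CHRYC", "planned": 1_165_000},
    {"fms_id": "999DEMO0002", "planned": 300_000},
]


def spent_by_project(check_rows):
    """Sum the value of all checks for each FMS ID."""
    totals = Counter()
    for check in check_rows:
        totals[check["fms_id"]] += check["amount"]
    return dict(totals)


def add_spending(project_rows, check_rows):
    """Attach total spending to each project; projects with no checks get 0."""
    spent = spent_by_project(check_rows)
    return [{**p, "spent": spent.get(p["fms_id"], 0)} for p in project_rows]


for row in add_spending(projects, checks):
    print(row)
```

This behaves like a left join from projects to spending: every project in the plan is kept, and checks for FMS IDs that are not in the current plan are ignored.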

How can you use the Capital Projects Database?

  • Explore current and planned capital investments in a neighborhood
  • Discover the projects sponsored by an agency across the City
  • Understand the relative capital budget sizes of different agencies
  • and more…

What should you be mindful of when using the Capital Projects Database?

  • Again, not all projects that “can” be mapped are mapped, as our spatial data sources and manual mapping efforts are not comprehensive of all place-specific projects.
  • The data is a snapshot in time. Each version of CPDB only contains the projects listed in the associated Capital Commitment Plan. It does not include projects from previous Capital Commitment Plans. Therefore, you cannot know what previous capital investments have been made in an area by using a single version of CPDB.
  • While CPDB reports planned monetary allocations and monies spent, it does not capture all funds allocated to a project because it does not account for funds that have been released to an agency but not yet spent.
  • Because the School Construction Authority (SCA) is a “special” agency, its funds are included as lump sums in CPDB. SCA publishes a separate capital budget with detailed funding allocations to discrete projects, and this project breakdown is not included in CPDB.
  • CPDB is not a project management tool. Although the Capital Commitment Plan reports when funds will be allocated for a phase of a project, such as design or construction, that does not mean the actual work is happening at that time.
  • There is uncertainty around projects in CPDB that are 4 or more years out.

What’s next?

  1. Make CPDB available on Open Data. Currently, CPDB is only available via the Capital Planning Explorer. We’re actively working to write the metadata and package the datasets so that CPDB and its contents are accessible and consumable by all data users.
  2. Refactor CPDB’s codebase. At Data Engineering we love maintenance. Given that CPDB was one of our early data products, some of the components of the code are outdated. We look forward to bringing CPDB’s build process in line with our other data products and streamlining the current logic.
  3. Improve CPDB. Now that CPDB is available to a broader audience we look forward to partnering with the Capital Planning team, planners, other City agencies, and users like you to improve the data quality of CPDB, and solicit feedback on how to make CPDB useful to existing and future use cases. If you have questions or comments about CPDB please open an issue or reach out to us!
