Introducing DCP’s Housing Database, DCP’s latest open data product

Amanda Doyle
Published in NYC Planning Tech
May 21, 2021

Do you wonder how the NYC housing landscape has changed over the past decade? Do you want to analyze how many housing units each neighborhood has gained? Do you want to study the types of housing that have been developed throughout the city? Well, then I have the dataset for you. Let me introduce you to the NYC Department of City Planning’s (DCP) latest open data product: DCP’s Housing Database (Housing DB).

Housing DB visualized to show the type of job (color) and how many units the job will add (size of circle)

Housing DB contains approved NYC Department of Buildings (DOB) applications for new construction, alterations of existing buildings, and demolitions. Included job applications either add or remove residential units, and were filed or completed since January 1, 2010. Housing DB is an open data product built primarily from DOB Job Application Filings records, along with other open data inputs. Why create a new data product when there are existing open data products with this information, you ask?
I’ll explain.

You could argue that one needs a degree in DOB speak to understand the DOB Job Application Filings dataset, which includes permits for everything from new buildings to new plumbing with many of the attributes encoded.
DCP worked hard to create a data product that would be useful and understandable to DCP and non-DCP housing analysts. Here’s what we do:

  1. Select a subset: Housing DB encapsulates a subset of DOB Job Application Filings — permits that change the number of residential units are included, and permits for plumbing, electrical, elevators, and other types of work are excluded — saving a user from doing some filtering.
  2. Decode DOB’s codes: Users don’t need to know that “J-0” and “R-2” are occupancy codes for “Residential: 3 or More Units,” or that a job is “Permitted” once the application has a status “Q” or “R.” Fields in Housing DB are populated with the descriptive value rather than the code.
  3. “Clean” the data: DCP automatically cleans the data. For example, for all new building jobs we set the existing number of residential units to “0,” because it’s a new building: even if there was a building on the site before, the new building would have been preceded by a demolition.
  4. Join data tables together: To provide all the information that is encompassed in Housing DB, DCP joins data from multiple sources together, including certificate of occupancy data.
  5. Map the records: DCP geocodes the DOB data so that it can be plotted on a map. DCP strives to plot the record to the most precise location possible, preferably at the centroid of the building or within the tax lot.
  6. Write documentation: To make sure a user has all that we know about Housing DB, we wrote and published data dictionaries, ReadMes, and other user guides. Comprehensive documentation allows users to learn more about the limitations of the data, the intended or appropriate uses of each field, and the source of each attribute.

Altogether, this upfront work saves an analyst from having to clean, join, decipher, and geocode datasets before starting to analyze the information they contain.
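
To make the first three steps above a bit more concrete, here is a minimal sketch in plain Python. It is not DCP’s actual pipeline code: the field names (`job_type`, `occupancy_code`, `status_code`, `units_existing`, `units_proposed`), the job type labels, and the sample jobs are all hypothetical, and the lookup tables cover only the two examples mentioned in this post.

```python
# Sketch only: the field names and job types below are hypothetical,
# not Housing DB's actual schema or DCP's pipeline code.

# Decoding lookups, limited to the two examples mentioned in this post.
OCCUPANCY_LABELS = {
    "J-0": "Residential: 3 or More Units",
    "R-2": "Residential: 3 or More Units",
}
PERMITTED_STATUS_CODES = {"Q", "R"}

# Assumed job types that can change the residential unit count.
HOUSING_JOB_TYPES = {"New Building", "Alteration", "Demolition"}


def build_housing_records(raw_jobs):
    """Subset, decode, and clean a list of raw job-application dicts."""
    records = []
    for job in raw_jobs:
        # Step 1 (subset): drop plumbing, electrical, and similar work.
        if job["job_type"] not in HOUSING_JOB_TYPES:
            continue

        # Step 3 (clean): a new building always starts from zero units.
        existing = 0 if job["job_type"] == "New Building" else job["units_existing"]
        net_units = job["units_proposed"] - existing

        # Step 1 (subset, continued): keep only jobs that add or remove units.
        if net_units == 0:
            continue

        # Step 2 (decode): swap codes for descriptive values where we know them.
        status = job["status_code"]
        records.append({
            "job_number": job["job_number"],
            "job_type": job["job_type"],
            "occupancy": OCCUPANCY_LABELS.get(job["occupancy_code"], job["occupancy_code"]),
            "status": "Permitted" if status in PERMITTED_STATUS_CODES else status,
            "units_existing": existing,
            "units_proposed": job["units_proposed"],
            "units_net": net_units,
        })
    return records


# Made-up sample jobs, for illustration only.
sample_jobs = [
    {"job_number": "A1", "job_type": "New Building", "occupancy_code": "R-2",
     "status_code": "Q", "units_existing": 4, "units_proposed": 10},
    {"job_number": "A2", "job_type": "Plumbing", "occupancy_code": "R-2",
     "status_code": "R", "units_existing": 0, "units_proposed": 0},
    {"job_number": "A3", "job_type": "Demolition", "occupancy_code": "J-0",
     "status_code": "R", "units_existing": 3, "units_proposed": 0},
]

for record in build_housing_records(sample_jobs):
    print(record["job_number"], record["status"], record["units_net"])
```

Note that the cleaning rule runs before the unit-change filter in this sketch, so a new building with a stray “existing units” value is still counted by its full proposed unit count. The real joins, geocoding, and manual review are well beyond what a toy example like this can show.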

Furthermore, with each release of Housing DB, DCP’s Housing and Economic Development (HED) team conducts an extensive, multi-week research and review process. This review includes correcting data errors, as well as allocating units between long-term residential use (Class A), hotel, and other Class B units. Other Class B units include all dwellings that are not Class A units or hotels, and may include single room occupancy units, dormitories, and certain kinds of supportive housing. This intensive research process improves the data quality and adds a level of detail to Housing DB.

Finally, knowing that many analysts are interested in how the number of housing units has changed over time by geographic area, DCP publishes unit change summary files that report the net change in Class A housing units by year at different geographic levels (i.e. Neighborhood Tabulation Area, Community District, and Census Tract).
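
If you’d like a feel for what such a summary involves, here is a rough sketch of rolling job-level records up to net Class A unit change by geography and year. The field names (`community_district`, `year`, `class_a_net`), the geography labels, and the sample numbers are hypothetical and chosen for illustration; the published summary files may be structured differently.

```python
from collections import defaultdict


def summarize_net_units(records, geography_field):
    """Sum net Class A unit change for each (geography, year) pair."""
    totals = defaultdict(int)
    for record in records:
        key = (record[geography_field], record["year"])
        totals[key] += record["class_a_net"]
    return dict(totals)


# Made-up job records; the values are not real Housing DB data.
sample_records = [
    {"community_district": "CD-A", "year": 2015, "class_a_net": 10},
    {"community_district": "CD-A", "year": 2015, "class_a_net": -2},
    {"community_district": "CD-B", "year": 2016, "class_a_net": 5},
]

summary = summarize_net_units(sample_records, "community_district")
for (area, year), net_units in sorted(summary.items()):
    print(area, year, net_units)
```

Passing the geography field as a parameter means the same function could be reused for another level, such as a census tract field, provided each record carries that attribute.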

So, if you’re interested in analyzing the change in legal housing units across time and space, look no further than DCP’s Housing Database.

Now, where can you find Housing DB, you ask? You can download Housing DB and the unit change summary files on DCP’s Bytes of the Big Apple and NYC Open Data. Need some inspiration to get the analytical juices flowing? Check out the analyses done so far by the HED team in their report here. Curious about how the Data Engineering team built Housing DB? Check out the code in our public GitHub repo here. Questions or comments about the dataset? Open an issue or reach out to us!

Amanda Doyle
NYC Planning Tech

Urban scientist / Geographer / Data engineer / City enthusiast