Piloting Google Cloud Platform to Enhance Data Access and Usability at NYC Planning

Matt Kudo
Published in NYC Planning Tech
6 min read · Aug 20, 2021

This summer, the Data Engineering Team hosted two Civic Innovation Corps data science fellows from Coding it Forward. During their 10-week internship, they worked with Data Engineering to improve access to and usability of data at NYC Planning using Google Cloud Platform. To learn more about @mattcrittenden and @spencersimon16, check out “Coding it Forward summer fellows join NYC Planning’s Data Engineering Team.”

Data Engineering at NYC Planning

New York City’s Department of City Planning (NYC Planning) — as the name would suggest — plans for the future of the city, and the Data Engineering Team develops data products to inform analyses and decisions supporting that mission.

Data Engineering supports NYC Planning through four components. Our project this summer aimed to enhance the Ecosystem and Community components.

  • Product: Create and release high quality public datasets
  • Operation: Build highly transparent and automated data pipelines
  • Ecosystem: Offer comprehensive documentation and metadata
  • Community: Bring people together to share data and to learn from each other
The four components of Data Engineering’s Mission

Enhancing data at NYC Planning with Google Cloud Platform

A data ecosystem is a collection of data products, metadata, and computing resources used to capture, describe, and analyze data. Put more simply, it bridges the gaps between data production, publication, discovery, and analysis. In the current data ecosystem at NYC Planning, users discover data via curated metadata documents, access it from a variety of places such as Bytes of the Big Apple, collaborate with other data users through Teams messages or meetings, and analyze data on their local computers with tools such as Excel or ArcMap.

With this pilot, we are simplifying the process of working with data by using three Google Cloud services: BigQuery, Data Catalog, and Data Studio. In addition to these three, we experimented with other Google tools, including Cloud Shell, Colab, and Notebooks.

Proposed data ecosystem tools

To demonstrate the proposed data ecosystem, we grounded our exploration in the following question: How have zoning changes impacted the built environment? More specifically, we wanted to see what happens to the number of residential units in a lot following a zoning change.

Loading and preprocessing data in BigQuery

Our first step was to load and process historical MapPLUTO data so we could look at changes over time in our analysis. While we loaded many datasets into BigQuery as part of this project, we’ll focus on MapPLUTO here because it was the largest and most involved dataset we worked with. MapPLUTO contains extensive land use and geographic data at the tax lot level for New York City. We used this dataset mainly to determine the zoning district(s) of tax lots.

We took annual, publicly available MapPLUTO versions from 2002 to present and loaded them into BigQuery. This took several steps, including merging borough shapefiles for earlier versions of the dataset and correcting for changes in the schema definitions between versions.

Our process for working with MapPLUTO data
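
The merging and schema-correction steps described above can be sketched in a few lines of Python. This is a minimal illustration rather than the pipeline we actually ran: the rename map, the target column list, and the version label are hypothetical assumptions, and real MapPLUTO versions differ in more ways than column names.

```python
from typing import Dict, List

# Hypothetical rename map: older column name -> current column name.
# Illustrative assumption only, not the real MapPLUTO schema history.
RENAMES: Dict[str, str] = {
    "zoning1": "zonedist1",
    "res_units": "unitsres",
}

# Hypothetical target schema shared by every harmonized version.
TARGET_COLUMNS: List[str] = ["bbl", "borough", "zonedist1", "unitsres"]


def harmonize(record: dict, version: str) -> dict:
    """Map one raw record onto the target schema and tag it with its version."""
    # Lowercase every column name, then apply the rename map where it matches.
    renamed = {RENAMES.get(key.lower(), key.lower()): value
               for key, value in record.items()}
    # Keep only target columns; columns missing from this version become None.
    row = {column: renamed.get(column) for column in TARGET_COLUMNS}
    row["version"] = version
    return row


def merge_boroughs(borough_tables: Dict[str, List[dict]], version: str) -> List[dict]:
    """Concatenate per-borough tables into a single citywide table."""
    merged = []
    for borough, records in borough_tables.items():
        for record in records:
            row = harmonize(record, version)
            # Early per-borough files may not carry a borough column at all.
            if row["borough"] is None:
                row["borough"] = borough
            merged.append(row)
    return merged
```

Once every version shares one schema and carries a version tag, the rows can be stacked into a single table, which is what makes queries across many years practical.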

Now that these steps to upload and process the data are complete, anyone at NYC Planning can access these datasets in BigQuery for future analyses. In the past, examining two decades’ worth of historical MapPLUTO data was incredibly tedious. Now, planners and analysts across the agency can quickly and easily use Google Cloud to incorporate this data into their analyses.

Screenshot of the BigQuery interface previewing MapPLUTO 21v2
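
With every version in one place, our guiding question (what happens to a lot's residential units after a zoning change?) reduces to comparing a lot's records across versions. The sketch below shows that logic in plain Python on in-memory rows. It is an assumption-laden illustration, not the analysis we ran in BigQuery: the keys `bbl`, `year`, `zonedist1`, and `unitsres` are hypothetical harmonized field names, and it assumes no missing values.

```python
from typing import Dict, List, Tuple


def unit_change_after_rezoning(rows: List[dict]) -> Dict[int, Tuple[int, int]]:
    """For each lot, find the first year its zoning district differs from the
    previous version, and compare residential units in the latest version
    against the last version before the change.

    Returns {bbl: (year_of_change, units_latest - units_before_change)}.
    Lots whose zoning never changes are omitted.
    """
    # Group rows by tax lot identifier.
    by_lot: Dict[int, List[dict]] = {}
    for row in rows:
        by_lot.setdefault(row["bbl"], []).append(row)

    result: Dict[int, Tuple[int, int]] = {}
    for bbl, history in by_lot.items():
        history.sort(key=lambda r: r["year"])
        # Walk consecutive pairs of versions looking for a zoning change.
        for previous, current in zip(history, history[1:]):
            if current["zonedist1"] != previous["zonedist1"]:
                delta = history[-1]["unitsres"] - previous["unitsres"]
                result[bbl] = (current["year"], delta)
                break
    return result
```

Only the first zoning change per lot is considered here, which keeps the sketch short; a fuller analysis would also need to handle lots that are split, merged, or rezoned more than once.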

Creating searchable metadata with Data Catalog

To aid their work, planners and analysts can use Data Catalog to search across datasets for relevant fields and metadata. Because we were interested in zoning for our analysis, we could search for “zoning,” and every dataset referencing zoning in its field names or descriptions would appear.

Screenshot of Data Catalog
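
Conceptually, this kind of search is a keyword match over field names and descriptions. The following sketch imitates that behavior on a small in-memory dictionary. It does not call the Data Catalog API, and the dataset names, field names, and descriptions are hypothetical examples.

```python
from typing import Dict, List, Tuple

# Hypothetical metadata: dataset name -> {field name: field description}.
CATALOG: Dict[str, Dict[str, str]] = {
    "mappluto": {
        "zonedist1": "Primary zoning district of the tax lot",
        "unitsres": "Number of residential units",
    },
    "zoning_map_amendments": {
        "effective": "Date the zoning change took effect",
        "project_nm": "Project name",
    },
}


def search_catalog(catalog: Dict[str, Dict[str, str]], term: str) -> List[Tuple[str, str]]:
    """Return (dataset, field) pairs whose field name or description
    contains the search term, ignoring case."""
    needle = term.lower()
    hits = []
    for dataset, fields in catalog.items():
        for field, description in fields.items():
            if needle in field.lower() or needle in description.lower():
                hits.append((dataset, field))
    return hits
```

Calling `search_catalog(CATALOG, "zoning")` on this toy catalog returns the matching field from each of the two datasets, which is the cross-dataset discovery that well-written field descriptions make possible.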

Analyzing decades of change with Data Studio

For our analysis, we mainly used Data Studio, another part of Google Cloud Platform that works efficiently with big data. In Data Studio, you can create interactive and filterable visualizations within minutes, even when connected to large datasets such as our combined version of MapPLUTO, which is over 2 GB and contains over 16 million rows.

The screenshot below shows an interactive, filterable dashboard we made that gives a citywide overview based on MapPLUTO data. Its visualizations show the area breakdown of NYC by borough, the most common zoning districts, the change in residential units across NYC over time, and more.

Screenshot of a Data Studio dashboard displaying citywide information

Some Data Studio features we found especially convenient included:

  • Chart Filters allow the creation of visualizations based on a subset of a larger dataset, as well as on user-selected inputs.
  • Interactive Filters provide a user-centered experience, as visualizations automatically update when subsets of the data are selected (e.g., click “Queens” in the donut chart and the page will change to only display Queens data).
  • Spatial visualizations with Google Maps enable map creation within seconds, even when working with thousands of point, line, or polygon features. These maps have panning and zoom options built in and can support numerical and categorical data types.

Studying at the neighborhood level in East New York

Recognizing that New York City is a heterogeneous place where it can be difficult or inappropriate to draw generalizations at the citywide and borough levels, we also demonstrated the utility of Google Cloud tools at the neighborhood level. In the screenshot below, we show one page of a Data Studio dashboard we created to visualize early outcomes in mixed-use development from a neighborhood study in East New York.

Screenshot of a Data Studio dashboard on mixed-use development in East New York

Here, we see an example of how Data Studio allows us to create an interactive map that colors each tax lot according to its building class. This map helps us see how the makeup of mixed-use development in East New York has changed over time, as well as where it has been concentrated. An interactive bar chart near the bottom of the page allows us to subset based on which developments are new buildings, alterations, or demolitions. An interactive donut chart also allows us to compare changes within the neighborhood study to those in the surrounding area.

Takeaways and Going Forward

Overall, Google Cloud Platform does an excellent job supporting Data Engineering’s mission to enhance accessibility and collaboration when working with data. Google Cloud makes it easy to do things that were very difficult before, such as analyzing data across many versions of MapPLUTO and other datasets. Even when working with large amounts of data, exploratory analysis can be quick and easy with these tools. These tools can enable complex analysis across the extent of data available at NYC Planning, helping planners answer big questions about New York City.

In our experience, the Google Cloud products that we used most (BigQuery, Data Catalog, and Data Studio) were easier to integrate than the other Google products that we experimented with, such as Colab and Notebooks. Below, we share our reviews of each tool.

Performance evaluation of data ecosystem tools

Moving forward, this enhanced data ecosystem using Google Cloud Platform opens up exciting opportunities for NYC Planning. It enables citywide analyses incorporating large time-series data that was previously challenging to work with, and it supports traditional research methods at the neighborhood level with decision-making driven by big data. These analyses can take place in collaborative spaces such as Data Studio, with many city planners working simultaneously instead of downloading data and working with it locally.

We hope to see Google Cloud Platform adopted more broadly and a more vibrant data ecosystem flourish at NYC Planning.
