Test Driven Development on the Equitable Development Data Explorer

Sasha Weinstein
Published in NYC Planning Tech
Apr 28, 2022 · 5 min read

This spring the Data Engineering team completed the first version of the data pipeline for the Equitable Development Data Explorer (EDDE). This pipeline brings together data from a variety of City agencies and public sources and standardizes it for use by the EDDE web app. You can learn more about our contribution to EDDE in the repo wiki.

Screenshot of the Equitable Development Data Explorer

While building some parts of this pipeline we applied a development approach called test-driven development (TDD). I wrote this blog post to share how TDD helped us build the EDDE pipeline and what we learned.

What is test-driven development?

Test-driven development (TDD) is a coding approach in which unit tests are used to plan and structure development. Geeksforgeeks.com describes unit tests as “a software testing technique by means of which individual units of software… are tested to determine whether they are suitable for use or not.” If that seems like an extremely broad description, that’s because it is! Unit tests can be applied to any part of a coding project.

The basic way to use a unit test is to look at a piece of existing code, think about what it’s supposed to do, write a test, and see if the test passes. However, as developers we found that unit tests were most useful when they were written before the associated code was finished. This is a test-driven development approach. Browserstack.com explains TDD: “In layman’s terms, Test Driven Development (TDD) is a software development practice that focuses on creating unit test cases before developing the actual code. It is an iterative approach that combines programming, the creation of unit tests, and refactoring.”

The key to TDD is writing tests that are supposed to fail when they are written. A failing test acts as an acknowledgement that the existing software isn’t sufficient. But it also clearly dictates what needs to happen for the software to be done. Note that when I say “done” here I don’t mean perfect. Our goal isn’t to write tests that completely cover all expected functionality. We use tests as a complement to manual quality assurance/control, not a replacement for it.

Testing for included geography

This is an example of how TDD helped us as developers. Our deliverable for EDDE was a series of files that aggregated data at three geographic levels: PUMA, borough, and city. The project is divided into groups of data points, each produced by an accessor function that returns a pandas dataframe. The last step before uploading our deliverables to the cloud was to combine dataframes with like geographies into a single collated dataframe.
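
As a rough, simplified sketch of that structure (the accessor name, source file, and column label below are invented for illustration, not the project’s actual code): each accessor returns a dataframe indexed by geography, and the collate step concatenates those dataframes column-wise.

```python
import pandas as pd

def access_public_housing(geography: str = "puma") -> pd.DataFrame:
    # Hypothetical accessor: aggregate public housing units by geography.
    raw = pd.read_csv("public_housing_projects.csv")  # illustrative source file
    return (
        raw.groupby(geography)[["units"]]
        .sum()
        .rename(columns={"units": "units_public_housing_count"})
    )

def collate(accessors, geography: str = "puma") -> pd.DataFrame:
    # Combine the dataframes for one geographic level into a single deliverable.
    return pd.concat([accessor(geography) for accessor in accessors], axis=1)
```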

We kept running into the same issue on the collate step where dataframes for individual data points had an incomplete list of geographies. For example, our process to produce data on the number of public housing units per PUMA was to read in the source data for individual public housing projects, assign a PUMA value to the record, then aggregate the number of units by PUMA. However, not all PUMAs have public housing units. Therefore, we needed to ensure that PUMAs without public housing were reported with zero public housing units, instead of N/A or omitted from the data.
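
A simplified sketch of the fix we converged on (the PUMA codes below are placeholders): reindex the aggregated dataframe against the full list of PUMAs so that geographies with no records show up with an explicit zero.

```python
import pandas as pd

# Placeholder list of PUMA codes; the real pipeline uses the full set of NYC PUMAs.
ALL_PUMAS = ["3701", "3702", "3703", "3704"]

def aggregate_units_by_puma(projects: pd.DataFrame) -> pd.DataFrame:
    by_puma = projects.groupby("puma")[["units"]].sum()
    # PUMAs with no public housing projects never appear in the groupby result,
    # so reindex against the full list and report them as 0 rather than N/A.
    return by_puma.reindex(ALL_PUMAS, fill_value=0)
```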

What made this issue a good candidate for TDD?

  1. It was consistent across all data points.
  2. We knew our goal when we started writing tests.
  3. The tests were not computationally intensive which meant they ran quickly. This allowed developers to use them iteratively as they worked.

These tests were written before most of the accessor functions. As we worked, we tested the output of each accessor function and only considered it “done” if it passed this test. Code on feature branches would only be merged if it passed our tests.
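
The geography test looked roughly like the sketch below; the module paths and the short list of accessors are placeholders for illustration, while the real suite parametrizes over every accessor in the project.

```python
import pytest

# Placeholder imports: the real project has many more accessor functions and a
# canonical list of PUMA codes maintained alongside the pipeline.
from aggregate.public_housing import access_public_housing
from aggregate.housing_security import access_housing_security
from utils.geo import ALL_PUMAS

ACCESSORS = [access_public_housing, access_housing_security]

@pytest.mark.parametrize("accessor", ACCESSORS)
def test_accessor_includes_all_pumas(accessor):
    df = accessor(geography="puma")
    missing = set(ALL_PUMAS) - set(df.index)
    assert not missing, f"{accessor.__name__} is missing PUMAs: {sorted(missing)}"
```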

We used pytest to implement our tests. The -v flag was helpful in visualizing our progress toward the goal of having all dataframes include all geographies before starting the collate step.

An example of a pytest run with the -v flag, giving a summary of which accessor functions were finished.

You may be thinking, “why didn’t you test the data to make sure it was correct?” The reason we didn’t develop tests for the content of the data was that the data points weren’t sufficiently similar to one another. There was no piece of commonly reused code on the data munging/cleaning side that we thought would be a good candidate for TDD. Instead, the correctness of the output data was verified manually by the engineers, project managers, and subject matter experts.

Testing for expected “token” in the column label

This round of TDD was different from the one above in that it guided us through a refactor instead of building new functionality. The refactor was prompted by a conversation with the Open Source Engineering team responsible for ingesting our data into the web app. This team told us that it would make ingestion of the data easier if our column labels consistently stated the metric being reported. Therefore, we needed to add a substring from a set list of values (i.e., count, pct, moe) to each field name to indicate what type of metric the data in the column represents (a count, a percent, or a margin of error, respectively); we call this component of the field name the metric token.

Right after that meeting, we wrote this test.
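
In simplified, sketch form (the accessor imports are placeholders for the project’s real modules), the test asserted that every column label contains one of the expected metric tokens:

```python
import pytest

# Placeholder imports; the real test parametrizes over every accessor in the project.
from aggregate.public_housing import access_public_housing
from aggregate.housing_security import access_housing_security

METRIC_TOKENS = ("count", "pct", "moe")
ACCESSORS = [access_public_housing, access_housing_security]

@pytest.mark.parametrize("accessor", ACCESSORS)
def test_column_labels_include_metric_token(accessor):
    df = accessor(geography="puma")
    for column in df.columns:
        assert any(token in column for token in METRIC_TOKENS), (
            f"{accessor.__name__}: column '{column}' is missing a metric token"
        )
```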

This turned our abstract plan into a concrete set of expectations. When a developer needed to remind themselves of what needed to be done, they could refer to this test instead of digging up an email or the meeting notes. Then they could run it to see which accessor functions conformed to the requirements and refer to that code as instructional.

This testing approach didn’t need to be robust to help guide us in our refactor. Trying to capture each column label requirement in a test would have taken too long. The metric token was the only substring that was expected in every column label in all data outputs, and thus the best candidate for this round of TDD.

How do we decide if TDD/unit tests are “worth it”?

If we think TDD/unit tests will save us time in the long run, then it’s worth it from our perspective. This of course involves some guesswork. It’s often necessary to invest some hours in writing and running tests to get a broad sense of how helpful they can be.

The bottom line is that for us tests are a tool to guide our work but aren’t a necessity. All our deliverables involve a manual QAQC process that can’t be automated.

Lessons

  • Test driven development can help us track our work towards a goal.
  • Test driven development can help us clarify our expectations and act as a reference.
  • Tests are a complement to manual QAQC, not a replacement.
