Lessons learned from working with an expanded engineering team for a limited period

Sasha Weinstein
Published in NYC Planning Tech
Nov 3, 2022

This past summer, the Data Engineering team hosted two fellows for six weeks through the Coding it Forward program. These fellows were strong programmers with a wide set of skills, and both expressed interest in learning about the intersection of data and city government. It was important to our team that they gain valuable hands-on experience with NYC data and make meaningful contributions to our ongoing projects. However, six weeks is short for a fellowship, given that it often takes a new engineer a month or more to learn our entire technology stack. So finding projects the fellows could contribute to as quickly as possible was key to giving them a good, meaningful experience.

The main project the fellows ended up working on was our Quality Assurance and Quality Control (QAQC) web application. We use this application to check the quality and completeness of some of the data products we produce. In this post I’ll lay out why this was a good project for the team to focus on with its expanded capacity, and why our agile development approach was well suited to this work.

Keeping context “overhead” low

One major challenge of hosting fellows for a six-week program is that it can take several weeks to learn all the nuances of our team’s stack, processes, projects, and products. The Data Engineering team works with a unique implementation of various open-source technologies that fit together in a particular way, and we are responsible for supplying dozens of datasets to different teams for various uses. Every hour a fellow spends learning our stack and ecosystem is an hour they are not spending actively contributing as an engineer. Future employers will want to know what the fellows contributed directly, not just what they learned.

So, for our fellows’ sake, we looked for a project that required relatively little “context overhead.” Our QAQC application fit the bill. Each page in the application is relatively atomic: you don’t need to know much about the code in the rest of the repository to add new content to one page. Likewise, the queries written to assess the completeness of a dataset read from at most two tables.
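To make the idea of an “atomic” page concrete, here is a minimal sketch of one way such an app could be organized, assuming a hypothetical registry in which each page is a standalone function. It is not the QAQC app’s actual structure, which this post doesn’t show; the names `PAGES`, `register_page`, `facdb_page`, and `render` are all made up for illustration.

```python
from typing import Callable, Dict

# Hypothetical registry mapping a page title to the function that renders it.
PageFn = Callable[[], str]
PAGES: Dict[str, PageFn] = {}


def register_page(title: str) -> Callable[[PageFn], PageFn]:
    """Decorator that adds a page function to the registry under `title`."""
    def decorator(fn: PageFn) -> PageFn:
        PAGES[title] = fn
        return fn
    return decorator


@register_page("Facilities Database")
def facdb_page() -> str:
    # A real page would run its own queries and draw charts; returning a
    # string keeps this sketch runnable without a UI framework.
    return "QAQC report for Facilities Database"


def render(title: str) -> str:
    """Render one page by title; no other page's code is involved."""
    return PAGES[title]()


print(sorted(PAGES))
print(render("Facilities Database"))
```

In this layout, adding a page means writing one function and decorating it, which is the property that kept context overhead low: a contributor only has to read the page they are working on.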

The Facilities Database QAQC page is a good example of how contributing to the app requires little context while building a generalizable skill. This page summarizes changes between a new build of the data product and the previous release. The goal of these reports is to give a reviewer a chance to flag differences between versions that could be due to errors.

Two pieces of code make this work: a SQL query that compares the previous and current versions of the Facilities Database, and Streamlit code that visualizes the results of this query. Contributing to either one doesn’t require deep knowledge of who uses the data or where it comes from.
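A minimal sketch of such a comparison, assuming two hypothetical tables (`facdb_previous` and `facdb_current`) that share a `factype` column; the real Facilities Database schema and the team’s actual query aren’t shown in this post. One SQL query counts facilities per type in each version (run here against an in-memory SQLite database from Python), and a small helper flags types that appeared, disappeared, or changed by more than an arbitrary 10% threshold. The Streamlit half, which would chart these rows, is left out.

```python
import sqlite3
from typing import List, Tuple

# Hypothetical schema: one table per version (previous release, current build),
# each with one row per facility and a `factype` column.
COMPARE_SQL = """
SELECT
    factype,
    SUM(CASE WHEN version = 'previous' THEN 1 ELSE 0 END) AS count_previous,
    SUM(CASE WHEN version = 'current' THEN 1 ELSE 0 END) AS count_current
FROM (
    SELECT factype, 'previous' AS version FROM facdb_previous
    UNION ALL
    SELECT factype, 'current' AS version FROM facdb_current
) AS versions
GROUP BY factype
ORDER BY factype
"""

Row = Tuple[str, int, int]


def compare_versions(conn: sqlite3.Connection) -> List[Row]:
    """Return (factype, count_previous, count_current) for every type in either version."""
    return conn.execute(COMPARE_SQL).fetchall()


def flag_changes(rows: List[Row], threshold: float = 0.1) -> List[str]:
    """Return facility types a reviewer should look at: types that appeared or
    disappeared, or whose count changed by more than `threshold` (a fraction
    of the previous count)."""
    flagged = []
    for factype, before, after in rows:
        if before == 0 or after == 0:
            flagged.append(factype)  # new or dropped facility type
        elif abs(after - before) / before > threshold:
            flagged.append(factype)
    return flagged


if __name__ == "__main__":
    # Made-up demo data, not real Facilities Database counts.
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE facdb_previous (factype TEXT)")
    conn.execute("CREATE TABLE facdb_current (factype TEXT)")
    conn.executemany("INSERT INTO facdb_previous VALUES (?)",
                     [("LIBRARY",)] * 20 + [("PARK",)] * 5)
    conn.executemany("INSERT INTO facdb_current VALUES (?)",
                     [("LIBRARY",)] * 21 + [("POOL",)] * 3)
    rows = compare_versions(conn)
    print(rows)                # [('LIBRARY', 20, 21), ('PARK', 5, 0), ('POOL', 0, 3)]
    print(flag_changes(rows))  # ['PARK', 'POOL']
```

Stacking both versions with `UNION ALL` and a `version` label, rather than joining them, means a facility type present in only one version still shows up, with a zero count on the other side.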

Learning to measure and visualize this sort of quality assurance check gave the fellows a skill they can take to other engineering roles. Plus, working with the team’s data products gave them meaningful insight into the sorts of issues that commonly arise in the data NYC collects and makes public, which served our fellows’ goal of learning about data in city government.

Going into the six-week fellowship, we knew that getting the fellows contributing directly as quickly as possible was a key goal, so we identified the QAQC application as a project with low “context overhead.” That the queries and front-end development were generalizable tasks was an added bonus, giving the fellows additional skills to take to future opportunities.

An agile framework to accommodate engineers with varied skills and abilities

There was a wide range of front-end development experience on our five-person team. One engineer had worked professionally as a web developer, some of us had only done front-end work in school, and some of us had never worked on a front end. The flexibility of an agile approach was key to making sure our team worked together effectively.

Here is how we set up the work: I (a full-time engineer) scoped out a number of issues in GitHub and presented an overview of our goals to the team. From there, anyone could take on any issue they wanted and open new issues to fix bugs or add new content. This meant our more experienced engineers could move through multiple tickets in a day and then scope out new ones if there wasn’t a task to pick up.

Our review process was similarly open. Code was reviewed in pull requests, and approval by any other engineer was sufficient to push the code to production. This meant our more experienced engineers could pivot to reviewing other people’s work when they had free time, and less experienced engineers were empowered to do more reviewing as they became more familiar with a particular part of the app.

We didn’t worry about the distinction between fellows and full-time engineers. Instead, we allowed anyone to scope out issues, implement new code, and approve pull requests. We were also flexible in our goals, since we didn’t know exactly how much work the team would be able to take on with our new capacity. We knew the fellows would naturally gravitate toward work they found interesting or that would best complement their resumes. If an issue required context they didn’t have or an understanding of a part of the stack they hadn’t seen, we made clear that it was totally OK to move on to another task. Of course, in some cases the fellows were curious to see new things. We just didn’t want them to get hung up on minutiae or the peculiarities of our implementations when they could be contributing to the team’s work.

Code that couldn’t be maintained wasn’t pushed to production

Another new challenge of bringing engineers onto the team for a limited period was that they scoped out technical improvements that were totally novel to our full-time staff. The best example was a caching upgrade to the app, written by one of our fellows who had an extensive web development background and knew a way to make our app faster by implementing a cache. Wikipedia describes a cache as “a hardware or software component that stores data so that future requests for that data can be served faster; the data stored in a cache might be the result of an earlier computation or a copy of data stored elsewhere.”

Streamlit doesn’t cache data automatically, since caching is fairly complex to understand and the framework prioritizes ease of development over performance. As a consequence, the app was slowing down as it re-downloaded the same data from the cloud over and over. One of our fellows saw this problem and addressed it the right way. They wrote a clear issue, coded up a working caching implementation that improved runtimes, and opened a pull request with a thorough explanation of how their enhancement would make the app better. We reviewed the work and thought it was solid, but we didn’t end up merging it.

Why didn’t we want an improvement that would help developers and users? None of the full-time data engineers knew much about how caching worked on the backend, and none of our other projects involved anything similar. We decided that none of us was likely to have the skills to maintain this code as the app evolved, and that it would end up causing problems or breaking down the line. In making this choice, we prioritized maintainability over performance.

Conclusion

Bringing on two data engineering fellows presented new opportunities, but we could take advantage of their skills only if we found the right project and took the right approach to project management. Making sure our fellows could contribute to our projects and pursue their own goals required flexibility without losing sight of the bigger picture.
