Managing Engineering Productivity

Vince Sacks Chen
5 min read · Jan 16, 2024


Around mid-year 2022, tech companies began tightening their belts and closely monitoring software engineers’ pull request (PR) counts. This came after a 10-year bull market run in which employees held most of the bargaining power and companies had accumulated a lot of bloat. PR count, or diff count, describes the number of code changes merged into the main code base and has been a standard way of measuring coding throughput in the software industry.

👋

The Problem

In 2022, when Uber announced it would start performance-managing by PR count (among other measures), there was a lot of dissent and pushback from individual contributors (ICs). From their standpoint, the problem was two-fold: the count captures only quantity, with no standard measure of quality; and there was too much variance in counts across teams of different verticals and types.

The pushback was so strong that some employees chose to transfer to a different org or leave the company. At first, I sympathized with the ICs' plea: they could be reduced to a number, and there was no clear normalization method to compare them fairly across different teams. Then I realized this renewed emphasis on PR count benefits line managers, as it removes the arbitrariness that had always existed in performance evaluation and exposes those who were coasting.

The challenge for line managers is how to measure PR count fairly as it relates to performance evaluation, knowing that it is a double-edged sword that could either lift a whole team's performance or kill its morale.

💪

I use a four-pronged approach to answer the challenge:

  1. segregating PR count expectations by career level
  2. exploring and exhausting coding opportunities
  3. optimizing PRs
  4. computing the org average and median as the benchmark

Segregating PR count expectations by career level

First, we have to decide how to allocate the engineering staff for maximum impact. The rule of thumb is to let the more senior engineers (staff and senior) work on broadly impactful architecture with downstream consequences and a wide blast radius, and the more junior engineers (intermediate and junior) work on mostly assigned, independently executable tasks.

[Figure: Engineering resource allocation]

In this way, the more capable and experienced engineers can focus on the more impactful aspects of a project and lead its execution, while the more junior engineers hone their coding skills by carrying out the implementation. This plays to each group's strengths, and they should be evaluated accordingly. Typically, intermediate engineers should have the highest PR count, followed by junior, senior, and then staff engineers.

Exploring and exhausting coding opportunities

Second, we need to ensure that sources of coding work are identified and that there is a process through which the staff knows where to look for coding opportunities. Typical coding sources fall into two groups: internal and external.

The internal sources follow the software development lifecycle, from project work through post-release operations to continuous improvement thereafter. The external sources are all the unplanned work the team is obligated to take on.

Optimizing PRs

Is a PR with 1,000 lines of code change acceptable? Is a PR without unit tests acceptable? Should a PR be submitted for review only after a feature is fully fleshed out? The answer is that a PR must be self-contained and independently reviewable, ideally under 75 lines of code with full unit test coverage. This not only helps the reviewer with readability but also helps the author break a problem down into logically related chunks.

It also follows that related changes should be stacked, so the author does not accumulate a pile of unreviewed changes, and the artifacts from each stacked PR help reviewers understand the intent and how it relates to the previous PR.
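To make the size and test guidelines concrete, here is a minimal pre-submit sketch in Python. It assumes a `main` base branch and a hypothetical convention that test files have `test` in their names; the 75-line threshold is the guideline above, not a hard rule, so the script only warns.

```python
import subprocess

MAX_LINES = 75  # size guideline discussed above; tune to your team's norm

def diff_stats(base: str = "main") -> tuple[int, list[str]]:
    """Return total changed lines and the list of changed files vs. the base branch."""
    out = subprocess.run(
        ["git", "diff", "--numstat", f"{base}...HEAD"],
        capture_output=True, text=True, check=True,
    ).stdout
    total, files = 0, []
    for line in out.splitlines():
        added, deleted, path = line.split("\t", maxsplit=2)
        if added != "-":  # binary files report "-" for line counts
            total += int(added) + int(deleted)
        files.append(path)
    return total, files

if __name__ == "__main__":
    lines, files = diff_stats()
    if lines > MAX_LINES:
        print(f"Warning: {lines} changed lines; consider splitting into stacked PRs.")
    if not any("test" in f.lower() for f in files):
        print("Warning: no test files touched; add unit tests before requesting review.")
```

A check like this could run as a local git hook or a lightweight CI step before review, nudging authors toward small, tested, stackable changes.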

Computing the org average and median as the benchmark

The last thing is data collection. First, know what to compare to. Product teams usually have more stakeholders and more non-coding work than Platform teams. Full-stack teams usually require more internal coordination than purely back-end or front-end teams.

Second, compute the PR count mean and median across the org and the company, filtering for the type of teams you want to compare your team to. Then compute the same for each career level: staff, senior, intermediate, and junior.

At the team level, if your team's average PR count falls more than one standard deviation below the org's and the company's, that is a big red flag to rectify. At the IC level, let each person know where they stand in comparison to the rest of the company.
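As a rough sketch of that computation (the record layout and numbers below are hypothetical, purely for illustration), the org benchmark, per-level benchmarks, and the one-standard-deviation red flag could be computed like this:

```python
from statistics import mean, median, stdev

# Hypothetical per-engineer records for one review period.
records = [
    {"team": "payments", "level": "intermediate", "pr_count": 48},
    {"team": "payments", "level": "senior", "pr_count": 35},
    {"team": "maps", "level": "intermediate", "pr_count": 61},
    {"team": "maps", "level": "junior", "pr_count": 40},
    {"team": "maps", "level": "staff", "pr_count": 22},
    # ... the rest of the org / company, filtered to comparable team types
]

def org_benchmark(records):
    """Mean, median, and standard deviation of PR counts across all records."""
    counts = [r["pr_count"] for r in records]
    return mean(counts), median(counts), stdev(counts)

def level_benchmark(records, level):
    """Mean and median PR count for one career level."""
    counts = [r["pr_count"] for r in records if r["level"] == level]
    return (mean(counts), median(counts)) if counts else (None, None)

def team_red_flag(records, team):
    """True if the team's average is more than one stdev below the org average."""
    org_mean, _, org_sd = org_benchmark(records)
    team_counts = [r["pr_count"] for r in records if r["team"] == team]
    return mean(team_counts) < org_mean - org_sd

org_mean, org_median, org_sd = org_benchmark(records)
print(f"org: mean={org_mean:.1f} median={org_median} sd={org_sd:.1f}")
for lvl in ("staff", "senior", "intermediate", "junior"):
    lvl_mean, lvl_median = level_benchmark(records, lvl)
    print(f"{lvl}: mean={lvl_mean} median={lvl_median}")
print("payments more than 1 sd below org average:", team_red_flag(records, "payments"))
```

In practice you would pull these counts from your code host rather than hard-coding them, and segment by team type before comparing, as described above.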

🙌

When PR count as a performance metric is done right, individuals are self-motivated to find coding opportunities. The incentive is to outperform and get promoted; the disincentive is to get caught coasting and be managed out.

Engineering leaders must empathize with the ICs, since everybody now has a transparent metric and tremendous pressure to perform, and coach them on how to code more correctly, efficiently, and frequently. Reviewing a staff member's PR count is an opportunity for a conversation about the feasibility of the expectation, any help needed, and room for growth.

At Uber, there is a mantra of “a diff a day” (diff is another word for PR). Its express goal is to encourage engineers to ship at least one PR per day as a unit of a day’s work. It forces folks to have full clarity on what they expect to do the following day, to work without distraction during the day, and to submit the code change as they close out the day. It is a bar of excellence all engineers should aspire to.


Vince Sacks Chen

Software Engineering Manager, previously at Uber, Veeva, Fevo.