Analyzing Code Review Response Times with GraphQL and Chartify

12 min read · Dec 17, 2018

Code review — one of the most important tasks of a software engineer and often one of the most dreaded. “Do I really have to slog through these 3000 lines of subtle webpack configuration changes?” is a thought I can’t deny I’ve had in the past.

Yet for all of us as software engineers, code review is a critical way in which we help our coworkers produce high quality work, help each other’s knowledge grow, and help our products stay excellent.

At TrialSpark, one of our core values is to “Deliver Care”. This drives the mission we strive for: we want people around the world to have access to new and better medical treatments which they desperately need.

But not only do we want to “deliver care” externally as a company, we also want this principle to drive how we operate internally. As a member of our engineering team, I deliver care by asking “How can the engineering team serve others here at TrialSpark?” and “How can I care for and support my engineering coworkers?”.

And that means, as painful as it may feel for me to review that large webpack configuration change, I’m going to review it promptly and provide useful, kind, accurate feedback.

Code Reviews at TrialSpark

Our system of code review at TrialSpark is based around GitHub’s pull request review tooling. When a code author is ready to submit their work, they make a pull request on GitHub, wait for CI to pass, and then request reviews from a few team members. Those team members look over the code, submit comments, and either approve the code or ask for changes. Once the PR is approved, the code author can merge the PR.

One of my coworkers submitting a controversial PR which has been commented on, rejected, accepted, and is still awaiting review.

As a team, we have an informal goal for code review promptness: if someone requests a code review from you in the morning, try to submit a review by the end of the day. If they request a code review in the afternoon, try to submit it by midday the next business day.

This rule gives code authors an idea of when their code will be reviewed. It also gives the code reviewer flexibility: some people like to do a code review as soon as it’s requested, while others like to set aside a block of time each day for code reviews.

In the end, we want reviewers to be able to concentrate on their coding and have times of deep focus while also encouraging prompt feedback for those waiting for reviews.

Figuring Out Our On-Time Review Rate

With the importance of code review in mind, I wanted to see how often we, the engineering team at TrialSpark, were meeting our code review timeliness goals.

In this blog post, we’ll write some Python code to figure this out. All the final source code is available here. Our code will be structured in three main stages for our journey:

download_data.py: Query the GitHub GraphQL API to download a JSON of raw review-related data
transform_data.py: Transform the data from its raw form to a simplified form that indicates whether reviews are on-time or not. This is where the business logic will live
visualize_data.py: Visualize the transformed data to make it easy to see how often we make on-time reviews

As we walk through these steps, we’ll use the TypeScript repository for demonstration purposes since it’s a public repository anyone can query data from. At the end of this blog post we’ll loop back to TrialSpark and look at some results from our private repository.

Step 1: Getting Data from the GitHub GraphQL API

GitHub offers a GraphQL API for accessing the massive amount of software-development activity that happens on it. As we recently started using GraphQL at TrialSpark, this provided me with the perfect chance to learn more about GraphQL while discovering what our code review habits are.

Understanding a new API, whether it be a traditional REST API or something else, can often feel overwhelming. Figuring out how all of the data fits together and the requests that need to be made can be quite challenging.

Luckily, GitHub has a hosted API explorer which makes working with its GraphQL API much easier! Built-in documentation, autocompletion, syntax highlighting, and more make it pretty fun to experiment with the API.

When you write a GraphQL query, you define the structure of the data you want returned from the API. In the GIF below, there’s a GraphQL request asking for a repository object with a diskUsage field. You can see how the response we get mirrors that structure!

Turns out the TypeScript repo is almost a whole GB on disk.

Constructing Our GraphQL Query

To get the data we need to calculate on-time code reviews, we have to write a query that requests all of the different review-related events for a repository. With some trial and error, we can eventually build a query that looks like the following:

A slightly simplified version of our GraphQL query to retrieve review data.

There’s a lot going on in this query! Let’s break down what’s happening line-by-line. Our query starts by accepting three arguments (one of them optional):

query($repoOwner: String!, $repoName: String!, $prCount: Int = 50) {

It then uses these arguments to request the last $prCount pull requests from a given repository:

query($repoOwner: String!, $repoName: String!, $prCount: Int = 50) {
  repository(owner: $repoOwner, name: $repoName) {
    pullRequests(last: $prCount) {
      nodes {

Why do we have to request a nodes field inside of our pullRequests field? It’s because our pullRequests field will return a Connection, a typical GraphQL abstraction used for handling pagination. The nodes field will give us our list of actual pull request data, and for each pull request, we’ll request its title and its first 100 timeline items:

pullRequests(last: $prCount) {
  nodes {
    title
    timeline(first: 100) {

Pull request timeline items in the GitHub API represent anything that can happen during a pull request: a commit being pushed to it, a label being added, a review being requested, etc. This makes it the perfect fit for a GraphQL union type. To grab data out of a union type, we have to make use of inline fragments which specify a type we want data from and the fields to grab from that type. In our code, when we encounter a timeline node, we use an inline fragment on ReviewRequestedEvent to retrieve the time a review was requested and info about the requested reviewer:

timeline(first: 100) {
  nodes {
    ... on ReviewRequestedEvent {
      createdAt
      requestedReviewer {
        ...ReviewerInfo
      }
    }
    # more types of events handled here
  }

Finally, we make use of fragments to reduce code repetition when getting the username (or team name, since on GitHub you can request a review from a team instead of an individual user) for requested reviewers. Fragments essentially let you define a snippet once and use it in multiple places in one query. In our query, every ...ReviewerInfo line is replaced with the two inline fragments inside this definition:

fragment ReviewerInfo on RequestedReviewer {
  ... on User {
    login
  }
  ... on Team {
    name
  }
}

With our query fully constructed, let’s run it on a repository!

Pull request timeline events for the TypeScript repo

Hooray! We can see we’ve gotten back a list of pull requests from the TypeScript repo, each with a title and a list of timeline events about code reviews.
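To make the shape of that response concrete, here is a minimal Python sketch that walks it. The nesting mirrors the query (repository, pullRequests, nodes, timeline, nodes), but the sample values (title, logins, timestamps) are invented for illustration, and `iter_review_requests` is a hypothetical helper name rather than something from the final scripts. It only handles the ReviewRequestedEvent fragment shown above; events of other types come back as empty objects and are skipped.

```python
# Hypothetical response in the shape the query asks for.
# The title, reviewer names, and timestamps are invented for illustration.
sample_response = {
    "data": {
        "repository": {
            "pullRequests": {
                "nodes": [
                    {
                        "title": "Example PR",
                        "timeline": {
                            "nodes": [
                                {},  # an event type with no matching inline fragment
                                {
                                    "createdAt": "2018-11-07T15:04:05Z",
                                    "requestedReviewer": {"login": "suzy"},
                                },
                                {
                                    "createdAt": "2018-11-07T15:04:06Z",
                                    "requestedReviewer": {"name": "compiler-team"},
                                },
                            ]
                        },
                    }
                ]
            }
        }
    }
}


def iter_review_requests(response):
    """Yield (pr_title, reviewer_name, created_at) for each review request."""
    pull_requests = response["data"]["repository"]["pullRequests"]["nodes"]
    for pr in pull_requests:
        for event in pr["timeline"]["nodes"]:
            if not event:
                # Empty object: an event our query did not ask any fields for.
                continue
            reviewer = event.get("requestedReviewer") or {}
            # Users expose `login`, teams expose `name` (see ReviewerInfo).
            name = reviewer.get("login") or reviewer.get("name")
            yield pr["title"], name, event["createdAt"]


rows = list(iter_review_requests(sample_response))
```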

Finally, we can write a simple Python wrapper script around our query which will make a few small enhancements¹ and let us easily run our query and save the results locally:

python download_data.py microsoft typescript -o rawData.json
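For a rough idea of what such a wrapper involves, here is a minimal sketch using only the standard library. It is not the final script: `build_payload` and `run_query` are hypothetical names, the query only includes the ReviewRequestedEvent fragment discussed above, and pagination and argument parsing are left out. It assumes you have a GitHub personal access token, since the GraphQL API requires authentication.

```python
import json
import urllib.request

GITHUB_GRAPHQL_URL = "https://api.github.com/graphql"

# The query pieces from this section, assembled into one document.
# Only the ReviewRequestedEvent fragment is included in this sketch.
QUERY = """
query($repoOwner: String!, $repoName: String!, $prCount: Int = 50) {
  repository(owner: $repoOwner, name: $repoName) {
    pullRequests(last: $prCount) {
      nodes {
        title
        timeline(first: 100) {
          nodes {
            ... on ReviewRequestedEvent {
              createdAt
              requestedReviewer {
                ...ReviewerInfo
              }
            }
          }
        }
      }
    }
  }
}

fragment ReviewerInfo on RequestedReviewer {
  ... on User {
    login
  }
  ... on Team {
    name
  }
}
"""


def build_payload(repo_owner, repo_name, pr_count=50):
    """Build the JSON body for a GraphQL request: the query plus its variables."""
    return {
        "query": QUERY,
        "variables": {
            "repoOwner": repo_owner,
            "repoName": repo_name,
            "prCount": pr_count,
        },
    }


def run_query(payload, token):
    """POST the payload to GitHub and return the decoded JSON response."""
    request = urllib.request.Request(
        GITHUB_GRAPHQL_URL,
        data=json.dumps(payload).encode("utf-8"),  # supplying data makes this a POST
        headers={
            "Authorization": f"bearer {token}",
            "Content-Type": "application/json",
        },
    )
    with urllib.request.urlopen(request) as response:
        return json.load(response)


payload = build_payload("microsoft", "typescript", pr_count=10)
```

Calling `run_query(payload, token)` and writing the result out with `json.dump` would then give a raw data file along the lines of rawData.json.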

Step 2: Transforming Our Data

With the raw data now on disk, it’s time to transform it and visualize it! We need to write code that takes the raw JSON output from our data download script and writes out a more usable JSON file that contains the data in a simpler form. In our raw data, each pull request has a list of events that looks like this:

Someone asks Suzy and Joe for a code review in the morning, and they review it later in the day.

Our transformation script will correlate review events with their corresponding review request events² and produce a flat list with every review by every user and whether it was on-time or not:

Suzy’s review is on-time since it was before the end of the business day. Joe’s review is just a touch late.
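The correlation step could be sketched roughly as follows. The event shape here (dicts with `type`, `reviewer`, and `time` keys) is an assumption made for this example, not the raw GitHub format, and the sketch ignores the cancelled-request and merged-PR cases covered in footnote 2.

```python
def pair_reviews_with_requests(events):
    """Match each submitted review to that reviewer's outstanding request.

    `events` must be in chronological order. Reviews from people who were
    never asked for one are ignored.
    """
    pending = {}  # reviewer -> time their review was requested
    pairs = []
    for event in events:
        reviewer = event["reviewer"]
        if event["type"] == "review_requested":
            # Keep the earliest outstanding request; a repeated request
            # does not reset the clock in this sketch.
            pending.setdefault(reviewer, event["time"])
        elif event["type"] == "review_submitted" and reviewer in pending:
            pairs.append({
                "reviewer": reviewer,
                "requested_at": pending.pop(reviewer),
                "reviewed_at": event["time"],
            })
    return pairs


# Invented timeline matching the Suzy and Joe example.
events = [
    {"type": "review_requested", "reviewer": "suzy", "time": "2018-11-07T09:30:00"},
    {"type": "review_requested", "reviewer": "joe", "time": "2018-11-07T09:30:00"},
    {"type": "review_submitted", "reviewer": "suzy", "time": "2018-11-07T15:00:00"},
    {"type": "review_submitted", "reviewer": "joe", "time": "2018-11-07T18:05:00"},
    {"type": "review_submitted", "reviewer": "ann", "time": "2018-11-07T18:10:00"},
]
pairs = pair_reviews_with_requests(events)
```

Each resulting pair has a request time and a review time, which is all we need to decide whether the review was on time.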

We can codify our on-time goals fairly easily using Arrow and a few helper functions to simplify the time handling:

def get_due_time(request_time):
  if request_time < midday(request_time):
    return endofday(request_time)

  next_business_day = request_time.shift(
    days=+days_until_next_business_day(request_time.weekday()),
  )
  return midday(next_business_day)

def is_review_on_time(request_time, review_time):
  return review_time <= get_due_time(request_time)
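The helper functions that snippet relies on aren't shown, so here is one way they might look. To keep the example dependency-free, this sketch uses the standard library's datetime instead of Arrow, which means `get_due_time` adds a `timedelta` rather than calling `shift`. The hours chosen for midday (12:00) and end of day (18:00) are assumptions for illustration, not values taken from the real script.

```python
from datetime import datetime, timedelta

MIDDAY_HOUR = 12      # assumed cutoff between "morning" and "afternoon"
END_OF_DAY_HOUR = 18  # assumed end of the business day


def midday(moment):
    """Return midday on the same calendar day as `moment`."""
    return moment.replace(hour=MIDDAY_HOUR, minute=0, second=0, microsecond=0)


def endofday(moment):
    """Return the end of the business day on the same calendar day."""
    return moment.replace(hour=END_OF_DAY_HOUR, minute=0, second=0, microsecond=0)


def days_until_next_business_day(weekday):
    """`weekday` uses Monday=0 ... Sunday=6, as datetime and Arrow both do."""
    if weekday == 4:  # Friday -> Monday
        return 3
    if weekday == 5:  # Saturday -> Monday
        return 2
    return 1


def get_due_time(request_time):
    # Morning requests are due by the end of the same day.
    if request_time < midday(request_time):
        return endofday(request_time)
    # Afternoon requests are due by midday on the next business day.
    next_business_day = request_time + timedelta(
        days=days_until_next_business_day(request_time.weekday()),
    )
    return midday(next_business_day)


def is_review_on_time(request_time, review_time):
    return review_time <= get_due_time(request_time)


# 2018-11-07 was a Wednesday and 2018-11-09 a Friday.
morning_due = get_due_time(datetime(2018, 11, 7, 9, 30))
afternoon_due = get_due_time(datetime(2018, 11, 7, 15, 0))
friday_afternoon_due = get_due_time(datetime(2018, 11, 9, 15, 0))
```

With these assumptions, a Friday-afternoon request is due at midday on Monday, so weekends never count against a reviewer.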

We can run our transformation script on the output from the previous step:

python transform_data.py -f rawData.json -o data.json

And produce a JSON file of processed data:

[
  {
    "reviewer": "sheetalkamat",
    "status": "on_time",
    "time_due": "2018-11-07T14:00:00-05:00"
  },
  {
    "reviewer": "weswigham",
    "status": "late",
    "time_due": "2018-11-07T14:00:00-05:00"
  },
  // and more items here...
]

Step 3: Visualizing Our Data with Chartify

To finish up our analysis, we’ll generate visualizations of our data using Chartify, a new Python-based visualization library released by Spotify³. Our script will load the JSON file produced by the prior step, aggregate the data by user, and create a DataFrame. After that’s all completed, we’ll use Chartify to create a bar chart to visualize the data:

# once we have `data_frame`, visualize it with Chartify
ch = chartify.Chart(blank_labels=True, x_axis_type='categorical')
ch.set_title('On-time review rate (last 1000 PRs)')

ch.plot.bar(
    data_frame=data_frame,
    categorical_columns=['user'],
    numeric_column='on_time_ratio',
).callout.line(
    0.75,
    line_dash='dashed',
)

ch.axes.set_yaxis_range(0, 1)
ch.axes.set_yaxis_tick_format('0%')
ch.axes.set_xaxis_tick_orientation(['diagonal', 'horizontal'])
ch.save(args.output_filename)
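The aggregation that happens before the chart is drawn can be sketched with plain Python. This is an illustration rather than the script's actual code: `on_time_ratio_by_user` is a hypothetical name, and the input is just the two example records from the processed JSON above. The output keys match the `user` and `on_time_ratio` columns the Chartify call expects.

```python
from collections import defaultdict


def on_time_ratio_by_user(reviews):
    """Aggregate per-review records into one on-time ratio per reviewer."""
    counts = defaultdict(lambda: {"on_time": 0, "total": 0})
    for review in reviews:
        tally = counts[review["reviewer"]]
        tally["total"] += 1
        if review["status"] == "on_time":
            tally["on_time"] += 1
    # One row per user, sorted by username for a stable chart order.
    return [
        {"user": user, "on_time_ratio": tally["on_time"] / tally["total"]}
        for user, tally in sorted(counts.items())
    ]


reviews = [
    {"reviewer": "sheetalkamat", "status": "on_time", "time_due": "2018-11-07T14:00:00-05:00"},
    {"reviewer": "weswigham", "status": "late", "time_due": "2018-11-07T14:00:00-05:00"},
]
rows = on_time_ratio_by_user(reviews)
```

Passing `rows` to `pandas.DataFrame(rows)` would give a DataFrame with one row per reviewer.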

We can run our script with our previous data from our transformation step:

python visualize_data.py -f data.json chart.html

And get a nice chart showing on-time review rates! In the chart below, we see the proportion of reviews that are on-time for each TypeScript contributor⁴:

How are we doing at TrialSpark?

Enough examples with the TypeScript team though, let’s cut to the chase and see how we’re doing at TrialSpark! How often are we meeting our goals of reviewing each other’s PRs in a timely manner?

It’s unrealistic to expect every review to be completed on-time — other critical tasks may come up at work, we all forget things now and then, or personal commitments may interfere. We can, however, aim for most code reviews to be prompt; our charts below will mark an ideal goal of at least 75% of code reviews happening on-time⁶.

Let’s grab the review data from the TrialSpark monorepo and see how we’re doing!

python download_data.py trialspark spark -o raw.json
python transform_data.py -tz America/New_York -f raw.json -o data.json

We can take advantage of a feature in our visualization script which lets us break down the data by different subgroups:

python visualize_data.py -f data.json -g teams.json chart1.html
python visualize_data.py -f data.json -g roles.json chart2.html

Our engineering team is broken down into three whimsically-named sub-teams, “Wildfire”, “Hippo”, and “Goldfish”, each focusing on different business areas. While we share tech stacks and general coding practices between teams, each team has its own product and its own way of operating internally. Analyzing our on-time code review rates by team, we see that our “Wildfire” team (the team responsible for internal tools) has an incredible rate of doing code reviews on time⁷, shooting way above our goal of 75% on-time reviews!

On-time review rates for TrialSpark, broken down by product team

Broken down by role, we see that our engineers are the most likely to review PRs on time, which is what we’d expect since the code-review goal is primarily for engineers:

On-time review rates for TrialSpark, broken down by job role

Wrapping Up

You can run these analytics for your own team as well! Check out the repository of code for full usage instructions and examples.

There are all sorts of ways to expand on this analysis: we could chart on-time reviews over time and see if we’re improving as a company. We could also chart on-time review ratio against number of requested reviews to see if those with lower on-time rates are feeling overwhelmed by the quantity of reviews.

This data could also be put to use in ways beyond visualizations; for example, a Slack bot could gently remind reviewers when they’re late with a review.

It’s important to note that at TrialSpark we would never use a metric like on-time review rate to evaluate an engineer’s performance or compare against others. Metrics are easily gamed and can be misleading. An engineer leaving high-quality feedback a bit late is much better than an engineer blindly approving any and all PRs. We have a responsibility to be thoughtful with how we use data.

This blog post has primarily touched on the idea of supporting each other by doing code reviews promptly. Conversely, the one writing code also needs to keep the reviewer in mind by:

1) Leaving a clear and concise description of the changes
2) Attaching screenshots and gifs of visual changes
3) Linking to the appropriate design docs related to their change which give broader context for their work
4) Properly documenting their code and commenting on especially confusing parts

Good code review in general is a broad topic for another day, but it should suffice to say that this post is only scratching the surface of what good code review entails.

These metrics are useful, though, to help us consider if we’re appropriately supporting each other through code review⁶. And if we find that we’re rarely leaving reviews on-time, perhaps we could consider our day-to-day schedules and how we can support each other as much as possible.

Footnotes

[1]: There are a couple of important improvements in our final Python script as compared to the query described in the main blog post. Instead of using the timeline field, we now use the experimental timelineItems field, which lets us pre-filter our list of timeline events to only the types we care about. This prevents us from receiving empty objects in our list of timeline events. (Unfortunately, experimental fields don’t work in the GitHub API explorer.)

In addition, GitHub limits queries to requesting data on 100 pull requests at a time. Our script handles this by adding some pagination logic in order to be able to pull more than 100 PRs for a repo.
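As a rough sketch of that pagination logic: since the query asks for the last N pull requests, paging works backwards using the connection's `pageInfo { hasPreviousPage startCursor }` fields and a `before:` argument. The function below is an illustration under those assumptions, not the script's real code. `fetch_page` is a hypothetical callable that runs one query and returns the `pullRequests` connection, and the fake version used here replaces GitHub's opaque cursor strings with list indexes.

```python
def fetch_pull_requests(fetch_page, total, page_size=100):
    """Collect up to `total` of the most recent PRs, one page at a time.

    `fetch_page(count, before)` must return a dict shaped like a connection:
    {"nodes": [...], "pageInfo": {"hasPreviousPage": bool, "startCursor": ...}}
    """
    collected = []
    cursor = None  # None means "start from the newest PRs"
    while len(collected) < total:
        count = min(page_size, total - len(collected))
        connection = fetch_page(count, cursor)
        # Each page is older than the previous one, so prepend it
        # to keep the combined list in its original order.
        collected = connection["nodes"] + collected
        page_info = connection["pageInfo"]
        if not page_info["hasPreviousPage"]:
            break
        cursor = page_info["startCursor"]
    return collected


def make_fake_fetch_page(all_items):
    """Stand-in for the real API call, with list indexes as cursors."""
    def fetch_page(count, before):
        end = len(all_items) if before is None else before
        start = max(0, end - count)
        return {
            "nodes": all_items[start:end],
            "pageInfo": {"hasPreviousPage": start > 0, "startCursor": start},
        }
    return fetch_page


# Pretend the repository has 250 PRs, numbered 1 to 250.
fake_fetch_page = make_fake_fetch_page(list(range(1, 251)))
latest_220 = fetch_pull_requests(fake_fetch_page, total=220)
```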

[2]: It’s slightly more complex than this: we also look at any time a review request is cancelled and any time a PR is closed or merged. Say a PR is merged before the expected time for a reviewer to respond. That review is then neither on time nor late; it’s simply not present in the results.

[3]: Why Chartify? Well, I wanted a library which could easily make grouped bar charts. I tried plot.ly initially, but found it a bit cumbersome. The Chartify announcement blog post had a grouped bar chart in its introductory image and I was sold! Plus, it’s always fun to experiment with newly released libraries.

[4]: Admittedly, it’s completely unfair to expect the TypeScript team to have “on-time reviews” since it’s an idea we defined for ourselves at TrialSpark. It still is interesting to see how promptly other teams like to operate!

[5]: ahejlsberg, aka Anders Hejlsberg, is the architect of C# and TypeScript, one of my software engineering role models, and an all-around great guy as far as I can tell. If you’re interested in his work, specifically TypeScript, he has a great video out about compilers and how TypeScript is special.

[6]: Aiming for 75% on-time reviews is an arbitrary goal I made up for the purpose of this blog post and doesn’t reflect anything we’ve discussed at TrialSpark. It seemed reasonable though!

[7]: Charting by both team and individual could lead to misleading data (per Simpson’s Paradox perhaps), but in this case the data is accurately represented. One could imagine an alternative scenario where individuals with high on-time response rates only completed a few reviews, and individuals with low on-time response rates had been involved in many reviews. In this situation, our chart could be misrepresentative of what’s actually going on.
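A tiny worked example of that alternative scenario, with made-up numbers: two people who each did a handful of reviews on time, and one person who did many reviews, mostly late. Depending on how a team-level figure is computed, averaging the individual rates and pooling all the reviews give very different answers.

```python
# Invented numbers: reviewer -> (on-time reviews, total reviews)
team = {
    "a": (2, 2),
    "b": (3, 3),
    "c": (10, 40),
}

# Average of each person's individual on-time rate.
mean_of_rates = sum(on_time / total for on_time, total in team.values()) / len(team)

# On-time rate across all of the team's reviews pooled together.
pooled_rate = (
    sum(on_time for on_time, _ in team.values())
    / sum(total for _, total in team.values())
)

print(f"mean of individual rates: {mean_of_rates:.0%}")
print(f"pooled rate: {pooled_rate:.0%}")
```

Here the average of individual rates is 75%, right at our goal, while only a third of the team's reviews were actually on time.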