More enjoyable Code Reviews with Gemini

Daniel Strebel
Google Cloud - Community
12 min read · Jul 26, 2024

Code reviews are a widely adopted practice and an integral part of any professional software development workflow today. They play an important role in ensuring quality, performance and adherence to agreed-upon coding standards within an organization or team of developers. Many companies, including Google, require code reviews for any proposed change to their codebase. When conducted effectively, they preserve the quality of the codebase and also provide a good opportunity to upskill new developers and share knowledge within the team.

This blog post discusses several aspects of using generative AI in code reviews by augmenting them with valuable insights and additional information. Our objective is to explore whether this technology can alleviate some of the current challenges and overall accelerate the turnaround time of pull requests. Rather than completely eliminating the human reviewer’s involvement, the aim is to explore how generative AI can augment their capabilities, allowing them to focus on more intricate aspects of code reviews while reducing their cognitive load.

High-Level Solution

To determine the potential applications of generative AI models in the code review process, we start by decomposing the process into its key steps:

  • Context of the code changes: Collecting enough information to meaningfully reason about code changes is usually handled at the tooling level. The context is typically represented as diffs that capture the suggested changes, along with links to a snapshot of the full codebase to understand how the proposed changes integrate with the rest of the code.
  • Reviewer assignment: The code review can be assigned to a specific engineer based on their knowledge of the codebase, previous commits, or even explicit ownership over a section of the code, e.g. via a CODEOWNERS file.
  • Understanding proposed changes: The designated reviewer first tries to understand the intent behind the proposed changes. From the associated PR description, commit messages, linked issues and, of course, the code changes themselves, the reviewer works out what the author set out to achieve.
  • Review assessment: This is the primary goal of the code review. The reviewer assesses whether or not the suggested changes are ready to be merged into the upstream codebase. If they agree that the proposed changes are good to go, they provide the LGTM (looks good to me) approval that signals that the review is concluded.
  • Provide code review comments: The reviewer tries to come up with actionable suggestions on how to improve the proposed changes, based on their experience and the agreed-upon standards.
schematic PR process

As shown in the schematic view above, a code review is usually not a linear process but includes iterations as the reviewer and the developer collaborate on resolving the code review comments.

In this post we want to focus mainly on the core code reviewer activities on the right-hand side, but also discuss what resources can be provided to make that process more efficient and enjoyable.

Obviously, generative AI can also enhance performance during the development activities shown on the left, but generative AI-assisted software development is covered in plenty of existing content and is not the focus of this post.

Code Review Context

In order to conduct a thorough code review, the reviewer needs enough context to understand the proposed changes and address them accordingly. The review context usually starts with the actual line changes in the form of a diff that shows the content that was added, modified or removed compared to the previous version. In addition to the raw changes, a reviewer often needs to incorporate knowledge about the broader codebase and the environment that the changes apply to.

For instance, a suggested change to a function may reference a field in a structure that isn’t part of the modified code since it was introduced earlier. Even though the proposed modification isn’t directly related to this field, it might still be necessary to consider it when evaluating the code changes and their compatibility with the original intent.

In the context of AI-assisted code reviews, knowledge of an enterprise’s existing code libraries, overall coding standards and other aspects of their software development practices can be crucial. This information, which human reviewers often apply intuitively, can be explicitly incorporated to enhance the review process. To achieve this, relevant code snippets from the same or other repositories can be added to the prompt, thereby increasing the relevance and accuracy of the outputs.

Illustration Code Review Scope

To try this in real life, let’s take an arbitrarily chosen pull request from the Kubernetes project and consider the following Git diff as the context.

In the initial experiment, only the Git diff is included in the prompt. This Git diff describes the code changes of the PR. By leveraging Gemini 1.5 Flash, a prompt similar to the one outlined below could be employed (omitting the actual Git diff for the sake of brevity):

You are a senior software engineer. You are asked to provide a code review with 
helpful suggestions for the following pull request.
Make sure you only focus on the code referenced in the diff.
Be concise and focus on the most impactful suggestions only.

### START of Git diff of the PR
[Diff content from https://patch-diff.githubusercontent.com/raw/kubernetes/kubernetes/pull/109939.diff here]
### END of Git diff of the PR

When you run this prompt against Gemini, e.g. in Vertex AI Studio, you’ll see that the resulting suggestions are mainly focused on the readability of the code, asking for more documentation and comments to explain the functionality.

Let’s compare this to a version of the same prompt where the full contents of the changed files are added as additional context:

You are a senior software engineer. You are asked to provide a code review with 
helpful suggestions for the following pull request.
Make sure you only focus on the code referenced in the diff.
Be concise and focus on the most impactful suggestions only.

### START of Git diff of the PR
[Diff content from https://patch-diff.githubusercontent.com/raw/kubernetes/kubernetes/pull/109939.diff here]
### END of Git diff of the PR

For reference only: here's a snapshot of the files with the changes applied:

### START File Contents after the changes ###

#### File: build/common.sh

[file content here]

#### File: build/lib/release.sh

[file content here]

#### File: build/release-images.sh

[file content here ...]

### END File Contents after the changes ###

With the additional file content added to the prompt, we now get more targeted recommendations that allow us to see the changes in context and come up with suggestions that are more specific to the actual changes in the PR.
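
To give an idea of how such a prompt can be assembled programmatically, here is a minimal sketch, assuming the diff text and the post-change file contents have already been fetched, e.g. from a local checkout of the PR branch (file paths and names are placeholders):

def build_review_prompt(diff_text, file_snapshots):
    # Assemble the review prompt: instructions, the Git diff, and the
    # post-change file contents as reference-only context.
    sections = [
        "You are a senior software engineer. You are asked to provide a code review with",
        "helpful suggestions for the following pull request.",
        "Make sure you only focus on the code referenced in the diff.",
        "Be concise and focus on the most impactful suggestions only.",
        "",
        "### START of Git diff of the PR",
        diff_text,
        "### END of Git diff of the PR",
        "",
        "For reference only: here's a snapshot of the files with the changes applied:",
        "",
        "### START File Contents after the changes ###",
    ]
    for path, content in file_snapshots.items():
        sections += ["", f"#### File: {path}", "", content]
    sections += ["", "### END File Contents after the changes ###"]
    return "\n".join(sections)

# e.g. with a locally exported diff and a checkout of the changed files
prompt = build_review_prompt(
    diff_text=open("diff.txt").read(),
    file_snapshots={"build/common.sh": open("build/common.sh").read()},
)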

With the increased context windows of newer language models like Gemini 1.5 Flash and 1.5 Pro, you can obviously take this much further and include several additional files that could potentially be relevant in the context of your prompt. To determine the context scope boundary, walking a dependency graph and cutting off at a certain depth could be a relatively simple but effective strategy (see the sketch below).

Context cut-off based on a dependency graph
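
As a rough illustration, the following sketch walks such a dependency graph, assuming a pre-computed file-level mapping of dependencies (the deps example below is hypothetical, not derived from the actual Kubernetes PR):

from collections import deque

def context_files(changed_files, deps, max_depth=2):
    # Collect files to include as prompt context by walking the
    # file-level dependency graph (BFS), starting from the files
    # touched by the PR and cutting off after max_depth hops.
    selected = set(changed_files)
    queue = deque((f, 0) for f in changed_files)
    while queue:
        current, depth = queue.popleft()
        if depth == max_depth:
            continue  # context boundary reached
        for neighbor in deps.get(current, []):
            if neighbor not in selected:
                selected.add(neighbor)
                queue.append((neighbor, depth + 1))
    return selected

# Hypothetical dependency mapping: file -> files it sources/references.
deps = {
    "build/release-images.sh": ["build/common.sh", "build/lib/release.sh"],
    "build/lib/release.sh": ["build/common.sh"],
}
print(context_files(["build/release-images.sh"], deps, max_depth=1))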

PR / Reviewer Matching

In addition to collecting the context of a PR, we also need to identify an appropriate reviewer who is knowledgeable about the codebase and able to provide helpful review feedback. In many pull request workflows, the underlying tool can leverage heuristics of some sort to come up with a shortlist of suggested reviewers. This is typically done by looking up the respective users in a code owners file, consulting the history of the files that were edited, or through other rule-based approaches.

In addition to these approaches, it could also be interesting to leverage a vector-embedding-based search, where the code changes are matched against pre-indexed code snippets to identify experts with prior knowledge of similar problems, even if those problems are not directly related to the code that is currently being worked on.

For example, a pull request might contain code related to a complex authentication process within a frontend application. The traditional approach of using code owners and historical context might yield reviewer candidates with experience with the overall frontend application but limited knowledge of the authentication process. By comparing the vectorized PR context with existing code snippets, the process might also identify individuals within the reviewer pool who have deep experience with the specific authentication framework and are more suitable to provide expert advice and guidance.

suggest suitable code reviewers
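
To make this more tangible, here is a minimal sketch of such a matching step. It assumes the Vertex AI text embedding API and a small in-memory snippet index; in practice the snippet embeddings would be pre-computed and stored in a vector database, and the project, location, author names and snippets are all hypothetical placeholders.

import vertexai
from vertexai.language_models import TextEmbeddingModel

# Hypothetical pre-indexed snippets (author, code) mined from past commits.
SNIPPETS = [
    ("alice", "async function refreshAuthToken(session) { /* OAuth refresh */ }"),
    ("bob", "function renderNavbar(props) { /* layout code */ }"),
]

def cosine(a, b):
    # Cosine similarity between two embedding vectors.
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = sum(x * x for x in a) ** 0.5
    norm_b = sum(y * y for y in b) ** 0.5
    return dot / (norm_a * norm_b)

def suggest_reviewers(pr_diff, top_n=3):
    vertexai.init(project="my-project", location="us-central1")  # placeholders
    model = TextEmbeddingModel.from_pretrained("text-embedding-004")
    # Embed the PR diff together with all indexed snippets in one call.
    texts = [pr_diff] + [code for _, code in SNIPPETS]
    vectors = [e.values for e in model.get_embeddings(texts)]
    pr_vector, snippet_vectors = vectors[0], vectors[1:]
    ranked = sorted(
        zip([author for author, _ in SNIPPETS], snippet_vectors),
        key=lambda pair: cosine(pr_vector, pair[1]),
        reverse=True,
    )
    return [author for author, _ in ranked[:top_n]]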

Lower Cognitive Load during Code Reviews

With the context of the proposed change and the reviewer in place, we can now move to the core tasks of the code review process. In this step, the reviewer tries to come to a conclusion on whether or not the current state of the proposed changes is ready to be merged back into the main codebase.

Of course, the code review process is highly individual and depends on a range of factors, including a reviewer’s familiarity with the codebase, the complexity of the proposed change, the structure and test coverage of the code, and even the programming language used. However, there are certain aspects of a pull request that can lower its review complexity.

From our own experience we know that large pull requests, or pull requests that bundle a number of unrelated changes, are much harder to review than smaller, targeted changes. There is also good empirical evidence to support the claim that a reviewer’s ability to detect defects and other code issues goes down as the volume of a pull request crosses the threshold of a couple of hundred lines of code. Intuitively this makes perfect sense: the bigger the pull request, the higher the cognitive load of keeping track of all the changes and trying to make sense of them in the first place.

The limited resource that is the attention of a reviewer is a great starting point for leveraging generative AI. After all, the concept of attention is what kicked off this new wave of possibilities in the first place. In practice this means that we should explore how we can use LLMs to help a reviewer in the following review substeps:

Understand the proposed changes in a PR

This can be a trivial task, especially for small PRs with only a couple of lines of code and a good description in the PR title and body to guide the reviewer. As the pull request grows in size, understanding the intent as well as the nuances of the implementation requires a significant time investment to read the changes and correlate the code with the PR description.

Our first step in helping the code reviewer better understand a code change is therefore to provide them with a comprehensive summary of the proposed changes and the meaning behind them. To do this we could craft a prompt with the instructions to generate a summary and the pull request context like so:

You are an experienced software engineer.

Provide a complete summary of the most important changes of the
pull request based on the following Git diff:

{pr_git_diff}

The important parts here are:

  • Setting the tone and defining a persona that should be used to respond to the request.
  • The task you want to achieve
  • The code review context as described above

Obviously, the prompt used here is just a starting point that can and should be tuned for your specific use case and needs.
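
As an illustration, a minimal sketch of running this summary prompt programmatically against Gemini via the Vertex AI SDK could look like the following (project, location and the diff file path are placeholders to adapt to your setup):

import vertexai
from vertexai.generative_models import GenerativeModel

# Placeholder project and location; adjust to your environment.
vertexai.init(project="my-project", location="us-central1")

def summarize_pr(pr_git_diff):
    # Generate a summary of the PR changes with Gemini 1.5 Flash.
    model = GenerativeModel("gemini-1.5-flash")
    prompt = (
        "You are an experienced software engineer.\n\n"
        "Provide a complete summary of the most important changes of the\n"
        "pull request based on the following Git diff:\n\n"
        f"{pr_git_diff}"
    )
    return model.generate_content(prompt).text

# e.g. summarize a diff that was exported to a local file
with open("diff.txt") as diff_file:
    print(summarize_pr(diff_file.read()))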

Automatically generate PR comments

To take the code review a step further, we can use the same process as before to generate more actionable recommendations that can be included in the pull request comment thread. Compared to just summarizing the changes of a PR, the generated output can serve directly as input to the PR author and give them valuable insights into how to improve their code. With this, we not only help the reviewer surface potential issues but also reduce the time between when the PR is opened and the first round of feedback. As delayed PR reviews can hinder software delivery performance, gaining immediate feedback and thus shortening the feedback cycles is often a valuable improvement to the overall process.

The prompt to generate PR comments is similar to the previous one except for the task description:

You are an experienced software engineer.

You are tasked to review a pull request from one of your peers.
You only comment on code that you found in the pull request diff.
Provide a code review with suggestions for the most impactful
improvements based on the pull request in the following Git diff:

{pr_git_diff}
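
If you want to experiment without the CI tooling described below, a minimal sketch of posting the generated output back to the PR conversation via the GitHub REST API could look like this (repository name and PR number are placeholders; GitHub exposes PR conversation comments through the issues endpoint):

import os
import requests

def post_pr_comment(repo, pr_number, body):
    # Attach a comment to the PR conversation thread.
    response = requests.post(
        f"https://api.github.com/repos/{repo}/issues/{pr_number}/comments",
        headers={
            "Authorization": f"Bearer {os.environ['GITHUB_TOKEN']}",
            "Accept": "application/vnd.github+json",
        },
        json={"body": body},
        timeout=30,
    )
    response.raise_for_status()

# Placeholder repo and PR number; the body would be the model output from above.
post_pr_comment("my_org/my_project", 42, "## Automated Review Suggestions\n...")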

End to End Flow

Let’s put everything together and explore how gathering the pull request context and generating the respective artifacts can be automated in a CI process. The real-world implementation is quite flexible and will depend on your CI automation tooling of choice. To make the example more concrete, we’re looking at a serverless Cloud Build pipeline that combines the different steps of generating comments on a PR that was opened. This pipeline is automatically triggered when a PR is created as well as on code changes to it. To help with readability and allow for reuse of the PR assistance tooling, we’re leveraging an abstraction that we call friendly-cicd-helper. It provides a simple CLI over the Vertex AI API as well as over the APIs used to comment back on the pull request thread in the source code management system. It can also be containerized and used directly within a Cloud Build step.

In our example we’re using the following two operations of the friendly-cicd-helper:

  • friendly-cicd-helper vertex-code-summary, which uses Gemini in Vertex AI to create a concise summary of the changes proposed in a PR.
  • friendly-cicd-helper github-comment, which takes the generated text passed via STDIN and attaches it to the PR thread in GitHub (similar CLI functionality also exists for GitLab).
steps:
- id: Generate Git Diff
  name: gcr.io/cloud-builders/git
  entrypoint: 'bash'
  args:
  - '-c'
  - |
    git fetch origin
    git diff origin/main --output /workspace/diff.txt
    cat /workspace/diff.txt
- id: Using Vertex AI to generate PR Summary
  name: 'europe-west1-docker.pkg.dev/$PROJECT_ID/default/friendly-cicd-helper'
  entrypoint: sh
  args:
  - -c
  - |
    export VERTEX_GCP_PROJECT=$PROJECT_ID
    echo "## Automated Merge Request Summary (generated by Vertex AI)" | tee pr-summary.md
    friendly-cicd-helper vertex-code-summary --diff /workspace/diff.txt | tee -a pr-summary.md
    printf "\n\nView details in Cloud Build (permission required) https://console.cloud.google.com/cloud-build/builds/$BUILD_ID?project=$PROJECT_ID" | tee -a pr-summary.md
    cat pr-summary.md | friendly-cicd-helper github-comment --repo my_org/my_project --issue $_PR_NUMBER
  secretEnv: ['GITHUB_TOKEN']
...

From a reviewer’s perspective, the output will look something like the screenshot below, where they are presented with a summary of the PR. With the aggregated summary at hand, this will hopefully reduce the time it takes them to understand the extent of the PR and preserve mental capacity for a thorough review of the actual changes.

automatically generated PR summary added as a comment

How to measure quality and impact

The integration via source code repository comments allows for a simple feedback loop, e.g. via the built-in emoji reactions. Here the team can, for example, agree to use thumbs-up and thumbs-down emojis to provide qualitative feedback on the generated content:

using Emoji reactions as feedback on the generated content

The impact on pull request turnaround times can be measured quantitatively by comparing how long a PR stays open, e.g. the difference in timestamps between opening and merging a PR, with and without AI assistance (1). We can also look at the time between the PR being created and the first change being applied by the PR author to measure the impact on feedback turnaround times (2). Additionally, the number of iterations between reviewer feedback and PR author changes (3) can be an interesting metric to track and compare to the control group.

measure the impact of generative AI tooling on the PR
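
As a simple illustration of metric (1), the following sketch computes the average time-to-merge for an AI-assisted cohort versus a control group. The timestamps are hypothetical samples in the ISO-8601 format that common Git hosting APIs return:

from datetime import datetime

def hours_between(start, end):
    # Delta in hours between two ISO-8601 timestamps, e.g. "2024-07-26T09:15:00Z".
    fmt = "%Y-%m-%dT%H:%M:%SZ"
    delta = datetime.strptime(end, fmt) - datetime.strptime(start, fmt)
    return delta.total_seconds() / 3600

# Hypothetical sample of PR timestamps for both cohorts.
prs = [
    {"created_at": "2024-07-01T08:00:00Z", "merged_at": "2024-07-02T10:00:00Z", "ai_assisted": True},
    {"created_at": "2024-07-01T09:00:00Z", "merged_at": "2024-07-04T09:00:00Z", "ai_assisted": False},
]

for assisted in (True, False):
    cohort = [p for p in prs if p["ai_assisted"] == assisted]
    avg = sum(hours_between(p["created_at"], p["merged_at"]) for p in cohort) / len(cohort)
    print(f"AI assisted={assisted}: average time to merge {avg:.1f}h")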

Conclusion and next steps

Generative AI has the potential to substantially improve how we approach code reviews. By leveraging its capabilities, we can reduce the cognitive load on reviewers, improve the quality of code, and shorten the feedback cycle during a pull request. To explore how you can embed generative AI into your own code review processes, start by experimenting with the prompts, models and techniques discussed in this article. Identify the most promising steps in your pull request process that could benefit from this technology and automatically trigger the generation of helpful explanations and other code review assets. We’ve only just started to scratch the surface, and the benefits could be transformative.
