Implementing a GenAI Code Review Bot with Google Cloud Platform

Dieudonné N'nane
Published in neoxia
11 min read · Apr 24, 2024

Introduction

In the rapidly changing landscape of software practices, DevOps principles have become essential to drive collaboration and efficiency between development and operations teams. DevOps practices emphasize continuous integration, delivery, and deployment to achieve faster development cycles and improved product quality. However, with the increasing complexity of modern applications and the demand for faster response times, innovative technologies such as generative artificial intelligence (AI) are being integrated to augment and streamline DevOps processes.

This article explores the intersection of DevOps and generative AI, highlighting how AI-powered tools could revolutionize code review practices. We will look into the capabilities of generative AI models to provide insightful code reviews and discuss the implementation of an automated code review solution leveraging Google Cloud Platform (GCP) services.

Generative AI in DevOps

DevOps emphasizes agility and automation by combining software creation and operational activities in continuous, transparent cycles of build, integrate, test, monitor, respond, deliver, and deploy. The process involves iterative software practices such as continuous integration, testing, delivery, and deployment, which promote faster, higher-quality product releases with less risk. Continuous integration ensures that developers have constant access to updated and validated code, facilitating concurrent coding efforts and avoiding delays. Continuous testing verifies code functionality across environments, surfacing bugs early. Automated monitoring and feedback throughout the pipeline enable real-time problem identification and responsive actions, improving system reliability and security.

DevOps processes Illustrated

As technology advances, developers are increasingly using artificial intelligence, particularly generative AI, to automate tasks, analyze data for insights, improve user experience, and streamline the creative aspects of software development. This trend improves efficiency and promotes innovation in software creation. The Stack Overflow 2023 survey shows that 70% of respondents are interested in using AI tools in their development process.

Source : StackOverflow Survey 2023

Along with growing developer interest, an increasing number of advanced models are being released, as evidenced by the code generation rankings below. Generative AI models are steadily approaching the abilities of experienced developers, continually improving at understanding and generating code.

Generative AI for code review in DevOps

Generative AI can go beyond simply creating code: it can help us understand it better.

Code review is an essential aspect of DevOps, crucial for maintaining software quality (look at the XZ backdoor). Traditionally, this process involves manual code inspection through pull requests. However, this method suffers from drawbacks including differences in reviewer skill levels, delays in feedback, inefficiencies in balancing review depth with feature delivery, format inconsistency, and the sheer tedium of reviewing long pieces of code.

Automated tools help, but current ones, based on static analysis, have limitations: they often provide disconnected feedback, are rigid and lack concrete solutions. Generative AI can improve this process by providing clear, natural reviews and helpful feedback, transforming the way we review code.

In this article, we will build an automated code review solution using generative AI. We will leverage Google Cloud Platform (GCP), which offers strong privacy guarantees around generative AI. GCP provides platforms such as Vertex AI, access to state-of-the-art generative AI models, and developer-focused services like Gemini Code Assist. First, we’ll explore Gemini Code Assist and how it may help, then show how to build a custom tool using a Large Language Model (LLM) to automate pull request actions.

Gemini Code Assist : Developer Assistance Tool

In April 2024, Google Cloud Platform (GCP) launched Gemini Code Assist, a coding tool available for a monthly subscription of $19 per user. This innovative solution integrates seamlessly with your IDE and Google Cloud Console.

Gemini Code Assist is an enterprise-level coding assistance solution that supports private codebases stored on-premises, on GitLab, GitHub, or Bitbucket, or across multiple repositories.

Illustration: Gemini Code Assist in action. Source: Google

The main features of Gemini Code Assist include real-time code completion, automatic generation of code blocks or functions, and code explanation, leveraging Google’s Gemini 1.5 Pro model. It also provides a chat UI to discuss your code and troubleshoot issues. The tool is designed to improve project accuracy and efficiency by providing a comprehensive overview of the project and enabling code transformation via natural language prompts.

Using its chat UI, users can request code reviews, but this requires intentional effort on the part of the user. To adhere to DevOps best practices, we aim to automate this process. Despite its comprehensive feature set, Gemini Code Assist currently lacks triggers for pull request actions, which is our primary goal.

For people with specific needs, developing a tailor-made solution can be valuable. This leads us to introduce our pull request agent bot driven by generative AI.

Our solution will benefit users who:

  • Want to establish triggers beyond their IDE within their DevOps workflow
  • Desire the flexibility to explore various model options
  • Seek adaptability across multiple or custom platforms they use
  • Require the implementation of more personalized automated tasks

Now let’s move on to the code.

Pull Requests Code Review : Implementation from scratch

We propose a simple solution that you can connect to your code repositories to leverage generative AI for automated reviews of your code.

The key components needed for our application are:

  • GCP Project: This will serve as the foundation for our solution.
  • GitHub (account) and Repository: Essential for testing our code.

Setup:

  • Create a New Folder and Environment:
mkdir review-bot
cd review-bot
python -m venv venv
curl https://raw.githubusercontent.com/github/gitignore/main/Python.gitignore > .gitignore
git init
git add .gitignore
gcloud init

Accessing Google Gemini API:

If you’re in a supported country, create an API key, then use the following code:

export GOOGLE_API_KEY=<YOUR API KEY>

import getpass
import os

from langchain_google_genai import ChatGoogleGenerativeAI

if "GOOGLE_API_KEY" not in os.environ:
    os.environ["GOOGLE_API_KEY"] = getpass.getpass("Provide your Google API Key")

if __name__ == "__main__":
    llm = ChatGoogleGenerativeAI(model="gemini-pro")
    print(llm.invoke("HI"))

For unsupported countries, we'll connect to GCP using our gcloud account and use the following workaround to call Gemini. Create an llm.py file with this content:

gcloud auth application-default login

from typing import Any, List, Optional

import vertexai
from langchain_core.callbacks.manager import CallbackManagerForLLMRun
from langchain_core.language_models.llms import LLM

# This may change as it is available in preview only
from vertexai.preview.generative_models import ChatSession, GenerativeModel

PROJECT_ID = "sandbox"
LOCATION = "us-central1"
vertexai.init(project=PROJECT_ID, location=LOCATION)


class ChatGoogleGenerativeAI(LLM):
    model: str
    chat: Optional[ChatSession]
    safety_settings: Any

    def __init__(self: Any, *args: Any, **kwargs: Any) -> None:
        super().__init__(*args, **kwargs)
        # Instantiate the underlying Gemini model
        self.model = GenerativeModel(self.model)

    @property
    def _llm_type(self) -> str:
        return "google_gemini"

    def _call(
        self,
        prompt: str,
        stop: Optional[List[str]] = None,
        run_manager: Optional[CallbackManagerForLLMRun] = None,
        **kwargs: Any,
    ) -> str:
        if stop is not None:
            raise ValueError("stop kwargs are not permitted.")
        response = self.model.generate_content(
            prompt, safety_settings=self.safety_settings
        )
        return response.text


if __name__ == "__main__":
    llm = ChatGoogleGenerativeAI(model="gemini-1.0-pro")
    print(llm.invoke("HI"))

We’re creating a custom LLM wrapper called ChatGoogleGenerativeAI to interact with Google Cloud Platform (GCP) using its native APIs, specifically the Vertex AI Generative Models (Gemini) service. This code defines a class that inherits from LangChain's LLM (Large Language Model) base class and includes settings to specify the GCP project ID (sandbox) and region (us-central1) for Gemini instantiation. Inside the class, we initialize a GenerativeModel object to handle chat interactions. The _call method processes user prompts, generates responses using the Gemini model, and returns the response text.

Wonderful! You now have access to the Gemini model through GCP!

Connect to GitHub

In order to comment on GitHub, your application needs a token; the GitHub API uses it to authenticate calls, enabling you to read source code (if the repository is not public) and post comments.

You can get an access token that is either classic or fine-grained (for more restrictions on what the token can do). If you choose a fine-grained token, you will need to grant read access to code and metadata, and read and write access to discussions and pull requests. Once you have your token, export it to your environment variables.

export GITHUB_TOKEN=<YOUR TOKEN>

Let’s see how to retrieve the code diff (the comparison between two branches) from a pull request. This code takes the access token, the repository, and the branches to compare, and extracts the code diff. Create a utils.py file with this content:

import tempfile
from typing import Optional

from git import Repo


def get_diff_from_repo_changes(
    access_token: str,
    repo_url: Optional[str] = None,
    base_branch_name: Optional[str] = None,
    topic_branch_name: str = "main",
) -> str:
    diff_content = ""
    with tempfile.TemporaryDirectory() as tmpdirname:
        # Embed the token in the clone URL for private repositories
        if access_token:
            repo_url = repo_url.replace("https://", f"https://oauth2:{access_token}@")
        repo = Repo.clone_from(repo_url, to_path=tmpdirname)
        # If no base branch is given, fall back to the commit HEAD points to
        if not base_branch_name:
            base_branch_ref = repo.head.commit.hexsha
        else:
            base_branch_ref = f"origin/{base_branch_name}"
        topic_branch_ref = f"origin/{topic_branch_name}"
        # Fetch the topic branch from the remote
        repo.git.fetch("origin", topic_branch_name)
        # TODO: Ignore large generated files in the diff, like lock files
        diff_content = repo.git.diff(
            "{}...{}".format(base_branch_ref, topic_branch_ref),
            ignore_blank_lines=True,
            ignore_space_at_eol=True,
        )
    return diff_content


if __name__ == "__main__":
    diff = get_diff_from_repo_changes(
        access_token="",
        repo_url="https://github.com/githubtraining/hellogitworld",
        base_branch_name="master",
        topic_branch_name="feature_division",
    )
    print(diff)

The get_diff_from_repo_changes function retrieves and returns the difference (or "diff") between two branches in a Git repository that are going to be merged. It uses a provided access token for restricted repositories. The function then fetches the specified branches (base_branch_name and topic_branch_name) from the remote repository and computes the diff between them. The result, representing the changes between these branches, is returned as a string. The example in the __main__ block shows how to use this function by specifying the repository URL, base branch (master), and topic branch (feature_division) to print the resulting diff.
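The TODO in utils.py mentions ignoring large generated files such as lock files, whose diffs add noise without helping the review. As a minimal sketch (the function name and pattern list below are assumptions, not part of the article's code), the diff can be post-filtered before it reaches the model:

```python
import re

# Hypothetical list of files whose diffs we never want to review
LOCK_FILE_PATTERNS = ("package-lock.json", "poetry.lock", "yarn.lock", "Pipfile.lock")


def strip_lock_files(diff_content: str) -> str:
    """Remove per-file sections of a unified diff whose path matches a lock file."""
    kept_sections = []
    # A unified diff is a series of sections, each starting with "diff --git a/... b/..."
    for section in re.split(r"(?m)^(?=diff --git )", diff_content):
        if not section:
            continue
        header = section.splitlines()[0]
        if not any(pattern in header for pattern in LOCK_FILE_PATTERNS):
            kept_sections.append(section)
    return "".join(kept_sections)
```

You could call this on the result of get_diff_from_repo_changes before building the prompt.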

In our example, since the code is public, no access token is required. You can test the code using:

python3 utils.py

Create a "server" to respond to web requests

A webhook is an HTTP-based callback that enables lightweight, event-driven communication between two APIs. Webhooks move data from one application to another, triggering automation workflows in GitOps environments. When a pull request is initiated on GitHub, a webhook will trigger a call to the function we will define, and GitHub sends a request containing all relevant information about the pull request.
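To make the handler concrete, here is a trimmed sketch of the pull_request event payload, keeping only the fields our function will read. The values (owner, repository name, branch names) are illustrative, not real data:

```python
# Trimmed, illustrative GitHub "pull_request" event payload
sample_payload = {
    "number": 7,
    "pull_request": {
        "head": {"ref": "feature_division",
                 "repo": {"html_url": "https://github.com/acme/hello"}},
        "base": {"ref": "main"},
    },
    "repository": {"name": "hello", "owner": {"login": "acme"}},
}


def extract_pr_info(payload: dict) -> dict:
    """Pull out the branch names, repo URL and PR number the review bot needs."""
    return {
        "topic_branch": payload["pull_request"]["head"]["ref"],
        "base_branch": payload["pull_request"]["base"]["ref"],
        "repo_url": payload["pull_request"]["head"]["repo"]["html_url"],
        "owner": payload["repository"]["owner"]["login"],
        "repo": payload["repository"]["name"],
        "pull_request_id": payload["number"],
    }
```

These are exactly the lookups performed at the top of the Cloud Function below.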

To begin, we need to create a function that handles incoming requests.

We will deploy this function on Cloud Functions, the serverless service of GCP, then configure GitHub with the URL (webhook) it should trigger when a pull request is created.

Let's create a main.py file to handle web requests:

import os

import functions_framework
import requests
from flask import Request, Response, abort, jsonify
from langchain.prompts import PromptTemplate

from llm import ChatGoogleGenerativeAI
from utils import get_diff_from_repo_changes


@functions_framework.http
def review_github(request: Request) -> Response:
    data = request.get_json(silent=True)
    if data is None:
        abort(400, "Invalid JSON data")
    topic_branch = data["pull_request"]["head"]["ref"]
    base_branch = data["pull_request"]["base"]["ref"]
    repo_url = data["pull_request"]["head"]["repo"]["html_url"]
    owner = data["repository"]["owner"]["login"]
    repo = data["repository"]["name"]
    pull_request_id = data["number"]
    access_token = os.environ["ACCESS_TOKEN"]
    diff = get_diff_from_repo_changes(
        access_token=access_token,
        repo_url=repo_url,
        base_branch_name=base_branch,
        topic_branch_name=topic_branch,
    )
    # text_splitter = TokenTextSplitter(chunk_size=2000, chunk_overlap=0)
    # chunked_documents = text_splitter.split_documents(
    #     documents=[Document(page_content=diff, metadata={"source": "local"})]
    # )
    template = (
        "You are an expert dev at Google known for expertise in best practices."
        " You have to make a code review on a diff file (where issues are detected)"
        " covering the changes done in the code."
        " The diff of changes to review is:"
        "\n------------\n"
        "{content}\n"
        "------------\n"
    )
    prompt_template = PromptTemplate.from_template(template)
    llm = ChatGoogleGenerativeAI(model="gemini-1.0-pro")
    chain = prompt_template | llm
    review = chain.invoke({"content": diff})
    API_URL = "https://api.github.com/repos"
    url = f"{API_URL}/{owner}/{repo}/issues/{pull_request_id}/comments"
    headers = {
        "Authorization": f"Bearer {access_token}",
        "Accept": "application/vnd.github+json",
        "X-GitHub-Api-Version": "2022-11-28",
    }
    payload = {"body": review}
    response = requests.post(url, headers=headers, json=payload)
    # Check for a successful request (201 Created)
    if response.status_code != 201:
        abort(response.status_code, "Something went wrong")
    return jsonify(status=response.status_code)

This function processes incoming requests to review code changes on GitHub pull requests. It extracts necessary details such as branch names and repository URLs from the request data. Using this information, it retrieves the code diff from the repository. The function then generates a review using an AI model based on the extracted diff and posts this review as a comment on the pull request via the GitHub API.
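The commented-out TokenTextSplitter lines hint at a real limitation: a very large diff may exceed the model's context window. As a rough, character-based sketch (a token-aware splitter would be more faithful; chunk_diff and max_chars are illustrative names, not part of the article's code), the diff could be split on file boundaries and each chunk reviewed separately:

```python
def chunk_diff(diff_content: str, max_chars: int = 8000) -> list[str]:
    """Naive character-based chunking of a unified diff, splitting on file
    boundaries ("diff --git") so each chunk stays reviewable on its own."""
    chunks = []
    current = ""
    for line in diff_content.splitlines(keepends=True):
        # Start a new chunk when adding the next file would exceed the budget
        if line.startswith("diff --git ") and current and len(current) + len(line) > max_chars:
            chunks.append(current)
            current = ""
        current += line
    if current:
        chunks.append(current)
    return chunks
```

Each chunk could then be sent through the same prompt template, with the per-chunk reviews concatenated into a single comment.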

Let’s deploy our code on Google Cloud Functions, a Serverless service within Google Cloud Platform designed for deploying functions.

(IMPORTANT) To store the required libraries for our code, we use the following command:

pip freeze > requirements.txt

The Cloud Function will use this file to install required dependencies.
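The exact dependency set depends on your environment, but for reference, a minimal requirements.txt for this project would plausibly list packages along these lines (versions omitted here; pip freeze will pin the ones you actually installed):

```
functions-framework
flask
requests
GitPython
langchain
langchain-core
google-cloud-aiplatform
```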

In GCP, give your default service account the Vertex AI User role; it will be the service account used by your deployed function.

Next, deploy the function using the Cloud Functions CLI:

gcloud functions deploy review_github_func \
    --region=us-central1 \
    --runtime=python311 \
    --source=. \
    --entry-point=review_github \
    --trigger-http

Upon deployment, the URL for the function will be generated and accessible.

Configure Github and test it !

Let’s configure our webhook to call our server on pull requests events.

Navigate to your GitHub repository and configure a webhook for your project:

  • Go to Settings > Webhooks
  • Select the pull request event and add the URL of your Cloud Function
  • Click ‘Test’ to verify connectivity

Start by creating a new branch for your changes and initiating a pull request:

git add -A
git commit -m "time to test"
git checkout -b <branch_name>
# make some changes
git add -A
git commit -m "Added changes"
git push origin <branch_name>

Now, create a pull request. Your configured webhook should be triggered and communicate back to GitHub.

Congratulations! You’ve successfully developed an application capable of responding to pull requests.

Considerations

We have reached the end of this article, but here are some suggestions and thoughts on what has been done:

  • Responsibility : It is important to remember that although genAI can provide a lot of improvements, a human should always review alongside the machine. A tool always has limitations but it can lighten the burden.
  • Scalability: This code was written as a Proof of Concept, so don’t hesitate to challenge it and provide feedback in the comments. It may also fail on some edge cases, such as very large diffs; consider trying models like Gemini 1.5 Pro, which handle larger inputs.
  • Security: Since the Cloud Function URL is publicly accessible, consider implementing request signing or other security measures to restrict access. Also, to enhance security, consider creating a Virtual Private Cloud (VPC Service Controls) to isolate your Cloud Function from other resources in your project.
  • Extensibility : While we’re focusing on code review here, imagine deploying this AI bot in your project to help you with different code tasks, like explaining or summarizing code changes. The possibilities are only limited by what you can imagine (and what the model can do ;)).
  • Minor Improvements : For clarity and organization, consider using a dedicated GitHub account specifically for bot activities to differentiate automated actions from personal contributions.
  • Other ideas? The comment section is open, and don't forget to leave a clap if this article was interesting 👏🏾.
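The request-signing measure mentioned under Security can be sketched using GitHub's documented X-Hub-Signature-256 scheme: GitHub sends an HMAC-SHA256 of the raw request body, keyed with the webhook secret you configure in the repository settings. A minimal check (the function name below is ours) looks like this:

```python
import hashlib
import hmac


def verify_github_signature(secret: str, body: bytes, signature_header: str) -> bool:
    """Check GitHub's X-Hub-Signature-256 header:
    'sha256=' + hex(HMAC-SHA256(secret, raw_body))."""
    expected = "sha256=" + hmac.new(secret.encode(), body, hashlib.sha256).hexdigest()
    # Constant-time comparison to avoid timing attacks
    return hmac.compare_digest(expected, signature_header)
```

In the Cloud Function, you would call this with request.get_data() and the header value before processing the payload, rejecting the request when it returns False.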

Conclusion

In conclusion, the fusion of generative AI with DevOps represents a paradigm shift in software development methodologies. By leveraging AI-powered tools for code review and automation, developers can accelerate development cycles, improve code quality, and ultimately deliver more robust and innovative software solutions. As AI technologies continue to advance, their integration into DevOps practices promises to redefine the future of software development, enabling teams to achieve greater efficiency, agility and competitiveness in the digital age.
