Better Pull Requests using Ollama and Llama2
We at Moss are very excited about recent advances in the AI world, especially those that can increase developer productivity and improve the engineering experience. The new developer toolbelt should now include one or two AI-backed tools to increase productivity, speed up investigation, and explore new ideas.
While engineers might be eager to jump into the next cool tool, they must be aware of an aspect that can put their own company at risk: privacy. Terms of use and privacy policies are very often obscure, and with new tools appearing every day, we should stay in control of the data we feed into these AI capabilities.
This post will explore Ollama as one option to deploy LLMs inside your network (or even on an engineer’s laptop) so that your data never leaves your network perimeter.
ChatGPT is one of the most discussed topics in 2023, along with its bandwagon of replicas you find at Hugging Face. Companies like Meta (LLaMA), Google (PaLM 2), and Microsoft (Orca), to name a few, are also pushing Large Language Models everywhere, giving a further dimension to the disruption the LLMs are creating.
Using a chat tool is pretty straightforward: you type your questions, and the model will reply with good enough accuracy and detail for most of what you need. It also works if you copy and paste a piece of your code and ask for an explanation. Here starts the tricky part: copying and pasting your company's code might not be the best idea when you are unsure what an AI provider will do with it.
Ollama is a great option when it comes to running local models. Their library offers a dozen different models, and Ollama is very easy to install. Go to https://ollama.ai/ and download the installer.
After the installation, you should be able to run `ollama run llama2:7b`. The prompt opens in your terminal, and you can type `/help` to see some info:
```bash
>>> /help
Available Commands:
  /set         Set session variables
  /show        Show model information
  /bye         Exit
  /?, /help    Help for a command

Use """ to begin a multi-line message.
```
You type just like in any chat tool, and the model will answer. You can paste your code as well. Here's an example:
```bash
>>> """
... Please explain the python code below:
...
... def fibonacci(n):
...     a = 0
...     b = 1
...     if n < 0:
...         print("Incorrect input")
...     elif n == 0:
...         return a
...     elif n == 1:
...         return b
...     else:
...         for i in range(2, n+1):
...             c = a + b
...             a = b
...             b = c
...         return b
...
... print(fibonacci(9))
...
... """
```
> Certainly! Here's an explanation of the Python code you provided:
The `fibonacci()` function takes a single argument `n`, which is the number for which you want to compute the Fibonacci sequence. Here's what the function does:
1. It initializes two variables `a` and `b` to 0 and 1, respectively. These will be used to store the first two terms of the Fibonacci sequence.
2. It checks if `n` is negative or zero. If it is, the function prints an error message and returns.
3. If `n` is zero, the function simply returns `a`, which is the first term of the Fibonacci sequence.
4. If `n` is one, the function returns `b`, which is the second term of the Fibonacci sequence.
5. Otherwise, the function enters a loop that iterates from 2 to `n+1`. Inside the loop, it computes `c = a + b` and stores the result in `a`. Then it computes `b = c`, effectively replacing the previous value of `b` with the sum of `a` and `b`.
6. The loop continues until `n` is reached. When `n` is reached, the function returns the final value of `b`.
The `Driver Program` at the end of the code calls the `fibonacci()` function with the argument `9`, which means it should compute and print the ninth term of the Fibonacci sequence (i.e., 34). When you run this code, you should see the
output "34" in the console.
One can also access Ollama in a GPT-like interface straight from the browser using Ollama Web-ui.
But developers don’t chat in a textbox; they use IDEs, creating a new niche of tools tailored to boost developer brains and arms. You can leverage Ollama directly from IntelliJ using Continue, allowing a dev experience of manually selecting code and asking questions/edits. And if you are in the auto-completion mood, Llama Coder on VSCode is another excellent option.
Pullama
Now that we can run a local model and guarantee our privacy, let’s put Ollama and llama2 (by Meta) to the test by creating a git diff summarizer to help you write better Pull Request descriptions: pullama.
More often than not, we overlook PR descriptions. They can be anaemic and shallow, often amounting to little more than a link to a Jira ticket. But the ticket talks about requirements, not the solution. A minimal summary of the changes and a suggested review path can improve the reviewer's experience.
Pullama uses the Qdrant vector database to store the diff content and then provides this database as a retrieval source for RAG.
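A minimal sketch of that kind of pipeline, assuming the langchain, qdrant-client, and fastembed packages (class and variable names are illustrative, not pullama's actual code):

```python
from langchain_community.embeddings import FastEmbedEmbeddings
from langchain_community.llms import Ollama
from langchain_community.vectorstores import Qdrant
from langchain.chains import RetrievalQA

# Chunks of the git diff to index (chunking is discussed further below).
diff_chunks = ["diff --git a/client/client.go b/client/client.go ..."]

# Embed the chunks locally with FastEmbed and store them in an in-memory Qdrant.
vectorstore = Qdrant.from_texts(
    diff_chunks,
    embedding=FastEmbedEmbeddings(),
    location=":memory:",
    collection_name="pr_diff",
)

# Llama2, served by the local Ollama instance, answers over the retrieved chunks.
chain = RetrievalQA.from_chain_type(
    llm=Ollama(model="llama2"),
    retriever=vectorstore.as_retriever(),
)

print(chain.invoke("Summarize these changes and suggest a review order.")["result"])
```

In pullama itself, the prompt is also fed with the file names and commit messages of the branch, as described further down.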
The implementation is straightforward without sophisticated reranking or deep memory techniques, yet the results are very useful. Take for example this Pull Request (#59) in one of our public repositories and check out the output of pullama using llama2:
Summary: This pull request adds new features to the Metabase collections support, including the ability to update collection permissions and handle collection graph permissions. It also fixes some bugs and improves the overall performance of the client.
Additions:
* Implemented a new feature to extend Metabase collections support (#59)
* Added build instructions for darwin_arm64 arch
* Added support for collections
* Implemented retry mechanism for updating collection permissions
* Refactored code to use maps instead of lists
* Improved performance by removing unnecessary data from the client
Updates:
* Fixed reading collection permissions by handling cases where the graph is None
* Removed test file for collectionGraph
* Moved mutex out of Client object
* Made backoff exponential
* Added mock server for graphUpdate
* Updated metabase/resource_collection.go to include new feature
Deletes: None
Review order: Start by reviewing the changes in client/client.go, as they are the most significant. Then move on to client/collection.go, client/collection_graph.go, and finally metabase/resource_collection.go. It is important to carefully review each file to ensure that the changes are accurate and do not introduce any potential business impacts.
The summary is consistent with the code changes and gives much better insight into what's happening in the PR than the original description did. Investing in proper prompt engineering could help squeeze more precise information out of the model.
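As one illustration of that prompt-engineering direction, a more structured template could pin down the sections we expect (a hypothetical template, not the prompt pullama ships with):

```python
from langchain.prompts import PromptTemplate

# Hypothetical template -- not pullama's actual prompt.
pr_prompt = PromptTemplate.from_template(
    "You are reviewing a pull request.\n"
    "Files changed: {file_names}\n"
    "Commit messages: {commit_messages}\n"
    "Relevant diff excerpts:\n{context}\n\n"
    "Write a concise summary, then list additions, updates and deletions, "
    "and finally suggest the order in which to review the files."
)

print(pr_prompt.format(
    file_names="client/client.go, client/collection.go",
    commit_messages="Extend Metabase collections support",
    context="diff --git a/client/client.go ...",
))
```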
Diffs can have a considerably large changeset, so the diff itself was added to the vector database. The prompt then takes the file names and commit messages so the retrieval query can be as targeted as possible. The embeddings are done with FastEmbed, producing the vectors locally and saving several extra calls to the model. It is worth noting that we haven't tried several different embedding algorithms or fine-tuned the parameters; we just went with the simplest and fastest solution for now.
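The local embedding step could look roughly like this (a sketch assuming a recent fastembed release; the chunking shown is deliberately simplistic):

```python
from fastembed import TextEmbedding

# FastEmbed downloads a small ONNX embedding model once and then runs it
# locally, so producing vectors needs no calls to the LLM or any remote service.
embedder = TextEmbedding()  # default small BGE-style model

# Naive chunking of a potentially large diff into fixed-size pieces.
with open("changes.diff") as f:
    diff_text = f.read()
chunks = [diff_text[i:i + 2000] for i in range(0, len(diff_text), 2000)]

vectors = list(embedder.embed(chunks))  # one embedding vector per chunk
```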
Should it analyze the whole codebase?
A more audacious approach is to make the model analyze the impact against the whole codebase. As you can see in `DiffChain#check_impact`, a portion of the code already does this in a rudimentary way.
The challenge is that the document loader is a `DirectoryLoader` rather than a generic loader using parsers for the specific languages in the pull request. Because the files are loaded without language knowledge, there's no special treatment of the code base from a language-semantics perspective. That lack of language understanding hurts how the retrieval fetches documents locally, and with it the model output: a limited search on the similarity database yields an "I don't have enough information to do the analysis" result.
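For reference, the naive loading looks roughly like this (a sketch with illustrative paths and parameters, not the exact pullama code):

```python
from langchain_community.document_loaders import DirectoryLoader, TextLoader
from langchain.text_splitter import RecursiveCharacterTextSplitter

# Every Java file is loaded as plain text -- no notion of classes or methods.
loader = DirectoryLoader("path/to/repo", glob="**/*.java", loader_cls=TextLoader)
documents = loader.load()

# A generic character splitter cuts chunks at arbitrary points in the code,
# which is exactly why retrieval struggles to surface semantically relevant pieces.
splitter = RecursiveCharacterTextSplitter(chunk_size=1000, chunk_overlap=100)
chunks = splitter.split_documents(documents)
```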
Neither Ollama nor llama2 is to blame for the limited whole-codebase analysis: Langchain does not support document loading with a Java parser yet. When it is available, we can give it a second try. Or, better yet, we can refine the impact analysis with agents.
For the future
Currently, pullama can be used locally by cloning the repo or using the PyPI package:

```bash
pipx install pullama
```

For convenience, create an alias like this:
```bash
alias gpullama='TOKENIZERS_PARALLELISM=true pullama -r . -s $(git rev-parse --abbrev-ref HEAD) -t master'
# and use in your branch
gpullama
```
Developers can generate their pull request descriptions before opening one. However, after refining the tool and doing proper AI engineering, we are considering deploying Ollama to Kubernetes and integrating pullama into our pull requests via GitHub Actions. For now, pullama is a great tool to help you write more precise and insightful descriptions for your pull requests.