ChatGPT — Commenting Your Code With ChatGPT? (4/4)

Markus Lindner
Published in arconsis
Jun 2, 2023 · 5 min read

As the year 2022 drew to a close, a new era of excitement and innovation began with the release of ChatGPT by OpenAI. From tech enthusiasts to industry experts, people around the globe are exploring the full range of capabilities offered by this cutting-edge technology. In this series, we're taking a look at different aspects of the new AI tool to evaluate how it could help developers in their daily work.

In our previous articles, we delved into the world of ChatGPT and gained a comprehensive understanding of its capabilities. If you haven’t checked them out yet, you can start reading them here:

Now, it's time to put those capabilities to use and see how ChatGPT can assist us in our software development endeavors. We had the opportunity to apply ChatGPT in a real-world scenario by using it on an existing web/TypeScript project (https://gitlab.com/games.bluber/aecs — an open-source entity component system library) with the goal of adding comments to the code.

The Setup

In order to use ChatGPT effectively for generating code comments, we found it necessary to develop a simple Node.js script that iterates through the project files and sends the contents of each file to ChatGPT in a separate request. To ensure that the correct information was provided, we crafted a specific prompt which instructed ChatGPT not only to add a comment for each function header, but also to include the original source code and import statements in its response.

“Can you write a comment about each function header of this class and provide the whole class again with comments? Please also put the import statements on top of the class back into the response. <code>”

Each received response is then integrated into the original file. In theory, this should give us a final product that is identical to the original but with added comments.
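For illustration, here is a minimal sketch of what such a helper script could look like. The directory layout (./src), the .spec.ts suffix used to skip tests, and the exact model identifier are assumptions made for this example rather than details of the actual script; it also relies on the global fetch available in Node 18+.

import { readFileSync, writeFileSync, readdirSync } from "fs";
import { join } from "path";

const PROMPT =
  "Can you write a comment about each function header of this class and " +
  "provide the whole class again with comments? Please also put the import " +
  "statements on top of the class back into the response.\n\n";

// Recursively collect the project's TypeScript files, skipping tests.
function collectFiles(dir: string): string[] {
  return readdirSync(dir, { withFileTypes: true }).flatMap((entry) => {
    const path = join(dir, entry.name);
    if (entry.isDirectory()) return collectFiles(path);
    return path.endsWith(".ts") && !path.endsWith(".spec.ts") ? [path] : [];
  });
}

// Send a single file's contents to the OpenAI completions endpoint
// and return the generated, commented version of the code.
async function commentFile(code: string): Promise<string> {
  const response = await fetch("https://api.openai.com/v1/completions", {
    method: "POST",
    headers: {
      "Content-Type": "application/json",
      Authorization: `Bearer ${process.env.OPENAI_API_KEY}`,
    },
    body: JSON.stringify({
      model: "text-davinci-003", // assumption: exact version of "Text-Davinci" used
      prompt: PROMPT + code,
      max_tokens: 2500,
      temperature: 0.75,
    }),
  });
  const data = await response.json();
  return data.choices[0].text.trim();
}

async function main() {
  for (const file of collectFiles("./src")) {
    const original = readFileSync(file, "utf8");
    const commented = await commentFile(original);
    writeFileSync(file, commented); // integrate the response back into the file
  }
}

main();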

The Execution of the Experiment

For the first execution, the model "Text-Davinci" was employed to generate the comments for the code, and a token limit of 2,500 as well as a temperature of 0.75 were chosen. The token limit caps how much text the model may return for a request. The temperature basically works as a creativity value, with 0 meaning not creative and 1 very creative. The project used for testing comprises 26 TypeScript files (not including the tests), for a total of 1,074 lines of code. To conduct the benchmark, the contents of each file were uploaded to ChatGPT. The duration of a single request ranged from 2 to 86 seconds, depending on the file size and possibly also on the complexity of the code it contained. The entire benchmark was repeated a few times, with the total time ranging from 630 to 835 seconds. The timing may fluctuate depending on the load ChatGPT is handling at the moment.

For the second iteration of the experiment, the specialized code model "code-davinci-002" was employed, with a token limit of 2,500 and a temperature of 0. (Using temperatures higher than 0 resulted in bad requests with Code-Davinci.) This time, the benchmark took significantly longer than before, ranging between 1,356 and 1,678 seconds in total, with each request taking between 10 and 91 seconds.
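In terms of request parameters, the two runs differed only in the model and the temperature; the exact identifier of the text model is again an assumption:

// Run 1: text-focused model
const textRun = { model: "text-davinci-003", max_tokens: 2500, temperature: 0.75 };

// Run 2: code-focused model; temperature had to stay at 0, since higher values caused bad requests
const codeRun = { model: "code-davinci-002", max_tokens: 2500, temperature: 0 };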

The Results

When it came to commenting, the text-focused model proved to be superior to the code-focused model. However, the results for both approaches fell short of expectations.

Using "Text-Davinci," ChatGPT was able to complete the task correctly for approximately 80% of the files. Nevertheless, there were instances where the entire class was commented out with "//", or the syntax was broken by missing brackets and other small errors. Furthermore, the comments were placed in different locations, sometimes within the code and other times at the top of the file. In contrast, "Code-Davinci" performed significantly worse, with most responses being empty or severely damaging the class's syntax.

The quality of the comments also varied widely, ranging from simple slash-prefixed comments explaining obvious things to TypeDoc-style and multiline comments. All in all, the documentation reads as if several different developers had written it. Below are some examples illustrating the different styles ChatGPT produced:

//private field "someVariable" of type "SomeType"
private someVariable: SomeType

//@Getter for SomeType

/*
Function to be called when the component is removed
// NOTE: here ChatGPT is understanding the context of the class
*/
public onRemove(): void

/**
 * Removes entity from the EntityManager
 * @param id ID of the entity to remove
 */
public removeEntity(id: number | string): void

// ComponentItem interface
/**
 * Interface for Component items
 *
 * @export
 * @interface ComponentItem
 * @template T
 */
export interface ComponentItem<T> {
  value: T;
  id: number | string;
}

In one instance, ChatGPT took an empty class called Store, added functions that were not there in the first place, and then commented on top of the code it had invented itself.

//Original class
export abstract class Store {}

//Changes made by ChatGPT (Note that the class Product never existed in the first place)

//This is an abstract class called Store
import { Product } from './product';

export abstract class Store {
  //This function adds a product to the store
  addProduct(product: Product): void {

  }

  //This function returns an array of products in the store
  getProducts(): Product[] {
    return [];
  }

  //This function checks out the products in the store and returns the total cost
  checkout(): number {
    return 0;
  }
}

Conclusion

After extensive testing and a lot of time spent with our buddy ChatGPT, we have come to the conclusion that it has not yet reached its full potential in this area, at least for comprehensive tasks like the one illustrated in this article. However, there are plenty of smaller tasks where GPT can assist developers in their day-to-day work. It is also important to keep in mind that it will only get better as time moves on. But let's see what ChatGPT itself has to say as a conclusion:

In conclusion, the experiment of using ChatGPT to assist in adding comments to an existing web/typescript project showed mixed results. The text-focused model, “Text-Davinci,” performed better than the code-focused model, “Code-Davinci,” with an accuracy rate of about 80% for commenting the code correctly. However, there were instances where the comments were not placed correctly or disrupted the code’s syntax. Additionally, the quality of the comments varied greatly, with some being simple and others more detailed. Overall, while the use of ChatGPT in this scenario showed promise, more refinement and fine-tuning of the models and prompts are needed to achieve more consistent and accurate results.

Thanks for reading and see you in the next article!

This article was written by Markus Lindner and Patrick Jung
