Learning How Code Works using AI
You have probably heard that generative AI tools like OpenAI’s ChatGPT, Google’s Bard, or GitHub’s Copilot (which also uses OpenAI technology) can help you write software. They can, but from my experience using ChatGPT I will add that it does make things up (like really nice APIs that don’t exist!) and the code is not likely to compile. It still helps. Think of it as a junior developer, at best.
What I find more interesting and useful is the ability of the tools to explain how code works. Imagine you are a school teacher doing an introduction to programming for younger kids. You can just turn them loose on a Chromebook or even a phone with some sample code in any language and have the AI tool tell them how it works. Or maybe there is some really dense code you want to understand better: just let an AI explain it to you!
I tried a smattering of languages spanning over 50 years of software history: COBOL, ALGOL, Ada, Java, and Kotlin. I found you don’t even need to tell the AI what language it is; it will figure it out. I have done these experiments with both ChatGPT and Bard and they work great.
First, get yourself a ChatGPT account at https://chat.openai.com/. You can also sign up for Bard at https://bard.google.com/. The accounts are free.
Second, find some source code. There are examples all over the Internet from various programming tutorials. There are a bunch of basic Java ones here: https://www.geeksforgeeks.org/java-programming-examples/ Just copy the code and provide the AI a simple prompt:
What does this program do?
Then paste in the code and submit it. Here is what the process looks like in Bard.
What does this program do?
// Main class
public class GFG {
    // Main driver method
    public static void main(String[] args)
    {
        int num = 5;
        int x = 0;
        // Outer loop for rows
        for (int i = 1; i <= num; i++) {
            x = i - 1;
            // inner loop for "i"th row printing
            for (int j = i; j <= num - 1; j++) {
                // First Number Space
                System.out.print(" ");
                // Space between Numbers
                System.out.print(" ");
            }
            // Pyramid printing
            for (int j = 0; j <= x; j++)
                System.out.print((i + j) < 10
                    ? (i + j) + " "
                    : (i + j) + " ");
            for (int j = 1; j <= x; j++)
                System.out.print((i + x - j) < 10
                    ? (i + x - j) + " "
                    : (i + x - j) + " ");
            // By now we reach end for one row, so
            // new line to switch to next
            System.out.println();
        }
    }
}
The analysis is quick and detailed, providing both a summary and a line-by-line explanation. Bard also offers multiple drafts of its analysis, letting you pick the best one and tell the AI which it is. That feedback improves the functioning of the AI.
Code Summary
The program you provided is a Java program that prints a pyramid pattern. The program first defines a constant named num, which is the number of rows in the pyramid. The program then declares a variable named x, which is used to track the current row.
The program starts by printing a blank line. Then, the program enters a loop that iterates through the rows of the pyramid. For each row, the program prints a number of spaces equal to the number of rows minus the current row. Then, the program prints the numbers from 1 to the current row, in reverse order. Finally, the program prints the numbers from the current row to 1, in forward order.
The program then prints a blank line and repeats the loop until it has printed all of the rows of the pyramid.
The following is an example of the output of the program when num is 5:
1
2 3
4 5 6
7 8 9 10
11 12 13 14 15
Line by Line Analysis
Line 1: This line declares the class GFG.
Line 2: This line declares the main() method.
Line 3: This line declares the constant num and initializes it to 5.
Line 4: This line declares the variable x and initializes it to 0.
Line 5: This line enters a loop that iterates from 1 to num.
Line 6: This line sets x to i - 1.
Line 7: This line prints a space.
Line 8: This line prints a space.
Line 9: This line prints the numbers from 1 to i in increasing order.
Line 10: This line prints the numbers from i - x to 1 in decreasing order.
Line 11: This line prints a newline.
Line 12: This line terminates the program.
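Because these tools do make things up, it is worth running the code to check an explanation. Below is a minimal sketch, not part of the original GFG sample: a hypothetical helper named row collects the numbers that the two printing loops emit for a given row, leaving out the leading padding spaces. Tracing the loops by hand, each row counts up from i and then back down, which is not the same as the sample output in Bard's summary above.

```java
import java.util.ArrayList;
import java.util.List;

public class PyramidCheck {
    // Hypothetical helper: the numbers printed on row i, using the same two
    // loops as the sample program (padding spaces are deliberately omitted).
    static List<Integer> row(int i) {
        int x = i - 1;
        List<Integer> values = new ArrayList<>();
        for (int j = 0; j <= x; j++) {
            values.add(i + j);      // rising half: i, i+1, ..., 2i-1
        }
        for (int j = 1; j <= x; j++) {
            values.add(i + x - j);  // falling half: 2i-2, ..., i
        }
        return values;
    }

    public static void main(String[] args) {
        int num = 5;
        for (int i = 1; i <= num; i++) {
            System.out.println(row(i));
        }
        // Third line printed: [3, 4, 5, 4, 3]
    }
}
```

Comparing a run like this against the AI's claimed output takes a minute and tells you how far to trust the rest of the explanation.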
Explaining Java Byte Code
Here is a more complex example, this time using Java byte code produced from a class file by the command-line tool javap. If you don’t have the Java source code, you can use javap to disassemble (reverse engineer) a class file into a readable byte code listing, which is much more understandable (at least by an AI).
To disassemble a class file named Main.class, type at a command prompt:
javap -c Main.class
eizdepski@eclector Downloads % javap -c Main.class
Compiled from "Main.java"
public class com.erichizdepski.Main {
public com.erichizdepski.Main();
Code:
0: aload_0
1: invokespecial #1 // Method java/lang/Object."<init>":()V
4: return
public static void main(java.lang.String[]);
Code:
0: getstatic #7 // Field java/lang/System.out:Ljava/io/PrintStream;
3: aload_0
4: arraylength
5: invokedynamic #13, 0 // InvokeDynamic #0:makeConcatWithConstants:(I)Ljava/lang/String;
10: invokevirtual #17 // Method java/io/PrintStream.println:(Ljava/lang/String;)V
13: aload_0
14: arraylength
15: ifle 35
18: getstatic #7 // Field java/lang/System.out:Ljava/io/PrintStream;
21: aload_0
22: iconst_0
23: aaload
24: invokedynamic #23, 0 // InvokeDynamic #1:makeConcatWithConstants:(Ljava/lang/String;)Ljava/lang/String;
29: invokevirtual #17 // Method java/io/PrintStream.println:(Ljava/lang/String;)V
32: goto 56
35: getstatic #7 // Field java/lang/System.out:Ljava/io/PrintStream;
38: ldc #26 // String Hello! No argument provided.
40: invokevirtual #17 // Method java/io/PrintStream.println:(Ljava/lang/String;)V
43: getstatic #7 // Field java/lang/System.out:Ljava/io/PrintStream;
46: new #28 // class java/util/Date
49: dup
50: invokespecial #30 // Method java/util/Date."<init>":()V
53: invokevirtual #31 // Method java/io/PrintStream.println:(Ljava/lang/Object;)V
56: return
}
This is Java byte code. By the way, here is an interesting tool for manipulating it: https://asm.ow2.io/
I copied and pasted the byte code into ChatGPT with the prompt “What does this java byte code do?” It provided explanations for the constructor, the main method, and a summary. I was not expecting it to be able to do this!
This is the constructor of the com.erichizdepski.Main class. It calls the
constructor of the superclass java.lang.Object using invokespecial
instruction. The aload_0 instruction loads the reference to this onto the
stack, and then the invokespecial instruction calls the constructor of the
superclass. Finally, it returns from the constructor.
This is the main method, the entry point of the Java program. It takes an array of java.lang.String as input.
Here's a step-by-step explanation of what the main method does:
Line 0: getstatic instruction fetches the value of the static field java/lang/System.out, which is the standard output (System.out).
Line 3: aload_0 instruction loads the reference to the first argument (String[] args) onto the stack.
Line 4: arraylength instruction gets the length of the array (number of command-line arguments) and places it on the stack.
Line 5: invokedynamic instruction is used to dynamically link the method invocation. It is used here to concatenate a string with the number of command-line arguments. The exact details of the dynamic linking are not provided in the bytecode snippet.
Line 10: invokevirtual instruction calls the println method on the java.io.PrintStream object, which prints the concatenated string to the standard output.
Line 13: aload_0 loads the reference to the command-line arguments array again.
Line 14: arraylength instruction gets the length of the array (number of command-line arguments) again and places it on the stack.
Line 15: ifle instruction performs a branch if the value on the stack is less than or equal to zero. In this case, it jumps to line 35.
If the length of the command-line arguments is greater than zero (lines 18-32):
Line 18: getstatic instruction fetches the value of the static field java/lang/System.out (standard output).
Line 21: aload_0 loads the reference to the command-line arguments array again.
Line 22: iconst_0 loads the constant integer value 0 onto the stack.
Line 23: aaload instruction gets the value at index 0 from the command-line arguments array and places it on the stack.
Line 24: invokedynamic instruction is used to dynamically link the method invocation, probably to concatenate the string with the first command-line argument.
Line 29: invokevirtual instruction calls the println method on the java.io.PrintStream object, which prints the concatenated string to the standard output.
If the length of the command-line arguments is zero (lines 35-40):
Line 35: getstatic instruction fetches the value of the static field java/lang/System.out (standard output).
Line 38: ldc instruction loads the constant string "Hello! No argument provided." onto the stack.
Line 40: invokevirtual instruction calls the println method on the java.io.PrintStream object, which prints the string to the standard output.
Lines 43-53: If the command-line arguments are empty or not, this block prints the current date and time using the java.util.Date class.
Line 56: The main method ends with a return statement.
In summary, this Java bytecode checks if command-line arguments are provided.
If arguments are given, it prints the number of arguments and the first
argument. Otherwise, it prints a message saying no argument is provided and
then prints the current date and time.
The byte code descriptions are great for understanding how it works and the summary is correct. I also fed ChatGPT some 6502 assembly code, and it nailed it. Boy, if I had had this for my Atari 800 I would have learned assembly really well and written some great games back in high school!
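Going back to the javap listing, one way to check an explanation like this is to write Java source that would plausibly compile to the same byte code. The sketch below is my reading of the listing, not the original Main.java. The class name ReconstructedMain is made up and the package is omitted. The text around the two concatenated values is not visible in javap -c output, so "Argument count: " and "First argument: " are placeholders. Note the goto 56 at offset 32: the date is printed only on the no-argument branch, which agrees with the summary rather than with the note on lines 43-53.

```java
import java.util.Date;

public class ReconstructedMain {
    public static void main(String[] args) {
        // Offsets 0-10: always print the argument count.
        // "Argument count: " is a placeholder; the real text is not in the listing.
        System.out.println("Argument count: " + args.length);

        if (args.length > 0) {
            // Offsets 18-29: print the first argument, then goto 56 (return).
            // "First argument: " is also a placeholder.
            System.out.println("First argument: " + args[0]);
        } else {
            // Offsets 35-40: the one string literal the listing shows verbatim.
            System.out.println("Hello! No argument provided.");
            // Offsets 43-53: print the current date via println(Object).
            System.out.println(new Date());
        }
    }
}
```

Compiling this and running javap -c on the result gives a listing you can compare, instruction by instruction, with the one above.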
Improving and Debugging Code
I write code that generates fractals. I fed this code in and it was properly analyzed. There was one more thing: it gave tips for improving my code. That is interesting. Could AI review your code and add tasks to your agile backlog for improving it?
Suppose I am having trouble with some code. I can feed the code to the AI and have it explain it back to me. This could reveal where I have gone wrong, since if the AI fails to understand something, maybe it is because I have made a mistake. This is very powerful and disruptive, since ChatGPT and Bard can do this for free, unlike Copilot.
When it comes to debugging, an AI would first analyze the code for defects. Then it could write some code to test whether the defect can be exploited. Bugs (the response we see) are a combination of a code defect and the environment that triggers it or, put another way, an exploit. This round-trip analysis can also be used to grow the training data for the AI. A nice positive feedback cycle.
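To make the defect-plus-environment idea concrete, here is a small sketch of my own, not something an AI produced. The method midpoint has a well-known latent defect (the sum can overflow an int), and the hypothetical probe defectTriggered plays the role of the test an AI might write. It assumes 0 <= low <= high. The defect is always in the code, but the bug only appears for inputs large enough to overflow.

```java
public class DefectProbe {
    // Latent defect: low + high can exceed Integer.MAX_VALUE and wrap around.
    static int midpoint(int low, int high) {
        return (low + high) / 2;
    }

    // Probe: does the defect surface for this input (the "environment")?
    // A correct midpoint must lie between low and high.
    static boolean defectTriggered(int low, int high) {
        int mid = midpoint(low, high);
        return mid < low || mid > high;
    }

    public static void main(String[] args) {
        // Small inputs: the defect stays hidden.
        System.out.println(defectTriggered(1, 3));
        // Inputs near the int limit: the same code now misbehaves.
        System.out.println(defectTriggered(Integer.MAX_VALUE - 1, Integer.MAX_VALUE));
    }
}
```

The first call prints false and the second prints true, even though the code under test never changed.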
What’s Next For AI Code Analysis
A little off topic, but I think there is huge disruption coming in this area. Some other potential use cases:
Discover IP theft
- did you steal my code and obfuscate it? Use AI to track it down.
Find Previously Unknown Malware threats
- find malware in source code based on understanding what the malware does, rather than just matching against known malware code
Discover Zero-day Vulnerabilities
- Train models on vulnerable code at the system boundaries (the inputs at the interfaces). Teach an AI what constitutes a vulnerability in code. Since AI can write code, it can also try to exploit a suspected vulnerability and learn new things. Having AI create its own new knowledge and training sets helps with a general problem in AI, handling things it has not seen before, by shrinking (however slightly) the set of things it has not seen.
Software Preservation
- resurrect some old code back into source (or assembly) to bring an old video game to life, fix some bugs, or port it to a new platform
These problems are really hard, and generative AI is probably not the right tool (or maybe just one tool of many) for the job, but some other AI will nail it long before we are killed off by Skynet.
Conclusion
AI as a learning aid for programming is extremely powerful. From teaching kids to helping a professional understand a chunk of code they didn’t write while debugging, it can do it all. Systems that “mash up” different AI tools running in your DevOps pipeline are where this is headed. It is probably already there for some teams, but the tech will trickle down to all.