AI prompt for making sense of others’ code
This post could just as easily be titled "How to calculate GC content in a DNA sequence." Anyway, Rust-Bio is a bioinformatics library for Rust. Let’s say we want to understand this function from its source code:
use std::borrow::Borrow;

/// Base gc content counter
fn gcn_content<C: Borrow<u8>, T: IntoIterator<Item = C>>(sequence: T, step: usize) -> f32 {
    let (l, count) = sequence
        .into_iter()
        .step_by(step)
        .fold((0usize, 0usize), |(l, count), n| match *n.borrow() {
            b'c' | b'g' | b'G' | b'C' => (l + 1, count + 1),
            _ => (l + 1, count),
        });
    count as f32 / l as f32
}
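Before simplifying anything, it can help to see how the function is called. Below is a minimal sketch that copies the function above and calls it with a byte-string literal, an owned Vec<u8>, and a step of 3. The sequence GATTACGC and the variable names are my own choices for illustration, not something taken from Rust-Bio.

```rust
use std::borrow::Borrow;

// Copied unchanged from the snippet above.
fn gcn_content<C: Borrow<u8>, T: IntoIterator<Item = C>>(sequence: T, step: usize) -> f32 {
    let (l, count) = sequence
        .into_iter()
        .step_by(step)
        .fold((0usize, 0usize), |(l, count), n| match *n.borrow() {
            b'c' | b'g' | b'G' | b'C' => (l + 1, count + 1),
            _ => (l + 1, count),
        });
    count as f32 / l as f32
}

fn main() {
    // A byte-string literal has type &[u8; 8]; iterating it yields &u8 items.
    let by_ref = gcn_content(b"GATTACGC", 1);

    // An owned Vec<u8> yields u8 items. Both &u8 and u8 implement Borrow<u8>.
    let owned = gcn_content(b"GATTACGC".to_vec(), 1);

    // With step = 3 only positions 0, 3 and 6 are inspected (G, T, G).
    let every_third = gcn_content(b"GATTACGC", 3);

    println!("by_ref = {by_ref}, owned = {owned}, every_third = {every_third}");
}
```

Note that step_by panics when the step is 0, so the step argument has to be at least 1.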
Prompt Template
My strategy is to politely ask GPT-4 to simplify the code, and then ask it to compare the simple version with the original and list which features were lost along the way. Fill in the parts marked with [] to adapt the template to your own use case.
Step 1 (Prompt): Generate the initial simple version
Assume you are a bioinformatics expert with extensive Rust programming knowledge. You are especially good at writing simple code. I will give you a code snippet and your task is to rewrite it in the most simple way possible. It’s okay if the simple version of the code does not capture all edge cases as the original code. The simple code should be easy-to-understand for beginner Rust programmers who are still new to concepts like generic types, structs, and traits. This is the code:
[paste code here]
Step 2 (Questions): Iteratively generate better versions of the simple code
The first output may not be what you are looking for. You will have to keep the conversation going and ask the model to rewrite the code again. For example, ask:
- Can you remove the step argument? It doesn’t seem essential.
- Can you make the code use characters when checking equality instead of byte strings?
Step 3 (Prompt): Compare the simple version and the original code
Thank you. This is great! Now we have 2 versions of the code: simple code and the original code. For your reference, here is the simple code:
[paste final version of the simple code here]
And here is the original code we started with:
[paste the original code we couldn't make sense of here]
Make a list of what the original code can do but the simple code cannot do. Usually the simple code is more specific and the complicated code is more general.
Sample Output
I used the template above with the Rust-Bio code and got this.
Output from steps 1 & 2
// This function calculates the GC content of a DNA sequence
// GC content is the percentage of bases that are either G or C
// The sequence can be a string slice
fn gc_content(sequence: &str) -> f32 {
    // Initialize the length and count variables to zero
    let mut length = 0;
    let mut count = 0;
    // Loop through the sequence
    for base in sequence.chars() {
        // Increment the length by one
        length += 1;
        // Check if the base is G or C (ignoring case)
        if base == 'g' || base == 'c' || base == 'G' || base == 'C' {
            // Increment the count by one
            count += 1;
        }
    }
    // Return the ratio of count to length as a floating point number
    count as f32 / length as f32
}
You can play with this in the Rust Playground. There’s a problem on Rosalind you can solve too.
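If you want to try the simple version yourself, here is a small sketch with a main function. One thing worth knowing: for an empty string the function divides 0 by 0 as floats and returns NaN. The gc_content_checked wrapper below is a hypothetical helper I added to show one way of handling that. It is not part of Rust-Bio or of the GPT-4 output, and the test sequence is arbitrary.

```rust
// The simple version from above, with the comments trimmed.
fn gc_content(sequence: &str) -> f32 {
    let mut length = 0;
    let mut count = 0;
    for base in sequence.chars() {
        length += 1;
        if base == 'g' || base == 'c' || base == 'G' || base == 'C' {
            count += 1;
        }
    }
    count as f32 / length as f32
}

// Hypothetical wrapper (my assumption, not from the original post):
// returns None for an empty sequence instead of NaN.
fn gc_content_checked(sequence: &str) -> Option<f32> {
    if sequence.is_empty() {
        None
    } else {
        Some(gc_content(sequence))
    }
}

fn main() {
    // 3 of the 8 bases are G or C.
    println!("GC content: {}", gc_content("AGCTATAG"));

    // An empty input gives 0.0 / 0.0, which is NaN for f32.
    println!("empty is NaN: {}", gc_content("").is_nan());

    // The wrapper makes the empty case explicit.
    println!("checked: {:?}", gc_content_checked(""));
}
```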
Output from step 3
Here is a list of what the original code can do but the simple code cannot do:
- The original code can take any type that implements the IntoIterator<Item = C> trait as the sequence, where C represents any type that implements the Borrow<u8> trait. The simple code can only take &str as the sequence.
- The original code can take a step parameter that allows skipping some bases between each calculation. The simple code does not have this parameter and always calculates the GC content based on every base in the sequence.
- The original code uses the fold method to accumulate the length and count variables in a single pass. The simple code uses a for loop and mutates the variables in each iteration.
- The original code uses the match expression to check the base against the binary values of G and C. The simple code uses the if expression to check the base against the characters of G and C.
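A quick way to sanity-check a list like this is to run both versions on the same input. The sketch below assumes the two functions exactly as shown earlier in this post; the test sequences are arbitrary ones I picked. With a step of 1 and plain ASCII input, the two should agree.

```rust
use std::borrow::Borrow;

// Original version, copied from the Rust-Bio snippet at the top of the post.
fn gcn_content<C: Borrow<u8>, T: IntoIterator<Item = C>>(sequence: T, step: usize) -> f32 {
    let (l, count) = sequence
        .into_iter()
        .step_by(step)
        .fold((0usize, 0usize), |(l, count), n| match *n.borrow() {
            b'c' | b'g' | b'G' | b'C' => (l + 1, count + 1),
            _ => (l + 1, count),
        });
    count as f32 / l as f32
}

// Simple version, as generated in steps 1 and 2.
fn gc_content(sequence: &str) -> f32 {
    let mut length = 0;
    let mut count = 0;
    for base in sequence.chars() {
        length += 1;
        if base == 'g' || base == 'c' || base == 'G' || base == 'C' {
            count += 1;
        }
    }
    count as f32 / length as f32
}

fn main() {
    for seq in ["AGCTATAG", "ggcc", "ATAT", "GATTACGC"] {
        // str::bytes() yields u8 items, which the generic version accepts.
        let original = gcn_content(seq.bytes(), 1);
        let simple = gc_content(seq);
        println!("{seq}: original = {original}, simple = {simple}");
        assert_eq!(original, simple);
    }
}
```

One difference the list does not mention: the original counts bytes while the simple version counts characters. For a string containing a non-ASCII character such as "GCé", the two report different lengths and therefore different ratios. Real DNA sequences are ASCII, so this rarely matters in practice.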
Writer’s Note: I am not a prompt engineering expert. But I know enough to generate useful output (after a lot of trial and error). You can learn more about how to write good prompts here.