How to count DNA nucleotides
Dec 5, 2023
A Rust function that returns counts for A, C, G, and T in a DNA sequence
Arguments
- seq: a string slice that holds a DNA sequence
Example
let seq = "ACGT";
let counts = count_dna_nucleotides(seq);
// counts holds
// {'A': 1, 'C': 1, 'G': 1, 'T': 1}
Code
use std::collections::HashMap;

fn main() {
    let sequence: &str = "AGCTTTTCATTCTGACTGCAACGGGCAATATGTCTCTGTGTGGATTAAAAAAAGAGTGTCTGATAGCAGC";
    let counts: HashMap<char, i32> = count_dna_nucleotides(sequence);
    println!("{:?}", counts);
}

fn count_dna_nucleotides(seq: &str) -> HashMap<char, i32> {
    let mut map = HashMap::new();
    let nucleotides = ['A', 'C', 'G', 'T'];
    for nuc in nucleotides {
        // Filter the sequence for a specific nucleotide and count the number of characters
        let count = seq.chars().filter(|&n| n == nuc).count() as i32;
        // Add the nucleotide and count pair to the hashmap
        map.insert(nuc, count);
    }
    map
}
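The function above scans the sequence once per nucleotide, so four passes in total. As a rough sketch of an alternative, the same counts could be collected in a single pass. The name count_dna_nucleotides_single_pass is hypothetical and only used here for illustration, and skipping characters other than A, C, G, and T is an assumption of this sketch.

```rust
use std::collections::HashMap;

// Hypothetical single-pass variant (illustrative name, not part of the original code).
fn count_dna_nucleotides_single_pass(seq: &str) -> HashMap<char, i32> {
    // Start every nucleotide at zero so that absent ones still appear in the result.
    let mut map = HashMap::from([('A', 0), ('C', 0), ('G', 0), ('T', 0)]);
    for c in seq.chars() {
        // Assumption: characters other than A, C, G, T are silently skipped.
        if let Some(count) = map.get_mut(&c) {
            *count += 1;
        }
    }
    map
}

fn main() {
    let counts = count_dna_nucleotides_single_pass("ACGTTT");
    // Index by key to print in a fixed A, C, G, T order.
    println!(
        "{} {} {} {}",
        counts[&'A'], counts[&'C'], counts[&'G'], counts[&'T']
    ); // prints "1 1 1 3"
}
```

Looking up each key explicitly gives a stable output order, whereas printing a HashMap with {:?} lists the entries in no guaranteed order.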
Some notes on the code
- A HashMap in Rust is similar to a dictionary in Python
- Angle brackets (<>) specify the type parameters of a generic type, as in HashMap<char, i32>
- The part inside the filter call, |&n| n == nuc, is called a closure
- A closure is like a lambda function in Python
- |&n| is the closure's parameter list. filter passes each item to the closure by reference, and the & in the pattern unwraps that reference so that n is a plain char
- A closure can access variables from the outer scope, like nuc
- The filter method can be called on iterators
- &str is not an iterator, but calling .chars() on a &str returns an iterator over its characters
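To make the closure notes concrete, here is a small sketch with two equivalent ways of writing the filter closure. The helper names count_with_pattern and count_with_deref are hypothetical and exist only for this illustration.

```rust
// Hypothetical helper: the &n pattern unwraps the &char that filter passes in.
fn count_with_pattern(seq: &str, target: char) -> usize {
    // `target` comes from the enclosing function and is captured by the closure.
    seq.chars().filter(|&n| n == target).count()
}

// Hypothetical helper: same result, but n stays a reference and is dereferenced with *.
fn count_with_deref(seq: &str, target: char) -> usize {
    seq.chars().filter(|n| *n == target).count()
}

fn main() {
    let seq: &str = "GATTACA";
    // .chars() turns the &str into an iterator of char values.
    println!(
        "{} {}",
        count_with_pattern(seq, 'A'),
        count_with_deref(seq, 'A')
    ); // prints "3 3"
}
```

Both forms compile to the same behavior; which one to use is mostly a matter of taste.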
Next Steps
- Play with this code in the Rust Playground
- Solve a Rosalind challenge