How to count DNA nucleotides

Drashti Shah
Bioinformatics with Rust
2 min read · Dec 5, 2023

A Rust function that returns counts for A, C, G, and T in a DNA sequence

Arguments

  • seq: a string slice that holds a DNA sequence

Example

let seq = "ACGT";
let counts = count_dna_nucleotides(seq);

// counts holds these pairs (a HashMap has no guaranteed order)
// {'A': 1, 'C': 1, 'G': 1, 'T': 1}

Code

use std::collections::HashMap;

fn main() {
    let sequence: &str = "AGCTTTTCATTCTGACTGCAACGGGCAATATGTCTCTGTGTGGATTAAAAAAAGAGTGTCTGATAGCAGC";
    let counts: HashMap<char, i32> = count_dna_nucleotides(sequence);
    println!("{:?}", counts);
}

fn count_dna_nucleotides(seq: &str) -> HashMap<char, i32> {
    let mut map = HashMap::new();
    let nucleotides = ['A', 'C', 'G', 'T'];
    for nuc in nucleotides {
        // Filter the sequence for a specific nucleotide and count the number of characters
        let count = seq.chars().filter(|&n| n == nuc).count() as i32;
        // Add the nucleotide and count pair to the hashmap
        map.insert(nuc, count);
    }
    map
}
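
The function above walks the sequence once per nucleotide, so four times in total. For long sequences, a single pass may be preferable. The following is a minimal sketch of that idea, not the approach used in this post. The name count_dna_nucleotides_single_pass is made up for illustration, and the sketch assumes that any character other than A, C, G, or T should be ignored, which matches what the original function does.

```rust
use std::collections::HashMap;

// Hypothetical single-pass variant (illustration only, not the post's function).
fn count_dna_nucleotides_single_pass(seq: &str) -> HashMap<char, i32> {
    let mut map: HashMap<char, i32> = HashMap::new();
    // Start every nucleotide at zero so bases absent from `seq` still appear in the result.
    for nuc in ['A', 'C', 'G', 'T'] {
        map.insert(nuc, 0);
    }
    for c in seq.chars() {
        // Only increment counters for keys that already exist.
        // Assumption: any other character (for example 'N') is skipped.
        if let Some(count) = map.get_mut(&c) {
            *count += 1;
        }
    }
    map
}
```

Calling count_dna_nucleotides_single_pass("ACGT") should return the same four pairs as the example earlier in the post.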

Some notes on the code

  • A HashMap in Rust is similar to a dictionary in Python
  • Angle brackets (<>) hold the type parameters of a generic type; HashMap<char, i32> maps char keys to i32 values
  • The expression passed to filter, |&n| n == nuc, is called a closure
  • A closure is like a lambda function in Python
  • |&n| is the closure's parameter list. filter passes each item by reference, and the & pattern unwraps that reference so n is a plain char
  • A closure can access variables from the outer scope, like nuc
  • The filter method can be called on iterators
  • &str is not an iterator, but calling .chars() on a &str returns an iterator over its characters
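
To make the closure notes concrete, here is a small sketch with two hypothetical helpers (count_base and count_base_deref are names invented for this illustration). Both capture nuc from the enclosing function and differ only in how they handle the reference that filter passes in.

```rust
// Hypothetical helper: uses the `&n` pattern, as in the code above.
fn count_base(seq: &str, nuc: char) -> usize {
    // `filter` hands the closure a `&char`; the `&n` pattern unwraps it,
    // so `n` is a plain `char` that can be compared with the captured `nuc`.
    seq.chars().filter(|&n| n == nuc).count()
}

// Hypothetical helper: same result, with an explicit dereference instead.
fn count_base_deref(seq: &str, nuc: char) -> usize {
    // Here `n` is a `&char`, so `*n` is needed to get at the `char` itself.
    seq.chars().filter(|n| *n == nuc).count()
}
```

Both helpers return a usize, which is the type that count() produces; the function in this post casts that value to i32 with as.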
