How to read a FASTA file

Drashti Shah
Bioinformatics with Rust
2 min read · Dec 7, 2023
Generated with AI

A Rust function that reads a FASTA file and returns a HashMap with sequence IDs (headers) as keys and DNA sequences as values

Arguments

  • file_path: a string slice that holds the path to the text file (in FASTA format)

Example

>Header_000
ACTG
>Header_001
AAAT

// Assume the text file above is named "sample_file.fa"
let data = read_fasta("sample_file.fa");

// data holds
// {"Header_001": "AAAT", "Header_000": "ACTG"}

Code

use std::collections::HashMap;
use std::fs::File;
use std::io::{BufRead, BufReader};

fn read_fasta(file_path: &str) -> HashMap<String, String> {
    let mut data = HashMap::new();
    let file = File::open(file_path).expect("Invalid filepath");
    let reader = BufReader::new(file);

    let mut seq_id = String::new();

    for line in reader.lines() {
        let line = line.unwrap();

        // Check if the line starts with '>' (indicating a sequence ID or header)
        if line.starts_with('>') {
            seq_id = line.trim_start_matches('>').to_string();
        } else {
            // If it's a DNA sequence line, insert or update the HashMap entry
            // If seq_id is not present, insert a new entry with an empty String
            // Then append the current line to the existing DNA sequence
            data.entry(seq_id.clone()).or_insert_with(String::new).push_str(&line);
        }
    }

    data
}
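The function above panics if the file cannot be opened or a line cannot be read. As a rough sketch of one alternative, the variant below takes any buffered reader and returns an io::Result, which also makes it easy to try out on in-memory text. The name parse_fasta and the choice to skip lines that appear before the first header are assumptions made for illustration, not part of the original code.

```rust
use std::collections::HashMap;
use std::io::{self, BufRead};

// Hypothetical variant of read_fasta: accepts any BufRead source and
// returns errors to the caller instead of panicking.
fn parse_fasta<R: BufRead>(reader: R) -> io::Result<HashMap<String, String>> {
    let mut data = HashMap::new();
    // None until the first header line has been seen.
    let mut seq_id: Option<String> = None;

    for line in reader.lines() {
        let line = line?; // propagate read errors
        let line = line.trim_end(); // drop trailing whitespace

        if let Some(header) = line.strip_prefix('>') {
            seq_id = Some(header.to_string());
        } else if let Some(id) = &seq_id {
            // Append this sequence line to the record of the current header.
            data.entry(id.clone()).or_insert_with(String::new).push_str(line);
        }
        // Assumption: lines before the first header are ignored.
    }

    Ok(data)
}

fn main() -> io::Result<()> {
    // A byte slice implements BufRead, so no file is needed for a quick check.
    let text = ">Header_000\nAC\nTG\n>Header_001\nAAAT\n";
    let data = parse_fasta(text.as_bytes())?;
    assert_eq!(data["Header_000"], "ACTG");
    assert_eq!(data["Header_001"], "AAAT");
    Ok(())
}
```

To read from disk with this sketch, wrap the file in a BufReader, for example parse_fasta(BufReader::new(File::open("sample_file.fa")?)).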

Some notes on the code

  • trim_start_matches('>') removes every leading > from the header line (so ">>id" also becomes "id"); the remainder is stored as seq_id
  • BufRead looks unused, but it must be in scope: lines() is a method of the BufRead trait, which BufReader implements
  • push_str takes a &str, which is why the code passes &line rather than line (a &String is automatically coerced to &str)
  • data.entry() takes ownership of its key; without the clone, seq_id would be moved in one loop iteration and unavailable in the next, so the code would not compile
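The notes above can be checked in isolation with a few assertions. This is only a small sketch: header_id is a hypothetical helper introduced here to contrast strip_prefix (removes at most one leading >) with trim_start_matches (removes all of them), and the sample strings are made up.

```rust
use std::collections::HashMap;

// Hypothetical helper: returns the header text if `line` is a header line.
// strip_prefix removes at most one leading '>'.
fn header_id(line: &str) -> Option<&str> {
    line.strip_prefix('>')
}

fn main() {
    // trim_start_matches removes every leading '>', strip_prefix only one.
    assert_eq!(">>Header_000".trim_start_matches('>'), "Header_000");
    assert_eq!(header_id(">>Header_000"), Some(">Header_000"));
    assert_eq!(header_id("ACTG"), None);

    // push_str takes a &str; &line is a &String that coerces to &str,
    // so `line` is only borrowed and stays usable afterwards.
    let mut seq = String::from("AC");
    let line = String::from("TG");
    seq.push_str(&line);
    assert_eq!(seq, "ACTG");
    assert_eq!(line, "TG");

    // entry() takes its key by value. Cloning seq_id keeps the original
    // available for the next sequence line of the same record.
    let mut data: HashMap<String, String> = HashMap::new();
    let seq_id = String::from("Header_000");
    for chunk in ["AC", "TG"] {
        data.entry(seq_id.clone()).or_default().push_str(chunk);
    }
    assert_eq!(data[&seq_id], "ACTG");
}
```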

Drashti Shah
Bioinformatics with Rust

ESG & climate data scientist by day. Aspiring computational biologist and game designer by night. A Rust fan who believes in an "open career".