Using SPAdes for Genome Assembly: A Step-by-Step Guide

BThors
2 min read · Dec 28, 2022


Photo by Jack Hamilton on Unsplash

Genome assembly is an essential step in the analysis of genomic data. It lets researchers reconstruct an organism's DNA sequence from the short fragments, called “reads”, produced by DNA sequencing technologies. In practice, the result is usually a set of contigs (contiguous assembled sequences) rather than a single complete genome. SPAdes (St. Petersburg genome assembler) is a popular open-source tool for assembling genomes from sequencing reads. This article demonstrates how to use SPAdes for genome assembly.

Before you begin, you will need to install SPAdes on your computer. You can download the latest version from the SPAdes website (http://cab.spbu.ru/software/spades/) or install it with the conda package manager, which distributes SPAdes through the Bioconda channel.
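A minimal sketch, assuming a bash shell and an existing conda setup: the comments show a typical Bioconda install, and `require_tool` is a hypothetical helper (not part of SPAdes) that stops a script early if `spades.py`, or any other command such as `quast.py`, cannot be found.

```shell
# Installation via conda (assumes conda is installed; Bioconda expects the
# conda-forge channel to be available as well):
#   conda install -c conda-forge -c bioconda spades
#   spades.py --version   # prints the installed SPAdes version

# require_tool NAME: hypothetical helper. Returns 0 if NAME resolves to a
# command (e.g. an executable on PATH); otherwise prints a hint to stderr
# and returns 1.
require_tool() {
  if command -v "$1" >/dev/null 2>&1; then
    return 0
  fi
  echo "error: '$1' not found on PATH; install it before continuing" >&2
  return 1
}

# Example: abort a pipeline script if SPAdes is missing.
#   require_tool spades.py || exit 1
```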

Once you have installed SPAdes, you can use the following command to run the assembler on a set of DNA sequencing reads:

spades.py -1 reads_1.fastq -2 reads_2.fastq -o output_directory

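The -1 and -2 options specify the FASTQ files containing the forward and reverse reads of a paired-end library. The -o option specifies the output directory; when the run finishes, the assembled contigs are written to contigs.fasta and the scaffolds to scaffolds.fasta inside that directory.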
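The two files of a paired-end library are expected to list mates in the same order, so they should contain the same number of reads; a mismatch usually points to a truncated or mis-paired file. The bash sketch below is a quick pre-run check. It assumes uncompressed FASTQ with exactly four lines per record (no wrapped sequence lines), and `fastq_read_count` and `check_pairs` are hypothetical helper names.

```shell
# fastq_read_count FILE: number of records, assuming 4 lines per record
# and an uncompressed file.
fastq_read_count() {
  local lines
  lines=$(wc -l < "$1") || return 1
  if (( lines % 4 != 0 )); then
    echo "warning: $1 has $lines lines, not a multiple of 4" >&2
  fi
  echo $(( lines / 4 ))
}

# check_pairs R1 R2: succeeds only if both files hold the same read count.
check_pairs() {
  local n1 n2
  n1=$(fastq_read_count "$1") || return 1
  n2=$(fastq_read_count "$2") || return 1
  if [ "$n1" -ne "$n2" ]; then
    echo "mismatch: $1 has $n1 reads, $2 has $n2 reads" >&2
    return 1
  fi
  echo "ok: $n1 read pairs"
}

# Example, before running the SPAdes command above:
#   check_pairs reads_1.fastq reads_2.fastq || exit 1
```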

You can also pass additional options to customize the assembly. For example, the --careful option enables "careful" mode, which tries to reduce the number of mismatches and short indels at the cost of longer run time; the SPAdes manual recommends it mainly for small genomes. The -k option takes a comma-separated list of k-mer sizes (for example, -k 21,33,55,77); if you omit it, SPAdes chooses k-mer sizes automatically based on read length. Recommended k-mer sizes for different read lengths can be found in the SPAdes documentation.
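The SPAdes manual requires -k values to be odd, below 128 and listed in ascending order. A typo in a long k-mer list is easy to miss, so the bash sketch below checks a list against those rules before launching a run. `validate_kmers` is a hypothetical helper, not part of SPAdes.

```shell
# validate_kmers LIST: checks a comma-separated k-mer list such as
# "21,33,55,77" (each value odd, below 128, strictly ascending).
validate_kmers() {
  local list="$1" prev=0 k n
  if [ -z "$list" ]; then
    echo "empty k-mer list" >&2
    return 1
  fi
  local IFS=','            # split the list on commas only
  for k in $list; do
    case "$k" in
      ''|*[!0-9]*) echo "not a positive integer: '$k'" >&2; return 1 ;;
    esac
    n=$((10#$k))           # force base 10 so values like "021" are not octal
    if (( n % 2 == 0 )); then echo "k=$n is even" >&2; return 1; fi
    if (( n >= 128 )); then echo "k=$n is not below 128" >&2; return 1; fi
    if (( n <= prev )); then echo "k=$n is not in ascending order" >&2; return 1; fi
    prev=$n
  done
}

# Example: validate first, then run careful mode with explicit k-mers.
#   validate_kmers 21,33,55,77 &&
#     spades.py --careful -k 21,33,55,77 -1 reads_1.fastq -2 reads_2.fastq -o output_directory
```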

In addition to the standard assembly mode, SPAdes also supports hybrid assembly using both short and long reads. To run a hybrid assembly, you can use the following command:

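spades.py -1 reads_1.fastq -2 reads_2.fastq --nanopore long_reads.fastq -o output_directory

The --nanopore option specifies the FASTQ file containing Oxford Nanopore long reads; for PacBio CLR reads, use --pacbio instead. (The -s option, by contrast, is for unpaired short reads.) SPAdes uses the long reads to help resolve repeats and close gaps in the short-read assembly, so a hybrid run still needs the short-read library.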
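Because SPAdes takes PacBio CLR reads with --pacbio and Oxford Nanopore reads with --nanopore, a small wrapper can pick the right flag for you. This bash sketch is hypothetical: `hybrid_spades` and the `DRY_RUN` variable are made up for illustration, and the command is built as an array so file names containing spaces are passed intact.

```shell
# hybrid_spades TECH R1 R2 LONG OUTDIR
# TECH is "pacbio" (PacBio CLR reads) or "nanopore" (Oxford Nanopore reads).
# With DRY_RUN=1 the command is printed instead of executed.
hybrid_spades() {
  local tech="$1" r1="$2" r2="$3" long="$4" out="$5" flag
  case "$tech" in
    pacbio)   flag="--pacbio" ;;
    nanopore) flag="--nanopore" ;;
    *)
      echo "unknown long-read technology: '$tech' (expected pacbio or nanopore)" >&2
      return 1
      ;;
  esac
  # Build the command as an array so each argument stays a single word.
  local cmd=(spades.py -1 "$r1" -2 "$r2" "$flag" "$long" -o "$out")
  if [ "${DRY_RUN:-0}" = 1 ]; then
    printf '%s\n' "${cmd[*]}"
  else
    "${cmd[@]}"
  fi
}
```

For example, `DRY_RUN=1 hybrid_spades nanopore reads_1.fastq reads_2.fastq long_reads.fastq output_directory` prints the full spades.py command line without running it.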

Once the assembly is complete, you can use tools such as QUAST (Quality Assessment Tool for Genome Assemblies) to evaluate its quality. For example, to run QUAST on the contigs produced by SPAdes and write the report to a separate directory:

quast.py output_directory/contigs.fasta -o quast_output

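This generates a report with statistics about the assembly, such as the number of contigs, the total assembly length and the N50. If you also supply a reference genome with the -r option, QUAST additionally reports the genome fraction, i.e. the percentage of the reference genome covered by the contigs.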
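N50 is the contig length L such that contigs of length L or longer together make up at least half of the total assembly length. To make the definition concrete, here is a small bash/awk sketch that computes it from an uncompressed FASTA file such as contigs.fasta. `n50` is a hypothetical helper, not QUAST's implementation, and its result can differ from QUAST's report because QUAST ignores contigs shorter than 500 bp by default.

```shell
# n50 FASTA: prints the N50 of all sequences in an uncompressed FASTA file.
n50() {
  # Step 1: print one sequence length per record (handles wrapped lines).
  awk '
    /^>/ { if (len > 0) print len; len = 0; next }
    { gsub(/[ \t\r]/, ""); len += length($0) }
    END { if (len > 0) print len }
  ' "$1" |
  # Step 2: sort lengths in descending order.
  sort -rn |
  # Step 3: walk down the sorted list until half the total is reached.
  awk '
    { lens[NR] = $1; total += $1 }
    END {
      half = total / 2; cum = 0
      for (i = 1; i <= NR; i++) {
        cum += lens[i]
        if (cum >= half) { print lens[i]; exit }
      }
    }
  '
}

# Example:
#   n50 output_directory/contigs.fasta
```

With contig lengths 10, 8, 5, 3 and 2 (total 28), the running sum first reaches half the total (14) at the 8 bp contig, so the N50 is 8.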
