Downloading NGS datasets using Nextflow

Andrea Telatin
Published in #!/ngs/sh
Jul 29, 2022

A simple-to-use pipeline to download FASTQ files from NCBI or EBI, which also serves as a good advertisement for my favourite workflow manager

A quick start

If you have a Linux machine with Docker and Nextflow installed, you can try the final result: create a text file with a list of SRA accession numbers like:

SRR8652866
SRR8652865
SRR8653287

Save it as list.txt, then run the command:

nextflow run telatin/getreads --list list.txt --outdir data \
-profile docker

Nextflow will automatically download the repository with the workflow (from github.com/telatin/getreads), fetch the Docker container with the dependencies, and then download the requested samples in parallel, as depicted in the screenshot.
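If you prefer to prepare the input from the terminal, the following is a minimal sketch that writes the three example accessions to list.txt and runs a quick sanity check before any download starts. The check assumes that run accessions look like SRR, ERR or DRR followed by digits; that pattern is my assumption for illustration, not something the workflow itself enforces.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Write the three example accessions, one per line.
cat > list.txt <<'EOF'
SRR8652866
SRR8652865
SRR8653287
EOF

# Assumption: a run accession is SRR/ERR/DRR followed by digits.
# Count the lines that do NOT match (grep exits non-zero when the count is 0,
# hence the "|| true" to keep "set -e" from aborting the script).
bad=$(grep -Evc '^[SED]RR[0-9]+$' list.txt || true)

echo "accessions: $(wc -l < list.txt | tr -d ' '), malformed: ${bad}"
# prints: accessions: 3, malformed: 0
```

A non-zero "malformed" count usually means a typo or a stray blank line, which is cheaper to catch here than halfway through a download.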

What is Docker

Docker is a popular ecosystem to manage, execute and distribute container images. In this context, it is a way to ensure that a set of tools will work on any machine capable of running Docker.

What is Nextflow

Nextflow is a workflow language and a task orchestrator. It allows the creation of multistep workflows that keep the logic separate from the configuration, making them easy to share across different environments (local computers, High Performance Computing clusters with schedulers like Slurm or PBS, cloud platforms like AWS or Azure…).
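To make the separation of logic and configuration concrete, here is a small sketch of how site-specific settings can live in their own file. The file name my_cluster.config and the queue name short are hypothetical, and whether this particular workflow runs unchanged on your cluster is an assumption; process.executor, process.queue and the -c option are standard Nextflow features. The final command is printed rather than executed, so you can inspect it first.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Hypothetical site-specific settings, kept outside the workflow code.
cat > my_cluster.config <<'EOF'
process {
    executor = 'slurm'  // submit each task as a Slurm job
    queue = 'short'     // hypothetical partition name, adapt to your cluster
}
EOF

# The workflow is untouched; only the configuration passed with -c changes.
echo "nextflow run telatin/getreads --list list.txt --outdir data -c my_cluster.config"
```

The same workflow code can then move between a laptop and a cluster by swapping the configuration file, not by editing the pipeline.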

Why this workflow?

One day NCBI went mad with their APIs and broke the existing workflows I was using to retrieve raw data. With Nextflow I was able to draft a workaround in one day, and to me that was a great example of the flexibility of the platform. Exercises aside, I recommend checking out a robust and full-featured pipeline called nf-core/fetchngs. It’s a ⭐️⭐️⭐️ pipeline!

A primer on Nextflow

If this short article made you curious and you’d like to learn how to write a workflow using Nextflow, check my tutorial, which walks you through writing a de novo assembly pipeline for bacterial genomes.

See the full schematic of the final result below:

De novo assembly workflow

How to run the final example

Again, before trying to follow the tutorial and build the pipeline yourself, try running it as shown in the video. Nextflow requires some knowledge both to create pipelines and to execute them on your own infrastructure (resource management, dependencies…), so it’s a good exercise to give it a go!

Let me know if this helped by slapping a star on the GitHub repository of the tutorial!
