Processing 200 RNA-seq samples over the weekend

Karl Sebby
Published in truwl
Oct 20, 2021

When Feng Liu needed to process data from 200 ATAC-seq and 200 RNA-seq experiments, he turned to Truwl to get it done.

Feng is a Ph.D. student at the University of Oxford whose research focuses on identifying the molecular drivers of liver metastasis. Patients with metastatic liver cancer have a poor prognosis: only 11% survive for 5 or more years. By seeking to identify the transcriptional network that drives the metastatic process, this research can help explain the metastatic transition and lead to the design of early diagnostics and interventions.

Feng wanted to use trusted and established methods to analyze his data and chose the pipelines created by the ENCODE Data Coordination Center (DCC) for that reason (https://github.com/ENCODE-DCC). The pipelines are available on GitHub with an open source license, but that doesn’t mean it’s trivial to make them run, especially at scale. The underlying software needs to be installed, the required computational power exceeds what’s available in a typical computer, and it takes time to figure out what all the parameters mean and how to specify them properly. Feng is not a newcomer to bioinformatics and programming and could have set up the pipelines on his own server. He estimated that doing so would have taken at least 15 hours of setup time per project, and even then he wouldn’t have been able to submit many jobs at the same time. “Not everyone should build their bioinformatic skills and own server,” he said.

Truwl input editor for the ENCODE RNA-seq pipeline.

Truwl has everything needed to run bioinformatics pipelines, including the ENCODE pipelines, already set up and ready to go, so users can process their data fast. In fact, Feng processed all of his RNA-seq data in just one weekend using Truwl. Truwl runs pipelines in the cloud, providing access to nearly limitless compute power and storage, and is built on a tried-and-true architecture based on the system used by the ENCODE DCC.

There are several bioinformatics platforms out there, but they don’t typically support important work like Feng’s. Figuring out and evaluating them may take more time than doing all the analyses should. You might have to make an account, do a whole bunch of setup, or talk to a sales rep before you even know which methods a platform supports and whether it will meet your needs. And commercial vendors often won’t pay attention to users who don’t need to process a high volume of samples on a regular basis. Truwl is happy to support users like Feng and offers no-subscription accounts that let these users run workflows and pay only for the compute they use, with no commitment. And just as it doesn’t make sense for everybody to develop equivalent computational skills and set up their own servers, they also shouldn’t have to start their experiment from scratch. Feng shared a public example of an RNA-seq job that others can evaluate and fork to use as a starting point to run their own analyses here.
