Top 10 Misconceptions about Next Generation Sequencing

Things we didn’t know or assumed to be true.

There are a whole lot of things you can achieve with the latest NGS platforms in your research projects, and we will try to cover some of them in this article.

Historical trends in storage prices versus DNA sequencing costs.

The blue squares describe the historic cost of disk prices in megabytes per US dollar. The long-term trend (blue line, which is a straight line here because the plot is logarithmic) shows exponential growth in storage per dollar with a doubling time of roughly 1.5 years. The cost of DNA sequencing, expressed in base pairs per dollar, is shown by the red triangles. It follows an exponential curve (yellow line) with a doubling time slightly slower than disk storage until 2004, when next generation sequencing (NGS) causes an inflection in the curve to a doubling time of less than 6 months (red line). These curves are not corrected for inflation or for the ‘fully loaded’ cost of sequencing and disk storage, which would include personnel costs, depreciation and overhead.

Source: Stein Genome Biology 2010 11:207 doi:10.1186/gb-2010-11-5-207

Image Source: Tute Genomics
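To get a feel for what the doubling times in the caption mean, here is a minimal arithmetic sketch. The six year window is an arbitrary choice for illustration, and because the caption says the NGS doubling time is "less than 6 months", the sequencing figure should be read as a lower bound.

```python
# Growth factor implied by a fixed doubling time (doubling times taken from the caption).

def fold_increase(years: float, doubling_time_years: float) -> float:
    """Growth factor after `years` for a quantity that doubles every `doubling_time_years`."""
    return 2 ** (years / doubling_time_years)

span_years = 6  # hypothetical window, chosen only for illustration

storage = fold_increase(span_years, 1.5)     # disk storage: doubles roughly every 1.5 years
sequencing = fold_increase(span_years, 0.5)  # NGS era: 6 months used as an upper bound on doubling time

print(f"Storage per dollar:    x{storage:,.0f}")     # x16
print(f"Base pairs per dollar: x{sequencing:,.0f}")  # x4,096
print(f"Gap after {span_years} years:     x{sequencing / storage:,.0f}")  # x256
```

In other words, under these assumptions sequencing output per dollar pulls ahead of storage per dollar by a factor of at least 256 in six years, which is why data handling becomes the bottleneck.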

Common misconceptions about NGS technology

[1] Next Gen Sequencing will completely replace older microarray based technologies

Not really. Many scientific teams have migrated to NGS based approaches for experiments they used to run on microarray platforms, but the results are not identical, and NGS brings additional expenses: infrastructure investment and the right kind of bioinformatics support to analyze the data generated. Even today microarray platforms continue to be sold and used for various purposes. Their simplicity and cost effectiveness mean they are unlikely to disappear altogether. The enormous amount of Affymetrix gene expression array data available in public databases such as NCBI GEO is still very valuable: scientists now use these “gold standard” data sets to run quality checks on RNA Seq experiments.

[2] NGS is very difficult and expensive and only big labs can afford it

Not true. As more and more companies enter this space, the cost is dropping at a dramatic rate. Illumina, one of the leading companies in this sector, recently announced the $1000 genome analysis kit. Compared with the roughly 3 billion dollars and 13 years spent on the first Human Genome Project, completed in 2003, this is a huge price drop by 2013, within a decade. The focus has now shifted towards faster and better data analysis pipelines while keeping the cost in an affordable range.

[3] I need a lot of sample DNA to get good quality data

Not true. Ever heard of single cell genomics? Commercial kits and specialized microfluidic chips (lab on a chip) can now isolate a single cell of interest and analyze its genome. For example, a circulating tumor cell (roughly one in a billion blood cells), which can serve as a representative sample of the primary tumor, can be captured and its genomic DNA sequenced quickly. This approach has opened up a lot of possibilities and given birth to a whole new diagnostics market known as “liquid biopsy”, which helps make personalized medicine a reality.

[4] I cannot use FFPE sample as the DNA is fragmented and poor quality

No. FFPE (formalin-fixed, paraffin-embedded) samples used to be troublesome, but not anymore. Almost all major kit manufacturers now offer special reagents to handle such difficult samples, and some of the resulting errors can also be corrected on the bioinformatics side.

[5] I don’t need biological controls as the reference genome is available for my species of interest

False. Whatever the model organism, and regardless of whether a reference genome is available, it is good science to always include normal controls, both to remove analytical bias and to account for biological variability. The reference genome is a helping hand for analysis, not a true representation of any single organism in the wild. It is more like a mixed salad of sequences from multiple sources covering the same genomic regions.

[6] I need expensive software to analyze the data

Not necessarily. Galaxy is an open source platform maintained by the NGS community, and many bioinformatics professionals prefer it to commercial software. I run a self-hosted, cloud based Galaxy instance for all my client requirements.

[7] I need a big budget if I am going to analyze more than 12 samples

No issues, mate. Whether it is 12, 24, 48, 96 or more, the number of samples mainly determines the size of the data generated. For a whole genome or whole exome project it makes sense to do some trial runs with a limited number of samples first. For RNA Seq the cost is modest compared with whole genome sequencing.
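A back-of-the-envelope estimate shows how data volume scales with the number of samples. This is only a sketch: the genome size, capture target size and coverage depths below are illustrative assumptions, not vendor specifications, and the function name is made up for this example.

```python
# Rough sequencing yield per project: bases = target size x mean depth x samples.
# All figures below are illustrative assumptions, not vendor specifications.

ASSAYS = {
    # name: (target size in bases, mean depth of coverage)
    "whole genome": (3_100_000_000, 30),  # ~3.1 Gb human genome at 30x (assumed)
    "whole exome": (50_000_000, 100),     # ~50 Mb capture target at 100x (assumed)
}

def gigabases_needed(target_bases: int, depth: int, samples: int) -> float:
    """Total sequenced bases, in gigabases, for `samples` samples."""
    return target_bases * depth * samples / 1e9

for samples in (12, 24, 48, 96):
    for name, (target, depth) in ASSAYS.items():
        gb = gigabases_needed(target, depth, samples)
        print(f"{samples:>3} samples, {name:<12}: {gb:>9,.0f} Gb")
```

The yield grows linearly with sample count, so a small pilot run gives a reasonable basis for extrapolating storage and compute needs before committing to the full batch.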

[8] Paired end (both directions) sequencing is always better for RNA Seq

Not really. If the RNA quality is poor, analyzing paired end reads becomes problematic: when quality control discards a low quality read, its mate is left unpaired, and filtering out those unpaired reads throws away valuable data. If your experiment can get by without paired end reads, then you really don’t need them. I always recommend sequencing a couple of samples first and checking the FastQC (read quality) report.
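The loss from filtering unpaired reads can be illustrated with a toy example. This is a minimal sketch on made-up data, assuming a simple mean-quality cutoff and the standard Phred+33 FASTQ quality encoding. The cutoff value and read names are hypothetical, and real trimming tools apply more elaborate rules.

```python
# Pair-aware quality filtering on a tiny, invented data set.
# Quality strings use the standard Phred+33 FASTQ encoding.

MIN_MEAN_QUALITY = 20  # hypothetical cutoff; real pipelines expose this as a parameter

def mean_phred(quality: str) -> float:
    """Mean Phred score of one read, decoded from Phred+33 characters."""
    return sum(ord(ch) - 33 for ch in quality) / len(quality)

# (read id, forward quality string, reverse quality string); invented for illustration
pairs = [
    ("read1", "IIIIIIII", "IIIIIIII"),  # both mates good (Q40)
    ("read2", "IIIIIIII", "########"),  # reverse mate poor (Q2)
    ("read3", "########", "IIIIIIII"),  # forward mate poor
    ("read4", "########", "########"),  # both mates poor
]

kept_pairs, orphans, dropped = [], [], []
for read_id, fwd_q, rev_q in pairs:
    fwd_ok = mean_phred(fwd_q) >= MIN_MEAN_QUALITY
    rev_ok = mean_phred(rev_q) >= MIN_MEAN_QUALITY
    if fwd_ok and rev_ok:
        kept_pairs.append(read_id)
    elif fwd_ok or rev_ok:
        orphans.append(read_id)  # one good mate left without its partner
    else:
        dropped.append(read_id)

good_reads = 2 * len(kept_pairs) + len(orphans)
print(f"pairs kept: {len(kept_pairs)}, orphans: {len(orphans)}, dropped: {len(dropped)}")
print(f"good reads discarded if orphans are filtered out: {len(orphans)} of {good_reads}")
```

In this toy set, half of the reads that pass the quality cutoff would still be thrown away just because their mate failed, which is the kind of loss worth checking for on a pilot run.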

[9] My sequencing lab has done everything correctly and I don’t have to ask them how they did it…

This is the most common issue I have faced: clients who are new to sequencing do not understand the quality steps followed by the vendor doing the sequencing for them. The delivered data may carry all-or-none filtering artifacts, with reads kept or discarded as good or bad quality depending on the default parameters set up in the vendor’s analysis pipeline. I always like to look at the raw data and the technical specifications of the instrument, reagents and library preparation used for sequencing. Incomplete information about how the experiment was conducted will always lead to problems downstream during data analysis and interpretation.
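To see why the vendor's default parameters matter, here is a small sketch of an all-or-none filter applied at different cutoffs. The per-read quality values and the cutoffs are invented for illustration and do not reflect any particular vendor's pipeline.

```python
# How many reads survive an all-or-none mean-quality filter at different cutoffs.
# Per-read mean Phred scores below are invented for illustration.

mean_qualities = [12, 18, 19, 21, 24, 26, 29, 31, 33, 36]

def retained(scores, cutoff):
    """Count reads whose mean quality meets the cutoff; the rest are discarded whole."""
    return sum(1 for q in scores if q >= cutoff)

for cutoff in (20, 25, 30):  # hypothetical pipeline defaults
    n = retained(mean_qualities, cutoff)
    print(f"cutoff Q{cutoff}: {n}/{len(mean_qualities)} reads kept")
```

With these made-up numbers the same raw data yields 7, 5 or 3 usable reads out of 10 depending only on the cutoff, so it is worth asking the sequencing lab which value they used.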

[10] My Post Doc did everything and now he is no longer working in my lab — but I am sure he followed all the protocols

Again, this is a common issue with PIs trusting their students and Post Docs. I always want to see the lab notebook of the person who performed the experiments, if that is allowed. In most cases it is easy to trace the person and get the details directly by email. Never assume how an experiment was conducted; it can be a life or death matter for the project outcome.

That’s all folks

More interesting stuff in my next article

Teaser for my upcoming articles:

How to plan and implement a Microbiome Sequencing Project — We will cover popular ongoing projects and discuss the data

Implementing Clinical NGS — challenges in developing countries

Exosomes: the miracle workers of cell-to-cell communication

Self promotion: If you are a research scientist currently involved in NGS work and need some help with data analysis then please contact me and depending on my schedule I can definitely help you. Also if you have analyzed data already and would like someone to interpret the results and write it up for a good publication in high impact journals then shoot me an email:

If you have made it this far then please follow me for more exciting content coming soon…
