One Step at a Time

William L. Weaver
Published in TL;DR Innovation
Apr 17, 2018

Innovations in Next Generation Genome Sequencing

In 1898, Samuel P. Langley received a $70,000 grant from the United States War Department and the Smithsonian Institution to develop a piloted airplane. Langley built a series of successful unmanned powered aircraft that were launched by a catapult system mounted on the roof of a houseboat anchored in the Potomac River, a system that would later serve as the basis for modern aircraft carriers. However, when used with piloted prototypes, the catapult and aircraft engine overwhelmed the fragile structures with too much thrust. Langley’s development platform had no provision for stepwise improvement. After each failed attempt, the craft needed to be fished out of the river, dried, repaired, and hoisted back to the top of the catapult. During the final test on December 8, 1903, the craft entered a vertical climb, one of the wings collapsed, and the structure again crashed and sank into the river, this time beyond salvage. Nine days later, on December 17, two brothers from Dayton, Ohio, who had gained a deep understanding of mechanics from their print and bicycle shops, successfully demonstrated powered piloted flight above the sand dunes of a windy beach a few miles south of Kitty Hawk, North Carolina, using a light experimental machine costing less than $1,000.

Photo by Florian Bernhardt on Unsplash

Nearly a century later, in 1990, the United States Department of Energy (DOE) and the U.S. National Institutes of Health (NIH) provided $3 billion to fund an international project for the purpose of sequencing the 3 billion nucleotide bases comprising the complete human genome. The technology relied on labor-intensive processes that generated multiple copies of short base sequences using chemically modified nucleotides. The resulting copies were either cleaved into shorter segments (as in the case of Maxam-Gilbert sequencing) or prematurely terminated by reaction (as with Sanger sequencing), and the resulting collection of fragments was separated according to length using gel electrophoresis. The identity of the last nucleotide at the end of each chain fragment was determined by radioactive or fluorescent labeling, and the nucleotide order in the short sequence was determined by collating the last nucleotide of each fragment in order of fragment length. Due to the poor quality of separation among the shortest and longest fragments, the order of the first few nucleotides is undetermined and the length of the short sequence is limited to less than 1,000 bases.
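The collation step described above can be sketched in a few lines of Python. This is an idealized illustration, not laboratory software: the helper names `terminate_chains` and `read_from_fragments` are hypothetical, every fragment length is assumed to be resolved perfectly (which, as noted, is not true of real gels), and the strand being copied is treated as the one being read.

```python
import random


def terminate_chains(sequence):
    """Hypothetical helper: one prematurely terminated copy per position.

    Each fragment is recorded as (length, labeled last base), which is
    all that the gel and the label reveal about it.
    """
    return [(i + 1, base) for i, base in enumerate(sequence)]


def read_from_fragments(fragments):
    """Sort fragments by length (the job of the gel), then collate the
    labeled terminal base of each fragment into the sequence."""
    ordered = sorted(fragments, key=lambda fragment: fragment[0])
    return "".join(base for _, base in ordered)


fragments = terminate_chains("GATTACA")
random.shuffle(fragments)  # in the reaction tube the fragments are unordered
print(read_from_fragments(fragments))
```

Sorting by length stands in for gel electrophoresis; the read is recovered regardless of the order in which the fragments were produced.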

The first stage of the 15-year international project involved the determination of “marker sequences” located at regular intervals throughout the entire 3-billion base human genome. These markers would serve as “top-down” guide posts that would be used to stake out the framework of the entire genome and calibrate segment position as the interposing sequences were measured. After the halfway point of the publicly-funded international project in 1998, the privately-funded firm Celera Genomics embarked on the same goal backed by $300 million. The Celera approach used a “bottom-up” method known as “whole-genome shotgun sequencing” that employed similar short-fragment sequencing technology but relied on the development of advanced computational algorithms to assemble the overall order of the fragments into the complete genome without the aid of marker sequences. Ultimately, the publicly-available marker positions provided by the international project assisted in the development of the algorithms used by Celera and the rapid advances by the 2-year private project spurred faster progress among the international laboratories. Both projects jointly presented their initial findings in early 2001.
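To give a feel for the “bottom-up” idea, here is a minimal sketch of assembling short reads by their overlaps. It uses a simple greedy overlap merge, which is a textbook simplification and not the algorithm Celera used; the function names, the `min_len` parameter, and the toy reads are all assumptions made for illustration, and the reads are assumed to be error-free and taken from the same strand.

```python
def overlap(a, b, min_len=3):
    """Length of the longest suffix of `a` that equals a prefix of `b`,
    or 0 if no overlap of at least `min_len` bases exists."""
    for k in range(min(len(a), len(b)), min_len - 1, -1):
        if a.endswith(b[:k]):
            return k
    return 0


def greedy_assemble(reads, min_len=3):
    """Repeatedly merge the pair of reads with the largest overlap.

    Returns the list of contigs left when no overlaps remain.
    """
    reads = list(reads)
    while len(reads) > 1:
        best_k, best_i, best_j = 0, None, None
        for i, a in enumerate(reads):
            for j, b in enumerate(reads):
                if i != j:
                    k = overlap(a, b, min_len)
                    if k > best_k:
                        best_k, best_i, best_j = k, i, j
        if best_k == 0:
            break  # nothing left to join; remaining reads stay separate
        merged = reads[best_i] + reads[best_j][best_k:]
        reads = [r for idx, r in enumerate(reads)
                 if idx not in (best_i, best_j)] + [merged]
    return reads


# Toy example: three overlapping reads from one short made-up sequence.
print(greedy_assemble(["ATGCGTAC", "GTACGTTA", "CGTTAGC"]))
```

Real genomes contain long repeated regions that defeat a purely greedy merge, which is one reason the marker positions from the public project were useful as a check on fragment placement.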

Now, a decade later, the goal of human genomics is the ability to sequence the complete genome of individuals in a timely and cost-effective manner. Advanced research is being catalyzed by the $10 million Archon X Prize announced in 2006 that is to be awarded to “the first team that can build a device and use it to sequence 100 human genomes within 10 days […] at a recurring cost of no more than $10,000 (US) per genome.” Advances in technology at all levels have pushed the industry to lower this goal to $1,000 per genome. Among the evolutionary innovations in reaction chemistry, sample handling, computing speed, and algorithm optimization are a few revolutionary insights that bring these goals ever closer.

One such innovation is the use of a removable nucleotide marker. Obviating the need to make multiple copies of a sequence that are subsequently fragmented, tagged, and separated by chromatography, a removable fluorescent marker permits a single nucleotide strand to be sequenced in a stepwise fashion. Helicos BioSciences Corporation in Cambridge, Massachusetts has developed a commercial system that attaches a 100- to 200-base strand to a substrate by its 3’ end, eliminating the uncertainty of knowing the sequence direction. The anchoring molecule contains a fluorescent tag that can be imaged by a scientific-grade digital camera. After the spatial position of the strand is located in the image, the tag is washed away and a tagged complementary nucleotide of type A, C, G, or T is polymerized to the bottom of the strand near the attachment point. The strand is imaged again to discover whether the specific nucleotide was attached. If the tag is detected, the nucleotide type is recorded and the tag is washed away, leaving the attached nucleotide in place. The process is cycled using all four tagged bases and repeated until the desired read length is obtained. The elegance of this approach is that a single strand can be sequenced and the imaging detector can analyze multiple strands simultaneously at a density of 100 million strands per cm², resulting in several billion sequenced strands per run. The short-sequence data is then processed by algorithms to determine the overall genome sequence.
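The offer, image, and wash cycle described above can be sketched as a small simulation. This is an illustration only, not the vendor’s software: the function name `sequence_by_synthesis` and the `flow_order` parameter are hypothetical, the chemistry is assumed to be error-free, and at most one nucleotide is assumed to attach each time a base is offered.

```python
COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A"}


def sequence_by_synthesis(template, read_length, flow_order="ACGT"):
    """Simulate stepwise sequencing of one anchored strand.

    Each tagged base is offered in turn. If it pairs with the next
    unread template base, the tag is "seen", the base is recorded, and
    the tag is washed away while the nucleotide stays attached.
    Returns the recorded bases, i.e. the complement of the template.
    """
    target = min(read_length, len(template))
    recorded = []
    while len(recorded) < target:
        for base in flow_order:
            if len(recorded) == target:
                break
            next_template_base = template[len(recorded)]
            if COMPLEMENT[next_template_base] == base:
                recorded.append(base)  # tag detected in the image
            # otherwise nothing attached; wash and offer the next base
    return "".join(recorded)


print(sequence_by_synthesis("TTGCA", read_length=5))
```

In the real instrument this loop runs for every strand in the camera’s field of view at once, which is where the throughput of the approach comes from.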

Another potentially game-changing project is IBM’s “DNA Transistor”. This approach seeks to ratchet a complete DNA strand through a nanometer-sized pore while measuring the identity of each nucleotide as it passes a detector in a stepwise fashion. IBM’s experience in semiconductor fabrication and nanotechnology is being leveraged to transition successful simulations and theoretical calculations into an actual device. Concurrent research into tag-free nucleotide detection is underway by IBM and others. In addition to simplifying the computational needs of assembling short fragments, this technology would drastically reduce the volume of chemical reagents required for analysis. Although such a device is potentially a long way off, it is plausible to envision a future DNA Transistor being incorporated into a handheld electronic device.

Along with advances in measurement and computation technologies, results from the Human Genome Project have the potential to transition the practice of medicine from “one size fits all” to “customized” cures and prescriptions that address the specific condition of each patient. As Samuel Langley’s experience shows, designing an aircraft carrier before developing the first working airplane is commendable and forward-thinking but can often inhibit innovation. However exciting the destination, painstaking developments must be made one step at a time.

This material originally appeared as a Contributed Editorial in Scientific Computing May/June 2011, pg. 16.

William L. Weaver is an Associate Professor in the Department of Integrated Science, Business, and Technology at La Salle University in Philadelphia, PA USA. He holds a B.S. Degree with Double Majors in Chemistry and Physics and earned his Ph.D. in Analytical Chemistry with expertise in Ultrafast LASER Spectroscopy. He teaches, writes, and speaks on the application of Systems Thinking to the development of New Products and Innovation.
