Nanopore proteomics is on the horizon
How a revolutionary nucleic acid sequencing technology gets repurposed for peptide chains
Sequencing in biology usually refers to determining the exact order of nucleotides (DNA/RNA) or amino acids (proteins) in the large biological molecules that build life. This order is often compared with computer code, or strings of bits, because all the functionality of life depends on what is encoded in it.
Sequencing technologies have seen phenomenal development over the last three decades: the price and effort of sequencing a whole genome have dropped from impossibly large to virtually nothing. These technologies are the driving force behind the digitization of biology and have already contributed immeasurably to science, medicine and human wellbeing. Genomics as a field would barely exist without them, nor would we have sequenced SARS-CoV-2 and produced a vaccine against Covid-19.
On track to be one of the most fundamental technologies shaping the 21st century, the developments in sequencing technologies will have an outsized impact on all of our lives, whether we are aware of it or not.
Currently, the bulk of sequencing is performed using short-read approaches that require fragmenting the chains into smaller pieces and preparing so-called ‘read libraries’ (see above), which can then be amplified by PCR. These approaches are cheap and scale well through parallelization, and the often complicated re-assembly (inferring the full-length sequence from the many overlapping small reads) benefits from increasing computational power and better methods.
However, for some applications, short read approaches remain too disruptive, or are insufficient to address certain research questions, or the information gained from them is incomplete. Therefore, researchers have been trying to come up with methods to probe full-length chains.
The rise of nanopore sequencing
Nanopore sequencing technology works differently, and is truly brilliant. The basic idea is fairly simple: take a full-length nucleotide chain and push it through a small opening, one nucleotide at a time, while measuring which nucleotide goes through at each step.
The problem, of course, is that these chain molecules are extremely long while at the same time very small in diameter. We have no way of directly touching them or stretching them into a straight line, and we have nothing that can simply run along the string and ‘measure’ the nucleotide at a specific position.
But what we do have is a molecular door; more specifically, a whole set of channel proteins that allow different molecules to traverse a membrane (for example the lipid barrier surrounding a cell) in a gated fashion. During traversal of that door, the nucleotide chain is stretched straight, so all we need now is a way to assess which nucleotide passes the door at each step. This works with the help of an electrical membrane potential.
There are multiple different setups for the tech, but here are the essential steps:
- the nucleotide chain one wants to sequence gets connected (‘ligated’) to an adaptor molecule, which is bound to an engineered motor protein (for DNA, often a helicase)
- the applied voltage or the motor enzyme pushes the DNA/RNA chain through the nanopore (often a channel protein like MspA), or alternatively, the chain is allowed to diffuse through and is then pulled back through the pore by the motor enzyme
- the trick is that the pushing or pulling of the chain through the nanopore happens at a constant speed (tick rate; full or half steps of nucleotide intervals) defined by the motor enzyme used
- depending on which nucleotide bases are in the pore at each moment, the ion current flowing through the pore (driven by the voltage across the membrane in which the nanopore is embedded) shifts (“squiggles”) characteristically, which is what gets measured
- the squiggle pattern is highly reproducible and can be interpreted by statistical models to derive the sequence
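To make the last two steps more concrete, here is a minimal Python sketch of how a squiggle could arise. Everything in it is an illustrative assumption rather than the behaviour of a real pore: the window size `K = 3`, the made-up signal levels produced by `make_level_table`, and the noise level were chosen only to show the principle that each short stretch of bases inside the pore produces its own characteristic signal level.

```python
import random
from itertools import product

K = 3          # toy window size (assumption): bases that shape the signal at once
BASES = "ACGT"


def make_level_table(seed=0):
    """Assign a made-up signal level (arbitrary units) to every possible k-mer."""
    rng = random.Random(seed)
    return {"".join(kmer): rng.uniform(60.0, 120.0)
            for kmer in product(BASES, repeat=K)}


def simulate_squiggle(sequence, table, samples_per_step=5, noise_sd=1.0, seed=1):
    """Return noisy signal samples as the chain ratchets through the pore.

    The motor enzyme moves the chain one base per step, so the window of K
    bases inside the pore slides along the sequence one position at a time.
    """
    rng = random.Random(seed)
    signal = []
    for i in range(len(sequence) - K + 1):
        level = table[sequence[i:i + K]]          # level set by the bases in the pore
        signal.extend(rng.gauss(level, noise_sd)  # several noisy samples per step
                      for _ in range(samples_per_step))
    return signal


table = make_level_table()
squiggle = simulate_squiggle("ACGTTGCA", table)
print(len(squiggle))  # 6 windows x 5 samples = 30
```

Because the same window always maps to the same level in this toy model, the pattern is reproducible from run to run apart from the noise, which is the property that makes it possible to read the sequence back out.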
Sounds complicated? It is sophisticated for sure, but to visualize how this works, you can check out this little video.
While ideally we would like to measure only one nucleotide at a time, nucleotides are so small relative to the channel that each ‘squiggle’ measurement corresponds to about five nucleotides in the channel. That is why we need ‘signal interpretation’ software (Markov models or, more recently, neural nets) to deconvolve the signal and identify the nucleotide at each position. Today, this still comes with a small uncertainty and error rate, especially when the nucleotides in the pore are chemically modified or the motor enzyme has an irregular pull rate.
However, the big advantage of nanopore sequencing over more classical PCR-based approaches is that it works without chopping the long nucleotide chains into smaller pieces and does not require amplification of those molecules. Biologically, getting the full-length sequence is very valuable, as it might contain information about isoforms or RNA/DNA modifications that would otherwise be lost. Furthermore, because no amplification, laborious library prep or ‘stitching’ together of pieces is needed, as it is with PCR-based approaches, nanopore sequencers are small, portable and deliver real-time analysis. The enzymatic requirements for nanopore methods are also lower than for PCR-based approaches, which should eventually make this technology cheaper at scale and easier to use.
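As a rough illustration of what such deconvolution involves, the Python sketch below decodes a sequence from one signal level per step with the Viterbi algorithm, a standard dynamic-programming method for hidden Markov models. It is a toy under stated assumptions, not the software any real sequencer uses: the window size `K = 3`, the invented level table, the Gaussian noise model, and the helper names (`make_level_table`, `step_levels`, `decode`) are all hypothetical. The key point it shows is that consecutive windows overlap, so each measurement constrains its neighbours.

```python
import random
from itertools import product

K = 3  # toy window size (assumption)
BASES = "ACGT"
KMERS = ["".join(p) for p in product(BASES, repeat=K)]


def make_level_table(seed=0):
    """Made-up signal level for every k-mer (arbitrary units)."""
    rng = random.Random(seed)
    return {kmer: rng.uniform(60.0, 120.0) for kmer in KMERS}


def step_levels(sequence, table, noise_sd=0.5, seed=1):
    """One noisy level per motor step, i.e. per window of K bases."""
    rng = random.Random(seed)
    return [table[sequence[i:i + K]] + rng.gauss(0.0, noise_sd)
            for i in range(len(sequence) - K + 1)]


def decode(levels, table, noise_sd=0.5):
    """Most likely base sequence given the levels (Viterbi over k-mers)."""
    def log_emit(kmer, obs):
        # Gaussian log-likelihood up to a constant
        return -((obs - table[kmer]) ** 2) / (2 * noise_sd ** 2)

    score = {k: log_emit(k, levels[0]) for k in KMERS}
    backpointers = []
    for obs in levels[1:]:
        new_score, pointers = {}, {}
        for kmer in KMERS:
            # Valid predecessors overlap this k-mer in K-1 bases.
            prev = max((b + kmer[:-1] for b in BASES), key=score.__getitem__)
            new_score[kmer] = score[prev] + log_emit(kmer, obs)
            pointers[kmer] = prev
        score = new_score
        backpointers.append(pointers)

    # Trace the best path backwards from the best final k-mer.
    path = [max(KMERS, key=score.__getitem__)]
    for pointers in reversed(backpointers):
        path.append(pointers[path[-1]])
    path.reverse()
    return path[0] + "".join(k[-1] for k in path[1:])


table = make_level_table()
true_seq = "ACGTTGCATGCA"
print(decode(step_levels(true_seq, table), table))
```

With noise-free levels this toy decoder recovers the input exactly; as the noise grows relative to the spacing between levels, miscalls start to appear, which mirrors the error sources described above.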
Putting this all together means: sequencing and sequencing applications will come to the masses.
Imagine a world where everybody can have a little hand-held device and take a quick swab from the environment (e.g. train stations, subways, buses) to see if there is some bacterial or viral contamination going on right now. Sounds futuristic? Scientific volunteers have been doing that since 2016 with projects like MetaSUB. Once the pandemic started, this initiative quickly expanded to check the environment for SARS-CoV-2 as well.
For all its achievements, the most exciting thing about nanopore sequencing technology is that this ingenious mechanism is not yet fully explored and might open up novel questions and applications.
For example, could this principle of pulling a string of molecules through a pore and measuring the resulting changes in the electrical signal also work for identifying other large molecules, let’s say chains of amino acids, a.k.a. proteins?
The next frontier: Peptide chains
A group led by Cees Dekker at Delft University of Technology in the Netherlands was wondering the same thing. In a paper published on bioRxiv, the preprint server for biological research, they provide proof of principle that amino acid chains can be pulled through a nanopore channel and that they change the electrical signal across the membrane in a reproducible, sequence-dependent pattern.
The core idea behind their setup is simple but smart; they created a peptide-DNA hybrid by chemically linking the two chains together (see below) and then just used an established DNA motor enzyme and channel protein to pull the DNA back (and with it, the attached amino acid chain at the end) through the pore.
Doing it like this also requires the chain to first be translocated through the pore before being pulled back, which is easily achieved using the electrophoretic force that pulls negatively charged amino acids into the pore. From there, all the established principles of nanopore sequencing apply: the motor protein pulls at a predefined speed, and the amino acid chain pulled through the nanopore causes characteristic ‘squiggles’, which can then be deconvolved with software models to call position-specific amino acids, at least in theory.
Remarkably, their approach actually works. For their experiment, they used three peptides, identical in all but a single amino acid, and ran them through multiple pores many times to establish differences in squiggle pattern. Below is a good demonstration of why deconvolution is such an important part of establishing the sequence:
Even […] single-site variation were found to affect several ion current steps, because multiple amino acids around the pore constriction of MspA affect the ion current blockage level [read: ‘squiggle’] — Brinkerhoff et al., bioRxiv, 2021
The average single-read accuracy of their approach was 87%, which is quite good for these early experiments. Accuracy is also still an issue in the more mature DNA nanopore sequencing (the most common source of error is an irregular pull rate of the motor enzyme). So there is definitely room for improvement on this end, as accuracy is among the most important metrics for this technology.
However, the researchers found a neat workaround. By increasing the concentration of motor enzyme, they (and others) have observed that a second helicase ‘queues up’ behind the first; when the first one falls off, the chain gets pulled into the pore again, and the second helicase pulls it back out, allowing another ‘sequencing’ round for the same peptide chain.
When the first helicase reaches the linker at the end of the DNA section, it can no longer process and falls off. The DNA-peptide conjugate is then immediately pulled back into the nanopore such that the queued helicase, which is still bound to the DNA, takes control as the new anchoring enzyme. This effectively ‘rewinds’ the system and reinitiates a new independent read of the peptide sequence. — Brinkerhoff et al., bioRxiv, 2021
With multiple readings of the same molecule, they have shown that an accuracy of almost 100% can usually be achieved.
As always with these early studies, there are a lot of open questions, from fundamental limitations to scalability to price, so I reached out to Henry Brinkerhoff, a postdoctoral fellow and first author on the preprint, to get his perspective on the implications of this technology.
He sees the major limitations at the moment in both the read length of the peptide and the single-read accuracy. However, he is optimistic that more engineering and some tricks that worked for nanopore DNA sequencing can be applied to nanopore peptide sequencing as well.
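A back-of-the-envelope calculation shows why re-reading helps so much. The Python sketch below assumes a deliberately simplified model that is not taken from the paper: every read independently gets the call right with probability 0.87, the choice is between two variants, and the final call is a majority vote over an odd number of reads. The function name `majority_accuracy` is hypothetical.

```python
from math import comb


def majority_accuracy(p, n_reads):
    """Probability that a majority vote over n_reads independent reads is right.

    Assumes each read is correct with probability p, independently of the
    others, and that n_reads is odd so there are no ties.
    """
    needed = n_reads // 2 + 1  # smallest number of correct reads that wins
    return sum(comb(n_reads, k) * p**k * (1 - p)**(n_reads - k)
               for k in range(needed, n_reads + 1))


for n in (1, 3, 5, 7, 9):
    print(n, round(majority_accuracy(0.87, n), 4))
```

Under these assumptions, three reads already lift the accuracy from 87% to about 95%, and seven reads push it above 99%. Real reads are unlikely to be fully independent (the same pore or the same modification can cause the same mistake repeatedly), so this should be read as an optimistic illustration of the trend rather than a prediction.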
• The read length. This could be engineered around, and really isn’t much of a limitation in the first place since the reads are long enough to (in principle) uniquely identify just about any protein.
• The accuracy in discriminating the 20 amino acids plus PTMs instead of just 4 bases. We address this though the re-reading method shown in the paper, and there are a big library of known nanopore techniques for getting accuracy higher. And even if the accuracy remains too low to do true de novo sequencing, it will still be straightforward to do protein “fingerprinting” or identification, identification of splice variants, and variant mapping (like in the recent publication) or PTM mapping. — Henry Brinkerhoff (personal communication)
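The idea that imperfect reads can still identify a protein can be sketched in a few lines of Python. This is only an illustration under loud assumptions: the three sequences in `database` are made up, the read is a hand-written noisy copy of one of them, and matching by plain edit distance (`edit_distance`, `identify`) is a simple stand-in for whatever scoring a real fingerprinting pipeline would use.

```python
def edit_distance(a, b):
    """Levenshtein distance: substitutions, insertions and deletions count 1."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            curr.append(min(prev[j] + 1,                # delete from a
                            curr[j - 1] + 1,            # insert into a
                            prev[j - 1] + (ca != cb)))  # match or substitute
        prev = curr
    return prev[-1]


def identify(read, database):
    """Name of the database entry closest to the (possibly erroneous) read."""
    return min(database, key=lambda name: edit_distance(read, database[name]))


# Hypothetical reference peptides, invented for this example.
database = {
    "peptide_A": "MKTAYIAKQRQISFVK",
    "peptide_B": "MGSSHHHHHHSSGLVP",
    "peptide_C": "MDEKRRAQHNEVERRR",
}

noisy_read = "MGSAHHHHHHSSGLIP"  # peptide_B with two miscalled residues
print(identify(noisy_read, database))  # peptide_B
```

Two wrong residues out of sixteen is roughly the 87% single-read accuracy mentioned earlier, yet the read is still far closer to its true source than to the other candidates, which is why identification is a much easier task than de novo sequencing.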
On the issue of scalability, he is very confident that at least from the nanopore side, it is virtually a plug-and-play system.
The technology as it is could be directly incorporated into a MinION or similar platform with minimal effort: just put MspA in the membranes, add Hel308, ATP, and the DNA-peptide conjugates and you’re off and running on 1000+ pores. — Henry Brinkerhoff (personal communication)
However, since the peptide chains need to be chemically linked to DNA chains, and we have no great way of doing that for native molecules, or in a targeted fashion within cells, it is unrealistic to expect nanopore sequencers to provide real-time analysis of environmental protein samples anytime soon. A lot more protocols will need to be adapted, and problems solved, before we have a comparable proteomic tool for mass adoption like the MinION DNA sequencer.
Nevertheless, for research purposes, this proof-of-principle work is very promising. Asked what he is most excited about, Henry replied:
I think the key thing to emphasize with this technology, the thing that makes it new and exceptional, is that it is a single-molecule sequencing technology. Existing protein sequencing technologies are limited by both absolute and relative sample size, making it impossible to measure samples as small as a single cell or as heterogeneous as the cytoplasm. A single-molecule protein sequencer could let us begin to answer the seemingly basic but presently inapproachable question of what exactly is floating around in a cell at any given time.
The other exciting thing is that as a physical technique, rather than a technique relying on specific chemical labeling, we expect that it will be pretty easy to extend this method to post-translational modification (PTM) detection, which is a huge issue in biology and medicine. — Henry Brinkerhoff (personal communication)
I couldn’t agree more.
Nanopore sequencing technology is exciting. While it is established for nucleotide sequencing, there is no physical reason why the mechanistic principle of pushing a chain through a constricted pore and measuring the resulting electrical signal could not work for other long-chain molecules with a predefined set of subunits. Here, I have reported on the first of these ventures towards full-length peptide sequencing, which could revolutionize the way proteomics is done. I expect many more to follow.
For all the good it will do for science and basic research, I personally believe that the true potential of real-time, easy-to-use molecular sequencers has not yet been grasped by wider society.
Imagine local, decentralized and crowd-sourced biodiversity observation and conservation; or self-sufficient and automated medical diagnostics; or global biosecurity monitoring; so many aspects of our lives can be improved by giving people the tools to participate and work toward shared goals.
Some of the tech is already here, most will come very soon. So take heart, help me spread the word, and join me and others in being optimistic about our technological future, together.
This story is part of advances in biological sciences, a science communication platform that aims to explain ground-breaking science in the field of biology, medicine, biotechnology, neuroscience and genetics to literally everyone. Scientific understanding has too many barriers, let’s break them down!