Nanopore proteomics are on the horizon

How a revolutionary nucleic acid sequencing technology gets repurposed for peptide chains

Introduction

Sequencing in biology usually refers to the process of determining the exact order of nucleotides (DNA/RNA) or amino acids (proteins) in the large biological molecules that build life. That order is often compared with computer code, or strings of bits: all the functionality of life depends on what is encoded in it.

Overview of RNA sequencing processing steps. Image source

Sequencing technologies have seen phenomenal development in the last three decades. The price and effort of sequencing a whole genome have dropped from impossibly large to virtually nothing. These technologies are the driving force behind the digitization of biology and have already contributed immeasurably to science, medicine and human wellbeing. Genomics as a field would barely exist without them, nor would we have sequenced SARS-CoV-2 and produced a vaccine against Covid-19.

On track to be one of the most fundamental technologies shaping the 21st century, the developments in sequencing technologies will have an outsized impact on all of our lives, whether we are aware of it or not.

Currently, the bulk of sequencing is performed using short-read approaches, which require fragmenting the chains into smaller pieces and preparing so-called ‘read libraries’ (see above) that can then be amplified by PCR. These approaches are cheap and scale well through parallelization, and the often complicated re-assembly (inferring the full-length sequence from the many overlapping small reads) profits from increased computational power and better methods.
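To make the re-assembly step a bit more concrete, here is a minimal Python sketch of a greedy overlap merge. It is a toy under strong assumptions (error-free reads, a single strand, no repeats, no read contained in another), and the function names `overlap` and `greedy_assemble` are made up for illustration; real assemblers are far more sophisticated.

```python
def overlap(a: str, b: str, min_len: int = 3) -> int:
    """Length of the longest suffix of `a` that equals a prefix of `b`."""
    for n in range(min(len(a), len(b)), min_len - 1, -1):
        if a.endswith(b[:n]):
            return n
    return 0


def greedy_assemble(reads: list[str], min_len: int = 3) -> list[str]:
    """Repeatedly merge the two reads with the largest overlap.

    Returns the remaining contigs (a single one if everything overlaps).
    """
    contigs = list(dict.fromkeys(reads))  # drop exact duplicates, keep order
    while len(contigs) > 1:
        best_n, best_i, best_j = 0, -1, -1
        for i, a in enumerate(contigs):
            for j, b in enumerate(contigs):
                if i != j:
                    n = overlap(a, b, min_len)
                    if n > best_n:
                        best_n, best_i, best_j = n, i, j
        if best_n == 0:
            break  # nothing overlaps any more
        merged = contigs[best_i] + contigs[best_j][best_n:]
        contigs = [c for k, c in enumerate(contigs) if k not in (best_i, best_j)]
        contigs.append(merged)
    return contigs


# Four overlapping "reads" of a made-up 15-letter sequence, in shuffled order.
reads = ["TACGTTAG", "ATGGCGTA", "CGTTAGC", "GCGTACGT"]
print(greedy_assemble(reads))
```

Even in this tiny example, every pair of reads has to be compared in every round, which hints at why assembly benefits so much from computational power.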

However, for some applications, short read approaches remain too disruptive, or are insufficient to address certain research questions, or the information gained from them is incomplete. Therefore, researchers have been trying to come up with methods to probe full-length chains.

The rise of nanopore sequencing

Nanopore sequencing technology works differently, and is truly brilliant. The basic idea is fairly simple: take a full-length nucleotide chain and push it through a small opening, one nucleotide at a time, while measuring which nucleotide goes through at each step.

The problem, of course, is that these chain molecules are extremely long while at the same time very small in diameter. We have no way of directly touching them or stretching them into a straight line, and we have nothing that can simply run along the string and ‘measure’ the nucleotide at a specific position.

But what we do have is a molecular door; more specifically, a whole set of channel proteins that allow different molecules to traverse a membrane (for example the lipid barrier surrounding a cell) in a gated fashion. During traversal of that door, the nucleotide chain is stretched straight, so all we need now is a way to assess which nucleotide passes the door at each step. This works with the help of an electrical membrane potential.

Basic nanopore sensing and sequencing schematic. (A) A single MspA channel is inserted into a lipid bilayer membrane. Positive (blue) K+ ions and negative (red) Cl- ions are contained on either side of the membrane. An applied electric field drives K+ ions from the trans chamber to the cis chamber and Cl- ions from cis to trans through MspA, producing the unblocked pore current. The region shaded in red at the base of the pore marks the constriction zone of MspA. (B) A motor enzyme controls the translocation of single stranded DNA through MspA. In the schematic depicted here, the motor enzyme translocates and unwinds double stranded DNA (dsDNA), allowing the passage of the ssDNA further into the pore with each step. The flow of both K+ and Cl- ions is modulated by the presence of the DNA within the pore. [Nova I.C. et al., PLOS ONE, 2017]

There are multiple different setups for the tech, but here are the essential steps:

  • the nucleotide chain one wants to sequence gets connected (‘ligated’) to an adaptor molecule, which is bound to an engineered motor protein (for DNA, often a helicase)
  • an electric field or the motor enzyme pushes the DNA/RNA chain through the nanopore (often a channel protein like MspA), or alternatively, the chain is allowed to diffuse through and is then pulled back through the pore by the motor enzyme
  • the trick is that the pushing or pulling of the chain through the nanopore happens at a constant speed (tick rate; full or half steps of nucleotide intervals) defined by the motor enzyme used
  • depending on which nucleotide bases sit in the pore at each moment, the ionic current through the nanopore (driven by the electrical potential across the membrane in which it is embedded) shifts characteristically (“squiggles”), which is what gets measured
  • the squiggle pattern is highly reproducible and can be interpreted by statistical models to derive the sequence
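The steps above can be mimicked in a few lines of Python. This is only a toy sketch: the current levels, the window weights and the noise are invented for illustration (they are not real pore data), and the names `kmer_level` and `simulate_squiggle` are hypothetical. The idea it shows is that the chain moves in discrete ticks, and each tick produces a noisy current level that depends on the handful of bases sitting in the pore.

```python
import random

K = 5  # number of bases assumed to influence the signal at once
BASE_EFFECT = {"A": 0.0, "C": 1.0, "G": 2.0, "T": 3.0}  # made-up values
WEIGHTS = [0.1, 0.2, 0.4, 0.2, 0.1]  # assumption: the central base dominates


def kmer_level(kmer: str) -> float:
    """Made-up mean current (in pA) while `kmer` sits in the pore."""
    return 80.0 + 10.0 * sum(w * BASE_EFFECT[b] for w, b in zip(WEIGHTS, kmer))


def simulate_squiggle(seq: str, samples_per_step: int = 8,
                      noise_sd: float = 0.5, seed: int = 0) -> list[float]:
    """One motor-enzyme tick per base; several noisy samples per tick."""
    rng = random.Random(seed)
    trace = []
    for i in range(len(seq) - K + 1):
        level = kmer_level(seq[i:i + K])
        trace.extend(rng.gauss(level, noise_sd) for _ in range(samples_per_step))
    return trace


trace = simulate_squiggle("GATTACAGGCTTAC")
print(len(trace), [round(x, 1) for x in trace[:8]])
```

A constant `samples_per_step` corresponds to the idealized constant tick rate from the list above; an irregular motor enzyme would make the number of samples per step vary.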

Sounds complicated? It is sophisticated for sure, but to visualize how this works, you can check out this little video.

While ideally we would like to measure only one nucleotide at a time, nucleotides are so small relative to the channel that each ‘squiggle’ measurement corresponds to five nucleotides in the channel. That’s why we need ‘signal interpretation’ software (hidden Markov models or, more recently, neural nets) to deconvolve the signal and identify the nucleotide at each position. Today, this still carries a small uncertainty and error rate, especially when the nucleotides in the pore are chemically modified or the motor enzyme has an irregular pull rate.
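To illustrate what such deconvolution means, here is a minimal Python sketch of a Viterbi-style decoder. It rests on loudly simplified assumptions: the window is three bases instead of five, the signal has already been cut into exactly one mean level per step, and the table `LEVEL` is invented so that every 3-mer gets its own level (real pores offer no such guarantee). Production basecallers are far more elaborate; the function name `decode` is hypothetical.

```python
from itertools import product

BASES = "ACGT"
K = 3  # toy window; the text above speaks of five nucleotides
VALUE = {b: i for i, b in enumerate(BASES)}
KMERS = ["".join(p) for p in product(BASES, repeat=K)]


def kmer_level(kmer: str) -> float:
    """Made-up current level; the middle base has the largest effect."""
    x0, x1, x2 = (VALUE[b] for b in kmer)
    return 60.0 + 4 * x0 + 16 * x1 + x2


LEVEL = {k: kmer_level(k) for k in KMERS}


def decode(levels: list[float]) -> str:
    """Find the k-mer path with the smallest total squared error.

    Consecutive k-mers must overlap by K-1 bases, which is what lets the
    decoder assign a single base to each position. `levels` must be non-empty.
    """
    cost = {k: (levels[0] - LEVEL[k]) ** 2 for k in KMERS}
    back = []
    for obs in levels[1:]:
        new_cost, pointers = {}, {}
        for k in KMERS:
            # Allowed predecessors end with the first K-1 bases of k.
            prev = min((b + k[:-1] for b in BASES), key=cost.__getitem__)
            new_cost[k] = cost[prev] + (obs - LEVEL[k]) ** 2
            pointers[k] = prev
        back.append(pointers)
        cost = new_cost
    path = [min(cost, key=cost.__getitem__)]
    for pointers in reversed(back):
        path.append(pointers[path[-1]])
    path.reverse()
    return path[0] + "".join(k[-1] for k in path[1:])


seq = "GATTACAGGCT"
observed = [LEVEL[seq[i:i + K]] for i in range(len(seq) - K + 1)]
print(decode(observed) == seq)
```

The squared-error cost corresponds to assuming Gaussian noise around each level. With modified bases or irregular stepping, that assumption breaks down, which is one way to see where the residual error rate comes from.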

However, the big advantage of nanopore sequencing over more classical PCR-based approaches is that it works without chopping the long nucleotide chains into smaller pieces and does not require amplification of those molecules. Biologically, getting the full-length sequence is very valuable, as it might contain information about isoforms or RNA/DNA modifications that would otherwise be lost. Furthermore, because no amplification, laborious library prep or ‘stitching’ together of the pieces is needed, unlike with PCR-based approaches, nanopore sequencers are small and portable and deliver real-time analysis. The enzymatic requirements for nanopore methods are also lower than for PCR-based approaches, which should eventually make this technology cheaper at scale and easier to use.

Putting this all together means: sequencing and sequencing applications will come to the masses.

Imagine a world where everybody can have a little hand-held device and take a quick swab from the environment (e.g. train stations, subways, buses) to see if there is some bacterial or viral contamination going on right now. Sounds futuristic? Scientific volunteers have been doing that since 2016 with projects like MetaSUB. Once the pandemic started, this initiative quickly expanded to check the environment for SARS-CoV-2 as well.

For all its achievements, the most exciting aspect of nanopore sequencing technology is that this ingenious mechanism is not yet fully explored and might open up novel questions and applications.

For example, could this principle of pulling a string of molecules through a pore and measuring the resulting changes in electric membrane potential also work for identifying other large molecules, let’s say chains of amino acids, a.k.a. proteins?

The next frontier: Peptide chains

A group led by Cees Dekker at the Technical University Delft in the Netherlands was wondering the same thing. In a paper published on bioRxiv, the preprint server for biological research, they provide a proof of principle that amino acid chains can be pulled through a nanopore channel and that they also change the electric membrane potential in a reproducible pattern based on sequence.

The core idea behind their setup is simple but smart; they created a peptide-DNA hybrid by chemically linking the two chains together (see below) and then just used an established DNA motor enzyme and channel protein to pull the DNA back (and with it, the attached amino acid chain at the end) through the pore.

Doing it like this also requires the chain to be first translocated through the membrane before being pulled back, which is easily facilitated by using the electrophoretic force pulling negatively-charged amino acids into the pore. From there, all the established principles of nanopore sequencing apply; the motor protein pulls at a predefined speed, the amino acid chain pulled through the nanopore causes characteristic ‘squiggles’ which can then be deconvoluted with software models to call position-specific amino acids, at least in theory.

Reading peptides with a nanopore sequencer. (A) The DNA-peptide conjugate construct consists of a peptide (pink) attached via a click linker (green) to an ssDNA strand (black). (B) The complementary oligo blocks the helicase, until it is pulled into the pore (b), causing the complementary strand to be sheared off (c), whereupon the helicase starts to step along DNA. (C) As the helicase walks along the DNA, it pulls it up through the pore, resulting in (a) a read of the DNA portion followed by (b) a read of the attached peptide. (D) Typical nanopore read of a DNA-peptide conjugate (black), displaying clear step-like ion currents (identified in red). (E) Consensus sequence of ion current steps (red), which for the DNA section is closely matched by the predicted DNA sequence (blue). [Brinkerhoff et al., bioRxiv, 2021]

Remarkably, their approach actually works. For their experiment, they used three peptides that were identical except for a single amino acid, and ran them through multiple pores many times to establish differences in the squiggle pattern. The quote below is a good demonstration of why deconvolution is such an important part of establishing the sequence:

Even […] single-site variation were found to affect several ion current steps, because multiple amino acids around the pore constriction of MspA affect the ion current blockage level [read: ‘squiggle’] — Brinkerhoff et al., bioarxiv, 2021

The average single-read accuracy of their approach was 87%, which is quite good for such early experiments. Single-read accuracy is also still an issue in the more mature DNA nanopore sequencing, where the most common source of error is an irregular pull rate of the motor enzyme. So there is definitely room for improvement on this end, as accuracy is among the most important metrics for this technology.

Detection of single amino acid substitutions in single peptides. (A) Consensus ion current sequences for each of the three measured variants (D, gold; W, red; G, blue), which differ significantly at the site of the amino acid substitution. (B) Difference in ion current between the W (red) and G (blue) variants and the D variant. Error bars are standard deviations. (C) Confusion matrix showing error modes of a blind classifier in identifying variants of reads, demonstrating an 87% sequencing accuracy. (D) All-atom model where a reduced-length MspA pore (grey) confines a polypeptide chain (Glu: green, Asp: light blue; Cys: beige). The top end of the peptide is anchored using a harmonic spring potential, representing the action of the helicase at the rim of a full-length MspA. Water and ions are shown as semitransparent surface and spheres, respectively. (E) Top: Ionic current in MspA constriction versus z coordinate of the mutated residue backbone from MD simulations. Bottom: Fraction of nanopore constriction volume available for ion transport. Vertical and horizontal error bars denote standard errors and standard deviations, respectively. (G,H) Representative molecular configurations observed in MD simulations of peptide variants. Glycine and tryptophan residues are shown in dark blue and red, respectively. Significant peptide/pore surface interactions are observed. [Brinkerhoff et al., bioRxiv, 2021]
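For readers wondering what a “blind classifier” and a “confusion matrix” look like in practice, here is a minimal Python sketch. Everything in it is an illustrative assumption: the consensus levels are invented numbers, and the nearest-consensus rule is just one simple way to classify reads, not necessarily the method used in the preprint.

```python
# Invented consensus step levels (arbitrary units) for three variants that
# differ only around the substituted site. These are NOT data from the paper.
CONSENSUS = {
    "D": [0.42, 0.55, 0.31, 0.48, 0.60],
    "W": [0.42, 0.47, 0.22, 0.41, 0.60],
    "G": [0.42, 0.61, 0.39, 0.52, 0.60],
}


def classify(read: list[float]) -> str:
    """Assign a read to the variant whose consensus is closest (least squares)."""
    def distance(variant: str) -> float:
        return sum((r - c) ** 2 for r, c in zip(read, CONSENSUS[variant]))
    return min(CONSENSUS, key=distance)


def confusion_matrix(labelled_reads):
    """Count how often each true variant is called as each variant."""
    counts = {t: {p: 0 for p in CONSENSUS} for t in CONSENSUS}
    for truth, read in labelled_reads:
        counts[truth][classify(read)] += 1
    return counts


def accuracy(matrix) -> float:
    """Fraction of reads on the diagonal of the confusion matrix."""
    total = sum(sum(row.values()) for row in matrix.values())
    return sum(matrix[v][v] for v in matrix) / total


# A made-up noisy read of the W variant.
print(classify([0.43, 0.49, 0.24, 0.40, 0.59]))
```

Note how several neighbouring steps differ between the variants, not just one. That is the effect described in the quote above, and it is what gives the classifier more than a single data point to work with.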

However, the researchers found a neat work-around for the limited single-read accuracy. By increasing the concentration of motor enzyme, they (and others) have observed that a second helicase is ‘queued up’ behind the first; when the first one falls off, the chain gets sucked into the pore again, and the second helicase pulls it back out, allowing another ‘sequencing’ round for the same peptide chain.

When the first helicase reaches the linker at the end of the DNA section, it can no longer process and falls off. The DNA-peptide conjugate is then immediately pulled back into the nanopore such that the queued helicase, which is still bound to the DNA, takes control as the new anchoring enzyme. This effectively ‘rewinds’ the system and reinitiates a new independent read of the peptide sequence. — Brinkerhoff et al., bioarxiv, 2021

Re-reading of a single peptide sequence. (A) Highly repetitive ion current signal corresponding to numerous re-reads of the same section of an individual peptide (in this case, the G-substituted variant). The expanded plot below shows a region that contains four rewinding events (red dashed lines), where the trace jumps back to level 52 ± 2 of the consensus displayed in Fig. 2A. (B) Re-reading is facilitated by helicase queueing, where (a) a second helicase binds behind the primary helicase that controls the sequencing, re-reading starts when (b) the primary helicase dissociates, and (c) the secondary one becomes the primary helicase that drives a new round of sequencing. (C) By using information from multiple re-reads of the same peptide, the identification accuracy can be raised to very high levels of fidelity. These results indicate that with sufficient numbers of re-reads, random error can be eliminated and single-molecule error rate can be pushed lower than 1 in 10^6 even with poor single-pass accuracy. Inset is a logarithmic plot of the error rate = 1 - accuracy. [Brinkerhoff et al., bioRxiv, 2021]

With multiple readings of the same molecule, they have shown that an accuracy of almost 100% can usually be achieved.
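Why do re-reads help so much? A back-of-the-envelope model gives some intuition. The Python sketch below assumes that each read is independently correct with probability 0.87 (the single-read accuracy quoted above), that the final call is a simple majority vote, and, pessimistically, that all wrong reads agree on the same wrong answer. This is my own simplification, not the authors' analysis, and the function names are hypothetical.

```python
from math import comb


def consensus_error(n_reads: int, p_correct: float = 0.87) -> float:
    """Probability that a majority vote over `n_reads` reads is wrong."""
    if n_reads % 2 == 0:
        raise ValueError("use an odd number of reads to avoid ties")
    q = 1.0 - p_correct
    first_losing = n_reads // 2 + 1  # wrong votes needed to outvote the truth
    return sum(
        comb(n_reads, k) * q**k * p_correct ** (n_reads - k)
        for k in range(first_losing, n_reads + 1)
    )


def reads_needed(target_error: float, p_correct: float = 0.87) -> int:
    """Smallest odd number of reads whose majority-vote error is below target."""
    if p_correct <= 0.5:
        raise ValueError("majority voting only helps if reads beat a coin flip")
    n = 1
    while consensus_error(n, p_correct) >= target_error:
        n += 2
    return n


for n in (1, 3, 5, 11, 21):
    print(n, f"{consensus_error(n):.2e}")
print("reads needed for an error below 1e-6:", reads_needed(1e-6))
```

In this model the error shrinks roughly exponentially with the number of re-reads, which is why even a mediocre single-pass accuracy can be turned into a very reliable call, provided the errors really are random rather than systematic.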

As always with these early studies, there are a lot of open questions, from fundamental limitations to scalability to price, so I reached out to Henry Brinkerhoff, a postdoctoral fellow and first author on the preprint, to get his perspective on the implications of this technology.

He sees the major limitations at the moment in both the read length of the peptide and the single-read accuracy. However, he is optimistic that more engineering and some tricks that worked for nanopore DNA sequencing can be applied to nanopore peptide sequencing as well.

• The read length. This could be engineered around, and really isn’t much of a limitation in the first place since the reads are long enough to (in principle) uniquely identify just about any protein.

• The accuracy in discriminating the 20 amino acids plus PTMs instead of just 4 bases. We address this though the re-reading method shown in the paper, and there are a big library of known nanopore techniques for getting accuracy higher. And even if the accuracy remains too low to do true de novo sequencing, it will still be straightforward to do protein “fingerprinting” or identification, identification of splice variants, and variant mapping (like in the recent publication) or PTM mapping. — Henry Brinkerhoff (personal communication)

On the issue of scalability, he is very confident that at least from the nanopore side, it is virtually a plug-and-play system.

The technology as it is could be directly incorporated into a MiniON or similar platform with minimal effort: just put MspA in the membranes, add Hel308, ATP, and the DNA-peptide conjugates and you’re off and running on 1000+ pores. — Henry Brinkerhoff (personal communication)

However, since the peptide chains need to be chemically linked to DNA chains, and we have no great way of doing that for native molecules, or in a targeted fashion within cells, it is unrealistic to expect nanopore sequencers to provide real-time analysis of environmental samples anytime soon. Many more protocols will need to be adapted, and problems solved, before we have a comparable proteomic tool for mass adoption like the MinION DNA sequencer.

Nevertheless, for research purposes, this proof-of-principle work is very promising. Asked what he is most excited about, Henry replied:

I think the key thing to emphasize with this technology, the thing that makes it new and exceptional, is that it is a single-molecule sequencing technology. Existing protein sequencing technologies are limited by both absolute and relative sample size, making it impossible to measure samples as small as a single cell or as heterogeneous as the cytoplasm. A single-molecule protein sequencer could let us begin to answer the seemingly basic but presently inapproachable question of what exactly is floating around in a cell at any given time.

The other exciting thing is that as a physical technique, rather than a technique relying on specific chemical labeling, we expect that it will be pretty easy to extend this method to post-translational modification (PTM) detection, which is a huge issue in biology and medicine. — Henry Brinkerhoff (personal communication)

I couldn’t agree more.

Conclusion

Nanopore sequencing technology is exciting. While it is established for nucleotide sequencing, there is no physical reason why the mechanistic principle of pushing a chain through a constricted pore and measuring membrane potential could not work for other long-chain molecules with a predefined set of subunits. Here, I reported on the first of these ventures toward full-length peptide sequencing, which could revolutionize the way proteomics is done. I expect many more to follow.

For all the good it will do for science and basic research, I personally believe that the true potential of real-time, easy-to-use molecular sequencers has not yet been grasped by wider society.

Imagine local, decentralized and crowd-sourced biodiversity observation and conservation; or self-sufficient and automated medical diagnostics; or global biosecurity monitoring; so many aspects of our lives can be improved by giving people the tools to participate and work toward shared goals.

Some of the tech is already here, most will come very soon. So take heart, help me spread the word, and join me and others in being optimistic about our technological future, together.

Further reading on ONT technology:

History of Nanopore sequencing

This story is part of advances in biological sciences, a science communication platform that aims to explain ground-breaking science in the field of biology, medicine, biotechnology, neuroscience and genetics to literally everyone. Scientific understanding has too many barriers, let’s break them down!

Philipp Markolin
