Curing Chronic Diseases with Nanopore Protein Sequencing

Kiran Mak
MaculaX Therapeutics
8 min read · Apr 25, 2020

Proteins carry out all of the physiological processes in the human body, yet we only know the structure of 20% of them. That means we have a very limited understanding of the biological complexities that make life possible.

Further, if something goes wrong and a person gets a chronic disease, often we are unable to provide a cure simply because we do not know the detailed mechanisms causing it. And if you have protein cascades in which multiple different proteins interact and trigger each other like in Alzheimer’s, the problem becomes exponentially harder.

We literally do not understand at a small scale how our body works. If we were able to gain structural information on all of the proteins in the human proteome, we’d be able to cure chronic diseases and accelerate drug discovery.

But current methods of protein structure determination are expensive and inaccurate, often requiring weeks or months to execute. At Macula X, we don’t believe that needs to be the case. Learn more about our vision.

We’ve proposed a methodology to obtain high-resolution spatial information in a time- and cost-efficient way.

How does the technology work?

To determine the structure and sequence of a protein, our workflow is:

  1. Purify the protein
  2. Direct the protein to the nanopore
  3. Pass the protein through a nanopore
  4. Rotate the protein using ATP synthase
  5. Expose it to gold nanoparticles to precisely determine amino acid composition on the surface
  6. Measure the results from the nanopore and nanoparticles
  7. Use MakuAnalyze software to synthesize the data
  8. Use MakuFold to render an image and use machine learning to identify structural motifs

Let’s take a look at the unique aspects of our design.

2. Direct the protein to the nanopore

We propose imaging proteins in vivo in addition to ex vivo, which will allow us to accurately sample biologically active conformations rather than artificially constructed ones.

With the explosion in new genomics, epigenomics, proteomics, and transcriptomics data, we believe that we can direct proteins to our nanopore site using either a nucleic acid or protein beacon that has predicted or already known binding affinity to the protein in question. Here are some ways to elucidate the composition of this molecular beacon.

  • ChIP-seq data: use data showing how proteins bind to DNA to pick specific DNA sequences as potential molecular beacons.
  • Biochemical transduction pathways: use pre-existing knowledge about protein-protein interactions (PPI) to culture another protein with known high affinity to the protein of interest.
  • If no ChIP-seq or PPI data exists, we propose conducting HT-SELEX analyses that search randomly sampled RNA libraries for short RNA sequences that bind potently to the protein of interest. Tools like Aptasim can simulate error-prone PCR in silico, and we project that by 2030, HT-SELEX will be computationally robust and feasible.

3. Nanopore technology

In the past decade, we’ve seen an explosion in the use of nanopore technology for DNA sequencing and, even more recently, protein sequencing. We think the same approach can be adapted, this time to explore and determine protein structures.

In DNA sequencing, a nanopore, or nano-scale opening in a membrane, has ionic current passed through it. A single DNA strand is then fed through the opening nucleotide by nucleotide, which perturbs the current flow. These fluctuations are characteristic of each base and so can be used to identify the exact sequence.
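To make the idea concrete, here is a minimal sketch of how a current trace could be turned into a sequence by matching each measurement to the closest reference level. The current values and the names `REFERENCE_LEVELS_PA`, `call_base`, and `call_sequence` are made up for illustration; real base callers work on signals shaped by several neighbouring bases at once and are far more sophisticated.

```python
# Hypothetical mean current levels (picoamps) for each base.
# These numbers are illustrative only, not measured values.
REFERENCE_LEVELS_PA = {"A": 52.0, "C": 47.0, "G": 58.0, "T": 42.0}


def call_base(current_pa):
    """Return the base whose reference level is closest to the measurement."""
    return min(
        REFERENCE_LEVELS_PA,
        key=lambda base: abs(REFERENCE_LEVELS_PA[base] - current_pa),
    )


def call_sequence(trace_pa):
    """Call one base per current measurement in the trace."""
    return "".join(call_base(level) for level in trace_pa)


if __name__ == "__main__":
    noisy_trace = [51.2, 46.5, 58.9, 41.7, 43.0]
    print(call_sequence(noisy_trace))  # prints ACGTT
```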

Companies such as Oxford Nanopore Technologies have built sequencers on this principle, such as the ION series: small handheld devices that allow for convenient, portable, high-throughput DNA sequencing.

And even more recently, scientists have learned how to use the same technology to identify all 20 amino acids.

We propose using a similar method, except that instead of passing a single amino acid through at a time, we analyze the entire protein. (Separating the amino acids would obviously disrupt the structure.)

Since a protein is a folded tangle, not a neat chain of amino acids, we’d be looking at multiple amino acids at a time, which adds significantly more complexity. Most notably, if we’re getting a single readout for each cross-section of the protein, how can we separate that signal to determine the structure and amino acid composition of the protein?

4. Rotation Using ATP Synthase

A single pass through the nanopore will not give enough information, but if we rotate the protein through multiple angles, we will gain multiple representations of the same data which can be combined to provide a full picture.

To visualize this, think of a table from the top view. Now from the side. The image you get is vastly different. We’re using this same principle of looking at something from multiple angles to understand the 3D structure.

More precisely, the nanopore effectively takes repeated current measurements that indicate the amount of protein that blocks the flow of current. It can be thought of as measuring “area” of a cross-section of the protein. Then, rotating the protein through multiple angles will give different cross-section stacks that all represent the same thing in a different way.

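Take a cylinder as an example. Sliced perpendicular to its axis, every cross-section is a circle; sliced along its axis, every cross-section is a rectangle. Examining it at different angles measures different cross-sections.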

Combining all of the data then gives a complete picture.
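The cylinder picture can be sketched numerically. The snippet below is only an illustration under simple assumptions: the object is a cloud of grid points, the pore axis is z, and the number of points in each thin slab along z stands in for the cross-sectional "area" that a current reading would reflect. The function names and dimensions are hypothetical and not part of any proposed software.

```python
import math


def cylinder_points(n_radius=10, n_half_length=30, step=0.1):
    """Sample grid points inside a cylinder whose axis lies along z."""
    points = []
    for i in range(-n_radius, n_radius + 1):
        for j in range(-n_radius, n_radius + 1):
            if i * i + j * j > n_radius * n_radius:
                continue  # outside the circular cross-section
            for k in range(-n_half_length, n_half_length):
                # Offset by half a step so no point sits on a slab boundary.
                points.append((i * step, j * step, (k + 0.5) * step))
    return points


def rotate_about_x(points, angle_deg):
    """Rotate every point about the x axis by the given angle."""
    angle = math.radians(angle_deg)
    c, s = math.cos(angle), math.sin(angle)
    return [(x, y * c - z * s, y * s + z * c) for x, y, z in points]


def cross_section_profile(points, slab=0.5):
    """Count points per slab along z; counts are proportional to area."""
    counts = {}
    for _, _, z in points:
        index = math.floor(z / slab)
        counts[index] = counts.get(index, 0) + 1
    return [counts[i] for i in sorted(counts)]


if __name__ == "__main__":
    cloud = cylinder_points()
    for angle in (0, 45, 90):
        print(angle, cross_section_profile(rotate_about_x(cloud, angle)))
```

At 0 degrees the profile is long and flat (the same circle in every slab), while at 90 degrees it is short and peaked in the middle. The same object gives a different stack of cross-sections at each angle, which is the extra information the rotation step is meant to supply.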

5. Gold Nanoparticles

Gold nanoparticles in the nanopore will allow for specific determination of surface level amino acids.

Researchers have recently used molecular dynamics to show that nanoparticles of the same size have different binding energies to each of the twenty amino acids. Moreover, each of the twenty amino acids has preferential binding to a nanoparticle of different diameter due to the chemical composition and size of its side-chain.

In other words, each AuNP (gold nanoparticle) has a different binding affinity for each amino acid, and each amino acid binds preferentially to AuNPs of a particular size, as evidenced by the different binding free energies.

By translating the differential binding affinities of each of the amino acids into fluorescent signals, we can computationally identify the amino acids composing a protein.

We envision using the ProtSA web server to identify the average solvent accessible surface area per residue in the unfolded conformations of the protein structure. This should provide insights into the number of gold particles needed to cover the structure.
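As a rough back-of-the-envelope illustration of that last point, one could divide the total solvent-accessible surface area by the footprint of a single nanoparticle. This is only a sketch under strong assumptions that are not from ProtSA or any published method: each particle is assumed to cover a circular patch equal to its own cross-section, packing inefficiency is ignored, and the per-residue values are placeholders.

```python
import math


def particles_to_cover(per_residue_sasa_A2, particle_diameter_nm):
    """Estimate how many nanoparticles are needed to cover a surface.

    per_residue_sasa_A2: solvent-accessible area per residue, in square angstroms.
    particle_diameter_nm: nanoparticle diameter in nanometres.
    Assumes each particle covers a disc of its own cross-sectional area.
    """
    total_area_A2 = sum(per_residue_sasa_A2)
    radius_A = particle_diameter_nm * 10.0 / 2.0  # 1 nm = 10 angstroms
    footprint_A2 = math.pi * radius_A ** 2
    return math.ceil(total_area_A2 / footprint_A2)


if __name__ == "__main__":
    # Placeholder input: 50 residues at 100 square angstroms each.
    placeholder_sasa = [100.0] * 50
    print(particles_to_cover(placeholder_sasa, particle_diameter_nm=2.0))  # prints 16
```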

7. MakuAnalyze Software to Synthesize Data

The nanopore current fluctuations across multiple rotations will be combined, likely via the use of Fourier transforms, to decompose the signal into each amino acid component part. This gives structural information which combined with more detailed sequential data from the gold nanoparticles will enable us to analyze the protein.
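To show what a Fourier decomposition does in its simplest form, the sketch below builds a synthetic trace from two sine components and recovers their frequencies with a naive discrete Fourier transform. The trace is invented for illustration and the function names are hypothetical; how frequency components would map onto amino acids or structure is an open question that this example does not address.

```python
import cmath
import math


def dft_magnitudes(signal):
    """Return normalized DFT magnitudes for frequencies 0..N/2 (naive O(N^2))."""
    n = len(signal)
    magnitudes = []
    for k in range(n // 2 + 1):
        total = sum(
            x * cmath.exp(-2j * math.pi * k * t / n)
            for t, x in enumerate(signal)
        )
        magnitudes.append(abs(total) / n)
    return magnitudes


def dominant_frequencies(signal, count):
    """Return the `count` non-zero frequency bins with the largest magnitude."""
    magnitudes = dft_magnitudes(signal)
    ranked = sorted(range(1, len(magnitudes)), key=lambda k: magnitudes[k], reverse=True)
    return sorted(ranked[:count])


if __name__ == "__main__":
    n = 64
    # Synthetic trace: a strong component at bin 3 plus a weaker one at bin 10.
    trace = [
        math.sin(2 * math.pi * 3 * t / n) + 0.5 * math.sin(2 * math.pi * 10 * t / n)
        for t in range(n)
    ]
    print(dominant_frequencies(trace, 2))  # prints [3, 10]
```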

8. MakuFold will use ML to Recognize Structural Motifs

We propose using machine learning algorithms (especially dimensionality reduction algorithms such as PCA) to determine, through clustering, the secondary structure elements of a protein, such as:

  • alpha helices
  • 3₁₀ helices
  • beta pleated sheets
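As a small illustration of the dimensionality reduction step, the sketch below computes the first principal component of a handful of feature vectors using power iteration, then splits the samples by the sign of their projection. Everything here is an assumption for illustration: the three-dimensional feature vectors are invented, `EXAMPLE_FEATURES` and the function names are hypothetical, and the sign split is a stand-in for a real clustering algorithm.

```python
import math

# Invented feature vectors: two loose groups of three samples each.
EXAMPLE_FEATURES = [
    (0.1, 0.2, 0.0), (0.0, -0.1, 0.1), (-0.1, 0.1, -0.1),
    (5.0, 5.1, 1.0), (5.2, 4.9, 0.9), (4.9, 5.0, 1.1),
]


def first_principal_component(rows, iterations=200):
    """Return (unit direction, per-sample scores) for the first principal component."""
    n, d = len(rows), len(rows[0])
    means = [sum(row[j] for row in rows) / n for j in range(d)]
    centered = [[row[j] - means[j] for j in range(d)] for row in rows]
    # Sample covariance matrix of the centered data.
    cov = [
        [sum(c[a] * c[b] for c in centered) / (n - 1) for b in range(d)]
        for a in range(d)
    ]
    # Power iteration converges to the eigenvector with the largest eigenvalue.
    direction = [1.0] * d
    for _ in range(iterations):
        product = [sum(cov[a][b] * direction[b] for b in range(d)) for a in range(d)]
        norm = math.sqrt(sum(x * x for x in product))
        direction = [x / norm for x in product]
    scores = [sum(c[j] * direction[j] for j in range(d)) for c in centered]
    return direction, scores


def split_by_sign(scores):
    """Assign each sample to group 0 or 1 by the sign of its score."""
    return [0 if score < 0 else 1 for score in scores]


if __name__ == "__main__":
    direction, scores = first_principal_component(EXAMPLE_FEATURES)
    print(split_by_sign(scores))
```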

Further, we see the possibility of utilizing Transformer models such as RoBERTa, which have recently achieved breakthrough levels of accuracy in NLP. Although primarily used in NLP, these methods are also beginning to be used in a wider variety of tasks, including computational chemistry and small-molecule drug design.

As opposed to traditional NLP methods like LSTMs and other RNNs, they capture relevant long-range dependencies in data by using something called “attention mechanisms”, where they observe all of the data at once (instead of sequentially). This means they are not bound by only immediately recent data.
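For readers who want to see what attention looks like, here is a minimal sketch of scaled dot-product attention, the core operation inside Transformers, written in plain Python. The toy vectors are invented; real models learn the queries, keys, and values and use many attention heads.

```python
import math


def softmax(values):
    """Convert raw scores into weights that sum to 1."""
    peak = max(values)
    exps = [math.exp(v - peak) for v in values]
    total = sum(exps)
    return [e / total for e in exps]


def attention(queries, keys, values):
    """Scaled dot-product attention over whole sequences at once.

    Returns (outputs, weights), where weights[i][j] is how strongly
    position i attends to position j.
    """
    dim = len(keys[0])
    outputs, weights = [], []
    for query in queries:
        scores = [
            sum(q * k for q, k in zip(query, key)) / math.sqrt(dim)
            for key in keys
        ]
        row = softmax(scores)
        weights.append(row)
        outputs.append([
            sum(w * value[j] for w, value in zip(row, values))
            for j in range(len(values[0]))
        ])
    return outputs, weights


if __name__ == "__main__":
    # The query matches the last key, so most weight lands on the final
    # position no matter how far away it is in the sequence.
    keys = [[0.0, 1.0], [0.0, 1.0], [0.0, 1.0], [1.0, 0.0]]
    values = [[1.0], [2.0], [3.0], [10.0]]
    outputs, weights = attention([[4.0, 0.0]], keys, values)
    print(weights[0], outputs[0])
```

Because every position is scored against every other position in one step, distance in the sequence does not weaken the connection, which is the property the paragraph above describes.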

Further Considerations

This technology is still in the early stages, though. While we have enough validation to be convinced that our ideas are possible, there are still numerous questions to consider in development:

  • Do existing mathematical techniques like independent component analysis (ICA), SVD, and Fourier Transform allow us to decompose the electrical signal perturbations into high resolution secondary structure motifs?
  • Will the cost of the gold nanoparticles still allow this to be an economically viable proposal? Based on our calculations of the average solvent-accessible surface area and supporting stoichiometric calculations, the maximum amount of gold nanoparticles needed for sequencing one protein will likely be around one milligram, costing ~$80.
  • Will our method be able to gain access to elements of structure outside of the solvent accessible surface area (especially the hydrophobic core)?
  • How many nanopores will be required for structure determination? Current ION products from Oxford Nanopore Technologies use anywhere from 300 to 10,000 nanopores within one sequencing flow cell. We project that, given the additional complexities of protein sequencing, anywhere from 10,000 to 1,000,000 nanopores will be required in tandem for high-quality structure determination at the single- and sub-angstrom level.

This technology has so much potential to revolutionize structural biology and medicine. Being able to catalog protein structures through a Protein Structurome Project would help not only Macula X, but also other researchers:

  • Move forward in being able to cure chronic diseases like Alzheimer’s and cancer
  • Develop novel drugs that can more successfully target proteins
  • Better utilize proteins in protein therapeutics
  • More accurately simulate clinical trials and reduce the time and money spent on trials that failed simply because we did not understand the underlying biology

Ultimately, protein structure determination is fundamental to medicine and scientific curiosity alike. It is becoming a key bottleneck in fields such as biochemistry, longevity, and machine learning.

Unlocking therapeutics and understanding the human code at a much deeper level means looking beyond our genetic code (DNA and RNA) to the actual workers of the cell: the proteins, which carry out all important cellular functions.

We envision a world where drug discovery doesn’t take two decades. Where vaccine development to slow disastrous pandemics like COVID-19 doesn’t take a whole year. Where we’re able to provide treatments and cures to the millions of people around the world with chronic diseases that right now have no definitive cure.

Finally, we’d appreciate your help! If you have any advice or input on our technological methods, please reach out.

Macula Therapeutix is a company founded by Kiran Mak and Mukundh Murthy, two innovators passionate about changing the world by changing one of the most traditional and unquestioned methods that dictate how to acquire structural biology data.
