An Engineered Origin for the SARS-CoV2 Genome.

Sherlock G.Nomes
Jul 11, 2021


Despite doubt regarding the origins of SARS-CoV2 and the possibility of deliberate engineering of this pandemic virus, an engineering workflow for SARS-CoV2 using available data sets and techniques common to molecular biology labs is explained here. Each step in the workflow has historical precedent in work on SARS-like coronaviruses from China.

For the sake of clarity, only viruses that can reasonably be assumed to come from real and natural sources were included in this analysis. More scientific detail on excluded genomes and the reasons for their exclusion will be presented at the end of the article.

The idea that the genetics of this virus are not consistent with evolutionary theory was published as early as August 2020 by Sirotkin & Son (1). Until now, this idea has been mostly ignored, misunderstood, and often turned on its head to fit various natural origin theories. This article introduces simple reverse genomic engineering methods that, when applied to publicly available data sets, confirm inconsistencies with evolutionary theory and reveal potential engineered origins for novel RNA sequence in the SARS-CoV2 genome.

The Synthetic Backbone — 3 Pieces from 3 Viruses

A pair-wise alignment of an engineered in-silico genetic backbone to the SARS-CoV2 genome has a 94.9% DNA Sequence Identity. This proposed backbone (named SynCoV1) consists of 4 distinct genomic pieces. Excluding the spike protein, SynCoV1 has a 97.3% DNA Sequence Identity to SARS-CoV2. The first piece is a consensus sequence of 3 bat coronaviruses: ZC45 (2), RmYN02 (3), and RpYN06 (4). The two other backbone pieces use natural bat coronavirus sequence (RmYN02 and RpYN06, respectively).

Piece One — The Big ORF 1ab

The consensus genome sequence (called BB3C) was created in-silico (online with a laptop) from ZC45, RmYN02, and RpYN06. This in-silico construct shares a 97.5% DNA Sequence Identity with ORF1ab, the first genes in the SARS-CoV2 virus. The DNA Sequence Identities for the ORF1ab genes of the 3 bat coronaviruses ZC45, RmYN02, and RpYN06 alone are 88.5%, 97.2% and 97.1%, respectively.

An Engineered Backbone. Proposed pieces (contigs) highlighted in green. Although the BB3C genome has higher homology from ORF3a to the Membrane protein than RmYN02 does, ORF6 is clearly more highly homologous to RmYN02 than to any other virus, so we propose a single contig extending from after spike through ORF6 as the simplest assembly. Spike protein is analyzed separately and constitutes a likely fourth contig inserted into this engineered backbone. The colored bar on the top represents approximate relative sizes of each gene.

An ORF is a Gene. ORF is shorthand for Open Reading Frame and is just one example of scientists using confusing insider lingo. This in-silico evolution towards the biggest gene in the virus, which comprises 79% of the entire genome, is not consistent with expectations from evolutionary theory and suggests an engineered approach to a large majority of the novel SARS-CoV2 genetic backbone.

The in-silico BB3C piece (contig) explains over 80 novel but naturally viable mutations potentially engineered into the SARS-CoV2 genome. These would not have been included by using any of the ZC45, RmYN02 or RpYN06 genomes alone as a backbone in this region. This leaves only an estimated 532 novel mutations after in-silico consensus left to evolve in this portion of the genome vs. 830 novel mutations from the highly homologous but unaltered bat coronavirus backbones RmYN02 and RpYN06.

This evolutionary feat, conducted with free bioinformatic tools, cuts decades off the natural timelines required for bat coronavirus genomes to evolve into a new virus genome like SARS-CoV2. Learn more about harnessing natural diversity for genetic engineering in the Draper Prize speech presented later in this article.

Consensus backbones consolidate conserved genomic sequences found in nature. Using DNA or RNA sequences that are highly conserved and popular in nature is a good way to find and exploit new genomic diversity in an efficient manner. Viruses with mutations created from consensus are very likely to survive, as the mutations have successful evolutionary precedent (6).

An in-silico consensus backbone to create a novel bat coronavirus was pioneered in 2008 by famous Tar Heel coronavirologist Ralph Baric et al. (5) using the HKU3–1, HKU3–2, HKU3–3, and RP3 bat coronaviruses. These were chosen at the time because they seemed to be the closest natural reservoirs to the virus that caused the 2003 SARS outbreak.

The purpose of their study was to overcome the limitations of studying live bat coronaviruses in the lab, the main one being they were not able to survive in the lab. Hence, the goal was to engineer a new virus that would survive in the laboratory conditions (in-vitro), one genetically distinct but very similar to the natural blueprints used to create it. This makes it easier to study and understand underlying genomic and functional biochemical relationships in the SARS-like potential pandemic pathogens they were so worried about and well-funded to study.

Although the BB3C genome does predict over 300 new SARS-CoV2 mutations in ORF1ab, it did not predict novel genetic diversity seen in other genes for SARS-CoV2 (ORF6, ORF7b, and ORF8). These results also help highlight that consensus sequences are not guaranteed to predict novel diversity from either natural or engineered origins. This also suggests the SARS-CoV2 genome was assembled from separate and distinct sections.

Although the assembly of identifiable and distinct sections of the backbone does suggest an engineered origin, this fact alone does not rule out natural recombination. One study posits that with 40-50 years of cross-viral recombination amongst a host of different bat coronaviruses known to Southeast Asia (but not Wuhan), the backbone of the SARS-CoV2 virus could have evolved naturally (7). It does not, however, provide any explanation for spike protein evolution beyond recent recombination of an unknown natural chimeric backbone with spike protein sequence from the mythical pangolin genome.

Piece Two — Envelope, Membrane and More.

After the Big ORF1ab and the Spike Protein are 4 more genes called ORF3a, the Envelope protein, the Membrane Protein, and ORF6. The numbers are just their relative physical order in the genome, and important genes get name upgrades related to their function.

DNA Sequence Identity between this single continuous string of DNA (contig) and the bat coronavirus RmYN02 is 96.3%. Thus 11% of the SARS-CoV2 genome’s origin can be explained by a single contig from a natural virus ancestor.

Viruses are some of the simplest DNA and RNA based lifeforms on earth. Each of these genes is essential for the virus’s survival. Learn about their functions.

Piece Three — The Nucleocapsid Neighborhood.

After the contig described above are 4 more genes called ORF7b, ORF8, the Nucleocapsid Protein, and ORF10.

DNA Sequence Identity between SARS-CoV2 and a third piece (contig) from a different bat coronavirus, RpYN06, is 98.2%. Thus 7% of the SARS-CoV2 genome’s origin can be explained by a single contig from a natural virus ancestor.

UPDATE: September 2021

New bat coronavirus genomes from Laos have been recently published (33) and show a high sequence identity to SARS-CoV2. A phylogenetic tree built with a Hidden Markov Model shows three of these new viruses have a closer common ancestor than the controversial RaTG13 sequence provided by the Wuhan Institute of Virology (34).

Three genome sequences (BANAL-20- series) share a more recent common ancestor with SARS-CoV2 than RaTG13 does. The point labeled 1085.25 is a hypothetical ancestor predicted by Clustal Omega alignment and DNA distance phylogenetic calculation.

One of these new sequences has the highest DNA Sequence Identity (96.8%) to SARS-CoV2 of any published bat coronavirus sampled in nature. Genomic pieces from these bats may have ended up in SARS-CoV2 through either natural or synthetic recombination.

Pairwise alignment by Emboss Stretcher. MZ937000.1 is BANAL-20–52, a bat coronavirus isolated in Laos. MN985325 is human coronavirus SARS-CoV2 isolated in Washington, USA and published by the CDC.

THE S.G.N. PHYLOGENETICS CHALLENGE

It was suggested by some Zoonati scientists that an in-silico or synthetic coronavirus would be obvious in a phylogenetic analysis. In the below phylogenetic tree (Clustal Omega; Hidden Markov Model — displaying DNA distance), there are 3 known synthetic viruses and 2 more whose origins are purported by this article and others to be synthetic.

Can you find the synthetic viruses in the above tree? Although there are at least 5 empirically validated mathematical models available to phylogeneticists, Hidden Markov Models are less susceptible to sub-optimal guide trees than simpler methods. MAFFT and MUSCLE are known to deteriorate phylogenetics for easy alignments (35).

Creating a Consensus — DIY Instructions

Making a consensus DNA sequence is a pretty straightforward calculation for 3 different viruses. First download the viruses (ZC45 from NCBI, RmYN02 & RpYN06 from GISAID) in what is essentially a text file format called ‘fasta’, which is editable in Microsoft Notepad.

The viruses you want to align are cut and pasted into one ‘fasta’ file, then uploaded online for genomic alignment with Clustal Omega provided by the EMBL-EBI (8). The results are then downloaded as an alignment of all the DNA sequences in the ‘fasta’ format.
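For readers who would rather script the cut-and-paste step, here is a minimal Python sketch of merging several ‘fasta’ texts into one multi-record file. The function names `parse_fasta` and `to_fasta` and the toy records are illustrative assumptions, not part of any tool mentioned in this article, and the 60-character line width is just a common convention.

```python
def parse_fasta(text):
    """Parse FASTA-formatted text into a list of (header, sequence) tuples."""
    records = []
    header, chunks = None, []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue  # skip blank lines
        if line.startswith(">"):
            if header is not None:
                records.append((header, "".join(chunks)))
            header, chunks = line[1:], []
        elif header is None:
            raise ValueError("sequence data found before the first '>' header")
        else:
            chunks.append(line)
    if header is not None:
        records.append((header, "".join(chunks)))
    return records


def to_fasta(records, width=60):
    """Write (header, sequence) tuples back out as FASTA text, wrapped at `width`."""
    lines = []
    for header, seq in records:
        lines.append(">" + header)
        lines.extend(seq[i:i + width] for i in range(0, len(seq), width))
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    # Toy placeholder records, not real genome data.
    file_a = ">toy_record_1\nACGTAC\n"
    file_b = ">toy_record_2\nTTGACA\n"
    combined = to_fasta(parse_fasta(file_a) + parse_fasta(file_b))
    print(combined, end="")
```

The combined text is what would be saved as a single ‘fasta’ file before uploading it to an alignment service.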

Another free software tool is Jalview (9), which can be downloaded onto a regular computer. The aligned sequences can be uploaded and, if you want to go blind, looked at in great detail. A good alignment involves trimming 150 base pairs off the front of the genome data and confirming good cDNA translations in Jalview. After you upload your best alignment into Jalview, the consensus sequence is calculated automatically. This can be copied and pasted into a New Window in Notepad and saved in the ‘fasta’ format. Congratulations, you just created a novel chimeric synthetic virus in-silico. Please be careful.

Calculating Consensus. The front (5') end of the ORF1ab gene is pictured in Jalview. The natural bat coronaviruses ZC45, RmYN02 and RpYN06 are aligned. Consensus at position 210 is automatically calculated to be T and displayed below the black box area. At position 231, the consensus sequence is C; at 237, G.
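The per-position vote described in the caption can be sketched in a few lines of Python. This is a simplified majority-rule illustration on toy strings, not a reproduction of Jalview's own consensus calculation. The function name `column_consensus` and the choice to mark ties with `N` are assumptions made for this example.

```python
from collections import Counter


def column_consensus(aligned, ambiguous="N"):
    """Majority-rule consensus over equal-length, already-aligned sequences.

    Each column takes the most frequent character; exact ties are written
    as `ambiguous` (an assumption made for this sketch).
    """
    if not aligned:
        raise ValueError("need at least one sequence")
    if len({len(s) for s in aligned}) != 1:
        raise ValueError("aligned sequences must all have the same length")
    out = []
    for column in zip(*aligned):
        counts = Counter(column).most_common()
        top_char, top_count = counts[0]
        tied = len(counts) > 1 and counts[1][1] == top_count
        out.append(ambiguous if tied else top_char)
    return "".join(out)


if __name__ == "__main__":
    # Three toy aligned strings standing in for three aligned genomes.
    toy = ["ACGTTA", "ACGTCA", "ATGTCA"]
    print(column_consensus(toy))  # prints ACGTCA
```

With three input sequences, any column where two of the three agree resolves to the shared base, which is the behavior the caption describes for positions 210, 231 and 237.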

Determining DNA Sequence Identity-DIY Instructions

DNA Sequence Identity is a quantitative measure of how similar two organisms, or any two pieces of genetic material, are to each other.

You can compare DNA Sequence Identity between you and your father (over 99.9%), or you and a chimp (99.0%), you being more genetically like your father than a chimp. Instead of calling them shared, many evolutionary geneticists also like to say where the DNA is the same it is conserved. Although you are closer in DNA Sequence Identity to your father; you, your father, and the chimp share a lot of conserved sequence (99.0%!).

Where the DNA Sequence Identity is not conserved and has not been seen before in other organisms in nature, this is called a ‘novel’ mutant. If someone finds DNA from a lifeform and it does not have close DNA Sequence Identity to anything else ever seen before, this is often enough to call it a new species and the ultimate excitement for many field-tripping virologists and other assorted bug-collectors.

An Alignment of over 40 SARS-like Coronavirus genomes. The window where the spike protein sub-units, S1 and S2, meet is displayed. Examples of conserved DNA sequences and novel mutations within the SARS-CoV2 viral family are labeled. For clarity, although the box labelled Conserved DNA Sequences has different examples of conserved sequences, the entire box itself is not a conserved sequence. However, every difference not labelled as ‘novel mutant’ within this box is seen in some sub-set of other genomes in this alignment. If you like Where’s Waldo you might try finding more novel mutations in the alignment pictured.

To calculate DNA Sequence Identities between BB3C, natural bat viruses and SARS-CoV2, gene start and stop locations were identified in the Jalview alignments using annotated gene information available from the NCBI (SARS-CoV2 genome).

The appropriate sequences for each genome were selected, copied and pasted into Notepad in the ‘fasta’ format. Pairwise alignments (which means only 2 genomes at a time) between each of the BB3C, RmYN02, RpYN06 and ZC45 gene sequences and the corresponding SARS-CoV2 gene sequences were performed online with the Emboss Stretcher pair-wise alignment tool provided by the EMBL-EBI (8).

ORF1ab DNA Sequence Identity. From pair-wise alignment of BB3C to SARS-CoV2 (MN985325.1.1)
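To make the identity figure concrete, here is a small Python sketch that computes percent identity from two already-aligned strings. It assumes identity is counted as matching columns divided by total alignment length, gaps included in the length. Alignment tools may define the denominator differently, so treat the function `percent_identity` and the toy strings as illustrative only.

```python
def percent_identity(aligned_a, aligned_b, gap="-"):
    """Percent of alignment columns where both sequences carry the same base.

    Assumes the two strings come from a pairwise alignment, so they have
    equal length and use `gap` for insertions/deletions. Gap columns count
    toward the length but never count as matches.
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have the same length")
    if not aligned_a:
        raise ValueError("alignment is empty")
    matches = sum(
        1 for x, y in zip(aligned_a, aligned_b) if x == y and x != gap
    )
    return 100.0 * matches / len(aligned_a)


if __name__ == "__main__":
    # Toy aligned strings: 10 columns, 8 matches, 1 mismatch, 1 gap.
    a = "ACGT-ACGTA"
    b = "ACGTTACGAA"
    print(f"{percent_identity(a, b):.1f}%")  # prints 80.0%
```

Every identity percentage quoted in this article is a number of this kind, computed over whole genes or contigs rather than ten-letter toys.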

An In-Silico Engineered Origin for the SARS-CoV2 Genome

By assembling one consensus sequence and 2 more contigs (pieces) from natural virus sequences the novel backbone of SARS-CoV2 can be explained with an engineered origin. After excluding the spike protein, this proposed backbone, SynCoV1, covers 97.6% of the virus genome and shares a DNA sequence similarity of 97.3%. This leaves only an estimated 661 further mutations in the backbone as the evolutionary distance from an engineered progenitor to today’s world-impacting pandemic virus.

The remaining piece of the SARS-CoV2 genome, not potentially engineered as described above, is the 2.7% that makes up the human toxin (10), well-adapted (11), and furin-cleavage site endowed (12), spike protein.

UPDATE: September 2021

Due to the tireless work of the bad-ass motherfuckers at #DRASTIC, we now know that the EcoHealth Alliance, along with collaborators Shi Zhengli at the Wuhan Institute of Virology and famous Tar Heel coronavirologist Ralph Baric, had submitted a proposal to DARPA’s PREEMPT program (32) that involved building novel chimeric coronavirus backbones, including ones using consensus sequence:

In this portion of the grant proposal, they describe building consensus genomes and building them from commercial synthetic DNA.
QS stands for Quasi-Species, meaning a variety of closely related but different viral genomes. In this grant they will build them (using the method described in the preceding snippet) and see if they can infect human cells.

The Mutant Spike — From Points Unknown, A Singular Menace Rises.

The Final Piece — The Peerless Protein

SARS-CoV2 spike protein has less than 80% DNA Sequence Identity to bat coronaviruses that share a common lineage. Many of these same coronaviruses share very high DNA Sequence Identity with SARS-CoV1 spike protein (90–99%).

This is regardless of which animal (human, civet or bat) it infects. The spike protein consists of two parts, the S1 and S2 sub-units. In the below graph, it is seen that the choice of host makes a bigger impact on S1 genetic evolution than on S2. However, switching to a receptor that is not ACE2 to infect a host creates a big genetic evolutionary impact in both the S1 and S2 sub-units as compared to SARS-CoV1.

Coronavirus DNA Sequence Identities. Viruses are ranked first by homology to the SARS-CoV1 S2 protein and then by homology to the S1 protein. ACE2 Receptor usage by the spike protein from different viruses is labeled (13). The JPDB144 coronavirus infects the Daubenton’s bat. This is not a horseshoe bat, which is the known natural reservoir for the SARS-like virus family. This virus does not share much conserved sequence with the horseshoe bat viruses and can be considered a random negative control genome. The purpose of including it was to illustrate just how conserved sequences with similar hosts and receptors are, and just how different coronaviruses can be genetically.

SARS-CoV2 is unique in that it infects humans using the ACE2 receptor but its spike protein doesn’t have high DNA sequence identity with any other ACE2 receptor binding spike proteins. Also, SARS-CoV2 shows very little variation at all in DNA Sequence Identity from 34 viruses identified in natural horseshoe bat reservoirs, regardless of their preferred host or receptor.

A natural animal host with live evolving viruses in it can be thought of as a ‘reservoir’ of genetic material to be used as genetic blueprints for evolving into a potential pandemic pathogen when presented that opportunity. Natural evolutionary theory, when applied to potential pandemic bat coronaviruses, requires a natural reservoir from which this novel genetic solution will emerge and gain the function to infect via the ACE2 receptor in humans. Without evidence for that or even any hint of the potential evolutionary path of the SARS-CoV2 spike protein from nature, the explanation that best fits this data is that this spike protein has been engineered with random mutations.

UPDATE: September 2021

New bat coronavirus genomes (BANAL-20–52, BANAL-20–236, and BANAL-20–103) have been published that show a high sequence identity to SARS-CoV2 spike protein. Whether these represent a natural ancestor, a synthetic template, a spill-back infection, or even perhaps academic fraud has not been conclusively determined.

A Library of Novel Spike Proteins

If natural evolution cannot explain the spike sequence, and neither can in-silico engineering, that leaves only one explanation for the appearance of large amounts of novel mutations while maintaining or enhancing function. It is important to understand that proposed novel mutations can’t destroy the conserved natural functions that have already proven effective for a virus’ continued survival in nature (14).

Any random novel mutation has the potential to create a ‘gain-of-function’ in the spike protein, or just destroy it. The type of novel mutations that are found and provide a ‘gain-of-function’ depends entirely upon what ‘gain-of-function’ you are looking for. Synthetic spike proteins have been engineered into coronavirus backbones and tested for infectivity in cell culture (in-vitro) by researchers at the Wuhan Institute of Virology (15,16).

A synthetic, lab-made version of just the spike protein is called a ‘recombinant’ protein and these have been made before in China for studying bat coronaviruses (17). These real proteins are synthetic but accurate versions of the exact natural protein and are the most common way to understand biochemical function in nearly all aspects of biotechnology R&D, including agriculture, human health, and industrial biochemicals (18).

The most common way to create novel mutations and screen for ‘gain-of-function’ is to create a mutational variety of sequences in a random manner (of which there are several), produce synthetic versions and then analyze all the variants using your choice of in-vitro laboratory tests (19).

In this proposed workflow, a variety of random mutants for the Spike protein could be created with DNA sequence identities up to 20% divergent from natural spike proteins, then inserted into a novel bat coronavirus backbone and ultimately screened for infectivity in-vitro (20). Novel mutations, ones nature would not evolve towards without help, can then be created and discovered.

This type of engineering workflow is called directed evolution and won a Draper Engineering Prize for the late and great Pim Stemmer:

His Draper co-awardee Frances Arnold went on to win a Nobel Prize and is still practicing the science she helped invent to solve environmental issues today.

This workflow has advanced rapidly in the decade since these first prizes were awarded. Creation of libraries of hundreds to thousands of mutants using a variety of means is commercially available as a service. These gene variant libraries are easily ordered from suppliers in China.

UPDATE: September 2021

In the EcoHealth Alliance DARPA PREEMPT proposal uncovered by #DRASTIC (32), building a synthetic library of S proteins was proposed, followed up by in-vivo screening of mutant function to select potentially novel sequence with pathogenetic potential.

In this proposal synthetic DNA (commercial gene blocks) will be used to produce mutant variants of the S protein’s RBD epitope. They then propose testing these variants for receptor usage in-vivo.

One Spike Rises to the Top

Proposed is an engineering workflow making use of commercially available random genetic libraries and in-vitro infectivity experiments in cell-culture (20) to screen libraries of randomly mutated spike proteins with the goal of finding novel mutations that increase infectivity.

By selecting a highly infectious mutant, new DNA sequence that can infect a cell using its ACE2 receptors is discovered. This increased infectivity and preference for humans over potential cross-over species (11) is a unique feature that has made SARS-CoV2 so destructive on a historical scale.

Here ends the spike’s journey: a random mutant selected from hundreds to thousands of commercially available spike protein variations using engineered backbones and laboratory cell-culture in-vitro. Without a cross-over species with a closely related natural SARS-CoV2 ancestor virus, this engineered workflow better explains novel mutations and dangerous human-optimized ‘gain-of-function’ than a natural evolutionary theory of origin from unknown reservoirs.

The DNA sequence of the SARS-CoV2 spike protein is not only inconsistent with DNA Sequence Identity and mutational changes seen between all other known natural coronaviruses in this lineage, it is completely consistent with creation of a random library of spike proteins and ‘gain-of-function’ by directed evolution using in-vitro cell culture selection.

The Evolutionary Paths Converge

Life and Passage

Earlier, an engineered backbone was proposed that was close to, but not exactly, SARS-CoV2. A faithful reproduction of the in-silico construct should match very, very closely. Here we return to the issue of bringing this Frankenstein to life in the lab. We know this is not an easy task, which is why chimeric bat coronavirus backbones were invented in the first place (5).

An engineered coronavirus backbone, SynCoV1, using the entire sequence, including consensus spike sequence, is turned from computer screen to real world biology by assembling the pieces chemically and bringing it to life in petri dishes (5).

Once the virus lives and can replicate itself in a host, it is free to evolve in a more natural manner. Although this evolution looks natural, it does not happen in nature but in the laboratory. These novel mutations will not have been predicted accurately by any in-silico analysis beforehand. In two different laboratory workflows, inoculation of the virus in cell culture (in-vitro) and in live animals (in-vivo), more mutations arise that not only improve the virus’s infectivity but are also novel, not seen before in nature, and specific to the lab-based environment to which they’ve been exposed (1).

A very dangerous directed evolutionary workflow in live animals (in-vivo) using potential pandemic pathogens was first published in 2012 (21) and has created controversy since day one, continuing through three different US Presidential administrations to the present. Regulatory failures and a global network outside transparent jurisdictions make it naive to discount the real hazard this type of engineering on potential pandemic pathogens creates (22).

Adaptation to the host in cell culture (23) and also circulation and passage in-vivo (24) are techniques known to all virology laboratories. The SARS-CoV2 genome does show evidence of adaptation and passage in both human cells (25) and humanized mice susceptible to infection by human viruses through an engineered ACE2 receptor (26).

No evidence of formaldehyde-induced mutation, a common but archaic virus mutation technique for vaccine development, or of broad in-silico recoding or codon optimization, common bioengineering techniques, was found upon further analysis of nucleotide mutation ratios (data not shown).

Assembly

A novel coronavirus can be assembled by synthetic parts and leave no trace as spelled out by famous Tar Heel coronavirologist Ralph Baric in an influential publication describing the creation of synthetic viruses using ‘seam-less’ technology (27).

What Does this Mean?

With the origin of the SARS-CoV2 epidemic unknown and efforts to intelligently prevent reoccurrences of similar events stalled due to uncertainty and obfuscation, i.e. the WHO report, it is paramount any evidence for an origin is considered soberly and without a political agenda.

Unfortunately, this has not been so easy, with conflicts of interest by powerful gatekeepers orchestrating the suppression of scientific discussion on origins. Influential opinions are still being peddled that say a natural origin is supported by evidence, though nothing beyond the mere existence of the SARS and MERS outbreaks is presented.

Demonstrated herein is a basic reverse genomic analysis and a workflow with historical precedent in every step involved in engineering a novel bat-related SARS-like virus. From an investigative genomics angle, this article addresses two of the main inaccuracies put forth by a small circle of connected and conflicted scientists (The Zoonati) who continue to market the idea that there is evidence for a natural origin and none for an engineering workflow.

SARS-CoV2 is Not Consistent with Expectations from Evolutionary Theory.

i.) The divergent evolutionary paths of the two pieces of the virus,

ii.) high mutational drift in the spike protein while

iii.) maintaining species and receptor specificity, plus

iv.) the ability to improve DNA identity prediction in-silico

are all features of the genome inconsistent with expectations from natural evolutionary theory.

SARS-CoV2 is Consistent with an Engineered Origin

i.) 97.3% of the genetic diversity of the virus backbone is explained by high DNA Sequence Identity to a reverse engineered assembly (SynCoV1).

ii.) The vast majority (79%) best matches an in-silico consensus of three viruses (BB3C), having more identity than any of the three viruses alone.

iii.) In addition, the novel mutations and high level of mutation in the spike protein when compared to natural conserved sequences

are all features of the genome consistent with a chimeric backbone and random library screening of spike protein for ‘gain-of-function’ via directed evolution, an engineered virus workflow.

Conclusion

It is inaccurate to say that SARS-CoV2 could not be engineered. Furthermore, based solely on this genomic data set, which includes genomes of 40 bat coronavirus relatives close to both the SARS-CoV1 and SARS-CoV2 genomes, the most likely explanation for genetic diversity in this novel pandemic virus is an engineered workflow like the one proposed herein and not natural evolution.

Acknowledgements

@lab_leak for stimulating conversation. @ayjchan for motivation, @Daoyu15 for education, and #DRASTIC for inspiration.

@quay_dr for noticing some shitty alignments.

Sherlock G.Nomes gratefully acknowledges the following Authors from the Originating laboratories responsible for obtaining the specimens and the Submitting laboratories where genetic sequence data were generated and shared via the GISAID Initiative, on which this research is based.

>hCoV-19/bat/Yunnan/RmYN02/2019|EPI_ISL_412977|2019–06–25 Originating lab: Shandong First Medical University & Shandong Academy of Medical Sciences

Submitting lab: Institute of Microbiology, Chinese Academy of Sciences

Authors: Weifeng Shi, Tao Hu, Hong Zhou, Juan Li, Xing Chen, Alice Catherine Hughes, Yuhai Bi

>hCoV-19/bat/China/RpYN06/2020|EPI_ISL_1699446|2020–05–25 Originating lab: Shandong First Medical University & Shandong Academy of Medical Sciences

Submitting lab: Shandong First Medical University & Shandong Academy of Medical Sciences

Authors: Weifeng Shi, Edward C. Holmes, Alice Catherine Hughes, Hong Zhou, Jinghai Ji, Xing Chen, Yuhai Bi, Juan Li, Tao Hu, Yanhua Chen

Conflicts of Interest

Sherlock G.Nomes is currently employed in the biotechnology industry.

Dedicated to the Victims and the Survivors.

Epilogue

RaTG13-Designed to Confuse

This scientific publication (28) from Zhejiang and Jiaxing Universities in China sums up the issues with this genome best:

“The relative proportion of synonymous substitutions between human SARS-CoV-2 and its possible animal origin (RaTG13) is much higher than that between other human coronaviruses and their potential animal sources.”

This means it is not consistent with expectations from evolutionary theory, just like we saw in SARS-CoV2. In the same paragraph the author gives a simple reason to exclude it from further DNA Sequence Identity analysis:

“Therefore, the underlying mechanisms of such potential mutations between SARS-CoV-2 and RaTG13 require further investigation in the future.”

Which means there exists no known biochemical explanation (‘the underlying mechanisms’) for the mutational pattern seen in RaTG13. The inclusion of this data-set confuses investigators looking at mutational patterns in the backbone using DNA Sequence Identity and phylogenetic analysis.

This sample is no longer available for reanalysis by anyone. This includes the Wuhan Institute of Virology, who originally discovered it in the Mojiang mine bat cave and in whose once public database this genome was found. This sample has been used up. It’s gone forever!

The Myth of the Pangolin Coronavirus.

There is not enough evidence to conclude that a pangolin has ever been a natural reservoir of SARS-like coronavirus. Conclusions about DNA Sequence Identity in the spike protein should not be made from the lone published genome (sequenced & published 7 times as separate genomes). This single sample from one pangolin source, with extremely poor sequence quality (29) could have been contaminated by one interaction with a handler or in the testing lab (30). Further analysis of the genomic raw-data shows serious contamination with both human and synthetic DNA (31).

References

(1) Might SARS-CoV-2 Have Arisen via Serial Passage through an Animal Host or Cell Culture?: A potential explanation for much of the novel coronavirus’ distinctive genome. Sirotkin, K., & Sirotkin, D. (2020). BioEssays : news and reviews in molecular, cellular and developmental biology, 42(10), e2000091. https://doi.org/10.1002/bies.202000091

(2) Genomic characterization and infectivity of a novel SARS-like coronavirus in Chinese bats. Hu, D., Zhu, C., Ai, L., He, T., Wang, Y., Ye, F., Yang, L., Ding, C., Zhu, X., Lv, R., Zhu, J., Hassan, B., Feng, Y., Tan, W., & Wang, C. (2018). Emerging microbes & infections, 7(1), 154. https://doi.org/10.1038/s41426-018-0155-5

(3) A Novel Bat Coronavirus Closely Related to SARS-CoV-2 Contains Natural Insertions at the S1/S2 Cleavage Site of the Spike Protein. Hong Zhou, Xing Chen, Tao Hu, Juan Li, Hao Song, Yanran Liu, Peihan Wang, Di Liu, Jing Yang, Edward C. Holmes, Alice C. Hughes, Yuhai Bi, Weifeng Shi. Current Biology, Volume 30, Issue 11, 2020, Pages 2196–2203.e3, ISSN 0960–9822. https://doi.org/10.1016/j.cub.2020.05.023

(4) Identification of novel bat coronaviruses sheds light on the evolutionary origins of SARS-CoV-2 and related viruses. Zhou, Hong; Ji, Jingkai; Chen, Xing; Bi, Yuhai; Li, Juan; Hu, Tao; Song, Hao; Chen, Yanhua; Cui, Mingxue; Zhang, Yanyan; Hughes, Alice C.; Holmes, Edward C.; Shi, Weifeng. bioRxiv 2021.03.08.434390 (2021). http://biorxiv.org/content/early/2021/03/08/2021.03.08.434390.abstract

(5) Synthetic recombinant bat SARS-like coronavirus is infectious in cultured cells and in mice. Michelle M. Becker, Rachel L. Graham, Eric F. Donaldson, Barry Rockx, Amy C. Sims, Timothy Sheahan, Raymond J. Pickles, Davide Corti, Robert E. Johnston, Ralph S. Baric, Mark R. Denison Proceedings of the National Academy of Sciences Dec 2008, 105 (50) 19944–19949; https://doi.org/10.1073/pnas.0808116105

(6) Identification of Coronaviral Conserved Sequences and Application to Viral Genome Amplification. Bridgen, Anne, Tobler, Kurt, Ackermann, Mathias, Laude, Hubert, Vautherot, Jean-François (1993). In: Coronaviruses: Molecular Biology and Virus-Host Interactions. Springer US, Boston, MA. ISBN 978-1-4615-2996-5. https://doi.org/10.1007/978-1-4615-2996-5_13

(7) Exploring the natural origins of SARS-CoV-2 in the light of recombination. Spyros Lytras, Joseph Hughes, Darren Martin, Arné de Klerk, Rentia Lourens, Sergei L Kosakovsky Pond, Wei Xia, Xiaowei Jiang, David L Robertson bioRxiv 2021.01.22.427830; doi: https://doi.org/10.1101/2021.01.22.427830

(8) Analysis Tool Web Services from the EMBL-EBI. (2013) McWilliam H, Li W, Uludag M, Squizzato S, Park YM, Buso N, Cowley AP, Lopez R Nucleic acids research 2013 Jul;41(Web Server issue):W597–600 doi:10.1093/nar/gkt376

(9) Jalview Version 2-a multiple sequence alignment editor and analysis workbench. Waterhouse AM, Procter JB, Martin DMA, Clamp M, Barton GJ (2009) Bioinformatics 25: 1189–1191. doi:10.1093/bioinformatics/btp033

(10) The SARS-CoV-2 spike protein alters barrier function in 2D static and 3D microfluidic in-vitro models of the human blood-brain barrier. Buzhdygan TP, DeOre BJ, Baldwin-Leclair A, Bullock TA, McGary HM, Khan JA, Razmpour R, Hale JF, Galie PA, Potula R, Andrews AM, Ramirez SH. Neurobiol Dis. 2020 Dec;146:105131. doi:10.1016/j.nbd.2020.105131 Epub 2020 Oct 11. PMID: 33053430; PMCID: PMC7547916.

https://www.sciencedirect.com/science/article/pii/S096999612030406X

(11) In silico comparison of SARS-CoV-2 spike protein-ACE2 binding affinities across species and implications for virus origin. Piplani, S., Singh, P.K., Winkler, D.A. et al. Sci Rep 11, 13063 (2021). https://doi.org/10.1038/s41598-021-92388-5

(12) Furin: A Potential Therapeutic Target for COVID-19. Wu, C., Zheng, M., Yang, Y., Gu, X., Yang, K., Li, M., Liu, Y., Zhang, Q., Zhang, P., Wang, Y., Wang, Q., Xu, Y., Zhou, Y., Zhang, Y., Chen, L., & Li, H. (2020). iScience, 23(10). https://doi.org/10.1016/j.isci.2020.101642

(13) The evolutionary history of ACE2 usage within the coronavirus subgenus Sarbecovirus. Wells, H.L.; Letko, M.; Lasso, G.; Ssebide, B.; Nziza, J.; Byarugaba, D.K.; Navarrete-Macias, I.; Liang, E.; Cranfield, M.; Han, B.A.; Tingley, M.W.; Diuk-Wasser, M.; Goldstein, T.; Johnson, C.K.; Mazet, J.; Chandran, K.; Munster, V.J.; Gilardi, K.; Anthony, S.J. bioRxiv (2020). https://www.biorxiv.org/content/10.1101/2020.07.07.190546v1.full.pdf

(14) A case for the ancient origin of coronaviruses. Wertheim, J. O., Chu, D. K., Peiris, J. S., Kosakovsky Pond, S. L., & Poon, L. L. (2013). Journal of virology, 87(12), 7039–7045. https://doi.org/10.1128/JVI.03273-12

(15) Isolation and characterization of a bat SARS-like coronavirus that uses the ACE2 receptor. Ge, XY., Li, JL., Yang, XL. et al. Nature 503, 535–538 (2013). https://doi.org/10.1038/nature12711

(16) Discovery of a rich gene pool of bat SARS-related coronaviruses provides new insights into the origin of SARS coronavirus. Hu, B., Zeng, L. P., Yang, X. L., Ge, X. Y., Zhang, W., Li, B., Xie, J. Z., Shen, X. R., Zhang, Y. Z., Wang, N., Luo, D. S., Zheng, X. S., Wang, M. N., Daszak, P., Wang, L. F., Cui, J., & Shi, Z. L. (2017). PLoS pathogens, 13(11), e1006698. https://doi.org/10.1371/journal.ppat.1006698

(17) Expression and purification of recombinant SARS coronavirus spike protein. Yu H, Yang Y, Zhang W, Xie YH, Qin J, Wang Y, Zheng HB, Zhao GP, Yang S, Jiang WH. Sheng Wu Hua Xue Yu Sheng Wu Wu Li Xue Bao (Shanghai). 2003 Aug;35(8):774–8. Chinese. PMID: 12897976. https://pubmed.ncbi.nlm.nih.gov/12897976/

(18) High-Throughput Screening in Protein Engineering: Recent Advances and Future Perspectives. Wójcik, M., Telzerow, A., Quax, W. J., & Boersma, Y. L. (2015). International journal of molecular sciences, 16(10), 24918–24945. https://doi.org/10.3390/ijms161024918

(19) Protein Library Design and Screening. Denault M., Pelletier J.N. (2007). In: Arndt K.M., Müller K.M. (eds) Protein Engineering Protocols. Methods in Molecular Biology™, vol 352. Humana Press. https://doi.org/10.1385/1-59745-187-8:127

(20) A SARS-like cluster of circulating bat coronaviruses shows potential for human emergence. Menachery, V., Yount, B., Debbink, K. et al. Nat Med 21, 1508–1513 (2015). https://doi.org/10.1038/nm.3985

(21) Airborne transmission of influenza A/H5N1 virus between ferrets. Herfst, S., Schrauwen, E. J., Linster, M., Chutinimitkul, S., de Wit, E., Munster, V. J., Sorrell, E. M., Bestebroer, T. M., Burke, D. F., Smith, D. J., Rimmelzwaan, G. F., Osterhaus, A. D., & Fouchier, R. A. (2012). Science (New York, N.Y.), 336(6088), 1534–1541. https://doi.org/10.1126/science.1213362

(22) Gain-of-Function Research: Summary of the Second Symposium, Board on Life Sciences; Division on Earth and Life Studies; Board on Health Sciences Policy; Health and Medicine Division; Committee on Science, Technology, and Law; Policy and Global Affairs; National Academies of Sciences, Engineering, and Medicine. March 10–11, 2016. Washington (DC): National Academies Press (US); 2016 Jun 20. 3, Issues for U.S. Policy. Available from: https://www.ncbi.nlm.nih.gov/books/NBK373314/

(23) Mutational dynamics of the SARS coronavirus in cell culture and human populations isolated in 2003. Vega, V.B., Ruan, Y., Liu, J. et al. BMC Infect Dis 4, 32 (2004). https://doi.org/10.1186/1471-2334-4-32

(24) A Mouse-Adapted SARS-CoV-2 Induces Acute Lung Injury and Mortality in Standard Laboratory Mice. Leist SR, Dinnon KH 3rd, Schäfer A, Tse LV, Okuda K, Hou YJ, West A, Edwards CE, Sanders W, Fritch EJ, Gully KL, Scobey T, Brown AJ, Sheahan TP, Moorman NJ, Boucher RC, Gralinski LE, Montgomery SA, Baric RS. Cell. 2020 Nov 12;183(4):1070–1085.e12. Epub 2020 Sep 23. PMID: 33031744; PMCID: PMC7510428. doi: 10.1016/j.cell.2020.09.050.

https://pubmed.ncbi.nlm.nih.gov/33031744/

(25) Human airway cells prevent SARS-CoV-2 multibasic cleavage site cell culture adaptation. Lamers, M., Mykytyn, A., Breugem, T., Wang, Y., et al. (2021). doi: https://doi.org/10.1101/2021.01.22.427802, https://www.biorxiv.org/content/10.1101/2021.01.22.427802v1

(26) Benchmarking evolutionary tinkering underlying human–viral molecular mimicry shows multiple host pulmonary–arterial peptides mimicked by SARS-CoV-2. Venkatakrishnan, A.J., Kayal, N., Anand, P. et al. Cell Death Discov. 6, 96 (2020). https://doi.org/10.1038/s41420-020-00321-y

(27) Synthetic Viral Genomics. In: Working Papers for Synthetic Genomics: Risks and Benefits for Science and Society, pp. 35–81. Baric RS. 2006. Garfinkel MS, Endy D, Epstein GL, Friedman RM, editors. 2007.

https://www.jcvi.org/sites/default/files/assets/projects/synthetic-genomics-options-for-governance/Baric-Synthetic-Viral-Genomics.pdf

(28) Comparative Genomic Analyses Reveal a Specific Mutation Pattern Between Human Coronavirus SARS-CoV-2 and Bat-CoV RaTG13. Lv Longxian, Li Gaolei, Chen Jinhui, Liang Xinle, Li Yudong. Frontiers in Microbiology 11, 3013 (2020). ISSN 1664-302X. doi:10.3389/fmicb.2020.584717
https://www.frontiersin.org/article/10.3389/fmicb.2020.584717

(29) The SARS-CoV-2-like virus found in captive pangolins from Guangdong should be better sequenced. Hassanin, A. (2020). BioRxiv. https://www.biorxiv.org/content/10.1101/2020.0

(30) Single source of pangolin CoVs with a near identical Spike RBD to SARS-CoV-2. Chan, A. Y., & Zhan, H. S. (2020). BioRxiv. https://doi.org/10.1101/2020.07.07.184374

(31) The Pan-SL-CoV/GD sequences may be from contamination. Zhang, D. (2020). Zenodo. http://doi.org/10.5281/zenodo.4395025

(32) How EcoHealth Alliance and the Wuhan Institute of Virology Collaborated on a Dangerous Bat Coronavirus Project. #DRASTIC (September 20th 2021).

(33) Coronaviruses with a SARS-CoV-2-like receptor-binding domain allowing ACE2-mediated entry into human cells isolated from bats of Indochinese peninsula. Temmam, S. et al. Institut Pasteur, Institut Pasteur du Laos & National University of Laos (2021)

(34) A pneumonia outbreak associated with a new coronavirus of probable bat origin. Zhou, P., Yang, XL., Wang, XG. et al. Nature 579, 270–273 (2020). https://doi.org/10.1038/s41586-020-2012-7

(35) Systematic exploration of guide-tree topology effects for small protein alignments. Sievers, F., Hughes, G.M. & Higgins, D.G. BMC Bioinformatics 15, 338 (2014). https://doi.org/10.1186/1471-2105-15-338

Sherlock G.Nomes

Sherlock Genomes has spent over 25 years analyzing genetic data sets for linkage analysis, genomic sequencing/automation, bioinformatics and bioengineering.