How the source of SARS was found

And what it tells us about the origins of Covid, and the lab leak theory.

Peter Miller
Microbial Instincts
Mar 20, 2023


The big debate today is over where Covid came from. I’ve written a lot about that already, so instead of getting right back into that debate, I want to step back to simpler times. Everyone agrees that the 2003 SARS virus spilled over naturally. But, how do we know where it came from?

There are some similarities between how the two viruses started. Both first appeared in November; maybe there’s some meaningful seasonality there.

SARS was first found in Guangdong province, a thousand miles away from Yunnan, where the closest bat viruses were later found. Covid was also found similarly far away from Yunnan:

Image from Miles Fujimoto via Michael Worobey

The closest bat virus to SARS is called WIV16. The two are 96% similar.

For Covid, the closest bat virus we’ve found is 96.8% similar.

That is to say, we’ve found a bat virus that’s closer to Covid than the closest bat virus is to SARS. Why do people say that we know the source of SARS but not of Covid?

It comes down to a few things.

For SARS, we quickly found the intermediate host animal

Let’s start with the most obvious difference.

Viruses can jump directly from bats to humans. That happens with the Nipah virus, for instance. More often, they jump from bats to humans via some intermediate species that helps bridge the gap between the two different hosts.

A few months after the SARS outbreak started, we found SARS in civets, and that virus was 99.8% similar to the human SARS virus.

Masked palm civet, image from Wikipedia

But we didn’t find it right away; it took months of searching.

The SARS virus was also later found in ferret badgers and raccoon dogs (the species suspected of starting the Covid pandemic).

The family tree for Covid looks similar to SARS. Here’s Covid:

Covid virus family tree from Wikipedia

And here’s SARS:

SARS virus family tree, from Wikipedia

In both cases, there are 3 bat viruses that are about 94 to 97% similar.

In the case of SARS, the big difference is that we also found a virus that was 99.8% similar from a civet.

In the case of Covid, we never found a live animal that tested positive. That’s probably because the live animals at the market were killed without testing. It could also be that the people selling illegal animals removed those animals before the authorities arrived to clean the market. Scientists did some detective work to show that raccoon dogs are a likely host. Then Covid-positive swabs from the market turned out to contain raccoon dog DNA.

In some sense, the evidence for Covid is worse than for SARS, because no live animals were tested. In another sense, it’s better, because the data comes from the same market where the virus was first seen, not a different market months later.

Soon after Covid started, China cracked down on the wildlife trade, shut down farms, then locked down their whole country to stop human infections.

If the same measures had been taken after SARS started, and people had stopped selling civets, it’s unlikely scientists would ever have found the host animal. It took months of searching, while all the markets were still open.

For many other human viruses, we’ve never figured out how they jumped into the human species.

For SARS, it took a few months to find the animal. In the case of MERS, we found the intermediate host (camels) after 9 months. But we still don’t know how some mild coronaviruses, like HKU1, jumped into humans.

It took decades to narrow down the likely host species for Ebola (it appears to be bats as well). And we still don’t know how Ebola gets from bats into humans.

It took decades to figure out how HIV came from chimpanzees, during which time there were multiple conspiracy theories about HIV being made in a lab.

Viruses frequently recombine, so we look for multiple ancestors

That closest virus to SARS, WIV16, is still not the direct ancestor of SARS. There are 1,200 mutations separating the two.

These viruses pick up about 30 mutations per year. Current strains of Covid are about 90 mutations different from the strain at the start of the pandemic.

WIV16 has 40 years of evolutionary distance between it and SARS.
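The arithmetic behind that estimate can be sketched in a few lines of Python. This is a rough back-of-the-envelope calculation, assuming a genome length of about 30,000 nucleotides (SARS-like coronavirus genomes are roughly that size) and a constant clock of about 30 mutations per year, the rate quoted above. It ignores rate differences between hosts and treats the result as total distance along both branches; if both lineages evolved at the same rate, their common ancestor would sit roughly half that far back.

```python
# Back-of-the-envelope molecular clock arithmetic (assumed values, not measurements).
GENOME_LENGTH = 30_000      # assumed: approximate length of a SARS-like genome, in nucleotides
MUTATIONS_PER_YEAR = 30.0   # assumed: the rough clock rate quoted in the text


def differences_from_identity(identity: float, genome_length: int = GENOME_LENGTH) -> int:
    """Approximate number of differing positions implied by a fractional identity."""
    return round((1.0 - identity) * genome_length)


def years_of_distance(differences: int, rate: float = MUTATIONS_PER_YEAR) -> float:
    """Total evolutionary distance in years, assuming a constant mutation rate."""
    return differences / rate


if __name__ == "__main__":
    diffs = differences_from_identity(0.96)   # WIV16 vs SARS at 96% identity
    print(diffs, years_of_distance(diffs))    # 1200 40.0
    # Sanity check against the text: ~90 mutations since the start of Covid is ~3 years.
    print(years_of_distance(90))              # 3.0
```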

WIV16 is over 90% similar to SARS across much of its genome, but it dips down to 80% at the end:

Figure from Hu et al, 2017, colors adjusted for readability
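Plots like this are usually made by sliding a fixed-size window along an alignment of the two genomes and recording the percent identity inside each window. Here is a minimal sketch of that idea, assuming the two sequences are already aligned to the same length; real tools also handle gaps and ambiguous bases and choose window sizes with care. The function name and toy sequences are illustrative, not taken from Hu et al.

```python
def window_identity(a: str, b: str, window: int, step: int) -> list[tuple[int, float]]:
    """Return (start position, fraction of identical bases) for each window.

    Assumes `a` and `b` are pre-aligned sequences of equal length.
    """
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    if window <= 0 or step <= 0:
        raise ValueError("window and step must be positive")
    results = []
    for start in range(0, len(a) - window + 1, step):
        seg_a = a[start:start + window]
        seg_b = b[start:start + window]
        matches = sum(x == y for x, y in zip(seg_a, seg_b))
        results.append((start, matches / window))
    return results


# Toy example: identical at the start, diverging at the end (like WIV16 vs SARS).
print(window_identity("ACGTACGTAC", "ACGTACGTTT", window=5, step=5))
# [(0, 1.0), (5, 0.6)]
```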

Another bat virus, Rf4092, matches SARS more closely at the end.

These viruses don’t have a simple family tree; they don’t just have parents and children. They replicate and they mutate, but they also recombine with each other.

When an animal is infected with two viruses at once, the RNA polymerase can accidentally combine segments of RNA from each of the two.

If you combined WIV16 with Rf4092, you would have a virus that was much closer to SARS. It would still be years apart, in terms of small mutations, but it wouldn’t have that loss of similarity at the end of the genome.
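The idea of “combining” two parents can be made concrete: given an aligned query genome and two candidate parents, try every possible crossover point, take the first parent’s sequence before it and the second parent’s after it, and keep the crossover that best matches the query. This is a simplified sketch, assuming pre-aligned, equal-length sequences and at most one breakpoint; real recombination analyses use statistical tests and allow several breakpoints. The function name and sequences are hypothetical toy examples, not real viral genomes.

```python
def best_single_breakpoint(query: str, left_parent: str, right_parent: str) -> tuple[int, float]:
    """Find the crossover point k that makes left_parent[:k] + right_parent[k:]
    most similar to query. Returns (k, fractional identity of that recombinant)."""
    n = len(query)
    if n == 0:
        raise ValueError("sequences must not be empty")
    if not (len(left_parent) == len(right_parent) == n):
        raise ValueError("sequences must be aligned to the same length")
    # prefix[k] = matches between left_parent and query in positions [0, k)
    prefix = [0] * (n + 1)
    for i in range(n):
        prefix[i + 1] = prefix[i] + (left_parent[i] == query[i])
    # suffix[k] = matches between right_parent and query in positions [k, n)
    suffix = [0] * (n + 1)
    for i in range(n - 1, -1, -1):
        suffix[i] = suffix[i + 1] + (right_parent[i] == query[i])
    best_k = max(range(n + 1), key=lambda k: prefix[k] + suffix[k])
    return best_k, (prefix[best_k] + suffix[best_k]) / n


query = "AAAACCCC"
left, right = "AAAAGGGG", "TTTTCCCC"  # each parent alone matches only 50%
print(best_single_breakpoint(query, left, right))  # (4, 1.0)
```

In this toy case the recombinant matches the query perfectly even though each parent alone matches only half of it. With real genomes, the best recombinant would still differ from SARS by many small mutations.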

What scientists often look for is to find a mosaic that might explain the natural origin — if we find several bat viruses in the same cave and see a way they could recombine to make SARS, then we say it’s likely that SARS could have come from that cave.

It still took 15 years from the start of the SARS epidemic to find a suitable cave, and the recombination of 2 viruses in that cave would still be a few percent different from SARS.

We still don’t know for sure if that particular cave is the place that SARS came from. We don’t know for sure if and when those 2 viruses combined. And we don’t know how the virus got from the bats in that cave to the civets that went on to infect humans.

We will never know the exact sequence of events, for SARS or for Covid.

Can we explain the origin of Covid as a mosaic of other viruses?

Sort of.

First off, the closest bat virus (BANAL-20-52) is already more than 90% similar to Covid across the full genome:

Figure from Temmam et al, 2022

It’s already a better match to Covid than WIV16 is to SARS. There aren’t any places where it drops off to lower similarity, like WIV16 does.

If you split up the genome, you can say that certain segments of Covid are a bit closer to different bat viruses, so maybe it recombined with these at some point in its history:

Figure from Temmam et al, 2022
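Statements like “this segment is closest to that bat virus” come from splitting an alignment into segments and asking which reference sequence matches each segment best. Here is a minimal sketch, assuming pre-aligned, equal-length sequences and non-overlapping windows; the reference names are hypothetical placeholders. Published analyses usually go further, for example by building a separate phylogenetic tree for each region rather than relying on raw identity.

```python
def closest_per_window(query: str, references: dict[str, str], window: int) -> list[tuple[int, str, float]]:
    """For each non-overlapping window, return (start, closest reference name, identity).

    Assumes every reference is pre-aligned to `query` (same length).
    Ties go to whichever reference was listed first.
    """
    if window <= 0:
        raise ValueError("window must be positive")
    for name, seq in references.items():
        if len(seq) != len(query):
            raise ValueError(f"{name} is not aligned to the query")
    results = []
    for start in range(0, len(query) - window + 1, window):
        q = query[start:start + window]
        scores = {
            name: sum(x == y for x, y in zip(q, seq[start:start + window])) / window
            for name, seq in references.items()
        }
        best = max(scores, key=scores.get)
        results.append((start, best, scores[best]))
    return results


refs = {"bat_virus_1": "AAAAGGGG", "bat_virus_2": "TTTTCCCC"}  # hypothetical toy sequences
print(closest_per_window("AAAACCCC", refs, window=4))
# [(0, 'bat_virus_1', 1.0), (4, 'bat_virus_2', 1.0)]
```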

But that’s different than the cave argument for SARS, because these viruses similar to Covid were sampled across Asia, not all within the same cave.

The truth is, none of these viruses are close ancestors. Most of them are decades apart from Covid:

Figure from Lytras et al, 2022

To find something closer, we’d have to do much more sampling of bats, or maybe of other species like wild raccoon dogs or wild civets.

We will probably still find something closer.

We’ve only been looking for the origin of Covid for 3 years, not 15 years.

Could a lab also combine viruses to make a new one?

This brings us back to the lab leak theory.

Scientists have combined viruses in labs. But they tend to do so in predictable ways that are different than what you see in nature.

In nature, the combinations are random and accidental.

In the lab, scientists like controlled experiments, where they change one variable and see what happens.

One thing they might try is to put the spike protein of a new virus into the backbone of an old virus that they’re familiar with, that they have tools to work with. Since they already know how that backbone works in human cells, they can use that to learn what’s different about the new spike.

Making a virus in the lab is complicated. They need to pick a backbone virus to start with. Then, they make a reverse genetics system to work with that virus (several months of work, at least). Then, they can make alterations to the genetic code, and swap out pieces with other viruses.

Scientists have done this before. For instance, scientists in North Carolina have built these reverse genetics systems for SARS and tried modifying them.

But they don’t just take viruses at random and tinker with them randomly. You don’t learn much with an experiment like that.

The backbone of Covid looks very different from SARS. To create Covid, scientists would have to discover some unknown virus, recognize that it was important and dangerous, build a reverse genetics system for it, and start modifying it.

The DEFUSE proposal was about modifying SARS, not Covid

Now that you’ve seen the family tree of SARS, it might be a little bit easier to understand what scientists were proposing to do in the “DEFUSE proposal”, which lab leak theorists frequently talk about. The 2018 proposal requested funding to modify bat viruses, for instance by putting a furin cleavage site into one.

The proposal was never funded. The suggested work was also planned to be done in North Carolina, not in Wuhan.

Some people claim that the work might have secretly moved to Wuhan, instead. I don’t know what secretly happened. Maybe some similar work did happen in Wuhan.

But the work that DEFUSE proposed still couldn’t make Covid.

If you read the text carefully, you find that they weren’t talking about manipulating random viruses. They were talking about working with the backbone of known viruses.

Here’s what the DEFUSE proposal said they wanted to do:

They talk about using two known viruses (WIV1 and SHC014) as backbones, and then making changes to them.

Look again at that SARS virus family tree:

They wanted to work on those viruses because they are similar to SARS.

Scientists were worried about a repeat of the SARS outbreak, and they wanted to know if a close ancestor of SARS could recombine or mutate in such a way as to become dangerous.

I’m not going to argue that the DEFUSE grant was a good idea, or that gain of function research is a good idea. Their experiment does sound risky. And it also looks like the scientists were trying to evade the ban on gain of function research by not working with SARS but instead working with two viruses very similar to SARS.

As risky as that might have been, it’s still not what created Covid. Covid is definitely not made from one of those backbones. There’s also no known virus that could have been used as the backbone.

The actual pandemic we got surprised scientists, because it came from a different branch of the bat virus family tree.

Covid is only 79% similar to SARS. If a scientist in Wuhan had collected it, or some other virus very close to it, they likely would have had no idea that it was important.

And, in fact, that’s exactly what happened. They found RaTG13, a virus 96% similar to Covid, back in 2013. They partially sequenced it and then ignored the sample for years. It wasn’t close enough to SARS to seem relevant.

The future of the lab leak conspiracy theory

Looking for more viruses in bats may eventually make it clearer where Covid came from. But it’s hard to get an exact match. After 15 years, we never found something more than 96% similar to SARS. We’ve already found a bat virus 96.8% similar to Covid. Finding closer and closer bat viruses will not convince any lab leak believers.

Chinese scientists finally shared DNA evidence from the market. We now think we know which animal started the covid pandemic, and even the shop within the market which sold that animal and started the pandemic. It looks like this also hasn’t convinced many lab leak believers.

I’m not sure we’re going to get much closer than that. It might still be possible to get a bit more evidence — if the Chinese authorities would interview that shop owner, they could find out where he bought the animals, which farms they came from, and so on.

But it’s not obvious that the evidence will get better than what we have now. The animals on that farm probably aren’t still sick 3 years later. That particular farm probably isn’t even active anymore. This kind of detective work would have to be done quickly, at the beginning of the outbreak.

Conspiracy theories can’t be killed; they just mutate into new versions. For instance, it wasn’t enough to run a trial showing that hydroxychloroquine doesn’t work against Covid. People said it only works if you treat the patients earlier. When that failed, they said you have to add azithromycin. When that failed, they said you have to add zinc. When that failed, they still didn’t give up; they just said that the trials were all rigged and all doctors are in on the conspiracy.

The lab leak theory is destined to go down the same path. It seemed plausible at first. Lots of reasonable people, myself included, thought that it could be true. Today, it seems highly unlikely to be true.

Over time, various discoveries started chipping away at it, and the theory mutated. At first, people thought Covid might be lab-created because it bound very well to human ACE2. But, then we found pangolin viruses that also bind very well to human ACE2. So, lab leak theorists said that maybe the lab had combined bat and pangolin viruses. Then, scientists found bat viruses in Laos that bind just as well to human ACE2. So, lab leak theorists moved on to say that maybe the Wuhan lab previously found these Laotian viruses and started to modify them.

The same process went on for a few years. Lab leak kept getting less likely, but instead of being abandoned, the theories just changed.

What lab leak theorists kept asking was, “Where is the infected animal?”

They kept saying that “80,000 animals were tested and none were positive”, without noticing that the 80,000 were mostly livestock that couldn’t be the host of covid and that only 15 wild raccoon dogs had been tested.

Now, we have DNA evidence from the market showing a likely animal host that could have started the pandemic.

And the response has been:

Which is, of course, false. Lab leak theorists disputed that over and over:

So, after disputing that there were raccoon dogs at the market, here and many other times, you’d think these people would update their views in response to some new evidence, maybe admit they were wrong about that particular point?

Of course not.

Now they deny ever saying that:

Alina is also questioning whether the evidence is any good:

Also, she’s obfuscating the history of SARS:

As you’ve learned, it took months to find SARS in animals, it was not “immediate”.

I spend a lot of time writing about conspiracy theories, analyzing evidence to show what’s true and what’s false. But, honestly, you don’t need to learn all that much science to figure out who’s lying. You can just watch the conspiracy theorists and the way they update their stories to avoid ever having to admit any new facts.

Faithful lab leak supporters will go on to argue that this new data doesn’t matter. Even if there were infected animals at the market, they will say that someone from the Wuhan lab must have brought the lab virus in and infected those animals.

On Twitter, they’ve also started blaming some zoo in Wuhan:

No, Billy, you didn’t; those are raccoons in the picture. Kinda similar, but not the same.

Failing that, they’ll just say that the data is fake, like they do with all the hydroxychloroquine studies and global warming.

In some sense, it’s not realistic to expect that everyone will stop believing a conspiracy theory. 10% of Americans believe the moon landing never happened and 14% are unsure. That’s not because NASA didn’t provide enough evidence.

The interesting question here is perhaps where lab leak will settle, in the future. Right now, more than half of Americans believe the lab leak theory. That’s unlikely to ever go down to 10%.

Perhaps it will stay above 50%, no matter how much evidence disproves it. It may stay popular because the lab leak theory is easier to understand, more entertaining, and gets more attention in the news.

Perhaps it will become tribally divided, where most Republicans believe the lab leak theory and most Democrats do not. The same thing happened with vaccines and masks, which Democrats decided to be for and Republicans decided to be against.

In the case of vaccine conspiracies, that had immediate real-world consequences — somewhere around 300,000 unvaccinated Americans died of Covid after the vaccines were widely available.

The lab leak theory doesn’t have immediate consequences in the same way. No one is going to die tomorrow for being a lab leak theorist.

In the long run, though, this could be one of the most devastating outcomes of the pandemic.

If we want to prevent the next pandemic, we need to know where it’s likely to come from and we want to notice it as early as possible. We want to figure out which animals are too dangerous to farm or eat. We want to monitor abnormal pneumonia cases as soon as they show up and respond quickly, rather than waiting several months and then shutting the world down. It might help to keep sampling natural viruses to know what’s out there. And we want scientists in different countries to work together to share their knowledge about virology.

Instead of that, it looks like we’re going to end up with a world where Chinese scientists can’t work with scientists in other countries because of the political tension.

We’re going to live in a world where half the people have been convinced that masks are useless, vaccines are deadly, and that viruses come from scientists, not from nature.

I can only guess how bad the next pandemic will be.
