Designing a Universal Coronavirus Vaccine
Part I: Analyzing the Spike Protein of Human Coronaviruses
Given that there have been three coronavirus spillover events in the last 20 years (SARS, MERS, COVID-19), and given the ease of international travel in the modern world, future coronavirus spillover events are almost certain. Modern globalization, coupled with the fact that there may be thousands of coronaviruses in bats alone, points to an interconnected world with an extensive reservoir of coronaviruses for potential zoonotic transmission. I’m certainly not the first person to sound this alarm (and hopefully not the last), but I want to amplify the message. As bad as COVID-19 was, it could have been much worse. MERS-CoV, for instance, has a much higher fatality rate than SARS-CoV-2, so just imagine a more transmissible version of MERS-CoV. The near certainty of another coronavirus spillover, with a severity “to be determined,” points to the urgent need for a universal coronavirus vaccine that can protect against severe disease and death from this future virus.
An increased supply of N95 masks and greater public acceptance of social distancing and mask wearing are essential takeaways from the current pandemic. However, the best tool against a viral pandemic is a vaccine. With that in mind, I want to work on an initial mockup of a universal coronavirus vaccine. That’s a significant undertaking, so I’ll break the project down into several segments. In Part I, I’ll review the spike (S) protein from human coronaviruses (HCoVs), since it’s the protein responsible for initiating infection in humans and the basis of the current COVID-19 vaccines. The S protein is a good target to start with, but later I’ll consider adding additional immune targets to the vaccine as well. So, let’s start the review.
For more on universal coronavirus vaccines, see here and here.
There are currently nine coronaviruses known to infect humans (and possibly more that haven’t been detected yet). The HCoVs all have a spike protein, but not every HCoV spike protein uses the same host receptor in humans. For instance, SARS-CoV-2 uses the angiotensin-converting enzyme 2 (ACE2) receptor, but MERS-CoV uses dipeptidyl peptidase 4 (DPP4). Therefore, I want to review the spike protein sequence of the various HCoVs along with the host receptor they use to infect humans to see what insights may be gained (Table 1). From the perspective of designing a vaccine, it would be nice if all the HCoVs had very similar sequences that lined up nicely. As we’ll see, nature immediately throws cold water on that hope, but there are still insights to be gained.
For more on coronavirus host cell receptors, see here. For more on CCoV-Haiti and PDCoV, see here.
Next, let’s briefly look at the S protein of SARS-CoV-2 and use it as a model for the HCoV spike proteins. The S protein is present as a homotrimer extending from the viral membrane and has two major functional domains, S1 and S2, as seen in Figure 1. The trimeric S proteins are heavily glycosylated, and this glycosylation plays a role in the final protein structure, host protease interaction, and avoidance of the host immune response. The S1 subunit consists of residues 14–685 and is responsible for ACE2 receptor binding. Upon receptor binding, there are two host-mediated cleavage events involving the spike protein. The first occurs at the S1/S2 subunit boundary and separates the S1 subunit from the S2 subunit (residues 686–1273). It leads to a conformational change in the S2 subunit, which, although cleaved from S1, is still noncovalently associated with it. The second host-mediated cleavage event leads to subsequent viral entry into the host cell.
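To make the residue numbering above concrete, here is a minimal Python sketch that maps a residue number to its subunit and slices a full-length spike sequence by subunit. The boundaries (14–685 for S1, 686–1273 for S2) come from the text; the names `SUBUNITS`, `subunit_of`, and `slice_subunit` are my own hypothetical helpers, not part of any library.

```python
# Subunit boundaries for the SARS-CoV-2 spike protein as given in the text
# (1-based, inclusive residue numbers).
SUBUNITS = {
    "S1": (14, 685),
    "S2": (686, 1273),
}


def subunit_of(residue_number):
    """Return the subunit containing a 1-based residue number, or None."""
    for name, (start, end) in SUBUNITS.items():
        if start <= residue_number <= end:
            return name
    return None


def slice_subunit(sequence, name):
    """Return the portion of a full-length spike sequence for one subunit."""
    start, end = SUBUNITS[name]
    # Convert 1-based inclusive coordinates to a 0-based Python slice.
    return sequence[start - 1:end]
```

For example, `subunit_of(685)` returns `"S1"` and `subunit_of(686)` returns `"S2"`, which is the S1/S2 boundary where the first cleavage event occurs.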
For an excellent review on the molecular virology of coronaviruses, see here.
With that brief review of the SARS-CoV-2 spike protein, let’s see how all of the spike proteins from the eight HCoVs stack up. How different are they? Are there any similarities that can be used when designing a universal HCoV vaccine? To find out, I’ll use a technique called multiple sequence alignment (MSA), which compares the primary amino acid sequences of the eight HCoVs. The algorithm aligns the spike protein sequences as well as it can, giving some insight into which areas of the spike protein are conserved among HCoVs and which are more variable.
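One simple way to put a number on “how different are they?” once sequences are aligned is pairwise percent identity. The sketch below is an illustration only: it assumes the sequences have already been aligned (for example, exported from an alignment tool) with `-` marking gaps, and the toy sequences and virus names are made up, not real spike sequences. It is not the MSA algorithm itself, just a summary computed from its output.

```python
from itertools import combinations


def percent_identity(a, b, gap="-"):
    """Percent of identical residues between two aligned sequences.

    Columns where either sequence has a gap are skipped (an assumption;
    other conventions for counting gaps exist).
    """
    if len(a) != len(b):
        raise ValueError("aligned sequences must have the same length")
    compared = 0
    matches = 0
    for x, y in zip(a, b):
        if x == gap or y == gap:
            continue
        compared += 1
        if x == y:
            matches += 1
    return 100.0 * matches / compared if compared else 0.0


def identity_table(alignment):
    """Return {(name1, name2): percent identity} for every pair."""
    return {
        (n1, n2): percent_identity(alignment[n1], alignment[n2])
        for n1, n2 in combinations(alignment, 2)
    }


# Hypothetical toy alignment, for illustration only.
TOY_ALIGNMENT = {
    "virus_A": "MKT-LLVA",
    "virus_B": "MKTALLIA",
    "virus_C": "MRT-LFVA",
}

if __name__ == "__main__":
    for pair, identity in identity_table(TOY_ALIGNMENT).items():
        print(pair, round(identity, 1))
```

Low pairwise identity across the board would be the numerical version of nature throwing cold water on the hope for nicely matching sequences.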
For all of the MSAs, I used Unipro UGENE version 41. Okonechnikov, Golosova, Fursov, and the UGENE team, “Unipro UGENE: a unified bioinformatics toolkit,” Bioinformatics 2012, 28: 1166–1167.
The example above (Figure 2) is a multiple sequence alignment of the eight HCoV spike proteins. At the bottom, a visual depicts how conserved each region of the spike protein is across the eight HCoVs. As we can see, the S1 subunit of the spike protein is not very conserved, meaning it’s pretty variable from one coronavirus to another. This makes sense, given that the receptor binding domain is in the S1 region and different coronaviruses bind to different receptors. Just think about all the variants of the original SARS-CoV-2 virus. The spike protein, specifically the receptor binding domain, was a mutation “hot spot” for all the subsequent variants. All this is to say that the S1 subunit is a challenging (but not impossible) region to target with a universal coronavirus vaccine, given its high rate of variability and change.
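The conservation track at the bottom of an alignment can be approximated with a very simple score: for each column, the fraction of sequences that share the most common residue. The sketch below shows that idea under my own assumptions (gaps count against conservation, and the toy sequences are made up); it is not the scoring method UGENE uses.

```python
from collections import Counter


def column_conservation(sequences, gap="-"):
    """Per-column conservation for a list of aligned sequences.

    Each score is (count of the most common non-gap residue) divided by
    (number of sequences), so gaps lower the score for that column.
    """
    length = len(sequences[0])
    if any(len(seq) != length for seq in sequences):
        raise ValueError("aligned sequences must have the same length")
    scores = []
    for column in zip(*sequences):
        counts = Counter(residue for residue in column if residue != gap)
        top = counts.most_common(1)[0][1] if counts else 0
        scores.append(top / len(column))
    return scores


# Hypothetical toy alignment, for illustration only.
TOY_SEQUENCES = ["MKTAL", "MKSAL", "MRT-L", "MKTAL"]

if __name__ == "__main__":
    print(column_conservation(TOY_SEQUENCES))
    # -> [1.0, 0.75, 0.75, 0.75, 1.0]
```

A variable region like S1 would show up as a long stretch of low scores, while a conserved region would sit close to 1.0.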
For more on multiple sequence alignments, see here.
In the examples above (Figure 3), we see what a more conserved region of the spike protein looks like. Most, if not all, HCoVs have the same protein sequence or, at the very least, similar amino acids. This is important because the more conserved a protein region is, the more likely it is that an unknown future coronavirus that spills over into the human population will have that same (or a very similar) sequence. That’s good news for vaccine design.
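Picking out conserved stretches like these can be automated once each alignment column has a conservation score between 0 and 1. The sketch below scans a list of scores for runs that stay at or above a threshold. The function name, the 0.9 threshold, and the minimum run length of 3 are all arbitrary choices of mine for illustration, not values from the analysis above.

```python
def conserved_regions(scores, threshold=0.9, min_length=3):
    """Return (start, end) column index pairs for conserved runs.

    Indices are 0-based and inclusive. A run is a stretch of consecutive
    columns whose score is >= threshold, at least min_length columns long.
    """
    regions = []
    start = None
    for i, score in enumerate(scores):
        if score >= threshold:
            if start is None:
                start = i  # a new run begins here
        else:
            if start is not None and i - start >= min_length:
                regions.append((start, i - 1))
            start = None
    # Close a run that extends to the final column.
    if start is not None and len(scores) - start >= min_length:
        regions.append((start, len(scores) - 1))
    return regions


if __name__ == "__main__":
    example = [1.0, 0.5, 1.0, 1.0, 1.0, 0.6, 0.95, 0.9, 1.0, 1.0]
    print(conserved_regions(example))
    # -> [(2, 4), (6, 9)]
```

Raising the threshold or the minimum length makes the search stricter, which is a trade-off between finding more candidate regions and trusting each one more.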
So, what does all of this mean? It means this is a successful beginning to designing a universal HCoV vaccine candidate. Since the S1 subunit is so variable, I’ll need to look a little closer at it next time, including some of the SARS-CoV-2 variants that became successful, to see what that analysis can tell us. But this is a promising start on the S2 subunit of the S protein. The vaccine would ideally use the consensus sequence where applicable, which is great since there is a good amount of agreement in the S2 subunit. Where the aligned sequences don’t agree, I’d insert the amino acid residue(s) from SARS-CoV-2, which has proven its ability to cause a pandemic, so it’s a good fallback option when things aren’t as straightforward.
While it’s good news that the S2 subunit has some conserved regions, a look back at the spike protein diagram shows that the S2 subunit is not on the “top” portion of the protein. So there might be some challenges with the immune system being able to recognize it; the S2 subunit might be “too buried” to be noticed. Alas, no one said this would be easy. I still consider this a good start, but I will need to develop some solutions for the variable S1 subunit, along with considering additional surface proteins from HCoVs to include in the finished universal vaccine candidate.
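The “consensus where possible, SARS-CoV-2 as the fallback” rule described above can be sketched in a few lines of Python. This is only a rough illustration under my own assumptions: “agreement” is defined here as a strict majority of sequences sharing a residue (the `min_agreement` cutoff is a hypothetical parameter), and the toy sequences and virus names other than SARS-CoV-2 are made up rather than real spike sequences.

```python
from collections import Counter


def consensus_with_fallback(alignment, reference, min_agreement=0.5, gap="-"):
    """Build a candidate sequence from an alignment.

    For each column, use the most common non-gap residue if more than
    min_agreement of all sequences share it; otherwise fall back to the
    residue from the reference sequence. Gap characters inherited from
    the reference are removed from the final sequence.
    """
    ref_seq = alignment[reference]
    sequences = list(alignment.values())
    result = []
    for i, column in enumerate(zip(*sequences)):
        counts = Counter(residue for residue in column if residue != gap)
        residue, count = counts.most_common(1)[0] if counts else (gap, 0)
        if count / len(column) > min_agreement:
            result.append(residue)      # clear consensus
        else:
            result.append(ref_seq[i])   # no agreement: use the reference
    return "".join(result).replace(gap, "")


# Hypothetical toy alignment, for illustration only.
TOY_ALIGNMENT = {
    "SARS-CoV-2": "MFVKLT",
    "virus_B": "MFIKLS",
    "virus_C": "MFIRLA",
    "virus_D": "MYIQLG",
}

if __name__ == "__main__":
    print(consensus_with_fallback(TOY_ALIGNMENT, "SARS-CoV-2"))
    # -> MFIKLT
```

In the toy output, the third position takes the majority residue (I) even though SARS-CoV-2 has V there, while the fourth and last positions have no majority and fall back to the SARS-CoV-2 residues (K and T).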