Basic Virology and Epidemiology in 15 Minutes

Tools to help you sift through the noise of COVID-19 or any virus outbreak

Dennis Saw
Dialogue & Discourse
17 min read · Mar 3, 2020


Photo: CDC/Dr. Fred Murphy — This media comes from the Centers for Disease Control and Prevention’s Public Health Image Library (PHIL), with identification number #4814. Public Domain, https://commons.wikimedia.org/w/index.php?curid=822112

We are in the middle of an outbreak of a new type of virus, the coronavirus SARS-CoV-2, which causes the disease COVID-19, and the very noisy internet is pummelling us with data and commentary. How do you separate statements that are patently uninformed from those likely to hold a kernel of truth?

I studied virology at university and I thought I’d write a basic introduction to viruses, their survival strategy, how the immune system deals with an infection and finally the statistics of a virus outbreak — epidemiology.

My hope is that some basic understanding will better equip you to sift through the noise. This article will discuss in order:

  1. The molecular machinery that replicates genetic information in cells;
  2. Viruses are parasites: they do not have the molecular machinery and need to hijack host cells to replicate;
  3. How a virus gets into a host cell;
  4. How newly made viruses shed and infect more cells;
  5. How the body’s immune system responds;
  6. Epidemiology — the statistical models of a viral disease: key parameters for discussion.

The molecular machinery that replicates genetic information in cells

It is the biological imperative of all living things, from viruses to blue whales, that their genes get passed on to the next generation. Understanding this is the key to understanding viruses, because they represent the purest form of this idea.

Genes are encoded in DNA¹ (and technically RNA, but we’ll come to that later). Hence, to pass genes on to the next generation, an organism first has to replicate its genome (the genome is all the genes that encode the organism).

The Molecular Machinery of a Cell

In a cell, whether plant or animal, the molecular machinery that replicates DNA sits in the nucleus. The nucleus is a special compartment where DNA is kept separate from the rest of the cell; a structural analogy would be the yolk of an egg. In bacteria, which are a more primitive form of life, there is no nucleus and no separation.

The DNA in your cells exists in extremely long strands. Along the DNA strands there are sections where sequences of the 4 “letters” in DNA encode genes. Genes are essentially stretches of code that tell the cell which amino acids³ (from a selection of 20 standard types, or 21 counting the rarer selenocysteine) to string together in sequence in order to make particular proteins. These proteins have all sorts of functions, from structural (the scaffolding that gives nerve cells their shape, for instance), to pointers (what sticks to DNA and tells another protein to start a process from that point), to enzymes (what causes the precursors of DNA to attach to a growing copy), etc.

However, proteins are not built directly on DNA. DNA is unwieldy: it’s long and in animals and plants it is double stranded, coiled around proteins called histones and coiled again. In any case, the molecular machinery that builds proteins is not in the nucleus of the cell but outside it in the cytoplasm (in our egg analogy it’s in the egg white). So the code to build a protein has to somehow be transmitted from the DNA in the nucleus to the cytoplasm.

To do this, genes encoded in DNA first have to be copied into another type of nucleic acid molecule: RNA² (the technical term is “transcribed”). Why RNA? The activation energy for certain biochemical processes is lower (that is, they require less energy), and RNA exists as a single strand (which may fold onto itself) rather than a double helix. This makes RNA better suited to carrying information to the protein factory in the cytoplasm, while DNA is better suited to storing it.

RNA molecules that encode proteins are called messenger RNA (mRNA). The mRNA leaves the nucleus through pores in the nuclear envelope, and once it is in the cytoplasm, the gene it carries is used to build proteins as described above.
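
The flow from DNA to mRNA to protein can be sketched in a few lines of Python. This is a toy illustration rather than a model of real cell biology: the gene sequence is made up, the function names are hypothetical, the DNA string is assumed to be the coding strand, and only a handful of the 64 codons of the standard genetic code are listed.

```python
# Toy sketch of the DNA -> mRNA -> protein flow.
# Assumptions: the DNA string is the coding strand, and only a small,
# hand-picked subset of the standard genetic code is listed here.
CODON_TABLE = {
    "AUG": "Met",  # also the usual start signal
    "GCU": "Ala",
    "GGC": "Gly",
    "UUU": "Phe",
    "AAA": "Lys",
    "UAA": "STOP",
    "UAG": "STOP",
    "UGA": "STOP",
}


def transcribe(coding_dna: str) -> str:
    """Copy the coding strand into mRNA: same letters, with U in place of T."""
    return coding_dna.upper().replace("T", "U")


def translate(mrna: str) -> list[str]:
    """Read the mRNA three letters at a time until a stop codon is reached."""
    protein = []
    for i in range(0, len(mrna) - 2, 3):
        amino_acid = CODON_TABLE[mrna[i:i + 3]]
        if amino_acid == "STOP":
            break
        protein.append(amino_acid)
    return protein


mrna = transcribe("ATGGCTGGCTTTAAATAA")  # made-up gene, for illustration only
print(mrna)
print(translate(mrna))
```

Each group of three letters (a codon) selects one amino acid, which is how a string of 4 letters can specify proteins built from about 20 building blocks.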

Interestingly, some of these proteins are required back in the nucleus, for instance proteins that replicate DNA (enzymes called DNA polymerase) and indeed RNA polymerases that are required to transcribe DNA to mRNA in the first place! These proteins, once made in the cytoplasm, are shuttled back into the nucleus.

Viruses do not have the molecular machinery and need to hijack host cells to replicate

Viruses are organisms so small that they lack all or most of the machinery needed to replicate their own genome or to build the protein case that protects it. Hence they have to hijack the DNA/RNA replication mechanism and protein factories in the cells of other organisms to survive.

In fact, the genome of a virus only contains a handful of genes which together ensure that those genes themselves get into their unwitting host to be replicated and transmitted to the next generation. They are, in a sense, genomes with the minimum number of genes that can ensure their own transmission to the next generation.⁵

For instance, the genome of human parainfluenza virus is only around 17,000 bases long (a base is one of the 4 letters of the genetic code) and codes for just a handful of proteins.⁴ Compare this with the bacterium E. coli, whose genome is around 5 million bases, or the human genome, which is 3.2 billion bases (haploid) coding for 19,000–20,000 proteins.⁶

The genome of a virus can be encoded in either DNA or RNA. The naked DNA or RNA (collectively called “nucleic acid”) is protected by a protein coat made up of regular sub-units, much like we can build protective structures using identical bricks. This protective structure is called the capsid, and in general it is the geometric shape you see in an electron micrograph of a virus. Capsids can be icosahedral, helical or more complex.⁷

Naked nucleic acid is fragile. The capsid allows the virus to survive for a longer period of time outside the host cell. For instance, under the right conditions norovirus (sometimes known as the winter vomiting bug) can survive for months, and possibly years.

How does a virus get into a host cell?

The outer boundary of a cell, the cell membrane, is composed of a phospholipid bilayer.⁸ Lipids are molecules that are insoluble in water. In a phospholipid, the lipid part of the molecule is two elongated fatty acid chains, connected to a more spherical phosphate group which is soluble in water (hydrophilic). Imagine a sewing pin where the spherical end is the water-soluble phosphate group and the long pin itself (better still, imagine 2 pins in parallel connected to the sphere) is the insoluble fatty acid.

The environment inside and outside the cell is water-based, hence to form a barrier, two phospholipid sheets have to line up so that the fatty acid sides of both layers face each other and the phosphate groups face outward, the inverse of the soapy film in a bubble.

All cells interact with the external world through complex proteins that stick out of the cell membrane. Some of these are trans-membrane proteins which span the phospholipid bilayer. The proteins that stick out can be sensors, called “receptors” because they are waiting to connect to or “receive” particular molecules before causing a biochemical cascade inside the cell. One example of a cascade is “endocytosis”: when a receptor makes contact with a molecule of the right shape, the cell membrane invaginates and drags the connected molecule into the cell, budding off into a spherical body surrounding the molecule inside the cell (called an “endosome”), while the cell membrane self-heals.

These membrane proteins sticking out of the cell membrane can be cell-type specific, such as the protein CD4, which exists only on certain types of cells in the blood, or stage specific: for instance, only immature red blood cells (called red blood cell progenitors) have the protein α5β1-integrin on their surface, distinguishing them from mature red blood cells.

The capsid of a virus contains attachment proteins which recognize specific receptors on the cell membrane. Once the attachment protein has found its receptor, some viruses proceed to create pores in the cell membrane and inject their genetic payload directly; the poliovirus uses such a strategy⁹.

Other viruses cause the host cell to take them in directly through endocytosis. Once the endosome is inside the cell, other processes kick in: for instance the cell may pump the endosome with protons (hydrogen atoms without their electrons) to make its internal environment acidic¹⁰. Normally, this is done to cause the attached molecule to detach from the receptor and also to destroy any potential activity the foreign molecule might have.

Viruses hijack this process: the acidic environment changes the shape of their capsid, which then destroys the endosome or causes pores to form in the endosomal wall, injecting the viral genome into the cell. Some viruses, such as the parvovirus, survive the cell’s attempt at destroying the contents of the endosome and end up in the cytoplasm, travelling to and entering the nucleus directly¹¹.

A Virus Needs to Connect to Cell Receptors to Enter a Cell

It is important to realise that host cells have not developed receptors or processes such as endocytosis for the benefit of viruses. These receptors and processes have normal cellular functions; viruses have evolved to take advantage of them to enter and hijack their host.

In fact, viruses have evolved to infect only a particular subset of cells, or cells at a particular stage of life. Hence a virus will not enter a cell unless it is the right type of cell, and it uses the cell’s receptor system to determine whether it has found the right host¹². For example, HIV homes in on the CD4 protein displayed on T cells, while the coronavirus is thought to use the ACE2 receptor, which exists on many cells but is abundant on lung alveolar epithelial cells and enterocytes (cells of the small intestine)¹⁴.

Some viruses are specific to cells at a certain stage of their life cycle: for instance, parvovirus B19 infects the precursors of red blood cells because mature red blood cells have no nucleus, and it does this by attaching only to the α5β1-integrin receptor, which, as mentioned above, disappears once a red blood cell matures.¹³

This is also why viruses that evolved with a specific host, say a monkey, do not readily infect another species, such as human beings (however, see below about mutations).

Enveloped Viruses

Some viruses have their nucleocapsid (the genome plus capsid) encased in a lipid bilayer called an “envelope”. The lipid bilayer is actually taken from the host cell’s membrane when the virus was replicated.

This envelope makes it easier for the virus to enter its host by simply (after correct receptor binding) fusing its lipid bilayer directly with the host cell — similar to how two bubbles in contact can fuse into a single one.

Influenza¹⁵ and corona¹⁶ viruses are examples of enveloped viruses.

Because lipid bilayers are susceptible to disruption by surfactants, we can destroy the viral envelope using detergents such as soap, rendering the virus inert (since the virus proteins which connect to the host receptors are also part of the envelope).

This is why non-enveloped viruses are more resistant to deactivation by washing with soap or detergents. Alcohols or specific chemicals are used in laboratories to deactivate them.

In addition, enveloped viruses are also susceptible to drying out. Hence influenza virus (and likely the coronavirus) becomes inactivated within days outside the host¹⁷, whereas a non-enveloped virus like norovirus can survive for months, and possibly years¹⁸.

Another thing to note is the difference between “inactivation” and “presence”. While influenza virus on a door knob will lose its infectivity after several days, it can likely still be detected for a longer period using tools that look for its genetic signature.

How newly made viruses shed and infect more cells

Once inside the host cell, the virus usurps the DNA/RNA replication machinery and protein factory described above to make many thousands of copies of itself inside the cell. This process takes hours.

At some point, viral shedding occurs. The newly made viruses leave the host cell through one of several methods. Which method is used depends on the type of virus.¹⁹

Enveloped viruses leave the cell through budding. The nucleocapsid is transported to just under the cell membrane, where it creates a bud that eventually pinches off, so that the nucleocapsid is surrounded by the phospholipid bilayer of the cell.

Non-enveloped viruses leave either by the host cell breaking up (bursting open) or through a process called exocytosis, which is the opposite of the endocytosis we encountered above.

Regardless of the process, at this stage the host cell is killed: if not by the virus, then eventually by the body’s own immune system or by the cell’s own self-destruct mechanism.

The newly released viruses then go on to infect surrounding cells, causing an exponential growth in the number of viruses over a few cycles.
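
To get a feel for how quickly this compounds, here is a back-of-the-envelope sketch. The numbers are illustrative assumptions rather than measurements: each infected cell is assumed to release 1,000 viruses, and only one in every 100 of those is assumed to go on to infect a new cell. The function name is hypothetical.

```python
def infected_cells_per_cycle(cycles: int, burst_size: int = 1000,
                             viruses_per_new_infection: int = 100) -> list[int]:
    """Newly infected cells in each replication cycle, starting from one cell.

    burst_size and viruses_per_new_infection are assumed, illustrative values.
    """
    counts = [1]
    for _ in range(cycles):
        new_viruses = counts[-1] * burst_size
        counts.append(new_viruses // viruses_per_new_infection)
    return counts


print(infected_cells_per_cycle(5))  # [1, 10, 100, 1000, 10000, 100000]
```

Even with 99% of the released viruses going nowhere, the number of infected cells grows tenfold per cycle under these assumptions.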

How the immune system responds

It is important to understand 2 consequences of this speed and exponential effect: firstly, an infected animal or person is infectious as more and more viruses are being produced and released. Secondly, the immune system in an animal or person requires time to recognize a viral invasion, and time also to build up a defence.

The Incubation Period

This means that while the first generations of infected host cells in a body are silently churning out viruses and the host organism is infectious, the animal or person’s immune system hasn’t yet kicked in, so the host has no symptoms. This time before the first symptoms appear is called the “incubation period”.²⁰

It’s worth repeating that during the incubation period, infected persons appear asymptomatic but are infectious.

Influenza has a 1–4 day incubation period (2 day average).²¹ The 2002 SARS outbreak had a 1–14 day incubation period (with the average at 4–6 days).²² For COVID-19 (Coronavirus Disease 2019), Chinese physicians and researchers reported in a paper dated 6 February 2020 an observed range of 0 to 24 days (but note that the median is 3 days).²³

Before the virus causes disease-specific symptoms, such as pustules for chicken pox, there is a period of “prodromal symptoms”. These symptoms are caused by the immune system starting to battle with the infection and include general malaise, aches, chills, etc.²⁴

The Immune Response

The immune response to a viral infection is complex and consists of a whole host of specialised cells.

Once a virus is inside a cell, it is invisible to the immune system. To remove this blind spot, nearly all cells break internal proteins up and display the fragments on special trans-membrane molecules, called Major Histocompatibility Complex Class I (MHC I), on the outside of their cell membrane. The immune system ignores fragments from the body’s own proteins. Hence any foreign internal proteins can be detected.

T cells in the circulatory system have special receptors that recognize viral fragments. When they detect such fragments, they release special factors towards the infected cell that kill it.

Some viruses have evolved to disrupt the MHC system. In the arms race of our immune system, there is another type of cell called the natural killer (NK) cell that detects if cells are displaying unusually low numbers of MHC molecules and kills those!

In addition, if a cell is infected by a virus, it releases small proteins called interferons which, in addition to interfering with viral reproduction, signal to the immune system and surrounding cells that the cell is infected. Surrounding cells notified by interferons increase the number of MHC I molecules on display, increasing the chance that an infected cell will be recognised by T cells!

At this point other cytotoxic cells are drawn to the infection. They in turn release more signalling molecules, eventually resulting in the prodromal symptoms an infected person feels: elevated temperature, aches, etc.

Finally, the immune system will develop antibodies to viral proteins. This build-up takes many days. Antibodies circulating in the blood inactivate viruses by binding to them, preventing them from entering cells and signalling other immune-system cells to ingest them.³²

You can imagine that a patient with a compromised immune system is more susceptible to a viral infection and in fact will have a higher chance of succumbing to the infection.

Note that antibiotics are chemicals that kill bacteria and will have no effect on a viral infection, but may be useful for secondary infections that follow a viral infection such as bacterial pneumonia.

Epidemiology — the statistical models of a viral disease: key parameters for discussion

How a virus persists (and hence is successful at passing its genes forward in time) or spreads in a population is dependent on many factors. Most of these are really common-sensical and although we can model the epidemiology of any disease mathematically, we can have a reasonable conversation around the parameters that contribute to how a virus (and its associated disease) spreads without mathematical formulae.

Here are some key parameters:

Mode of Transmission

Does the virus have a reservoir, for instance dogs for the rabies virus? Or does the virus need a “vector” to be transmitted, such as arboviruses that spread through sandflies or ticks, or the Zika virus through mosquitoes? Or does the virus spread from person to person, like SARS-CoV-2 (the virus behind COVID-19) or influenza A?

Reservoirs allow the virus to persist outside the sphere of human population immunity. Some viruses jump the species barrier when mutations and re-arrangements allow them to recognise the receptors on human cells, and when their genetic payload (sometimes packaged with non-structural proteins) is able to co-opt the host’s machinery to make copies of itself. SARS (2002–2003), swine flu and the current COVID-19 outbreak are good examples. HIV and Ebola are also thought to have come from animal reservoirs.

The Basic Reproduction Number R0

The basic reproduction number (R-zero) is a measure of how many new infected people one infected person can cause. This is more complex than just pure “infectivity” as it depends on many factors such as the environment (enveloped viruses would be less effective in dry and hot environments), social behaviour, whether a vector is involved, etc.²⁵ Hence when you see R0, consider also when the outbreak occurred in human history, and what was measured.

If R0 < 1, an infection won’t go far (for instance a deadly virus that kills its host before effective spreading, or the act of isolating infected individuals). If R0 > 1, the number of cases grows exponentially at first, and the further R0 climbs towards 2 and beyond, the faster that growth.
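
The threshold at R0 = 1 can be illustrated with a deliberately crude calculation. It assumes that every infected person infects exactly R0 others and that the pool of susceptible people never runs down, so it only describes the very start of an outbreak. The function name and the R0 values are chosen for illustration.

```python
def cases_per_generation(r0: float, generations: int,
                         initial_cases: float = 1.0) -> list[float]:
    """Expected new cases in each generation of transmission: initial_cases * r0 ** n."""
    return [initial_cases * r0 ** n for n in range(generations + 1)]


for r0 in (0.8, 1.3, 2.5):
    final = cases_per_generation(r0, 10)[-1]
    print(f"R0 = {r0}: about {final:.1f} new cases in generation 10")
```

Below 1 the chain of transmission shrinks with every generation; above 1 it grows, and small differences in R0 compound into very large differences in case numbers.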

In a fascinating study on the value of R0 in influenza pandemics and novel virus infections, it was found that R0 for the 1918 pandemic was 1.80, for the 1957 pandemic 1.65, and for the 2009 H1N1 pandemic 1.46–1.48, while for seasonal epidemics it was around 1.28.²⁶

R0 for COVID-19 in China in the early phase was estimated to be 3.30–5.47 by Chinese scientists²⁷, but a study published at the end of January 2020 in Eurosurveillance estimated R0 in the early outbreak to be between 1.4 and 3.8²⁸. The R0 estimated for the Diamond Princess cruise ship was 2.28 at the early stage.²⁹

In fact, the authors of the Eurosurveillance paper indicated that the early pattern of disease spreading of COVID-19 is similar to SARS in 2002.

Dispersion Parameter, k

The R0 number can be thought of as a mishmash of a number of variables. In order to tease out superspreading events, such as hospital infections or large gatherings, scientists have modelled a dispersion parameter, k. The lower k is, the more transmission is concentrated in a few superspreading events.³⁰
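
One way to see what k does: in the model used in the superspreading paper cited above (reference 30), the number of people each case infects follows a negative binomial distribution with mean R0 and dispersion k. Under that model, the probability that a case infects nobody at all is (1 + R0/k)^(-k). The sketch below evaluates that formula; the values of R0 and k are illustrative assumptions, not estimates for any particular disease, and the function name is hypothetical.

```python
def prob_no_onward_transmission(r0: float, k: float) -> float:
    """P(a case infects nobody) for a negative binomial with mean r0 and dispersion k."""
    return (1.0 + r0 / k) ** (-k)


for k in (0.1, 1.0, 10.0):
    p0 = prob_no_onward_transmission(2.0, k)
    print(f"k = {k}: {p0:.0%} of cases infect nobody")
```

With the average held at R0 = 2, a low k means most cases infect nobody, so the average has to be made up by a small number of cases that each infect many people.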

In the Eurosurveillance study mentioned above, the authors’ simulations showed (see their figure 3) that when both R0 and k are compared with other outbreaks, the early stage of COVID-19 is likely more similar to the 1918 influenza outbreak than to SARS or MERS.²⁸ The authors acknowledge, however, that the data used for modelling are scarce and that k is likely imprecise.

Susceptible Individuals, S

The proportion of people in a population susceptible to an infection either because they have not been exposed to the virus before so have no immunity, or have not been vaccinated, is denoted by S.³¹

While the basic reproduction number is R0, the effective reproduction number is R0 x S. Remember that if R0 x S < 1, the disease will stop spreading. Therefore by reducing S sufficiently we can stop a viral disease from spreading.

How do you reduce the number of susceptible individuals? You vaccinate a proportion of the population until R0 x S < 1. Hence vaccines play a key role in controlling or eradicating viral disease.
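
The condition R0 x S < 1 can be rearranged to S < 1/R0, which means the immune share of the population has to exceed 1 - 1/R0. The sketch below works this out for a few of the R0 values quoted earlier. It assumes, as a simplification, that everyone starts out susceptible and that a vaccinated person is fully immune; the function name is hypothetical.

```python
def vaccination_threshold(r0: float) -> float:
    """Immune fraction at which r0 * S falls to 1, i.e. 1 - 1/r0."""
    if r0 <= 1.0:
        return 0.0  # the disease already fails to spread
    return 1.0 - 1.0 / r0


# R0 values quoted earlier: seasonal flu, 1918 pandemic, an early COVID-19 estimate.
for r0 in (1.28, 1.80, 3.30):
    print(f"R0 = {r0}: more than {vaccination_threshold(r0):.0%} must be immune")
```

The higher R0 is, the larger the share of the population that has to be immune before the disease stops spreading.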

For fast spreading epidemics like influenza seasons, as more people recover and develop resistance, S gets smaller and the disease eventually stops spreading.
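
This self-limiting behaviour can be sketched with the textbook SIR (susceptible, infected, recovered) model, stepped one day at a time. The model is brought in here purely as an illustration. It assumes a closed, evenly mixed population, lasting immunity after recovery, and made-up values for R0 and the infectious period.

```python
def simulate_epidemic(r0: float, infectious_days: float = 5.0,
                      initial_infected: float = 1e-4,
                      days: int = 365) -> list[tuple[float, float]]:
    """Daily-step SIR model. Returns (susceptible, infected) population fractions per day."""
    gamma = 1.0 / infectious_days   # share of infected people recovering each day
    beta = r0 * gamma               # transmission rate per day
    s, i = 1.0 - initial_infected, initial_infected
    history = [(s, i)]
    for _ in range(days):
        new_infections = beta * s * i
        recoveries = gamma * i
        s -= new_infections
        i += new_infections - recoveries
        history.append((s, i))
    return history


R0 = 1.5  # assumed value, for illustration only
history = simulate_epidemic(R0)
peak_day = max(range(len(history)), key=lambda d: history[d][1])
s_at_peak = history[peak_day][0]
print(f"Infections peak on day {peak_day}, when R0 x S = {R0 * s_at_peak:.2f}")
print(f"Fraction never infected: {history[-1][0]:.2f}")
```

Infections start to fall once R0 x S drops to about 1, and the outbreak ends with part of the population never having been infected.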

Mortality Rates

The rate at which a viral disease kills its host is one of the factors that determine its effectiveness at spreading through a population. Mortality rates are measured either on a per 100,000 population per year basis or, in the case of specific outbreaks, as the case fatality rate (deaths as a proportion of confirmed cases). Mortality rates can also be stratified according to age groups and/or gender.

It is still too early to get a read on the mortality rate of COVID-19 because it also requires an accurate read on the total number of people infected. Also, the true mortality rate of a viral infection in a population can only be ascertained after a large enough population has been infected. (Imagine if in the beginning of an epidemic, out of the first 2 infected persons, one dies. Is the mortality rate 50%?) This is why you see such wildly varying numbers being bandied about the internet.
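
The "one death out of two cases" problem can be made concrete with a confidence interval. The sketch below uses the Wilson score interval, a standard statistical method chosen here for illustration, to show how wide the plausible range is for the same observed 50% at different sample sizes. The case counts are hypothetical.

```python
import math


def wilson_interval(deaths: int, cases: int, z: float = 1.96) -> tuple[float, float]:
    """Approximate 95% Wilson score interval for the proportion deaths / cases."""
    p = deaths / cases
    denom = 1.0 + z * z / cases
    centre = (p + z * z / (2 * cases)) / denom
    half_width = z * math.sqrt(p * (1 - p) / cases + z * z / (4 * cases * cases)) / denom
    return max(0.0, centre - half_width), min(1.0, centre + half_width)


# Hypothetical counts, all with the same observed rate of 50%.
for deaths, cases in ((1, 2), (50, 100), (5000, 10000)):
    low, high = wilson_interval(deaths, cases)
    print(f"{deaths}/{cases}: plausible range {low:.0%} to {high:.0%}")
```

With two cases the data are compatible with almost any mortality rate; only with thousands of cases does the range narrow to something useful.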

Hence although from the COVID-19 dashboard by the Johns Hopkins CSSE³³ you can divide the total deaths by the total confirmed cases to get a figure (currently around 3.4%), consider that the total number of cases is most likely to be an underestimate.
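
The effect of undercounted cases is simple arithmetic, sketched below. The death and case counts are hypothetical, chosen only so that the naive ratio comes out at 3.4%. The detection fractions are assumptions, and the sketch assumes every death is counted.

```python
def fatality_if_cases_undercounted(deaths: int, confirmed_cases: int,
                                   fraction_detected: float) -> float:
    """Deaths divided by estimated true infections, assuming all deaths are counted."""
    estimated_infections = confirmed_cases / fraction_detected
    return deaths / estimated_infections


# Hypothetical counts chosen to give a naive ratio of 3.4%.
deaths, confirmed = 3_400, 100_000
for fraction_detected in (1.0, 0.5, 0.1):
    rate = fatality_if_cases_undercounted(deaths, confirmed, fraction_detected)
    print(f"{fraction_detected:.0%} of infections detected: fatality rate {rate:.2%}")
```

If only half of all infections are being confirmed, the same death count implies a rate half as large.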

In conclusion

There is actually more to viral biology and epidemiology than we have discussed above. However, this article is long enough and I hope there is enough to help you get a better understanding of this COVID-19 outbreak and indeed to give you the tools to understand other outbreaks in the future.

Further Reading and Notes

¹ DNA (deoxyribonucleic acid): https://en.wikipedia.org/wiki/DNA

² RNA (ribonucleic acid): https://en.wikipedia.org/wiki/RNA

³ Amino acids: https://en.wikipedia.org/wiki/Amino_acid

⁴ Human parainfluenza virus 4a complete genome https://www.ncbi.nlm.nih.gov/nuccore/MN369047.1. If you like data, here is a collection of most of the genomes of viruses that have been sequenced: https://www.ncbi.nlm.nih.gov/genomes/GenomesGroup.cgi?taxid=10239 or https://www.ncbi.nlm.nih.gov/labs/virus/vssi/#/

⁵ There is in fact a class of virus called the dependoparvovirus which are so small that they contain only a subset of the genes necessary to ensure their replication and require that the host is already infected by another virus! These are parasites of parasites! https://en.wikipedia.org/wiki/Dependoparvovirus

⁶ Human genome: https://en.wikipedia.org/wiki/Human_genome

⁷ Virus capsid structures: https://en.wikipedia.org/wiki/Capsid

⁸ Lipid bilayer: https://en.wikipedia.org/wiki/Lipid_bilayer

⁹ Poliovirus entry into a host cell: https://en.wikipedia.org/wiki/Poliovirus#Replication_cycle

¹⁰ How endosomes work: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4596472/

¹¹ Parvoviruses are able to enter the nucleus directly through pores in the nuclear wall because they are some of the smallest viruses known. See also: https://en.wikipedia.org/wiki/Parvovirus#Replication_as_disease_vector

¹² The host’s DNA and protein replication machinery is dynamic, with different functions turned on at different parts of a cell’s life cycle. Since viruses have evolved to be tightly linked to the cell’s machinery, they only work (or work efficiently) with cells of the right type that are in the right phase of their life cycle.

¹³ Parvovirus B19: https://en.wikipedia.org/wiki/Parvovirus_B19

¹⁴ Hamming I, et al., 2004. Tissue distribution of ACE2 protein, the functional receptor for SARS coronavirus. A first step in understanding SARS pathogenesis. J Pathol 2004; 203: 631–637. DOI: 10.1002/path.1570

¹⁵ Influenza virus: https://en.wikipedia.org/wiki/Influenza

¹⁶ Corona virus: https://en.wikipedia.org/wiki/Severe_acute_respiratory_syndrome-related_coronavirus

¹⁷ Influenza virus outside the body: https://www.nhs.uk/common-health-questions/infections/how-long-do-bacteria-and-viruses-live-outside-the-body/

¹⁸ Norovirus persistence: https://en.wikipedia.org/wiki/Norovirus#Persistence

¹⁹ Virus shedding/egress: https://en.wikipedia.org/wiki/Viral_shedding

²⁰ Incubation period: https://en.wikipedia.org/wiki/Incubation_period

²¹ Incubation periods of acute respiratory viral infections: a systematic review: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4327893/

²² SARS incubation period: https://en.wikipedia.org/wiki/Severe_acute_respiratory_syndrome#Signs_and_symptoms

²³ Clinical characteristics of 2019 novel coronavirus infection in China: https://www.medrxiv.org/content/10.1101/2020.02.06.20020974v1

²⁴ See for instance: https://academic.oup.com/cid/article/39/12/1810/323131

²⁵ Basic reproduction number, R0: https://en.wikipedia.org/wiki/Basic_reproduction_number

²⁶ Biggerstaff et al, 2014. Estimates of the reproduction number for seasonal, pandemic, and zoonotic influenza: a systematic review of the literature. BMC Infectious Diseases https://bmcinfectdis.biomedcentral.com/articles/10.1186/1471-2334-14-480

²⁷ Zhao, S., et al. 2020. Preliminary estimation of the basic reproduction number of novel coronavirus (2019-nCoV) in China, from 2019 to 2020: A data-driven analysis in the early phase of the outbreak https://www.biorxiv.org/content/10.1101/2020.01.23.916395v1.full.pdf

²⁸ Riou, J. & Althaus, CL., 2020. Pattern of early human-to-human transmission of Wuhan 2019 novel coronavirus (2019-nCoV), December 2019 to January 2020. Eurosurveillance; 25(4). https://www.eurosurveillance.org/content/10.2807/1560-7917.ES.2020.25.4.2000058

²⁹ Zhang, S., et al. 2020. Estimation of the reproductive number of Novel Coronavirus (COVID-19) and the probable outbreak size on the Diamond Princess cruise ship: A data-driven analysis. Int. J. of Infectious Diseases. https://www.ijidonline.com/article/S1201-9712(20)30091-6/fulltext

³⁰ Lloyd-Smith, JO et al., 2005. Superspreading and the effect of individual variation on disease emergence. Nature 438, 355–359 https://www.nature.com/articles/nature04153

³¹ Susceptible individuals, S: https://en.wikipedia.org/wiki/Susceptible_individual

³² A very good summary of the immune response to viruses: https://www.immunology.org/public-information/bitesized-immunology/pathogens-and-disease/immune-responses-viruses

³³ Johns Hopkins CSSE COVID-19 Dashboard: https://gisanddata.maps.arcgis.com/apps/opsdashboard/index.html#/bda7594740fd40299423467b48e9ecf6

Dennis Saw
Dialogue & Discourse

Scientist, ex-high tech investment banker, brokerage co-founder & biotech CEO. Currently at the intersection of biotech/pharma & data science/machine learning.