Part I: Introduction to Predicting Vaccine Candidates

Introduction

Coronavirus disease 2019 (COVID-19) is a viral pneumonia caused by severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2); the outbreak was first reported in Wuhan, China, in December 2019. With an estimated basic reproduction number (R0, “R naught”) of 2.2, COVID-19 has spread across borders and infected more than half a million people worldwide at the time of this writing.

COVID-19 is not the first severe respiratory disease outbreak caused by a coronavirus. According to the World Health Organization (WHO), viral diseases continue to emerge and pose a serious threat to public health. Several viral epidemics have been recorded in the last twenty years, including severe acute respiratory syndrome coronavirus (SARS-CoV) in 2002 to 2003 and H1N1 influenza in 2009. More recently, Middle East respiratory syndrome coronavirus (MERS-CoV) was first identified in Saudi Arabia in 2012.

General Problem

Immune Response for Viral Infection

At a high level, when a virus enters a cell, some of its protein fragments (peptides), called antigens, are presented to the immune system. These antigenic peptides must first bind to a major histocompatibility complex (MHC) molecule, known as human leukocyte antigen (HLA) in humans, and then be recognized by virus-specific cytotoxic T lymphocytes (T-cells). This is how the immune system recognizes and destroys virus-infected cells. The chart below shows the MHC-peptide complex being presented to a T-cell: the T-cell receptor (TCR) must accept the binding to trigger an immune reaction that kills the infected cell.

In short, the immune system recognizes some of the proteins that make up the virus, and this recognition depends on viral peptides binding to HLA. Such binding makes the virus “visible” to the immune system.

Key Challenges

The HLA-Peptide binding is usually characterized by very high selectivity achieved through the interaction of the HLA with several critical (anchoring) residues of a peptide. Thus, despite the fact that bio-degradation of antigenic proteins can theoretically produce a very large diversity of peptides, the actual number of them selectively bound to a specific HLA allele is very limited.

This makes identifying the specific fragments of protein sequences that can selectively interact with specific HLA alleles both non-trivial and very important. The ability to predict HLA binding is also believed to be an essential step of ‘in silico’ vaccine development.

If we can identify the virus peptides that bind HLA, we can use them as potential candidates for peptide vaccines that train the human immune system to fight the virus.

The problem is that there are often hundreds of virus peptides that do not bind to HLA.

Classifying those peptides by binding score and binding quality would speed up the selection of potential candidates for vaccine development. Identifying likely antigen candidates with AI drastically reduces the number of bindings that must be tested to find a viable vaccine.
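As a rough illustration of what classifying peptides by binding score could look like, the sketch below buckets peptides by a predicted affinity and keeps only the likely binders. It assumes the score is an IC50 value in nM (lower means stronger binding) and uses the 50 nM and 500 nM cutoffs that are a common convention in epitope prediction; the article itself does not specify a scoring scheme. The function names, peptide strings and scores are made-up placeholders.

```python
def binding_class(ic50_nm, strong=50.0, weak=500.0):
    """Bucket a predicted IC50 (nM); the cutoffs are an assumed convention."""
    if ic50_nm <= strong:
        return "strong"
    if ic50_nm <= weak:
        return "weak"
    return "non-binder"


def shortlist(scored_peptides):
    """Drop non-binders and sort the rest from strongest to weakest."""
    binders = [
        (peptide, ic50)
        for peptide, ic50 in scored_peptides
        if binding_class(ic50) != "non-binder"
    ]
    return sorted(binders, key=lambda item: item[1])


# Placeholder peptides and scores, not real predictions or measurements.
TOY_SCORES = [
    ("PEPTIDEAA", 12.0),
    ("PEPTIDEAC", 4200.0),
    ("PEPTIDEAD", 310.0),
]

candidates_to_test = shortlist(TOY_SCORES)
for peptide, ic50 in candidates_to_test:
    print(peptide, ic50, binding_class(ic50))
```

Only the shortlisted peptides would move on to laboratory testing, which is where the reduction in the number of bindings to test comes from.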

In this series of publications, we will demonstrate how to use the Google AI platform to build generic pipelines that generate candidates for vaccine development.

We will explore the stages of the research process that lead to vaccine design, starting with public data sets for HLA and peptides.

Which HLA Alleles Should We Consider

Previous research shows that the HLA-DR0301, HLA-Cw1502 and HLA-A*0201 alleles are associated with protection from SARS infection. These findings might be valuable clues for the prevention, treatment, and mechanism of COVID-19. For AI/ML models, one can either slice training and prediction by HLA allele or include the allele as a categorical feature.
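The two modelling options can be sketched in a few lines of plain Python. This is a minimal illustration, not the pipeline used later in the series: the record layout, the function names, and the peptide strings and labels are all hypothetical placeholders rather than measured binding data.

```python
from collections import defaultdict

# Toy (allele, peptide, label) records; peptides and labels are placeholders,
# not measured binding data.
RECORDS = [
    ("HLA-A*0201", "PEPTIDEAA", 1),
    ("HLA-A*0201", "PEPTIDEAC", 0),
    ("HLA-Cw1502", "PEPTIDEAD", 1),
    ("HLA-DR0301", "PEPTIDEAE", 0),
]


def slice_by_allele(records):
    """Option 1: split the data so a separate model can be trained per allele."""
    groups = defaultdict(list)
    for allele, peptide, label in records:
        groups[allele].append((peptide, label))
    return dict(groups)


def one_hot_alleles(records):
    """Option 2: keep one data set and encode the allele as a one-hot feature."""
    alleles = sorted({allele for allele, _, _ in records})
    index = {allele: i for i, allele in enumerate(alleles)}
    rows = []
    for allele, peptide, label in records:
        vector = [0] * len(alleles)
        vector[index[allele]] = 1
        rows.append((peptide, vector, label))
    return alleles, rows


per_allele = slice_by_allele(RECORDS)
alleles, encoded = one_hot_alleles(RECORDS)
print(sorted(per_allele))  # one training subset per allele
print(encoded[0])          # peptide, one-hot allele vector, label
```

Slicing keeps each model simple but leaves rare alleles with little training data; a single model with a categorical allele feature can share information across alleles at the cost of a larger input.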

References for identifying likely HLA alleles:

  1. Predicting HLA-A2 binding peptides
  2. HLA Class consideration for coronavirus

Which Peptides Should We Consider

For a correct immune response to virus-infected cells, it is important to understand the binding affinity between a peptide and an MHC allele. The coronavirus S protein has been reported to be a significant determinant of virus entry into host cells: the envelope spike glycoprotein binds to its cellular receptor, which is ACE2 for SARS-CoV-2. We might therefore want to consider the binding affinity of S protein peptides. Research has shown that a number of alleles are most likely to bind 9-mer or 10-mer peptides, so for our example we will keep only 9-mer and 10-mer peptides for the ML models.
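One simple way to obtain 9-mer and 10-mer candidates from a protein is to slide a fixed-length window along its sequence; an existing peptide list can likewise be filtered by length. The sketch below assumes the protein is available as a one-letter amino-acid string. The function names are illustrative, and the sequence is a short placeholder, not the real S protein.

```python
def kmers(sequence, k):
    """Return every contiguous peptide of length k with its 1-based start."""
    return [(i + 1, sequence[i:i + k]) for i in range(len(sequence) - k + 1)]


def candidate_peptides(sequence, lengths=(9, 10)):
    """Collect all sliding-window peptides for each requested length."""
    peptides = []
    for k in lengths:
        peptides.extend(kmers(sequence, k))
    return peptides


def filter_by_length(peptides, lengths=(9, 10)):
    """Keep only peptides of the requested lengths from an existing list."""
    return [p for p in peptides if len(p) in lengths]


# Placeholder sequence for illustration only; NOT the real S protein.
TOY_PROTEIN = "ACDEFGHIKLMN"

candidates = candidate_peptides(TOY_PROTEIN)
for start, peptide in candidates:
    print(start, len(peptide), peptide)
```

A protein of length n yields n - k + 1 peptides of length k, so the candidate list grows roughly linearly with sequence length for each window size.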

References for HLA-peptide binding affinity:

  1. New:Cov-19 — Candidate Target for immune system
  2. Peptide — HLA Binding CNN model

Part II: Analyze Public Epitopes Data with BigQuery

We will provide a map of the Google services that can facilitate the various stages of vaccine research, and we will walk you through sample labs to explore data and develop ML models for peptide vaccine prediction.
