Tissue specific expression and genetic regulation of SARS-CoV-2 receptors ACE2 and TMPRSS2

Yuan He
Mar 26, 2020


Yuan He, Marios Arvanitis, Princy Parsana, Ashton Omdahl, Jessica Bonnie, Zeyu Chen, Christopher D. Brown, Alexis Battle

COVID-19, the disease caused by the virus SARS-CoV-2, has become a global pandemic, as announced by the World Health Organization (WHO Director-General’s opening remarks at the media briefing on COVID-19, 11 March 2020). Recent studies have demonstrated that SARS-CoV-2 uses Angiotensin I Converting Enzyme 2 (ACE2), the protein encoded by the ACE2 gene, as its key receptor to invade cells. SARS-CoV-2 shares this characteristic with SARS-CoV, the coronavirus identified in 2003, and SARS-CoV-2 binds ACE2 with higher affinity than SARS-CoV does (Yan R. et al. Science, 2020). It has also been reported that Transmembrane Serine Protease 2 (TMPRSS2) can act as a protease co-receptor for SARS-CoV-2 cell entry, and that TMPRSS2 expression enhances ACE2-mediated SARS-CoV-2 cell invasion (Hoffmann M. et al. Cell, 2020). The Genotype-Tissue Expression (GTEx) project provides genotype information and gene expression levels across 49 human tissues from 838 donors. These data allow us to examine the expression patterns of ACE2 and TMPRSS2, both across tissues and across individuals, as well as the genetic regulation of the two genes revealed by expression quantitative trait loci (eQTLs) (GTEx consortium, 2019, Biorxiv).

Expression of ACE2 and TMPRSS2 across human tissues

Several studies have measured ACE2 and TMPRSS2 expression in diverse tissues. ACE2 expression has been reported in the lung, particularly in epithelial cells, as well as in the small intestine, kidney, testis, and liver (Ziegler C. et al. 2020, Cell; Seow J. et al. 2020, bioRxiv; Fan C. et al. 2020, medRxiv; Chai X. et al. 2020, bioRxiv; Chen J. et al. 2020, Preprints).

Using the GTEx dataset, we confirm that ACE2 is widely expressed at moderate levels across multiple tissues (Figure 1), including the respiratory system (lung), gastrointestinal (GI) system (colon, small intestine, etc.), circulatory system (heart, arteries, etc.), urinary system (kidney), and reproductive system (testis, ovary). Consistent with this broad distribution, some COVID-19 patients exhibit multi-organ complications including, among others, gastrointestinal symptoms, acute cardiac injury, and acute kidney injury (Huang C. et al. 2020, Lancet; Shi Q. et al. 2020, medRxiv). TMPRSS2 is also broadly expressed, including in tissues of the GI system, lung, and kidney (Figure 1). The high expression of both ACE2 and TMPRSS2 in the small intestine could potentially explain the viral presence in stool samples from infected individuals observed in recent studies (Hindson J. 2020, Nat Rev Gastroenterol Hepatol).

Figure 1. Gene expression of ACE2 and TMPRSS2 across tissues. TPM stands for Transcripts Per Million. (Data source: https://gtexportal.org/home/gene; log scale.)
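As a reminder of how the TPM values in Figure 1 are defined, here is the standard calculation: divide each gene's read count by its length in kilobases, then rescale so that a sample's values sum to one million. This is a minimal base-R sketch; the gene names, counts and lengths are made up, and GTEx's released TPMs come from its own processing pipeline, which this sketch does not reproduce.

```r
# Transcripts Per Million (TPM) from raw read counts and gene lengths.
# Counts and lengths are made-up illustrative values, not GTEx data.
tpm <- function(counts, lengths_bp) {
  # Step 1: normalize by gene length (reads per kilobase)
  rpk <- counts / (lengths_bp / 1000)
  # Step 2: rescale so that the values in this sample sum to one million
  rpk / sum(rpk) * 1e6
}

counts  <- c(geneA = 500,  geneB = 1000, geneC = 250)
lengths <- c(geneA = 2000, geneB = 4000, geneC = 500)

x <- tpm(counts, lengths)
print(round(x))
log10(x + 1)  # the log10(TPM + 1) scale used in the association tests
```

With these inputs geneA and geneB receive the same TPM even though geneB has twice as many reads, because geneB is also twice as long.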

Age and sex dependence of ACE2 and TMPRSS2 expression

Higher mortality and more frequent critical hospitalization among the elderly, along with worse outcomes in males, led us to explore how ACE2 and TMPRSS2 expression varies between age groups and sexes in the 49 human tissues of the GTEx project (Zhou F. et al. Lancet, 2020; Wu Z & McGoogan J, JAMA, 2020; Jin J. et al. medRxiv, 2020).

To account for confounding in the RNA-seq data while preserving our signal of interest (age or sex), we inferred surrogate variables (SVs) using surrogate variable analysis (SVA) (Leek J et al. 2012, Bioinformatics). Using a linear regression model, we tested the association between gene expression (log10(TPM+1)) and age, controlling for the effect of age-specific SVs (as defined in the code in the Code availability section; details at https://github.com/heyuan7676/COVID-19/). In the publicly available GTEx data, age is encoded as integers from 1 to 6, corresponding to the age groups 20–29, 30–39, 40–49, 50–59, 60–69, and 70–79 years. Similarly, to test the association between gene expression (log10(TPM+1)) and sex, we used a linear regression model controlling for sex-specific SVs. Sex was encoded as integers 1 and 2, corresponding to male and female. The analysis was run for each tissue with median TPM > 1. For each gene, the false discovery rate (FDR) was computed using the Benjamini-Hochberg procedure (Benjamini Y. & Hochberg Y., Journal of the Royal Statistical Society. Series B (Methodological), 1995). Table 1 and Table 2 show the associations at FDR < 0.05. The full list of results is provided in Supplementary Table 1 and Supplementary Table 2.
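The age test described above can be sketched in base R as follows. This is a minimal illustration under stated assumptions, not the code used for this post: the data are simulated rather than taken from GTEx, the SV matrix is a random stand-in for sva_age$sv, and the tissue names and the helpers test_age() and bh() are hypothetical. It shows the integer coding of age group and sex, the median TPM > 1 filter, the regression of log10(TPM + 1) on age group plus SVs, and a hand-written Benjamini-Hochberg step-up adjustment that should agree with R's p.adjust(p, method = "BH"). Here BH is applied across the tissues tested for one gene, which is our reading of "for each gene".

```r
# Simulated data (NOT GTEx): one gene measured in several hypothetical tissues
set.seed(42)
brackets <- c("20-29", "30-39", "40-49", "50-59", "60-69", "70-79")
n <- 150
age_group <- match(sample(brackets, n, replace = TRUE), brackets)  # integers 1..6
sex <- ifelse(sample(c("male", "female"), n, replace = TRUE) == "male",
              1L, 2L)                                              # 1 = male, 2 = female
svs <- matrix(rnorm(n * 2), ncol = 2)  # random stand-in for sva_age$sv (samples x SVs)

# Simulated TPMs; "brain" is deliberately lowly expressed (median TPM < 1)
tissues <- list(
  lung  = exp(rnorm(n, 1.5 + 0.15 * age_group + 0.4 * svs[, 1], 0.3)),
  colon = exp(rnorm(n, 2.0 + 0.4 * svs[, 2], 0.3)),
  brain = exp(rnorm(n, -2, 0.5))
)

# Hypothetical helper: regress log10(TPM + 1) on age group plus the SVs
test_age <- function(tpm, age_group, svs) {
  y <- log10(tpm + 1)
  fit <- lm(y ~ age_group + svs)
  coefs <- summary(fit)$coefficients
  c(estimate = coefs["age_group", "Estimate"], p = coefs["age_group", "Pr(>|t|)"])
}

# Hypothetical helper: Benjamini-Hochberg step-up adjustment
bh <- function(p) {
  m <- length(p)
  o <- order(p, decreasing = TRUE)        # largest p-value first
  q <- pmin(1, cummin(p[o] * m / (m:1)))  # p * m / rank, forced to be monotone
  q[order(o)]                             # restore the input order
}

keep <- vapply(tissues, function(tpm) median(tpm) > 1, logical(1))
res <- t(vapply(tissues[keep], test_age, c(estimate = 0, p = 0),
                age_group = age_group, svs = svs))
res <- cbind(res, fdr = bh(res[, "p"]))
res
```

The sex test has the same shape, with the 1/2-coded sex in place of age_group and sva_sex$sv in place of the age-specific SVs.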

We found that ACE2 expression increases with age in the lung (FDR < 0.05). In tissues of the GI system, including transverse colon and minor salivary gland, both ACE2 and TMPRSS2 expression decrease with age (FDR < 0.05). Recent publications report that infections in children are generally milder than in adults, with less severe lung involvement (Lu X. et al. 2020, NEJM). A recent preprint reports that in three pediatric cases of COVID-19, viral RNA persisted in stool after SARS-CoV-2 had cleared from the respiratory tract (Xing Y. et al. 2020, medRxiv). Another recent paper reported that eight children persistently tested positive for viral RNA on rectal swabs after testing negative on nasopharyngeal swabs (Xu Y. et al. 2020, Nature Medicine). These clinical observations are directionally consistent with the age trends we observe in lung and GI tissues, although expression differences alone cannot establish a causal link.

To assess the sensitivity of this analysis to the method used to handle confounding, we also performed the age and sex association tests controlling for known confounders of gene expression: five genotype PCs, death circumstances, RNA integrity number (RIN), total ischemic time, and exonic rate (GTEx consortium. 2019, Biorxiv). Because SVA is designed to capture unknown confounders that may not be fully represented by these known confounders, controlling for SVs yields more significant associations. Nevertheless, the results are largely concordant with those of the SVA approach, indicating that our findings are not highly sensitive to methodology (Supplementary Tables 3 and 4; Supplementary Figures 1 and 2).
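One way to quantify the concordance summarized in Supplementary Figure 2 is sketched below. The two metrics (a Spearman correlation of -log10 p-values and the overlap of FDR < 0.05 hits) are our assumption and not necessarily what was computed for the post, and the five p-values per approach are made up for illustration.

```r
# Made-up p-values for the same five gene-tissue tests under two adjustments
p_sv    <- c(1e-6, 0.003, 0.02, 0.4, 0.7)  # controlling for inferred SVs
p_known <- c(1e-4, 0.010, 0.08, 0.5, 0.6)  # controlling for known confounders

# Rank concordance of the evidence for association
rho <- cor(-log10(p_sv), -log10(p_known), method = "spearman")

# Overlap of hits at FDR < 0.05 (Benjamini-Hochberg across the tests)
hit_sv    <- p.adjust(p_sv, method = "BH") < 0.05
hit_known <- p.adjust(p_known, method = "BH") < 0.05

rho
table(SVA = hit_sv, known = hit_known)
```

In this toy case the two approaches rank the tests identically, while the SV-adjusted analysis yields one more hit, mirroring the pattern described above.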

Figure 2. Boxplots of gene expression across samples in different age groups, controlling for confounders learned from SVA. The number in parentheses in each x-axis label indicates the number of data points in that bin.
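The post does not spell out how SV effects were removed for plotting Figure 2. A minimal sketch under one common assumption: fit age group and SVs jointly, subtract only the fitted SV contribution, and group the adjusted values by age bracket, with per-bin sample counts in the axis labels. All inputs below are simulated.

```r
# Simulated inputs (not GTEx): one gene in one tissue, 60 samples
set.seed(7)
brackets <- c("20-29", "30-39", "40-49", "50-59", "60-69", "70-79")
age_group <- rep(1:6, times = c(5, 8, 10, 12, 15, 10))
n <- length(age_group)
svs <- matrix(rnorm(n * 2), ncol = 2)  # random stand-in for sva_age$sv
y <- 1 + 0.05 * age_group + 0.3 * svs[, 1] + rnorm(n, sd = 0.1)  # log10(TPM + 1)

# Fit age + SVs jointly, then subtract only the fitted SV contribution
fit <- lm(y ~ age_group + svs)
sv_coef <- coef(fit)[-(1:2)]  # drop the intercept and age_group coefficients
y_adj <- y - as.vector(svs %*% sv_coef)

# x-axis labels carry the number of samples per bin, as in the figure
counts <- table(factor(age_group, levels = 1:6))
labels <- sprintf("%s (%d)", brackets, as.integer(counts))
groups <- split(y_adj, factor(age_group, levels = 1:6, labels = labels))
boxplot(groups, xlab = "age group", ylab = "SV-adjusted log10(TPM + 1)")
```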

cis-eQTLs for ACE2 and TMPRSS2 across human tissues

Using the GTEx dataset (GTEx consortium. 2019, Biorxiv), we further explored how common genetic variation could affect ACE2 and TMPRSS2 expression. For ACE2, most of the tissues with cis-eQTL signals belong to the nervous system, including brain tissues, tibial nerve, and pituitary; there are also cis-eQTL signals in adipose tissues, testis, and tibial artery. However, ACE2 has no significant cis-eQTLs in tissues with high ACE2 expression, including lung, heart, kidney, and small intestine (Figure 3). This suggests that common genetic variation has little detectable impact on ACE2 expression in the most clearly disease-relevant tissues. We note that ACE2 is located on chromosome X, and future work will include separate cis-eQTL analyses in males and females. In contrast, TMPRSS2 has strong cis-eQTL signals in the lung, prostate, and testis.

Figure 3: Each panel shows the -log10(gene-level cis-eQTL p-value) across GTEx tissues. An asterisk above a bar indicates that the gene has a q-value < 0.05 in that tissue, i.e., it has at least one significant cis-eQTL in that tissue (GTEx consortium. 2019, Biorxiv).
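For intuition about what an individual cis-eQTL test involves, here is a simplified single-variant version in base R: regress expression on the alternate-allele dosage (0, 1 or 2) of one nearby variant plus covariates. The simulated data and the choice of covariates are assumptions. GTEx's gene-level q-values in Figure 3 come from a permutation-based procedure over all cis variants of a gene, which this sketch does not reproduce. Also, because ACE2 is X-linked and males carry a single X chromosome, the autosomal 0/1/2 coding used here would not apply directly to male ACE2 genotypes.

```r
# Simulated data for one gene and one hypothetical cis variant (not GTEx)
set.seed(3)
n <- 200
dosage <- rbinom(n, size = 2, prob = 0.3)  # alternate-allele count: 0, 1 or 2
pcs <- matrix(rnorm(n * 5), ncol = 5)      # stand-in for five genotype PCs
y <- 0.25 * dosage + as.vector(pcs %*% rep(0.1, 5)) +
  rnorm(n, sd = 0.5)                       # normalized expression

# Single-variant cis-eQTL test: expression ~ dosage + covariates
fit <- lm(y ~ dosage + pcs)
p <- summary(fit)$coefficients["dosage", "Pr(>|t|)"]
-log10(p)  # same -log10 scale as the bars in Figure 3
```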

Rare variants and loss of function intolerance in ACE2 and TMPRSS2

Additionally, we accessed the gnomAD v2.1.1 database (Karczewski et al. 2019, bioRxiv) to evaluate the number of rare variants and the variation tolerance of ACE2 and TMPRSS2. Figure 4 shows that there are 233 missense variants in ACE2 and that the gene is predicted to be highly intolerant to loss-of-function variation (pLI = 1), whereas this is not the case for TMPRSS2. Together with the lack of significant cis-eQTLs for ACE2 in disease-relevant tissues, this suggests that genetic variation has little effect on ACE2 function or expression, which may be due in part to purifying selection.

Figure 4. Rare variation in ACE2 and TMPRSS2

Cellular deconvolution of GTEx Lung tissue reveals cell-type specific expression patterns of ACE2 and TMPRSS2

Using a previously described protocol (Donovan MKR et al. 2020, Nature Communications), we performed cell type deconvolution of GTEx v8 lung tissue to assess how ACE2 and TMPRSS2 expression relates to the composition of lung cell types. Specifically, we downloaded gene expression and annotated single-cell clusters from mouse lung RNA-seq data generated by fluorescence-activated cell sorting (FACS) of specific cell populations, from the Tabula Muris database (Tabula Muris consortium, 2018, Nature) (https://figshare.com/articles/Robject_files_for_tissues_processed_by_Seurat/5821263/3). We then calculated the mean expression of the signature genes (Supplementary Data 1A from Donovan et al. Nature Communications 2020;11:955) and used it as input to CIBERSORT. CIBERSORT (Newman et al. 2015, Nature Methods) was run with default parameters using the corresponding signature genes for lung. We then excluded from the analysis all cell subsets whose estimated proportion was below 0.05 in more than 5% of our samples. This procedure allowed us to evaluate the following cell types:

  • Epithelial cell of the lung
  • Endothelial cell of the lung
  • Stromal cell
  • Monocyte
  • Myeloid cell
  • Ciliated columnar cell of tracheobronchial tree
  • T cell
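The two data-handling steps around CIBERSORT, building a signature matrix from annotated single-cell clusters and applying the exclusion rule, can be sketched as below. Assumptions: we read "mean expression for signature genes" as the per-cluster mean across cells; the tiny expression matrix, cluster labels and cell-type proportions are made up (so the proportions need not sum to one per sample); and CIBERSORT itself, a support vector regression method run as a separate tool, is not reproduced.

```r
# Toy single-cell matrix (genes x cells) with made-up values
sc_expr <- matrix(c(10, 0, 2,
                    12, 1, 4,
                     0, 8, 1,
                     2, 6, 3),
                  nrow = 3,
                  dimnames = list(c("g1", "g2", "g3"), paste0("cell", 1:4)))
cluster <- factor(c("epithelial", "epithelial", "T cell", "T cell"),
                  levels = c("epithelial", "T cell"))

# Signature matrix: mean expression of each gene within each annotated cluster
sig_matrix <- sapply(split(seq_len(ncol(sc_expr)), cluster),
                     function(idx) rowMeans(sc_expr[, idx, drop = FALSE]))
sig_matrix

# Made-up deconvolution output: 40 bulk samples x cell types
props <- cbind(epithelial = rep(0.40, 40),
               stromal    = c(0.01, rep(0.10, 39)),
               rare       = rep(c(0.02, 0.10), 20))

# Exclusion rule as stated: drop a subset whose proportion is below 0.05
# in more than 5% of samples
frac_low <- colMeans(props < 0.05)
kept <- names(frac_low)[frac_low <= 0.05]
kept
```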

For each of the above cell types, we tested for an association between the log10-scale TPM of ACE2 and TMPRSS2 and the proportion of that cell subset across the GTEx lung samples, by fitting a linear model controlling for the following covariates: age, sex, cohort, death circumstances, RIN, total ischemic time, exonic rate, and five genotype principal components. We then plotted the associations that were significant at FDR < 0.05.
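A minimal sketch of this per-cell-type test on simulated data is shown below. For brevity it adjusts only for age, sex and a RIN-like covariate; the model described above additionally includes cohort, death circumstances, total ischemic time, exonic rate and five genotype PCs. All column names and values are hypothetical.

```r
# Simulated lung-like data (made up): 100 samples
set.seed(11)
n <- 100
covars <- data.frame(age = sample(1:6, n, replace = TRUE),
                     sex = sample(1:2, n, replace = TRUE),
                     rin = rnorm(n, mean = 7, sd = 0.5))
props <- data.frame(epithelial = runif(n, 0.2, 0.6),
                    stromal    = runif(n, 0.1, 0.4))
y <- 0.5 + 1.5 * props$epithelial + rnorm(n, sd = 0.2)  # log10(TPM + 1), one gene

# One model per cell type: expression ~ proportion + covariates
p <- vapply(names(props), function(ct) {
  d <- cbind(covars, y = y, prop = props[[ct]])
  fit <- lm(y ~ prop + age + factor(sex) + rin, data = d)
  summary(fit)$coefficients["prop", "Pr(>|t|)"]
}, numeric(1))

fdr <- p.adjust(p, method = "BH")
fdr[fdr < 0.05]  # cell types whose associations would be plotted
```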

A summary of the cell type deconvolution results for ACE2 and TMPRSS2 is given in Table 3, and plots of the associations significant at FDR < 0.05 are presented in Figures 5 and 6. ACE2 expression is significantly associated with the proportion of lung epithelial cells. This is consistent with single-cell RNA-seq data from multiple studies (Zhao Y. et al. 2020, bioRxiv; Zou X. et al. 2020, Front. Med.) showing that ACE2 is enriched in type II alveolar epithelial cells (pneumocytes). In addition, TMPRSS2 expression is positively associated with the proportion of lung epithelial cells and significantly negatively associated with the proportion of stromal cells.

Figure 5. Scatter plots of GTEx v8 lung ACE2 expression over cell type proportion
Figure 6. Scatter plots of GTEx v8 lung TMPRSS2 expression over cell type proportion


In summary, we analyzed the expression of ACE2 and TMPRSS2 across human tissues using data from the GTEx project and found informative associations of gene expression with age and sex in tissues including lung and transverse colon. We also evaluated the genetic regulation of ACE2 and TMPRSS2 expression across human tissues and showed that ACE2 is predicted to be intolerant to loss-of-function variation and shows little evidence of cis-eQTLs in disease-relevant tissues. Finally, cellular deconvolution of bulk lung RNA-seq data revealed associations between ACE2 and TMPRSS2 expression and the proportions of specific cell types.

Supplementary Figures

Supplementary Figure 1. Boxplots of gene expression across samples in different age groups. The expression of each gene is corrected for sex, five genotype PCs, death circumstances, RIN, total ischemic time, and exonic rate. The number in parentheses in each x-axis label indicates the number of data points in that bin.
Supplementary Figure 2. Comparison of association tests controlling for inferred SVs versus known confounders. Each dot represents the test for one gene in one tissue, for either age or sex. The significant hits are the associations with FDR < 0.05 when controlling for SVs.

Supplementary Tables

* Bold indicates the significant associations at FDR < 0.05.

Code availability


Code to fit SVA:

library(sva)  # Bioconductor package providing num.sv() and sva()
# gene_tpm_in_the_tissue: genes in rows, samples in columns (the orientation sva expects)
# sample_in_the_tissue: sample annotations, one row per column of gene_tpm_in_the_tissue

### keep AGE when estimating SVs
mod = model.matrix(~AGE_GROUP, data=sample_in_the_tissue)  # full model: intercept + age
mod0 = model.matrix(~1, data=sample_in_the_tissue)         # null model: intercept only
n.sv = num.sv(as.matrix(gene_tpm_in_the_tissue), mod, method = 'be')
sva_age = sva(as.matrix(gene_tpm_in_the_tissue), mod, mod0, n.sv=n.sv)
## age-specific SVs refer to sva_age$sv

### keep SEX when estimating SVs
mod = model.matrix(~SEX, data=sample_in_the_tissue)
mod0 = model.matrix(~1, data=sample_in_the_tissue)
n.sv = num.sv(as.matrix(gene_tpm_in_the_tissue), mod, method = 'be')
sva_sex = sva(as.matrix(gene_tpm_in_the_tissue), mod, mod0, n.sv=n.sv)
## sex-specific SVs refer to sva_sex$sv

Code to fit linear regression controlling for known confounders:

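# Known confounders: five genotype PCs, death circumstances on the Hardy scale (DTHHRDY), RIN (SMRIN), total ischemic time (SMTSISCH), exonic rate (SMEXNCRT)
model = lm(geneEXP ~ PC1 + PC2 + PC3 + PC4 + PC5 + AGE_GROUP + factor(DTHHRDY) + SMRIN + SMTSISCH + SMEXNCRT, data = exp_for_tiss.complete)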


[1]: WHO Director-General’s opening remarks at the media briefing on COVID-19, 11 March 2020. https://www.who.int/dg/speeches/detail/who-director-general-s-opening-remarks-at-the-media-briefing-on-covid-19---11-march-2020

[2]: Yan, Renhong et al. (2020). Structural basis for the recognition of the SARS-CoV-2 by full-length human ACE2. Science. eabb2762. 10.1126/science.abb2762.

[3]: Hoffmann, Markus et al. (2020). SARS-CoV-2 Cell Entry Depends on ACE2 and TMPRSS2 and Is Blocked by a Clinically Proven Protease Inhibitor. Cell. 10.1016/j.cell.2020.02.052.

[4]: GTEx consortium (2019). The GTEx Consortium atlas of genetic regulatory effects across human tissues. Biorxiv. 10.1101/787903.

[5]: Ziegler, Carly et al. (2020) SARS-CoV-2 Receptor ACE2 is an Interferon-Stimulated Gene in Human Airway Epithelial Cells and Is Enriched in Specific Cell Subsets Across Tissues. Cell. Available at SSRN: https://ssrn.com/abstract=3555145 or http://dx.doi.org/10.2139/ssrn.3555145

[6]: Seow J. et al. (2020). scRNA-seq reveals ACE2 and TMPRSS2 expression in TROP2+ Liver Progenitor Cells: Implications in COVID-19 associated Liver Dysfunction. bioRxiv 2020.03.23.002832; doi: https://doi.org/10.1101/2020.03.23.002832

[7]: Fan C. et al. (2020). ACE2 Expression in Kidney and Testis May Cause Kidney and Testis Damage After 2019-nCoV Infection. medRxiv 2020.02.12.20022418. doi:https://doi.org/10.1101/2020.02.12.20022418

[8]: Chai X. et al. (2020). Specific ACE2 Expression in Cholangiocytes May Cause Liver Damage After 2019-nCoV Infection. bioRxiv 931766. doi: 10.1101/2020.02.03.931766.

[9]: Chen, J. et al. (2020) Individual Variation of the SARS-CoV2 Receptor ACE2 Gene Expression and Regulation. Preprints, 2020030191

[10]: Huang, C. et al. (2020). Clinical features of patients infected with 2019 novel coronavirus in Wuhan, China. The Lancet. 395(10223), 497–506. https://doi.org/10.1016/S0140-6736(20)30183-5

[11]: Shi Q. et al. (2020). Clinical characteristics of 101 non-surviving hospitalized patients with COVID-19: A single center, retrospective study. medRxiv 2020.03.04.20031039; doi:https://doi.org/10.1101/2020.03.04.20031039

[12]: Hindson, J. COVID-19: faecal–oral transmission?. Nat Rev Gastroenterol Hepatol (2020). https://doi.org/10.1038/s41575-020-0295-7

[13]: Zhou F. et al. (2020). Clinical course and risk factors for mortality of adult inpatients with COVID-19 in Wuhan, China: a retrospective cohort study. The Lancet. 10.1016/S0140-6736(20)30566-3.

[14]: Wu, Z. & McGoogan, J. (2020). Characteristics of and Important Lessons From the Coronavirus Disease 2019 (COVID-19) Outbreak in China: Summary of a Report of 72 314 Cases From the Chinese Center for Disease Control and Prevention. JAMA. 10.1001/jama.2020.2648.

[15]: Jin J. et al. (2020). Gender differences in patients with COVID-19: Focus on severity and mortality. medRxiv 2020.02.23.20026864; doi:https://doi.org/10.1101/2020.02.23.20026864

[16]: Leek, J. et al. (2012). The SVA package for removing batch effects and other unwanted variation in high-throughput experiments. Bioinformatics (Oxford, England). 28. 882–3. 10.1093/bioinformatics/bts034.

[17]: Leek J. et al (2019). sva: Surrogate Variable Analysis. R package version 3.30.1.

[18]: Benjamini, Y. & Hochberg, Y. (1995). Controlling the False Discovery Rate: A Practical and Powerful Approach to Multiple Testing. Journal of the Royal Statistical Society. Series B (Methodological). 57. 289–300. 10.2307/2346101.

[19]: Lu X. et al. (2020). SARS-CoV-2 Infection in Children. NEJM; doi: 10.1056/NEJMc2005073

[20]: Xing Y. et al. (2020). Prolonged presence of SARS-CoV-2 in feces of pediatric patients during the convalescent phase. medRxiv 2020.03.11.20033159; doi: https://doi.org/10.1101/2020.03.11.20033159

[21]: Xu, Y., et al. Characteristics of pediatric SARS-CoV-2 infection and potential evidence for persistent fecal viral shedding. Nat Med (2020). https://doi.org/10.1038/s41591-020-0817-4

[22]: Karczewski KJ et al. (2019). Variation across 141456 human exomes and genomes reveals the spectrum of loss-of-function intolerance across human protein-coding genes. bioRxiv 2019.531210

[23]: Donovan MKR et al. (2020). Cellular deconvolution of GTEx tissues powers discovery of disease and cell-type associated regulatory variants. Nat Comm 2020;11:955

[24]: Tabula Muris consortium, (2018). Single-cell transcriptomics of 20 mouse organs creates a Tabula Muris. Nature 2018; 562:367–372

[25]: Newman AM et al. (2015). Robust enumeration of cell subsets from tissue expression profiles. Nat Methods 2015;12:453–457

[26]: Zhao Y. et al. (2020). Single-cell RNA expression profiling of ACE2, the putative receptor of Wuhan 2019-nCov. bioRxiv 2020;01.26.919985

[27]: Zou X. et al. (2020). Single-cell RNA-seq data analysis on the receptor ACE2 expression reveals the potential risk of different human organs vulnerable to 2019-nCoV infection. Front. Med. 2020; doi: 10.1007/s11684-020-0754-0



Yuan He

Computational genomics graduate student in Battle Lab at Johns Hopkins