A dynamic list of single-cell genomics resources!

Vivek Das
Musing’s of a Data Scientist in Medicine
Jul 13, 2019

This week's post is in the space of single-cell genomics. It continues a series of stories I intend to publish in this space. The field is moving at lightning speed, with massive data generation already in place and more to come. To this end, I intend to create a repository for myself and others who might find it beneficial. This resource list will remain under development and will be updated with additional materials as the field progresses. I also intend this document to serve as a list of data sources.

Having said that, I will for now provide an overview of the repository structure as below (this is a work in progress). These are the topics I will cover in this Medium post:

  1. Scope of single-cell genomics
  2. Current protocols, technologies & benchmarking being pursued in light of the Human Cell Atlas (HCA)
  3. Analytical space in single-cell genomics
  4. Problem scope of rule-based cell annotation and cluster assignment
  5. Dimension Reduction
  6. Downstream functional applications of single-cell data
  7. Current state-of-the-art single-cell analysis workflows
  8. Scope for Machine Learning and Deep learning in single-cell genomics
  9. Public datasets repository
  10. Data portals available for visualization

In Progress:

Further, I intend to expand the above list. I will also outline some open problems in the field that should be a focus for single-cell genomics. This should help us understand the scope and the interesting questions one can address to advance the field.

Note: Single-cell genomics here includes single-cell RNA-Seq, single-nucleus RNA-Seq, single-cell ATAC-Seq, etc.

1. Scope of single-cell genomics

To understand the scope of single-cell genomics, one can search PubMed and look at the number of publications over the years.

Search criteria {Group by: “single cell”} in PubMed (year selected 2010–2019)

Figure 1. Number of PubMed publications per year matching “single cell”, from 2010.

Note: 2019 is not yet over, and we expect more publications by the end of the year.

The above can be misleading, since the criterion is just “single cell”, which can also pick up many other topics such as single-cell dissociation techniques. Hence I used another search criterion, {Group by: “single cell sequencing”}, in PubMed to get the result below.

Figure 2. Number of PubMed publications per year matching “single cell sequencing”, from 2010.
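For anyone who wants to reproduce counts like those behind Figures 1 and 2, here is a minimal Python sketch that queries NCBI's E-utilities ESearch endpoint once per year. It is not how the figures above were made (those came from the PubMed website); the function names, the injectable `fetch` argument and the choice of date parameters are my own assumptions for illustration, and the numbers returned today will differ from those in 2019.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

EUTILS_ESEARCH = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi"


def build_count_url(phrase, year):
    """Build an ESearch URL that asks only for the hit count in one year."""
    params = {
        "db": "pubmed",
        "term": f'"{phrase}"',   # quoted so PubMed treats it as a phrase
        "datetype": "pdat",      # filter on publication date
        "mindate": str(year),
        "maxdate": str(year),
        "rettype": "count",
        "retmode": "json",
    }
    return f"{EUTILS_ESEARCH}?{urlencode(params)}"


def parse_count(payload):
    """Extract the integer hit count from an ESearch JSON response body."""
    return int(json.loads(payload)["esearchresult"]["count"])


def counts_by_year(phrase, years, fetch=None):
    """Return {year: count}. `fetch` maps a URL to a response body, so it
    can be swapped for a stub when working offline (hypothetical helper)."""
    if fetch is None:
        def fetch(url):
            with urlopen(url, timeout=30) as response:
                return response.read().decode("utf-8")
    return {year: parse_count(fetch(build_count_url(phrase, year)))
            for year in years}
```

Calling `counts_by_year("single cell sequencing", range(2010, 2020))` would issue ten requests one after another; NCBI asks users to keep request rates low, so anything larger should be throttled.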

These are just data insights from PubMed showing how the field has exploded over the years with the advent of genomics and, more precisely, single-cell sequencing. Some other interesting insights into the scope of the field can be found below:

i) Chan Zuckerberg Science Initiative (website).

ii) The number of tools developed in the space of single-cell genomics, which can be found on the scRNA-tools webpage.

Figure 3. Number of tools and publications in single-cell RNA-Seq since 2017. Image sourced from https://www.scrna-tools.org/analysis

Any sequencing technology needs analytical tools that pave the way to more discoveries. Figure 3 above gives an idea of the number of tools released since January 2017 and their publication status. This is first-hand evidence that the field is progressing by leaps and bounds in data analysis tools, and that there is a lot still to be done in this space.

iii) Finally, one can also see some of the scope in my previous post about the Human Cell Atlas, which can be found here.

2. Current single-cell protocols, technologies and their benchmarking as a part of the Human Cell Atlas

This point explores the various upstream technologies and protocols available in the single-cell genomics space. Rather than describe each one, I provide a resource that gives an idea of the benchmarking already performed on them. An amazing preprint that really stands out in this category is the benchmarking paper by Elisabetta Mereu, Atefeh Lafzi et al., 2019 for the Human Cell Atlas. One can find the paper at the link below. Another important metric to consider here is Figure 4, which outlines the percentage of tools already available in the various categories of the single-cell space. (Note: Not all of these are published in a journal, but most of them have a GitHub repository.)

i) Benchmarking Single-Cell RNA Sequencing Protocols for Cell Atlas Projects

ii) Number of tools out there in various categories for single-cell analysis

Figure 4. Percentage of tools per category in single-cell RNA-Seq since 2017. Image sourced from https://www.scrna-tools.org/analysis

3. Analytical space in single-cell genomics

This part of the post provides an understanding of the analytical space. The plots above should already provide a snapshot of it in terms of the number of available tools, publications, and categories of tool development. In Figure 5 below I provide some plots highlighting the platforms (programming languages, scripting base, compiler, etc.) used to develop the tools and the licenses used so far. All these percentages provide a metric of the technologies in use, a snapshot of the analytical wheels at play in the field.

Figure 5. Percentage of tools by platform and by software license in single-cell RNA-Seq since 2017. Image sourced from https://www.scrna-tools.org/analysis

Note: I encourage everyone to follow two GitHub pages for the latest software and notes on the evolution of single-cell tools and methods.

Ming Tang’s GitHub, with pretty detailed notes and paper references in the software sphere.

Sean Davis’s list, Awesome Single Cell.

4. The issue with rule-based cell assignment and annotation.

This is a long-standing problem in the single-cell space. I have written a short comment on it based on my understanding and discussions with various experts in the field. However, some amazing improvements are under way to tackle it.

i) Some of my thoughts can be found here.

ii) A publication on deep learning and single-cell by Niklas D. Köhler, Maren Büttner, & Fabian J., which also points out the concern, can be found here.

iii) Finally, as per my understanding, this issue (a transfer-learning process to improve cell cluster annotation) has been addressed to an extent in Seurat v3 (Ref: Publication). It looks promising for tackling the issue and will be key when the Human Cell Atlas creates its first reference map of organs. A new vignette is also up here.
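To make the concern with rule-based annotation concrete, here is a toy Python sketch of my own (not taken from any of the resources above). It labels a cluster with a cell type when all of that type's marker genes exceed a fixed cut-off. The genes are well-known lineage markers, but the rule table, the threshold of 1.0 and the cluster values are assumptions made up for illustration.

```python
# Illustrative marker rules. The rule table and the threshold are assumptions,
# not a curated reference.
MARKER_RULES = {
    "T cell": ["CD3E", "CD3D"],
    "B cell": ["MS4A1", "CD79A"],
    "Monocyte": ["CD14", "LYZ"],
}


def annotate_cluster(mean_expression, rules=MARKER_RULES, threshold=1.0):
    """Label a cluster by the cell types whose markers all reach `threshold`."""
    matches = [
        cell_type
        for cell_type, markers in rules.items()
        if all(mean_expression.get(gene, 0.0) >= threshold for gene in markers)
    ]
    if len(matches) == 1:
        return matches[0]
    if not matches:
        return "unassigned"
    return "ambiguous: " + " / ".join(sorted(matches))


# Made-up mean expression per cluster (arbitrary normalized units).
clusters = {
    "c0": {"CD3E": 2.4, "CD3D": 2.1, "MS4A1": 0.1},
    "c1": {"MS4A1": 1.9, "CD79A": 2.2},
    "c2": {"CD3E": 1.2, "CD3D": 1.1, "CD14": 1.5, "LYZ": 3.0},  # mixed signal
    "c3": {"CD14": 0.9, "LYZ": 2.8},  # one marker just below the cut-off
}
labels = {name: annotate_cluster(expr) for name, expr in clusters.items()}
```

Clusters c2 and c3 show where hard rules break down: a mixed cluster matches two rules at once, and a cluster that narrowly misses one threshold gets no label at all. This is the kind of gap that reference-based label transfer tries to close.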

5. Dimension Reduction (mostly for projection scope of cell type cluster view)

One can already get a sense of the scope and methods of dimension reduction from Figure 4, where dimensionality-reduction tools account for ~20% of the tools released in the single-cell RNA-Seq space. A few projections are in common use, such as t-SNE, UMAP and diffusion maps. However, no consensus has yet been reached on which projection algorithm is best in this space. To my understanding, there is scope for more research in the coming days before we realize the potential of these methods and settle on the algorithms that should be used. t-SNE is treated as something of a gold standard here, but I reckon this will change, as there are many pros and cons to using t-SNE, to the underlying properties it preserves in its projection, etc. A few such insights can be found in my previous write-up on Need for dimension reduction & high dimension data visualization.

Another publication that is genuinely informative in this space is from Lan Huong Nguyen & Susan Holmes, titled Ten quick tips for effective dimensionality reduction.

Finally, there is a preprint on UMAP from the Cole Trapnell lab that justifies the use of this projection method and shows where it has an edge over the otherwise standard t-SNE.

I expect this will be one of the most sought-after areas in single-cell genomics, as also pointed out by Nikolay Oskolkov in the write-up Deep Learning for Single Cell Biology in Towards Data Science. I also feel that dimension reduction will improve if we get better at handling the “Curse of Dimensionality” in genomics. This becomes more apparent in single-cell genomics as we move from 100 → 1k → 10k → 100k → 1 million cells.
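One way to see the “Curse of Dimensionality” for yourself is the small Python sketch below. It draws uniform random points and measures how much farther the farthest point is from a query than the nearest one. This is a toy illustration under my own assumptions (uniform random data, Euclidean distance, with dimensions standing in for genes), not an analysis of real expression data.

```python
import math
import random


def relative_contrast(n_points, n_dims, seed=0):
    """Return (max - min) / min over the Euclidean distances from one random
    query point to `n_points` uniform random points in `n_dims` dimensions."""
    rng = random.Random(seed)
    query = [rng.random() for _ in range(n_dims)]
    distances = []
    for _ in range(n_points):
        point = [rng.random() for _ in range(n_dims)]
        distances.append(math.dist(query, point))
    return (max(distances) - min(distances)) / min(distances)


# Same number of points, increasing number of dimensions.
contrasts = {d: relative_contrast(200, d) for d in (2, 20, 200, 2000)}
for dims, contrast in contrasts.items():
    print(f"{dims:>5} dims: relative contrast = {contrast:.3f}")
```

As the number of dimensions grows, the contrast shrinks towards zero: nearest and farthest neighbours become almost equally far away. That is one reason distance-based steps such as clustering or neighbour graphs are usually run on a reduced representation rather than on all genes.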

6. Downstream functional applications of single-cell data.

One of the first downstream applications of RNA-Seq that comes to mind is i) differential gene expression analysis. With the advent of single-cell, this has progressed to differential capture of dimensions, clusters of cell states, genes in cell-state clusters, etc. Other applications in this space are:

ii) RNA velocity

iii) Trajectory analysis or lineage tracking (e.g. Monocle 3 and Seurat have the required tools to perform the same). Some interesting insights into trajectory analysis can be found in the publication Concepts and limitations for learning developmental trajectories from single cell genomics.
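As a toy illustration of item i), differential expression between two clusters of cells, here is a minimal Python sketch that ranks genes by log2 fold change of mean expression. The gene names, cell names, counts and the pseudocount of 1 are all made up for illustration. Real workflows add statistical testing and multiple-testing correction on top, which this sketch deliberately leaves out.

```python
import math
from statistics import mean


def log2_fold_change(group_a, group_b, pseudocount=1.0):
    """log2 ratio of mean expression (A over B); the pseudocount avoids log(0)."""
    return math.log2((mean(group_a) + pseudocount) / (mean(group_b) + pseudocount))


def rank_genes(expression, cluster_a, cluster_b):
    """Rank genes by absolute log2 fold change between two sets of cells."""
    changes = {}
    for gene, per_cell in expression.items():
        a = [per_cell[cell] for cell in cluster_a]
        b = [per_cell[cell] for cell in cluster_b]
        changes[gene] = log2_fold_change(a, b)
    return sorted(changes.items(), key=lambda item: abs(item[1]), reverse=True)


# Hypothetical counts: gene -> {cell -> count}.
expression = {
    "GENE_A": {"cell1": 7, "cell2": 7, "cell3": 1, "cell4": 1},
    "GENE_B": {"cell1": 2, "cell2": 2, "cell3": 2, "cell4": 2},
    "GENE_C": {"cell1": 0, "cell2": 0, "cell3": 1, "cell4": 1},
}
ranking = rank_genes(expression, ["cell1", "cell2"], ["cell3", "cell4"])
```

Here `ranking` lists the genes from the largest to the smallest absolute change between the two clusters, with a positive value meaning higher expression in the first cluster.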

7. Current state-of-the-art single-cell analysis workflows.

Comprehensive single-cell analysis workflows (both single-omics and multi-omics) can be found in the list of tools below.

i) Seurat

ii) Monocle 3

iii) LIGER, for multi-omics single-cell genomics

iv) Workflows for SummarizedExperiment or MultiAssayExperiment.

a) iSEE, working with TCGA data (https://marionilab.cruk.cam.ac.uk/iSEE_tcga/#).

b) MOFA, which includes single-cell support as an added functionality. The MOFA paper can be found here, and the Bioconductor page covers the current scope.

8. Scope for Machine Learning and Deep learning in single-cell genomics

To appreciate the scope, we need to look at the reviews and blog post below on deep learning for genomics:

i) Deep learning: new computational modelling techniques for genomics

ii) Best Practises in single cell

iii) Do we have Big Data in Life Sciences?

9. Public datasets repository

I am finally coming to the end of this Medium post, where I try to compile a list of public data resources available for single-cell since its inception.

I would not be doing justice if I did not mention the repository that Valentine Svensson has put up on his blog. It is an excellent resource, and his blog posts are worth visiting to learn about single-cell advances in light of the mathematical foundations that pave the way for exploration in the single-cell field. The scRNA-Seq studies published so far can be found in this blog post of his.

Figure 6: Scatterplot of publications over the years, showing the number of cells reported in each. The adjoining table lists the date of publication, citation, DOI and technique used, in addition to the total cells reported in each study. Image sourced from http://www.nxn.se/single-cell-studies

Second, I list the 10x Genomics datasets available on their website. These span the various single-cell omics that 10x offers (e.g. single-cell exome, scRNA-Seq, scATAC-Seq, etc.).

Third, I cite conquer (consistent quantification of external RNA-seq data; the repository is developed by Charlotte Soneson and Mark D. Robinson at the University of Zurich, Switzerland), which provides a multi-species dataset resource.

10. Data portals available for visualization

Finally, I am on the verge of concluding the post. As closing remarks, I provide some information on data visualization portals already available in the space of single-cell genomics. I am sure that in the coming days we will have access to more such free, open-source and open-access data visualization portals. One that I often consult is provided by the Broad Institute and also serves as a part of the Human Cell Atlas initiative. It is known as the Single Cell Portal.

I hope this post serves as an informative resource and provides a deeper insight into the scope of the field for one and all. I intend it to be a dynamic encyclopedia of knowledge-driven and data-driven insights in the field of single-cell genomics. I will add more information as the field advances to keep the wheel running. I also intend to keep us all updated, via regular enhancements, on developments in the field regarding the topics I have discussed in this post.

I thank and acknowledge all the authors whose work I have cited in this story, as they provided me enough food for thought to start with and pen down my understanding. This is how I envision the ever-growing field of single-cell genomics, based on my learning and understanding. I have put my readings together for myself and others to use as a resource for future research work and publications. I would be happy to add more information for clarity if needed.

Edit 1: Added the Figure 2 axes that were missing earlier. Added text and scientific explanations for clarity. Added Figure 6.

Edit 2: Typo fixes and added text clarifications.

Edit 3: Added two GitHub repo lists that I have been following for a while and that should be a part of this resource.


I am interested in Precision & Personalized Medicine. I use my skills in Biomarkers & Target Discovery employing Integrative Computational Systems Medicine.