How scRNA-seq analysis has made analyzing sequencing data an extremely easy task

Single-cell RNA sequencing (scRNA-seq) is expanding into the clinical setting (Haque et al. 2017). Given the inherent complexity of next-generation sequencing (NGS) technologies, acquiring the laboratory and data analysis skills needed for NGS experiments can be a challenge. Researchers are improving the accessibility of scRNA-seq tools by publishing resources that critically review the available technologies (e.g. Svensson et al. 2017). Some studies evaluate the available data analysis tools and provide insight into their strengths and weaknesses (Chen et al. 2019, Büttner et al. 2017). Numerous papers also give explicit guidance on the workflow of scRNA-seq experiments (e.g. Haque et al. 2017). Below we highlight some of the steps involved in scRNA-seq analysis and some of the available reviews and resources that make scRNA-seq data analysis easier.

scRNA-seq data pre-processing and analysis

Quality control

Quality control is a critical step in scRNA-seq analysis because noise is generally higher than in other transcriptome analyses such as bulk RNA-seq (Chen et al. 2019). It includes the identification and removal of low-quality cells. Different ways to identify and deal with low-quality cells have been evaluated (Ilicic et al. 2016).
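As a minimal sketch of the filtering step described above, the Python below computes three per-cell metrics that are often used to flag low-quality cells (total counts, number of detected genes, and the fraction of counts from mitochondrial genes) and drops cells that fail any threshold. The function names, the "MT-" gene prefix, and every threshold value are assumptions chosen for illustration; they are not taken from the studies cited here.

```python
from typing import Dict

Counts = Dict[str, int]  # gene name -> raw read/UMI count for one cell


def qc_metrics(counts: Counts, mito_prefix: str = "MT-") -> Dict[str, float]:
    """Compute simple per-cell quality metrics from raw counts."""
    total = sum(counts.values())
    detected = sum(1 for c in counts.values() if c > 0)
    mito = sum(c for gene, c in counts.items() if gene.startswith(mito_prefix))
    return {
        "total_counts": float(total),
        "genes_detected": float(detected),
        "mito_fraction": mito / total if total else 0.0,
    }


def filter_cells(
    cells: Dict[str, Counts],
    min_counts: int = 500,           # hypothetical default
    min_genes: int = 200,            # hypothetical default
    max_mito_fraction: float = 0.2,  # hypothetical default
) -> Dict[str, Counts]:
    """Keep only the cells that pass all three thresholds."""
    kept = {}
    for cell_id, counts in cells.items():
        m = qc_metrics(counts)
        if (
            m["total_counts"] >= min_counts
            and m["genes_detected"] >= min_genes
            and m["mito_fraction"] <= max_mito_fraction
        ):
            kept[cell_id] = counts
    return kept


# Toy data with deliberately tiny thresholds so the example stays readable.
cells = {
    "cell_a": {"ACTB": 50, "GAPDH": 30, "MT-CO1": 5},   # passes every check
    "cell_b": {"ACTB": 2, "GAPDH": 0, "MT-CO1": 1},     # too few counts
    "cell_c": {"ACTB": 10, "GAPDH": 10, "MT-CO1": 60},  # mostly mitochondrial
}
kept = filter_cells(cells, min_counts=20, min_genes=2, max_mito_fraction=0.2)
print(sorted(kept))
```

In practice the thresholds are usually chosen per dataset after inspecting the distribution of each metric, rather than fixed in advance.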

Read mapping and expression quantification

The overall quality of scRNA-seq data is reflected in the proportion of reads that can be mapped. Much as in bulk RNA-seq, scRNA-seq reads are generated in FASTQ format and can be mapped using the same mapping tools, which have been critically assessed and reviewed (Li and Homer 2010).
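To make the mapping ratio concrete, here is a small sketch that reads FASTQ records (four lines per read) and reports the fraction of reads that appear in a set of mapped read IDs. The set of mapped IDs stands in for the output of an aligner and is purely hypothetical, as are the function names.

```python
from typing import Iterable, Iterator, Set, Tuple


def parse_fastq(lines: Iterable[str]) -> Iterator[Tuple[str, str, str]]:
    """Yield (read_id, sequence, quality) from FASTQ text, four lines per read."""
    it = iter(lines)
    for header in it:
        if not header.strip():
            continue  # skip blank lines between records
        if not header.startswith("@"):
            raise ValueError(f"expected '@' header, got: {header!r}")
        seq = next(it, None)
        plus = next(it, None)  # the '+' separator line
        qual = next(it, None)
        if seq is None or plus is None or qual is None:
            raise ValueError("truncated FASTQ record")
        yield header[1:].split()[0], seq.strip(), qual.strip()


def mapping_rate(read_ids: Iterable[str], mapped_ids: Set[str]) -> float:
    """Fraction of reads whose ID is in the set of mapped reads."""
    ids = list(read_ids)
    if not ids:
        return 0.0
    return sum(1 for r in ids if r in mapped_ids) / len(ids)


fastq_text = """@read1
ACGT
+
IIII
@read2
TTGA
+
IIII
@read3
GGCC
+
IIII
@read4
NNNN
+
!!!!
"""
reads = list(parse_fastq(fastq_text.splitlines()))
mapped = {"read1", "read2", "read3"}  # hypothetical aligner result
rate = mapping_rate((r[0] for r in reads), mapped)
print(f"{rate:.0%} of reads mapped")
```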


Normalization

Due to the inherent noisiness of scRNA-seq data (from capture efficiency, sequencing depth, dropouts, etc.), it is imperative to normalize the data prior to any downstream analysis. Normalization removes unwanted bias that may obscure biologically significant results. It can be conducted between samples and within samples. The former corrects for biases such as sequencing depth and the latter corrects for biases between gene expression levels (Chen et al. 2019). Wu et al. (2019) designed a matrix that evaluated 14 normalization methods in scRNA-seq and bulk RNA-seq.
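One simple way to correct for sequencing depth is library-size normalization: each cell's counts are divided by the cell's total, multiplied by a common scale factor, and log-transformed. The sketch below illustrates this idea only; the scale factor of 10,000 and the function name are assumptions, and this is not claimed to be one of the methods evaluated by Wu et al. (2019).

```python
import math
from typing import Dict


def normalize_cell(counts: Dict[str, int], scale: float = 10_000.0) -> Dict[str, float]:
    """Scale one cell's counts to a common total, then apply log(1 + x)."""
    total = sum(counts.values())
    if total == 0:
        return {gene: 0.0 for gene in counts}
    return {gene: math.log1p(c / total * scale) for gene, c in counts.items()}


# Two cells with the same expression profile, sequenced at different depths.
shallow = {"ACTB": 10, "GAPDH": 30}
deep = {"ACTB": 100, "GAPDH": 300}

norm_shallow = normalize_cell(shallow)
norm_deep = normalize_cell(deep)

# After normalization the depth difference no longer separates the two cells.
print(norm_shallow["ACTB"], norm_deep["ACTB"])
```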

Batch correction

Given the size of scRNA-seq experiments, they may be conducted in batches over different days and, in some cases, by staff with different levels of expertise. As a result, variation may appear as an artifact of batch effects (Hwang et al. 2018). Batch correction techniques encompass linear models and nearest-neighbor matching tools (Haghverdi et al. 2018). Batch effect correction methods have been critically evaluated (Büttner et al. 2017). Good experimental design and methodical execution can minimize the batch effect on the results.
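The simplest linear correction treats the batch as an additive shift per gene and subtracts it. The sketch below does exactly that: it centers each gene within each batch and then adds back the overall gene mean. It assumes that every batch contains a similar mix of cell types, which is often not true in practice, and it is not the nearest-neighbor approach of Haghverdi et al. (2018). The function name and toy values are illustrative.

```python
from collections import defaultdict
from typing import List


def center_by_batch(expr: List[List[float]], batches: List[str]) -> List[List[float]]:
    """Remove per-batch mean shifts gene by gene, keeping the overall gene mean.

    expr has one row per cell and one column per gene; batches[i] is the
    batch label of cell i.
    """
    n_cells, n_genes = len(expr), len(expr[0])
    overall = [sum(row[g] for row in expr) / n_cells for g in range(n_genes)]

    sums = defaultdict(lambda: [0.0] * n_genes)
    sizes = defaultdict(int)
    for row, batch in zip(expr, batches):
        sizes[batch] += 1
        for g in range(n_genes):
            sums[batch][g] += row[g]

    corrected = []
    for row, batch in zip(expr, batches):
        batch_mean = [s / sizes[batch] for s in sums[batch]]
        corrected.append([row[g] - batch_mean[g] + overall[g] for g in range(n_genes)])
    return corrected


# Gene 0 is shifted by +10 in the second batch; gene 1 is unaffected.
expr = [[1.0, 5.0], [3.0, 7.0], [11.0, 5.0], [13.0, 7.0]]
batches = ["day1", "day1", "day2", "day2"]
corrected = center_by_batch(expr, batches)
print(corrected)
```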


Imputation

scRNA-seq data can have missing values and dropouts that result from failure to amplify some of the expressed RNA. These add noise that can falsely inflate the variation in the signal (Chen et al. 2019). Bulk RNA-seq correction methods are not applicable in this case, so scRNA-seq-specific methods have been devised, such as RESCUE (Tracy et al. 2019) and scImpute (Li and Li 2018).

Dimensionality reduction and visualization

scRNA-seq data are high-dimensional and may involve thousands of genes and cells. Dimensionality reduction and feature selection bring the data down to a manageable size by projecting it onto fewer dimensions while still retaining the key signals (Andrews and Hemberg 2018). Principal component analysis (PCA) and t-distributed stochastic neighbor embedding (t-SNE) have been commonly used in scRNA-seq analysis (Hwang et al. 2018, Rizvi et al. 2017). t-SNE, which is nonlinear, usually performs better than the linear PCA (Zeng and Dai 2019, Chen et al. 2019). Dimensionality reduction methods used for scRNA-seq data also include UMAP (Becht et al. 2018) and scvis (Ding et al. 2018).
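To show what "projecting onto fewer dimensions" means for PCA, the sketch below finds the first principal component of a small cells-by-genes matrix by power iteration on the covariance matrix and projects each cell onto it. This is a teaching example under simplifying assumptions (one component only, a fixed starting vector, a fixed iteration count); real analyses would rely on an established library implementation.

```python
import math
from typing import List, Tuple


def first_principal_component(
    rows: List[List[float]], n_iter: int = 100
) -> Tuple[List[float], List[float]]:
    """Return (direction, scores) for the first principal component.

    rows has one row per cell and one column per gene.
    """
    n, p = len(rows), len(rows[0])
    means = [sum(r[j] for r in rows) / n for j in range(p)]
    centered = [[r[j] - means[j] for j in range(p)] for r in rows]

    # Sample covariance matrix between genes (p x p).
    cov = [
        [sum(c[i] * c[j] for c in centered) / (n - 1) for j in range(p)]
        for i in range(p)
    ]

    # Power iteration: repeatedly multiply by the covariance and renormalize.
    v = [1.0 / math.sqrt(p)] * p
    for _ in range(n_iter):
        w = [sum(cov[i][j] * v[j] for j in range(p)) for i in range(p)]
        norm = math.sqrt(sum(x * x for x in w))
        if norm == 0.0:
            break  # no variance left in this direction
        v = [x / norm for x in w]

    # Project each centered cell onto the component.
    scores = [sum(c[j] * v[j] for j in range(p)) for c in centered]
    return v, scores


# Four cells, two genes; the second gene is always twice the first.
rows = [[1.0, 2.0], [2.0, 4.0], [3.0, 6.0], [4.0, 8.0]]
direction, scores = first_principal_component(rows)
print(direction)  # direction of greatest variance across the two genes
print(scores)     # one coordinate per cell instead of two
```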


Subpopulation identification

The ability to identify cell-to-cell heterogeneity is at the forefront of scRNA-seq analysis (Birnbaum 2018). Cell subpopulations usually constitute distinct cell types that can be identified using clustering analysis. Clustering can be done using either supervised or unsupervised methods. Supervised methods require a priori information whereas unsupervised methods do not (Chen et al. 2019). Clustering techniques designed for scRNA-seq analysis include the Seurat clustering method (Satija et al. 2015) and SC3 (Kiselev et al. 2017). Once cell populations are identified and separated, common inferential statistical analyses, e.g. analysis of variance, can be used to identify key markers that discriminate cell subpopulations.
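As an illustration of unsupervised clustering, the sketch below implements plain k-means on a handful of cells described by two features. It is a generic example and does not reproduce the Seurat or SC3 methods; the function names, the seed, and the toy coordinates are assumptions made for the example.

```python
import random
from typing import List, Sequence, Tuple


def squared_distance(a: Sequence[float], b: Sequence[float]) -> float:
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans(
    points: List[List[float]], k: int, n_iter: int = 50, seed: int = 0
) -> Tuple[List[int], List[List[float]]]:
    """Plain k-means: returns one cluster label per point plus the centroids."""
    rng = random.Random(seed)
    centroids = [list(p) for p in rng.sample(points, k)]
    labels = None
    for _ in range(n_iter):
        # Assignment step: each point joins its nearest centroid.
        new_labels = [
            min(range(k), key=lambda c: squared_distance(p, centroids[c]))
            for p in points
        ]
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(points, new_labels) if lab == c]
            if members:  # keep the old centroid if a cluster is empty
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
        if new_labels == labels:
            break  # assignments stopped changing
        labels = new_labels
    return labels, centroids


# Two well-separated groups of cells in a reduced two-dimensional space.
points = [[0.0, 0.0], [0.0, 1.0], [10.0, 10.0], [10.0, 11.0]]
labels, centroids = kmeans(points, k=2)
print(labels)
```

No a priori labels are supplied here, which is what makes the method unsupervised; the number of clusters k still has to be chosen by the analyst.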

Differential expression

Differential expression analysis detects differentially expressed genes (DEGs) between cell groups (subpopulations) (Chen et al. 2019). DEGs can provide critical information for diagnostics and personalized medicine. DEG detection methods tailored to scRNA-seq analysis have been developed, including BCseq (Chen and Zheng 2018) and MAST (Finak et al. 2015). Additionally, many of the available differential expression methods have been critically reviewed, providing valuable guidance on what to consider when detecting DEGs (Soneson and Robinson 2018).
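The basic idea of comparing a gene between two cell groups can be sketched with Welch's t-statistic, computed per gene and used to rank genes. This is a generic illustration, not the BCseq or MAST method; it does not model dropouts, and it omits p-values and multiple-testing correction. The gene names and values are made up.

```python
import math
from statistics import mean, variance
from typing import Dict, List, Tuple


def welch_t(a: List[float], b: List[float]) -> float:
    """Welch's t-statistic for two groups with possibly unequal variances."""
    diff = mean(a) - mean(b)
    se = math.sqrt(variance(a) / len(a) + variance(b) / len(b))
    if se == 0.0:
        # No spread in either group: identical means give 0, otherwise +/- inf.
        return 0.0 if diff == 0 else math.copysign(math.inf, diff)
    return diff / se


def rank_genes(
    group_a: Dict[str, List[float]], group_b: Dict[str, List[float]]
) -> List[Tuple[str, float]]:
    """Rank genes by the absolute value of their t-statistic, largest first."""
    stats = [(gene, welch_t(group_a[gene], group_b[gene])) for gene in group_a]
    return sorted(stats, key=lambda item: abs(item[1]), reverse=True)


# Normalized expression per gene for the cells of two subpopulations.
group_a = {"GENE1": [5.0, 6.0, 7.0], "GENE2": [1.0, 2.0, 3.0]}
group_b = {"GENE1": [1.0, 2.0, 3.0], "GENE2": [1.0, 2.0, 3.0]}

ranked = rank_genes(group_a, group_b)
for gene, t in ranked:
    print(gene, round(t, 3))
```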

Pseudotime reconstruction

scRNA-seq has been used to study cell developmental trajectories using a pseudo timeline. Since the same cell cannot yet be followed along a developmental timeline, cells are sampled along a timeline and then computationally ordered along an inferred trajectory according to their expression profiles (Birnbaum 2018). Many tools are used in scRNA-seq analysis to infer cell trajectories, e.g. Monocle2 (Qiu et al. 2017) and DPT (Haghverdi et al. 2015). Trajectory and cell lineage inference tools have been critically assessed (Chen et al. 2018, Saelens et al. 2018).

Alternative splicing

Tools have been designed to study alternative splicing in scRNA-seq experiments; they can detect alternative splicing events and potentially identify isoforms that differ between cell types. These include SingleSplice (Welch et al. 2016) and Census (Qiu et al. 2017).

Network reconstruction

Network inference from scRNA-seq data visualizes gene expression interactions and correlations that may reveal biologically significant patterns. While ideal for revealing detailed interactions, network inference is sensitive to technical noise and cell state variation and should therefore be handled carefully (Chen et al. 2019). SCENIC was specifically developed for network inference in scRNA-seq studies (Aibar et al. 2017).
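A very simple form of network inference is a co-expression network: compute the Pearson correlation between every pair of genes across cells and keep the pairs whose correlation exceeds a threshold. The sketch below shows only this basic idea and is not the SCENIC method; the threshold of 0.8, the function names, and the gene names are assumptions for illustration.

```python
import math
from itertools import combinations
from typing import Dict, List, Tuple


def pearson(x: List[float], y: List[float]) -> float:
    """Pearson correlation coefficient; returns 0.0 if either input is constant."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    if sx == 0.0 or sy == 0.0:
        return 0.0
    return cov / (sx * sy)


def coexpression_edges(
    expr: Dict[str, List[float]], threshold: float = 0.8
) -> List[Tuple[str, str, float]]:
    """Return (gene1, gene2, r) for every gene pair with |r| >= threshold."""
    edges = []
    for g1, g2 in combinations(sorted(expr), 2):
        r = pearson(expr[g1], expr[g2])
        if abs(r) >= threshold:
            edges.append((g1, g2, r))
    return edges


# Expression of three genes across the same four cells.
expr = {
    "GENE_A": [1.0, 2.0, 3.0, 4.0],
    "GENE_B": [2.0, 4.0, 6.0, 8.0],  # rises and falls with GENE_A
    "GENE_C": [4.0, 1.0, 3.0, 2.0],  # only weakly related to the others
}
edges = coexpression_edges(expr)
print(edges)
```

Because correlations are computed directly from the expression values, dropouts and other technical noise feed straight into the edges, which is one reason such networks need careful handling.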

Available resources

To increase the accessibility and ease of scRNA-seq analysis, a number of pipelines and databases have been created for scRNA-seq studies. For example, Zappia et al. (2018) created a database that catalogs and annotates tools developed for scRNA-seq analysis. Web servers such as scQuery are readily available and enable the identification of key genes and cell types, amongst other features (Alavi et al. 2018). Pipelines for scRNA-seq data analysis include Basepair's scRNA-seq pipeline, Seurat and SC3.

Given the inherent complexity of NGS technologies, transitioning techniques such as scRNA-seq into clinical settings presents technical challenges related to the complexities of scRNA-seq analysis. The wide range of available tools has been critically reviewed, and pipelines have been created that make embarking on scRNA-seq experiments much easier. Continuous innovation, a wide range of open-access tools and many online workshops make studying single-cell transcriptomics more accessible. Advances in bioinformatics approaches are facilitating biological and clinical studies that continue to provide invaluable insights into gene expression and cell-to-cell heterogeneity.
