Pre-processing scRNA-seq

I have been working as a single-cell bioinformatician at Newcastle University for nearly two and a half years, supporting research groups with their work. I have analysed scRNA-seq, sc-qPCR, CyTOF and high-dimensional flow cytometry data, and I am regularly amazed by how much biologically meaningful information can be obtained from just one single cell.
In 2015, Stegle et al.1 wrote:

Although some tools for analysing RNA-seq data from bulk cell populations can be readily applied to single-cell RNA-seq data, many new computational strategies are required to fully exploit this data type and to enable a comprehensive yet detailed study of gene expression at the single-cell level.

Three years on, a quick search of the internet shows that the computational biology community has embraced this challenge: when it comes to analysing scRNA-seq data, there are numerous methods available. In fact, according to this useful resource, there are currently 182 tools to choose from! With this many resources available, it is difficult to know where to start!

Scater –  Single-cell analysis toolkit for gene expression data in R

I was glad, therefore, to discover the Scater R package early on in my search for analysis solutions. The method was published2 in January 2017, but the R package has been available in Bioconductor for over two years. It is well documented, regularly updated and usable.

What does it do?

Scater sits between raw counts data and downstream biological interpretation in scRNA-seq analysis. It includes a number of functions to quickly calculate simple quality metrics and plot single-cell data. This is great because, if you are impatient like me, one of the first things that you will want to know when you are doing analysis is “has the experiment worked?” and secondly “are there any factors which might cloud the biological interpretation of the data?”. Using the Scater package, it is possible to gauge the quality of the cells captured by using metrics such as the total expression per cell, the percentage of counts coming from mitochondrial genes or spike-ins, and the number of genes expressed per cell. When you are analysing multidimensional data, it is useful to look at it from multiple angles, and the numerous plotting functions incorporated into Scater make it very easy to look for possible confounding factors and to filter out debris or other noise that would obscure the analysis. What is more, because the plots use ggplot2, it is easy to customise them if necessary.
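To make the quality metrics above concrete, here is a minimal, library-agnostic sketch in plain Python. It is not Scater's API: the toy counts, the `per_cell_qc` helper, the "MT-" gene-name prefix and the filtering thresholds are all assumptions chosen purely for illustration.

```python
# Toy counts matrix: one row per gene, one column per cell (hypothetical values).
counts = {
    "GeneA":  [10, 0, 3],
    "GeneB":  [5, 1, 0],
    "MT-CO1": [5, 9, 1],   # assumed mitochondrial gene, identified by its "MT-" prefix
}
cells = ["cell1", "cell2", "cell3"]


def per_cell_qc(counts, n_cells, mito_prefix="MT-"):
    """Return total counts, genes detected and % mitochondrial counts for each cell."""
    metrics = []
    for j in range(n_cells):
        total = sum(row[j] for row in counts.values())
        detected = sum(1 for row in counts.values() if row[j] > 0)
        mito = sum(row[j] for gene, row in counts.items()
                   if gene.startswith(mito_prefix))
        pct_mito = 100.0 * mito / total if total else 0.0
        metrics.append({"total": total, "detected": detected, "pct_mito": pct_mito})
    return metrics


qc = per_cell_qc(counts, len(cells))

# Hypothetical thresholds: keep cells with under 50% mitochondrial counts
# and at least 2 detected genes. Real thresholds depend on the experiment.
keep = [m["pct_mito"] < 50 and m["detected"] >= 2 for m in qc]

for name, m, k in zip(cells, qc, keep):
    print(name, m, "keep" if k else "discard")
```

In this toy example the second cell would be discarded, because most of its counts come from the mitochondrial gene, which is a common sign of a damaged cell.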
The other big selling point of Scater is how well it integrates with other stages of scRNA-seq analysis:

  • It includes functions for importing quantified counts from Salmon, kallisto and 10x Genomics
  • It contains a function to normalise data from pre-computed size factors, such as those generated by the Scran R package
  • The data is stored in a SingleCellExperiment S4 object, so the QCed, normalised data can be used with any other R package that uses the SingleCellExperiment object
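As a rough illustration of what normalising from pre-computed size factors means, the sketch below divides each cell's counts by its size factor and then log-transforms the result. It is plain Python rather than Scater's own function, and the `log_normalise` name, the centring of size factors to a mean of 1 and the pseudocount of 1 are assumptions made for this example.

```python
import math


def log_normalise(counts_by_cell, size_factors, pseudocount=1.0):
    """Scale each cell by its size factor, then take log2(x + pseudocount)."""
    # Centre the size factors so that their mean is 1 (an assumed convention here),
    # which keeps normalised values on a scale comparable to the raw counts.
    mean_sf = sum(size_factors) / len(size_factors)
    centred = [sf / mean_sf for sf in size_factors]
    return [
        [math.log2(count / sf + pseudocount) for count in cell]
        for cell, sf in zip(counts_by_cell, centred)
    ]


# Two hypothetical cells with the same expression profile, the second
# sequenced twice as deeply as the first.
counts_by_cell = [[3, 0, 7], [6, 0, 14]]
size_factors = [1.0, 2.0]   # e.g. pre-computed by another tool

normalised = log_normalise(counts_by_cell, size_factors)
for cell in normalised:
    print([round(value, 3) for value in cell])
```

After normalisation the two cells have matching values, so the difference in sequencing depth no longer looks like a difference in expression.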

Basically, Scater is a great starting point for scRNA-seq analysis.

Stegle O, Teichmann SA and Marioni JC (2015). “Computational and analytical challenges in single-cell transcriptomics.” Nature Reviews Genetics.

McCarthy DJ, Campbell KR, Lun ATL and Wills QF (2017). “Scater: pre-processing, quality control, normalisation and visualisation of single-cell RNA-seq data in R.” Bioinformatics.