2. Quantification of scRNAseq data
Mapping
As with bulk RNA sequencing, the analysis of scRNASeq data starts with a list of fastq files. We want to map these reads to the transcriptome of the species of interest to obtain a cells (\(N\)) x genes (\(M\)) matrix \(X^{N \times M}\) that describes, for each cell, the abundance of its constituent transcripts or genes.
Check the size of the data generated for the single-cell RNA-seq of 1.3 million brain cells from E18 mice.
We are not going to map data of this scale during the practical. However, as for bulk RNASeq data, dedicated tools exist for mapping scRNASeq data, such as Cell Ranger, kallisto | bustools or STARsolo.
These tools take as input a list of fastq files and a reference transcriptome and output a cells (\(N\)) x genes (\(M\)) matrix \(X^{N \times M}\).
Read the documentation of STARsolo (or one of the other scRNASeq mapping tools) and find the differences between RNASeq quantification and scRNASeq quantification.
Remember that an scRNASeq experiment can be seen as a multiplexed RNASeq experiment with \(N\) samples (with the addition of UMIs).
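To make the "multiplexed experiment with UMIs" idea concrete, here is a minimal sketch of how reads that have already been assigned to a gene could be turned into a count matrix with cells as rows and genes as columns, following the \(X^{N \times M}\) notation. The toy records and the `count_matrix` helper are hypothetical and only illustrate the principle; real tools such as STARsolo also correct sequencing errors in barcodes and UMIs, which is ignored here.

```python
from collections import defaultdict

# Hypothetical toy input: one (cell_barcode, umi, gene) record per mapped read.
records = [
    ("AAAC", "TTG", "Actb"),
    ("AAAC", "TTG", "Actb"),   # PCR duplicate: same cell, same UMI, same gene
    ("AAAC", "CCA", "Actb"),
    ("AAAC", "GGA", "Gapdh"),
    ("GGTT", "TTG", "Gapdh"),
]


def count_matrix(records):
    """Return (cells, genes, X), where X[i][j] is the number of
    distinct UMIs observed for gene j in cell i."""
    umis = defaultdict(set)
    for cell, umi, gene in records:
        # Reads sharing cell, gene and UMI collapse into a single molecule.
        umis[(cell, gene)].add(umi)
    cells = sorted({cell for cell, _, _ in records})
    genes = sorted({gene for _, _, gene in records})
    X = [[len(umis.get((c, g), ())) for g in genes] for c in cells]
    return cells, genes, X


cells, genes, X = count_matrix(records)
for cell, row in zip(cells, X):
    print(cell, dict(zip(genes, row)))
```

In this toy example, the two identical reads for Actb in cell AAAC are counted once, which is the main difference from counting reads in bulk RNASeq.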
In a classical scRNASeq experiment, a read will contain:
- an Illumina barcode (sample index)
- a cell barcode
- a UMI
- the RNA sequence
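The read structure listed above can be sketched in a few lines of Python. This example assumes a layout in which the first read of each pair carries a 16 bp cell barcode followed by a 12 bp UMI, while the RNA sequence sits in the second read and the Illumina barcode has already been used to split the fastq files per sample. These lengths and the helper names `read_fastq` and `split_read1` are assumptions made for illustration; check the documentation of your kit for the actual layout.

```python
CB_LEN = 16   # assumed cell barcode length (depends on the chemistry)
UMI_LEN = 12  # assumed UMI length (depends on the chemistry)


def read_fastq(lines):
    """Yield (read_name, sequence) from FASTQ text, four lines per record."""
    lines = [line.rstrip("\n") for line in lines]
    for i in range(0, len(lines) - 3, 4):
        name = lines[i][1:].split()[0]   # drop the leading '@' and any comment
        yield name, lines[i + 1]


def split_read1(seq, cb_len=CB_LEN, umi_len=UMI_LEN):
    """Split a barcode read into (cell_barcode, umi)."""
    if len(seq) < cb_len + umi_len:
        raise ValueError("read is shorter than cell barcode + UMI")
    return seq[:cb_len], seq[cb_len:cb_len + umi_len]


# Hypothetical toy FASTQ content for the barcode read.
toy_fastq = [
    "@read1",
    "AAACCCAAGAAACACT" + "TTGCATGCATGC",
    "+",
    "I" * 28,
    "@read2",
    "GGTTAACCGGTTAACC" + "ACGTACGTACGT",
    "+",
    "I" * 28,
]

parsed = [(name, *split_read1(seq)) for name, seq in read_fastq(toy_fastq)]
for name, cell_barcode, umi in parsed:
    print(name, cell_barcode, umi)
```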
From the 10x Genomics v4 protocol, what is the maximum number of cells that we can sequence per sample?
What is the number of barcodes present in the kit?
How can you explain the difference?
Dealing with this difference is the first step of any scRNASeq data analysis.
Head to the next section where you will learn how to deal with it.
