A 2018 study from La Manno et al., brought forth another interesting aspect of RNASeq data. Particularly interesting in the case of scRNASeq data. RNA abundance is a powerful indicator of the state of individual cells. scRNA can reveal RNA abundance with high quantitative accuracy, sensitivity and throughput. However, this approach captures only a static snapshot at a point in time, posing a challenge for the analysis of time-resolved phenomena such as embryogenesis or tissue regeneration. In their paper, they show that RNA velocity—the time derivative of the gene expression state—can be directly estimated by distinguishing between unspliced and spliced mRNAs in common single-cell RNA sequencing protocols. RNA velocity is a high-dimensional vector that predicts the future state of individual cells on a timescale of hours.

We are going to perform RNA velocity analysis on the 10x 10k neurons from an E18 mouse.

We can get the data with the following commands:

We start by downloading the fastq data:

wget http://s3-us-west-2.amazonaws.com/10x.files/samples/cell-exp/3.0.0/neuron_10k_v3/neuron_10k_v3_fastqs.tar -O data/neuron_10k_v3_fastqs.tar
cd data/
tar -xzvf neuron_10k_v3_fastqs.tar
cd ..

On pedago-ngs.univ-lyon1.fr, you can find the file here:

mkdir data
ln -s /data/share/MADT/TP_lmodolo/data/neuron_10k/ data/neuron_10k

In order to know which reads come from spliced as opposed to unspliced transcripts, we need to see whether the reads contain intronic sequences. Thus we need to include intronic sequences in the kallisto index. This can be done with the BUSpaRse function get_velocity_files, which generates all files required to run RNA velocity with kallisto | bustools. First, we need a genome annotation to get intronic sequences. We can get genome annotation from GTF or GFF3 files from Ensembl with getGTF or getGFF from the R package biomartr, but Bioconductor provides genome annotations in its databases and package ecosystems as well. UCSC annotation can be obtained from Bioconductor package TxDb.Mmusculus.UCSC.mm10.knownGene. Here Ensembl version 97 is used, but Bioconductor 3.10 also provides version 98.

ah <- AnnotationHub()
query(ah, pattern = c("Ensembl", "97", "Mus musculus", "EnsDb"))

## AnnotationHub with 1 record
## # snapshotDate(): 2020-10-27
## # names(): AH73905
## # $dataprovider: Ensembl
## # $species: Mus musculus
## # $rdataclass: EnsDb
## # $rdatadateadded: 2019-05-02
## # $title: Ensembl 97 EnsDb for Mus musculus
## # $description: Gene and protein annotations for Mus musculus based on Ensem...
## # $taxonomyid: 10090
## # $genome: GRCm38
## # $sourcetype: ensembl
## # $sourceurl: http://www.ensembl.org
## # $sourcesize: NA
## # $tags: c("97", "AHEnsDbs", "Annotation", "EnsDb", "Ensembl", "Gene",
## #   "Protein", "Transcript") 
## # retrieve record with 'object[["AH73905"]]'

edb <- ah[["AH73905"]]
dir.create("results/neuron10k_velocity")
BiocManager::install("BSgenome.Mmusculus.UCSC.mm10")
library("BSgenome.Mmusculus.UCSC.mm10")
get_velocity_files(edb, L = 91, Genome = BSgenome.Mmusculus.UCSC.mm10,
                   out_path = "results/neuron10k_velocity",
                   isoform_action = "separate")
rm(ah, edb)

We build the kallisto index for the spliced and unspliced transcript

kallisto index -i results/neuron10k_velocity/mm_cDNA_introns_97.idx \
  results/neuron10k_velocity/cDNA_introns.fa

# or download it from KB as it takes 60Gb and more than 1h to build
ls -s /data/share/MADT/TP_lmodolo/neuron10k_velocity/ results/neuron_10k_velocity

With this spliced, unspliced transcript Reference, we can use kb to perform all the necessary steps to compute the UMI count matrix.

kb count \
  -i results/neuron10k_velocity/mm_cDNA_introns_97.idx \
  -g results/neuron10k_velocity/tr2g.tsv \
  -t 24 \
  -m 100G \
  -x 10xv3 \
  -o results/neuron10k_velocity/kb \
  -c1 results/neuron10k_velocity/cDNA_tx_to_capture.txt \
  -c2 results/neuron10k_velocity/introns_tx_to_capture.txt \
  --lamanno \
  data/neuron_10k_v3_fastqs/neuron_10k_v3_S1_L002_R1_001.fastq.gz \
  data/neuron_10k_v3_fastqs/neuron_10k_v3_S1_L002_R2_001.fastq.gz \
  data/neuron_10k_v3_fastqs/neuron_10k_v3_S1_L001_R1_001.fastq.gz \
  data/neuron_10k_v3_fastqs/neuron_10k_v3_S1_L001_R2_001.fastq.gz

[build] loading fasta file results/neuron10k_velocity/cDNA_introns.fa
[build] k-mer length: 31
[build] warning: clipped off poly-A tail (longer than 10)
        from 779 target sequences
[build] warning: replaced 4044739 non-ACGUT characters in the input sequence
        with pseudorandom nucleotides
[build] counting k-mers ...
done.
[build] building target de Bruijn graph ... done
[build] creating equivalence classes ...  done
[build] target de Bruijn graph has 10053317 contigs and contains 1112521192 k-mers

1 Loading data

We can load the kb outputs with the read_velocity_output() from the neuron10k_velocity/kb/ folder. We will get a list of two sparceMatrix.

res_mat_list <- read_velocity_output(
  spliced_dir = "results/neuron10k_velocity/kb/counts_unfiltered/",
  spliced_name = "spliced",
  unspliced_dir = "results/neuron10k_velocity/kb/counts_unfiltered/",
  unspliced_name = "unspliced"
  )
spliced <- res_mat_list$spliced
unspliced <- res_mat_list$unspliced
rm(res_mat_list)

From the total sum (sum()) of each of these matrix, we can compute the percentage of unspliced UMI counts.

sum(unspliced@x) / (sum(unspliced@x) + sum(spliced@x))

## [1] 0.4114181

dim(spliced)

## [1]   55487 1134440

dim(unspliced)

## [1]  55487 723863

We expect around 10,000 cells. There are over 10 times more barcodes here, since most barcodes are from empty droplets. The number of genes does not seem too outrageous.

2 Quality control

We will have to remove empty droplets with the barcodesRanks() function.

bc_rank <- barcodeRanks(spliced)
bc_uns <- barcodeRanks(unspliced)

knee_plot <- function(bc_ranks) {
  knee_plt <- tibble(rank = map(bc_ranks, ~ .x[["rank"]]),
                     total = map(bc_ranks, ~ .x[["total"]]),
                     dataset = names(bc_ranks)) %>% 
    unnest(cols = c(rank, total)) %>%
    distinct() %>%
    dplyr::filter(total > 0)
  annot <- tibble(inflection = map_dbl(bc_ranks, ~ metadata(.x)[["inflection"]]),
                  rank_cutoff = map_dbl(bc_ranks,
                                        ~ max(.x$rank[.x$total >
                                                        metadata(.x)[["inflection"]]])),
                  dataset = names(bc_ranks))
  p <- ggplot(knee_plt, aes(rank, total, color = dataset)) +
    geom_line() +
    geom_hline(aes(yintercept = inflection, color = dataset),
               data = annot, linetype = 2) +
    geom_vline(aes(xintercept = rank_cutoff, color = dataset),
               data = annot, linetype = 2) +
    scale_x_log10() +
    scale_y_log10() +
    labs(x = "Rank", y = "Total UMIs") +
    coord_flip()
  return(p)
}

knee_plot(list(spliced = bc_rank, unspliced = bc_uns))

Which inflection point should be used to remove what are supposed to be empty droplets? The one of the spliced matrix or the unspliced matrix? For now, for simplicity, the inflection point for the spliced matrix will be used. Construct a sf and uf sparceMatrix without empty droplets.

rm(bc_uns)
rownames(spliced) <- str_replace(rownames(spliced), "(.*)\\..*", "\\1") 
rownames(unspliced) <- str_replace(rownames(unspliced), "(.*)\\..*", "\\1") 
bcs_use <- colnames(spliced)[
  nexprs(spliced, byrow = F) > metadata(bc_rank)$inflection
]
# Remove genes that aren’t detected
genes_use <- rownames(spliced)[
  nexprs(spliced, byrow = T) > 0
]
sce <- SingleCellExperiment(
      assays = list(
          spliced = spliced[genes_use, bcs_use],
          unspliced = unspliced[genes_use, bcs_use]
      )
  )
if (file.exists("results/neuron10k_velocity/sce_raw.Rdata")) {
  load(file = "results/neuron10k_velocity/sce_raw.Rdata",
       verbose = T)
} else {
  save(sce, file = "results/neuron10k_velocity/sce_raw.Rdata")
}

## Loading objects:
##   sce

rm(bcs_use, genes_use, spliced, unspliced, bc_rank)

3 Normalisation

We are going to normalize for library effect with SCTransform.

colData(sce)$umi_spliced <- colSums(assays(sce)$spliced)
colData(sce)$log_umi_spliced <- log10(colData(sce)$umi_spliced)
spliced_norm <- sctransform::vst(
  assays(sce)$spliced,
  cell_attr = colData(sce),
  latent_var = c("log_umi_spliced"),
  return_gene_attr = T,
  return_cell_attr = T,
  show_progress = F) %>% 
  correct() %>% 
  Matrix(sparse = T)

## 
  |                                                                            
  |                                                                      |   0%
  |                                                                            
  |==                                                                    |   3%
  |                                                                            
  |====                                                                  |   5%
  |                                                                            
  |======                                                                |   8%
  |                                                                            
  |========                                                              |  11%
  |                                                                            
  |=========                                                             |  14%
  |                                                                            
  |===========                                                           |  16%
  |                                                                            
  |=============                                                         |  19%
  |                                                                            
  |===============                                                       |  22%
  |                                                                            
  |=================                                                     |  24%
  |                                                                            
  |===================                                                   |  27%
  |                                                                            
  |=====================                                                 |  30%
  |                                                                            
  |=======================                                               |  32%
  |                                                                            
  |=========================                                             |  35%
  |                                                                            
  |==========================                                            |  38%
  |                                                                            
  |============================                                          |  41%
  |                                                                            
  |==============================                                        |  43%
  |                                                                            
  |================================                                      |  46%
  |                                                                            
  |==================================                                    |  49%
  |                                                                            
  |====================================                                  |  51%
  |                                                                            
  |======================================                                |  54%
  |                                                                            
  |========================================                              |  57%
  |                                                                            
  |==========================================                            |  59%
  |                                                                            
  |============================================                          |  62%
  |                                                                            
  |=============================================                         |  65%
  |                                                                            
  |===============================================                       |  68%
  |                                                                            
  |=================================================                     |  70%
  |                                                                            
  |===================================================                   |  73%
  |                                                                            
  |=====================================================                 |  76%
  |                                                                            
  |=======================================================               |  78%
  |                                                                            
  |=========================================================             |  81%
  |                                                                            
  |===========================================================           |  84%
  |                                                                            
  |=============================================================         |  86%
  |                                                                            
  |==============================================================        |  89%
  |                                                                            
  |================================================================      |  92%
  |                                                                            
  |==================================================================    |  95%
  |                                                                            
  |====================================================================  |  97%
  |                                                                            
  |======================================================================| 100%

We then normalize and add the unspliced data to our sce object.

colData(sce)$umi_unspliced <- colSums(assays(sce)$unspliced)
colData(sce)$log_umi_unspliced <- log10(colData(sce)$umi_unspliced)
unspliced_norm <- sctransform::vst(
  assays(sce)$unspliced,
  cell_attr = colData(sce),
  latent_var = c("log_umi_spliced"),
  return_gene_attr = T,
  return_cell_attr = T,
  show_progress = F) %>% 
  correct() %>% 
  Matrix(sparse = T)

## 
  |                                                                            
  |                                                                      |   0%
  |                                                                            
  |==                                                                    |   3%
  |                                                                            
  |====                                                                  |   6%
  |                                                                            
  |======                                                                |   9%
  |                                                                            
  |========                                                              |  12%
  |                                                                            
  |==========                                                            |  15%
  |                                                                            
  |============                                                          |  18%
  |                                                                            
  |==============                                                        |  21%
  |                                                                            
  |================                                                      |  24%
  |                                                                            
  |===================                                                   |  26%
  |                                                                            
  |=====================                                                 |  29%
  |                                                                            
  |=======================                                               |  32%
  |                                                                            
  |=========================                                             |  35%
  |                                                                            
  |===========================                                           |  38%
  |                                                                            
  |=============================                                         |  41%
  |                                                                            
  |===============================                                       |  44%
  |                                                                            
  |=================================                                     |  47%
  |                                                                            
  |===================================                                   |  50%
  |                                                                            
  |=====================================                                 |  53%
  |                                                                            
  |=======================================                               |  56%
  |                                                                            
  |=========================================                             |  59%
  |                                                                            
  |===========================================                           |  62%
  |                                                                            
  |=============================================                         |  65%
  |                                                                            
  |===============================================                       |  68%
  |                                                                            
  |=================================================                     |  71%
  |                                                                            
  |===================================================                   |  74%
  |                                                                            
  |======================================================                |  76%
  |                                                                            
  |========================================================              |  79%
  |                                                                            
  |==========================================================            |  82%
  |                                                                            
  |============================================================          |  85%
  |                                                                            
  |==============================================================        |  88%
  |                                                                            
  |================================================================      |  91%
  |                                                                            
  |==================================================================    |  94%
  |                                                                            
  |====================================================================  |  97%
  |                                                                            
  |======================================================================| 100%

u_genes <- intersect(rownames(spliced_norm), rownames(unspliced_norm))
spliced_norm[rownames(spliced_norm) %in% u_genes, ] %>% dim()

## [1] 14539  9971

unspliced_norm[rownames(unspliced_norm) %in% u_genes, ] %>% dim()

## [1] 14539  9971

sce_norm <- SingleCellExperiment(
      assays = list(
          spliced = spliced_norm[rownames(spliced_norm) %in% u_genes, ] %>% 
            Matrix(sparse = T),
          unspliced = unspliced_norm[rownames(unspliced_norm) %in% u_genes, ] %>% 
            Matrix(sparse = T)
      )
  )

We can save our sce object to use for future crash

if(file.exists("results/neuron10k_velocity/sce_norm.Rdata")){
  load(file = "results/neuron10k_velocity/sce_norm.Rdata",
       verbose = T)
}else{
  save(sce_norm, file = "results/neuron10k_velocity/sce_norm.Rdata")
}

## Loading objects:
##   sce_norm

rm(sce, spliced_norm, unspliced_norm, u_genes)

4 Cell type annotation

With most droplets experiments, we don’t have additional information on the cells (like surface protein markers). Instead of using surface markers, SingleR uses bulk RNA-seq data of isolated known cell types as a reference to annotate cell types in scRNA-seq datasets. The reference uses Ensembl IDs without version number.

mouse_rnaseq <- MouseRNAseqData(ensembl = TRUE)
sce <- sce_norm[rownames(sce_norm) %in% rownames(mouse_rnaseq), ]
annots <- SingleR(
  assays(sce)$spliced,
  ref = mouse_rnaseq,
  labels = colData(mouse_rnaseq)$label.fine,
  de.method = "wilcox",
  method = "single"
)
rm(mouse_rnaseq)

For our RNA velocity analyze, we are mostly interested in neural or glial lineages. For clarity, we filter out cells that are not from these two lineages.

inds <- annots$pruned.labels %in% c(
  "NPCs", "Neurons", "OPCs", "Oligodendrocytes",
  "qNSCs", "aNSCs", "Astrocytes", "Ependymal"
)
colData(sce)$cell_type <- annots$pruned.labels
sce <- sce[, inds]

5 Identification of highly variable genes

Contrary to the first part of the practical we are going to work with Seurat objects instead of SingleCellExperiment objects, from now on. Seurat is an R package which contains a large set of tools for scRNASeq analysis. Of course, it also has its own way of storing single-cell data (with the CreateSeuratObject() function). The functionalities of a SeuratObject are roughly the same as the ones of an SingleCellExperiment.

sce <- logNormCounts(sce, exprs_values = "spliced")
seu <- as.Seurat(sce, counts = "spliced", assay = "spliced")
seu[["unspliced"]] <- CreateAssayObject(assays(sce)$unspliced)
DefaultAssay(seu) <- "spliced"

We can save the seu object.

if(file.exists("results/neuron10k_velocity/seu_annot.Rdata")){
  load(file = "results/neuron10k_velocity/seu_annot.Rdata",
       verbose = T)
}else{
  save(seu, file = "results/neuron10k_velocity/seu_annot.Rdata")
}

## Loading objects:
##   seu

rm(sce, sce_norm)

To speedup the computation and focus on interesting genes, we are going to work only with the highly variable genes. We are going to use the Seurat procedure described here. You can use the FindVariableFeatures() function to identify highly variable genes.

seu <- FindVariableFeatures(seu, selection.method = "vst", nfeatures = 2000)

# Identify the 10 most highly variable genes
top10 <- head(VariableFeatures(seu), 10)

# plot variable features with and without labels
plot1 <- VariableFeaturePlot(seu)
plot2 <- LabelPoints(plot = plot1, points = top10, repel = TRUE)
plot2

By default, we select 2,000 genes per dataset. These will be used in downstream analysis, like PCA.

6 Dimension Reduction

RNA velocity is a cell displacement vector of size the number of genes. Like for gene expression data, we want to visualize these data in a lower dimension space.

When visualizing RNA velocity on reduced dimensions, should the cell embeddings be from the spliced matrix or the unspliced matrix or the sum of both? In my opinion, it makes the most sense to plot RNA velocity over cell embeddings from the spliced matrix. The arrows in RNA velocity visualization stand for where the cell is predicted to be going in the near future. Where does the cell go from? The current state. And the current state is represented by the spliced matrix, while the unspliced matrix represents what is soon to come. Thus all the dimension reduction here will be computed from the spliced matrix.

We can use the DefaultAssay() function to define the assay we will be working on in Seurat. The RunPCA() and ElbowPlot() function performe an SVD decomposition and plot the variance explained by the first axis.

seu <- ScaleData(seu)
seu <- RunPCA(seu, verbose = FALSE, npcs = 70)
ElbowPlot(seu, ndims = 70)

In Seurat when the dimension reduction is computed you can plot the results, with the function DimPlot().

DimPlot(
  seu, reduction = "pca", group.by = "cell_type",
  pt.size = 0.5, label = TRUE, repel = TRUE
  ) +
  scale_color_brewer(type = "qual", palette = "Set2")

The DimHeatmap() function allows for easy exploration of the primary sources of heterogeneity in a dataset, and can be useful when trying to decide which PCs to include for further downstream analyses. Both cells and features are ordered according to their PCA scores. Setting cells to a number plots the ‘extreme’ cells on both ends of the spectrum, which dramatically speeds plotting for large datasets. Though clearly a supervised analysis, we find this to be a valuable tool for exploring correlated feature sets.

DimHeatmap(seu, dims = 1, cells = 500, balanced = TRUE)

We can also use non-linear dimension reduction technique like \(t\)-SNE and UMAP in Seurat. Try the RunTSNE() and RunUMAP() function.

seu <- RunTSNE(seu, dims = 1:50, verbose = FALSE)
  DimPlot(seu, reduction = "tsne",
        group.by = "cell_type", pt.size = 0.5, label = TRUE, repel = TRUE) +
  scale_color_brewer(type = "qual", palette = "Set2")

seu <- RunUMAP(seu, dims = 1:50, umap.method = "uwot")
DimPlot(seu, reduction = "umap",
        group.by = "cell_type", pt.size = 0.5, label = TRUE, repel = TRUE) +
  scale_color_brewer(type = "qual", palette = "Set2")

As expected, one end of the plot has mostly stem cells, and the other end has mostly neurons. Clustering should petition the big blob of NPCs that SingleR could not further partition due to limitations in the SingleR reference for mouse brains. Use the FindNeighbors() and FindClusters() function for the clustering.

seu <- FindNeighbors(seu, verbose = FALSE) %>% 
  FindClusters(resolution = 1, verbose = FALSE) # Louvain
DimPlot(seu, pt.size = 0.5, reduction = "umap", label = TRUE)

7 RNA Velocity

Instead of using the old R package cytoVelo we are going to use the recent scvelo tools which perform a better estimatation of the transcription and degradation rate. The one small problem we will face is that scvelo is only available as a python package.

scvelo can read data with in the form of AnnotData class from .loom file. The LoomExperiment R package makes the SingleCellExperiment to loom conversion possible. You can also use the as.SingleCellExperiment() to convert a SeuratObject into a SingleCellExperiment.

if(file.exists("results/neuron10k_velocity/sce_annot.Rdata")){
  load(file = "results/neuron10k_velocity/sce_annot.Rdata",
       verbose = T)
}else{
  save(sce, file = "results/neuron10k_velocity/sce_annot.Rdata")
}

## Loading objects:
##   sce

load(file = "results/neuron10k_velocity/sce_annot.Rdata", verbose = T)

## Loading objects:
##   sce

sce_clean <- SingleCellExperiment(
    assays = list(
      logcounts = assay(sce, "logcounts"),
      spliced = assay(sce, "spliced"),
      unspliced = assay(sce, "unspliced")
    ), 
    colData = colData(sce),
    rowData = rowData(sce)
)
scle <- LoomExperiment::LoomExperiment(sce_clean)
file.remove("results/neuron10k_velocity/sce_annot.loom")

## [1] TRUE

LoomExperiment::export(scle, "results/neuron10k_velocity/sce_annot.loom")

Then we can launch a python3 console within our singularity container and load scvelo and the data: We can check that the adata variable contains the spliced and unsplicedassay.

import scvelo as scv
adata = scv.read("results/neuron10k_velocity/sce_annot.loom")
adata.col_names = adata.obs["colnames"]
adata.var_names = adata.var["rownames"]
adata

## AnnData object with n_obs × n_vars = 9793 × 11948
##     obs: 'cell_type', 'colnames', 'sizeFactor'
##     var: 'rownames'
##     obsm: 'colnames_factor'
##     layers: 'matrix', 'spliced', 'unspliced'

scv.utils.show_proportions(adata)

## Abundance of ['spliced', 'unspliced']: [0.54 0.46]

With scVelo Preprocessing that is necessary consists of :

gene selection by detection (detected with a minimum number of counts) and high variability (dispersion).
normalizing every cell by its initial size and logarithmizing X.

Filtering and normalization is applied in the same vein to spliced/unspliced counts and X. Logarithmizing is only applied to X. If X is already preprocessed from former analysis, it won’t touch it.

scv.pp.filter_genes(adata, min_shared_counts=10)

## Filtered out 2214 genes that are detected 10 counts (shared).

scv.pp.normalize_per_cell(adata)

## Normalized count data: X, spliced, unspliced.

scv.pp.filter_genes_dispersion(adata, n_top_genes=3000)

## Extracted 3000 highly variable genes.

scv.pp.log1p(adata)

Further, we need the first and second order moments (basically mean and uncentered variance) computed among nearest neighbors in PCA space. First order is needed for deterministic velocity estimation, while stochastic estimation also requires second order moments.

scv.pp.moments(adata, n_pcs=30, n_neighbors=30)

## computing neighbors
##     finished (0:00:18) --> added 
##     'distances' and 'connectivities', weighted adjacency matrices (adata.obsp)
## computing moments based on connectivities
##     finished (0:00:03) --> added 
##     'Ms' and 'Mu', moments of un/spliced abundances (adata.layers)

The gene-specific velocities are obtained by fitting a ratio between precursor (unspliced) and mature (spliced) mRNA abundances that well explains the steady states (constant transcriptional state) and then computing how the observed abundances deviate from what is expected in steady state. (We will soon release a version that does not rely on the steady state assumption anymore).

scv.tl.velocity(adata)

## computing velocities
##     finished (0:00:07) --> added 
##     'velocity', velocity vectors for each individual cell (adata.layers)

This computes the (cosine) correlation of potential cell transitions with the velocity vector in high dimensional space. The resulting velocity graph has dimension 𝑛𝑜𝑏𝑠×𝑛𝑜𝑏𝑠

and summarizes the possible cell state changes (given by a transition from one cell to another) that are well explained through the velocity vectors. If you set `approx=Tru`` it is computed on a reduced PCA space with 50 components.

The velocity graph can then be converted to a transition matrix by applying a Gaussian kernel on the cosine correlation which assigns high probabilities to cell state changes that correlate with the velocity vector. You can access the Markov transition matrix via scv.tl.transition_matrix. The resulting transition matrix can be used for a variety of applications shown hereinafter. For instance, it is used to place the velocities into a low-dimensional embedding by simply applying the mean transition with respect to the transition probabilities, i.e. scv.tl.velocity_embeddin. Further, we can trace cells back along the Markov chain to their origins and potential fates, thus obtaining root cells and end points within a trajectory; via scv.tl.terminal_states

scv.tl.velocity_graph(adata)

## computing velocity graph
## 
... 4%
... 8%
... 12%
... 17%
... 21%
... 25%
... 29%
... 33%
... 37%
... 40%
... 43%
... 47%
... 50%
... 53%
... 57%
... 60%
... 64%
... 67%
... 70%
... 74%
... 77%
... 80%
... 83%
... 87%
... 90%
... 94%
... 97%
... 100%
    finished (0:01:23) --> added 
##     'velocity_graph', sparse matrix with cosine correlations (adata.uns)

scv.pl.velocity_graph(adata, color = "cell_type")

Finally, the velocities are projected onto any embedding specified in basis and visualized in one of three available ways: on single cell level, on grid level, or as streamplot as shown here.

For this we are going to use an umap projection

scv.tl.umap(adata)
scv.pl.velocity_embedding_stream(adata, basis = "umap", color = "cell_type")

## computing velocity embedding
##     finished (0:00:03) --> added
##     'velocity_umap', embedded velocity vectors (adata.obsm)

scv.pl.velocity_embedding(adata, basis='umap', arrow_length=2, arrow_size=2, dpi=150)

scv.pl.velocity_embedding_grid(adata, color='cell_type',
                               layer=['velocity', 'spliced'], arrow_size=1.5)

scv.tl.rank_velocity_genes(adata, groupby='cell_type')

## ranking velocity genes
##     finished (0:00:07) --> added 
##     'rank_velocity_genes', sorted scores by group ids (adata.uns) 
##     'spearmans_score', spearmans correlation scores (adata.var)


scv.DataFrame(adata.uns['rank_velocity_genes']['names']).head()

##            Astrocytes           Ependymal                NPCs             Neurons                OPCs    Oligodendrocytes               aNSCs               qNSCs
## 0  ENSMUSG00000055737  ENSMUSG00000029212  ENSMUSG00000069769  ENSMUSG00000004113  ENSMUSG00000028634  ENSMUSG00000054640  ENSMUSG00000042751  ENSMUSG00000026235
## 1  ENSMUSG00000018849  ENSMUSG00000024812  ENSMUSG00000027210  ENSMUSG00000056222  ENSMUSG00000027238  ENSMUSG00000060961  ENSMUSG00000025666  ENSMUSG00000062991
## 2  ENSMUSG00000056870  ENSMUSG00000044167  ENSMUSG00000028565  ENSMUSG00000055022  ENSMUSG00000061080  ENSMUSG00000020181  ENSMUSG00000027238  ENSMUSG00000072294
## 3  ENSMUSG00000060924  ENSMUSG00000022708  ENSMUSG00000027993  ENSMUSG00000021068  ENSMUSG00000044117  ENSMUSG00000045038  ENSMUSG00000038048  ENSMUSG00000008575
## 4  ENSMUSG00000028403  ENSMUSG00000052942  ENSMUSG00000066406  ENSMUSG00000021268  ENSMUSG00000052852  ENSMUSG00000032826  ENSMUSG00000003279  ENSMUSG00000055761

var_names = ["ENSMUSG00000055737", "ENSMUSG00000004113"]
scv.pl.velocity(adata, var_names=var_names, colorbar=True, ncols=1)

If you want to learn more about scRNASeq analysis try to keep up with the bibliography on the subject !

single-cell RNA sequencing analysis

Laurent Modolo

April 6, 2021