A 2018 study from La Manno et al., brought forth another interesting aspect of RNASeq data. Particularly interesting in the case of scRNASeq data. RNA abundance is a powerful indicator of the state of individual cells. scRNA can reveal RNA abundance with high quantitative accuracy, sensitivity and throughput. However, this approach captures only a static snapshot at a point in time, posing a challenge for the analysis of time-resolved phenomena such as embryogenesis or tissue regeneration. In their paper, they show that RNA velocity—the time derivative of the gene expression state—can be directly estimated by distinguishing between unspliced and spliced mRNAs in common single-cell RNA sequencing protocols. RNA velocity is a high-dimensional vector that predicts the future state of individual cells on a timescale of hours.
We are going to perform RNA velocity analysis on the 10x 10k neurons from an E18 mouse.
We can get the data with the following commands:
We start by downloading the fastq data:
wget http://s3-us-west-2.amazonaws.com/10x.files/samples/cell-exp/3.0.0/neuron_10k_v3/neuron_10k_v3_fastqs.tar -O data/neuron_10k_v3_fastqs.tar
cd data/
tar -xzvf neuron_10k_v3_fastqs.tar
cd ..
On pedago-ngs.univ-lyon1.fr
, you can find the file here:
mkdir data
ln -s /data/share/MADT/TP_lmodolo/data/neuron_10k/ data/neuron_10k
In order to know which reads come from spliced as opposed to unspliced transcripts, we need to see whether the reads contain intronic sequences. Thus we need to include intronic sequences in the kallisto index. This can be done with the BUSpaRse
function get_velocity_files
, which generates all files required to run RNA velocity with kallisto | bustools
. First, we need a genome annotation to get intronic sequences. We can get genome annotation from GTF or GFF3 files from Ensembl with getGTF
or getGF
F from the R package biomartr
, but Bioconductor provides genome annotations in its databases and package ecosystems as well. UCSC annotation can be obtained from Bioconductor package TxDb.Mmusculus.UCSC.mm10.knownGene
. Here Ensembl version 97 is used, but Bioconductor 3.10 also provides version 98.
ah <- AnnotationHub()
query(ah, pattern = c("Ensembl", "97", "Mus musculus", "EnsDb"))
## AnnotationHub with 1 record
## # snapshotDate(): 2020-10-27
## # names(): AH73905
## # $dataprovider: Ensembl
## # $species: Mus musculus
## # $rdataclass: EnsDb
## # $rdatadateadded: 2019-05-02
## # $title: Ensembl 97 EnsDb for Mus musculus
## # $description: Gene and protein annotations for Mus musculus based on Ensem...
## # $taxonomyid: 10090
## # $genome: GRCm38
## # $sourcetype: ensembl
## # $sourceurl: http://www.ensembl.org
## # $sourcesize: NA
## # $tags: c("97", "AHEnsDbs", "Annotation", "EnsDb", "Ensembl", "Gene",
## # "Protein", "Transcript")
## # retrieve record with 'object[["AH73905"]]'
edb <- ah[["AH73905"]]
dir.create("results/neuron10k_velocity")
BiocManager::install("BSgenome.Mmusculus.UCSC.mm10")
library("BSgenome.Mmusculus.UCSC.mm10")
get_velocity_files(edb, L = 91, Genome = BSgenome.Mmusculus.UCSC.mm10,
out_path = "results/neuron10k_velocity",
isoform_action = "separate")
rm(ah, edb)
We build the kallisto index for the spliced and unspliced transcript
kallisto index -i results/neuron10k_velocity/mm_cDNA_introns_97.idx \
results/neuron10k_velocity/cDNA_introns.fa
# or download it from KB as it takes 60Gb and more than 1h to build
ls -s /data/share/MADT/TP_lmodolo/neuron10k_velocity/ results/neuron_10k_velocity
With this spliced, unspliced transcript Reference, we can use kb to perform all the necessary steps to compute the UMI count matrix.
kb count \
-i results/neuron10k_velocity/mm_cDNA_introns_97.idx \
-g results/neuron10k_velocity/tr2g.tsv \
-t 24 \
-m 100G \
-x 10xv3 \
-o results/neuron10k_velocity/kb \
-c1 results/neuron10k_velocity/cDNA_tx_to_capture.txt \
-c2 results/neuron10k_velocity/introns_tx_to_capture.txt \
--lamanno \
data/neuron_10k_v3_fastqs/neuron_10k_v3_S1_L002_R1_001.fastq.gz \
data/neuron_10k_v3_fastqs/neuron_10k_v3_S1_L002_R2_001.fastq.gz \
data/neuron_10k_v3_fastqs/neuron_10k_v3_S1_L001_R1_001.fastq.gz \
data/neuron_10k_v3_fastqs/neuron_10k_v3_S1_L001_R2_001.fastq.gz
[build] loading fasta file results/neuron10k_velocity/cDNA_introns.fa
[build] k-mer length: 31
[build] warning: clipped off poly-A tail (longer than 10)
from 779 target sequences
[build] warning: replaced 4044739 non-ACGUT characters in the input sequence
with pseudorandom nucleotides
[build] counting k-mers ...
done.
[build] building target de Bruijn graph ... done
[build] creating equivalence classes ... done
[build] target de Bruijn graph has 10053317 contigs and contains 1112521192 k-mers
We can load the kb outputs with the read_velocity_output()
from the neuron10k_velocity/kb/
folder. We will get a list of two sparceMatrix
.
res_mat_list <- read_velocity_output(
spliced_dir = "results/neuron10k_velocity/kb/counts_unfiltered/",
spliced_name = "spliced",
unspliced_dir = "results/neuron10k_velocity/kb/counts_unfiltered/",
unspliced_name = "unspliced"
)
spliced <- res_mat_list$spliced
unspliced <- res_mat_list$unspliced
rm(res_mat_list)
From the total sum (sum()
) of each of these matrix, we can compute the percentage of unspliced UMI counts.
sum(unspliced@x) / (sum(unspliced@x) + sum(spliced@x))
## [1] 0.4114181
dim(spliced)
## [1] 55487 1134440
dim(unspliced)
## [1] 55487 723863
We expect around 10,000 cells. There are over 10 times more barcodes here, since most barcodes are from empty droplets. The number of genes does not seem too outrageous.
We will have to remove empty droplets with the barcodesRanks()
function.
bc_rank <- barcodeRanks(spliced)
bc_uns <- barcodeRanks(unspliced)
knee_plot <- function(bc_ranks) {
knee_plt <- tibble(rank = map(bc_ranks, ~ .x[["rank"]]),
total = map(bc_ranks, ~ .x[["total"]]),
dataset = names(bc_ranks)) %>%
unnest(cols = c(rank, total)) %>%
distinct() %>%
dplyr::filter(total > 0)
annot <- tibble(inflection = map_dbl(bc_ranks, ~ metadata(.x)[["inflection"]]),
rank_cutoff = map_dbl(bc_ranks,
~ max(.x$rank[.x$total >
metadata(.x)[["inflection"]]])),
dataset = names(bc_ranks))
p <- ggplot(knee_plt, aes(rank, total, color = dataset)) +
geom_line() +
geom_hline(aes(yintercept = inflection, color = dataset),
data = annot, linetype = 2) +
geom_vline(aes(xintercept = rank_cutoff, color = dataset),
data = annot, linetype = 2) +
scale_x_log10() +
scale_y_log10() +
labs(x = "Rank", y = "Total UMIs") +
coord_flip()
return(p)
}
knee_plot(list(spliced = bc_rank, unspliced = bc_uns))
Which inflection point should be used to remove what are supposed to be empty droplets? The one of the spliced matrix or the unspliced matrix? For now, for simplicity, the inflection point for the spliced matrix will be used. Construct a sf
and uf
sparceMatrix
without empty droplets.
rm(bc_uns)
rownames(spliced) <- str_replace(rownames(spliced), "(.*)\\..*", "\\1")
rownames(unspliced) <- str_replace(rownames(unspliced), "(.*)\\..*", "\\1")
bcs_use <- colnames(spliced)[
nexprs(spliced, byrow = F) > metadata(bc_rank)$inflection
]
# Remove genes that aren’t detected
genes_use <- rownames(spliced)[
nexprs(spliced, byrow = T) > 0
]
sce <- SingleCellExperiment(
assays = list(
spliced = spliced[genes_use, bcs_use],
unspliced = unspliced[genes_use, bcs_use]
)
)
if (file.exists("results/neuron10k_velocity/sce_raw.Rdata")) {
load(file = "results/neuron10k_velocity/sce_raw.Rdata",
verbose = T)
} else {
save(sce, file = "results/neuron10k_velocity/sce_raw.Rdata")
}
## Loading objects:
## sce
rm(bcs_use, genes_use, spliced, unspliced, bc_rank)
We are going to normalize for library effect with SCTransform
.
colData(sce)$umi_spliced <- colSums(assays(sce)$spliced)
colData(sce)$log_umi_spliced <- log10(colData(sce)$umi_spliced)
spliced_norm <- sctransform::vst(
assays(sce)$spliced,
cell_attr = colData(sce),
latent_var = c("log_umi_spliced"),
return_gene_attr = T,
return_cell_attr = T,
show_progress = F) %>%
correct() %>%
Matrix(sparse = T)
##
|
| | 0%
|
|== | 3%
|
|==== | 5%
|
|====== | 8%
|
|======== | 11%
|
|========= | 14%
|
|=========== | 16%
|
|============= | 19%
|
|=============== | 22%
|
|================= | 24%
|
|=================== | 27%
|
|===================== | 30%
|
|======================= | 32%
|
|========================= | 35%
|
|========================== | 38%
|
|============================ | 41%
|
|============================== | 43%
|
|================================ | 46%
|
|================================== | 49%
|
|==================================== | 51%
|
|====================================== | 54%
|
|======================================== | 57%
|
|========================================== | 59%
|
|============================================ | 62%
|
|============================================= | 65%
|
|=============================================== | 68%
|
|================================================= | 70%
|
|=================================================== | 73%
|
|===================================================== | 76%
|
|======================================================= | 78%
|
|========================================================= | 81%
|
|=========================================================== | 84%
|
|============================================================= | 86%
|
|============================================================== | 89%
|
|================================================================ | 92%
|
|================================================================== | 95%
|
|==================================================================== | 97%
|
|======================================================================| 100%
We then normalize and add the unspliced data to our sce
object.
colData(sce)$umi_unspliced <- colSums(assays(sce)$unspliced)
colData(sce)$log_umi_unspliced <- log10(colData(sce)$umi_unspliced)
unspliced_norm <- sctransform::vst(
assays(sce)$unspliced,
cell_attr = colData(sce),
latent_var = c("log_umi_spliced"),
return_gene_attr = T,
return_cell_attr = T,
show_progress = F) %>%
correct() %>%
Matrix(sparse = T)
##
|
| | 0%
|
|== | 3%
|
|==== | 6%
|
|====== | 9%
|
|======== | 12%
|
|========== | 15%
|
|============ | 18%
|
|============== | 21%
|
|================ | 24%
|
|=================== | 26%
|
|===================== | 29%
|
|======================= | 32%
|
|========================= | 35%
|
|=========================== | 38%
|
|============================= | 41%
|
|=============================== | 44%
|
|================================= | 47%
|
|=================================== | 50%
|
|===================================== | 53%
|
|======================================= | 56%
|
|========================================= | 59%
|
|=========================================== | 62%
|
|============================================= | 65%
|
|=============================================== | 68%
|
|================================================= | 71%
|
|=================================================== | 74%
|
|====================================================== | 76%
|
|======================================================== | 79%
|
|========================================================== | 82%
|
|============================================================ | 85%
|
|============================================================== | 88%
|
|================================================================ | 91%
|
|================================================================== | 94%
|
|==================================================================== | 97%
|
|======================================================================| 100%
u_genes <- intersect(rownames(spliced_norm), rownames(unspliced_norm))
spliced_norm[rownames(spliced_norm) %in% u_genes, ] %>% dim()
## [1] 14539 9971
unspliced_norm[rownames(unspliced_norm) %in% u_genes, ] %>% dim()
## [1] 14539 9971
sce_norm <- SingleCellExperiment(
assays = list(
spliced = spliced_norm[rownames(spliced_norm) %in% u_genes, ] %>%
Matrix(sparse = T),
unspliced = unspliced_norm[rownames(unspliced_norm) %in% u_genes, ] %>%
Matrix(sparse = T)
)
)
We can save our sce
object to use for future crash
if(file.exists("results/neuron10k_velocity/sce_norm.Rdata")){
load(file = "results/neuron10k_velocity/sce_norm.Rdata",
verbose = T)
}else{
save(sce_norm, file = "results/neuron10k_velocity/sce_norm.Rdata")
}
## Loading objects:
## sce_norm
rm(sce, spliced_norm, unspliced_norm, u_genes)
With most droplets experiments, we don’t have additional information on the cells (like surface protein markers). Instead of using surface markers, SingleR
uses bulk RNA-seq data of isolated known cell types as a reference to annotate cell types in scRNA-seq datasets. The reference uses Ensembl IDs without version number.
mouse_rnaseq <- MouseRNAseqData(ensembl = TRUE)
sce <- sce_norm[rownames(sce_norm) %in% rownames(mouse_rnaseq), ]
annots <- SingleR(
assays(sce)$spliced,
ref = mouse_rnaseq,
labels = colData(mouse_rnaseq)$label.fine,
de.method = "wilcox",
method = "single"
)
rm(mouse_rnaseq)
For our RNA velocity analyze, we are mostly interested in neural or glial lineages. For clarity, we filter out cells that are not from these two lineages.
inds <- annots$pruned.labels %in% c(
"NPCs", "Neurons", "OPCs", "Oligodendrocytes",
"qNSCs", "aNSCs", "Astrocytes", "Ependymal"
)
colData(sce)$cell_type <- annots$pruned.labels
sce <- sce[, inds]
Contrary to the first part of the practical we are going to work with Seurat
objects instead of SingleCellExperiment
objects, from now on. Seurat
is an R package which contains a large set of tools for scRNASeq analysis. Of course, it also has its own way of storing single-cell data (with the CreateSeuratObject()
function). The functionalities of a SeuratObject
are roughly the same as the ones of an SingleCellExperiment
.
sce <- logNormCounts(sce, exprs_values = "spliced")
seu <- as.Seurat(sce, counts = "spliced", assay = "spliced")
seu[["unspliced"]] <- CreateAssayObject(assays(sce)$unspliced)
DefaultAssay(seu) <- "spliced"
We can save the seu
object.
if(file.exists("results/neuron10k_velocity/seu_annot.Rdata")){
load(file = "results/neuron10k_velocity/seu_annot.Rdata",
verbose = T)
}else{
save(seu, file = "results/neuron10k_velocity/seu_annot.Rdata")
}
## Loading objects:
## seu
rm(sce, sce_norm)
To speedup the computation and focus on interesting genes, we are going to work only with the highly variable genes. We are going to use the Seurat
procedure described here. You can use the FindVariableFeatures()
function to identify highly variable genes.
seu <- FindVariableFeatures(seu, selection.method = "vst", nfeatures = 2000)
# Identify the 10 most highly variable genes
top10 <- head(VariableFeatures(seu), 10)
# plot variable features with and without labels
plot1 <- VariableFeaturePlot(seu)
plot2 <- LabelPoints(plot = plot1, points = top10, repel = TRUE)
plot2
By default, we select 2,000 genes per dataset. These will be used in downstream analysis, like PCA.
RNA velocity is a cell displacement vector of size the number of genes. Like for gene expression data, we want to visualize these data in a lower dimension space.
When visualizing RNA velocity on reduced dimensions, should the cell embeddings be from the spliced matrix or the unspliced matrix or the sum of both? In my opinion, it makes the most sense to plot RNA velocity over cell embeddings from the spliced matrix. The arrows in RNA velocity visualization stand for where the cell is predicted to be going in the near future. Where does the cell go from? The current state. And the current state is represented by the spliced matrix, while the unspliced matrix represents what is soon to come. Thus all the dimension reduction here will be computed from the spliced matrix.
We can use the DefaultAssay()
function to define the assay we will be working on in Seurat
. The RunPCA()
and ElbowPlot()
function performe an SVD decomposition and plot the variance explained by the first axis.
seu <- ScaleData(seu)
seu <- RunPCA(seu, verbose = FALSE, npcs = 70)
ElbowPlot(seu, ndims = 70)
In Seurat
when the dimension reduction is computed you can plot the results, with the function DimPlot()
.
DimPlot(
seu, reduction = "pca", group.by = "cell_type",
pt.size = 0.5, label = TRUE, repel = TRUE
) +
scale_color_brewer(type = "qual", palette = "Set2")
The DimHeatmap()
function allows for easy exploration of the primary sources of heterogeneity in a dataset, and can be useful when trying to decide which PCs to include for further downstream analyses. Both cells and features are ordered according to their PCA scores. Setting cells
to a number plots the ‘extreme’ cells on both ends of the spectrum, which dramatically speeds plotting for large datasets. Though clearly a supervised analysis, we find this to be a valuable tool for exploring correlated feature sets.
DimHeatmap(seu, dims = 1, cells = 500, balanced = TRUE)
We can also use non-linear dimension reduction technique like \(t\)-SNE and UMAP in Seurat
. Try the RunTSNE()
and RunUMAP()
function.
seu <- RunTSNE(seu, dims = 1:50, verbose = FALSE)
DimPlot(seu, reduction = "tsne",
group.by = "cell_type", pt.size = 0.5, label = TRUE, repel = TRUE) +
scale_color_brewer(type = "qual", palette = "Set2")
seu <- RunUMAP(seu, dims = 1:50, umap.method = "uwot")
DimPlot(seu, reduction = "umap",
group.by = "cell_type", pt.size = 0.5, label = TRUE, repel = TRUE) +
scale_color_brewer(type = "qual", palette = "Set2")
As expected, one end of the plot has mostly stem cells, and the other end has mostly neurons. Clustering should petition the big blob of NPCs that SingleR
could not further partition due to limitations in the SingleR
reference for mouse brains. Use the FindNeighbors()
and FindClusters()
function for the clustering.
seu <- FindNeighbors(seu, verbose = FALSE) %>%
FindClusters(resolution = 1, verbose = FALSE) # Louvain
DimPlot(seu, pt.size = 0.5, reduction = "umap", label = TRUE)
Instead of using the old R package cytoVelo
we are going to use the recent scvelo
tools which perform a better estimatation of the transcription and degradation rate. The one small problem we will face is that scvelo
is only available as a python package.
scvelo
can read data with in the form of AnnotData
class from .loom file. The LoomExperiment
R package makes the SingleCellExperiment
to loom
conversion possible. You can also use the as.SingleCellExperiment()
to convert a SeuratObject
into a SingleCellExperiment
.
if(file.exists("results/neuron10k_velocity/sce_annot.Rdata")){
load(file = "results/neuron10k_velocity/sce_annot.Rdata",
verbose = T)
}else{
save(sce, file = "results/neuron10k_velocity/sce_annot.Rdata")
}
## Loading objects:
## sce
load(file = "results/neuron10k_velocity/sce_annot.Rdata", verbose = T)
## Loading objects:
## sce
sce_clean <- SingleCellExperiment(
assays = list(
logcounts = assay(sce, "logcounts"),
spliced = assay(sce, "spliced"),
unspliced = assay(sce, "unspliced")
),
colData = colData(sce),
rowData = rowData(sce)
)
scle <- LoomExperiment::LoomExperiment(sce_clean)
file.remove("results/neuron10k_velocity/sce_annot.loom")
## [1] TRUE
LoomExperiment::export(scle, "results/neuron10k_velocity/sce_annot.loom")
Then we can launch a python3
console within our singularity container and load scvelo
and the data: We can check that the adata
variable contains the spliced
and unspliced
assay.
import scvelo as scv
adata = scv.read("results/neuron10k_velocity/sce_annot.loom")
adata.col_names = adata.obs["colnames"]
adata.var_names = adata.var["rownames"]
adata
## AnnData object with n_obs × n_vars = 9793 × 11948
## obs: 'cell_type', 'colnames', 'sizeFactor'
## var: 'rownames'
## obsm: 'colnames_factor'
## layers: 'matrix', 'spliced', 'unspliced'
scv.utils.show_proportions(adata)
## Abundance of ['spliced', 'unspliced']: [0.54 0.46]
With scVelo
Preprocessing that is necessary consists of :
Filtering and normalization is applied in the same vein to spliced/unspliced counts and X. Logarithmizing is only applied to X. If X is already preprocessed from former analysis, it won’t touch it.
scv.pp.filter_genes(adata, min_shared_counts=10)
## Filtered out 2214 genes that are detected 10 counts (shared).
scv.pp.normalize_per_cell(adata)
## Normalized count data: X, spliced, unspliced.
scv.pp.filter_genes_dispersion(adata, n_top_genes=3000)
## Extracted 3000 highly variable genes.
scv.pp.log1p(adata)
Further, we need the first and second order moments (basically mean and uncentered variance) computed among nearest neighbors in PCA space. First order is needed for deterministic velocity estimation, while stochastic estimation also requires second order moments.
scv.pp.moments(adata, n_pcs=30, n_neighbors=30)
## computing neighbors
## finished (0:00:18) --> added
## 'distances' and 'connectivities', weighted adjacency matrices (adata.obsp)
## computing moments based on connectivities
## finished (0:00:03) --> added
## 'Ms' and 'Mu', moments of un/spliced abundances (adata.layers)
The gene-specific velocities are obtained by fitting a ratio between precursor (unspliced) and mature (spliced) mRNA abundances that well explains the steady states (constant transcriptional state) and then computing how the observed abundances deviate from what is expected in steady state. (We will soon release a version that does not rely on the steady state assumption anymore).
scv.tl.velocity(adata)
## computing velocities
## finished (0:00:07) --> added
## 'velocity', velocity vectors for each individual cell (adata.layers)
This computes the (cosine) correlation of potential cell transitions with the velocity vector in high dimensional space. The resulting velocity graph has dimension 𝑛𝑜𝑏𝑠×𝑛𝑜𝑏𝑠
and summarizes the possible cell state changes (given by a transition from one cell to another) that are well explained through the velocity vectors. If you set `approx=Tru`` it is computed on a reduced PCA space with 50 components.
The velocity graph can then be converted to a transition matrix by applying a Gaussian kernel on the cosine correlation which assigns high probabilities to cell state changes that correlate with the velocity vector. You can access the Markov transition matrix via scv.tl.transition_matrix
. The resulting transition matrix can be used for a variety of applications shown hereinafter. For instance, it is used to place the velocities into a low-dimensional embedding by simply applying the mean transition with respect to the transition probabilities, i.e. scv.tl.velocity_embeddin
. Further, we can trace cells back along the Markov chain to their origins and potential fates, thus obtaining root cells and end points within a trajectory; via scv.tl.terminal_states
scv.tl.velocity_graph(adata)
## computing velocity graph
##
... 4%
... 8%
... 12%
... 17%
... 21%
... 25%
... 29%
... 33%
... 37%
... 40%
... 43%
... 47%
... 50%
... 53%
... 57%
... 60%
... 64%
... 67%
... 70%
... 74%
... 77%
... 80%
... 83%
... 87%
... 90%
... 94%
... 97%
... 100%
finished (0:01:23) --> added
## 'velocity_graph', sparse matrix with cosine correlations (adata.uns)
scv.pl.velocity_graph(adata, color = "cell_type")
Finally, the velocities are projected onto any embedding specified in basis and visualized in one of three available ways: on single cell level, on grid level, or as streamplot as shown here.
For this we are going to use an umap
projection
scv.tl.umap(adata)
scv.pl.velocity_embedding_stream(adata, basis = "umap", color = "cell_type")
## computing velocity embedding
## finished (0:00:03) --> added
## 'velocity_umap', embedded velocity vectors (adata.obsm)
scv.pl.velocity_embedding(adata, basis='umap', arrow_length=2, arrow_size=2, dpi=150)
scv.pl.velocity_embedding_grid(adata, color='cell_type',
layer=['velocity', 'spliced'], arrow_size=1.5)
scv.tl.rank_velocity_genes(adata, groupby='cell_type')
## ranking velocity genes
## finished (0:00:07) --> added
## 'rank_velocity_genes', sorted scores by group ids (adata.uns)
## 'spearmans_score', spearmans correlation scores (adata.var)
scv.DataFrame(adata.uns['rank_velocity_genes']['names']).head()
## Astrocytes Ependymal NPCs Neurons OPCs Oligodendrocytes aNSCs qNSCs
## 0 ENSMUSG00000055737 ENSMUSG00000029212 ENSMUSG00000069769 ENSMUSG00000004113 ENSMUSG00000028634 ENSMUSG00000054640 ENSMUSG00000042751 ENSMUSG00000026235
## 1 ENSMUSG00000018849 ENSMUSG00000024812 ENSMUSG00000027210 ENSMUSG00000056222 ENSMUSG00000027238 ENSMUSG00000060961 ENSMUSG00000025666 ENSMUSG00000062991
## 2 ENSMUSG00000056870 ENSMUSG00000044167 ENSMUSG00000028565 ENSMUSG00000055022 ENSMUSG00000061080 ENSMUSG00000020181 ENSMUSG00000027238 ENSMUSG00000072294
## 3 ENSMUSG00000060924 ENSMUSG00000022708 ENSMUSG00000027993 ENSMUSG00000021068 ENSMUSG00000044117 ENSMUSG00000045038 ENSMUSG00000038048 ENSMUSG00000008575
## 4 ENSMUSG00000028403 ENSMUSG00000052942 ENSMUSG00000066406 ENSMUSG00000021268 ENSMUSG00000052852 ENSMUSG00000032826 ENSMUSG00000003279 ENSMUSG00000055761
var_names = ["ENSMUSG00000055737", "ENSMUSG00000004113"]
scv.pl.velocity(adata, var_names=var_names, colorbar=True, ncols=1)
If you want to learn more about scRNASeq analysis try to keep up with the bibliography on the subject !