The data consists of 20K neurons, downsampled from the 1.3 Million Brain Cells from E18 Mice dataset, and is freely available from 10x Genomics (here).
In [1]:
import numpy as np
import pandas as pd
import scanpy as sc
sc.settings.verbosity = 3 # verbosity: errors (0), warnings (1), info (2), hints (3)
sc.settings.set_figure_params(dpi=70) # dots (pixels) per inch determine size of inline figures
sc.logging.print_versions()
In [2]:
adata = sc.read_10x_h5('./data/1M_neurons_neuron20k.h5')
In [3]:
adata.var_names_make_unique()
In [4]:
adata
Out[4]:
Run the standard preprocessing recipe of Zheng et al. (2017); see here.
In [5]:
sc.pp.recipe_zheng17(adata)
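For reference, this recipe roughly amounts to the explicit steps below; this is a sketch with the recipe's default parameters, not a verbatim copy of its implementation, and is not meant to be run in addition to the cell above.

# rough sketch of what sc.pp.recipe_zheng17 does with default parameters
sc.pp.filter_genes(adata, min_counts=1)                        # drop genes without any counts
sc.pp.normalize_per_cell(adata, key_n_counts='n_counts_all')   # library-size normalization
filter_result = sc.pp.filter_genes_dispersion(                 # select ~1000 highly variable genes
    adata.X, flavor='cell_ranger', n_top_genes=1000, log=False)
adata = adata[:, filter_result.gene_subset]                    # subset to the selected genes
sc.pp.normalize_per_cell(adata)                                # renormalize after subsetting
sc.pp.log1p(adata)                                             # log-transform
sc.pp.scale(adata)                                             # scale to zero mean, unit variance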
In [6]:
sc.tl.pca(adata)
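If you prefer to make the number of components and the solver explicit, a call like the following should be equivalent; the values are scanpy's defaults, written out here only to illustrate the parameters.

sc.tl.pca(adata, n_comps=50, svd_solver='arpack')  # explicit defaults for reproducibility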
In [7]:
sc.pp.neighbors(adata)
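The defaults are fine here; if you want to tune the graph, the neighborhood size and the number of PCs can be set explicitly, for instance (values are assumptions, not taken from the original run):

sc.pp.neighbors(adata, n_neighbors=15, n_pcs=50)  # kNN graph on the first 50 PCs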
In [8]:
sc.tl.umap(adata)
In [9]:
sc.tl.louvain(adata)
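To inspect the clustering on the UMAP embedding before moving on to PAGA, a plot like the following works (the resolution of sc.tl.louvain was left at its default above):

sc.pl.umap(adata, color=['louvain'], legend_loc='on data')  # show Louvain clusters on the UMAP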
In [10]:
sc.tl.paga(adata)
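The abstracted graph can also be plotted on its own; the threshold argument (the same value as in paga_compare below) prunes weak connectivities:

sc.pl.paga(adata, threshold=0.05)  # coarse-grained PAGA graph only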
In [19]:
sc.pl.paga_compare(adata, edges=True, threshold=0.05)
Now compare this with the reference clustering of the PAGA preprint (Suppl. Fig. 12), available from here.
In [12]:
anno = pd.read_csv('/Users/alexwolf/Dropbox/1M/louvain.csv.gz', compression='gzip', header=None, index_col=0)
In [13]:
anno.columns = ['louvain_ref']
In [14]:
adata.obs['louvain_ref'] = anno.loc[adata.obs.index]['louvain_ref'].astype(str)
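A quick way to see how the subsample's Louvain clusters relate to the reference labels is a contingency table; this is a sketch, not part of the original notebook:

pd.crosstab(adata.obs['louvain'], adata.obs['louvain_ref'])  # clusters vs. reference labels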
In [15]:
sc.pl.umap(adata, color=['louvain_ref'], legend_loc='on data')
In [16]:
adata.write('./write/subsampled.h5ad')
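The written file can be read back later with sc.read, using the same path as above:

adata = sc.read('./write/subsampled.h5ad')  # reload the processed AnnData object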