How to convert between Seurat/SingleCellExperiment objects and Scanpy/AnnData objects using basic packages
1. From a Scanpy object to a Seurat object
The basic idea is to write the data to disk from one language and read it back in the other.
First, write the count matrix to disk from Python in Matrix Market format:
from scipy import io
save_dir='/home/daimin/result/single-cell/hcm/analysis'
# raw counts kept in the 'raw' layer; AnnData stores cells in rows and genes in columns
io.mmwrite(save_dir+'/counts.mtx',adata.layers['raw'])
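If the layer happens to be a dense NumPy array, scipy.io.mmwrite writes it in the Matrix Market "array" format, which stores every zero explicitly; R's readMM is, as far as I know, built for the sparse "coordinate" format. A minimal sketch, assuming a small dense array as a hypothetical stand-in for adata.layers['raw'] and a temporary directory instead of save_dir; write_counts_mtx is a hypothetical helper that converts to sparse before writing and then checks the file header with scipy.io.mminfo:

```python
import os
import tempfile

import numpy as np
from scipy import io, sparse

def write_counts_mtx(matrix, path):
    """Hypothetical helper: always write a sparse (coordinate) Matrix Market file.

    A dense NumPy array would otherwise be written in the 'array' format,
    which lists every entry, zeros included.
    """
    io.mmwrite(path, sparse.csr_matrix(matrix))

# Hypothetical stand-in for a dense adata.layers['raw']: 3 cells x 4 genes
dense_counts = np.array([[0, 3, 0, 1], [2, 0, 0, 0], [0, 0, 5, 0]])

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "counts.mtx")
    write_counts_mtx(dense_counts, path)
    # Header info: rows, columns, stored entries, format, field, symmetry
    n_rows, n_cols, nnz, fmt, field, symmetry = io.mminfo(path)
    roundtrip = io.mmread(path).toarray()
```

Only the four non-zero counts are stored, and the cells × genes orientation is preserved, which is what the R side relies on when it assigns barcodes as row names.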
Then save the cell and gene metadata, keeping the barcodes, gene names and UMAP coordinates as ordinary columns:
# cell metadata: obs plus the barcode and the 2-D UMAP embedding
cell_meta=adata.obs.copy()
cell_meta['Barcode']=cell_meta.index
cell_meta['UMAP1']=adata.obsm['X_umap'][:,0]
cell_meta['UMAP2']=adata.obsm['X_umap'][:,1]
# gene metadata: var plus the gene name
gene_meta=adata.var.copy()
gene_meta['GeneName']=gene_meta.index
cell_meta.to_csv(save_dir+'/counts_cellMeta.csv',index=False)
gene_meta.to_csv(save_dir+'/counts_geneMeta.csv',index=False)
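R attaches the barcodes and gene names purely by position when it sets the row and column names, so the metadata files must have exactly one row per matrix row and per matrix column. A minimal sketch of a sanity check, assuming the three file names used above; check_export is a hypothetical helper that compares the counts.mtx header (read with scipy.io.mminfo, without loading the matrix) against the two CSVs, demonstrated here on a toy export in a temporary directory:

```python
import os
import tempfile

import pandas as pd
from scipy import io, sparse

def check_export(save_dir):
    """Hypothetical helper: compare the counts.mtx header with the metadata CSVs.

    Returns True when the matrix has one row per cell and one column per gene.
    """
    n_rows, n_cols, _nnz, *_ = io.mminfo(os.path.join(save_dir, "counts.mtx"))
    n_cells = len(pd.read_csv(os.path.join(save_dir, "counts_cellMeta.csv")))
    n_genes = len(pd.read_csv(os.path.join(save_dir, "counts_geneMeta.csv")))
    return n_rows == n_cells and n_cols == n_genes

with tempfile.TemporaryDirectory() as tmp:
    # Toy export: 3 cells x 4 genes, written with the same file names as above
    toy = sparse.csr_matrix([[0, 1, 0, 2], [3, 0, 0, 0], [0, 0, 4, 0]])
    io.mmwrite(os.path.join(tmp, "counts.mtx"), toy)
    pd.DataFrame({"Barcode": ["c1", "c2", "c3"]}).to_csv(
        os.path.join(tmp, "counts_cellMeta.csv"), index=False)
    pd.DataFrame({"GeneName": ["g1", "g2", "g3", "g4"]}).to_csv(
        os.path.join(tmp, "counts_geneMeta.csv"), index=False)
    ok = check_export(tmp)
    # Simulate gene metadata taken from a different (subset) object
    pd.DataFrame({"GeneName": ["g1", "g2", "g3"]}).to_csv(
        os.path.join(tmp, "counts_geneMeta.csv"), index=False)
    ok_after_mismatch = check_export(tmp)
```

The second call fails, which is the situation where R would otherwise raise a length error or, worse, silently mislabel genes if the counts happened to line up.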
Next, we load the data from the disk into the R environment:
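library(Matrix)
# point save_dir at the directory used for save_dir in Python
save_dir<-'/home/daimin/result/single-cell/IN/Satb2_ko_CGE'
counts<-readMM(paste0(save_dir,'/counts.mtx'))
# readMM returns a triplet-form sparse matrix; convert to column-compressed form
counts<-as(counts,'CsparseMatrix')
dim(counts)
cellMeta<-read.csv(paste0(save_dir,'/counts_cellMeta.csv'))
head(cellMeta)
geneMeta<-read.csv(paste0(save_dir,'/counts_geneMeta.csv'))
dim(geneMeta)
head(geneMeta)
### Set the rownames and colnames (rows are cells, columns are genes, as in adata)
rownames(counts)<-cellMeta$Barcode
colnames(counts)<-geneMeta$GeneName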
Then, for example, you can construct a Seurat object. Seurat expects genes in rows and cells in columns, so the matrix is transposed:
library(Seurat)
seo <- CreateSeuratObject(counts = t(counts), project = "min", min.cells = 3, min.features = 200)
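The min.cells = 3 and min.features = 200 arguments drop lowly detected genes and cells while the object is created, so the Seurat object can end up smaller than the exported matrix. A minimal NumPy/SciPy sketch of the two thresholds on a genes × cells matrix; seurat_like_filter is a hypothetical helper, not a Seurat or Scanpy function, and the order (cells first, then genes counted on the remaining cells) is my reading of Seurat's object constructor and may differ between versions. In Scanpy the corresponding steps are sc.pp.filter_cells(adata, min_genes=200) and sc.pp.filter_genes(adata, min_cells=3).

```python
import numpy as np
from scipy import sparse

def seurat_like_filter(counts, min_cells=3, min_features=200):
    """Hypothetical helper mimicking the CreateSeuratObject thresholds.

    counts: genes x cells matrix (the orientation passed to Seurat).
    Cells are filtered first, then genes are counted on the kept cells
    (assumed order; check the Seurat version you use).
    """
    counts = sparse.csc_matrix(counts)
    # min.features: keep cells with at least this many detected genes
    genes_per_cell = np.asarray((counts > 0).sum(axis=0)).ravel()
    keep_cells = genes_per_cell >= min_features
    counts = counts[:, keep_cells]
    # min.cells: keep genes detected in at least this many remaining cells
    cells_per_gene = np.asarray((counts > 0).sum(axis=1)).ravel()
    keep_genes = cells_per_gene >= min_cells
    return counts[keep_genes, :], keep_cells, keep_genes

# Toy genes x cells matrix with small thresholds so the effect is visible
toy = np.array([
    [1, 0, 2],  # gene A: detected in cells 0 and 2
    [0, 0, 1],  # gene B: detected in cell 2 only
    [3, 1, 0],  # gene C: detected in cells 0 and 1
    [0, 0, 0],  # gene D: never detected
])
filtered, kept_cells, kept_genes = seurat_like_filter(toy, min_cells=2, min_features=2)
```

Cell 1 has a single detected gene and is removed; gene C is then left with only one detecting cell and is removed as well, so only gene A across cells 0 and 2 survives.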
2. From a sparse matrix in R to an AnnData object in Python
For example, suppose the data is a genes × cells sparse matrix called cpm (class dgCMatrix) that looks like this:
4 x 4 sparse Matrix of class "dgCMatrix"
SM-GE4R1_S041_E1-50 SM-GE4R1_S042_E1-50 SM-GE4R1_S043_E1-50 SM-GE4R1_S044_E1-50
0610005C13Rik . . . .
0610006L08Rik . . . .
0610007P14Rik 137.8368 138.3161 143.01993 70.96005
0610009B22Rik 109.6429 124.8218 73.06453 11.82667
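A dgCMatrix stores the matrix in compressed sparse column form: slot i holds the 0-based row indices of the non-zero values, p the column pointers (length ncol + 1), x the values and Dim the shape. SciPy's csc_matrix uses the same layout, which is why the genes × cells orientation carries over unchanged and has to be transposed for AnnData. A minimal sketch that rebuilds the first two columns of the example above from slot values copied by hand (hypothetical, not read from R):

```python
import numpy as np
from scipy import sparse

# Hand-copied slot contents for the first two columns of `cpm`
i = np.array([2, 3, 2, 3])            # 0-based row (gene) indices
p = np.array([0, 2, 4])               # column pointers: column j spans x[p[j]:p[j+1]]
x = np.array([137.8368, 109.6429, 138.3161, 124.8218])
dim = (4, 2)                          # genes x cells

cpm_block = sparse.csc_matrix((x, i, p), shape=dim)
dense = cpm_block.toarray()
# Cells in rows, genes in columns, as AnnData expects
cells_by_genes = cpm_block.T.tocsr()
```

The first two genes have no stored values, exactly like the "." entries in the R printout.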
Then save the matrix to disk with the writeMM function from the Matrix package:
writeMM(cpm,'/home/daimin/data/single_cell/mouse_brain_vmh/SMART-seq_VMH_cpm.mtx')
The saved mtx file looks like this:
%%MatrixMarket matrix coordinate real general
30862 4574 41234632
3 1 137.8368387747558
4 1 109.64293993446489
5 1 9.39796628009699
6 1 6.265310853397991
7 1 37.59186512038795
9 1 46.98983140048494
10 1 46.98983140048494
11 1 126.8725447813093
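In this coordinate format, the line after the %%MatrixMarket header gives the number of rows, columns and stored entries, and every following line is a 1-based row index, column index and value; zeros are not written at all. To illustrate the format only (for real files use scipy.io.mmread), here is a minimal reader; parse_coordinate_mtx is a hypothetical helper that handles just the "coordinate real general" case shown above and ignores the symmetric and pattern variants:

```python
from scipy import sparse

def parse_coordinate_mtx(text):
    """Illustrative reader for 'matrix coordinate real general' text.

    Hypothetical helper: skips comment lines, reads the size line, then
    converts each 1-based (row, col, value) triplet to 0-based indices.
    """
    lines = [ln for ln in text.strip().splitlines()
             if ln.strip() and not ln.startswith("%")]
    n_rows, n_cols, nnz = (int(v) for v in lines[0].split())
    rows, cols, vals = [], [], []
    for ln in lines[1:1 + nnz]:
        r, c, v = ln.split()
        rows.append(int(r) - 1)  # Matrix Market indices start at 1
        cols.append(int(c) - 1)
        vals.append(float(v))
    return sparse.coo_matrix((vals, (rows, cols)), shape=(n_rows, n_cols))

# A 4 genes x 2 cells file built from the example values above
example = """%%MatrixMarket matrix coordinate real general
4 2 4
3 1 137.8368
4 1 109.6429
3 2 138.3161
4 2 124.8218
"""
m = parse_coordinate_mtx(example).toarray()
```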
Then, save the barcodes:
barcodes<-data.frame(colnames(cpm))
colnames(barcodes)<-'Barcode'
write.csv(barcodes, '/home/daimin/data/single_cell/mouse_brain_vmh/SMART-seq_VMH_cpm_barcodes.csv',
quote = FALSE,row.names = FALSE)
The resulting file looks like this:
Barcode
SM-GE4R1_S041_E1-50
SM-GE4R1_S042_E1-50
SM-GE4R1_S043_E1-50
SM-GE4R1_S044_E1-50
SM-GE4R1_S045_E1-50
SM-GE4R1_S046_E1-50
SM-GE4R1_S047_E1-50
SM-GE4R1_S048_E1-50
SM-GE4R1_S049_E1-50
Save the gene names:
genes<-data.frame(rownames(cpm))
colnames(genes)<-'Gene'
write.csv(genes, '/home/daimin/data/single_cell/mouse_brain_vmh/SMART-seq_VMH_cpm_genes.csv',
quote = FALSE,row.names = FALSE)
The gene file looks like this:
Gene
0610005C13Rik
0610006L08Rik
0610007P14Rik
0610009B22Rik
0610009E02Rik
0610009L18Rik
0610009O20Rik
0610010B08Rik
0610010F05Rik
0610010K14Rik
0610011F06Rik
0610012G03Rik
0610025J13Rik
0610030E20Rik
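By default pandas.read_csv turns strings such as "NA" into missing values and may parse numeric-looking labels as numbers, and AnnData warns when var_names are not unique (its usual fix is adata.var_names_make_unique()). A minimal sketch, assuming one-column label files in the format shown above; read_labels is a hypothetical helper that reads the labels as plain strings and reports duplicates, demonstrated on made-up gene names passed as text (pass the file path instead of StringIO for real files):

```python
from io import StringIO

import pandas as pd

def read_labels(csv_source, column):
    """Hypothetical helper: read a one-column label CSV as plain strings.

    dtype=str and keep_default_na=False stop pandas from turning names such
    as "NA" into missing values or numeric-looking labels into numbers.
    """
    df = pd.read_csv(csv_source, dtype=str, keep_default_na=False)
    labels = df[column]
    duplicates = sorted(labels[labels.duplicated()].unique())
    return labels.to_numpy(), duplicates

# Made-up gene list with one duplicated name and one "NA" label
genes_csv = "Gene\nGeneA\nGeneB\nNA\nGeneA\n"
genes, dup = read_labels(StringIO(genes_csv), "Gene")
```

Here "NA" survives as a string and GeneA is reported as duplicated, which is worth knowing before assigning adata.var_names.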
Next, we load the data in Python:
import scanpy as sc
import pandas as pd
from scipy import io, sparse
counts=io.mmread('/home/daimin/data/single_cell/mouse_brain_vmh/SMART-seq_VMH_cpm.mtx')
# the file is genes x cells; transpose so cells are rows, as AnnData expects, and store as CSR
counts=sparse.csr_matrix(counts.T)
barcodes=pd.read_csv('/home/daimin/data/single_cell/mouse_brain_vmh/SMART-seq_VMH_cpm_barcodes.csv')
genes=pd.read_csv('/home/daimin/data/single_cell/mouse_brain_vmh/SMART-seq_VMH_cpm_genes.csv')
adata=sc.AnnData(counts)
adata.obs_names=barcodes['Barcode'].values
adata.var_names=genes['Gene'].values
Finally, check the resulting AnnData object:
adata
# AnnData object with n_obs × n_vars = 4574 × 30862
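The loading steps above can also be wrapped into one function that checks the dimensions before anything is assembled. A minimal sketch using only SciPy and pandas; load_r_export is a hypothetical name, it assumes the three files written from R (a genes × cells .mtx file plus one-column CSVs headed Barcode and Gene), and it returns the pieces for sc.AnnData rather than the AnnData object itself so that it runs without Scanpy installed. The usage part writes a toy export to a temporary directory:

```python
import os
import tempfile

import pandas as pd
from scipy import io, sparse

def load_r_export(mtx_path, barcodes_csv, genes_csv):
    """Hypothetical helper: read an R writeMM export as a cells x genes CSR matrix."""
    genes_by_cells = io.mmread(mtx_path)
    X = sparse.csr_matrix(genes_by_cells.T)  # AnnData expects cells in rows
    barcodes = pd.read_csv(barcodes_csv, dtype=str, keep_default_na=False)["Barcode"]
    genes = pd.read_csv(genes_csv, dtype=str, keep_default_na=False)["Gene"]
    if X.shape != (len(barcodes), len(genes)):
        raise ValueError(f"matrix {X.shape} does not match "
                         f"{len(barcodes)} barcodes x {len(genes)} genes")
    return X, barcodes.to_numpy(), genes.to_numpy()

with tempfile.TemporaryDirectory() as tmp:
    mtx = os.path.join(tmp, "cpm.mtx")
    # Toy export: 3 genes x 2 cells, as R's writeMM would produce it
    io.mmwrite(mtx, sparse.coo_matrix([[0.0, 1.5], [2.0, 0.0], [0.0, 3.0]]))
    b_csv = os.path.join(tmp, "barcodes.csv")
    g_csv = os.path.join(tmp, "genes.csv")
    pd.DataFrame({"Barcode": ["cell1", "cell2"]}).to_csv(b_csv, index=False)
    pd.DataFrame({"Gene": ["g1", "g2", "g3"]}).to_csv(g_csv, index=False)
    X, obs_names, var_names = load_r_export(mtx, b_csv, g_csv)
```

With Scanpy available, the result plugs into the same three lines as before: adata=sc.AnnData(X), then adata.obs_names=obs_names and adata.var_names=var_names.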