Commit a7fc6a0

Add new method: CellMapper (#31)
* Add basic CellMapper implementation
* Add more parameters and hvg computation
* Add cellmapper to workflow files (clean)
* Update Changelog
* Add normalization method choice
* Fix typos and improve description
* Add me as contributor
1 parent 01e02de commit a7fc6a0

6 files changed

Lines changed: 144 additions & 0 deletions

CHANGELOG.md

Lines changed: 1 addition & 0 deletions
@@ -1,5 +1,6 @@
 # denoising devel
 
+* Add new method: CellMapper (PR #31)
 * Update scPRINT, including a new default model (PR #26)
 
 # denoising v1.0.0

_viash.yaml

Lines changed: 5 additions & 0 deletions
@@ -75,6 +75,11 @@ authors:
     info:
       github: jkobject
       orcid: "0000-0002-2818-9728"
+  - name: Marius Lange
+    roles: [ contributor ]
+    info:
+      github: marius1311
+      orcid: 0000-0002-4846-1266
 repositories:
   - name: core
     type: github
Lines changed: 61 additions & 0 deletions
@@ -0,0 +1,61 @@
+__merge__: ../../api/comp_method.yaml
+name: cellmapper
+label: CellMapper
+summary: "CellMapper is a general framework for k-NN based mapping tasks in single-cell and spatial genomics"
+description: |
+  k-NN-based mapping of cells across representations to transfer labels, embeddings and expression values.
+  Works for millions of cells, on CPU and GPU, across molecular modalities, between spatial and non-spatial data,
+  for arbitrary query and reference datasets. We treat data denoising as self-mapping, where the query and reference datasets are the same.
+  Based on some joint representation (here: PCA), CellMapper computes a k-NN graph and applies a kernel function to compute edge weights. Here,
+  we use a kernel based on Jaccard similarity (as e.g. in HNOCA-tools) or UMAP's `fuzzy_simplicial_set` (as in scanpy). Given the row-normalized
+  weighted adjacency matrix, we simulate a t-step random walk to smooth the data, similar to MAGIC. For large t-values, we also provide a spectral
+  approximation of the application of the t-step transition matrix to the data (not used here).
+
+references:
+  doi:
+    - 10.5281/zenodo.15683594
+links:
+  documentation: https://cellmapper.readthedocs.io/en/latest/
+  repository: https://github.com/quadbio/cellmapper
+info:
+  preferred_normalization: counts
+  variants:
+    cellmapper_jaccard_log:
+      kernel_method: jaccard
+      norm: log
+    cellmapper_umap_log:
+      kernel_method: umap
+      norm: log
+    cellmapper_umap_sqrt:
+      kernel_method: umap
+      norm: sqrt
+arguments:
+  - name: "--kernel_method"
+    type: "string"
+    choices: ["jaccard", "umap"]
+    default: "umap"
+    description: Kernel function to compute k-NN edge weights.
+  - name: "--t"
+    type: "integer"
+    default: 3
+    description: Number of steps for data smoothing.
+  - name: "--norm"
+    type: string
+    choices: ["sqrt", "log"]
+    default: "log"
+    description: Normalization method
+resources:
+  - type: python_script
+    path: script.py
+engines:
+  - type: docker
+    image: openproblems/base_python:1
+    setup:
+      - type: python
+        packages:
+          - cellmapper>=0.2.1
+runners:
+  - type: executable
+  - type: nextflow
+    directives:
+      label: [midtime,midmem,midcpu]
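
The `description` field above outlines the denoising idea: build a k-NN graph on a joint representation, weight its edges with a kernel, row-normalize, and apply the resulting transition matrix t times. The following is a minimal NumPy sketch of that idea only, assuming brute-force Euclidean k-NN and a Jaccard kernel on neighbor sets. The helper names `knn_indices`, `jaccard_transition` and `smooth` are hypothetical and this is not CellMapper's implementation, which may differ in every detail.

```python
import numpy as np


def knn_indices(rep: np.ndarray, k: int) -> np.ndarray:
    """Brute-force k-NN on a representation (e.g. PCA); each cell counts itself."""
    sq = np.sum(rep**2, axis=1)
    dist = sq[:, None] + sq[None, :] - 2.0 * rep @ rep.T  # squared Euclidean
    return np.argsort(dist, axis=1, kind="stable")[:, :k]


def jaccard_transition(idx: np.ndarray) -> np.ndarray:
    """Row-normalized Jaccard weights restricted to k-NN edges (assumed kernel)."""
    n, k = idx.shape
    member = np.zeros((n, n))
    member[np.arange(n)[:, None], idx] = 1.0  # member[i, j] = 1 if j is in kNN(i)
    shared = member @ member.T                # size of the neighbor-set intersection
    jaccard = shared / (2 * k - shared)       # intersection over union
    weights = jaccard * member                # keep weights on k-NN edges only
    return weights / weights.sum(axis=1, keepdims=True)


def smooth(X: np.ndarray, transition: np.ndarray, t: int) -> np.ndarray:
    """Apply the transition matrix t times, i.e. a t-step random walk on the graph."""
    out = np.asarray(X, dtype=float)
    for _ in range(t):
        out = transition @ out
    return out


# Toy usage: 30 cells, 5-dimensional representation, 4 genes.
rng = np.random.default_rng(0)
rep = rng.normal(size=(30, 5))
counts = rng.poisson(2.0, size=(30, 4))
T = jaccard_transition(knn_indices(rep, k=5))
denoised = smooth(counts, T, t=3)
```

Because every row of the transition matrix sums to one, each smoothed value is a weighted average over a cell's graph neighborhood; larger `t` averages over a wider neighborhood.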

src/methods/cellmapper/script.py

Lines changed: 75 additions & 0 deletions
@@ -0,0 +1,75 @@
+import anndata as ad
+import numpy as np
+import scanpy as sc
+from scipy.sparse import issparse
+import cellmapper as cm
+
+## VIASH START
+# Note: this section is auto-generated by viash at runtime. To edit it, make changes
+# in config.vsh.yaml and then run `viash config inject config.vsh.yaml`.
+par = {
+    'input_train': 'resources_test/task_denoising/cxg_immune_cell_atlas/train.h5ad',
+    'output': 'output_cellmapper.h5ad',
+    'kernel_method': 'umap',
+    'norm': 'log',
+    't': 3
+}
+meta = {
+    'name': 'cellmapper'
+}
+## VIASH END
+
+print(f'CellMapper version: {cm.__version__}', flush=True)
+
+print('Reading input files', flush=True)
+input_train = ad.read_h5ad(par['input_train'])
+
+print('Prepare the AnnData object', flush=True)
+
+# Let's make sure we have counts in .X
+input_train.X = input_train.layers["counts"].copy()
+
+print('Preprocess the data', flush=True)
+sc.pp.normalize_total(input_train, target_sum=1e4)
+
+if par['norm'] == 'sqrt':
+    # Safe square root for both sparse and dense matrices
+    if issparse(input_train.X):
+        input_train.X.data = np.sqrt(input_train.X.data)
+    else:
+        input_train.X = np.sqrt(input_train.X)
+elif par['norm'] == 'log':
+    sc.pp.log1p(input_train)
+else:
+    raise ValueError(f"Unknown normalization method: {par['norm']}")
+
+sc.pp.highly_variable_genes(input_train)
+sc.pp.pca(input_train)
+
+print('Setup and prepare CellMapper', flush=True)
+
+# Initialize CellMapper with the AnnData object, compute k-NN graph and mapping matrix
+cmap = cm.CellMapper(input_train)
+cmap.compute_neighbors(use_rep="X_pca")
+cmap.compute_mapping_matrix(kernel_method=par['kernel_method'])
+
+print('Run data smoothing', flush=True)
+
+# run t-step smoothing and write back to input
+cmap.map_layers(key="counts", t=par['t'])
+
+print("Write output AnnData to file", flush=True)
+
+# Create output AnnData object without X to avoid encoding issues
+output = ad.AnnData(
+    obs=input_train.obs[[]],
+    var=input_train.var[[]],
+    uns={
+        "dataset_id": input_train.uns["dataset_id"],
+        "method_id": meta["name"]
+    }
+)
+# Set the denoised layer directly from the imputed data
+output.layers["denoised"] = cmap.query_imputed.X
+
+output.write_h5ad(par['output'], compression='gzip')
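
The preprocessing step in the script scales each cell to a fixed total and then applies either a log1p or a square-root transform, taking care to keep sparse matrices sparse. A minimal sketch of that step with NumPy and SciPy only might look as follows. `normalize_counts` is a hypothetical helper, not part of scanpy or CellMapper, and leaving all-zero cells at zero is an assumption of this sketch rather than a statement about `sc.pp.normalize_total`.

```python
import numpy as np
from scipy.sparse import csr_matrix, diags, issparse


def normalize_counts(X, norm="log", target_sum=1e4):
    """Scale every row (cell) to `target_sum`, then apply log1p or sqrt."""
    if norm not in ("log", "sqrt"):
        raise ValueError(f"Unknown normalization method: {norm}")
    transform = np.log1p if norm == "log" else np.sqrt

    totals = np.asarray(X.sum(axis=1)).ravel().astype(float)
    # Assumption: cells with zero total counts get a scale of 0 and stay all-zero.
    scale = np.divide(target_sum, totals, out=np.zeros_like(totals), where=totals > 0)

    if issparse(X):
        out = csr_matrix(diags(scale) @ X.astype(float))
        # Both transforms map 0 to 0, so only the stored values need transforming.
        out.data = transform(out.data)
        return out
    return transform(np.asarray(X, dtype=float) * scale[:, None])


# Toy usage with a dense and a sparse copy of the same counts.
toy_counts = np.array([[1, 3, 0], [0, 0, 0], [2, 2, 4]])
dense_sqrt = normalize_counts(toy_counts, norm="sqrt")
sparse_log = normalize_counts(csr_matrix(toy_counts), norm="log")
```

Transforming only the stored values is what makes the sparse branch cheap: the zero pattern of the matrix is unchanged by either transform.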

src/workflows/run_benchmark/config.vsh.yaml

Lines changed: 1 addition & 0 deletions
@@ -72,6 +72,7 @@ dependencies:
   - name: control_methods/no_denoising
   - name: control_methods/perfect_denoising
   - name: methods/alra
+  - name: methods/cellmapper
   - name: methods/dca
   - name: methods/knn_smoothing
   - name: methods/magic

src/workflows/run_benchmark/main.nf

Lines changed: 1 addition & 0 deletions
@@ -18,6 +18,7 @@ workflow run_wf {
     no_denoising,
     perfect_denoising,
     alra,
+    cellmapper,
     dca,
     knn_smoothing,
     magic,
