The official implementation for "SANNO".
We provide preprocessed datasets for easy reproduction.
Download datasets from: Dataset Link
To use {Project Name}, follow these steps:
- Create a conda environment:
conda create -n {SANNO} python=3.7 conda activate {SANNO} - Install dependencies:
pip install -r requirements.txt # or just pip install SANNO pip install SANNO - Install PYG and Pytorch according to the CUDA version, take torch-1.13.1+cu117 (Ubuntu 20.04.4 LTS) as an example:
conda install pytorch==1.13.1 torchvision==0.14.1 torchaudio==0.13.1 pytorch-cuda=11.7 -c pytorch -c nvidia pip install torch_geometric==2.3.0 # must be this version pip install pyg_lib torch_scatter torch_sparse torch_cluster torch_spline_conv -f https://data.pyg.org/whl/torch-1.13.1+cu117.html
In order to run SANNO, we need to first create anndata from the raw data.
We require two types of datasets for this project: reference data and query data. Both datasets should be provided in .h5ad format, with cells stored in obs and genes/features stored in var.
- Format:
.h5ad - Content:
obs: Cell metadata, including a mandatorycell_typecolumn indicating the true cell type labels.var: Gene/feature metadata.obsm: Spatial coordinates stored under the keypos, representing the relative positions of cells as a 2D numpy array (n_cells x 2).
- Format:
.h5ad - Content:
obs: Cell metadata (cell type labels are not required).var: Gene/feature metadata.obsm: Spatial coordinates stored under the keypos, representing the relative positions of cells as a 2D numpy array (n_cells x 2).
The processed data are used as input to SANNO and a reference genome is provided to extract the embedding and anootation incorporating reference Spatial Transcriptomics information:
cd SANNO/SANNO
python main_xy_adj.py --gpu_index 3 # GPU index
--type st2st \ # project type
--dataset Project name \ # project name
--train_dataset path/to/train_adata.h5ad \ # reference data
--test_dataset path/to/test_adata.h5ad \ # query data
--log log \ # log path
The project type must be selected based on the nature of the reference and query datasets. The following modes are supported:
st2st– For cases where both the reference and query datasets are spatial transcriptomics.st2sc– For cases where the reference dataset is spatial transcriptomics, and the query dataset is single-cell transcriptomics.sc2sc– For cases where both the reference and query datasets are single-cell transcriptomics.
Running the above command will generate three output files in the output path:
acc.csv: Contains the overall accuracy of the query data and SANNO predictions.embedding.h5ad: An AnnData file storing the embeddings extracted by SANNO.Reports: A set of logs recorded during the training process.
- Install the required environment according to Installation.
- Download the datasets from HubMap CL.
- Preprocess the datasets according to the Data Preprocessing standards.
- For more detailed information, run the tutorial HubMap_CL_intra.ipynb for how to do data preprocessing and training.
- Install the required environment according to Installation.
- Download the datasets from Tonsil_BE.
- Preprocess the datasets according to the Data Preprocessing standards.
- For more detailed information, run the tutorial HubMap_CL_intra.ipynb for how to do data preprocessing and training.
If you use SANNO in your research, please cite:
@article{
title={{SANNO: A Graph-Transformer Enhanced Optimal Transport Tool for Spatial Transcriptomic Annotation }},
author={Yuansong Zeng, Yuanze Chen, Ningyuan Shangguan, Wenbin Li, Hongyu Zhang, Zheng Wang, Huiying Zhao}
}