Skip to content

An automated cell type annotation algorithm for unmatched spatial transcriptomics data

Notifications You must be signed in to change notification settings

BillyChen123/SANNO

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 

History

4 Commits
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 

Repository files navigation

SANNO

The official implementation for "SANNO".

Table of Contents


Datasets

We provide preprocessed datasets for easy reproduction.

Download datasets from: Dataset Link


Installation

To use {Project Name}, follow these steps:

  1. Create a conda environment:
    conda create -n {SANNO} python=3.7
    conda activate {SANNO}
    
  2. Install dependencies:
    pip install -r requirements.txt
    # or just pip install SANNO
    pip install SANNO
    
  3. Install PYG and Pytorch according to the CUDA version, take torch-1.13.1+cu117 (Ubuntu 20.04.4 LTS) as an example:
    conda install pytorch==1.13.1 torchvision==0.14.1 torchaudio==0.13.1 pytorch-cuda=11.7 -c pytorch -c nvidia
    pip install torch_geometric==2.3.0 # must be this version
    pip install pyg_lib torch_scatter torch_sparse torch_cluster torch_spline_conv -f https://data.pyg.org/whl/torch-1.13.1+cu117.html
    

Usage

Data Preprocessing

In order to run SANNO, we need to first create anndata from the raw data.

We require two types of datasets for this project: reference data and query data. Both datasets should be provided in .h5ad format, with cells stored in obs and genes/features stored in var.

Reference Data

  • Format: .h5ad
  • Content:
    • obs: Cell metadata, including a mandatory cell_type column indicating the true cell type labels.
    • var: Gene/feature metadata.
    • obsm: Spatial coordinates stored under the key pos, representing the relative positions of cells as a 2D numpy array (n_cells x 2).

Query Data

  • Format: .h5ad
  • Content:
    • obs: Cell metadata (cell type labels are not required).
    • var: Gene/feature metadata.
    • obsm: Spatial coordinates stored under the key pos, representing the relative positions of cells as a 2D numpy array (n_cells x 2).

Cell Type Annotation

The processed data are used as input to SANNO and a reference genome is provided to extract the embedding and anootation incorporating reference Spatial Transcriptomics information:

cd SANNO/SANNO

python main_xy_adj.py   --gpu_index 3 # GPU index
                        --type st2st \ # project type
                        --dataset Project name \ # project name
                        --train_dataset path/to/train_adata.h5ad \ # reference data
                        --test_dataset path/to/test_adata.h5ad \ # query data
                        --log log \ # log path

The project type must be selected based on the nature of the reference and query datasets. The following modes are supported:

  • st2st – For cases where both the reference and query datasets are spatial transcriptomics.
  • st2sc – For cases where the reference dataset is spatial transcriptomics, and the query dataset is single-cell transcriptomics.
  • sc2sc – For cases where both the reference and query datasets are single-cell transcriptomics.

Running the above command will generate three output files in the output path:

  • acc.csv: Contains the overall accuracy of the query data and SANNO predictions.
  • embedding.h5ad: An AnnData file storing the embeddings extracted by SANNO.
  • Reports: A set of logs recorded during the training process.

Tutorial

Tutorial 1: Cell annotations within samples (HubMap CL A & HubMap CL B)

  1. Install the required environment according to Installation.
  2. Download the datasets from HubMap CL.
  3. Preprocess the datasets according to the Data Preprocessing standards.
  4. For more detailed information, run the tutorial HubMap_CL_intra.ipynb for how to do data preprocessing and training.

Tutorial 1: Cell annotations cross samples (Tonsil & BE)

  1. Install the required environment according to Installation.
  2. Download the datasets from Tonsil_BE.
  3. Preprocess the datasets according to the Data Preprocessing standards.
  4. For more detailed information, run the tutorial HubMap_CL_intra.ipynb for how to do data preprocessing and training.

Citation

If you use SANNO in your research, please cite:

@article{
  title={{SANNO: A Graph-Transformer Enhanced Optimal Transport Tool for Spatial Transcriptomic Annotation }},
  author={Yuansong Zeng, Yuanze Chen, Ningyuan Shangguan, Wenbin Li, Hongyu Zhang, Zheng Wang, Huiying Zhao}
}

About

An automated cell type annotation algorithm for unmatched spatial transcriptomics data

Resources

Stars

Watchers

Forks

Releases

No releases published

Packages

No packages published