annotation

This routine simplifies running the CellTypist automated annotation method. The process is written to allow for either stand-alone execution or integration as a module with the coreSC Seurat workflow.

While CellTypist can take both generic count matrices (Seurat) and AnnData objects, this approach has been written exclusively for Seurat input files. Since raw counts are used as inputs, the input does not need to be a fully processed object. Still, it is advised that any data set be carried through a complete pre-processing workflow for the sake of comprehensive validation.

Workflow

read RDS input object
extract raw counts as a sparse matrix
convert sparse matrix to full counts matrix
run CellTypist with majority voting
assign predicted_labels.csv values as metadata of input object
run FindAllMarkers() using predicted_labels as Idents()

Setup

Dependencies

This method uses the Singularity container execution framework.

First Time

# clone the repo
git clone https://github.com/ChoBioLab/annotation.git

Config

Select a trained model [REQUIRED]
- https://www.celltypist.org/models (preloaded) | https://www.celltypist.org/organs (available as custom models)
  - Gut: Cells_Intestinal_Tract, Adult_Human_Intestine (combined)
  - Immune: Immune_All_High, Immune_All_Low
  - Liver: Adult_Human_Liver (combined)
- Determine the appropriate model for classification. Selecting a model that is not the correct fit will generate an annotation that is effectively worthless!
- The appropriate model:
  - Needs to have been trained using CellTypist's classification method
  - Will be a close match to your query data in terms of cell profile (e.g. organ system, disease state, etc.)
  - Should have been developed with a comprehensive, diverse, well-annotated training set
- Training custom models is an option but should only be undertaken with a complete knowledge of all factors involved.
Confirm parallel memory use with future [OPTIONAL]
- The -w WORKERS and -r RAM args set the number of threads and the RAM per thread (in GB). Each individual task needs enough RAM to complete its work. In addition, WORKERS * RAM gives the total memory allocation, which must stay under the system RAM available to the job as a whole. If either of these conditions is not met, the run will fail!
- The default values are 8 * 8 (8 workers and 8GB RAM/worker, 64GB in total).
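
The WORKERS * RAM budget can be checked before launching a run. The following is a minimal sketch only: `fits_in_memory` is a hypothetical helper that is not part of this repo, and the available memory is passed in by hand as a whole number of GB.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of the repo's run script).
# Succeeds only if WORKERS * RAM fits within the memory available to the job.
# All three arguments are whole numbers; RAM values are in GB.
# Assumption: on Linux, the available memory could be read with something like
#   awk '/MemAvailable/ {print int($2 / 1048576)}' /proc/meminfo
fits_in_memory() {
  local workers=$1 ram_per_worker=$2 available_gb=$3
  local requested=$(( workers * ram_per_worker ))
  echo "requested ${requested}GB of ${available_gb}GB"
  [ "$requested" -le "$available_gb" ]
}
```

For example, `fits_in_memory 8 8 64` succeeds, while `fits_in_memory 8 8 32` fails because the defaults request 64GB in total.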

Usage

Execution can be carried out with the run script and the appropriate args.
If you are getting an error that suggests the model isn't available, it's likely the model isn't preloaded and needs to be supplied locally.

./run

-i INPUT [NULL]
-m MODEL [NULL]
-w WORKERS [8]
-r RAM [8]

# example

# Preloaded models can be called by name (available models here https://www.celltypist.org/models)
./run -i /path/to/pbmc.RDS -m Immune_All_High

# Custom models can be used from a local file (some models in the CellTypist organ atlas are custom)
./run -i /path/to/pbmc.RDS -m /path/to/Adult_Human_Intestine.pkl
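
When a run reports that the model isn't available, it can help to check how the -m value is likely to be interpreted. The sketch below is an illustration under assumptions, not the logic of the run script: `classify_model_arg` is a hypothetical helper that treats anything containing a `/` or ending in `.pkl` as a local file, and anything else as a preloaded model name.

```shell
#!/usr/bin/env bash
# Hypothetical pre-flight check (not part of the repo's run script).
# Classifies a -m value as a local model file or a preloaded model name.
classify_model_arg() {
  local model=$1
  case "$model" in
    */*|*.pkl)
      # Looks like a path: the file must exist to be usable as a custom model.
      if [ -f "$model" ]; then
        echo "local"
      else
        echo "missing"
        return 1
      fi
      ;;
    *)
      # Bare name: assumed to refer to a preloaded CellTypist model.
      echo "preloaded"
      ;;
  esac
}
```

A result of `missing` means the path was mistyped or the model file still needs to be downloaded and supplied locally.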

Output

output_2023-06-23_13.19.45/
├── annd_all_markers.csv            # output of FindAllMarkers for annotated obj
├── celltypist-log.txt              # process log
├── decision_matrix.csv             # CellTypist output
├── pbmc-annd_2023-06-23.RDS        # input obj annotated with predicted_labels.csv fields
├── predicted_labels.csv            # CellTypist output
├── probability_matrix.csv          # CellTypist output
└── qc.csv                          # some QC metrics

qc.csv

- cell_count
  - number of predicted cells by type
- probability_median
  - median of the prediction probabilities after filtering to values > 50% (i.e. positive decision matrix values)
  - lower values indicate lower confidence in prediction
- conflict_count
  - number of instances where a positive value co-occurs with another positive value for the same barcode
  - high values indicate more discrepancies in prediction
- conflict_proportion
  - fraction of barcodes with prediction conflicts
  - high values indicate more discrepancies in prediction
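
To make the idea of a conflict concrete, the sketch below counts barcodes whose row has more than one positive value. It is one interpretation of the definitions above and not the logic in src/qc.R: `count_conflicts` is a hypothetical helper, and it assumes a decision-matrix-style CSV with a header row, the barcode in the first column, one column per cell type, and no quoted commas. It reports a single overall count rather than a per-type breakdown.

```shell
#!/usr/bin/env bash
# Hypothetical helper (an interpretation, not the repo's qc.R).
# Reads a CSV from a file argument or stdin and prints:
#   <barcodes with conflicts> <total barcodes> <conflict proportion>
count_conflicts() {
  awk -F',' '
    NR == 1 { next }                          # skip the header row
    {
      positives = 0
      for (i = 2; i <= NF; i++)               # column 1 is the barcode
        if ($i + 0 > 0) positives++
      total++
      if (positives > 1) conflicts++          # more than one positive call
    }
    END {
      printf "%d %d %.2f\n", conflicts, total, (total ? conflicts / total : 0)
    }
  ' "$@"
}
```

It could be called as `count_conflicts decision_matrix.csv` from inside an output directory.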

File Tree

annotation/
├── README.md
├── run                     # EXECUTION SCRIPT and main runtime
└── src
    ├── apply-ann.R         # application of predicted labels to input and find DEGs
    ├── getopts             # run script arguments
    ├── matrix-convert.R    # routine to extract and prepare raw counts matrix
    └── qc.R                # method to assemble QC table
