
Permutation analysis and normalization strategy for the colocatome paper.


plevritis-lab/Spatial_Permutation_and_Normalization


Spatial permutation and normalization of multiplexed immunofluorescence imaging data

Overview

This R script identifies statistically significant spatial features, i.e. positive or negative cell-cell colocalizations, using colocation quotient (CLQ) analysis. Here we describe how to calculate the CLQ, build a null distribution of CLQ values by permutation, and normalize the data. The normalization accounts for the number of cells in each subpopulation: subpopulations with low cell counts tend to yield a broader distribution of CLQ values during the permutation analysis, because random label sampling has a substantial impact on their CLQ values.

  • get_CLQ() The colocation quotient (CLQ) quantifies how a cell subpopulation colocates spatially with another cell subpopulation among a set of nearest neighbors, defined here as 20. We calculated the colocation quotient for the pairwise cell types identified with CELESTA (Zhang et al., 2022, Nature Methods) under naïve and treatment conditions using the following equation: CLQ_b→a = (C_b→a / N_a) / (N_b / (N − 1)), where C_b→a is the number of cells of type b among the defined nearest neighbors of cells of type a, N is the total number of cells, and N_a and N_b are the numbers of cells of type a and type b.

  • KNN_neighbors() This function finds the N nearest neighboring cells of each cell.

  • find_cell_type_neighbors() This function finds the cell types of the neighboring cells.

  • CLQ_permutated_matrix_gen() This function assesses the significance of the observed CLQ values by randomly permuting the cell labels (cell types) 500 times while preserving the subpopulation proportions.

  • get_counts() This function counts the number of cells in each subpopulation. It generates a summary table with the cell type number, the corresponding name, and the cell count in the sample.

  • CLQ_matrix_gen() This function reads the CELESTA cell assignment file and generates the original CLQ matrix.
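The CLQ formula and the label permutation described above can be illustrated on a toy example. This is a minimal sketch, not the repository's get_CLQ() or CLQ_permutated_matrix_gen(): the function name clq_toy(), its arguments, and the data are hypothetical, and neighbors are found with a brute-force distance matrix in base R rather than with spdep.

```r
# Toy illustration of the CLQ formula; NOT the repository's get_CLQ().
# clq_toy() and its arguments are hypothetical names.
clq_toy <- function(coords, labels, a, b, k) {
  n <- nrow(coords)
  d <- as.matrix(dist(coords))  # pairwise Euclidean distances
  diag(d) <- Inf                # a cell is never its own neighbor
  # Row i holds the indices of the k nearest neighbors of cell i.
  nn <- matrix(apply(d, 1, function(r) order(r)[seq_len(k)]),
               nrow = n, ncol = k, byrow = TRUE)
  is_a <- labels == a
  # C_b->a: neighbors of type b, summed over all cells of type a.
  c_ba <- sum(labels[as.vector(nn[is_a, , drop = FALSE])] == b)
  (c_ba / sum(is_a)) / (sum(labels == b) / (n - 1))
}

# Six cells on a line with alternating labels, k = 2 neighbors.
coords <- cbind(x = 1:6, y = 0)
labels <- rep(c("A", "B"), times = 3)
clq_b_to_a <- clq_toy(coords, labels, a = "A", b = "B", k = 2)  # 25/9, about 2.78

# Null distribution: shuffling labels keeps each subpopulation's size fixed.
set.seed(1)
null_clq <- replicate(200, clq_toy(coords, sample(labels), "A", "B", 2))
```

In the toy data the three A cells have 1, 2, and 2 B cells among their two nearest neighbors, so C_b→a = 5 and the CLQ is (5/3) / (3/5) = 25/9. A value above 1 indicates that B cells are found near A cells more often than their overall frequency would suggest.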

CLQ_permutated_matrix_gen_caller() This function retrieves the permutated matrix output for each sample.

get_counts_caller() This function retrieves the subpopulation counts output for each sample.

significance_matrix_gen() This function identifies statistically significant CLQ values. CLQ values that fall outside, or at the tail of, the distribution generated by the permutation analysis are considered significant, whereas values within the distribution are deemed non-significant because they can be reproduced after spatial randomization. Percentile values < 0.05 or > 0.95 are considered significant. The normalization achieved through the permutation analysis facilitates comparisons between spatial features and also enables comparisons between conditions from the same or independent experiments. CLQs were normalized according to the following formula: (Observed CLQ − Mean CLQ) / (Max CLQ − Mean CLQ).
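The percentile test and the normalization formula above can be sketched as follows. This is an illustration under stated assumptions, not the repository's significance_matrix_gen(): the helper name clq_significance() is hypothetical, the percentile is taken here as the fraction of permuted CLQ values at or below the observed value, and Mean CLQ and Max CLQ are assumed to refer to the permuted values.

```r
# Hypothetical helper; NOT the repository's significance_matrix_gen().
clq_significance <- function(observed, null_values, alpha = 0.05) {
  # Assumption: percentile = share of permuted CLQs at or below the observed CLQ.
  percentile <- mean(null_values <= observed)
  # Assumption: Mean CLQ and Max CLQ are taken over the permuted values.
  normalized <- (observed - mean(null_values)) /
    (max(null_values) - mean(null_values))
  list(percentile = percentile,
       significant = percentile < alpha || percentile > 1 - alpha,
       normalized = normalized)
}

# Toy null distribution (made-up numbers) with mean 1.0 and maximum 1.2.
null_values <- c(0.8, 0.9, 1.0, 1.1, 1.2)
res_high <- clq_significance(observed = 1.4, null_values = null_values)
res_mid  <- clq_significance(observed = 1.0, null_values = null_values)
```

With these numbers, an observed CLQ of 1.4 lies above every permuted value (percentile 1, significant) and normalizes to (1.4 − 1.0) / (1.2 − 1.0) = 2, while an observed CLQ of 1.0 sits inside the distribution (percentile 0.6, not significant) and normalizes to 0.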

plot_gen() This function plots the distribution of all the permutation CLQ values for each cell pair. The blue bar is the normalized CLQ value and the red bar is the original CLQ value.

CLQ_normalization_by_sample() This function requires (1) a named vector with the original CLQ values for one sample before normalization, in which each element is named after the two cell types in the cell pair, joined by "_", (2) a cell count file with the number of cells of each cell type in that sample, (3) the number of nearest neighbors used in the CLQ calculation, (4) a cell count threshold for rare cell populations (default 5), (5) the CELESTA input prior cell type signature matrix, and (6) the clipping parameters, which default to 0.05; a warning message suggests clipping more as needed. The original CLQ distribution is bell-shaped but skewed at the tail. The clipping parameters allow for better visualization when normalizing the data.
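The named vector expected as the first input can be built as sketched below. The cell type names and CLQ values are made up for illustration, and which of the two names denotes cell type a and which denotes cell type b is not stated in this README, so the order used here is an assumption.

```r
# Hypothetical cell types and CLQ values, for illustration only.
cell_pairs <- expand.grid(first = c("Tcell", "Tumor"),
                          second = c("Tcell", "Tumor"),
                          stringsAsFactors = FALSE)
clq_values <- c(1.8, 0.6, 0.7, 1.2)

# Each element is named after its cell pair, joined by "_".
names(clq_values) <- paste(cell_pairs$first, cell_pairs$second, sep = "_")

# Basic checks before passing the vector on: every element is named,
# names are unique, and each name contains the "_" separator.
pair_names <- names(clq_values)
names_ok <- !is.null(pair_names) &&
  !anyDuplicated(pair_names) &&
  all(grepl("_", pair_names, fixed = TRUE))
```

Because "_" is the separator, a cell type name that itself contains "_" makes the pair name ambiguous to split, so it may be worth checking the cell type names beforehand.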

Dependency

  • spdep: for obtaining spatial neighborhood information
  • ggplot2: for plotting the permutation distributions

Usage

library(spdep)
library(ggplot2)


### Samples are first processed here to generate the original and permutated CLQs for each cell-cell pair in a given sample.

### Input file name example: "TAFs1_cell_type_assignment.csv"


files <- (Sys.glob("*cell_type_assignment.csv"))

for (f in files){
  print(f)
  filename_c = f
  
  count_file = get_counts(filename=filename_c)
  
  ### CLQ_permutated_matrix_gen() takes the iteration number, the file name, and the count_file generated in the previous step. It depends on get_CLQ(), KNN_neighbors(), and find_cell_type_neighbors().
  ### iternum is the number of iterations for the permutation analysis, set to 500 here.


  CLQ_permutated = CLQ_permutated_matrix_gen(iternum=500, 
                                             filename = filename_c,
                                             df_c = count_file)
}


### Then, the significance of each CLQ is calculated from its percentile.
### The original and permutated CLQs are retrieved for each sample with CLQ_matrix_gen(), CLQ_permutated_matrix_gen_caller(), and get_counts_caller().

files <- (Sys.glob("*cell_type_assignment.csv"))

for (f in files){
  print(f)
  
  filename_c = f
  
  CLQ_matrix = CLQ_matrix_gen(filename = filename_c)
  
  CLQ_permutated = CLQ_permutated_matrix_gen_caller(filename = filename_c)
  
  count_file = get_counts_caller(filename=filename_c)
 
  ### list_of_matrices holds the 500 permutated CLQ sets, one per iteration.

  significance_matrices = significance_matrix_gen(iternum=500,
                                                  filename = filename_c,
                                                  list_of_matrices = CLQ_permutated,
                                                  CLQ_matrix_original= CLQ_matrix,
                                                  df_c = count_file)
  
  ### plot_gen() generates one plot per CLQ for each pair of cell type A to cell type B. It takes the permutated CLQ values from CLQ_permutated and the original CLQ values from CLQ_matrix.

  plot_gen(iternum = 500,
           filename = filename_c,
           list_of_matrices = CLQ_permutated,
           CLQ_matrix_original = CLQ_matrix)
  
}

Inputs

The spatial permutation analysis requires two inputs:
1. CELESTA cell subpopulations:
A dataframe with one column named cell_types with all the user-defined CELESTA cell subpopulations.

See file example: “cell_types_celesta.csv”

2. Segmented imaging data with CELESTA cell assignment:
The _cell_type_assignment.csv output dataframe from the CELESTA algorithm, which is available for download at https://github.com/plevritis-lab/CELESTA.

See file example: “TAFs1_cell_type_assignment.csv”

Outputs

Spatial permutation outputs:

  1. After running the get_counts() function, the script outputs a .csv file with the number of cells in each cell subpopulation.

See file example: “TAFs1_CellCounts.csv”

  2. After running the CLQ_matrix_gen() function, the script outputs a .csv file with the original CLQ values for each cell pair.

See file example: “TAFs1_CLQ.csv”

  3. After running the CLQ_permutated_matrix_gen() function, the script outputs a .csv file with the 500 CLQ values obtained by randomly permuting the cell labels (cell subpopulations) 500 times while preserving the proportions. These values are plotted by plot_gen().

See file example: “TAFs1_CLQ_Permutated.csv”

  4. After running the significance_matrix_gen() function, the script outputs a .csv file with the sample name, the identity and count of each cell subpopulation, the original CLQ value, the percentile, and whether the value is deemed significant.

Note that an original CLQ value of zero may be caused by insufficient cell numbers of the respective cell types. These values are filtered out in post-processing prior to colocatome generation.

See file example: “TAFs1_CLQ_data_full.csv” and .png images in the PA_figures_TAFs1 folder.

  5. After running the CLQ_normalization_by_sample() function, the script outputs a .csv file with the normalized values.

See file example: “TAFs1_CLQ_Normalized_L0_R0.05”. L0 = left clipping parameter at 0 (no clipping) and R0.05 = right clipping parameter at 0.05.

Note that the folder contains only a subset of the distribution plots.
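The example file names above suggest that each output is named after the sample prefix of the input file. Assuming that pattern holds (it is inferred from the examples, not confirmed by the script), a small helper can list the files expected for a sample and report which ones are still missing. The function name expected_output_files() is hypothetical.

```r
# Hypothetical helper; the naming pattern is inferred from the README examples.
expected_output_files <- function(input_file) {
  # "TAFs1_cell_type_assignment.csv" -> "TAFs1"
  sample_name <- sub("_cell_type_assignment\\.csv$", "", basename(input_file))
  paste0(sample_name,
         c("_CellCounts.csv", "_CLQ.csv",
           "_CLQ_Permutated.csv", "_CLQ_data_full.csv"))
}

outputs <- expected_output_files("TAFs1_cell_type_assignment.csv")

# Files from the list that are not yet present in the working directory.
missing_outputs <- outputs[!file.exists(outputs)]
```

Running this after the two loops in the Usage section gives a quick check that every step produced its file for a given sample.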

Getting help

If you encounter a bug, please file an issue with a minimal reproducible example on GitHub. For questions and other discussion, please use community.rstudio.com.
