Commit c64d5b2

update readme
1 parent 2f4038e commit c64d5b2

4 files changed

Lines changed: 102 additions & 694 deletions

README.md

Lines changed: 44 additions & 12 deletions
@@ -1,4 +1,4 @@
-# CRISPR Pipeline
+# coreSCpy Pipeline

 Developer: Elizabeth Aslinger (easlinger)

@@ -23,8 +23,7 @@ with desired environment name):
 `git clone https://github.com/ChoBioLab/corescpy.git`, or
 look above for the green "Code" button and press it for instructions.

-5. Naviate to the repository directory (replace
-<DIRECTORY> with your path):
+5. Naviate to the repository directory (replace <DIRECTORY> with your path):
 `cd <DIRECTORY>`

 6. Install the package with pip. (Ensure you have pip installed.)
@@ -38,14 +37,26 @@ Open a Python terminal and type:

 2. You can now call functions from the analysis module using
 `cr.ax.<FUNCTION>()`, from the preprocessing using `cr.ax.pp...`, etc.
-in Python; however, you are most likely to interact with the `Crispr`
-class object. Here is example code you might run
-(replacing <...> with your argument specifications) to load and
-preprocess your data.
+in Python; however, you are most likely to interact with the `Omics` class object, or specialized classes that inherit from it, such as `Crispr` and `Spatial`.
+class object. Here is example code you might run (replacing things in < > brackets with your specifications):
+```
+self = cr.Omics(<data_object_or_directory>, <...>)
+```
+or
+```
+self = cr.Crispr(<data_object_or_directory>, <...>)
+```
+or
+```
+self = cr.Spatial(<data_object_or_directory>, <...>)
+```
+
+and then run workflows, such as
 ```
-from corescpy.crispr_class import corescpy
-self = Crispr(adata, <...>)
 self.preprocess(<...>)
+self.cluster(<...>)
+self.annotate_clusters("<CellTypist model.pkl>")
+self.plot(kind=["heat", "matrix", "umap"])
 ```
 etc.
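The README text added in this hunk says that `Crispr` and `Spatial` inherit from `Omics`. A minimal sketch of that relationship follows, using hypothetical stand-in classes rather than the real `corescpy` ones: the method bodies, the `steps` attribute, and the choice of which method lives on which class are assumptions made only to show that the shared workflow steps are available on every subclass.

```python
class Omics:
    """Hypothetical stand-in for a general-purpose analysis class."""

    def __init__(self, data, **kwargs):
        self.data = data        # data object or directory
        self.kwargs = kwargs    # remaining user specifications
        self.steps = []         # names of the workflow steps run so far

    def preprocess(self, **kwargs):
        self.steps.append("preprocess")

    def cluster(self, **kwargs):
        self.steps.append("cluster")


class Crispr(Omics):
    """Stand-in subclass that adds a perturbation-specific step."""

    def run_mixscape(self, **kwargs):
        self.steps.append("run_mixscape")


class Spatial(Omics):
    """Stand-in subclass that adds a spatial-specific step."""

    def calculate_centrality(self, **kwargs):
        self.steps.append("calculate_centrality")


# The shared steps come from Omics; the last one exists only on Spatial.
spatial = Spatial("<data_object_or_directory>")
spatial.preprocess()
spatial.cluster()
spatial.calculate_centrality(n_jobs=4)
print(spatial.steps)
```

In this sketch the general steps are written once on the base class, which is the usual reason for such a layout; whether `corescpy` organizes its methods the same way is not shown in the diff.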

@@ -57,6 +68,21 @@ Here are the methods (applicable to scRNA-seq generally, not just perturbations)

 The following perturbation-specific methods can be executed optionally and in any order:

+### Spatial Data
+
+Here is an example workflow to analyze spatial data (after preprocessing and clustering as described above):
+
+```
+self.calculate_centrality(n_jobs=4)
+self.find_cooccurrence(figsize=(60, 20), kws_plot=dict(wspace=3))
+self.find_svgs(genes=genes, method="moran", n_perms=10, kws_plot=dict(
+legend_fontsize="large"), figsize=(15, 15))
+self.calculate_receptor_ligand(col_condition=False, p_threshold=0.001,
+remove_ns=True, figsize=(20, 20))
+```
+
+### Perturbation Data
+
 * `self.run_augur(...)`: Score and plot how strongly different cell types responded to perturbation(s). This score is operationalized as the accuracy with which a machine learning model can use gene expression data to predict the perturbation condition to which cells of a given type belong. Augur provides scores aggregated across cells of a given type rather than for individual cells.
 * `self.run_mixscape(...)`: Quantify and plot the extent to which individual cells responded to CRISPR perturbation(s), and identify which perturbation condition cells were not detectibly perturbed in terms of their gene expression.
 * `self.compute_distance(...)`: Calculate and visualize various distance metrics that quantify the similarity in gene expression profiles across perturbation conditions.
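The spatial workflow added in this hunk calls `find_svgs(..., method="moran", ...)`. Assuming that `"moran"` refers to Moran's I spatial autocorrelation statistic (the diff does not show how `find_svgs` computes it), the pure-Python sketch below shows what that statistic measures on a toy row of spots. The helper names `morans_i` and `chain_weights` are hypothetical and are not part of `corescpy`.

```python
def chain_weights(n):
    """Binary neighbor weights for n spots in a row (each touches its neighbors)."""
    return [[1.0 if abs(i - j) == 1 else 0.0 for j in range(n)]
            for i in range(n)]


def morans_i(values, weights):
    """Moran's I: (N / W) * sum_ij w_ij * d_i * d_j / sum_i d_i ** 2,
    where d_i is the deviation of spot i from the mean and W sums all weights."""
    n = len(values)
    mean = sum(values) / n
    dev = [v - mean for v in values]
    w_total = sum(sum(row) for row in weights)
    cross = sum(weights[i][j] * dev[i] * dev[j]
                for i in range(n) for j in range(n))
    return (n / w_total) * cross / sum(d * d for d in dev)


weights = chain_weights(6)
clustered = [0, 0, 0, 1, 1, 1]    # expression high on one side only
alternating = [0, 1, 0, 1, 0, 1]  # neighbors always differ

print(morans_i(clustered, weights))    # positive, about 0.6
print(morans_i(alternating, weights))  # negative, about -1.0
```

A gene whose expression is spatially clustered scores above zero, while one that alternates between neighboring spots scores below zero. The significance testing that `n_perms` suggests is not sketched here.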
@@ -95,12 +121,14 @@ Certain arguments used throughout the `corescpy` package (including outside the
 ### Initialization Method Arguments

 * `file_path` **(str, AnnData, or dictionary)**: Path or object containing data. Used in initialization to create the initial `self.adata` attribute (an AnnData or MuData object). Either
-- a path to a 10x directory (with matrix.mtx.gz, barcodes.tsv.gz, features.tsv.gz),
+- a path to a 10x directory (with matrix.mtx.gz, barcodes.tsv.gz, features.tsv.gz), the top-level directory of Xenium output (above the CellRanger feature/matrix directory), or the top-level directory of Visium output (that contains the .h5 file),
 - a path to an .h5ad or .mu file (Scanpy/AnnData/Muon-compatible),
-- an AnnData or MuData object (e.g., already loaded with Scanpy or Muon, or by using `corescpy.pp.create_object(file_path)`), or
+- an `AnnData`, `MuData`, or `SpatialData` object (e.g., already loaded with the appropriate `scverse` packages, or by using `corescpy.pp.create_object(file_path)`), or
 - a dictionary containing keyword arguments to pass to `corescpy.pp.combine_matrix_protospacer()` (in order to load information about perturbations from other file(s); press the arrow to expand details here),

 <details><summary>Click to expand details</summary>
+or
+- a dictionary, keyed by sample name, containing multiple `file_path`-compatible arguments for each sample (for integration).

 ```
 crd = "<YOUR DIRECTORY HERE>"
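The `file_path` argument described in this hunk accepts several kinds of input. As a rough illustration only, the sketch below sorts a value into the broad categories the README lists. The function `describe_file_path` and its return strings are hypothetical, it does not reproduce how `corescpy.pp.create_object` actually dispatches, and it makes no attempt to tell the two dictionary forms apart.

```python
from pathlib import Path


def describe_file_path(file_path):
    """Name the README category that a `file_path` value falls into."""
    if isinstance(file_path, dict):
        # Either loader keyword arguments or per-sample inputs for integration.
        return "dictionary"
    if isinstance(file_path, (str, Path)):
        if Path(file_path).suffix in {".h5ad", ".mu"}:
            return "h5ad or mu file"
        # A 10x directory or a top-level Xenium/Visium output directory.
        return "directory"
    # Anything else is assumed to be an already-loaded data object.
    return "in-memory object"


print(describe_file_path("data/sample.h5ad"))
print(describe_file_path("data/xenium_run"))
print(describe_file_path({"sample_a": "data/a", "sample_b": "data/b"}))
```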
@@ -241,8 +269,12 @@ Finally, this approach saves memory: All these versions of the attribute are sto

 ## Resources for Background Knowledge

+[Pertpy (Perturbation/Conditions Analysis) Tutorials](https://pertpy.readthedocs.io/en/latest/tutorials/index.html)
+
+[Squidpy (Spatial) Tutorials](https://squidpy.readthedocs.io/en/stable/notebooks/tutorials/index.html)

-[Pertpy Tutorials](https://pertpy.readthedocs.io/en/latest/tutorials/index.html)
 [Single Cell Best Practices](https://www.sc-best-practices.org/conditions/perturbation_modeling.html)
+
 [Augur](https://github.com/neurorestore/Augur)
+
 [Mixscape (Seurat)](https://satijalab.org/seurat/articles/mixscape_vignette.html)

corescpy/processing/preprocessing.py

Lines changed: 5 additions & 5 deletions
@@ -603,13 +603,13 @@ def perform_qc(adata, n_top=20, col_gene_symbols=None, log1p=True,
 "n_genes_by_counts": "Genes Detected in Cell",
 **patterns_names}, axis=1) # rename
 fff = seaborn.pairplot(
-mets_df, diag_kind="kde", hue=h if yes else None,
+mets_df, diag_kind="kde", hue=h if h else None,
 diag_kws=dict(fill=True, cut=0), plot_kws=dict(
 marker=".", linewidth=0.05)) # QC pairplot
 except Exception as err:
 fff = err
 print(traceback.format_exc())
-figs[f"pairplot_by_{h}" if yes else "pairplot"] = fff
+figs[f"pairplot_by_{h}" if h else "pairplot"] = fff

 # % Counts (MT, RB, HB) Distribution (KDE) Plots
 if len(pct_n) > 0: # if any QC vars (e.g., MT RNA) present...
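The `perform_qc` hunk above swaps `yes` for `h` in both conditionals, so the `hue` argument and the figure-dictionary key are now derived from the same value. The sketch below isolates that pattern without plotting anything. The helper names `pairplot_key` and `pairplot_kwargs` are hypothetical, and how `h` is assigned in the surrounding code is not visible in the diff.

```python
def pairplot_key(hue_column):
    """Key under which the QC pairplot is stored in the figure dictionary."""
    return f"pairplot_by_{hue_column}" if hue_column else "pairplot"


def pairplot_kwargs(hue_column):
    """Keyword arguments mirroring the pairplot call shown in the hunk."""
    return dict(
        diag_kind="kde",
        hue=hue_column if hue_column else None,  # any falsy value means no hue
        diag_kws=dict(fill=True, cut=0),
        plot_kws=dict(marker=".", linewidth=0.05),
    )


for hue_column in (None, "", "sample"):
    print(repr(hue_column), pairplot_key(hue_column),
          pairplot_kwargs(hue_column)["hue"])
```

Because both expressions test the same variable, an empty or missing hue column yields an ungrouped plot stored under the plain `"pairplot"` key, and a named column yields a grouped plot stored under a matching key.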
@@ -715,9 +715,9 @@ def remove_batch_effects(adata, col_cell_type="leiden",
 early_stopping=True, early_stopping_patience=25)
 train = adata.copy()
 train.obs["cell_type"] = train.obs[col_cell_type].tolist()
-train.obs["batch"] = train.obs[col_batch].tolist()
+train.obs["batch"] = train.obs[col_sample_id].tolist()
 if plot is True:
-sc.pl.umap(train, color=[col_batch, col_cell_type],
+sc.pl.umap(train, color=[col_sample_id, col_cell_type],
 wspace=.5, frameon=False)
 print("\n<<< PREPARING DATA >>>")
 pt.tl.SCGEN.setup_anndata(
@@ -730,5 +730,5 @@ def remove_batch_effects(adata, col_cell_type="leiden",
 if plot is True:
 sc.pp.neighbors(corr)
 sc.tl.umap(corr)
-sc.pl.umap(corr, color=[col_batch, col_cell_type], wspace=0.4)
+sc.pl.umap(corr, color=[col_sample_id, col_cell_type], wspace=0.4)
 return corr
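In the two `remove_batch_effects` hunks above, the user-named cell-type and sample columns are copied into fixed `"cell_type"` and `"batch"` columns before the scGen setup call. A minimal sketch of that copy step follows, with a plain dictionary of lists standing in for `adata.obs`. The function name `add_fixed_label_columns` and the default `col_sample_id="sample"` are assumptions, since the full signature is truncated in the hunk header.

```python
def add_fixed_label_columns(obs, col_cell_type="leiden",
                            col_sample_id="sample"):
    """Return a copy of `obs` with 'cell_type' and 'batch' columns added."""
    missing = [col for col in (col_cell_type, col_sample_id)
               if col not in obs]
    if missing:
        raise KeyError(f"Columns not found in obs: {missing}")
    out = dict(obs)  # shallow copy so the caller's mapping keeps its keys
    out["cell_type"] = list(obs[col_cell_type])
    out["batch"] = list(obs[col_sample_id])
    return out


obs = {"leiden": ["0", "1", "0"], "sample": ["s1", "s1", "s2"]}
train_obs = add_fixed_label_columns(obs)
print(train_obs["batch"])
print(train_obs["cell_type"])
```

Checking for the columns up front gives a clearer error than a failed lookup later in training, which is one reason a mismatched column name is worth surfacing early.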

examples/senmayo_ileal.ipynb

Lines changed: 40 additions & 1 deletion
Large diffs are not rendered by default.

examples/spatial_visium.ipynb

Lines changed: 13 additions & 676 deletions
Large diffs are not rendered by default.
