
Missing information in the tutorial dataset. #4

@ZixiangPAN


Hi, I found that the example dataset used in the tutorial (linked below) does not have gene names in the h5ad object, so when running

```python
dc.tl.trajectories(
    adata,
    dc.tl.TConfig("Healthy", "AVP", "MPO", "origin", "Healthy"),
    dc.tl.TConfig("AML1", "AVP", "CD68", "origin", "AML1"),
)
```

Dataset link:
https://github.com/azizilab/decipher_data/data_decipher_tutorial.h5ad

Inside `../decipher/tools/trajectory_inference.py`, the function `find_cluster_with_marker` then filters out all the cells, leaving `adata` empty.
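Since `adata[:, marker]` looks each marker up by name in `adata.var_names`, a quick check before calling `dc.tl.trajectories` is whether the markers used in the two `TConfig` entries (AVP, MPO, CD68) are actually present there. The helper below is a hypothetical sketch, not part of decipher; it accepts any iterable of names, so it could be called as `missing_markers(adata.var_names, ["AVP", "MPO", "CD68"])`. The example `var_names` list is made up for illustration and is not the real content of the tutorial file.

```python
from typing import Iterable, List


def missing_markers(var_names: Iterable[str], markers: Iterable[str]) -> List[str]:
    """Return the markers absent from var_names, in the order they were given.

    Hypothetical helper (not part of decipher). Pass adata.var_names in practice.
    """
    available = set(var_names)
    return [m for m in markers if m not in available]


# Illustrative only: var_names that are plain indices instead of gene symbols.
var_names = ["0", "1", "2", "3"]
print(missing_markers(var_names, ["AVP", "MPO", "CD68"]))  # ['AVP', 'MPO', 'CD68']
```

An empty list means every marker can be indexed; anything else points to the gene names being missing or stored under a different identifier.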

```python
def find_cluster_with_marker(
    adata,
    marker,
    subset_column=None,
    subset_value=None,
    subset_min_percent_per_cluster=0.3,
    cluster_key="decipher_clusters",
    min_cell_per_cluster=10,
):
    """Find the cluster enriched for a marker gene. Possibly subset the cells before.

    Parameters
    ----------
    adata : sc.AnnData
        The annotated data matrix.
    marker : str
        The marker gene.
    subset_column : str, optional
        The column in `adata.obs` to subset on.
    subset_value : str, optional
        The value in subset_column to subset on.
    subset_min_percent_per_cluster : float, default 0.3
        When subsetting the cells, each cluster must have at least this proportion of cells from
        the subset to not be discarded. This is useful to remove clusters with too few cells from
        the subset.
    cluster_key : str, default "decipher_clusters"
        The key in `adata.obs` where the cluster information is stored.
    min_cell_per_cluster : int, default 10
        The minimum number of cells per cluster to consider it.
    """
    if subset_column is not None:
        adata = _subset_cells_and_clusters(
            adata,
            subset_column,
            subset_value,
            subset_min_percent_per_cluster=subset_min_percent_per_cluster,
            min_cell_per_cluster=min_cell_per_cluster,
            cluster_key=cluster_key,
        )
    marker_data = pd.DataFrame(adata[:, marker].X.toarray())
    marker_data["cluster"] = adata.obs[cluster_key].values
    # get the proportion of cells in each cluster that are in the subset
    marker_data = marker_data.groupby("cluster").mean()
    marker_data = marker_data.sort_values(by=0, ascending=False)
    return marker_data.index[0]
```
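When the subset leaves no cells, the last line indexes into an empty result, which would likely surface as an uninformative `IndexError` rather than a message about the data. A minimal pure-Python sketch of the same selection step (mean marker value per cluster, then the highest mean) with explicit guards is below. `top_cluster_for_marker` is a hypothetical name, not part of decipher, and it takes plain lists of per-cell marker values and cluster labels instead of an AnnData object.

```python
from collections import defaultdict
from typing import Hashable, Sequence


def top_cluster_for_marker(
    expression: Sequence[float],
    clusters: Sequence[Hashable],
) -> Hashable:
    """Return the cluster with the highest mean marker value.

    Hypothetical sketch mirroring groupby("cluster").mean() followed by a
    descending sort, but raising a clear error when no cells are left.
    """
    if len(expression) != len(clusters):
        raise ValueError("expression and clusters must have the same length")
    if len(expression) == 0:
        raise ValueError(
            "no cells left after subsetting; check the subset column/value "
            "and that the marker exists in adata.var_names"
        )
    totals = defaultdict(float)
    counts = defaultdict(int)
    for value, cluster in zip(expression, clusters):
        totals[cluster] += value
        counts[cluster] += 1
    means = {c: totals[c] / counts[c] for c in totals}
    # Ties go to the cluster seen first, which may differ from the pandas version.
    return max(means, key=means.get)


# Cluster "a" has mean 0.5, cluster "b" has mean 3.0.
print(top_cluster_for_marker([0.0, 2.0, 4.0, 1.0], ["a", "b", "b", "a"]))  # b
```

Failing early with a message that names the likely causes (empty subset, marker not in `var_names`) would make a dataset like the tutorial one easier to diagnose.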

Please take a look. Thank you.

Best
