Missing information in the tutorial dataset. #4

Hi, I found that the example dataset in the tutorial (linked below) does not have gene names in the h5ad object, which causes a problem when running:

```python
dc.tl.trajectories(
    adata,
    dc.tl.TConfig("Healthy", "AVP", "MPO", "origin", "Healthy"),
    dc.tl.TConfig("AML1", "AVP", "CD68", "origin", "AML1"),
)
```

Dataset link:
https://github.com/azizilab/decipher_data/data_decipher_tutorial.h5ad

Inside ../decipher/tools/trajectory_inference.py, the function find_cluster_with_marker then filters out all the cells, so the adata ends up empty.

```python
def find_cluster_with_marker(
    adata,
    marker,
    subset_column=None,
    subset_value=None,
    subset_min_percent_per_cluster=0.3,
    cluster_key="decipher_clusters",
    min_cell_per_cluster=10,
):
    """Find the cluster enriched for a marker gene. Possibly subset the cells before.

    Parameters
    ----------
    adata : sc.AnnData
        The annotated data matrix.
    marker : str
        The marker gene.
    subset_column : str, optional
        The column in `adata.obs` to subset on.
    subset_value : str, optional
        The value in subset_column to subset on.
    subset_min_percent_per_cluster : float, default 0.3
        When subsetting the cells, each cluster must have at least this proportion of cells from
        the subset to not be discarded. This is useful to remove clusters with too few cells from
        the subset.
    cluster_key : str, default "decipher_clusters"
        The key in `adata.obs` where the cluster information is stored.
    min_cell_per_cluster : int, default 10
        The minimum number of cells per cluster to consider it.
    """
    if subset_column is not None:
        adata = _subset_cells_and_clusters(
            adata,
            subset_column,
            subset_value,
            subset_min_percent_per_cluster=subset_min_percent_per_cluster,
            min_cell_per_cluster=min_cell_per_cluster,
            cluster_key=cluster_key,
        )
    marker_data = pd.DataFrame(adata[:, marker].X.toarray())
    marker_data["cluster"] = adata.obs[cluster_key].values
    # get the proportion of cells in each cluster that are in the subset
    marker_data = marker_data.groupby("cluster").mean()
    marker_data = marker_data.sort_values(by=0, ascending=False)
    return marker_data.index[0]
```
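
One way to confirm a problem like this is to check whether the marker genes passed in the call (AVP, MPO, CD68) actually appear among the variable names of the object. The helper below is only a sketch: `missing_markers` is a hypothetical name, not part of decipher, and the `var_names` list is a made-up placeholder standing in for what `adata.var_names` might look like when gene symbols were not stored.

```python
def missing_markers(var_names, markers):
    """Return the markers that are not present in var_names, preserving order."""
    known = set(var_names)
    return [m for m in markers if m not in known]


# Placeholder values (an assumption for illustration): positional labels
# instead of gene symbols, as can happen when names are not saved.
var_names = ["0", "1", "2", "3"]
markers = ["AVP", "MPO", "CD68"]

# Lists every marker that the lookup `adata[:, marker]` could not resolve.
print(missing_markers(var_names, markers))
```

With a loaded object, the same check would be `missing_markers(adata.var_names, ["AVP", "MPO", "CD68"])`; a non-empty result means the markers cannot be looked up by name.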

Could you please take a look? Thank you.

Best

