Upsampling Panther Pathways for Performance Evaluation #62

@ntalluri

The performance evaluation uses PANTHER pathways as the input nodes, each paired with downsampled STRING interactomes. This isolates how the number of input nodes and the interactome size affect algorithm performance in a controlled setting. The EGFR experimental data will now serve as a final case study to test whether insights from the controlled evaluation transfer to a real omics dataset.

The EGFR omics dataset has 700 targets and 1 source. The problem is that I don't know whether there are PANTHER pathways that match this distribution and are this large. If there are none, it would be difficult to transfer insights from the controlled evaluation to the EGFR case study.

An idea would be to combine multiple PANTHER pathways to increase the number of input nodes, then downsample the interactomes constructed from these combined pathways. This would allow the controlled evaluation to include cases closer to the EGFR node distribution.
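A minimal sketch of the combining step, just to make the idea concrete. It assumes each pathway is available as a dict with `sources` and `targets` sets of node IDs; that layout and the names `combine_pathways` and `upsample` are hypothetical and not part of the repository. The largest-first ordering and the rule for nodes that appear as both source and target are also assumptions. Downsampling the interactome is not shown.

```python
def combine_pathways(pathways):
    """Union sources and targets across pathways.

    Each pathway is assumed to be a dict with 'sources' and 'targets' sets.
    """
    pathways = list(pathways)  # allow generators; we iterate twice below
    sources = set().union(*(p["sources"] for p in pathways))
    targets = set().union(*(p["targets"] for p in pathways))
    # Assumption: a node that is a source in any pathway stays a source only.
    return {"sources": sources, "targets": targets - sources}


def upsample(pathways, min_targets):
    """Add pathways, largest target set first, until min_targets is reached.

    `pathways` maps pathway name -> pathway dict. Returns the chosen names
    and the combined node sets. If every pathway is used and the goal is
    still not met, the full combination is returned.
    """
    chosen = []
    combined = combine_pathways([])
    by_size = sorted(pathways, key=lambda n: len(pathways[n]["targets"]), reverse=True)
    for name in by_size:
        chosen.append(name)
        combined = combine_pathways(pathways[n] for n in chosen)
        if len(combined["targets"]) >= min_targets:
            break
    return chosen, combined
```

With `min_targets=700` this would stop at the first combination whose target count is at least as large as the EGFR dataset. A different selection rule (random, or by pathway similarity) could be swapped in without changing `combine_pathways`.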

PR #25 needs to work first before we can get pathway statistics (number of sources, targets, and prizes/actives per pathway) to decide whether we need to upsample and to finalize this plan. However, I think there will be some PANTHER pathways with distributions similar to the EGFR omics data.
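Once the statistics are available, the upsampling decision could look roughly like the sketch below. It assumes the same hypothetical layout (pathway name mapped to a dict with `sources`, `targets`, and optionally `prizes`); the function names and the 700-target default are illustrative and do not reflect the output format of PR #25.

```python
def pathway_stats(pathways):
    """Count sources, targets, and prizes for each pathway.

    `pathways` maps name -> dict with 'sources' and 'targets' collections
    and an optional 'prizes' collection (hypothetical layout).
    """
    return {
        name: {
            "n_sources": len(p["sources"]),
            "n_targets": len(p["targets"]),
            "n_prizes": len(p.get("prizes", ())),
        }
        for name, p in pathways.items()
    }


def needs_upsampling(stats, min_targets=700):
    """True when no single pathway has at least `min_targets` targets."""
    return all(s["n_targets"] < min_targets for s in stats.values())
```

This only checks size. Matching the EGFR source-to-target ratio (1 source to 700 targets) would need an additional comparison on `n_sources`.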

Labels: dataset (Mutating datasets in any way.), enhancement (New feature or request)