
feat: scaffolding, caching, EGFR #65

Open
tristan-f-r wants to merge 12 commits into main from egfr-and-infrastructure


@tristan-f-r
Contributor

@tristan-f-r tristan-f-r commented Mar 18, 2026

We bundle EGFR along with the rest of the caching infrastructure. Notes:

  • All motivation for the caching system lives under cache/README.md.
  • We removed pra.yaml for now, as the only PRAs are the synthetic data and the ResponseNet data, and soon the DepMap data.
  • The CONTRIBUTING.md file is not finalized, and is simply there to not break Changes to CONTRIBUTING guide #57. I may split all contributing material into Changes to CONTRIBUTING guide #57 later.
  • directory.py contains unnecessary files from other datasets that were deemed universal.
  • I would like to keep the web folder even though I'm aware no one is currently in a position to review it.

@tristan-f-r tristan-f-r added the enhancement label Mar 18, 2026
@tristan-f-r tristan-f-r changed the title from "feat: initial scaffolding, EGFR" to "feat: scaffolding, caching, EGFR" Mar 18, 2026
Collaborator

@ntalluri ntalluri Mar 18, 2026

For the web folder, is that something that can be a separate pull request? The high-level description mentions that no one is in a position to review it right now.

Contributor Author

True! I'll have to make some light changes, but I'll make that separate 👍

Collaborator

@ntalluri ntalluri left a comment

I did a light review of the PR; I did not look too hard at the code itself yet. I was mostly gathering ideas about what is happening from the READMEs.

Collaborator

Can we change the name of this? DMMM isn't a term that applies universally to all the algorithms in this config. Maybe something like prize-active?

Contributor Author

@tristan-f-r tristan-f-r Mar 19, 2026

Not a big fan of either name, though I'm much more worried about the config being universal to the provided datasets than to the provided algorithms.

Collaborator

@ntalluri ntalluri Mar 19, 2026

Based on how we are setting up the configs for the paper, we will need to make each config specific to a dataset collection rather than to an algorithm type anyway. That would be a better way to organize these configs, and it would also give people access to the configs we used.

g: 0

datasets:
- label: dmmmegfr_string
Collaborator

The EGFR dataset is also not specific to algorithms that fit only into the DMMM category.

Suggested change
- label: dmmmegfr_string
- label: egfr_string

edge_files: ["processed/interactome.tsv"]
node_files: ["processed/prizes.txt"]
other_files: []
- label: dmmmegfr_irefindex
Collaborator

Suggested change
- label: dmmmegfr_irefindex
- label: egfr_irefindex
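
Both suggested changes amount to dropping the `dmmm` prefix from a dataset label. As a minimal sketch of that rename, assuming the config has been loaded into a plain dict with a `datasets` list: `strip_category_prefix` is a hypothetical helper for illustration, not something in this PR, and the second entry's remaining fields are left out because the diff does not show them.

```python
# Hypothetical helper: not part of the PR, it only illustrates the suggested renames.
def strip_category_prefix(config: dict, prefix: str = "dmmm") -> dict:
    """Return a copy of config with `prefix` removed from the start of each dataset label."""
    renamed = []
    for dataset in config.get("datasets", []):
        entry = dict(dataset)  # shallow copy so the input config is left untouched
        label = entry["label"]
        if label.startswith(prefix):
            entry["label"] = label[len(prefix):]
        renamed.append(entry)
    return {**config, "datasets": renamed}


# Dataset entries as they appear in the diff context above.
config = {
    "datasets": [
        {
            "label": "dmmmegfr_string",
            "edge_files": ["processed/interactome.tsv"],
            "node_files": ["processed/prizes.txt"],
            "other_files": [],
        },
        {"label": "dmmmegfr_irefindex"},  # other fields are not shown in the diff
    ]
}

updated = strip_category_prefix(config)
print([d["label"] for d in updated["datasets"]])
```

Labels that do not start with the prefix pass through unchanged, so the same helper would be safe to run over a config that mixes renamed and not-yet-renamed datasets.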

tristan-f-r and others added 2 commits March 18, 2026 16:54
Co-authored-by: Neha Talluri <78840540+ntalluri@users.noreply.github.com>
),
```

When a file is requested, `cached`, `pinned`, and `unpinned` are all downloaded, and we characterize them as follows:
Collaborator

I still don't understand what these terms actually mean, or what each one is associated with before it leads to a failure or an update; the descriptions that follow don't help, since each of them relies on several of these terms at once.


## Snakemake

We also provide a `Snakefile`, which can be imported in dataset Snakefiles through:
Collaborator

@ntalluri ntalluri Mar 19, 2026

Suggested change
We also provide a `Snakefile`, which can be imported in dataset Snakefiles through:
We also provide a `Snakefile`, which contains dataset-fetching functions that are imported in dataset-specific Snakefiles through:

Labels

dataset: Mutating datasets in any way.
enhancement: New feature or request
