Skip to content

feat!: SPRAS revision#320

Merged
agitter merged 66 commits intoReed-CompBio:mainfrom
tristan-f-r:hash
Mar 18, 2026
Merged

feat!: SPRAS revision#320
agitter merged 66 commits intoReed-CompBio:mainfrom
tristan-f-r:hash

Conversation

@tristan-f-r
Copy link
Collaborator

@tristan-f-r tristan-f-r commented Jul 9, 2025

This change means that output files will not be reused whenever SPRAS is updated if immutable_files is true, furthering the immutability goal necessary to get OSDF integration working for SPRAS benchmarking. ('updated' depends on the git commit hash or the actual SPRAS release version)

This adds the unique spras_revision to dataset, gold standard, and algorithm labels to provide OSDF support on the level of deterministic, non-seeded algorithms when datasets are immutable.

This has the added benefit of allowing SPRAS users to simply upgrade their SPRAS version without needing to clear output, which complements #380. The refactored test also partially covers #165 and #45. (This is also where the majority of the code comes from: The actual feature patch here is a 50 line change.)

See #321 implemented by #335 for handling nondeterministic algorithms / seeded algorithms.


To make this change, a significant test refactor in test/analysis was needed to remove hardcoded paths (which contained the hashes being modified per-commit in this PR.) It turns out that whenever we make any change to the hash, this [original: the patch here fixes this] test breaks! That's why this PR is depended on by so many other PRs.

This adds the unique spras_revision to every single paramater combination (before hashing) and the dataset label, to provide OSDF support on the level of deterministic algorithms.
@tristan-f-r tristan-f-r marked this pull request as ready for review July 9, 2025 20:51
@tristan-f-r tristan-f-r added enhancement New feature or request needed for benchmarking Priority PRs needed for the benchmarking paper labels Jul 9, 2025
@tristan-f-r tristan-f-r changed the title feat: spras_revision feat: SPRAS revision Jul 9, 2025
@tristan-f-r

This comment was marked as outdated.

@tristan-f-r tristan-f-r marked this pull request as draft July 9, 2025 21:37
@tristan-f-r tristan-f-r marked this pull request as ready for review July 10, 2025 19:34
@tristan-f-r tristan-f-r changed the title feat: SPRAS revision feat!: SPRAS revision Jul 10, 2025
@tristan-f-r

This comment was marked as outdated.

@tristan-f-r tristan-f-r added the P-high This is a blocker for many PRs/issues/features label Jul 24, 2025
Copy link
Collaborator

@agitter agitter left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is getting closer and the scope looks better now that the labeled outputs are optional. I still will need a final end-to-end pass because I've been reviewing in chunks.

Copy link
Collaborator

@agitter agitter left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Some high level comments:

  • We should find a place to document these major changes for users or developers. If we are using config.yaml as the temporary place to document config options, that's fine. Developers won't know what immutability means and the main concepts to be aware of unless they find these comments in the source.
  • The parameters files in the logs/ output are not immutable. Do they need to be?
  • The output subdirectories like data0_72021389-bowtiebuilder-params-4YLQBJ5 take some getting used to. I'm not sure what else we could do to make it more concise other than use a shorter hash. The eval subdirectories also end up included the tag twice: data0_72021389-gs0_72021389-eval.

@tristan-f-r
Copy link
Collaborator Author

  • Yes.
  • Indeed. I missed this.
  • This is a problem - this can be alleviated somewhat with logs/report-{timestamp}.md #380, though I also have no immediate ideas on how to more clearly name these folders.

@tristan-f-r tristan-f-r marked this pull request as draft March 8, 2026 23:37
@tristan-f-r
Copy link
Collaborator Author

tristan-f-r commented Mar 9, 2026

I've moved the revision code over to spras/config/revision.py, and added the hash to all algorithms as well (which I've hopefully motivated enough in the top-level docstring in revision.py). This unfortunately makes the file names even worse.

I've also introduced an extra helper which I don't like too much (detatch_spras_revision) due to its readibility, so I may move calls of it to runner.py to make it easier to read.

@tristan-f-r tristan-f-r marked this pull request as ready for review March 9, 2026 19:07
@agitter
Copy link
Collaborator

agitter commented Mar 14, 2026

We should find a place to document these major changes for users or developers

To address this, should we create a new doc in docs/fordevs? Something like "SPRAS concepts" that gives an overview of major design decisions or implementation aspects that are not obvious.

@github-actions github-actions bot added the merge-conflict This PR has merge conflicts. label Mar 16, 2026
@github-actions github-actions bot removed the merge-conflict This PR has merge conflicts. label Mar 16, 2026
Copy link
Collaborator

@ntalluri ntalluri left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I ran the code locally, the hashes aren't as long as I thought, still very manageable. I also left so changes and questions.

@agitter agitter merged commit fcdfb0c into Reed-CompBio:main Mar 18, 2026
20 checks passed
@tristan-f-r tristan-f-r deleted the hash branch March 18, 2026 20:18
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

enhancement New feature or request needed for benchmarking Priority PRs needed for the benchmarking paper P-high This is a blocker for many PRs/issues/features tuning Workflow-spanning algorithm tuning

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants