This adds the unique spras_revision to every single parameter combination (before hashing) and the dataset label, to provide OSDF support on the level of deterministic algorithms.
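As a rough illustration of the idea (not the code in this PR), mixing a revision string into a parameter combination before hashing might look like the sketch below. The function name `hash_params_with_revision`, the `_spras_revision` key, the SHA-1/base32 encoding, and the 7-character length are all assumptions made for the example.

```python
import base64
import hashlib
import json


def hash_params_with_revision(params: dict, spras_revision: str, length: int = 7) -> str:
    """Return a short, deterministic hash of params salted with a revision string.

    Hypothetical helper: the key name and encoding are assumptions, not SPRAS's API.
    """
    # Copy so the caller's dictionary is not mutated.
    salted = dict(params)
    salted["_spras_revision"] = spras_revision
    # sort_keys makes the serialization independent of insertion order.
    serialized = json.dumps(salted, sort_keys=True).encode("utf-8")
    digest = hashlib.sha1(serialized).digest()
    # Base32 uses only uppercase letters and the digits 2-7, which is safe in directory names.
    return base64.b32encode(digest).decode("ascii")[:length]
```

With this approach, the same parameters hash to a different value under a different revision, so outputs from an older revision would not be picked up by accident.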
agitter left a comment
This is getting closer and the scope looks better now that the labeled outputs are optional. I still will need a final end-to-end pass because I've been reviewing in chunks.
Some high-level comments:
- We should find a place to document these major changes for users or developers. If we are using `config.yaml` as the temporary place to document config options, that's fine. Developers won't know what immutability means and the main concepts to be aware of unless they find these comments in the source.
- The parameters files in the logs/output are not immutable. Do they need to be?
- The output subdirectories like `data0_72021389-bowtiebuilder-params-4YLQBJ5` take some getting used to. I'm not sure what else we could do to make it more concise other than use a shorter hash. The eval subdirectories also end up including the tag twice: `data0_72021389-gs0_72021389-eval`.
I've moved the revision code over to I've also introduced an extra helper which I don't like too much ( |
To address this, should we create a new doc in |
ntalluri left a comment
I ran the code locally; the hashes aren't as long as I thought and are still very manageable. I also left some changes and questions.
Co-authored-by: Neha Talluri <78840540+ntalluri@users.noreply.github.com>
This change means that output files will not be reused whenever SPRAS is updated if `immutable_files` is true, furthering the immutability goal necessary to get OSDF integration working for SPRAS benchmarking. ('Updated' depends on the git commit hash or the actual SPRAS release version.)

This adds the unique `spras_revision` to dataset, gold standard, and algorithm labels to provide OSDF support on the level of deterministic, non-seeded algorithms when datasets are immutable.

This has the added benefit of allowing SPRAS users to simply upgrade their SPRAS version without needing to clear `output`, which complements #380. The refactored test also partially covers #165 and #45. (This is also where the majority of the code comes from: the actual feature patch here is a 50-line change.)

See #321, implemented by #335, for handling nondeterministic algorithms / seeded algorithms.
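The label tagging described above could be sketched roughly as follows. This is a minimal illustration, not the PR's code: the helper name `tag_label` and the `label_revision` format are assumptions inferred from directory names quoted in this thread.

```python
def tag_label(label: str, spras_revision: str, immutable_files: bool) -> str:
    """Append the revision to a label only when immutable outputs are requested.

    Hypothetical helper; the underscore-separated format is an assumption.
    """
    if not immutable_files:
        return label
    return f"{label}_{spras_revision}"


# Composing an evaluation directory name from two tagged labels.
dataset = tag_label("data0", "72021389", immutable_files=True)
gold_standard = tag_label("gs0", "72021389", immutable_files=True)
eval_dir = f"{dataset}-{gold_standard}-eval"
```

Because each label is tagged independently in this sketch, a name built from two labels carries the revision twice, which matches the doubled tag noted in the review.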
To make this change, a significant test refactor in `test/analysis` was needed to remove hardcoded paths (which contained the hashes being modified per-commit in this PR). It turns out that whenever we make any change to the hash, this test breaks! (The patch here fixes this.) That's why so many other PRs depend on this one.
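One way to keep a test independent of a particular hash is to build paths from their parts at run time instead of writing them as literals. The sketch below assumes the directory layout suggested by the names in this thread; `algorithm_output_dir`, `pathway_exists`, and the `pathway.txt` file name are hypothetical and are not taken from the refactored test.

```python
from pathlib import Path


def algorithm_output_dir(out_root: Path, dataset_label: str, algorithm: str, params_hash: str) -> Path:
    """Build an output directory path from its parts (layout is an assumption)."""
    return out_root / f"{dataset_label}-{algorithm}-params-{params_hash}"


def pathway_exists(out_root: Path, dataset_label: str, algorithm: str, params_hash: str) -> bool:
    """Check for an output file without embedding any hash in the test itself."""
    # The label and hash are passed in by the caller (for example, computed from
    # the loaded config), so a change to the hashing scheme does not break the test.
    out_dir = algorithm_output_dir(out_root, dataset_label, algorithm, params_hash)
    return (out_dir / "pathway.txt").is_file()
```

A test written this way would compute `dataset_label` and `params_hash` with the same code the workflow uses, then assert on `pathway_exists(...)`.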