-
Notifications
You must be signed in to change notification settings - Fork 22
Add uniqueness restriction on names #567
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Open
niemela
wants to merge
1
commit into
master
Choose a base branch
from
stricter-naming
base: master
Could not load branches
Branch not found: {{ refName }}
Loading
Could not load tags
Nothing to show
Loading
Are you sure you want to change the base?
Some commits from the old base branch may be removed from the timeline,
and old review comments may become outdated.
Open
Changes from all commits
Commits
File filter
Filter by extension
Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
There are no files selected for viewing
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Add this suggestion to a batch that can be applied as a single commit.
This suggestion is invalid because no changes were made to the code.
Suggestions cannot be applied while the pull request is closed.
Suggestions cannot be applied while viewing a subset of changes.
Only one suggestion per line can be applied in a batch.
Add this suggestion to a batch that can be applied as a single commit.
Applying suggestions on deleted lines is not supported.
You must change the existing code in this line in order to create a valid suggestion.
Outdated suggestions cannot be applied.
This suggestion has been applied or marked resolved.
Suggestions cannot be applied from pending reviews.
Suggestions cannot be applied on multi-line comments.
Suggestions cannot be applied while the pull request is queued to merge.
Suggestion cannot be applied right now. Please check back later.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Maybe we should be a tiny bit more specific with the definition of equivalence? Surely we don't care about output validator flags in this definition, right?
For any two test cases, if the contents of their .in and .files directory are equivalent, as well as the args sequence in the .yaml file, then the input of the two test cases is equivalent. For any two test cases, if their input, output validator arguments and the contents of their .ans files are equivalent, then the test cases are equivalent.At the very least, we should say "if their inputs are equivalent". Additionally, we should probably either copy paste the definition or link to it.
Uh oh!
There was an error while loading. Please reload this page.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Agreed, I want to be able to reuse a
.infile with different output validator flags.There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ok, maybe we do, but then we have two different kinds of equivalence. The one you want to use is "the inputs are equivalent", the one we already have defined and that I used is "the test cases are equivalent". The latter allows a judge system to reuse the results of judging the test case, the former does not. This is why I would like to use that definition.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes. For a concrete example: Sweden has a problem asking "find the min and max possible thing for a given input", with subtask "you only need to find the max correctly". I would argue that the most correct solution in this instance is that this property is part of the group via
output_validator_flags, not any testcase itself, and we want to be able to reuse them (problem in question: https://po2punkt0.kattis.com/problems/robottavlingen)There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ah... I see you have not read the entirety of me and @thorehusfeldt's discussion in #523 😏.
In that discussion the consensus seems to go towards
output_validator_flagsbeing part of "the test case". I think @thorehusfeldt's is arguing from a point of "the sameness of a test case should imply the sameness of the judgement of said test case", and I would agree with that. I feels strange to say that you could pass a test case and then fail "the same" test case? They are quite obviously not the same then.So, IMO, what you are talking about is identical input, not identical test cases. I would argue that that can be sufficiently handled by symlinks of copying?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is also broken in a different way: it breaks every single randomized submission. The judge is basically allowed to rerun a submission an infinite number of times and check that it fails 0 times (so basically until it fails). Clearly that is not what we want right?
We very specifically mean: run this test case, and run it exactly once.
So I guess the discussion here is: can we change it to at most once for 'identical' testcases (for some definition of identical).
I would suggest that every
data/**/*.incorresponds to exactly 1 run, as it I have always understood it.If you want to avoid this, use
require_pass: easier_groupinstead to be explicit.this sounds reasonable. Note that this does not imply the opposite: if two files have the same content, they should have the same base name.
Uh oh!
There was an error while loading. Please reload this page.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Playing the devil's advocate: if your submission gets TLE on codeforces, it will be rerun under certain conditions. Even if it's strange, is this something we want to forbid?
This is not sufficient. For example: hamiltonian path. 50% of the points for deciding existence, and 100% for outputting an actual path. The behavior we want here for strict subgroups is to rerun every test case and take min (which can be optimized by the determinism assumption). If we remove the determinism assumption, we could of course patch this by assigning the caching behavior to symlinks.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Am I missing something or would a setting on the problem such as
which defaults to true not solve this?
I feel like this is always a configuration on the problem and in almost all cases you should be allowed to assume determinism without any real issues.
If you want more granularity you could even set it on the test group instead of the problem itself.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can you expand on this?
Ah, interesting motvating example. I hadn't considered this so far.
That's sort of fine, but then requires implementations to actually implement and read this flag at the right times; not sure that will happen in practice.
A problem remains though: there can be fundamentally non-deterministic solutions to eg the hamiltonpath problem. If you assume a non-deterministic solution is deterministic, that kinda sounds like undefined behaviour.
Instead, I'd argue this is somewhat orthogonal to whether the solution is deterministic, and the flag should be called
dedup_identical_testcaseswhich could default totrueI suppose. (Or it could default totruewhen groups are used and default tofalseotherwise, but that is kinda complex.)There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I am not sure if this is still a thing because it was abused to by time heavy/random submission... (it was better to timeout instead of producing WA because than you got another shot). So yeah I would want to forbid this.
This does not at all solve the issue I have. If you assume something that is not true you will get undefined behavior. We should definitely avoid this.
As far as I can see we have two somewhat independent issues:
For the second part I am technically fine with both options, but I personally prefer option 1. Mostly because I have not seen any good argument for option 2. The arguments I saw were all of the form "we want to reuse the outcome of a previous run" but then it is clear to me that we should "reuse the outcome of a previous run" and not "reuse a testcase".