Currently, `hyperfine` seems to either abort as soon as a command fails even once, or treat failures exactly the same as successes.
I'd like an option for either:
- ignoring the failed runs (report their number, but otherwise compute statistics as though they never happened). Some possible names for such an option: `--omit-failed-runs`, `--forget-failed-runs`, or `--skip-failed-runs`. This is slightly confusing in the presence of the existing `--ignore-failure`, which I think should be renamed; see "Suggestion: rename `--ignore-failure` to `--ignore-exit-code` for Hyperfine 2.0" (#828).
- putting the failures in a different "bucket", reporting the statistics for the successful runs and the failed runs separately.
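The separate-bucket reporting could already be prototyped today as post-processing on hyperfine's JSON export. A minimal sketch, assuming a hyperfine version whose `--export-json` output includes a per-run `exit_codes` list alongside `times` (the field names below follow that format; the function itself is hypothetical, not part of hyperfine):

```python
import statistics

def bucket_stats(result):
    """Split one entry of hyperfine's --export-json "results" array into
    successful and failed runs, keyed on the parallel "times" and
    "exit_codes" lists, and summarize each bucket separately."""
    pairs = list(zip(result["times"], result["exit_codes"]))
    ok = [t for t, code in pairs if code == 0]
    failed = [t for t, code in pairs if code != 0]

    def summarize(times):
        # Return None for an empty bucket; stdev needs at least two samples.
        if not times:
            return None
        return {
            "n": len(times),
            "mean": statistics.mean(times),
            "stddev": statistics.stdev(times) if len(times) > 1 else 0.0,
        }

    return {"ok": summarize(ok), "failed": summarize(failed)}
```

Usage would be something like `bucket_stats(json.load(open("results.json"))["results"][0])`.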
Out of scope (as far as I'm concerned), but maybe worth discussing.
There is also a potential feature of bucketing results based on other data, say the exact exit code, or whether it takes more than X seconds, but that's less important to me now.
Another possibility that I'd consider out of scope is automatically finding the buckets, e.g. trying to fit a sum of Gaussian distributions instead of one Gaussian distribution onto the measurements.
## My use-case
I am trying to benchmark a test run that includes a flaky test which sometimes deadlocks (around 10% of the time). Successful runs take about 3 minutes; unsuccessful ones effectively run forever. So, I do:
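The exact command was elided in the original report. A hypothetical invocation matching the description (10 runs of a ~3-minute flaky test suite) might look like the following, with `./run-tests.sh` standing in for the real test command:

```shell
# Hypothetical sketch; "./run-tests.sh" is a placeholder for the actual suite.
# coreutils `timeout` turns a deadlocked run into a failure (exit code 124)
# instead of hanging forever, and --ignore-failure keeps hyperfine from
# aborting on that failure -- but the timed-out runs still pollute the stats.
hyperfine --runs 10 --ignore-failure 'timeout 600 ./run-tests.sh'
```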
However, when the test does deadlock in any of the 10 runs, the whole benchmark is wasted.