Remove some randomization in the example and in the code base. #344
lionelkusch merged 56 commits into mind-inria:main
Codecov Report
✅ All modified and coverable lines are covered by tests.
Additional details and impacted files
@@            Coverage Diff             @@
##             main     #344      +/-   ##
==========================================
- Coverage   98.08%   98.07%   -0.01%
==========================================
  Files          22       22
  Lines        1147     1144       -3
==========================================
- Hits         1125     1122       -3
  Misses         22       22
jpaillard
left a comment
I can't reproduce the reproducibility issue you describe for CFI and knockoffs. Can you point more precisely to the problematic example/test?
The problem is with
The difference is quite minor, but there are always some variations.
jpaillard
left a comment
Thanks for the details. I could reproduce the issue.
It seems that the problem comes from nested parallel loops: parallel calls of try_reproducibility.run_joblib, each of which has its own inner parallelization of plot_knockoff_aggregation.single_run.
The inner processes might unpredictably inherit some state of the parent.
To fix it you can use Parallel(n_jobs=<nb_of_jobs>, require='sharedmem') or simply set n_jobs=1 in try_reproducibility.
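A minimal sketch of the two suggested settings, assuming joblib and NumPy are available. The functions `single_run` and `run_all` are hypothetical stand-ins, not the functions from the example or from try_reproducibility, and passing an explicit seed to each run is an extra assumption of this sketch:

```python
import numpy as np
from joblib import Parallel, delayed


def single_run(seed):
    # Hypothetical stand-in for one run of the example: all randomness comes
    # from a generator built from the explicit seed, not from global state.
    rng = np.random.default_rng(seed)
    return float(rng.normal(size=100).mean())


def run_all(seeds, n_jobs):
    # require="sharedmem" makes joblib select a thread-based backend, so the
    # workers are threads of the current process rather than new processes.
    return Parallel(n_jobs=n_jobs, require="sharedmem")(
        delayed(single_run)(seed) for seed in seeds
    )


seeds = list(range(5))
sequential = run_all(seeds, n_jobs=1)  # the "simply set n_jobs=1" option
threaded = run_all(seeds, n_jobs=2)    # the require="sharedmem" option
print(sequential == threaded)
```

Both calls return results in the order of `seeds`, so the two lists can be compared directly.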
I fixed the issue for plot_knockoff_aggregation.
For
There is still a bit of uncontrolled randomness in plot_knockoff_aggregation, but it will be easier to debug after reformatting with the new API.
I updated the seed management: I had forgotten that it is better to take seeds from a range of values than to draw them from a random generator, because a random generator can produce the same number twice.
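To illustrate the point about seeds, here is a small sketch assuming NumPy. The variable names are hypothetical, and the deliberately tiny interval for the drawn seeds is only there to make the collision visible; it is not what the code base does:

```python
import numpy as np

n_runs = 5

# Seeds drawn from a random generator: nothing prevents the same value from
# appearing twice. With only 3 possible values and 5 draws, a duplicate is
# guaranteed here (assumption: small interval chosen for illustration).
rng = np.random.default_rng(0)
drawn_seeds = rng.integers(0, 3, size=n_runs).tolist()
print(len(set(drawn_seeds)) < n_runs)

# Seeds taken from a range: distinct by construction.
range_seeds = list(range(n_runs))
print(len(set(range_seeds)) == n_runs)

# One independent generator per run, each built from its own seed.
generators = [np.random.default_rng(seed) for seed in range_seeds]
first_draws = [float(g.random()) for g in generators]
```

With a realistic interval a collision is far less likely than in this sketch, but only the range rules it out entirely.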
jpaillard
left a comment
Thank you.
I have one last comment.
bthirion
left a comment
I have some very minor comments.
There is no need to make KFold random in the examples (it is better to use ShuffleSplit if we want a random splitter, but that is then the user's choice).
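As a rough illustration of the difference between the two splitters, assuming scikit-learn and NumPy, with toy data invented for this sketch:

```python
import numpy as np
from sklearn.model_selection import KFold, ShuffleSplit

# Toy data: 10 samples, 2 features (illustrative only).
X = np.arange(20).reshape(10, 2)

# KFold without shuffling is deterministic, so it needs no seed.
kfold = KFold(n_splits=5)
folds_a = [test.tolist() for _, test in kfold.split(X)]
folds_b = [test.tolist() for _, test in kfold.split(X)]
print(folds_a == folds_b)

# ShuffleSplit is the random splitter; an integer random_state makes its
# splits repeatable across calls.
splitter = ShuffleSplit(n_splits=5, test_size=0.2, random_state=0)
splits_a = [test.tolist() for _, test in splitter.split(X)]
splits_b = [test.tolist() for _, test in splitter.split(X)]
print(splits_a == splits_b)
```

Whether a random splitter is wanted at all remains a choice for whoever runs the example.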
Co-authored-by: bthirion <bertrand.thirion@inria.fr>
Last review before merging.
Co-authored-by: Joseph Paillard <joseph.paillard@inria.fr>
I tried to improve the reproducibility of the example by setting seeds in it and by managing the random generator better in the code base.
However, there is still some randomization in PermutationFeatureImportance and in Model_X_Knockoff.