Convert single task study calls to a task call #303
This avoids performing a call to retrieve the study and all of its dataset metadata (2 calls total).
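The conversion described above can be sketched roughly as follows. This is only an illustration, not the code from this PR: the function name `single_task_benchmark`, the `STUDY_PATTERN` constant, and the assumption that the OpenML task id is already known when the job is created are all hypothetical.

```python
import re
from typing import Sequence

# Hypothetical pattern for a study reference such as "openml/s/264".
STUDY_PATTERN = re.compile(r"^openml/s/(\d+)$")


def single_task_benchmark(benchmark: str, task_ids: Sequence[int]) -> str:
    """Return "openml/t/<id>" when a study is narrowed down to exactly one task.

    Any other benchmark reference (a local benchmark file, or a study with
    several tasks) is returned unchanged.
    """
    if STUDY_PATTERN.match(benchmark) and len(task_ids) == 1:
        # A job that runs a single task can reference the task directly,
        # so it does not need to fetch the study first.
        return f"openml/t/{task_ids[0]}"
    return benchmark
```

With this sketch, `single_task_benchmark("openml/s/264", [61])` would yield a direct task reference, while a study with several tasks or a benchmark file name passes through untouched. The task id 61 is used here purely for illustration.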
Is this a fix that will reduce the frequency of openml server errors? Does this impact runs done with pre-existing benchmark files instead of studies (such as |
It reduces requests only when the benchmark was specified with the |
        _task_names = []
    else:
        _task_names = task_names
You don't need to change the task_names variable, it will work either way.
Did you try it?
I want to be sure that the folder structure generated on s3 is the same as before and that this change is not making it more difficult to retrieve results from s3 a posteriori.
Currently, s3 is the long-term storage for results and those are organized by sessions, and inside the session, each folder contains the original benchmark name and the task name, which makes it relatively easy to download only a specific result.
I think it should be fine, though, as aws mode runs benchmarks using --session= (which removes the session folder on the ec2 instance to avoid an additional subfolder), and this should prevent the modified params from appearing anywhere.
Testing with python runbenchmark.py constantpredictor openml/s/264 -m aws -f 0, the structure on the bucket seems the same, but the local result directory is actually different. Both have the same aws.openml_s_264.test.all_tasks.0.constantpredictor subdirectory with the data from that run, but the main directory of this branch does not contain logs and logs.zip.
You seem to be correct that task names don't need to be modified, though I find the openml/t/61 -t iris notation a bit odd to explicitly support.
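For checking that results stay easy to locate a posteriori, the subdirectory name quoted above can be split into its parts. This is a minimal sketch under loud assumptions: the field names (mode, benchmark, constraint, task, fold, framework) are guessed from that single example and are not confirmed here, and the sketch only works if no component contains a dot itself.

```python
from typing import NamedTuple


class JobFolder(NamedTuple):
    # Field names are assumptions inferred from one example folder name.
    mode: str
    benchmark: str
    constraint: str
    task: str
    fold: int
    framework: str


def parse_job_folder(name: str) -> JobFolder:
    """Split a dot-separated result folder name into its six assumed parts."""
    parts = name.split(".")
    if len(parts) != 6:
        raise ValueError(f"expected 6 dot-separated parts, got {len(parts)}: {name!r}")
    mode, benchmark, constraint, task, fold, framework = parts
    return JobFolder(mode, benchmark, constraint, task, int(fold), framework)
```

Comparing the parsed `benchmark` and `task` fields between a run on this branch and a run on the main branch would be one way to confirm that the original benchmark name is still what ends up in the folder name.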
This avoids performing a call to retrieve the study and all of its dataset metadata (2 calls total) for each job.