Launch training jobs #6

@FlorianBertonBrightClue

Description

There seems to be an issue when you launch a training job for the first time.

In base_training_job.py, line 203, you check whether the checkpoint subfolder exists and create it if it does not. However, this directory is a child of log_folder/training_job_name.

Then, at line 217, you check whether the log folder log_folder/training_job_name exists, in order to decide whether the training job should initialize the folder and its parameters or resume from a checkpoint.

The issue is that this folder is guaranteed to exist at that point, because creating the checkpoint subfolder at line 203 also creates its parent. So the boolean __found_job_folder is always True, which implies that a ".yml" file should already be present, and on a first launch it is not.

As a result, when we enter __initialize_training_job(), instead of saving the parameters we try to load them (line 747), and an error is then raised in __load_training_parameters().
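
To illustrate the ordering problem, here is a minimal, self-contained sketch. It is not the code from base_training_job.py: the function names and the "checkpoints" subfolder name are hypothetical, and only the order of the two operations matches what is described above. One possible fix, shown in the second function, is to record whether the job folder exists before creating the checkpoint subfolder.

```python
import os
import tempfile


def job_folder_found_buggy(log_folder, job_name):
    # Hypothetical stand-in for the logic around lines 203 and 217.
    job_folder = os.path.join(log_folder, job_name)
    # Creating the checkpoint subfolder also creates job_folder as its parent.
    os.makedirs(os.path.join(job_folder, "checkpoints"), exist_ok=True)
    # So this check can never return False, even on a first launch.
    return os.path.isdir(job_folder)


def job_folder_found_fixed(log_folder, job_name):
    # Hypothetical fix: check for the job folder before touching the filesystem.
    job_folder = os.path.join(log_folder, job_name)
    found = os.path.isdir(job_folder)
    os.makedirs(os.path.join(job_folder, "checkpoints"), exist_ok=True)
    return found


with tempfile.TemporaryDirectory() as log_folder:
    first_buggy = job_folder_found_buggy(log_folder, "job_a")    # first launch
    first_fixed = job_folder_found_fixed(log_folder, "job_b")    # first launch
    second_fixed = job_folder_found_fixed(log_folder, "job_b")   # relaunch
    print(first_buggy, first_fixed, second_fixed)  # True False True
```

Another option, which I have not tried against the real code, would be to test for the ".yml" parameter file itself rather than for the folder.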
