fix: delegate offline mode control from container template to train.py #16

Merged
Neonkraft merged 1 commit into main from fix/offline-env-vars on Apr 29, 2026


@Neonkraft
Collaborator

Summary

The containerised TRL SLURM template (job_trl_container.sh.jinja) hardcoded HF_HUB_OFFLINE=1, HF_DATASETS_OFFLINE=1, and TRANSFORMERS_OFFLINE=1 inside the container regardless of config.offline. As a result, offline: false had no effect at runtime: the container always ran in offline mode and would stall if models weren't already cached.

This PR removes those hardcoded flags from the template and delegates ownership to train.py, which already reads config.offline to set these vars. An else branch is added to explicitly zero out the flags when offline: false, guarding against any stale values inherited from the container environment.
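The if/else behaviour described above could look roughly like the sketch below. It is not the code from train.py: the helper name `apply_offline_mode`, the `OFFLINE_ENV_VARS` tuple, and the optional `env` argument are hypothetical, introduced only for illustration. The three variable names and the "1"/"0" values come from this PR.

```python
import os

# The three flags named in this PR.
OFFLINE_ENV_VARS = ("HF_HUB_OFFLINE", "HF_DATASETS_OFFLINE", "TRANSFORMERS_OFFLINE")


def apply_offline_mode(offline, env=None):
    """Set every offline flag to "1" or "0".

    Hypothetical helper, not the PR's actual code. `env` defaults to
    os.environ; passing a plain dict makes the behaviour easy to check.
    """
    target = os.environ if env is None else env
    value = "1" if offline else "0"
    for name in OFFLINE_ENV_VARS:
        # Always overwrite, so a value inherited from the container
        # environment cannot survive when offline is False.
        target[name] = value
    return {name: target[name] for name in OFFLINE_ENV_VARS}


# Simulate a container that leaked stale offline flags into the job.
inherited = {"HF_HUB_OFFLINE": "1", "TRANSFORMERS_OFFLINE": "1"}
result = apply_offline_mode(False, inherited)
print(result)
```

Writing "0" rather than leaving the variables alone is what guards against stale values. A caveat worth checking in practice: libraries that read these flags when they are imported would only see the new values if a helper like this runs before those imports.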

Type of change

  • Bug fix
  • New feature
  • Refactor
  • Performance
  • Documentation
  • Maintenance

The container SLURM template was hardcoding HF_HUB_OFFLINE=1 etc.
regardless of config.offline, so offline: false had no effect inside
the container and jobs would stall trying to reach the Hub at runtime.

Remove the three hardcoded offline flags from job_trl_container.sh.jinja
and let train.py own this: the existing if config.offline block sets them
to 1, and a new else block explicitly sets them to 0 to clear any value
inherited from the container environment.
@Neonkraft merged commit 1787551 into main on Apr 29, 2026
2 checks passed
