Greptile Summary
Removes Note:
Confidence Score: 5/5
Safe to merge: documentation-only change with no code logic impact.
All three files receive identical, targeted doc fixes (flag removal). No code, config, or test logic is altered. The remaining observation about osmo/finetune.yaml is informational and out of scope for this PR.
No files require special attention; all changes are straightforward documentation edits.
Important Files Changed
Flowchart
%%{init: {'theme': 'neutral'}}%%
flowchart TD
A[Start multi-GPU training\ntorchrun --nproc_per_node=8] --> B{--use-wandb\npassed?}
B -- Yes\n(before this PR) --> C[wandb prompts user\nfor login/project]
C --> D[Secondary GPU logs\nflood stdout]
D --> E[Prompt buried /\nnever answered]
E --> F[Data loading\ntimeout / hang]
B -- No\n(after this PR) --> G[Training starts\nimmediately]
G --> H[Completes successfully]
Summary
Addresses https://nvbugspro.nvidia.com/bug/6062848
Detailed description
In a multi-GPU setup, stdout is flooded with logs from the secondary GPU processes, so the wandb prompt requesting user input gets buried in the output. Because the prompt goes unanswered, data loading stalls and eventually times out.
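
This PR only touches documentation, but the failure mode above could also be guarded against in a training script. The following is a minimal sketch, not code from this repository: `should_enable_wandb` is a hypothetical helper, and it assumes the script is launched with `torchrun` (which exports `RANK` and `WORLD_SIZE` to each worker) and that wandb is configured through its `WANDB_API_KEY` or `WANDB_MODE` environment variables. It ignores credentials stored elsewhere (for example in `~/.netrc`), so it errs on the side of disabling logging.

```python
import os
import sys


def should_enable_wandb(use_wandb_flag, env=None, stdin_is_tty=None):
    """Hypothetical helper: True only if wandb can start without blocking on a prompt."""
    env = os.environ if env is None else env
    if stdin_is_tty is None:
        stdin_is_tty = sys.stdin is not None and sys.stdin.isatty()

    # The user did not ask for wandb logging at all.
    if not use_wandb_flag:
        return False

    # torchrun sets RANK per worker; only rank 0 should talk to wandb.
    if int(env.get("RANK", "0")) != 0:
        return False

    # With an API key or a non-online mode, wandb has no reason to prompt.
    if env.get("WANDB_API_KEY") or env.get("WANDB_MODE") in ("offline", "disabled"):
        return True

    # Otherwise a login prompt is possible: allow it only for a
    # single-process run attached to a terminal, where it stays visible.
    return int(env.get("WORLD_SIZE", "1")) == 1 and stdin_is_tty
```

A script could call this once before initializing logging and fall back to plain stdout logging when it returns `False`, so that a multi-GPU run without credentials starts training immediately instead of waiting on input.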