
feat: unify training rendering and harden FrozenLake VL #176

Merged: benjibc merged 7 commits into main from benjibc/frozenlake-vl-rollout-hardening on Mar 7, 2026



@benjibc (Contributor) commented on Mar 7, 2026

Summary

  • harden the FrozenLake multimodal rollout path for Kimi K2.5 VL and Qwen3 VL, including prompt-faithful token/mask accounting, verifier-side validation, and validation screenshots for ep logs
  • unify cookbook training onto shared token-level rendering/masking so SFT, DPO, ORPO, and FrozenLake RL all use the same target_tokens + weights representation
  • keep FrozenLake training aligned with eval-protocol by deriving the training datum from the same per-token mask that the UI renders
  • add a qwen3-4b smoke-test CI workflow for SFT, DPO, and GRPO, plus local unit/import coverage for the shared rendering path
  • require eval-protocol>=0.3.23 so the cookbook picks up the released ep logs transcript/token-debug UI changes and the relaxed fireworks-ai dependency range from eval-protocol
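To illustrate what a shared "target_tokens + weights" representation can look like, here is a minimal sketch. It assumes a conversation has already been tokenized into segments, each flagged as trainable (e.g. assistant turns) or not (prompt, tool output, images), and that targets are the usual next-token shift of the input. The names `TokenDatum` and `render_segments` are hypothetical and are not the cookbook's actual API.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class TokenDatum:
    # Hypothetical container; the real cookbook datum type may differ.
    input_tokens: List[int]
    target_tokens: List[int]
    weights: List[float]  # weights[i] applies to target_tokens[i]


def render_segments(segments: List[Tuple[List[int], bool]]) -> TokenDatum:
    """Flatten (tokens, trainable) segments into one per-token mask,
    then shift by one position for next-token prediction (assumed)."""
    tokens: List[int] = []
    mask: List[float] = []
    for seg_tokens, trainable in segments:
        tokens.extend(seg_tokens)
        mask.extend([1.0 if trainable else 0.0] * len(seg_tokens))
    if len(tokens) < 2:
        raise ValueError("need at least two tokens to form next-token targets")
    # Token i predicts token i + 1, so the weight for each target
    # is the mask value of the token being predicted.
    return TokenDatum(
        input_tokens=tokens[:-1],
        target_tokens=tokens[1:],
        weights=mask[1:],
    )
```

Because the same per-token mask feeds both the datum and any UI view of it, a loss mask and a rendered transcript cannot silently disagree. Under this assumption SFT uses one datum per example, while DPO or ORPO would render the chosen and rejected completions with the same function.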

Eval Protocol Releases

Validation

  • ./training/.venv/bin/pytest -q training/tests/unit training/tests/test_smoke_imports.py training/examples/frozen_lake/test_masking.py
  • env -u FIREWORKS_API_KEY -u FIREWORKS_ACCOUNT_ID -u FIREWORKS_BASE_URL -u FIREWORKS_INFERENCE_URL -u FIREWORKS_HOTLOAD_API_URL -u FIREWORKS_GATEWAY_SECRET ./training/.venv/bin/pytest -q training/tests/smoke_test/test_sft_smoke.py training/tests/smoke_test/test_dpo_smoke.py training/tests/smoke_test/test_grpo_smoke.py
  • real visual verifier runs against:
    • accounts/fireworks/models/kimi-k2p5
    • accounts/fireworks/models/qwen3-vl-30b-a3b-instruct
  • Chromium verification against the live ep logs UI

Smoke CI

The new workflow expects these GitHub secrets/vars for remote smoke runs:

  • secrets: FIREWORKS_API_KEY, FIREWORKS_ACCOUNT_ID
  • optional vars: FIREWORKS_SMOKE_TRAINING_SHAPE, FIREWORKS_SMOKE_BASE_MODEL, FIREWORKS_SMOKE_TOKENIZER_MODEL, FIREWORKS_BASE_URL, FIREWORKS_INFERENCE_URL, FIREWORKS_HOTLOAD_API_URL, FIREWORKS_CUSTOM_IMAGE_TAG

Defaults are wired to:

  • model: accounts/fireworks/models/qwen3-4b
  • tokenizer: Qwen/Qwen3-4B
  • shape: ts-qwen3-4b-smoke-v1
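A minimal sketch of how a smoke test might resolve these settings, assuming unset or empty variables fall back to the defaults listed above and that remote runs are enabled only when both required secrets are present. `SmokeConfig` and `resolve_smoke_config` are hypothetical names; the workflow's actual resolution logic may differ.

```python
import os
from dataclasses import dataclass
from typing import Mapping, Optional

# Defaults as listed in this PR description.
DEFAULT_BASE_MODEL = "accounts/fireworks/models/qwen3-4b"
DEFAULT_TOKENIZER_MODEL = "Qwen/Qwen3-4B"
DEFAULT_TRAINING_SHAPE = "ts-qwen3-4b-smoke-v1"


@dataclass(frozen=True)
class SmokeConfig:
    # Hypothetical config object for illustration only.
    base_model: str
    tokenizer_model: str
    training_shape: str
    remote_enabled: bool


def resolve_smoke_config(env: Optional[Mapping[str, str]] = None) -> SmokeConfig:
    env = os.environ if env is None else env
    # Assumption: remote runs need both secrets; otherwise tests can skip.
    remote = all(env.get(k) for k in ("FIREWORKS_API_KEY", "FIREWORKS_ACCOUNT_ID"))
    return SmokeConfig(
        base_model=env.get("FIREWORKS_SMOKE_BASE_MODEL") or DEFAULT_BASE_MODEL,
        tokenizer_model=env.get("FIREWORKS_SMOKE_TOKENIZER_MODEL") or DEFAULT_TOKENIZER_MODEL,
        training_shape=env.get("FIREWORKS_SMOKE_TRAINING_SHAPE") or DEFAULT_TRAINING_SHAPE,
        remote_enabled=remote,
    )
```

A test could check `remote_enabled` and call `pytest.skip(...)` when it is `False`, which keeps the same smoke suite usable both locally without credentials and in CI with secrets configured.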

Screenshots

Kimi K2.5 VL

[Screenshot: Kimi ep logs token debug]

Qwen3 VL

[Screenshot: Qwen3 VL ep logs token debug]

@benjibc changed the title from "feat: harden FrozenLake visual tool-call rollouts" to "feat: unify training rendering and harden FrozenLake VL" on Mar 7, 2026
@benjibc merged commit b5bd3a1 into main on Mar 7, 2026
4 checks passed
@benjibc deleted the benjibc/frozenlake-vl-rollout-hardening branch on March 7, 2026 at 05:09