feat: unify training rendering and harden FrozenLake VL by benjibc · Pull Request #176 · fw-ai/cookbook

benjibc · 2026-03-07T00:54:53Z

Summary

- Harden the FrozenLake multimodal rollout path for Kimi K2.5 VL and Qwen3 VL, including prompt-faithful token/mask accounting, verifier-side validation, and validation screenshots for ep logs.
- Unify cookbook training on shared token-level rendering/masking, so SFT, DPO, ORPO, and FrozenLake RL all use the same target_tokens + weights representation.
- Keep FrozenLake training aligned with eval-protocol by deriving the training datum from the same per-token mask that the UI renders.
- Add a qwen3-4b smoke-test CI workflow for SFT, DPO, and GRPO, plus local unit/import coverage for the shared rendering path.
- Require eval-protocol>=0.3.23 so the cookbook picks up the released ep logs transcript/token-debug UI changes and eval-protocol's relaxed fireworks-ai dependency range.
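The shared target_tokens + weights representation and the "same mask for training and UI" idea in the summary can be sketched roughly as follows. This is a minimal illustration, not the cookbook's code: `TokenDatum`, `build_datum`, and `mask_to_spans` are hypothetical names, and the sketch assumes a per-token 0/1 mask (1 = token receives loss) plus the usual one-position shift for next-token prediction.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class TokenDatum:
    """Hypothetical container mirroring a target_tokens + weights layout."""
    input_tokens: List[int]
    target_tokens: List[int]
    weights: List[float]


def build_datum(tokens: List[int], mask: List[int]) -> TokenDatum:
    """Shift a rendered sequence for next-token prediction.

    mask[i] == 1 means token i is trainable (e.g. an assistant token).
    Position t predicts tokens[t + 1], so the weight at t is mask[t + 1].
    """
    if len(tokens) != len(mask):
        raise ValueError("tokens and mask must have the same length")
    if len(tokens) < 2:
        raise ValueError("need at least two tokens to form a target")
    return TokenDatum(
        input_tokens=tokens[:-1],
        target_tokens=tokens[1:],
        weights=[float(m) for m in mask[1:]],
    )


def mask_to_spans(mask: List[int]) -> List[Tuple[int, int, bool]]:
    """Group the same mask into contiguous (start, end, trained) runs for display."""
    spans: List[Tuple[int, int, bool]] = []
    start = 0
    for i in range(1, len(mask) + 1):
        # Close the current run at the end of the mask or when the value flips.
        if i == len(mask) or mask[i] != mask[start]:
            spans.append((start, i, bool(mask[start])))
            start = i
    return spans
```

For example, with `tokens = [101, 7, 8, 9, 42, 43, 102]` and `mask = [0, 0, 0, 0, 1, 1, 1]`, the datum's weights are `[0.0, 0.0, 0.0, 1.0, 1.0, 1.0]` and the display spans are `[(0, 4, False), (4, 7, True)]`. Because both come from one mask, a token shown as trained in the viewer is, under this sketch, exactly a token that contributes to the loss; for DPO or ORPO, the chosen and rejected completions would each be rendered into their own datum with the same helper.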

Eval Protocol Releases

UI PR: feat: align ep logs chat view with tokenized rollout prompts eval-protocol/python-sdk#431
Hotfix PR: Relax fireworks-ai dependency pin eval-protocol/python-sdk#432
Release: https://github.com/eval-protocol/python-sdk/releases/tag/v0.3.23
PyPI: eval-protocol==0.3.23

Validation

./training/.venv/bin/pytest -q training/tests/unit training/tests/test_smoke_imports.py training/examples/frozen_lake/test_masking.py
env -u FIREWORKS_API_KEY -u FIREWORKS_ACCOUNT_ID -u FIREWORKS_BASE_URL -u FIREWORKS_INFERENCE_URL -u FIREWORKS_HOTLOAD_API_URL -u FIREWORKS_GATEWAY_SECRET ./training/.venv/bin/pytest -q training/tests/smoke_test/test_sft_smoke.py training/tests/smoke_test/test_dpo_smoke.py training/tests/smoke_test/test_grpo_smoke.py
Real visual verifier runs against:
- accounts/fireworks/models/kimi-k2p5
- accounts/fireworks/models/qwen3-vl-30b-a3b-instruct
Chromium verification against the live ep logs UI

Smoke CI

The new workflow expects these GitHub secrets/vars for remote smoke runs:

secrets: FIREWORKS_API_KEY, FIREWORKS_ACCOUNT_ID
optional vars: FIREWORKS_SMOKE_TRAINING_SHAPE, FIREWORKS_SMOKE_BASE_MODEL, FIREWORKS_SMOKE_TOKENIZER_MODEL, FIREWORKS_BASE_URL, FIREWORKS_INFERENCE_URL, FIREWORKS_HOTLOAD_API_URL, FIREWORKS_CUSTOM_IMAGE_TAG

Defaults are wired to:

model: accounts/fireworks/models/qwen3-4b
tokenizer: Qwen/Qwen3-4B
shape: ts-qwen3-4b-smoke-v1
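A minimal sketch of how a smoke test might resolve these settings, assuming the secrets and optional vars are exposed to the job as environment variables with the same names. `SmokeConfig` and `load_smoke_config` are hypothetical, and returning `None` when the secrets are missing (so the caller can skip) is an assumption of this sketch, not something this PR confirms.

```python
import os
from dataclasses import dataclass
from typing import Mapping, Optional

# Defaults as listed in the PR description.
DEFAULT_BASE_MODEL = "accounts/fireworks/models/qwen3-4b"
DEFAULT_TOKENIZER_MODEL = "Qwen/Qwen3-4B"
DEFAULT_TRAINING_SHAPE = "ts-qwen3-4b-smoke-v1"


@dataclass(frozen=True)
class SmokeConfig:
    """Hypothetical resolved settings for one remote smoke run."""
    api_key: str
    account_id: str
    base_model: str
    tokenizer_model: str
    training_shape: str
    base_url: Optional[str]


def _get(env: Mapping[str, str], name: str, default: Optional[str] = None) -> Optional[str]:
    # Treat empty strings like unset values: an undefined repository variable
    # interpolated into a workflow `env:` block typically arrives as "".
    value = env.get(name)
    return value if value else default


def load_smoke_config(env: Optional[Mapping[str, str]] = None) -> Optional[SmokeConfig]:
    """Resolve smoke settings, or return None when required secrets are missing."""
    if env is None:
        env = os.environ
    api_key = _get(env, "FIREWORKS_API_KEY")
    account_id = _get(env, "FIREWORKS_ACCOUNT_ID")
    if api_key is None or account_id is None:
        return None  # assumed: caller skips the remote smoke test
    return SmokeConfig(
        api_key=api_key,
        account_id=account_id,
        base_model=_get(env, "FIREWORKS_SMOKE_BASE_MODEL", DEFAULT_BASE_MODEL),
        tokenizer_model=_get(env, "FIREWORKS_SMOKE_TOKENIZER_MODEL", DEFAULT_TOKENIZER_MODEL),
        training_shape=_get(env, "FIREWORKS_SMOKE_TRAINING_SHAPE", DEFAULT_TRAINING_SHAPE),
        base_url=_get(env, "FIREWORKS_BASE_URL"),
    )
```

In a pytest module, a `None` result could be turned into `pytest.skip(...)`, which would be consistent with running the smoke tests under `env -u FIREWORKS_API_KEY ...` as in the Validation section; whether the real tests behave this way is not stated here.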

Screenshots

Kimi K2.5 VL

Qwen3 VL


benjibc added 4 commits March 6, 2026 16:51

feat: harden FrozenLake VL rollouts

d1da902

chore: require eval-protocol 0.3.22

4b37d34

Merge remote-tracking branch 'origin/main' into benjibc/frozenlake-vl-rollout-hardening

# Conflicts: # training/pyproject.toml

2b4d405

feat: share training rendering and add smoke CI

9297143

benjibc changed the title from "feat: harden FrozenLake visual tool-call rollouts" to "feat: unify training rendering and harden FrozenLake VL" Mar 7, 2026

benjibc added 3 commits March 6, 2026 18:00

chore: bump eval-protocol to 0.3.23

27ce2ef

fix: support newer fireworks sdk error helpers

c2c8194

fix: lazily initialize FrozenLake multimodal client

85a11ae

benjibc requested review from Hecate0821, mayinghan and xiaoyifan March 7, 2026 05:02

mayinghan approved these changes Mar 7, 2026


benjibc merged commit b5bd3a1 into main Mar 7, 2026
4 checks passed

benjibc deleted the benjibc/frozenlake-vl-rollout-hardening branch March 7, 2026 05:09

benjibc merged 7 commits into main from benjibc/frozenlake-vl-rollout-hardening


2 participants
