compare to remote HEAD by CharlieFRuan · Pull Request #1 · FrontierCS/SkyRL

CharlieFRuan · 2026-03-09T03:38:48Z

No description provided.

enable_auto_tool_choice + tool_call_parser=qwen3_coder (advisor uses get_call_code tool)
language_model_only=true + attention_backend=FLASH_ATTN
Intentionally omitting reasoning_parser so thinking tokens stay in content and are captured in the training token sequence.
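The option set in this commit message can be pictured as a plain keyword dictionary. The sketch below is illustrative only: `build_advisor_engine_kwargs` is a hypothetical helper, not part of SkyRL or vLLM, and how these keys actually reach the inference engine is an assumption.

```python
from typing import Optional


def build_advisor_engine_kwargs(reasoning_parser: Optional[str] = None) -> dict:
    """Hypothetical helper: collect the engine options named in the commit message."""
    kwargs = {
        # Tool calling: the advisor uses the get_call_code tool.
        "enable_auto_tool_choice": True,
        "tool_call_parser": "qwen3_coder",
        "language_model_only": True,
        "attention_backend": "FLASH_ATTN",
    }
    # The commit deliberately leaves reasoning_parser unset so that thinking
    # tokens stay in `content` and end up in the training token sequence.
    # The key is added only if a caller explicitly asks for one.
    if reasoning_parser is not None:
        kwargs["reasoning_parser"] = reasoning_parser
    return kwargs
```

Calling `build_advisor_engine_kwargs()` with no argument mirrors the commit: the returned dictionary has no `reasoning_parser` key.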

…pyarrow fixes
- Configure all 8 GPUs for advisor vLLM + FSDP training (frozen solver uses GPT-5 via OpenAI API)
- Pin pyarrow>=20,<22 to avoid jemalloc background thread segfault in multiprocessing.spawn
- Set ARROW_DEFAULT_MEMORY_POOL=system and disable jemalloc background thread in runtime env
- Guard eval when eval_dataloader is None in trainer
- Add Qwen3.5 accuracy+thinking jinja2 template
- Add binary_search full_context example config
- Update uv.lock
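A minimal sketch of three of the fixes listed above, under stated assumptions. All function names are hypothetical, and the `env_vars` layout is assumed to resemble a Ray-style runtime environment. The jemalloc background-thread setting is left out because the commit message does not name the exact variable.

```python
from typing import Callable, Iterable, Optional


def arrow_runtime_env() -> dict:
    """Hypothetical helper: environment variables handed to worker processes."""
    # ARROW_DEFAULT_MEMORY_POOL=system makes Arrow use the system allocator.
    return {"env_vars": {"ARROW_DEFAULT_MEMORY_POOL": "system"}}


def pyarrow_version_in_pin(version: str) -> bool:
    """Return True when `version` satisfies the pin pyarrow>=20,<22."""
    major = int(version.split(".")[0])
    return 20 <= major < 22


def maybe_evaluate(
    eval_dataloader: Optional[Iterable],
    evaluate: Callable[[Iterable], dict],
) -> Optional[dict]:
    """Skip evaluation when no eval dataloader was configured."""
    if eval_dataloader is None:
        return None
    return evaluate(eval_dataloader)
```

The guard returns `None` rather than raising, so a training loop that was started without an eval set can call it unconditionally.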

joyemang33 and others added 26 commits March 19, 2026 17:56

feat: add EvolveGenerator training entrypoint

f700cd7

Adds examples/evolve/ with the SkyRL training integration for the EvolveAgent advisor RL loop (main_evolve.py + train_evolve.sh).

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

feat: modify max_advisor_context_iters

03f8809

fix: add PYTHONPATH, n_samples_per_prompt=8, max_advisor_context_iters=10

7a484dc

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

fix: use 2 GPUs, Qwen3.5-9B local path, correct model name handling

28f6090

fix: change HTTP endpoint port to 8002 (8000 and 8001 occupied)

21392e1

fix: use 4 GPUs

87c8ea3

fix: set UV_CACHE_DIR to /data/qmang/uv_cache

3ed697e

fix: redirect all outputs/caches to /data/qmang

1c75eec

- CKPTS_DIR, EXPORTS_DIR, LOG_DIR → /data/qmang/outputs/ (avoid ~18GB checkpoints in home)
- HF_HOME → /data/qmang/hf_cache
- TRITON_CACHE_DIR → /data/qmang/triton_cache
- TORCH_HOME → /data/qmang/torch_cache
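The redirection above amounts to deriving every output and cache path from one base directory. The sketch below shows one way to express that. `redirected_env` is a hypothetical helper, and the `ckpts`, `exports`, and `logs` subdirectory names under `outputs/` are assumptions, since the commit message only gives the parent directory.

```python
from pathlib import PurePosixPath


def redirected_env(base: str = "/data/qmang") -> dict:
    """Hypothetical helper: map each variable to a path under `base`."""
    root = PurePosixPath(base)
    outputs = root / "outputs"
    return {
        # Subdirectory names below are assumed for illustration.
        "CKPTS_DIR": str(outputs / "ckpts"),
        "EXPORTS_DIR": str(outputs / "exports"),
        "LOG_DIR": str(outputs / "logs"),
        # Cache locations as listed in the commit message.
        "HF_HOME": str(root / "hf_cache"),
        "TRITON_CACHE_DIR": str(root / "triton_cache"),
        "TORCH_HOME": str(root / "torch_cache"),
    }
```

One possible use is `os.environ.update(redirected_env())` at the top of a launcher, before any library that reads these variables is imported.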

fix: use UV_PROJECT_ENVIRONMENT instead of symlink for venv on /data

8ba21ea

update training script

1825858

go

04eb4e7

111

29a9dc6

good

56687fd

feat: separate solver and advisor

36d5570

clean up unused things

b4e1a57

feat: add logging dir

e9d22e0

remove jinja unused

e5ef2b5

Make script runnable with Qwen3 and non-QMang machine

f1f3819

train-train-train

38883bd

feat:

a78df21

update training script

bcfec44

update

a6a2f83

runnable on b200 e2e

c09eb7a

nonfrozen-training

19f1350

CharlieFRuan force-pushed the main branch from c32f878 to 972be5e March 19, 2026 17:59

CharlieFRuan closed this Mar 19, 2026

CharlieFRuan reopened this Mar 19, 2026

[WIP] Set up Qwen3.5

97fec32

CharlieFRuan and others added 11 commits March 19, 2026 19:34

more changes for qwen3.5

4bfaa9e

remove claude debug things that broke weight sync

06ef790

stepwise H100

f9ab318

log prompt

d75ab45

trivial

59123ef

Update script to use step-wise

0ef87df

Charlie pass on knobs

9ecfa07

add ckpt interval

396d825

reduce context length in RL training

b63bec2

finalize H100 recipe

c9ae082

update recipe

eba3cae

CharlieFRuan wants to merge 38 commits into remote-origin from main


3 participants
