Add RL-Kernel linear_logp integration with TP2 benchmark config by inaniloquentee · Pull Request #2 · RL-Align/vime

inaniloquentee · 2026-06-27T12:40:31Z

Summary

Add RL-Kernel linear_logp integration and TP-aware tests for the vime benchmark path.
Add runtime counters for successful linear_logp calls: call count, token count, dispatch elapsed, and tokens-per-call totals/deltas.
Make scripts/run-qwen3-30B-A3B.sh default to a non-smoke 2xH100 performance pre-gate config from vime-RLK.md.
Keep the 2-card run as a baseline-vs-candidate metric gate before moving to the 8xH100 promotion benchmark.
Exclude experiments/ artifacts and result files from this PR.

2xH100 config

NUM_GPUS=2, MEGATRON_TP=2, MEGATRON_EP=2, MEGATRON_CP=1.
ROLLOUT_NUM_GPUS_PER_ENGINE=2, VLLM_GPU_MEMORY_UTILIZATION=0.50.
Performance pre-gate default: NUM_ROLLOUT=24, ROLLOUT_BATCH_SIZE=2, N_SAMPLES_PER_PROMPT=2, GLOBAL_BATCH_SIZE=4, MAX_TOKENS_PER_GPU=4096, ROLLOUT_MAX_RESPONSE_LEN=1024.
Defaults disable save/eval-before-train and use eager vLLM for the 2-card validation path.
Added shell validation that rejects TP, EP, or rollout GPU settings that exceed NUM_GPUS or do not divide it evenly.
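
The validation rule above can be sketched as follows. The actual check lives in the shell script; this Python version only illustrates the logic. The function name `validate_gpu_layout` is hypothetical, and treating a value below 1 as an error is an assumption not stated in this PR.

```python
# Illustrative sketch only: the real validation in this PR is written in shell.
# Each setting must be between 1 and NUM_GPUS and must divide NUM_GPUS evenly.
def validate_gpu_layout(num_gpus: int, **settings: int) -> list[str]:
    """Return human-readable errors; an empty list means the layout is valid."""
    errors = []
    for name, value in settings.items():
        if value < 1:
            # Assumption: non-positive sizes are rejected outright.
            errors.append(f"{name}={value} must be at least 1")
        elif value > num_gpus:
            errors.append(f"{name}={value} exceeds NUM_GPUS={num_gpus}")
        elif num_gpus % value != 0:
            errors.append(f"{name}={value} does not divide NUM_GPUS={num_gpus}")
    return errors


# The 2xH100 defaults listed above pass the check.
default_errors = validate_gpu_layout(
    2, MEGATRON_TP=2, MEGATRON_EP=2, ROLLOUT_NUM_GPUS_PER_ENGINE=2
)
```

Collecting all errors before failing, rather than stopping at the first, lets a single run report every misconfigured variable.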

Runtime counters

train/rl_kernel_linear_logp_call_count_total|delta.
train/rl_kernel_linear_logp_token_count_total|delta.
train/rl_kernel_linear_logp_dispatch_elapsed_s_total|delta.
train/rl_kernel_linear_logp_tokens_per_call_total|delta.
Dispatch elapsed does not synchronize CUDA and is not a kernel-time benchmark; use it only to verify path activity. Promotion numbers should use logprob-time/step-time/profiler metrics.
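
A minimal sketch of how total and delta counters like these could be tracked. The metric key names come from the list above; the class `LinearLogpCounters`, its methods, and the way tokens-per-call is derived (tokens divided by calls, separately for totals and for deltas) are assumptions for illustration and may not match the code in `rl_kernel.py`.

```python
from dataclasses import dataclass, field

PREFIX = "train/rl_kernel_linear_logp_"


def _ratio(tokens: float, calls: float) -> float:
    # Avoid division by zero when no calls happened in the interval.
    return tokens / calls if calls else 0.0


@dataclass
class LinearLogpCounters:
    """Hypothetical accumulator for successful linear_logp calls."""

    call_count: int = 0
    token_count: int = 0
    dispatch_elapsed_s: float = 0.0
    # Totals captured at the previous report, used to compute deltas.
    _last: tuple = field(default=(0, 0, 0.0), repr=False)

    def record(self, num_tokens: int, elapsed_s: float) -> None:
        # elapsed_s is host-side dispatch time; without a CUDA synchronize
        # it does not measure how long the kernel actually ran.
        self.call_count += 1
        self.token_count += num_tokens
        self.dispatch_elapsed_s += elapsed_s

    def report(self) -> dict:
        prev_calls, prev_tokens, prev_elapsed = self._last
        totals = {
            "call_count": self.call_count,
            "token_count": self.token_count,
            "dispatch_elapsed_s": self.dispatch_elapsed_s,
        }
        deltas = {
            "call_count": self.call_count - prev_calls,
            "token_count": self.token_count - prev_tokens,
            "dispatch_elapsed_s": self.dispatch_elapsed_s - prev_elapsed,
        }
        totals["tokens_per_call"] = _ratio(totals["token_count"], totals["call_count"])
        deltas["tokens_per_call"] = _ratio(deltas["token_count"], deltas["call_count"])
        self._last = (self.call_count, self.token_count, self.dispatch_elapsed_s)
        out = {}
        for name in totals:
            out[f"{PREFIX}{name}_total"] = totals[name]
            out[f"{PREFIX}{name}_delta"] = deltas[name]
        return out
```

Calling `report()` once per train step yields a per-step delta next to the running total, which is what makes a zero delta (path not exercised in that step) easy to spot.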

Acceptance

Run baseline with RL-Kernel off and candidate with VIME_RL_KERNEL=1, VIME_RL_KERNEL_OPS=linear_logp, VIME_RL_KERNEL_STRICT=1.
Candidate must have rl_kernel_fallback_count = 0, positive linear_logp call/token deltas, and finite logprob/loss/reward metrics.
The 2-card gate uses the same metric direction as the 8-card benchmark: reward/logprob quality must not regress, and logprob time or peak VRAM must be clearly lower.
Each run should have at least 24 train steps and discard the first 5 steps as warmup for summary statistics.
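
The candidate checks and the warmup rule above could be scripted roughly as below. This is a sketch under stated assumptions: only `rl_kernel_fallback_count` is a name taken from this PR; the other dictionary keys, the function names, and the shape of the per-step metrics are hypothetical. No threshold for "clearly lower" is encoded because the PR does not define one.

```python
import math
from statistics import mean

MIN_STEPS = 24     # minimum train steps per run, from the acceptance criteria
WARMUP_STEPS = 5   # leading steps discarded before computing summaries


def steady_state_mean(per_step_values: list) -> float:
    """Mean of a per-step metric after dropping the warmup steps."""
    if len(per_step_values) < MIN_STEPS:
        raise ValueError(
            f"need at least {MIN_STEPS} steps, got {len(per_step_values)}"
        )
    return mean(per_step_values[WARMUP_STEPS:])


def candidate_failures(step_metrics: dict) -> list:
    """Return the acceptance checks that one candidate step fails."""
    failures = []
    if step_metrics["rl_kernel_fallback_count"] != 0:
        failures.append("rl_kernel_fallback_count must be 0")
    # Hypothetical key names for the linear_logp call/token deltas.
    for key in ("call_count_delta", "token_count_delta"):
        if step_metrics[key] <= 0:
            failures.append(f"{key} must be positive")
    # Hypothetical key names for the quality metrics.
    for key in ("logprob", "loss", "reward"):
        if not math.isfinite(step_metrics[key]):
            failures.append(f"{key} must be finite")
    return failures
```

For example, `steady_state_mean` applied to the baseline and candidate logprob-time series gives the two numbers that the gate compares.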

Tests

bash -n scripts/run-qwen3-30B-A3B.sh — passed.
git diff --check — passed.
PYTHONPATH=. pytest tests/test_rl_kernel_args.py tests/test_rl_kernel_linear_logp_integration.py tests/test_rl_kernel_logp_integration.py tests/test_metric_report.py tests/test_value_temperature.py -q — 47 passed, 1 warning.
pre-commit run --files scripts/run-qwen3-30B-A3B.sh vime-RLK.md vime/backends/megatron_utils/rl_kernel.py vime/backends/megatron_utils/model.py tests/test_rl_kernel_linear_logp_integration.py — passed.

coderabbitai · 2026-06-27T12:40:39Z

Important

Review skipped

Draft detected.

Please check the settings in the CodeRabbit UI or the .coderabbit.yaml file in this repository. To trigger a single review, invoke the @coderabbitai review command.


Signed-off-by: inaniloquentee <3051000145@qq.com>

Add RL-Kernel linear_logp integration with TP2 benchmark config

ac79a46

inaniloquentee added 3 commits July 1, 2026 20:56

Document 2xH100 RL-Kernel dev validation

1aef1fe

Signed-off-by: inaniloquentee <3051000145@qq.com>

Fix 2xH100 RL-Kernel benchmark defaults

47980cf

Signed-off-by: inaniloquentee <3051000145@qq.com>

Add RL-Kernel runtime counters for TP2 benchmark

7447813

Signed-off-by: inaniloquentee <3051000145@qq.com>

inaniloquentee wants to merge 4 commits into main from vime-rlk-tp2-integration
