Changes from all commits (50 commits)
d967e93
Auto-create sessions on first access for restart tolerance
DavidBellamy Apr 5, 2026
2ae96dd
Fix: recover from SGLang rollback failures in session proxy
DavidBellamy Apr 5, 2026
e0fc889
Revert "[BUGFIX] [P2PRDMA] Add rollout post-processing after P2PRDMA …
JD-ETH Apr 5, 2026
dd188aa
Fix null body crash and case-insensitive rollback detection
DavidBellamy Apr 5, 2026
29a0dca
Add session TTL eviction and make GET endpoint side-effect free
DavidBellamy Apr 5, 2026
f700bd8
Fix _truncate_sample_output to truncate rollout_routed_experts
DavidBellamy Apr 5, 2026
2564535
Simplify rollout_routed_experts slice to len(tokens) - 1
DavidBellamy Apr 5, 2026
ef5dda6
[Fix] fix ci (#894)
yushengsu-thu Apr 5, 2026
a3db3a9
Avoid threading for ray getting object (#886)
fzyzcjy Apr 5, 2026
4dd7770
Add explicit errors for unsupported Megatron profiles (#887)
fzyzcjy Apr 5, 2026
649a353
Add nvfp4 quantizer files (#907)
zianglih Apr 6, 2026
3572922
Bump flash-linear-attention version to 0.4.2 (#892)
Zhichenzzz Apr 6, 2026
8146a78
[BUGFIX] Invoke "post_process_quantization" by default after weight u…
JensenFire Apr 7, 2026
eaa36a2
Add heartbeat and id to session server (#866)
maocheng23 Apr 7, 2026
70dc402
fix: adding thin glm5 image to docker build + latest tag sync (#871)
dougyster Apr 7, 2026
c198efa
Add consistent hashing routing policy for rollout (#891)
yueming-yuan Apr 7, 2026
afc5b55
[example] add retool v2 example with multi-turn framework interfaces …
PopSoda2002 Apr 7, 2026
4db9bfe
Expose rollout-batch-size, n-samples-per-prompt, global-batch-size as…
Shi-Dong Apr 7, 2026
6b58ebd
chore: remove obsolete swe-agent server.py and run-qwen3.sh (#952)
guapisolo Apr 8, 2026
41615af
Add weight staleness control for fully async rollout (#958)
maocheng23 Apr 9, 2026
94dbb8f
Fix/pause generation mode (#924)
maocheng23 Apr 9, 2026
4d8b007
[v0.5.10][1] Bump sglang to v0.5.10 (#898)
yueming-yuan Apr 9, 2026
ef228e6
[v0.5.10][2] Fix apply_chat_template behavior for transformers >=5.0 …
yueming-yuan Apr 9, 2026
b1a4346
[v0.5.10][3] Fix processor return_tensors duplicate kwarg for transfo…
yueming-yuan Apr 9, 2026
2a99108
[v0.5.10][4] Fix _no_split_modules set not subscriptable in transform…
yueming-yuan Apr 9, 2026
c74392d
[v0.5.10][5] Disable piecewise cuda graph to avoid NVLS oom (#935)
yueming-yuan Apr 9, 2026
d6158f8
[v0.5.10][6][FSDP] fix outdated weight update logic in FSDP (#948)
yueming-yuan Apr 9, 2026
c4e50c8
[v0.5.10][7][FSDP] move FSDP to experimental and disable by default (…
yueming-yuan Apr 9, 2026
8d66ac1
Add skiplist and more robust calculation on val (#965)
maocheng23 Apr 9, 2026
02f6e05
[fix] tiny fix debug rollout only in weight version check (#967)
yueming-yuan Apr 10, 2026
eb294e3
feat: real cp support with relayout fix for qwen3.5 train/rollout mis…
Zhichenzzz Apr 11, 2026
82bf196
[AMD] Upgrade to sglv0.5.10 (#973)
zyzshishui Apr 13, 2026
ef7481a
switch model to actor (#756)
maocheng23 Apr 13, 2026
85fe651
[fix] support general logic to bypass fp32 downcast and fix qwen35 A_…
guapisolo Apr 14, 2026
6cc3feb
fix: populate prefix_cache_info in OpenAI/session rollout path (#960)
guapisolo Apr 14, 2026
6706c73
Remove prepare_harbor_tasks.py; use harbor-private adapters (#982)
Shi-Dong Apr 14, 2026
f144961
[fix] Skip flush_cache in in_place mode and add fully async example (…
maocheng23 Apr 15, 2026
c271e14
GLM47 full cmd for async and sync reasoning (#986)
maocheng23 Apr 16, 2026
7b7efa9
fix(rollout): guard round(None) in zero-std metric aggregation
DavidBellamy Apr 16, 2026
f0c9d3c
feat(sglang_engine): allow PD worker_type on /add_worker registration…
DavidBellamy Apr 16, 2026
779839c
fix(rollout): propagate PYTHONPATH to Ray remote actors
DavidBellamy Apr 16, 2026
9a0ef97
fix(session-server): strip stale hop-by-hop headers when re-emitting …
DavidBellamy Apr 17, 2026
643bfdf
Deploy: merge fix/propagate-pythonpath-to-ray-remote-actors
github-actions[bot] Apr 17, 2026
c24a0de
Deploy: merge fix/allow-pd-worker-type-on-miles-router
github-actions[bot] Apr 17, 2026
00dc588
Deploy: merge fix/guard-round-none-in-zero-std-metrics
github-actions[bot] Apr 17, 2026
a189e79
Deploy: merge fix/truncate-routed-experts
github-actions[bot] Apr 17, 2026
e9bace5
Deploy: merge fix/rollback-error-recovery
github-actions[bot] Apr 17, 2026
7c3fa80
Deploy: merge fix/session-auto-create
github-actions[bot] Apr 17, 2026
af3de70
state:c271e14f7916e47299f74d00280aaa7cfec0bdde|fix/allow-pd-worker-ty…
github-actions[bot] Apr 17, 2026
c15c704
arguments: allow 'assistant' in --tito-allowed-append-roles choices
DavidBellamy Apr 18, 2026
35 changes: 35 additions & 0 deletions .github/workflows/docker-build.yml
@@ -146,6 +146,11 @@ jobs:
${{ inputs.custom_tag && format('--custom-tag {0}', inputs.custom_tag) || '' }} \
--push

- name: Point latest to current dev
if: github.event_name == 'schedule' || inputs.simulate_schedule == true
run: |
docker buildx imagetools create -t radixark/miles:latest radixark/miles:dev

- name: Prune old dev tags
if: github.event_name == 'schedule'
run: |
@@ -193,3 +198,33 @@ jobs:
echo " Failed to delete ${TAG} (HTTP ${HTTP_CODE})"
fi
done

build-and-push-dev-glm:
needs: [build-and-push]
# Only rebuild dev-glm when the dev image was built (schedule, push to main, or dispatch with image_tag=dev)
if: needs.build-and-push.result == 'success' && (github.event_name == 'schedule' || inputs.simulate_schedule == true)
runs-on: self-hosted
steps:
- name: Checkout repository
uses: actions/checkout@v4

- name: Set up Docker Buildx
uses: docker/setup-buildx-action@v3
with:
driver-opts: |
image=moby/buildkit:latest
network=host

- name: Login to Docker Hub
uses: docker/login-action@v3
with:
username: ${{ secrets.DOCKERHUB_USERNAME }}
password: ${{ secrets.DOCKERHUB_TOKEN }}

- name: Build and push dev-glm
run: |
docker buildx build \
-f docker/glm5/Dockerfile.dev-glm \
-t radixark/miles:dev-glm \
--push \
.
118 changes: 3 additions & 115 deletions .github/workflows/pr-test.yml
@@ -166,118 +166,6 @@ jobs:
shell: bash
run: python tests/ci/gpu_lock_exec.py --count ${{ matrix.info.num_gpus }} -- pytest tests/${{ matrix.info.test_file }}

unit-test:
if: (github.event_name == 'workflow_dispatch') || (github.event.pull_request && contains(github.event.pull_request.labels.*.name, 'run-unit-test'))
runs-on: self-hosted
container:
image: radixark/miles:dev
options: >
--gpus all
--ipc=host
--shm-size=32g
--ulimit memlock=-1
--ulimit stack=67108864
--memory=0
--memory-swap=0
-v /mnt/nvme0n1/miles_ci:/data/miles_ci
-v /mnt/nvme0n1/miles_ci/models:/root/models
-v /mnt/nvme0n1/miles_ci/datasets:/root/datasets
--privileged
--ulimit nofile=65535:65535
-v /tmp:/tmp
strategy:
fail-fast: false
matrix:
info: [{"num_gpus": 8, "test_file": "e2e/fsdp/test_qwen3_4B_fsdp_true_on_policy.py"}]
defaults:
run:
working-directory: ${{ github.workspace }}
env:
GITHUB_COMMIT_NAME: ${{ github.sha }}_${{ github.event.pull_request.number || 'non-pr' }}
WANDB_API_KEY: ${{ secrets.WANDB_API_KEY }}
HF_TOKEN: ${{ secrets.HF_TOKEN }}
MILES_TEST_ENABLE_INFINITE_RUN: ${{ (github.event_name == 'workflow_dispatch' && github.event.inputs.infinite_run) || 'false' }}
MILES_TEST_USE_DEEPEP: ${{ matrix.info.use_deepep || '0' }}
MILES_TEST_USE_FP8_ROLLOUT: ${{ matrix.info.use_fp8_rollout || '0' }}
MILES_TEST_USE_INT4_ROLLOUT: ${{ matrix.info.use_int4_rollout || '0' }}
MILES_TEST_USE_BRIDGE: ${{ matrix.info.use_bridge || '0' }}
MILES_TEST_ENABLE_EVAL: ${{ matrix.info.enable_eval || '1' }}
MILES_TEST_FEW_GPU: '0'
SESSION_TEST_MODEL_FAMILY: ${{ matrix.info.model_family || '' }}

steps:
- name: Checkout repository
uses: actions/checkout@v4

- name: Cleanup Ray processes
shell: bash
run: |
pkill -9 -f 'ray::' 2>/dev/null || true
pkill -9 -f raylet 2>/dev/null || true
pkill -9 -f gcs_server 2>/dev/null || true
pkill -9 -f 'ray-dashboard' 2>/dev/null || true
pkill -9 sglang 2>/dev/null || true
ray stop --force 2>/dev/null || true
rm -rf /tmp/ray/* 2>/dev/null || true
sleep 3


- name: Resolve dependency refs
id: resolve-refs
shell: bash
env:
PR_BODY: ${{ github.event.pull_request.body || '' }}
INPUT_MEGATRON_PR: ${{ github.event.inputs.ci_megatron_pr || '' }}
INPUT_SGLANG_PR: ${{ github.event.inputs.ci_sglang_pr || '' }}
run: |
# Priority: workflow_dispatch input > PR description > default
MEGATRON_PR="${INPUT_MEGATRON_PR}"
SGLANG_PR="${INPUT_SGLANG_PR}"

# Parse PR description for "ci-megatron-pr:" and "ci-sglang-pr:"
if [ -n "$PR_BODY" ]; then
PR_MEGATRON_PR=$(echo "$PR_BODY" | grep -oP '(?<=ci-megatron-pr:\s)\S+' || true)
PR_SGLANG_PR=$(echo "$PR_BODY" | grep -oP '(?<=ci-sglang-pr:\s)\S+' || true)
[ -z "$MEGATRON_PR" ] && [ -n "$PR_MEGATRON_PR" ] && MEGATRON_PR="$PR_MEGATRON_PR"
[ -z "$SGLANG_PR" ] && [ -n "$PR_SGLANG_PR" ] && SGLANG_PR="$PR_SGLANG_PR"
fi

# Defaults
[ -z "$MEGATRON_PR" ] && MEGATRON_PR="miles-main"
[ -z "$SGLANG_PR" ] && SGLANG_PR="sglang-miles"

# Convert "#N" PR syntax to git fetch ref: "pull/N/head"
resolve_fetch_ref() {
local ref="$1"
if [[ "$ref" =~ ^#([0-9]+)$ ]]; then
echo "pull/${BASH_REMATCH[1]}/head"
else
echo "$ref"
fi
}
MEGATRON_FETCH=$(resolve_fetch_ref "$MEGATRON_PR")
SGLANG_FETCH=$(resolve_fetch_ref "$SGLANG_PR")

echo "ci_megatron_pr=$MEGATRON_FETCH" >> $GITHUB_OUTPUT
echo "ci_sglang_pr=$SGLANG_FETCH" >> $GITHUB_OUTPUT
echo "Resolved: megatron=$MEGATRON_PR -> fetch=$MEGATRON_FETCH, sglang=$SGLANG_PR -> fetch=$SGLANG_FETCH"

- name: Install
shell: bash
env:
MEGATRON_PR: ${{ steps.resolve-refs.outputs.ci_megatron_pr }}
SGLANG_PR: ${{ steps.resolve-refs.outputs.ci_sglang_pr }}
run: |
cd /sgl-workspace/sglang && git reset --hard HEAD && git clean -fd && git fetch origin "$SGLANG_PR" && git checkout -f FETCH_HEAD && git log --oneline -1 && pip install -e python --no-deps --break-system-packages
cd /root/Megatron-LM && git reset --hard HEAD && git clean -fd && git fetch origin "$MEGATRON_PR" && git checkout -f FETCH_HEAD && git log --oneline -1 && pip install -e . --no-deps --break-system-packages
cd $GITHUB_WORKSPACE && pip install -e . --no-deps --break-system-packages
pip install pytest-asyncio --break-system-packages


- name: Execute
shell: bash
run: python tests/ci/gpu_lock_exec.py --count ${{ matrix.info.num_gpus }} -- python tests/${{ matrix.info.test_file }}

e2e-test-sglang:
if: (github.event_name == 'workflow_dispatch') || (github.event.pull_request && contains(github.event.pull_request.labels.*.name, 'run-ci-sglang'))
runs-on: self-hosted
@@ -412,7 +300,7 @@ jobs:
strategy:
fail-fast: false
matrix:
-        info: [{"num_gpus": 8, "test_file": "e2e/short/test_qwen2.5_0.5B_gsm8k_async_short.py"}, {"num_gpus": 8, "test_file": "e2e/short/test_qwen2.5_0.5B_gsm8k_short.py"}, {"num_gpus": 8, "test_file": "e2e/short/test_qwen3_0.6B_fsdp_colocated_2xGPU.py"}, {"num_gpus": 8, "test_file": "e2e/sglang_config/test_sglang_config.py"}, {"num_gpus": 8, "test_file": "e2e/sglang_config/test_sglang_config_mixed_offload.py"}, {"num_gpus": 8, "test_file": "e2e/sglang_config/test_sglang_config_mixed_offload_ft.py"}]
+        info: [{"num_gpus": 8, "test_file": "e2e/short/test_qwen2.5_0.5B_gsm8k_async_short.py"}, {"num_gpus": 8, "test_file": "e2e/short/test_qwen2.5_0.5B_gsm8k_short.py"}, {"num_gpus": 8, "test_file": "e2e/sglang_config/test_sglang_config.py"}, {"num_gpus": 8, "test_file": "e2e/sglang_config/test_sglang_config_mixed_offload.py"}, {"num_gpus": 8, "test_file": "e2e/sglang_config/test_sglang_config_mixed_offload_ft.py"}]
defaults:
run:
working-directory: ${{ github.workspace }}
@@ -524,7 +412,7 @@ jobs:
strategy:
fail-fast: false
matrix:
-        info: [{"num_gpus": 8, "test_file": "e2e/fsdp/test_qwen3_4B_fsdp_true_on_policy.py"}, {"num_gpus": 8, "test_file": "e2e/fsdp/test_qwen3_vl_4B_fsdp.py"}, {"num_gpus": 8, "test_file": "e2e/fsdp/test_qwen3_0.6B_fsdp_distributed.py"}, {"num_gpus": 8, "test_file": "e2e/fsdp/test_qwen3_0.6B_megatron_fsdp_align.py"}]
+        info: [{"name": "[FSDP] qwen3-4B-fsdp-true-on-policy", "num_gpus": 8, "test_file": "e2e/fsdp/test_qwen3_4B_fsdp_true_on_policy.py"}, {"name": "[FSDP] qwen3-vl-4B-fsdp", "num_gpus": 8, "test_file": "e2e/fsdp/test_qwen3_vl_4B_fsdp.py"}, {"name": "[FSDP] qwen3-0.6B-fsdp-distributed", "num_gpus": 8, "test_file": "e2e/fsdp/test_qwen3_0.6B_fsdp_distributed.py"}, {"name": "[FSDP] qwen3-0.6B-megatron-fsdp-align", "num_gpus": 8, "test_file": "e2e/fsdp/test_qwen3_0.6B_megatron_fsdp_align.py"}, {"name": "[FSDP] qwen3-0.6B-fsdp-colocated-2xGPU", "num_gpus": 8, "test_file": "e2e/short/test_qwen3_0.6B_fsdp_colocated_2xGPU.py"}]
defaults:
run:
working-directory: ${{ github.workspace }}
@@ -1375,7 +1263,7 @@ jobs:
strategy:
fail-fast: false
matrix:
-        info: [{"num_gpus": 8, "test_file": "e2e/fsdp/test_qwen3_4B_fsdp_true_on_policy.py"}, {"num_gpus": 8, "test_file": "e2e/fsdp/test_qwen3_vl_4B_fsdp.py"}, {"num_gpus": 8, "test_file": "e2e/fsdp/test_qwen3_0.6B_fsdp_distributed.py"}, {"num_gpus": 8, "test_file": "e2e/fsdp/test_qwen3_0.6B_megatron_fsdp_align.py"}, {"num_gpus": 8, "test_file": "e2e/megatron/test_quick_start_glm4_9B.py"}, {"name": "qwen3-30B-A3B-deepep-fp8", "num_gpus": 8, "test_file": "e2e/megatron/test_qwen3_30B_A3B.py", "use_deepep": "1", "use_fp8_rollout": "1"}, {"name": "qwen3-30B-A3B-bridge", "num_gpus": 8, "test_file": "e2e/megatron/test_qwen3_30B_A3B.py", "use_bridge": "1"}, {"enable_eval": "0", "num_gpus": 8, "test_file": "e2e/megatron/test_qwen3_30B_A3B_r3.py", "use_deepep": "1", "use_fp8_rollout": "1"}, {"enable_eval": "0", "num_gpus": 8, "test_file": "e2e/megatron/test_qwen3_30B_A3B_r3.py"}, {"num_gpus": 8, "test_file": "e2e/megatron/test_qwen3_4B_ppo.py"}, {"num_gpus": 8, "test_file": "e2e/megatron/test_moonlight_16B_A3B.py"}, {"enable_eval": "0", "num_gpus": 8, "test_file": "e2e/megatron/test_moonlight_16B_A3B_r3.py"}, {"num_gpus": 8, "test_file": "e2e/megatron/test_mimo_7B_mtp_only_grad.py"}, {"enable_eval": "0", "num_gpus": 8, "test_file": "e2e/megatron/test_glm47_flash_r3_mtp.py"}, {"num_gpus": 8, "test_file": "e2e/lora/test_lora_qwen2.5_0.5B.py"}, {"num_gpus": 8, "test_file": "e2e/short/test_qwen2.5_0.5B_gsm8k_async_short.py"}, {"num_gpus": 8, "test_file": "e2e/short/test_qwen2.5_0.5B_gsm8k_short.py"}, {"num_gpus": 8, "test_file": "e2e/short/test_qwen3_0.6B_fsdp_colocated_2xGPU.py"}, {"num_gpus": 8, "test_file": "e2e/sglang_config/test_sglang_config.py"}, {"num_gpus": 4, "test_file": "e2e/sglang_config/test_sglang_config_mixed_offload.py"}, {"num_gpus": 8, "test_file": "e2e/sglang_config/test_sglang_config_mixed_offload_ft.py"}, {"num_gpus": 8, "test_file": "e2e/precision/test_qwen3_0.6B_parallel_check.py"}, {"num_gpus": 8, "test_file": "e2e/ckpt/test_qwen3_4B_ckpt.py"}, {"num_gpus": 8, "test_file": "e2e/ckpt/test_qwen3_4B_ckpt.py --async-save"}, {"num_gpus": 8, "test_file": "e2e/ckpt/test_glm47_flash_ckpt.py"}, {"num_gpus": 8, "test_file": "e2e/ckpt/test_glm47_flash_ckpt.py --async-save"}, {"num_gpus": 8, "test_file": "e2e/long/test_qwen2.5_0.5B_gsm8k.py"}, {"num_gpus": 8, "test_file": "e2e/long/test_qwen2.5_0.5B_gsm8k_async.py"}, {"name": "qwen3-30B-A3B-bf16", "num_gpus": 8, "test_file": "e2e/megatron/test_qwen3_30B_A3B.py", "use_deepep": "0", "use_fp8_rollout": "0"}, {"name": "qwen3-30B-A3B-rollout-fp8", "num_gpus": 8, "test_file": "e2e/megatron/test_qwen3_30B_A3B.py", "use_deepep": "1", "use_fp8_rollout": "1"}, {"name": "qwen3-30B-A3B-rollout-int4", "num_gpus": 8, "test_file": "e2e/megatron/test_qwen3_30B_A3B.py", "use_deepep": "0", "use_fp8_rollout": "0", "use_int4_rollout": "1"}]
+        info: [{"name": "[FSDP] qwen3-4B-fsdp-true-on-policy", "num_gpus": 8, "test_file": "e2e/fsdp/test_qwen3_4B_fsdp_true_on_policy.py"}, {"name": "[FSDP] qwen3-vl-4B-fsdp", "num_gpus": 8, "test_file": "e2e/fsdp/test_qwen3_vl_4B_fsdp.py"}, {"name": "[FSDP] qwen3-0.6B-fsdp-distributed", "num_gpus": 8, "test_file": "e2e/fsdp/test_qwen3_0.6B_fsdp_distributed.py"}, {"name": "[FSDP] qwen3-0.6B-megatron-fsdp-align", "num_gpus": 8, "test_file": "e2e/fsdp/test_qwen3_0.6B_megatron_fsdp_align.py"}, {"name": "[FSDP] qwen3-0.6B-fsdp-colocated-2xGPU", "num_gpus": 8, "test_file": "e2e/short/test_qwen3_0.6B_fsdp_colocated_2xGPU.py"}, {"num_gpus": 8, "test_file": "e2e/megatron/test_quick_start_glm4_9B.py"}, {"name": "qwen3-30B-A3B-deepep-fp8", "num_gpus": 8, "test_file": "e2e/megatron/test_qwen3_30B_A3B.py", "use_deepep": "1", "use_fp8_rollout": "1"}, {"name": "qwen3-30B-A3B-bridge", "num_gpus": 8, "test_file": "e2e/megatron/test_qwen3_30B_A3B.py", "use_bridge": "1"}, {"enable_eval": "0", "num_gpus": 8, "test_file": "e2e/megatron/test_qwen3_30B_A3B_r3.py", "use_deepep": "1", "use_fp8_rollout": "1"}, {"enable_eval": "0", "num_gpus": 8, "test_file": "e2e/megatron/test_qwen3_30B_A3B_r3.py"}, {"num_gpus": 8, "test_file": "e2e/megatron/test_qwen3_4B_ppo.py"}, {"num_gpus": 8, "test_file": "e2e/megatron/test_moonlight_16B_A3B.py"}, {"enable_eval": "0", "num_gpus": 8, "test_file": "e2e/megatron/test_moonlight_16B_A3B_r3.py"}, {"num_gpus": 8, "test_file": "e2e/megatron/test_mimo_7B_mtp_only_grad.py"}, {"enable_eval": "0", "num_gpus": 8, "test_file": "e2e/megatron/test_glm47_flash_r3_mtp.py"}, {"num_gpus": 8, "test_file": "e2e/lora/test_lora_qwen2.5_0.5B.py"}, {"num_gpus": 8, "test_file": "e2e/short/test_qwen2.5_0.5B_gsm8k_async_short.py"}, {"num_gpus": 8, "test_file": "e2e/short/test_qwen2.5_0.5B_gsm8k_short.py"}, {"num_gpus": 8, "test_file": "e2e/sglang_config/test_sglang_config.py"}, {"num_gpus": 8, "test_file": "e2e/sglang_config/test_sglang_config_mixed_offload.py"}, {"num_gpus": 8, "test_file": "e2e/sglang_config/test_sglang_config_mixed_offload_ft.py"}, {"num_gpus": 8, "test_file": "e2e/precision/test_qwen3_0.6B_parallel_check.py"}, {"num_gpus": 8, "test_file": "e2e/ckpt/test_qwen3_4B_ckpt.py"}, {"num_gpus": 8, "test_file": "e2e/ckpt/test_qwen3_4B_ckpt.py --async-save"}, {"num_gpus": 8, "test_file": "e2e/ckpt/test_glm47_flash_ckpt.py"}, {"num_gpus": 8, "test_file": "e2e/ckpt/test_glm47_flash_ckpt.py --async-save"}, {"num_gpus": 8, "test_file": "e2e/long/test_qwen2.5_0.5B_gsm8k.py"}, {"num_gpus": 8, "test_file": "e2e/long/test_qwen2.5_0.5B_gsm8k_async.py"}, {"name": "qwen3-30B-A3B-bf16", "num_gpus": 8, "test_file": "e2e/megatron/test_qwen3_30B_A3B.py", "use_deepep": "0", "use_fp8_rollout": "0"}, {"name": "qwen3-30B-A3B-rollout-fp8", "num_gpus": 8, "test_file": "e2e/megatron/test_qwen3_30B_A3B.py", "use_deepep": "1", "use_fp8_rollout": "1"}, {"name": "qwen3-30B-A3B-rollout-int4", "num_gpus": 8, "test_file": "e2e/megatron/test_qwen3_30B_A3B.py", "use_deepep": "0", "use_fp8_rollout": "0", "use_int4_rollout": "1"}]
defaults:
run:
working-directory: ${{ github.workspace }}
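The unit-test job deleted above carried the "Resolve dependency refs" step: a workflow_dispatch input wins over a `ci-megatron-pr:` / `ci-sglang-pr:` tag in the PR body, which wins over the default branch, and a `#N` shorthand expands to the git fetch ref `pull/N/head`. A minimal Python sketch of that resolution order (function names here are hypothetical; the real step is the bash shown in the diff):

```python
import re


def resolve_fetch_ref(ref: str) -> str:
    """Convert '#123' PR shorthand into a git fetch ref; pass anything else through."""
    m = re.fullmatch(r"#(\d+)", ref)
    return f"pull/{m.group(1)}/head" if m else ref


def resolve(dispatch_input: str, pr_body: str, default: str, tag: str) -> str:
    """Priority: workflow_dispatch input > '<tag>: <ref>' line in the PR body > default."""
    ref = dispatch_input
    if not ref:
        # Mirror the step's grep for e.g. "ci-sglang-pr: <ref>" in the PR description.
        m = re.search(rf"{re.escape(tag)}:\s*(\S+)", pr_body)
        if m:
            ref = m.group(1)
    return resolve_fetch_ref(ref or default)
```

With this order, a manually dispatched run can pin dependency branches without editing the PR, and PRs without tags fall back to the `miles-main` / `sglang-miles` defaults.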
16 changes: 5 additions & 11 deletions .github/workflows/pr-test.yml.j2
@@ -1,10 +1,11 @@
<% set default_image = 'radixark/miles:dev' %>

<% set fsdp_tests = [
-    {'test_file': 'e2e/fsdp/test_qwen3_4B_fsdp_true_on_policy.py', 'num_gpus': 8},
-    {'test_file': 'e2e/fsdp/test_qwen3_vl_4B_fsdp.py', 'num_gpus': 8},
-    {'test_file': 'e2e/fsdp/test_qwen3_0.6B_fsdp_distributed.py', 'num_gpus': 8},
-    {'test_file': 'e2e/fsdp/test_qwen3_0.6B_megatron_fsdp_align.py', 'num_gpus': 8},
+    {'name': '[FSDP] qwen3-4B-fsdp-true-on-policy', 'test_file': 'e2e/fsdp/test_qwen3_4B_fsdp_true_on_policy.py', 'num_gpus': 8},
+    {'name': '[FSDP] qwen3-vl-4B-fsdp', 'test_file': 'e2e/fsdp/test_qwen3_vl_4B_fsdp.py', 'num_gpus': 8},
+    {'name': '[FSDP] qwen3-0.6B-fsdp-distributed', 'test_file': 'e2e/fsdp/test_qwen3_0.6B_fsdp_distributed.py', 'num_gpus': 8},
+    {'name': '[FSDP] qwen3-0.6B-megatron-fsdp-align', 'test_file': 'e2e/fsdp/test_qwen3_0.6B_megatron_fsdp_align.py', 'num_gpus': 8},
+    {'name': '[FSDP] qwen3-0.6B-fsdp-colocated-2xGPU', 'test_file': 'e2e/short/test_qwen3_0.6B_fsdp_colocated_2xGPU.py', 'num_gpus': 8},
] %>

<% set megatron_tests = [
@@ -27,7 +28,6 @@
<% set short_tests = [
{'test_file': 'e2e/short/test_qwen2.5_0.5B_gsm8k_async_short.py', 'num_gpus': 8},
{'test_file': 'e2e/short/test_qwen2.5_0.5B_gsm8k_short.py', 'num_gpus': 8},
-    {'test_file': 'e2e/short/test_qwen3_0.6B_fsdp_colocated_2xGPU.py', 'num_gpus': 8},
{'test_file': 'e2e/sglang_config/test_sglang_config.py', 'num_gpus': 8},
{'test_file': 'e2e/sglang_config/test_sglang_config_mixed_offload.py', 'num_gpus': 8},
{'test_file': 'e2e/sglang_config/test_sglang_config_mixed_offload_ft.py', 'num_gpus': 8},
@@ -67,12 +67,6 @@
{'test_file': 'utils/test_sglang_config.py', 'num_gpus': 0},
],
},
-    'unit-test': {
-        'label': 'run-unit-test',
-        'tests': [
-            {'test_file': 'e2e/fsdp/test_qwen3_4B_fsdp_true_on_policy.py', 'num_gpus': 8}
-        ],
-    },
'e2e-test-sglang': {
'label': 'run-ci-sglang',
'test_executor': 'pytest',
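The recurring change in this template is adding an explicit `name` key to matrix entries. The assumed effect (a sketch of the convention, not the workflow engine's actual code) is that a CI job's display label uses the `name` when present and otherwise falls back to the long `test_file` path:

```python
def job_display_name(entry: dict) -> str:
    # Matrix entries may carry an explicit 'name' (e.g. "[FSDP] qwen3-vl-4B-fsdp");
    # entries without one are identified only by their test_file path.
    return entry.get("name") or entry["test_file"]
```

This is why otherwise-identical entries like the three `test_qwen3_30B_A3B.py` variants need distinct names: without them their jobs would be indistinguishable in the CI UI.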
8 changes: 4 additions & 4 deletions docker/Dockerfile
@@ -3,10 +3,10 @@
#
# 2. radixark/miles:dev-cu13-arm64
# build-arg:ENABLE_CUDA_13=1 \
-#    build-arg:SGLANG_IMAGE_TAG=v0.5.9-cu130-arm64 \
+#    build-arg:SGLANG_IMAGE_TAG=v0.5.10-cu130 \
# build-arg:WHEELS_TAG=cu130-aarch64 \

-ARG SGLANG_IMAGE_TAG=v0.5.9
+ARG SGLANG_IMAGE_TAG=v0.5.10
FROM lmsysorg/sglang:${SGLANG_IMAGE_TAG} AS sglang

# ======================================== Arguments =============================================
@@ -63,7 +63,7 @@ RUN pip install /tmp/wheels/flash_attn_3-*.whl && \

RUN pip install git+https://github.com/ISEEKYAN/mbridge.git@89eb10887887bc74853f89a4de258c0702932a1c --no-deps

-RUN pip install flash-linear-attention==0.4.1
+RUN pip install flash-linear-attention==0.4.2
RUN pip install tilelang -f https://tile-ai.github.io/whl/nightly/cu128/

RUN if [ "${ENABLE_CUDA_13}" = "1" ]; then \
Expand All @@ -88,7 +88,7 @@ RUN pip install megatron-energon --no-deps
RUN pip install multi-storage-client --no-deps

COPY requirements.txt /tmp/requirements.txt
-RUN pip install -r /tmp/requirements.txt
+RUN rm -rf /usr/lib/python3/dist-packages/jwt /usr/lib/python3/dist-packages/PyJWT* && pip install -r /tmp/requirements.txt

# https://github.com/pytorch/pytorch/issues/168167
RUN if [ "${ENABLE_CUDA_13}" = "1" ]; then \