Changes from all commits (44 commits)
e0fc889
Revert "[BUGFIX] [P2PRDMA] Add rollout post-processing after P2PRDMA …
JD-ETH Apr 5, 2026
ef5dda6
[Fix] fix ci (#894)
yushengsu-thu Apr 5, 2026
a3db3a9
Avoid threading for ray getting object (#886)
fzyzcjy Apr 5, 2026
4dd7770
Add explicit errors for unsupported Megatron profiles (#887)
fzyzcjy Apr 5, 2026
649a353
Add nvfp4 quantizer files (#907)
zianglih Apr 6, 2026
3572922
Bump flash-linear-attention version to 0.4.2 (#892)
Zhichenzzz Apr 6, 2026
8146a78
[BUGFIX] Invoke "post_process_quantization" by default after weight u…
JensenFire Apr 7, 2026
eaa36a2
Add heartbeat and id to session server (#866)
maocheng23 Apr 7, 2026
70dc402
fix: adding thin glm5 image to docker build + latest tag sync (#871)
dougyster Apr 7, 2026
c198efa
Add consistent hashing routing policy for rollout (#891)
yueming-yuan Apr 7, 2026
afc5b55
[example] add retool v2 example with multi-turn framework interfaces …
PopSoda2002 Apr 7, 2026
4db9bfe
Expose rollout-batch-size, n-samples-per-prompt, global-batch-size as…
Shi-Dong Apr 7, 2026
6b58ebd
chore: remove obsolete swe-agent server.py and run-qwen3.sh (#952)
guapisolo Apr 8, 2026
41615af
Add weight staleness control for fully async rollout (#958)
maocheng23 Apr 9, 2026
94dbb8f
Fix/pause generation mode (#924)
maocheng23 Apr 9, 2026
4d8b007
[v0.5.10][1] Bump sglang to v0.5.10 (#898)
yueming-yuan Apr 9, 2026
ef228e6
[v0.5.10][2] Fix apply_chat_template behavior for transformers >=5.0 …
yueming-yuan Apr 9, 2026
b1a4346
[v0.5.10][3] Fix processor return_tensors duplicate kwarg for transfo…
yueming-yuan Apr 9, 2026
2a99108
[v0.5.10][4] Fix _no_split_modules set not subscriptable in transform…
yueming-yuan Apr 9, 2026
c74392d
[v0.5.10][5] Disable piecewise cuda graph to avoid NVLS oom (#935)
yueming-yuan Apr 9, 2026
d6158f8
[v0.5.10][6][FSDP] fix outdated weight update logic in FSDP (#948)
yueming-yuan Apr 9, 2026
c4e50c8
[v0.5.10][7][FSDP] move FSDP to experimental and disable by default (…
yueming-yuan Apr 9, 2026
8d66ac1
Add skiplist and more robust calculation on val (#965)
maocheng23 Apr 9, 2026
02f6e05
[fix] tiny fix debug rollout only in weight version check (#967)
yueming-yuan Apr 10, 2026
eb294e3
feat: real cp support with relayout fix for qwen3.5 train/rollout mis…
Zhichenzzz Apr 11, 2026
82bf196
[AMD] Upgrade to sglv0.5.10 (#973)
zyzshishui Apr 13, 2026
ef7481a
switch model to actor (#756)
maocheng23 Apr 13, 2026
85fe651
[fix] support general logic to bypass fp32 downcast and fix qwen35 A_…
guapisolo Apr 14, 2026
6cc3feb
fix: populate prefix_cache_info in OpenAI/session rollout path (#960)
guapisolo Apr 14, 2026
6706c73
Remove prepare_harbor_tasks.py; use harbor-private adapters (#982)
Shi-Dong Apr 14, 2026
f144961
[fix] Skip flush_cache in in_place mode and add fully async example (…
maocheng23 Apr 15, 2026
c271e14
GLM47 full cmd for async and sync reasoning (#986)
maocheng23 Apr 16, 2026
5d11fe2
fix: handle non-tool appended messages in TITO incremental tokenizati…
guapisolo Apr 19, 2026
ad01e63
[docker] Add sgl-model-gateway install and download .tar.gz assets (#…
guapisolo Apr 20, 2026
3270915
[ci] fix hf rate limit error by caching tokenizer loading (#1014)
guapisolo Apr 20, 2026
9a00364
Use load_generate_function in legacy sglang_rollout path (#1016)
maocheng23 Apr 20, 2026
cd41b28
Update CODEOWNERS to add new reviewers (#1021)
Ying1123 Apr 20, 2026
38f9183
Support moe lora for gpt-oss (#798)
gongyisheng Apr 20, 2026
641f071
[fix] restore expert_bias to fp32 before bridge weight export (#811)
yueming-yuan Apr 21, 2026
252fbec
[chore] drop legacy transformers upgrade pin for glm47-flash and qwen…
guapisolo Apr 21, 2026
5cc643f
[fix] Enforce param dtype before wrap ddp (#992)
guapisolo Apr 21, 2026
99956c0
[upgrade] update Megatron-Bridge source and LoRA CI to megatron e2e t…
yushengsu-thu Apr 21, 2026
85fdb7e
[CI] Drop --use-miles-router from R3 tests and add r3 comparasion tes…
guapisolo Apr 21, 2026
1f58c11
tito: support agent-layer-inserted assistant messages in append segme…
DavidBellamy Apr 21, 2026
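Commit c198efa above adds a consistent hashing routing policy for rollout (#891). The general technique can be sketched as follows — an illustrative hash ring with virtual nodes, not the repository's actual implementation (all names here are hypothetical):

```python
import bisect
import hashlib

class HashRing:
    """Minimal consistent-hash ring with virtual nodes.

    Keys (e.g. prompt or session ids) map to the first node clockwise from
    their hash, so adding or removing one rollout worker only remaps the
    keys that lived on that worker.
    """

    def __init__(self, nodes=(), vnodes=64):
        self.vnodes = vnodes
        self._ring = []  # sorted list of (hash, node) pairs
        for n in nodes:
            self.add(n)

    @staticmethod
    def _hash(key: str) -> int:
        # md5 keeps the sketch deterministic across processes
        return int.from_bytes(hashlib.md5(key.encode()).digest()[:8], "big")

    def add(self, node: str):
        for i in range(self.vnodes):
            bisect.insort(self._ring, (self._hash(f"{node}#{i}"), node))

    def remove(self, node: str):
        self._ring = [(h, n) for h, n in self._ring if n != node]

    def route(self, key: str) -> str:
        h = self._hash(key)
        idx = bisect.bisect(self._ring, (h, ""))
        return self._ring[idx % len(self._ring)][1]
```

The stability property is the point: after removing a worker, every key that was not on that worker still routes to the same node.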
4 changes: 2 additions & 2 deletions .github/CODEOWNERS
@@ -6,6 +6,6 @@
/miles/backends/sglang_utils/ @fzyzcjy @yueming-yuan @maocheng23 @yushengsu-thu
/miles/ray/ @fzyzcjy @yueming-yuan @maocheng23
/miles/rollout/ @fzyzcjy @yueming-yuan @guapisolo
/miles/rollout/session/ @fzyzcjy @yueming-yuan @guapisolo @maocheng23
/miles/rollout/session/ @fzyzcjy @yueming-yuan @guapisolo @maocheng23 @jybsuper
/miles/router/ @fzyzcjy @yueming-yuan @guapisolo
/miles/utils/ @fzyzcjy @yueming-yuan @guapisolo @maocheng23
/miles/utils/ @fzyzcjy @yueming-yuan @guapisolo @maocheng23 @jybsuper
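The two modified lines add @jybsuper as an owner for the session and utils paths. CODEOWNERS resolution gives later entries precedence, so a more specific directory rule placed after a broader one wins. A simplified sketch of that matching rule (real CODEOWNERS supports full gitignore-style globs; this only handles directory prefixes and simple patterns):

```python
import fnmatch

def owners_for(path, rules):
    """Return owners from the last matching rule, mirroring GitHub's
    last-match-wins CODEOWNERS semantics. Directory patterns such as
    '/miles/rollout/session/' match everything beneath them."""
    owners = []
    for pattern, rule_owners in rules:
        if pattern.endswith("/"):
            if path.startswith(pattern.lstrip("/")):
                owners = rule_owners
        elif fnmatch.fnmatch(path, pattern.lstrip("/")):
            owners = rule_owners
    return owners

# Rules in file order, as in the diff above:
rules = [
    ("/miles/rollout/", ["@fzyzcjy", "@yueming-yuan", "@guapisolo"]),
    ("/miles/rollout/session/",
     ["@fzyzcjy", "@yueming-yuan", "@guapisolo", "@maocheng23", "@jybsuper"]),
]
print(owners_for("miles/rollout/session/server.py", rules))
```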
35 changes: 35 additions & 0 deletions .github/workflows/docker-build.yml
@@ -146,6 +146,11 @@ jobs:
${{ inputs.custom_tag && format('--custom-tag {0}', inputs.custom_tag) || '' }} \
--push

- name: Point latest to current dev
if: github.event_name == 'schedule' || inputs.simulate_schedule == true
run: |
docker buildx imagetools create -t radixark/miles:latest radixark/miles:dev

- name: Prune old dev tags
if: github.event_name == 'schedule'
run: |
@@ -193,3 +198,33 @@ jobs:
echo " Failed to delete ${TAG} (HTTP ${HTTP_CODE})"
fi
done
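The prune step (partially elided above) deletes stale dev tags through Docker Hub's v2 API and logs the HTTP code on failure. A sketch of the per-tag DELETE request it would issue — repository, tag, and token values are placeholders, and the endpoint shape follows Docker Hub's v2 repository-tags API:

```python
import urllib.request

DOCKERHUB_API = "https://hub.docker.com/v2"

def delete_tag_request(repo: str, tag: str, token: str) -> urllib.request.Request:
    """Build the DELETE request the prune loop issues for each old dev tag.
    The workflow itself inspects the returned HTTP status and prints a
    failure line like the one visible in the diff."""
    return urllib.request.Request(
        f"{DOCKERHUB_API}/repositories/{repo}/tags/{tag}/",
        method="DELETE",
        headers={"Authorization": f"Bearer {token}"},
    )

# Hypothetical tag name for illustration:
req = delete_tag_request("radixark/miles", "dev-20260401", "<token>")
print(req.get_method(), req.full_url)
```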

build-and-push-dev-glm:
needs: [build-and-push]
# Only rebuild dev-glm when the dev image was built (schedule, push to main, or dispatch with image_tag=dev)
if: needs.build-and-push.result == 'success' && (github.event_name == 'schedule' || inputs.simulate_schedule == true)
runs-on: self-hosted
steps:
- name: Checkout repository
uses: actions/checkout@v4

- name: Set up Docker Buildx
uses: docker/setup-buildx-action@v3
with:
driver-opts: |
image=moby/buildkit:latest
network=host

- name: Login to Docker Hub
uses: docker/login-action@v3
with:
username: ${{ secrets.DOCKERHUB_USERNAME }}
password: ${{ secrets.DOCKERHUB_TOKEN }}

- name: Build and push dev-glm
run: |
docker buildx build \
-f docker/glm5/Dockerfile.dev-glm \
-t radixark/miles:dev-glm \
--push \
.
134 changes: 17 additions & 117 deletions .github/workflows/pr-test.yml

Large diffs are not rendered by default.

21 changes: 9 additions & 12 deletions .github/workflows/pr-test.yml.j2
@@ -1,10 +1,11 @@
<% set default_image = 'radixark/miles:dev' %>

<% set fsdp_tests = [
{'test_file': 'e2e/fsdp/test_qwen3_4B_fsdp_true_on_policy.py', 'num_gpus': 8},
{'test_file': 'e2e/fsdp/test_qwen3_vl_4B_fsdp.py', 'num_gpus': 8},
{'test_file': 'e2e/fsdp/test_qwen3_0.6B_fsdp_distributed.py', 'num_gpus': 8},
{'test_file': 'e2e/fsdp/test_qwen3_0.6B_megatron_fsdp_align.py', 'num_gpus': 8},
{'name': '[FSDP] qwen3-4B-fsdp-true-on-policy', 'test_file': 'e2e/fsdp/test_qwen3_4B_fsdp_true_on_policy.py', 'num_gpus': 8},
{'name': '[FSDP] qwen3-vl-4B-fsdp', 'test_file': 'e2e/fsdp/test_qwen3_vl_4B_fsdp.py', 'num_gpus': 8},
{'name': '[FSDP] qwen3-0.6B-fsdp-distributed', 'test_file': 'e2e/fsdp/test_qwen3_0.6B_fsdp_distributed.py', 'num_gpus': 8},
{'name': '[FSDP] qwen3-0.6B-megatron-fsdp-align', 'test_file': 'e2e/fsdp/test_qwen3_0.6B_megatron_fsdp_align.py', 'num_gpus': 8},
{'name': '[FSDP] qwen3-0.6B-fsdp-colocated-2xGPU', 'test_file': 'e2e/short/test_qwen3_0.6B_fsdp_colocated_2xGPU.py', 'num_gpus': 8},
] %>
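The new entries attach explicit `name` keys where the old list relied on the test file path alone. A CI generator consuming this list might fall back to the file's basename when `name` is absent — a hypothetical helper for illustration, not code from this repository:

```python
def matrix_entry(test: dict) -> dict:
    """Build a CI matrix entry, preferring an explicit 'name' key (as the
    diff now adds for the FSDP tests) and otherwise deriving a display name
    from the test file's basename."""
    fallback = test["test_file"].rsplit("/", 1)[-1].removesuffix(".py")
    return {"name": test.get("name", fallback), **test}

print(matrix_entry({"test_file": "e2e/fsdp/test_qwen3_vl_4B_fsdp.py",
                    "num_gpus": 8}))
```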

<% set megatron_tests = [
@@ -27,7 +28,6 @@
<% set short_tests = [
{'test_file': 'e2e/short/test_qwen2.5_0.5B_gsm8k_async_short.py', 'num_gpus': 8},
{'test_file': 'e2e/short/test_qwen2.5_0.5B_gsm8k_short.py', 'num_gpus': 8},
{'test_file': 'e2e/short/test_qwen3_0.6B_fsdp_colocated_2xGPU.py', 'num_gpus': 8},
{'test_file': 'e2e/sglang_config/test_sglang_config.py', 'num_gpus': 8},
{'test_file': 'e2e/sglang_config/test_sglang_config_mixed_offload.py', 'num_gpus': 8},
{'test_file': 'e2e/sglang_config/test_sglang_config_mixed_offload_ft.py', 'num_gpus': 8},
@@ -67,12 +67,6 @@
{'test_file': 'utils/test_sglang_config.py', 'num_gpus': 0},
],
},
'unit-test': {
'label': 'run-unit-test',
'tests': [
{'test_file': 'e2e/fsdp/test_qwen3_4B_fsdp_true_on_policy.py', 'num_gpus': 8}
],
},
'e2e-test-sglang': {
'label': 'run-ci-sglang',
'test_executor': 'pytest',
@@ -82,6 +76,8 @@
{'test_file': 'e2e/sglang/test_session_server_tool_call.py', 'num_gpus': 1, 'model_family': 'glm47'},
{'test_file': 'e2e/sglang/test_tito_logprob_equivalence.py', 'num_gpus': 1, 'model_family': 'qwen3'},
{'test_file': 'e2e/sglang/test_tito_logprob_equivalence.py', 'num_gpus': 1, 'model_family': 'glm47'},
{'test_file': 'e2e/sglang/test_r3_router_equivalence.py', 'num_gpus': 1, 'model_family': 'qwen3_30b_a3b'},
{'test_file': 'e2e/sglang/test_r3_router_equivalence.py', 'num_gpus': 1, 'model_family': 'glm47_flash'},
],
},
'e2e-test-short': {
@@ -94,7 +90,7 @@
},
'e2e-test-megatron': {
'label': 'run-ci-megatron',
'tests': megatron_tests,
'tests': megatron_tests + lora_tests,
},
'e2e-test-precision': {
'label': 'run-ci-precision',
@@ -197,6 +193,7 @@ jobs:
MILES_TEST_ENABLE_EVAL: ${{ matrix.info.enable_eval || '1' }}
MILES_TEST_FEW_GPU: '0'
SESSION_TEST_MODEL_FAMILY: ${{ matrix.info.model_family || '' }}
ROUTER_EQ_MODEL_FAMILY: ${{ matrix.info.model_family || '' }}

steps:
- name: Checkout repository
44 changes: 38 additions & 6 deletions docker/Dockerfile
@@ -3,10 +3,10 @@
#
# 2. radixark/miles:dev-cu13-arm64
# build-arg:ENABLE_CUDA_13=1 \
# build-arg:SGLANG_IMAGE_TAG=v0.5.9-cu130-arm64 \
# build-arg:SGLANG_IMAGE_TAG=v0.5.10-cu130 \
# build-arg:WHEELS_TAG=cu130-aarch64 \

ARG SGLANG_IMAGE_TAG=v0.5.9
ARG SGLANG_IMAGE_TAG=v0.5.10
FROM lmsysorg/sglang:${SGLANG_IMAGE_TAG} AS sglang

# ======================================== Arguments =============================================
@@ -46,7 +46,7 @@ RUN mkdir -p /tmp/wheels && \
curl -sL "https://api.github.com/repos/${WHEELS_REPO}/releases/tags/${WHEELS_TAG}" \
| python3 -c "import sys, json, subprocess; \
[subprocess.run(['curl', '-fSL', '-o', '/tmp/wheels/' + a['name'], a['browser_download_url']], check=True) \
for a in json.load(sys.stdin)['assets'] if a['name'].endswith('.whl')]" && \
for a in json.load(sys.stdin)['assets'] if a['name'].endswith(('.whl', '.tar.gz'))]" && \
ls -lh /tmp/wheels/
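The inline `python3 -c` filter above now keeps both `.whl` and `.tar.gz` release assets; the `.tar.gz` case covers the sgl-model-gateway binary installed later in this Dockerfile. Expanded into a readable standalone sketch, with the download step reduced to returning URLs and a stubbed release payload in place of the GitHub API response:

```python
import json

def select_release_assets(release_json: str, suffixes=(".whl", ".tar.gz")):
    """Return (name, url) pairs for release assets whose names match the
    given suffixes — the same filter the Dockerfile applies before invoking
    curl on each asset."""
    release = json.loads(release_json)
    return [
        (a["name"], a["browser_download_url"])
        for a in release["assets"]
        if a["name"].endswith(tuple(suffixes))
    ]

# Stubbed release payload (asset names are illustrative):
payload = json.dumps({
    "assets": [
        {"name": "flash_attn_3-2.7.whl",
         "browser_download_url": "https://example.com/fa3.whl"},
        {"name": "sgl-model-gateway-linux-x86_64.tar.gz",
         "browser_download_url": "https://example.com/gw.tar.gz"},
        {"name": "checksums.txt",
         "browser_download_url": "https://example.com/sums.txt"},
    ]
})
print(select_release_assets(payload))
```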

# ====================================== Python dependencies ============================================
Expand All @@ -63,7 +63,7 @@ RUN pip install /tmp/wheels/flash_attn_3-*.whl && \

RUN pip install git+https://github.com/ISEEKYAN/mbridge.git@89eb10887887bc74853f89a4de258c0702932a1c --no-deps

RUN pip install flash-linear-attention==0.4.1
RUN pip install flash-linear-attention==0.4.2
RUN pip install tilelang -f https://tile-ai.github.io/whl/nightly/cu128/

RUN if [ "${ENABLE_CUDA_13}" = "1" ]; then \
Expand All @@ -83,12 +83,12 @@ RUN git clone https://github.com/${MEGATRON_REPO}.git --recursive -b ${MEGATRON_
RUN pip install git+https://github.com/fzyzcjy/torch_memory_saver.git@d64a639 --no-cache-dir --force-reinstall
# RUN pip install git+https://github.com/fzyzcjy/Megatron-Bridge.git@dev_rl --no-build-isolation
RUN pip install "nvidia-modelopt[torch]>=0.37.0" --no-build-isolation
RUN pip install git+https://github.com/yushengsu-thu/Megatron-Bridge.git@merged-megatron-0.16.0rc0-miles --no-deps --no-build-isolation
RUN pip install git+https://github.com/radixark/Megatron-Bridge.git@bridge --no-deps --no-build-isolation
RUN pip install megatron-energon --no-deps
RUN pip install multi-storage-client --no-deps

COPY requirements.txt /tmp/requirements.txt
RUN pip install -r /tmp/requirements.txt
RUN rm -rf /usr/lib/python3/dist-packages/jwt /usr/lib/python3/dist-packages/PyJWT* && pip install -r /tmp/requirements.txt

# https://github.com/pytorch/pytorch/issues/168167
RUN if [ "${ENABLE_CUDA_13}" = "1" ]; then \
@@ -125,4 +125,36 @@ RUN git clone https://github.com/radixark/miles.git /root/miles && \
# int4_qat
RUN pip install /tmp/wheels/fake_int4_quant_cuda-*.whl

# ====================================== Install sgl-model-gateway ============================================
# SGL_ROUTER_USE_WHEELS=0:
# Build from source https://github.com/radixark/sgl-router-for-miles
# SGL_ROUTER_USE_WHEELS=1 (default):
# Install the pre-built sgl-model-gateway wheel

ARG SGL_ROUTER_USE_WHEELS=1
ARG SGL_ROUTER_REPO=https://github.com/radixark/sgl-router-for-miles.git
ARG SGL_ROUTER_BRANCH=main

RUN --mount=type=cache,target=/root/.cache/pip \
set -eux; \
if [ "${SGL_ROUTER_USE_WHEELS}" = "1" ]; then \
pip install --force-reinstall /tmp/wheels/sglang_router-*.whl && \
tar xzf /tmp/wheels/sgl-model-gateway-linux-*.tar.gz -C /usr/local/bin/ && \
chmod +x /usr/local/bin/sgl-model-gateway; \
elif [ "${SGL_ROUTER_USE_WHEELS}" = "0" ]; then \
git clone --branch "${SGL_ROUTER_BRANCH}" --depth 1 "${SGL_ROUTER_REPO}" /build/sgl-model-gateway && \
curl --proto '=https' --tlsv1.2 --retry 3 --retry-delay 2 -sSf https://sh.rustup.rs | sh -s -- -y && \
export PATH="/root/.cargo/bin:${PATH}" && \
python3 -m pip install maturin && \
cd /build/sgl-model-gateway/bindings/python && \
ulimit -n 65536 && \
maturin build --release --features vendored-openssl --out /build/gateway_wheels && \
cd /build/sgl-model-gateway && \
cargo build --release --bin sgl-model-gateway --features vendored-openssl && \
cp target/release/sgl-model-gateway /usr/local/bin/sgl-model-gateway && \
chmod +x /usr/local/bin/sgl-model-gateway && \
pip install --force-reinstall /build/gateway_wheels/sglang_router-*.whl && \
rm -rf /root/.cargo /root/.rustup /build/sgl-model-gateway /build/gateway_wheels; \
fi

RUN rm -rf /tmp/wheels