fix(container): pin SGLang nixl_ref to v1.0.0 to match upstream wheel ABI#9283
fix(container): pin SGLang nixl_ref to v1.0.0 to match upstream wheel ABI#9283keivenchang wants to merge 3 commits intomainfrom
Conversation
… ABI PR #9216 set sglang.nixl_ref=0.10.1 so wheel_builder builds the NIXL C++ SDK at 0.10.1, but the upstream lmsysorg/sglang base image already ships nixl-cu13==1.0.0 in its runtime — the SDK and runtime wheel disagree on ABI, surfacing as nixl symbol load failures at runtime. Pin sglang.nixl_ref to v1.0.0 so the SDK build matches the preinstalled wheel. Sglang-only change; vllm and trtllm stay at 0.10.1. Signed-off-by: Keiven Chang <keivenchang@users.noreply.github.com>
…e image bumps Signed-off-by: Keiven Chang <keivenchang@users.noreply.github.com>
WalkthroughSGLang nixl_ref version is updated from 0.10.1 to v1.0.0, with corresponding CUDA 12.9 and 13.0 runtime image tag updates, accompanied by documentation comments clarifying the dependency relationship between nixl_ref and preinstalled wheel versions. ChangesSGLang Configuration Updates
Estimated code review effort🎯 2 (Simple) | ⏱️ ~8 minutes 🚥 Pre-merge checks | ✅ 5✅ Passed checks (5 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. Tip 💬 Introducing Slack Agent: The best way for teams to turn conversations into code.Slack Agent is built on CodeRabbit's deep understanding of your code, so your team can collaborate across the entire SDLC without losing context.
Built for teams:
One agent for your entire SDLC. Right inside Slack. Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
There was a problem hiding this comment.
🧹 Nitpick comments (1)
container/context.yaml (1)
95-95: ⚡ Quick winMake the verification recipe explicit for both CUDA wheels.
Line 95 only shows
nixl-cu13; that can mislead when validating thecuda12.9runtime path. Consider documenting bothnixl-cu12andnixl-cu13checks (or a parametric command) so maintainers verify the correct wheel per image tag.#!/bin/bash # Read-only verification recipe for maintainers (run locally where Docker is available) # CUDA 12.9 image should expose nixl-cu12 at the expected version docker run --rm lmsysorg/sglang:v0.5.10.post1-runtime \ python -m pip show nixl-cu12 | sed -n '1,20p' # CUDA 13.0 image should expose nixl-cu13 at the expected version docker run --rm lmsysorg/sglang:v0.5.10.post1-cu130-runtime \ python -m pip show nixl-cu13 | sed -n '1,20p'🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@container/context.yaml` at line 95, Update the verification comment that currently only references "nixl-cu13" so it documents both CUDA wheel checks or a parametric command; change the single-line note to show commands for verifying nixl-cu12 (for CUDA 12.9 runtimes) and nixl-cu13 (for CUDA 13.0 runtimes) or provide a template command that accepts the wheel name/tag, so maintainers can run the appropriate docker run + "python -m pip show <wheel>" check for each runtime image (refer to the existing comment about verifying upstream version and the example docker run + pip show pattern).
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Nitpick comments:
In `@container/context.yaml`:
- Line 95: Update the verification comment that currently only references
"nixl-cu13" so it documents both CUDA wheel checks or a parametric command;
change the single-line note to show commands for verifying nixl-cu12 (for CUDA
12.9 runtimes) and nixl-cu13 (for CUDA 13.0 runtimes) or provide a template
command that accepts the wheel name/tag, so maintainers can run the appropriate
docker run + "python -m pip show <wheel>" check for each runtime image (refer to
the existing comment about verifying upstream version and the example docker run
+ pip show pattern).
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
Run ID: d1265fbc-9f54-4039-bac4-ef7698af7c0a
📒 Files selected for processing (1)
container/context.yaml
Signed-off-by: Keiven Chang <keivenchang@users.noreply.github.com>
Overview:
#9216 added the NIXL C++ SDK to the sglang dev stage so
cargo buildcould linknixl-sys, pinningnixl_ref: 0.10.1. But wheel_builder also buildsai_dynamo*.whl(Rust bindings) against that SDK, and the runtime image installs the wheel on top oflmsysorg/sglang, which preinstallsnixl-cu1{2,3}==1.0.0. ABI mismatch → nixl symbol load failures at runtime.This PR pins all three NIXL surfaces to the same version (1.0.0) so the SDK
ai_dynamois built against matches the NIXL the runtime already ships:sglang.nixl_ref: v1.0.0— wheel_builder builds the C++ SDK at v1.0.0nixl-sys = "=1.0.0"(Cargo) — Rust bindings linked against v1.0.0 headersnixl-cu1{2,3}==1.0.0(already preinstalled by upstream sglang runtime) — runtime Python wheelSglang-only; vllm/trtllm stay at
0.10.1.Details:
nixlDescList<T>::operator[]flipped fromunsigned inttosize_t(and droppedvirtual). Thenixl-syscrate's vendoredwrapper.cppis generated against a specific NIXL commit's headers, so crate version and SDK version must match.nixl-sys 1.0.0is published on crates.io, lockstep with NIXL v1.0.0.sglang_runtime.Dockerfilenarrows the wheel COPY from wheel_builder toai_dynamo*.whland excludesnixl-cu*.whl, so the wheel_builder NIXL stack does not enter the runtime image. Runtime keeps using upstreamlmsysorg/sglang's preinstalled NIXL Python stack — only the ABIai_dynamois built against changes.nixl-sysis identical between 0.10.1 and 1.0.x (onlywrapper.cppchanged, plus internalnixlAgentConfigfield-by-field construction). Zero dynamo Rust source changes needed.runtime_image_tag↔nixl_refso future SGLang base-image bumps re-verify the upstream NIXL version (pip show nixl-cu13recipe included).Verified locally on cu12.9 and cu13.0 (sglang local-dev images built from this branch):
compile.sh --dev(full workspace cargo + maturin) PASSED on both — norust-lld: undefined symbolfrom the prior 0.10.1 nixl-sys × v1.0.0 SDK mismatch.Where should the reviewer start?
lib/memory/Cargo.tomlandlib/llm/Cargo.tomlfor the crate bump;container/context.yamlfor the SDK pin and the cross-reference comments.Related Issues:
Linear: DIS-9265
Relates to #6671
/coderabbit profile chill
Summary by CodeRabbit