THUDM · zhuzilin · Mar 28, 2026 · Mar 27, 2026 · Mar 27, 2026 · Mar 28, 2026
diff --git a/docker/patch/latest/sglang.patch b/docker/patch/latest/sglang.patch
diff --git a/docker/version.txt b/docker/version.txt
@@ -1 +1 @@
-nightly-dev-20260327a
+nightly-dev-20260329a
diff --git a/docs/_static/image/trace.png b/docs/_static/image/trace.png
diff --git a/docs/en/developer_guide/trace.md b/docs/en/developer_guide/trace.md
@@ -0,0 +1,65 @@
+# Trace Viewer
+
+slime can attach lightweight execution traces to each rollout sample. These traces capture span-style events such as generation and reward-model calls, and they can be inspected later from a saved rollout debug dump.
+
+![trace timeline viewer](../../_static/image/trace.png)
+
+## Save rollout trace data
+
+To inspect traces later, save rollout debug data during a run:
+
+```bash
+python train.py \
+    ... \
+    --save-debug-rollout-data /path/to/debug/rollout_{rollout_id}.pt
+```
+
+Each saved `.pt` file contains the rollout samples together with their `trace` payloads. You can also replay the same dump later with `--load-debug-rollout-data`.
+
+## Open the timeline viewer
+
+Use the trace viewer script on a saved rollout dump:
+
+```bash
+python tools/trace_timeline_viewer.py /path/to/debug/rollout_0.pt
+```
+
+The script generates:
+
+- `rollout_0.trace_timeline_cache.json`
+- `rollout_0.trace_timeline_viewer.html`
+
+By default it also starts a local static server so you can open the generated HTML immediately. If you only want the files, use `--no-serve`.
+
+## How to read the viewer
+
+- Each row corresponds to one sample.
+- Bars represent spans, while point markers represent instant events.
+- Span attributes recorded at the start or end of `trace_span(...)` are shown in the details panel.
+- When SGLang returns PD (prefill/decode) disaggregation timings, the viewer adds synthetic `[P]` and `[D]` lanes to break out prefill and decode work.
+- When PD disaggregation is not enabled, those synthetic lanes are omitted automatically and the base trace still renders normally.
+
+## Instrument custom code
+
+For custom rollout or reward code, reuse helpers from `slime.utils.trace_utils`:
+
+- `trace_span(target, name, attrs=...)`: record a duration span.
+- `trace_event(target, name, attrs=...)`: record an instant event.
+- `bind_trace(sample)`: ensure a sample already has a trace carrier before passing it across helpers or tasks.
+
+If you want to record SGLang generation metadata in a consistent way, reuse `build_sglang_meta_trace_attrs`:
+
+```python
+from slime.utils.trace_utils import build_sglang_meta_trace_attrs, trace_span
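+# Runs inside an async function; sample, url, payload, and post come from the surrounding rollout code.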
+with trace_span(sample, "sglang_generate") as span:
+    output = await post(url, payload)
+    span.update(build_sglang_meta_trace_attrs(output["meta_info"]))
+```
+
+## Tips
+
+- Save a small number of rollouts first; the viewer is easiest to read when each dump contains a manageable number of samples.
+- The viewer is built from the saved `.pt` dump, so traces can be inspected offline on another machine.
+- For GPU/kernel-level SGLang profiling traces, see [Profiling](./profiling.md).
+
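Reviewer note on the "Instrument custom code" section: this diff does not include the implementation of `slime.utils.trace_utils`, so the following is only a minimal sketch of how a span/event recorder with the documented call shapes could behave. Every name prefixed with `demo_` or `Demo` is hypothetical, and the record layout (`type`, `name`, `start`, `end`, `attrs`) is an assumption, not slime's actual trace schema.

```python
import time
from contextlib import contextmanager
from dataclasses import dataclass, field


@dataclass
class DemoSample:
    """Hypothetical stand-in for slime's Sample; it only carries a trace list."""

    trace: list = field(default_factory=list)


class DemoSpan:
    """Mutable attribute bag handed to the body of the `with` block."""

    def __init__(self, attrs=None):
        self.attrs = dict(attrs or {})

    def update(self, extra):
        # Attributes added here are stored when the span closes.
        self.attrs.update(extra)


def _as_list(target):
    # Accept either one sample or a list of samples, as the call sites in the diff do.
    return target if isinstance(target, list) else [target]


@contextmanager
def demo_trace_span(target, name, attrs=None):
    """Record one duration span on every target sample (assumed record layout)."""
    span = DemoSpan(attrs)
    start = time.monotonic()
    try:
        yield span
    finally:
        end = time.monotonic()
        for sample in _as_list(target):
            sample.trace.append(
                {"type": "span", "name": name, "start": start, "end": end, "attrs": dict(span.attrs)}
            )


def demo_trace_event(target, name, attrs=None):
    """Record one instant event on every target sample."""
    now = time.monotonic()
    for sample in _as_list(target):
        sample.trace.append({"type": "event", "name": name, "time": now, "attrs": dict(attrs or {})})


sample = DemoSample()
with demo_trace_span(sample, "sglang_generate", attrs={"max_new_tokens": 16}) as span:
    span.update({"completion_tokens": 7})  # attribute recorded at the end of the span
demo_trace_event(sample, "reward_ready")
```

Because the record is written in a `finally` clause, a span is still saved when the wrapped call raises, which is usually what you want when tracing a failing request.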
diff --git a/docs/en/index.rst b/docs/en/index.rst
@@ -66,6 +66,7 @@ slime is the RL-framework behind GLM-4.7, GLM-4.6 and GLM-4.5. Apart from models
 
    developer_guide/ci.md
    developer_guide/debug.md
+   developer_guide/trace.md
    developer_guide/profiling.md
 
 .. toctree::

diff --git a/docs/zh/developer_guide/trace.md b/docs/zh/developer_guide/trace.md
@@ -0,0 +1,65 @@
+# Trace 可视化
+
+slime 可以为每条 rollout sample 挂上轻量级执行 trace。它会记录生成、奖励模型等 span 事件，并且可以在保存下来的 rollout debug dump 中离线查看。
+
+![trace 时间线查看器](../../_static/image/trace.png)
+
+## 保存 rollout trace 数据
+
+如果想在运行结束后查看 trace，可以在训练时打开 rollout debug dump：
+
+```bash
+python train.py \
+    ... \
+    --save-debug-rollout-data /path/to/debug/rollout_{rollout_id}.pt
+```
+
+每个保存出来的 `.pt` 文件都会包含 rollout samples，以及对应的 `trace` 数据。之后也可以通过 `--load-debug-rollout-data` 复用同一份 dump。
+
+## 打开时间线查看器
+
+对保存好的 rollout dump 运行：
+
+```bash
+python tools/trace_timeline_viewer.py /path/to/debug/rollout_0.pt
+```
+
+脚本会生成：
+
+- `rollout_0.trace_timeline_cache.json`
+- `rollout_0.trace_timeline_viewer.html`
+
+默认情况下，它还会启动一个本地静态文件服务，方便直接在浏览器里打开。如果只想生成文件，可以加 `--no-serve`。
+
+## 如何理解可视化结果
+
+- 每一行对应一条 sample。
+- 条形块表示 span，点表示瞬时事件。
+- `trace_span(...)` 在开始和结束时记录的属性，都会显示在详情面板里。
+- 当 SGLang 返回 PD 分离相关时延时，viewer 会自动补出 `[P]` 和 `[D]` 两条虚拟 lane，用来拆开展示 prefill/decode。
+- 如果没有开启 PD，这两条虚拟 lane 不会出现，基础 trace 也仍然可以正常渲染。
+
+## 给自定义代码打点
+
+在自定义 rollout 或 reward 逻辑中，可以直接复用 `slime.utils.trace_utils` 里的工具：
+
+- `trace_span(target, name, attrs=...)`：记录一段持续时间。
+- `trace_event(target, name, attrs=...)`：记录一个瞬时事件。
+- `bind_trace(sample)`：在 sample 被传递到其他 helper 或任务之前，确保它已经绑定好 trace carrier。
+
+如果想统一记录 SGLang 返回的 generation 元信息，可以复用 `build_sglang_meta_trace_attrs`：
+
+```python
+from slime.utils.trace_utils import build_sglang_meta_trace_attrs, trace_span
+# 以下代码需放在 async 函数内；sample、url、payload 和 post 来自外层 rollout 代码。
+with trace_span(sample, "sglang_generate") as span:
+    output = await post(url, payload)
+    span.update(build_sglang_meta_trace_attrs(output["meta_info"]))
+```
+
+## 使用建议
+
+- 先保存少量 rollout；单个 dump 的 sample 数量适中时，viewer 会更容易阅读。
+- viewer 直接基于保存下来的 `.pt` dump 工作，因此可以把文件拷到别的机器离线分析。
+- 如果你想看的是 SGLang 自身的 GPU / kernel 级 profiling trace，请参考 [性能分析](./profiling.md)。
+
diff --git a/docs/zh/index.rst b/docs/zh/index.rst
@@ -66,6 +66,7 @@ slime 是 GLM-4.7、GLM-4.6、GLM-4.5 背后的 RL 训练框架。除此之外
 
    developer_guide/ci.md
    developer_guide/debug.md
+   developer_guide/trace.md
    developer_guide/profiling.md
 
 .. toctree::

diff --git a/slime/ray/rollout.py b/slime/ray/rollout.py
@@ -120,6 +120,8 @@ def start_engines(self, port_cursors: dict[int, int] | None = None) -> tuple[lis
                     "SGLANG_BATCH_INVARIANT_OPS_ENABLE_MM_FALLBACK_VARIANT": "true",
                     "SGLANG_ENABLE_HEALTH_ENDPOINT_GENERATION": "false",
                     "SGLANG_ENABLE_STRICT_MEM_CHECK_DURING_IDLE": "false",
+                    "SGLANG_TRANSFER_PROFILING_INFO": "true",
+                    "SLIME_ENABLE_PROFILING": "true",
                 }.items()
             }
 

diff --git a/slime/rollout/sglang_rollout.py b/slime/rollout/sglang_rollout.py
@@ -26,6 +26,7 @@
     load_processor,
     load_tokenizer,
 )
+from slime.utils.trace_utils import build_sglang_meta_trace_attrs, trace_function, trace_span
 from slime.utils.types import Sample
 
 from .rm_hub import async_rm, batched_async_rm
@@ -174,7 +175,9 @@ async def generate(args: Namespace, sample: Sample, sampling_params: dict[str, A
         if getattr(args, "router_policy", None) == "consistent_hashing":
             headers = {"X-SMG-Routing-Key": sample.session_id}
 
-    output = await post(url, payload, headers=headers)
+    with trace_span(sample, "sglang_generate", attrs={"max_new_tokens": sampling_params["max_new_tokens"]}) as span:
+        output = await post(url, payload, headers=headers)
+        span.update(build_sglang_meta_trace_attrs(output["meta_info"]))
 
     if "output_token_logprobs" in output["meta_info"]:
         new_response_tokens = [item[1] for item in output["meta_info"]["output_token_logprobs"]]
@@ -211,6 +214,7 @@ async def generate(args: Namespace, sample: Sample, sampling_params: dict[str, A
     return sample
 
 
+@trace_function("generate_and_rm", target="sample")
 async def generate_and_rm(
     args: Namespace,
     sample: Sample | list[Sample],
@@ -261,7 +265,8 @@ async def generate_and_rm(
 
         # for multi agent system, the reward of some sample is calculated during generation.
         samples_need_reward = [sample for sample in samples if sample.reward is None]
-        rewards = await batched_async_rm(args, samples_need_reward)
+        with trace_span(samples_need_reward, "reward_model"):
+            rewards = await batched_async_rm(args, samples_need_reward)
         for sample, reward in zip(samples_need_reward, rewards, strict=False):
             sample.reward = reward
         return samples
@@ -270,11 +275,17 @@ async def generate_and_rm(
             return sample
         # for multi-turn environment, a reward could be assigned to the agent.
         if sample.reward is None:
-            sample.reward = await async_rm(args, sample)
+            with trace_span(sample, "reward_model"):
+                sample.reward = await async_rm(args, sample)
 
     return sample
 
 
+@trace_function(
+    "generate_and_rm_group",
+    target="group",
+    attrs_getter=lambda args, group, sampling_params, evaluation=False: {"group_size": len(group)},
+)
 async def generate_and_rm_group(
     args: Namespace, group: list[Sample], sampling_params: dict[str, Any], evaluation: bool = False
 ) -> list[Sample]:
@@ -302,7 +313,8 @@ async def generate_and_rm_group(
 
     # for the rm that need the whole group, we will do the rm here
     if not state.aborted and args.group_rm:
-        rewards = await batched_async_rm(args, group)
+        with trace_span(group, "group_reward_model"):
+            rewards = await batched_async_rm(args, group)
         for sample, reward in zip(group, rewards, strict=False):
             sample.reward = reward
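
Reviewer note on the `@trace_function(...)` decorators added above: the decorator itself lives in `slime/utils/trace_utils.py`, which is not part of this excerpt. As a rough illustration of the `target=` and `attrs_getter=` arguments, here is a sketch of one way such a decorator could be written. `demo_trace_function` and `demo_group` are hypothetical, samples are modeled as plain dicts, and the record layout is an assumption.

```python
import asyncio
import functools
import inspect
import time


def demo_trace_function(name, target, attrs_getter=None):
    """Hypothetical decorator: wrap an async function in one span per call.

    `target` names the parameter that holds the sample (or list of samples);
    `attrs_getter` receives the same arguments as the wrapped function.
    """

    def decorator(fn):
        signature = inspect.signature(fn)

        @functools.wraps(fn)
        async def wrapper(*args, **kwargs):
            # Resolve the traced argument by parameter name, positional or keyword.
            bound = signature.bind(*args, **kwargs)
            bound.apply_defaults()
            traced = bound.arguments[target]
            samples = traced if isinstance(traced, list) else [traced]
            attrs = attrs_getter(*args, **kwargs) if attrs_getter else {}
            start = time.monotonic()
            try:
                return await fn(*args, **kwargs)
            finally:
                end = time.monotonic()
                for sample in samples:
                    # Samples are plain dicts in this sketch only.
                    sample.setdefault("trace", []).append(
                        {"name": name, "start": start, "end": end, "attrs": dict(attrs)}
                    )

        return wrapper

    return decorator


@demo_trace_function(
    "generate_and_rm_group",
    target="group",
    attrs_getter=lambda args, group, sampling_params, evaluation=False: {"group_size": len(group)},
)
async def demo_group(args, group, sampling_params, evaluation=False):
    await asyncio.sleep(0)  # placeholder for generation and reward calls
    return group


group = asyncio.run(demo_group(None, [{}, {}], {"max_new_tokens": 8}))
```

In this sketch the `attrs_getter` lambda mirrors the wrapped function's signature, as it does in the diff, so it keeps working whether callers pass `evaluation` positionally, by keyword, or not at all.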