
arguments: allow 'assistant' in --tito-allowed-append-roles choices#6

Open
DavidBellamy wants to merge 50 commits into main from
fix/tito-allow-assistant-append

Conversation

@DavidBellamy
Collaborator

Summary

Widen the --tito-allowed-append-roles argparse choices from ['tool', 'user', 'system'] to ['tool', 'user', 'system', 'assistant'] so multi-turn agent harnesses (e.g. Harbor's terminus-2) can append their own planning or self-reflection assistant messages between tool calls.

Default remains ['tool'], so existing callers are unaffected.
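The change amounts to one extra entry in the argparse choices. A minimal sketch of the wiring, assuming a plain parser (the flag name, choices, and default come from this PR; nargs="+" and the standalone parser are illustrative):

```python
import argparse

parser = argparse.ArgumentParser()
parser.add_argument(
    "--tito-allowed-append-roles",
    nargs="+",
    default=["tool"],
    # 'assistant' is the new entry; before this PR the choices stopped at 'system'
    choices=["tool", "user", "system", "assistant"],
)

# Existing callers pass nothing and still get the ['tool'] default.
args = parser.parse_args(["--tito-allowed-append-roles", "tool", "assistant"])
print(args.tito_allowed_append_roles)
```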

Why

Terminus-2 in multi-turn agent loops appends its own assistant-role messages to the conversation before the next tool or user message (self-reflection, chain-of-thought, retry framing). TITO's session-server (miles/rollout/session/linear_trajectory.py:assert_messages_append_only_with_allowed_role) checks each appended role against the allowlist derived from this flag, and without assistant in the allowed set it raises:

litellm.BadRequestError: OpenAIException - Error code: 400 -
{'error': "appended message at index N has role='assistant',
allowed=['tool', 'user']; to allow more roles use --tito-allowed-append-roles"}

The error message itself suggests using the flag as the fix, but argparse then rejects the value with "invalid choice: 'assistant' (choose from 'tool', 'user', 'system')", so users are stuck with no CLI escape hatch.
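For context, the check behaves roughly like the sketch below. Only the function name comes from the PR (miles/rollout/session/linear_trajectory.py); the body is an illustrative reconstruction based on the error text, not the actual implementation:

```python
# Hypothetical re-creation of the session-server append check described above.
def assert_messages_append_only_with_allowed_role(old, new, allowed_roles):
    # The stored prefix must be unchanged (append-only invariant)...
    if new[: len(old)] != old:
        raise ValueError("messages are not append-only")
    # ...and every appended message must carry an allowed role.
    for i, msg in enumerate(new[len(old):], start=len(old)):
        if msg["role"] not in allowed_roles:
            raise ValueError(
                f"appended message at index {i} has role={msg['role']!r}, "
                f"allowed={sorted(allowed_roles)}; to allow more roles use "
                "--tito-allowed-append-roles"
            )

history = [{"role": "user", "content": "ls"}]
appended = history + [{"role": "assistant", "content": "Checking the dir first."}]
# Passes once 'assistant' is in the allowlist:
assert_messages_append_only_with_allowed_role(
    history, appended, {"tool", "user", "assistant"}
)
```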

Observed

On job LLM360/RL360#1565146 (GLM-4.7-Flash + PD + L3 + real terminus-2 on tblite), this blocks every non-first agent turn. Mock-agent runs never triggered it because the mock never appends assistant messages.

Tested

  • python -c "from miles.utils.arguments import *" imports cleanly
  • Passes our internal RL360 training launcher once this lands on the deploy branch (auto-rebuilt every 15 min from upstream + LLM360 PRs)

Related

Downstream issue: LLM360/RL360#230 (plumbing of sampling_params.max_tokens through harbor to litellm). This TITO fix is the next-in-sequence bug on the real-agent path.

DavidBellamy and others added 30 commits April 5, 2026 04:43
When the session server restarts, all in-memory sessions are lost.
Previously this returned 404 to every active agent, cascading
failures across all running trials.

Now, get_or_create_session() auto-creates the session if it does
not exist, allowing agents to transparently recover after a
router restart. The GET /sessions/{session_id} and chat completions
endpoints both use this new method.
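The restart-tolerance pattern can be sketched as follows; the method name mirrors the commit message, while the store class and session shape are assumptions:

```python
# Illustrative sketch of get_or_create_session(); internals are assumed.
class SessionStore:
    def __init__(self):
        self._sessions = {}

    def get_or_create_session(self, session_id):
        # Auto-create on miss so agents transparently recover after a
        # router restart instead of cascading 404s across all trials.
        if session_id not in self._sessions:
            self._sessions[session_id] = {"id": session_id, "messages": []}
        return self._sessions[session_id]

store = SessionStore()
s1 = store.get_or_create_session("trial-42")  # created on first access
s2 = store.get_or_create_session("trial-42")  # recovered, not re-created
assert s1 is s2
```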
When SGLang returns 400 "rollback failed" (prefix-cache state
mismatch), retry the request once without pretokenized input_ids.
This bypasses prefix continuation and lets SGLang process the
request from scratch.

Previously, rollback failures were passed through to the caller
as fatal errors, ending the session on the first request.
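The retry-once shape can be sketched like this; send_request and the payload layout are placeholders, and only the "rollback failed" / drop-input_ids handling comes from the commit text:

```python
# Sketch of the rollback-recovery retry described above (names are illustrative).
def generate_with_rollback_retry(send_request, payload):
    status, body = send_request(payload)
    # Case-insensitive match, guarding against a None body.
    rollback_failed = status == 400 and b"rollback failed" in (body or b"").lower()
    if rollback_failed and "input_ids" in payload:
        # Drop pretokenized input_ids so SGLang skips prefix continuation
        # and processes the request from scratch.
        retry_payload = {k: v for k, v in payload.items() if k != "input_ids"}
        return send_request(retry_payload)
    return status, body
```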
- Guard against None response_body (use `or b""` instead of default)
- Use .lower() for case-insensitive "rollback failed" matching
- Only retry without prefix continuation if input_ids was present
- GET /sessions/{id} returns empty response for unknown sessions instead
  of auto-creating (keeps the endpoint idempotent)
- Auto-creation in POST still works for restart tolerance
- Add TTL-based eviction (2h) for auto-created sessions to prevent
  unbounded memory growth
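The TTL eviction in the last bullet can be sketched as below; the 2h TTL and the auto-created-only scope come from the commit message, the session dict layout is an assumption:

```python
import time

# Sketch of TTL-based eviction for auto-created sessions (illustrative fields).
TTL_SECONDS = 2 * 60 * 60  # 2h, per the commit above

def evict_expired(sessions, now=None):
    """Drop auto-created sessions older than TTL to bound memory growth.

    Explicitly created sessions are never evicted here.
    """
    now = time.time() if now is None else now
    expired = [sid for sid, s in sessions.items()
               if s.get("auto_created") and now - s["created_at"] > TTL_SECONDS]
    for sid in expired:
        del sessions[sid]
    return expired
```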
rollout_routed_experts has shape (len(tokens)-1, num_layers, topk) and
was not truncated alongside tokens/logprobs/loss_mask, causing
Sample.validate() to fail with an assertion error when agentic sessions
exceed max_seq_len with routing replay enabled.
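The consistency requirement is that every per-transition array (len(tokens) - 1 entries) is cut in step with tokens. A minimal sketch, with illustrative field names rather than the actual Sample attributes:

```python
# Sketch of the truncation fix described above: routed_experts has shape
# (len(tokens) - 1, num_layers, topk), so it must be cut to max_seq_len - 1
# whenever tokens are cut to max_seq_len.
def truncate_sample(tokens, logprobs, loss_mask, routed_experts, max_seq_len):
    if len(tokens) <= max_seq_len:
        return tokens, logprobs, loss_mask, routed_experts
    return (tokens[:max_seq_len],
            logprobs[: max_seq_len - 1],
            loss_mask[: max_seq_len - 1],
            routed_experts[: max_seq_len - 1])
```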
Directly reflects the invariant rather than reconstructing the index
from prompt_len and keep_tokens.
…pdating (radixark#890)

Co-authored-by: Yueming Yuan <yym022502@gmail.com>
Co-authored-by: Yueming Yuan <yueming@Mac.attlocal.net>
…adixark#654)

Co-authored-by: GuanxingLu <gxlu02@gmail.com>
Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
… CLI args in swe-agent-v2 (radixark#954)

Co-authored-by: Shi Dong <shi.dong@radixark.ai>
…#952)

Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Co-authored-by: Yueming Yuan <yym022502@gmail.com>
…adixark#926)

Co-authored-by: guapisolo <guapisolo@gmail.com>
Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
…rmers >=5.0 (radixark#927)

Co-authored-by: guapisolo <guapisolo@gmail.com>
Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
…#948)

Co-authored-by: guapisolo <guapisolo@gmail.com>
Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Co-authored-by: maocheng23 <35615230+maocheng23@users.noreply.github.com>
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
Zhichenzzz and others added 20 commits April 10, 2026 20:06
…log dtype (radixark#975)

Co-authored-by: yueming-yuan <yym022502@gmail.com>
…adixark#974)

Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
_compute_zero_std_metrics crashes with TypeError when any zero-std group's
leading sample has a None reward (typical for Status.ABORTED trials):

  File "miles/ray/rollout.py", line 1266, in _compute_zero_std_metrics
    interesting_rewards = [str(round(g[0].get_reward_value(args), 1)) ...]
  TypeError: type NoneType doesn't define __round__ method

This crash fires on RolloutManager.generate() inside _log_rollout_data,
after the rollout collection + dynamic sampling filter have already
accepted the batch. With agentic tasks where some trials routinely abort
(Daytona sandbox timeout, tool-invocation loops, etc.), the trainer never
receives the batch and optimizer.step() never fires, so async RL training
silently stalls.

Fix: extract a _reward_label helper that buckets None-reward samples under
a dedicated 'none' label instead of passing None to round(). This keeps
the metric informative (zero_std/count_none shows the aborted-group count)
and preserves the existing behavior for numeric rewards.

Observed on LLM360/RL360 radixark#76 FAST_ITER smoke runs (job 1559799) with
GLM-4.7-Flash on agentic terminal-bench tasks.
… path

The old sglang_router (<=0.2.1) and the miles-router both use the single-arg
/add_worker?url=... endpoint for engine registration. Previously, the Miles
engine asserted worker_type=='regular' before hitting that endpoint, so any
attempt to stand up prefill/decode workers via the miles-router path
(including the sgl-model-gateway that mirrors it) fails fast at engine
init:

  AssertionError: pd disaggregation is not supported in old router or miles router.

This blocks PD disagg throughput scaling in any deployment that uses the
miles-router path, even when the receiving router (e.g. sgl-model-gateway
with a PD-aware shim) can handle worker_type on /add_worker.

Relax the assertion: forward worker_type (and bootstrap_port for prefill)
as extra query params. Routers that honor them get PD registration;
routers that only accept the single-arg form ignore the extras and register
as regular, with a warning logged so the fallback is visible.

The companion server-side change is on the receiving router:
  - sgl-model-gateway must accept ?worker_type=&bootstrap_port= on /add_worker
  - Or deployments can use the newer /workers endpoint (non-miles path).

Context: LLM360/RL360 radixark#76. Track G (job 1559336) showed full PD KV
transfer via mooncake works with SGLang's own mini_lb; this unblocks the
same flow through Miles-driven rollouts.
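The relaxed registration call can be sketched as follows; the /add_worker?url=... endpoint and the worker_type / bootstrap_port extras come from the commit message, the helper itself is illustrative:

```python
from urllib.parse import urlencode

# Sketch of building the registration URL with optional PD extras.
def build_add_worker_url(router_url, worker_url, worker_type="regular",
                         bootstrap_port=None):
    params = {"url": worker_url}
    if worker_type != "regular":
        # Routers that honor these extras do PD registration; single-arg
        # routers ignore them and register the worker as regular (with a
        # warning logged on our side so the fallback is visible).
        params["worker_type"] = worker_type
        if worker_type == "prefill" and bootstrap_port is not None:
            params["bootstrap_port"] = bootstrap_port
    return f"{router_url}/add_worker?{urlencode(params)}"
```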
SGLangEngine Ray actors created via RolloutRayActor.options(runtime_env=
{'env_vars': env_vars}) only inherit the env vars explicitly listed in
env_vars. PYTHONPATH is not included, so the actor process uses only the
container's default PYTHONPATH. In deployments that install miles as a
pip package in the container image (e.g. the radixark/miles:dev overlay),
'import miles' in the actor imports from /root/miles (site-packages
pointer), silently bypassing any MILES_OVERRIDE prepended to PYTHONPATH
on the driver.

This means local-patched miles code on the driver's PYTHONPATH is not
executed in actors, so per-cluster patches (e.g. kept in a shared clone
under SHARED_DIR/miles) never reach SGLangEngine.

Observed on LLM360/RL360#76: the driver had the patched miles from
LLM360/miles:deploy visible via PYTHONPATH, but SGLangEngine actors
still hit the assert from an earlier revision because they imported
from the container's /root/miles.

Fix: copy os.environ['PYTHONPATH'] into env_vars when it is set.
No-op when the driver doesn't have PYTHONPATH exported (container
defaults apply; same behavior as before).
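The fix reduces to one conditional copy before building the runtime_env; a minimal sketch, with the helper name and env_vars dict as illustrative scaffolding around the behavior described above:

```python
import os

# Sketch: propagate the driver's PYTHONPATH into the Ray runtime_env env_vars
# so actors import the same miles tree as the driver.
def build_actor_env_vars(env_vars):
    env_vars = dict(env_vars)  # don't mutate the caller's mapping
    pythonpath = os.environ.get("PYTHONPATH")
    if pythonpath:  # no-op when the driver has no PYTHONPATH exported
        env_vars["PYTHONPATH"] = pythonpath
    return env_vars
```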
…proxy response

build_proxy_response forwards upstream gateway headers into either JSONResponse
or Response. Both re-serialize / re-frame the body:

  * JSONResponse runs json.dumps over the parsed content, so whitespace and
    unicode escape behavior may produce a different byte count than the upstream
    did.
  * Response may be re-framed by Starlette with chunked transfer encoding.

Forwarding the upstream content-length, transfer-encoding, or content-encoding
in these cases causes a mismatch between declared framing and the bytes
Starlette actually writes. Clients (e.g. Miles's own http_utils.post) then
error with h11 LocalProtocolError 'Too much data for declared Content-Length'
or 'peer closed connection without sending complete message body' and retry.

Observed: on a mock-agent FAST_ITER run with PD disaggregation through a
gateway that serializes merged prefill+decode logprobs, ~200 of 332 chat
completions hit this error before mock retries salvaged training progress.

Strip the three hop-by-hop headers before building the outgoing Response;
Starlette / hyper then recompute the correct framing from the actual body.
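The stripping step can be sketched as below; the three header names come from the commit message, the helper itself is illustrative:

```python
# Sketch of dropping stale framing headers before re-framing the proxy body.
FRAMING_HEADERS = {"content-length", "transfer-encoding", "content-encoding"}

def strip_framing_headers(upstream_headers):
    # Let the serving framework recompute framing from the body it actually
    # writes, instead of forwarding the upstream's now-stale declarations.
    return {k: v for k, v in upstream_headers.items()
            if k.lower() not in FRAMING_HEADERS}
```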
…pe-on-miles-router=f0c9d3cc1f9f9a9ed98723e9462f8e1a3465a428,fix/guard-round-none-in-zero-std-metrics=7b7efa922b47f3cb2b3d1e1b6c549ae6d7aa640e,fix/propagate-pythonpath-to-ray-remote-actors=779839cb2569f411a373dac9a229b469aa0aa991,fix/rollback-error-recovery=dd188aaddb887e5c1de066277e7c9f10eec297b0,fix/session-auto-create=29a0dcad45e36985c7af6f7c1dcf653e0eace4f2,fix/session-server-strip-stale-content-length-clean=9a0ef97613294d854b60e98e2202f58a7734936f,fix/truncate-routed-experts=25645357a7d51daa307a9fa25081095bc3cfb2a1
Multi-turn agent harnesses such as Harbor's terminus-2 append their own
planning or self-reflection assistant messages to the conversation
before the next tool/user turn. TITO's session-server validates the
appended role against this allowlist; without 'assistant' in the choices,
the agent's 400 surfaces as:

  litellm.BadRequestError: OpenAIException - Error code: 400 -
  {'error': "appended message at index N has role='assistant', allowed=[...];
  to allow more roles use --tito-allowed-append-roles"}

and the user cannot fix it via CLI because argparse rejects 'assistant'
with 'invalid choice'. This widens the allowlist; the default remains
['tool'], so no change for existing callers.