
arguments: allow 'assistant' in --tito-allowed-append-roles choices#6

Open
DavidBellamy wants to merge 50 commits into main from
fix/tito-allow-assistant-append

Conversation

@DavidBellamy
Collaborator

Summary

Widen the --tito-allowed-append-roles argparse choices from ['tool', 'user', 'system'] to ['tool', 'user', 'system', 'assistant'] so multi-turn agent harnesses (e.g. Harbor's terminus-2) can append their own planning or self-reflection assistant messages between tool calls.

Default remains ['tool'], so existing callers are unaffected.
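The change amounts to one extra entry in the argparse choices. A minimal sketch of the wiring, assuming a plain parser (the flag name, choices, and default come from this PR; nargs="+" and the standalone parser are illustrative):

```python
import argparse

parser = argparse.ArgumentParser()
parser.add_argument(
    "--tito-allowed-append-roles",
    nargs="+",
    default=["tool"],
    # 'assistant' is the new entry; before this PR the choices stopped at 'system'
    choices=["tool", "user", "system", "assistant"],
)

# Existing callers pass nothing and still get the ['tool'] default.
args = parser.parse_args(["--tito-allowed-append-roles", "tool", "assistant"])
print(args.tito_allowed_append_roles)
```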

Why

Terminus-2 in multi-turn agent loops appends its own assistant-role messages to the conversation before the next tool or user message (self-reflection, chain-of-thought, retry framing). TITO's session-server (miles/rollout/session/linear_trajectory.py:assert_messages_append_only_with_allowed_role) checks each appended role against the allowlist derived from this flag, and without assistant in the allowed set it raises:

litellm.BadRequestError: OpenAIException - Error code: 400 -
{'error': "appended message at index N has role='assistant',
allowed=['tool', 'user']; to allow more roles use --tito-allowed-append-roles"}

The error message itself suggests using the flag as the fix, but argparse then rejects the value with "invalid choice: 'assistant' (choose from 'tool', 'user', 'system')", so users are stuck with no CLI escape hatch.
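For context, the check behaves roughly like the sketch below. Only the function name comes from the PR (miles/rollout/session/linear_trajectory.py); the body is an illustrative reconstruction based on the error text, not the actual implementation:

```python
# Hypothetical re-creation of the session-server append check described above.
def assert_messages_append_only_with_allowed_role(old, new, allowed_roles):
    # The stored prefix must be unchanged (append-only invariant)...
    if new[: len(old)] != old:
        raise ValueError("messages are not append-only")
    # ...and every appended message must carry an allowed role.
    for i, msg in enumerate(new[len(old):], start=len(old)):
        if msg["role"] not in allowed_roles:
            raise ValueError(
                f"appended message at index {i} has role={msg['role']!r}, "
                f"allowed={sorted(allowed_roles)}; to allow more roles use "
                "--tito-allowed-append-roles"
            )

history = [{"role": "user", "content": "ls"}]
appended = history + [{"role": "assistant", "content": "Checking the dir first."}]
# Passes once 'assistant' is in the allowlist:
assert_messages_append_only_with_allowed_role(
    history, appended, {"tool", "user", "assistant"}
)
```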

Observed

On job LLM360/RL360#1565146 (GLM-4.7-Flash + PD + L3 + real terminus-2 on tblite), this blocks every non-first agent turn. Mock-agent runs never triggered it because the mock never appends assistant messages.

Tested

  • python -c "from miles.utils.arguments import *" imports cleanly
  • Passes our internal RL360 training launcher once this lands on the deploy branch (auto-rebuilt every 15 min from upstream + LLM360 PRs)

Related

Downstream issue: LLM360/RL360#230 (plumbing of sampling_params.max_tokens through harbor to litellm). This TITO fix is the next-in-sequence bug on the real-agent path.

DavidBellamy and others added 30 commits April 5, 2026 04:43
When the session server restarts, all in-memory sessions are lost.
Previously this returned 404 to every active agent, cascading
failures across all running trials.

Now, get_or_create_session() auto-creates the session if it does
not exist, allowing agents to transparently recover after a
router restart. The GET /sessions/{session_id} and chat completions
endpoints both use this new method.
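The restart-tolerance pattern can be sketched as follows; the method name mirrors the commit message, while the store class and session shape are assumptions:

```python
# Illustrative sketch of get_or_create_session(); internals are assumed.
class SessionStore:
    def __init__(self):
        self._sessions = {}

    def get_or_create_session(self, session_id):
        # Auto-create on miss so agents transparently recover after a
        # router restart instead of cascading 404s across all trials.
        if session_id not in self._sessions:
            self._sessions[session_id] = {"id": session_id, "messages": []}
        return self._sessions[session_id]

store = SessionStore()
s1 = store.get_or_create_session("trial-42")  # created on first access
s2 = store.get_or_create_session("trial-42")  # recovered, not re-created
assert s1 is s2
```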
When SGLang returns 400 "rollback failed" (prefix-cache state
mismatch), retry the request once without pretokenized input_ids.
This bypasses prefix continuation and lets SGLang process the
request from scratch.

Previously, rollback failures were passed through to the caller
as fatal errors, ending the session on the first request.
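The retry-once shape can be sketched like this; send_request and the payload layout are placeholders, and only the "rollback failed" / drop-input_ids handling comes from the commit text:

```python
# Sketch of the rollback-recovery retry described above (names are illustrative).
def generate_with_rollback_retry(send_request, payload):
    status, body = send_request(payload)
    # Case-insensitive match, guarding against a None body.
    rollback_failed = status == 400 and b"rollback failed" in (body or b"").lower()
    if rollback_failed and "input_ids" in payload:
        # Drop pretokenized input_ids so SGLang skips prefix continuation
        # and processes the request from scratch.
        retry_payload = {k: v for k, v in payload.items() if k != "input_ids"}
        return send_request(retry_payload)
    return status, body
```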
- Guard against None response_body (use `or b""` instead of default)
- Use .lower() for case-insensitive "rollback failed" matching
- Only retry without prefix continuation if input_ids was present
- GET /sessions/{id} returns empty response for unknown sessions instead
  of auto-creating (keeps the endpoint idempotent)
- Auto-creation in POST still works for restart tolerance
- Add TTL-based eviction (2h) for auto-created sessions to prevent
  unbounded memory growth
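The TTL eviction in the last bullet can be sketched as below; the 2h TTL and the auto-created-only scope come from the commit message, the session dict layout is an assumption:

```python
import time

# Sketch of TTL-based eviction for auto-created sessions (illustrative fields).
TTL_SECONDS = 2 * 60 * 60  # 2h, per the commit above

def evict_expired(sessions, now=None):
    """Drop auto-created sessions older than TTL to bound memory growth.

    Explicitly created sessions are never evicted here.
    """
    now = time.time() if now is None else now
    expired = [sid for sid, s in sessions.items()
               if s.get("auto_created") and now - s["created_at"] > TTL_SECONDS]
    for sid in expired:
        del sessions[sid]
    return expired
```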
rollout_routed_experts has shape (len(tokens)-1, num_layers, topk) and
was not truncated alongside tokens/logprobs/loss_mask, causing
Sample.validate() to fail with an assertion error when agentic sessions
exceed max_seq_len with routing replay enabled.
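The consistency requirement is that every per-transition array (len(tokens) - 1 entries) is cut in step with tokens. A minimal sketch, with illustrative field names rather than the actual Sample attributes:

```python
# Sketch of the truncation fix described above: routed_experts has shape
# (len(tokens) - 1, num_layers, topk), so it must be cut to max_seq_len - 1
# whenever tokens are cut to max_seq_len.
def truncate_sample(tokens, logprobs, loss_mask, routed_experts, max_seq_len):
    if len(tokens) <= max_seq_len:
        return tokens, logprobs, loss_mask, routed_experts
    return (tokens[:max_seq_len],
            logprobs[: max_seq_len - 1],
            loss_mask[: max_seq_len - 1],
            routed_experts[: max_seq_len - 1])
```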
Directly reflects the invariant rather than reconstructing the index
from prompt_len and keep_tokens.
…pdating (radixark#890)

Co-authored-by: Yueming Yuan <yym022502@gmail.com>
Co-authored-by: Yueming Yuan <yueming@Mac.attlocal.net>
…adixark#654)

Co-authored-by: GuanxingLu <gxlu02@gmail.com>
Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
… CLI args in swe-agent-v2 (radixark#954)

Co-authored-by: Shi Dong <shi.dong@radixark.ai>
…#952)

Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Co-authored-by: Yueming Yuan <yym022502@gmail.com>
…adixark#926)

Co-authored-by: guapisolo <guapisolo@gmail.com>
Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
…rmers >=5.0 (radixark#927)

Co-authored-by: guapisolo <guapisolo@gmail.com>
Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
…#948)

Co-authored-by: guapisolo <guapisolo@gmail.com>
Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Co-authored-by: maocheng23 <35615230+maocheng23@users.noreply.github.com>
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
Zhichenzzz and others added 20 commits April 10, 2026 20:06
…log dtype (radixark#975)

Co-authored-by: yueming-yuan <yym022502@gmail.com>
…adixark#974)

Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
_compute_zero_std_metrics crashes with TypeError when any zero-std group's
leading sample has a None reward (typical for Status.ABORTED trials):

  File "miles/ray/rollout.py", line 1266, in _compute_zero_std_metrics
    interesting_rewards = [str(round(g[0].get_reward_value(args), 1)) ...]
  TypeError: type NoneType doesn't define __round__ method

This crash fires on RolloutManager.generate() inside _log_rollout_data,
after the rollout collection + dynamic sampling filter have already
accepted the batch. With agentic tasks where some trials routinely abort
(Daytona sandbox timeout, tool-invocation loops, etc.), the trainer never
receives the batch and optimizer.step() never fires, so async RL training
silently stalls.

Fix: extract a _reward_label helper that buckets None-reward samples under
a dedicated 'none' label instead of passing None to round(). This keeps
the metric informative (zero_std/count_none shows the aborted-group count)
and preserves the existing behavior for numeric rewards.

Observed on LLM360/RL360 radixark#76 FAST_ITER smoke runs (job 1559799) with
GLM-4.7-Flash on agentic terminal-bench tasks.
… path

The old sglang_router (<=0.2.1) and the miles-router both use the single-arg
/add_worker?url=... endpoint for engine registration. Previously, the Miles
engine asserted worker_type=='regular' before hitting that endpoint, so any
attempt to stand up prefill/decode workers via the miles-router path
(including the sgl-model-gateway that mirrors it) fails fast at engine
init:

  AssertionError: pd disaggregation is not supported in old router or miles router.

This blocks PD disagg throughput scaling in any deployment that uses the
miles-router path, even when the receiving router (e.g. sgl-model-gateway
with a PD-aware shim) can handle worker_type on /add_worker.

Relax the assertion: forward worker_type (and bootstrap_port for prefill)
as extra query params. Routers that honor them get PD registration;
routers that only accept the single-arg form ignore the extras and register
as regular, with a warning logged so the fallback is visible.

The companion server-side change is on the receiving router:
  - sgl-model-gateway must accept ?worker_type=&bootstrap_port= on /add_worker
  - Or deployments can use the newer /workers endpoint (non-miles path).

Context: LLM360/RL360 radixark#76. Track G (job 1559336) showed full PD KV
transfer via mooncake works with SGLang's own mini_lb; this unblocks the
same flow through Miles-driven rollouts.
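The relaxed registration call can be sketched as follows; the /add_worker?url=... endpoint and the worker_type / bootstrap_port extras come from the commit message, the helper itself is illustrative:

```python
from urllib.parse import urlencode

# Sketch of building the registration URL with optional PD extras.
def build_add_worker_url(router_url, worker_url, worker_type="regular",
                         bootstrap_port=None):
    params = {"url": worker_url}
    if worker_type != "regular":
        # Routers that honor these extras do PD registration; single-arg
        # routers ignore them and register the worker as regular (with a
        # warning logged on our side so the fallback is visible).
        params["worker_type"] = worker_type
        if worker_type == "prefill" and bootstrap_port is not None:
            params["bootstrap_port"] = bootstrap_port
    return f"{router_url}/add_worker?{urlencode(params)}"
```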
SGLangEngine Ray actors created via RolloutRayActor.options(runtime_env=
{'env_vars': env_vars}) only inherit the env vars explicitly listed in
env_vars. PYTHONPATH is not included, so the actor process uses only the
container's default PYTHONPATH. In deployments that install miles as a
pip package in the container image (e.g. the radixark/miles:dev overlay),
'import miles' in the actor imports from /root/miles (site-packages
pointer), silently bypassing any MILES_OVERRIDE prepended to PYTHONPATH
on the driver.

This means local-patched miles code on the driver's PYTHONPATH is not
executed in actors, so per-cluster patches (e.g. kept in a shared clone
under SHARED_DIR/miles) never reach SGLangEngine.

Observed on LLM360/RL360#76: the driver had the patched miles from
LLM360/miles:deploy visible via PYTHONPATH, but SGLangEngine actors
still hit the assert from an earlier revision because they imported
from the container's /root/miles.

Fix: copy os.environ['PYTHONPATH'] into env_vars when it is set.
No-op when the driver doesn't have PYTHONPATH exported (container
defaults apply; same behavior as before).
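The fix reduces to one conditional copy before building the runtime_env; a minimal sketch, with the helper name and env_vars dict as illustrative scaffolding around the behavior described above:

```python
import os

# Sketch: propagate the driver's PYTHONPATH into the Ray runtime_env env_vars
# so actors import the same miles tree as the driver.
def build_actor_env_vars(env_vars):
    env_vars = dict(env_vars)  # don't mutate the caller's mapping
    pythonpath = os.environ.get("PYTHONPATH")
    if pythonpath:  # no-op when the driver has no PYTHONPATH exported
        env_vars["PYTHONPATH"] = pythonpath
    return env_vars
```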
…proxy response

build_proxy_response forwards upstream gateway headers into either JSONResponse
or Response. Both re-serialize / re-frame the body:

  * JSONResponse runs json.dumps over the parsed content, so whitespace and
    unicode escape behavior may produce a different byte count than the upstream
    did.
  * Response may be re-framed by Starlette with chunked transfer encoding.

Forwarding the upstream content-length, transfer-encoding, or content-encoding
in these cases causes a mismatch between declared framing and the bytes
Starlette actually writes. Clients (e.g. Miles's own http_utils.post) then
error with h11 LocalProtocolError 'Too much data for declared Content-Length'
or 'peer closed connection without sending complete message body' and retry.

Observed: on a mock-agent FAST_ITER run with PD disaggregation through a
gateway that serializes merged prefill+decode logprobs, ~200 of 332 chat
completions hit this error before mock retries salvaged training progress.

Strip the three hop-by-hop headers before building the outgoing Response;
Starlette / hyper then recompute the correct framing from the actual body.
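The stripping step can be sketched as below; the three header names come from the commit message, the helper itself is illustrative:

```python
# Sketch of dropping stale framing headers before re-framing the proxy body.
FRAMING_HEADERS = {"content-length", "transfer-encoding", "content-encoding"}

def strip_framing_headers(upstream_headers):
    # Let the serving framework recompute framing from the body it actually
    # writes, instead of forwarding the upstream's now-stale declarations.
    return {k: v for k, v in upstream_headers.items()
            if k.lower() not in FRAMING_HEADERS}
```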
…pe-on-miles-router=f0c9d3cc1f9f9a9ed98723e9462f8e1a3465a428,fix/guard-round-none-in-zero-std-metrics=7b7efa922b47f3cb2b3d1e1b6c549ae6d7aa640e,fix/propagate-pythonpath-to-ray-remote-actors=779839cb2569f411a373dac9a229b469aa0aa991,fix/rollback-error-recovery=dd188aaddb887e5c1de066277e7c9f10eec297b0,fix/session-auto-create=29a0dcad45e36985c7af6f7c1dcf653e0eace4f2,fix/session-server-strip-stale-content-length-clean=9a0ef97613294d854b60e98e2202f58a7734936f,fix/truncate-routed-experts=25645357a7d51daa307a9fa25081095bc3cfb2a1
Multi-turn agent harnesses such as Harbor's terminus-2 append their own
planning or self-reflection assistant messages to the conversation
before the next tool/user turn. TITO's session-server validates the
appended role against this allowlist; without 'assistant' in the choices,
the agent's 400 surfaces as:

  litellm.BadRequestError: OpenAIException - Error code: 400 -
  {'error': "appended message at index N has role='assistant', allowed=[...];
  to allow more roles use --tito-allowed-append-roles"}

and the user cannot fix it via CLI because argparse rejects 'assistant'
with 'invalid choice'. This widens the allowlist; the default remains
['tool'], so no change for existing callers.