feat(providers): 1h system-prompt cache + thinking-effort lever #210
Merged
…ropic)

Model-appropriate tuning borrowed from OpenClaw's research, kept only where it applies to LISA's default model (claude-sonnet-4-6). See docs/PLAN_MODEL_TUNING_v1.0.md for the plan + pros/cons debate. Most OpenClaw knobs were dropped: 1M context is already native on Sonnet 4.6, and fast-mode/task-budgets/service_tier are gated to Opus 4.8/4.7 or Sonnet 5 (they return 400 on Sonnet 4.6).

A. Extended (1h) prompt caching on the stable system prefix (soul+skills+memory), so it stays warm across think-time gaps in a bursty personal session instead of incurring a cold re-write every 5 min. The conversational tail stays at 5 min. LISA_CACHE_TTL=5m opts back for heavy continuous use. Verified end-to-end via the relay: ephemeral_1h_input_tokens written on turn 1, cache_read on turn 2.

B. Optional thinking effort (output_config.effort, GA on Sonnet 4.6) threaded provider→agent→subagent. Default-off globally (keeps the API default "high"); dispatched subagents default to "low" (cheap parallel work); LISA_EFFORT overrides globally for power users.

Verified: tsc clean; 64 provider + 4 subagent tests pass; live /chat unchanged.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
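The caching change in item A can be pictured with a minimal sketch. It assumes the provider builds the system prompt as a single text block carrying a `cache_control` marker, and that the opt-out is read from the `LISA_CACHE_TTL` value. The helper names (`resolveCacheTtl`, `buildSystemBlocks`) and the fallback for unrecognised values are hypothetical, not taken from the PR's actual code.

```typescript
// Sketch only: names and fallback behaviour are assumptions, not LISA's real code.
type CacheTtl = "5m" | "1h";

interface CacheControl {
  type: "ephemeral";
  ttl?: CacheTtl;
}

interface SystemBlock {
  type: "text";
  text: string;
  cache_control?: CacheControl;
}

// Default to the 1h TTL; only an explicit "5m" opts back to the short TTL.
// (Assumption: any other value falls back to "1h".)
function resolveCacheTtl(envValue: string | undefined): CacheTtl {
  return envValue === "5m" ? "5m" : "1h";
}

// Wrap the stable prefix (soul + skills + memory) in one cacheable block.
function buildSystemBlocks(stablePrefix: string, envValue?: string): SystemBlock[] {
  const ttl = resolveCacheTtl(envValue);
  return [
    { type: "text", text: stablePrefix, cache_control: { type: "ephemeral", ttl } },
  ];
}
```

Passing the environment value as an argument, rather than reading it inside the helper, keeps the sketch easy to test; a real provider would likely read `process.env.LISA_CACHE_TTL` once at startup.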
Model-appropriate tuning from the OpenClaw research, keeping only what applies to LISA's default claude-sonnet-4-6. Plan + pros/cons debate in docs/PLAN_MODEL_TUNING_v1.0.md.

Dropped (verified against the claude-api reference): 1M context is already native on Sonnet 4.6; fast-mode / task-budgets / service_tier are gated to Opus 4.8/4.7 or Sonnet 5 (400 on Sonnet 4.6).

A. 1h prompt caching on the stable system prefix (soul+skills+memory): stays warm across think-time gaps in a bursty personal session instead of a cold re-write every 5 min. Tail stays 5-min; LISA_CACHE_TTL=5m opts back. Verified end-to-end via the relay: ephemeral_1h_input_tokens=3603 written on turn 1, cache_read_input_tokens=3603 on turn 2.

B. Thinking effort (output_config.effort, GA on Sonnet 4.6) threaded provider→agent→subagent. Default-off globally (keeps "high"); dispatched subagents default to low; LISA_EFFORT overrides for power users.

Verified: tsc clean · 64 provider + 4 subagent tests pass · live /chat unchanged · already rebuilt+running on the local backend.

🤖 Generated with Claude Code
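The effort defaults described in item B could be resolved along these lines. This is a hedged sketch: the precedence (a valid `LISA_EFFORT` value wins, then the subagent default of "low", otherwise the field is omitted so the API default applies) is one reading of the description above, and the names `resolveEffort`, `withEffort`, and `RequestParams` are hypothetical. Ignoring unrecognised override values is likewise an assumption.

```typescript
// Sketch only: names and precedence are assumptions, not LISA's real code.
type Effort = "low" | "medium" | "high";
const EFFORTS: readonly Effort[] = ["low", "medium", "high"];

interface EffortContext {
  envOverride?: string; // e.g. the raw LISA_EFFORT value
  isSubagent: boolean;  // true for dispatched subagents
}

interface RequestParams {
  model: string;
  max_tokens: number;
  output_config?: { effort: Effort };
}

// Accept only known effort levels; anything else is treated as "not set".
function parseEffort(value: string | undefined): Effort | undefined {
  return EFFORTS.find((e) => e === value);
}

// Override first, then the subagent default; undefined means "send nothing".
function resolveEffort(ctx: EffortContext): Effort | undefined {
  const override = parseEffort(ctx.envOverride);
  if (override) return override;
  return ctx.isSubagent ? "low" : undefined;
}

// Attach output_config.effort only when an effort level was resolved.
function withEffort(params: RequestParams, effort: Effort | undefined): RequestParams {
  return effort ? { ...params, output_config: { effort } } : params;
}
```

Leaving `output_config` out entirely in the default case, rather than sending "high" explicitly, matches the "default-off" wording: the main agent's request body is unchanged unless a lever is actually pulled.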