feat(providers): 1h system-prompt cache + thinking-effort lever #210

Merged
oratis merged 1 commit into main from feat/model-tuning on Jul 2, 2026

oratis (Owner) commented Jul 2, 2026

Model-appropriate tuning from the OpenClaw research, keeping only what applies to LISA's default model, claude-sonnet-4-6. The plan and the pros/cons debate are in docs/PLAN_MODEL_TUNING_v1.0.md.

Dropped (verified against the claude-api reference): 1M context is already native on Sonnet 4.6; fast-mode / task-budgets / service_tier are gated to Opus 4.8/4.7 or Sonnet 5 (requests return HTTP 400 on Sonnet 4.6).

A. 1h prompt caching on the stable system prefix (soul+skills+memory): the prefix stays warm across think-time gaps in a bursty personal session instead of incurring a cold re-write every 5 min. The conversational tail stays on the 5-min cache; LISA_CACHE_TTL=5m opts back to 5-min caching for heavy-continuous use. Verified end-to-end via the relay: ephemeral_1h_input_tokens=3603 written on turn 1, cache_read_input_tokens=3603 on turn 2.
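
A minimal sketch of how change A could look in a provider, not the code from this PR. The names `resolveCacheTtl` and `buildSystemBlocks` are hypothetical; the only parts taken from the Anthropic API are the `cache_control` shape with `ttl: "1h"` and the `LISA_CACHE_TTL=5m` opt-back described above.

```typescript
// Hypothetical helper names; only the cache_control shape mirrors the Anthropic API.
type CacheTtl = "5m" | "1h";

interface CacheControl {
  type: "ephemeral";
  ttl?: CacheTtl; // omitted means the API's default 5-minute cache
}

interface SystemBlock {
  type: "text";
  text: string;
  cache_control?: CacheControl;
}

// LISA_CACHE_TTL=5m opts back to the 5-minute cache; any other value keeps 1h.
// (Assumption: unknown values fall back to 1h rather than raising an error.)
function resolveCacheTtl(env: Record<string, string | undefined>): CacheTtl {
  return env.LISA_CACHE_TTL === "5m" ? "5m" : "1h";
}

// Put one cache breakpoint on the stable prefix (soul + skills + memory).
function buildSystemBlocks(stablePrefix: string, ttl: CacheTtl): SystemBlock[] {
  const cache_control: CacheControl =
    ttl === "1h" ? { type: "ephemeral", ttl: "1h" } : { type: "ephemeral" };
  return [{ type: "text", text: stablePrefix, cache_control }];
}
```

A caller would pass `buildSystemBlocks(prefix, resolveCacheTtl(process.env))` as the request's `system` field. Because the system prefix precedes the messages, the 1h breakpoint comes before any 5-min breakpoint on the conversational tail, which is the ordering the API expects when TTLs are mixed.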

B. Thinking effort (output_config.effort, GA on Sonnet 4.6) threaded provider→agent→subagent. Off by default globally (the API default, "high", applies); dispatched subagents default to "low"; LISA_EFFORT overrides globally for power users.
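
The precedence in change B can be sketched as follows. This is an illustration under stated assumptions, not the PR's implementation: `resolveEffort`, `withEffort`, and `RequestParams` are hypothetical names, `"medium"` is assumed to be a valid level alongside the `"low"` and `"high"` named above, and `LISA_EFFORT` is assumed to win over the subagent default.

```typescript
// Hypothetical names throughout; "medium" is an assumed extra level.
type Effort = "low" | "medium" | "high";

interface RequestParams {
  model: string;
  max_tokens: number;
  output_config?: { effort: Effort };
}

function isEffort(v: string | undefined): v is Effort {
  return v === "low" || v === "medium" || v === "high";
}

// Precedence (assumed): LISA_EFFORT override, then "low" for subagents,
// otherwise undefined so the API default ("high") applies.
function resolveEffort(envEffort: string | undefined, isSubagent: boolean): Effort | undefined {
  if (isEffort(envEffort)) return envEffort;
  return isSubagent ? "low" : undefined;
}

// Attach output_config only when an effort was resolved, so the default
// path sends exactly the same request body as before.
function withEffort(params: RequestParams, effort: Effort | undefined): RequestParams {
  return effort === undefined ? params : { ...params, output_config: { effort } };
}
```

Leaving `output_config` out entirely on the default path is what would keep live `/chat` requests unchanged; only subagent dispatches and `LISA_EFFORT` users would send the extra field.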

Verified: tsc clean · 64 provider + 4 subagent tests pass · live /chat unchanged · already rebuilt+running on the local backend.

🤖 Generated with Claude Code

…ropic)

Model-appropriate tuning borrowed from OpenClaw's research, kept only where it
applies to LISA's default model (claude-sonnet-4-6) — see docs/PLAN_MODEL_TUNING_v1.0.md
for the plan + pros/cons debate. Most OpenClaw knobs were dropped: 1M context is
already native on Sonnet 4.6, and fast-mode/task-budgets/service_tier are gated to
Opus 4.8/4.7 or Sonnet 5 (400 on Sonnet 4.6).

A. Extended (1h) prompt caching on the stable system prefix (soul+skills+memory)
   so it stays warm across think-time gaps in a bursty personal session instead of
   a cold re-write every 5 min. Conversational tail stays 5-min. LISA_CACHE_TTL=5m
   opts back for heavy-continuous use. Verified end-to-end via the relay:
   ephemeral_1h_input_tokens written on turn 1, cache_read on turn 2.

B. Optional thinking effort (output_config.effort, GA on Sonnet 4.6) threaded
   provider→agent→subagent. Default-off globally (keeps API default "high");
   dispatched subagents default to "low" (cheap parallel work); LISA_EFFORT
   overrides globally for power users.

Verified: tsc clean; 64 provider + 4 subagent tests pass; live /chat unchanged.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
oratis merged commit 5a112aa into main on Jul 2, 2026
1 check passed