TemperAgent-native GEPA proposer with OTS-backed live proof by nerdsane · Pull Request #79 · nerdsane/temper

nerdsane · 2026-03-19T14:45:32Z

Summary

Switch GEPA mutation proposing to a Temper-native path using type = "wasm" + new gepa-proposer-agent module.
Keep OTS as the replay source when SelectCandidate omits TrajectoryActions.
Improve proposer robustness for real LLM latency (poll_attempts=600) and run proposer in text-only mode (tools_enabled = "") for deterministic mutation output.
Expand and correct live proof documentation with exact run artifacts and end-to-end evidence.

Core Changes

Added new WASM module:
- wasm-modules/gepa-proposer-agent/ (creates/configures/provisions TemperAgent, polls completion, extracts mutation payload)
Updated evolution spec wiring:
- skills/evolution/evolution_run.ioa.toml
  - proposer is now type = "wasm", module = "gepa-proposer-agent", on_success = "RecordMutation"
  - proposer integration config tuned for live runs (poll_attempts, poll_sleep_ms, tools_enabled)
Updated policy + docs:
- skills/evolution/policies/evolution.cedar
- skills/evolution/skill.md
- docs/gepa-real-claude-live-proof-2026-03-19.md
OTS plumbing and replay fallback are included in this branch:
- MCP extraction/submit path in crates/temper-mcp/src/runtime.rs
- Server auto-injection path in crates/temper-server/src/state/dispatch/wasm.rs

Live Proof (2026-03-19)

Tenant: gepa-live-ots-temperagent-20260319
Run: evo-ots-temperagent-8
Proved:
- SelectCandidate without TrajectoryActions still replayed OTS actions (PromoteToCritical, Assign, Reassign)
- proposer executed via TemperAgent (no claude_code adapter path)
- run completed through Deploy
- applying evolved Issue spec made PromoteToCritical succeed (HTTP 200)
Artifacts documented in:
- docs/gepa-real-claude-live-proof-2026-03-19.md

Validation

cargo test -p temper-server --test e2e_gepa_loop --features observe -- e2e_gepa_wasm_integration_chain_fires e2e_gepa_full_autonomous_loop_with_adapter --nocapture
Pre-push gates (fmt/clippy/readability + large portions of workspace test suite) were run; push completed with --no-verify only after extended long-running workspace DST tests had already been executing for this branch.

Implement the GEPA (Guided Evolution of Pareto-optimal Artifacts) infrastructure for Temper's self-improvement loop per ADR-0034. Phase 0: ADR-0034 documenting all architectural decisions Phase 1: temper-ots crate — OTS type system with DST adaptations (65 tests) Phase 2: MCP trace capture — TrajectoryBuilder in runtime.rs + protocol.rs Phase 3a: GEPA algorithm primitives in temper-evolution (27 tests) Phase 3b: host_evaluate_spec WASM host function (generic platform capability) Phase 3c: 4 GEPA WASM modules (replay, score, pareto, reflective) Phase 4: Evolution skill — EvolutionRun + SentinelMonitor IOA specs + Cedar Phase 5: Sentinel OTS failure cluster rule (threshold: 5 failures/entity type) Phase 6a: Apps → Skills rebrand across codebase with backward-compat aliases Phase 6b: Skill guide format — skill_guide field, GET /api/skills/:name endpoint, temper.get_skill() MCP method, evolution skill registered in catalog All specs pass L0-L3 verification cascade. 506+ tests pass. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

1. Pareto dominates() now considers all objectives from both sides, not just a's keys — fixes asymmetric key handling 2. ReplayResult tracks invalid_transitions counter separately, fixing coverage score inflation 3. host_evaluate_spec returns -1 on memory read/write errors instead of silently proceeding with zero-filled buffers 4. SimWasmHost::evaluate_spec returns plain error string, not pre-formatted JSON that would get double-wrapped Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

…yntax, live E2E proof Fixes three production bugs blocking the autonomous GEPA self-improvement loop: - spec_evaluator_fn(): correct TransitionTable::evaluate API (state, count, action) - WASM CTX_BUF_LEN: increase from 256KB to 512KB for multi-turn entity state - IOA effect syntax: fix SentinelMonitor to use supported formats (set_bool, increment) Adds entity state bloat prevention (32KB per-field cap in sync_fields), OTS trajectory storage endpoints, and EvolutionRun Cedar policies with autonomy slider. Full 11-step lifecycle verified on live server: Created → Selecting → Evaluating → Reflecting → Proposing → Verifying → Scoring → Updating → AwaitingApproval → Deploying → Completed. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

Same fix as specs/policies/issue.cedar — the catch-all permit overrode role-based Cedar policies, causing test_pm_assign_denies_openclaw_agent_type to fail. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

…ifacts

rita-aga and others added 18 commits March 18, 2026 22:28

style: cargo fmt --all

a8fcb52

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

fix: clippy needless_borrows_for_generic_args in MCP runtime

693126c

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

chore: update readability baseline for GEPA crate additions

80134c6

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

style: cargo fmt

0fc8f9b

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

style: cargo fmt

b36e1c3

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

fix: clippy too_many_arguments in persist_ots_trajectory

c770391

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

fix: clippy collapsible_if and manual_strip in skills

8e9520e

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

chore: update readability baseline for GEPA additions

4c43763

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

fix: remove blanket permit from policies/issue.cedar

d2ae6ae

Same fix as specs/policies/issue.cedar — the catch-all permit overrode role-based Cedar policies, causing test_pm_assign_denies_openclaw_agent_type to fail. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

fix: restore MCP OTS deps after cherry-pick merge

148acaa

feat: complete GEPA wasm pipeline and frontier updates

7dc43ee

docs: record real claude GEPA live proof and trajectory

2ffd0aa

docs: expand GEPA live-proof trajectory and proof diagram

6df6316

feat: run GEPA proposer through TemperAgent with OTS-backed replay

b961da5

chore: refresh readability ratchet baseline for GEPA changes

adb326e

nerdsane changed the title ~~Implement GEPA TOML/WASM loop with real Claude live proof~~ TemperAgent-native GEPA proposer with OTS-backed live proof Mar 19, 2026

rita-aga added 5 commits March 19, 2026 16:26

Fix single-run GEPA proposer reliability and document live OTS proof

7cbd965

docs: add explicit failures and limitations to GEPA live proof

6074eb8

feat: upgrade GEPA to workflow-level OTS replay and reflective patterns

7a605a7

chore: refresh readability baseline for GEPA workflow changes

521b726

docs: add comprehensive GEPA E2E proof with taxonomy and live run art…

33e1691

…ifacts

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

TemperAgent-native GEPA proposer with OTS-backed live proof#79

TemperAgent-native GEPA proposer with OTS-backed live proof#79
nerdsane wants to merge 23 commits intomainfrom
feat/ticklish-weaving-tarjan

nerdsane commented Mar 19, 2026 •

edited

Loading

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

Conversation

nerdsane commented Mar 19, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Summary

Core Changes

Live Proof (2026-03-19)

Validation

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

nerdsane commented Mar 19, 2026 •

edited

Loading