TemperAgent-native GEPA proposer with OTS-backed live proof #79
Open
Implement the GEPA (Guided Evolution of Pareto-optimal Artifacts)
infrastructure for Temper's self-improvement loop per ADR-0034.
Phase 0: ADR-0034 documenting all architectural decisions
Phase 1: temper-ots crate — OTS type system with DST adaptations (65 tests)
Phase 2: MCP trace capture — TrajectoryBuilder in runtime.rs + protocol.rs
Phase 3a: GEPA algorithm primitives in temper-evolution (27 tests)
Phase 3b: host_evaluate_spec WASM host function (generic platform capability)
Phase 3c: 4 GEPA WASM modules (replay, score, pareto, reflective)
Phase 4: Evolution skill — EvolutionRun + SentinelMonitor IOA specs + Cedar
Phase 5: Sentinel OTS failure cluster rule (threshold: 5 failures/entity type)
Phase 6a: Apps → Skills rebrand across codebase with backward-compat aliases
Phase 6b: Skill guide format — skill_guide field, GET /api/skills/:name endpoint,
temper.get_skill() MCP method, evolution skill registered in catalog
All specs pass L0-L3 verification cascade. 506+ tests pass.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
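The Phase 5 Sentinel rule can be pictured as a per-entity-type failure counter. The sketch below assumes each OTS trajectory reduces to an entity type plus a success flag, and that reaching the threshold (rather than exceeding it) raises a cluster; `TrajectoryOutcome` and `failure_clusters` are hypothetical names, not the temper-ots API. Only the threshold of 5 comes from the description above.

```rust
use std::collections::BTreeMap;

/// Hypothetical, simplified view of one OTS trajectory's outcome.
pub struct TrajectoryOutcome {
    pub entity_type: String,
    pub succeeded: bool,
}

/// Threshold from the PR description: 5 failures per entity type.
pub const FAILURE_CLUSTER_THRESHOLD: usize = 5;

/// Returns (entity_type, failure_count) for every entity type whose failure
/// count reaches `threshold`. BTreeMap keeps the result sorted by type name.
/// Treating ">= threshold" as a cluster is an assumption of this sketch.
pub fn failure_clusters(outcomes: &[TrajectoryOutcome], threshold: usize) -> Vec<(String, usize)> {
    let mut failures: BTreeMap<&str, usize> = BTreeMap::new();
    for o in outcomes.iter().filter(|o| !o.succeeded) {
        *failures.entry(o.entity_type.as_str()).or_insert(0) += 1;
    }
    failures
        .into_iter()
        .filter(|&(_, n)| n >= threshold)
        .map(|(t, n)| (t.to_string(), n))
        .collect()
}
```

Counting per entity type keeps one noisy type from masking another: four `Project` failures plus one success stay below the threshold even while five `Issue` failures trip it.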
1. Pareto dominates() now considers all objectives from both sides, not just a's keys, which fixes asymmetric key handling.
2. ReplayResult tracks an invalid_transitions counter separately, fixing coverage score inflation.
3. host_evaluate_spec returns -1 on memory read/write errors instead of silently proceeding with zero-filled buffers.
4. SimWasmHost::evaluate_spec returns a plain error string, not pre-formatted JSON that would get double-wrapped.
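Fixes 1 and 2 can be illustrated with a small sketch. It assumes scores are a name-to-`f64` map where higher is better and that a missing objective counts as 0.0; `Scores`, `dominates`, and `ReplayTally` here are hypothetical stand-ins, not the temper-evolution types, and the coverage formula (valid over valid plus invalid) is likewise an assumption.

```rust
use std::collections::{BTreeSet, HashMap};

/// Objective scores keyed by name; higher is better (assumption).
pub type Scores = HashMap<String, f64>;

/// `a` dominates `b` if it is at least as good on every objective and strictly
/// better on at least one. Iterating the union of both key sets means an
/// objective reported only by `b` is not silently skipped. Missing objectives
/// default to 0.0, which is an assumption of this sketch.
pub fn dominates(a: &Scores, b: &Scores) -> bool {
    let keys: BTreeSet<&String> = a.keys().chain(b.keys()).collect();
    let mut strictly_better = false;
    for k in keys {
        let av = a.get(k).copied().unwrap_or(0.0);
        let bv = b.get(k).copied().unwrap_or(0.0);
        if av < bv {
            return false;
        }
        if av > bv {
            strictly_better = true;
        }
    }
    strictly_better
}

/// Hypothetical replay tally: invalid transitions are counted on their own
/// instead of being folded into the successes.
pub struct ReplayTally {
    pub valid_transitions: usize,
    pub invalid_transitions: usize,
}

impl ReplayTally {
    /// Fraction of replayed actions the spec accepted; 0.0 when nothing ran.
    pub fn coverage(&self) -> f64 {
        let total = self.valid_transitions + self.invalid_transitions;
        if total == 0 {
            0.0
        } else {
            self.valid_transitions as f64 / total as f64
        }
    }
}
```

With `a = {x: 2.0}` and `b = {x: 1.0, y: 0.5}`, checking only `a`'s keys would report that `a` dominates `b`; the union check sees that `a` is worse on `y`, so neither candidate dominates the other and both can stay on the Pareto front.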
…yntax, live E2E proof

Fixes three production bugs blocking the autonomous GEPA self-improvement loop:
- spec_evaluator_fn(): use the correct TransitionTable::evaluate API (state, count, action)
- WASM CTX_BUF_LEN: increase from 256KB to 512KB for multi-turn entity state
- IOA effect syntax: fix SentinelMonitor to use supported formats (set_bool, increment)

Adds entity state bloat prevention (32KB per-field cap in sync_fields), OTS trajectory storage endpoints, and EvolutionRun Cedar policies with an autonomy slider.

Full 11-state lifecycle verified on a live server: Created → Selecting → Evaluating → Reflecting → Proposing → Verifying → Scoring → Updating → AwaitingApproval → Deploying → Completed.
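The verified lifecycle reads as a linear success path through 11 states. The sketch below models only that path; `RunState`, `next`, and `happy_path` are hypothetical names, and the real EvolutionRun spec in evolution_run.ioa.toml may include failure branches or skip approval depending on the autonomy slider, none of which is modeled here.

```rust
/// Hypothetical enum mirroring the EvolutionRun states listed above.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum RunState {
    Created,
    Selecting,
    Evaluating,
    Reflecting,
    Proposing,
    Verifying,
    Scoring,
    Updating,
    AwaitingApproval,
    Deploying,
    Completed,
}

impl RunState {
    /// Next state on the success path; `None` once the run is Completed.
    pub fn next(self) -> Option<RunState> {
        use RunState::*;
        Some(match self {
            Created => Selecting,
            Selecting => Evaluating,
            Evaluating => Reflecting,
            Reflecting => Proposing,
            Proposing => Verifying,
            Verifying => Scoring,
            Scoring => Updating,
            Updating => AwaitingApproval,
            AwaitingApproval => Deploying,
            Deploying => Completed,
            Completed => return None,
        })
    }
}

/// Walks the success path from `start`, returning every state visited.
pub fn happy_path(start: RunState) -> Vec<RunState> {
    let mut path = vec![start];
    let mut cur = start;
    while let Some(n) = cur.next() {
        path.push(n);
        cur = n;
    }
    path
}
```

Starting from `Created`, the walk visits all 11 states and takes 10 transitions to reach `Completed`.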
Same fix as specs/policies/issue.cedar: the catch-all permit allowed requests that the role-based Cedar policies were meant to deny, causing test_pm_assign_denies_openclaw_agent_type to fail.
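Cedar denies by default, allows a request when at least one `permit` matches, and lets any matching `forbid` override every `permit`. That is why a catch-all permit defeats role-based restrictions that rely on the absence of a permit: it matches every request. The toy evaluator below models only that decision rule; it is not the `cedar-policy` crate API, and `Effect`, `Policy`, and `is_authorized` are hypothetical names.

```rust
/// Effect of a toy policy.
#[derive(Clone, Copy, PartialEq, Eq, Debug)]
pub enum Effect {
    Permit,
    Forbid,
}

/// Toy policy: an effect plus a predicate over (principal role, action).
pub struct Policy {
    pub effect: Effect,
    pub applies: fn(&str, &str) -> bool,
}

/// Cedar-style decision: deny by default, any matching forbid wins,
/// otherwise allow if at least one permit matches.
pub fn is_authorized(policies: &[Policy], role: &str, action: &str) -> bool {
    let mut permitted = false;
    for p in policies.iter().filter(|p| (p.applies)(role, action)) {
        match p.effect {
            Effect::Forbid => return false,
            Effect::Permit => permitted = true,
        }
    }
    permitted
}
```

With only a role-based permit such as "PMs may Assign", an agent's Assign request matches nothing and is denied. Adding a permit whose predicate always returns true flips that request to allowed, which matches the test failure described above.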
Summary
- … `type = "wasm"` + new `gepa-proposer-agent` module.
- `SelectCandidate` omits `TrajectoryActions`.
- … `poll_attempts=600`) and run the proposer in text-only mode (`tools_enabled = ""`) for deterministic mutation output.

Core Changes
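The proposer's wait for TemperAgent completion, configured with `poll_attempts` and `poll_sleep_ms`, is a bounded poll. The sketch below assumes a caller-supplied `check` closure that returns `Some(payload)` once the agent has finished; `PollOutcome` and `poll_until` are hypothetical names, and it sleeps with `std::thread::sleep` for illustration, whereas a real WASM guest would have to wait through whatever the host exposes.

```rust
use std::thread;
use std::time::Duration;

/// Result of polling a hypothetical agent for a completed mutation.
#[derive(Debug, PartialEq)]
pub enum PollOutcome<T> {
    Done(T),
    TimedOut { attempts: u32 },
}

/// Calls `check` up to `poll_attempts` times, stopping at the first `Some`.
/// Sleeps `poll_sleep_ms` between attempts, but not after the last one.
pub fn poll_until<T>(
    poll_attempts: u32,
    poll_sleep_ms: u64,
    mut check: impl FnMut() -> Option<T>,
) -> PollOutcome<T> {
    for attempt in 1..=poll_attempts {
        if let Some(value) = check() {
            return PollOutcome::Done(value);
        }
        if attempt < poll_attempts {
            thread::sleep(Duration::from_millis(poll_sleep_ms));
        }
    }
    PollOutcome::TimedOut { attempts: poll_attempts }
}
```

Bounding the loop by an attempt count rather than waiting indefinitely means a stuck agent surfaces as an explicit timeout the state machine can act on; the worst-case wait is roughly `poll_attempts × poll_sleep_ms` plus the time spent in `check`.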
- `wasm-modules/gepa-proposer-agent/` (creates/configures/provisions TemperAgent, polls completion, extracts mutation payload)
- `skills/evolution/evolution_run.ioa.toml`
  - `type = "wasm"`, `module = "gepa-proposer-agent"`, `on_success = "RecordMutation"`
  - … `poll_attempts`, `poll_sleep_ms`, `tools_enabled`)
- `skills/evolution/policies/evolution.cedar`
- `skills/evolution/skill.md`
- `docs/gepa-real-claude-live-proof-2026-03-19.md`
- `crates/temper-mcp/src/runtime.rs`
- `crates/temper-server/src/state/dispatch/wasm.rs`

Live Proof (2026-03-19)
- `gepa-live-ots-temperagent-20260319`
- `evo-ots-temperagent-8`
- `SelectCandidate` without `TrajectoryActions` still replayed OTS actions (`PromoteToCritical`, `Assign`, `Reassign`)
- `Deploy` … `Issue` spec made `PromoteToCritical` succeed (HTTP 200)
- `docs/gepa-real-claude-live-proof-2026-03-19.md`

Validation
- `cargo test -p temper-server --test e2e_gepa_loop --features observe -- e2e_gepa_wasm_integration_chain_fires e2e_gepa_full_autonomous_loop_with_adapter --nocapture`
- … `--no-verify` only after long-running workspace DST tests had already been executing for this branch.