diff --git a/README.md b/README.md
index 716ac9a..3afa0a9 100644
--- a/README.md
+++ b/README.md
@@ -1,110 +1,44 @@
-
- Comptextv7 logo
-
+# CompText V7
 
-CompText V7
+CompText V7 is a deterministic replay-validation prototype for compact operational agent/MCP traces, with a KVTC-V7 technical-log compression prototype. The repository checks whether fixture-defined operational commitments survive compaction and replay using local code, committed fixtures, deterministic metrics, and reproducible artifacts.
 
-
- Deterministic replay-integrity validation for compressed MCP-style operational traces.
-
+## What this repo implements
 
-
- No embeddings • No vector DB • No semantic scoring • No LLM judges
-
+- Deterministic replay validation for compact operational trace state.
+- Curated agent trace fixtures under `tests/fixtures/agent_traces/`.
+- A deterministic agent trace replay runner in `tests/utils/agent_trace_replay_runner.py`.
+- An MCP replay payload layer in `src/comptext_v7/mcp/`.
+- Evidence survival helpers in `src/validation/evidence.py`.
+- Stable replay failure labels in `src/validation/replay_failure_classifier.py`.
+- Committed replay artifacts, including `artifacts/agent_trace_replay_results.json` and `artifacts/mcp_trace_replay_results.json`.
+- A KVTC-V7 technical-log compression prototype in `src/core/kvtc_v7.py`.
 
-
- CI Python Deterministic Replay Replay Native Replay Artifacts
-
+## What it does not claim
 
-
- Research Positioning · Benchmark Details · Multi-Family Benchmark · Failure Taxonomy
-
+- No embeddings.
+- No vector database.
+- No LLM judges.
+- No external APIs in validation.
+- No autonomous agent framework or workflow orchestrator.
+- No production-readiness, enterprise-readiness, certification, or compliance claim.
+- No universal AI-memory or solved-memory claim.
 
-CompText V7 validates whether compressed operational commitments survive deterministic replay reconstruction in MCP-style agent workflows.
+## Implemented surfaces
 
----
-
-## In 30 seconds
-
-Long-horizon agents compress prior work into smaller summaries. Those summaries can silently lose blockers, constraints, evidence, dependency order, recovery paths, and tool order.
-
-CompText V7 treats that as a deterministic replay-validation problem. It checks whether compressed operational state remains admissible after reconstruction using fixture-defined contracts, exact scoring, failure labels, committed artifacts, and CI gates.
-
----
-
-## What CompText V7 is
-
-- Deterministic replay-validation infrastructure for operational state.
-- Fixture-bound and contract-linked.
-- Artifact-backed with reproducible JSON/SVG outputs.
-- CI-reproducible through repository checks.
-- Focused on operational admissibility, not prose quality.
-
-## What CompText V7 is not
-
-- Agent framework.
-- Workflow orchestrator.
-- Learned compressor.
-- Vector memory system.
-- RAG replacement.
-- KV-cache optimizer.
-- Production telemetry platform.
-- Clinical-grade system.
-- Universal AI-memory solution.
-- LLM judge.
-
----
-
-## Replay validation model
-
-```mermaid
-flowchart LR
-  A["Checked-in fixture"] --> B["Original operational state"]
-  B --> C["Reconstructed replay state"]
-  C --> D["Contract validator"]
-  D --> E["Admissibility scorer"]
-  E --> F["Failure labels"]
-  E --> G["Committed artifacts"]
-  G --> H["CI gates"]
-  F --> H
-```
-
----
-
-## Operational commitments
-
-CompText V7 validates whether deterministic replay reconstruction preserves:
-
-- evidence
-- constraints
-- blockers
-- dependencies
-- recovery paths
-- tool order
-- capability boundaries
-- governance/policy gates
-
-The `mcp_trace_replay` fixture family validates deterministic replay safety for tool order, validation-before-action, dependency chains, recovery paths, and capability boundaries. Registered contracts: `tool_call_order_preserved`, `validation_before_unsafe_action`, `dependency_chain_preserved`, `recovery_path_available`, `capability_boundary_respected`.
-
----
-
-## Current fixture-bound signal
+| Surface | Source |
+| --- | --- |
+| Curated agent traces | `tests/fixtures/agent_traces/` |
+| Agent trace replay runner | `tests/utils/agent_trace_replay_runner.py` |
+| MCP replay payload extraction, rendering, and validation | `src/comptext_v7/mcp/` |
+| Evidence survival checks | `src/validation/evidence.py` |
+| Replay failure labels | `src/validation/replay_failure_classifier.py` |
+| Agent trace replay artifact | `artifacts/agent_trace_replay_results.json` |
+| MCP trace replay artifact | `artifacts/mcp_trace_replay_results.json` |
+| KVTC-V7 technical-log compression prototype | `src/core/kvtc_v7.py` |
 
-- Four manifest-registered operational fixture families.
-- Standard levels: `baseline`, `mild`, `moderate`, `severe`.
-- Deterministic evaluation mode.
-- Exact rational scoring.
-- Reproducible artifacts.
-- No LLM judges or external APIs.
+## Committed artifact snapshot
 
-These are internal fixture-bound results, not external benchmark claims, production-readiness claims, or solved-memory claims.
+These fixture-bound values are checked against committed deterministic artifacts.
 
 | Signal | Current fixture-bound result |
 | --- | ---: |
@@ -118,248 +52,19 @@ These are internal fixture-bound results, not external benchmark claims, product
 | Agent replay consistency | `1.000000` |
 | Agent operational drift | `0.000000` |
 
----
-
-## Artifact evidence pipeline
-
-```mermaid
-flowchart LR
-  A["fixtures/manifest.json"] --> B["Fixture families"]
-  B --> C["DegradationCurveGenerator"]
-  B --> D["AdmissibilityScorer"]
-  C --> E["multi_family_admissibility_curves.svg"]
-  D --> F["layered_admissibility_results.json"]
-  D --> G["multi_family_admissibility_results.json"]
-  F --> H["Reproducibility tests"]
-  G --> H
-  E --> I["Progression tests"]
-  H --> J["GitHub Actions"]
-  I --> J
-```
-
----
-
-## AI workflow safety evidence
-
-CompText V7 includes a local-only, deterministic evidence chain for AI-assisted repository work. The chain is intended to make review evidence inspectable and artifact-backed without adding external services or runtime orchestration.
-
-- `scripts/safe_pr_gate.py` checks the current branch, working-tree state, changed-file scope, and minimal privacy boundaries.
-- `scripts/agent_artifact_bundle.py` records branch, changed files, safe-gate output, validation evidence, and optional MCP context output references.
-- `scripts/validate_agent_artifact_bundle.py` validates committed or generated bundle shape and deterministic status fields.
-- `scripts/pr_body_from_agent_bundle.py` renders PR body Markdown from bundle data without inventing claims.
-- `scripts/ai_workflow_snapshot.py` emits a compact JSON snapshot that combines safe-gate and bundle evidence.
-- MCP context output references point to repo-relative artifacts, such as `artifacts/mcp_context_layer_example.json`, rather than embedding full replay payloads in every bundle.
-
-This chain is local-only and uses deterministic JSON/Markdown outputs. It does not call external APIs, contact GitHub APIs, add timestamps or random IDs, execute runtime tools, or perform semantic scoring. It is not an autonomous agent framework, workflow orchestrator, vector memory system, or runtime tool executor.
-
----
-
-## Minimal deterministic example
-
-```json
-{
-  "original_operational_state": {
-    "policy_steps": ["identify_owner", "collect_evidence", "execute_recovery"],
-    "causal_dependencies": [["alert", "triage"], ["triage", "recovery"]],
-    "recovery_paths": ["ack -> mitigation_runbook"]
-  },
-  "reconstructed_state": {
-    "policy_steps": ["collect_evidence", "identify_owner", "execute_recovery"],
-    "causal_dependencies": [["alert", "recovery"]],
-    "recovery_paths": []
-  },
-  "deterministic_validation_result": {
-    "admissible": false,
-    "failure_labels": [
-      "POLICY_ORDER_BROKEN",
-      "CAUSAL_DEPENDENCY_LOSS",
-      "RECOVERY_PATH_INVALID",
-      "INVARIANT_VIOLATION"
-    ]
-  }
-}
-```
-
----
+The committed comparative replay artifact includes BALANCED failure labels `EVIDENCE_LOSS` and `CONSTRAINT_DRIFT`.
 
-## Proof artifacts
-
-| Artifact | Purpose |
-| --- | --- |
-| `artifacts/layered_admissibility_results.json` | Layered admissibility outputs. |
-| `artifacts/multi_family_admissibility_results.json` | Multi-family deterministic aggregates. |
-| `artifacts/multi_family_admissibility_curves.svg` | Deterministic degradation curve rendering. |
-| `artifacts/mcp_trace_replay_results.json` | Deterministic MCP trace replay contract outcomes. |
-| `artifacts/replay_semantic_integrity_results.json` | Deterministic replay semantic integrity outcomes. |
-| `docs/benchmarks/multi_family_admissibility_benchmark.md` | Benchmark method and interpretation boundaries. |
-| `docs/failure_taxonomy.md` | Failure label documentation. |
-
----
-
-## Verify locally
+## Validation commands
 
 ```bash
-python -m pip install -e '.[test]'
-npm install --no-save --no-package-lock
+npm run layout
+pytest -q
 npm run check
-pytest tests/test_failure_taxonomy.py -q
-pytest tests/test_multi_family_admissibility_artifact.py -q
-pytest tests/test_multi_family_svg_renderer.py -q
-pytest tests/test_paper_replay_bench.py tests/test_agent_trace_replay.py -q
-```
-
----
-
-## Benchmark families
-
-- `coding_workflow_pr_review`
-- `incident_response_page_triage`
-- `cross_domain_operational_dependency_workflow`
-- `mcp_trace_replay`
-
-```mermaid
-flowchart LR
-  A["coding_workflow_pr_review"] --> L1["baseline"]
-  A --> L2["mild"]
-  A --> L3["moderate"]
-  A --> L4["severe"]
-  B["incident_response_page_triage"] --> L1
-  B --> L2
-  B --> L3
-  B --> L4
-  C["cross_domain_operational_dependency_workflow"] --> L1
-  C --> L2
-  C --> L3
-  C --> L4
-  D["mcp_trace_replay"] --> L1
-  D --> L2
-  D --> L3
-  D --> L4
-  L1 --> M["manifest registration"]
-  L2 --> M
-  L3 --> M
-  L4 --> M
-  M --> N["multi-family artifact"]
-  N --> O["deterministic SVG"]
 ```
 
----
-
-## Failure labels
-
-Primary registered labels used across deterministic admissibility validation:
-
-- `POLICY_ORDER_BROKEN`: required policy order failed.
-- `TOOL_ORDER_VIOLATION`: replayed tool sequence violated required order.
-- `CAUSAL_DEPENDENCY_LOSS`: required causal edges were not preserved.
-- `DEPENDENCY_CHAIN_BREAK`: required dependency chain broke.
-- `RECOVERY_PATH_INVALID`: recovery reachability contract failed.
-- `RECOVERY_PATH_LOSS`: required recovery route was not preserved.
-- `INVARIANT_VIOLATION`: declared invariant failed.
-- `EVIDENCE_LOSS`: required evidence did not survive replay.
-- `EVIDENCE_SURVIVAL_LOSS`: expected evidence units were not preserved.
-- `HIGH_CRITICAL_EVIDENCE_LOSS`: high-critical evidence was lost.
-- `CONSTRAINT_DRIFT`: constraint preservation drifted.
-- `BLOCKER_DETACHMENT`: blocker attachment was lost.
-- `GOVERNANCE_DRIFT`: governance constraint drifted.
-- `ARTIFACT_INTEGRITY_VIOLATION`: artifact integrity drifted.
-- `REPLAY_NON_REPRODUCIBLE`: deterministic replay was not reproducible.
-
-```mermaid
-flowchart LR
-  O1["POLICY_ORDER_BROKEN"] --> C1["ordering"]
-  O2["TOOL_ORDER_VIOLATION"] --> C1
-  D1["CAUSAL_DEPENDENCY_LOSS"] --> C2["causality/dependency"]
-  D2["DEPENDENCY_CHAIN_BREAK"] --> C2
-  R1["RECOVERY_PATH_INVALID"] --> C3["recovery/reachability"]
-  R2["RECOVERY_PATH_LOSS"] --> C3
-  I1["INVARIANT_VIOLATION"] --> C4["invariant/no-orphan"]
-  E1["EVIDENCE_LOSS"] --> C5["evidence/criticality"]
-  E2["EVIDENCE_SURVIVAL_LOSS"] --> C5
-  E3["HIGH_CRITICAL_EVIDENCE_LOSS"] --> C5
-  E4["CONSTRAINT_DRIFT"] --> C5
-  E5["BLOCKER_DETACHMENT"] --> C5
-  E6["GOVERNANCE_DRIFT"] --> C5
-  A1["ARTIFACT_INTEGRITY_VIOLATION"] --> C6["artifact/reproducibility"]
-  A2["REPLAY_NON_REPRODUCIBLE"] --> C6
-```
-
----
-
-## How this differs from adjacent systems
-
-| System type | Stores state | Compresses context | Orchestrates agents | Deterministically validates replay loss |
-| --- | --- | --- | --- | --- |
-| Workflow runtimes | Sometimes | No | Yes | No |
-| Agent frameworks | Sometimes | Sometimes | Yes | Usually no |
-| Vector memory / RAG | Yes | Retrieval-centric | No | No |
-| Learned prompt compressors | Sometimes | Yes | No | Usually no |
-| LLM-as-judge evaluators | Sometimes | N/A | No | No |
-| CompText V7 | Yes | Yes | No | Yes |
-
----
-
-## CI and merge gate
-
-```mermaid
-flowchart LR
-  A["PR head SHA"] --> B["GitHub Actions"]
-  B --> C["Agent Workflow Checks"]
-  B --> D["hash-companion-validation"]
-  B --> E["CompText V7 Industrial Validation"]
-  C --> F["all success"]
-  D --> F
-  E --> F
-  F --> G["squash merge"]
-```
-
-Vercel/Netlify/deployment previews are not merge gates unless explicitly scoped.
-
----
-
-## Repository map
-
-```text
-Comptextv7/
-├── artifacts/
-├── docs/
-├── fixtures/
-├── reports/
-├── scripts/
-├── tests/
-└── src/
-    ├── core/
-    └── validation/
-```
-
----
-
-## Replay-validation roadmap
-
-```mermaid
-flowchart LR
-  A["failure taxonomy"] --> B["cross-domain fixture families"]
-  B --> C["forensic reports"]
-  C --> D["schema stabilization"]
-  D --> E["cross-family comparison"]
-  E --> F["integrity gates"]
-  F --> G["golden corpus"]
-  G --> H["offline import/export"]
-```
-
-- Forensic audit reports with deterministic exports.
-- Artifact schema stabilization.
-- Cross-family degradation comparison.
-- Minimal artifact integrity gates.
-- Golden corpus foundation.
-- Offline import/export schemas only.
-
----
-
-## Limitations
+## Limitations
 
-- Metrics are fixture-bound and internal to checked-in datasets.
-- Fixtures are curated and checked in, not live production traces.
-- This is a deterministic prototype, not a production-readiness claim.
-- This is not a universal AI-memory claim.
-- This does not claim runtime integration or orchestration coverage.
+- Results are fixture-bound and based on checked-in data.
+- Curated fixtures are not live production traces.
+- Replay validation is deterministic and local; it does not use semantic scoring, embeddings, vector search, LLM judges, or external APIs.
+- The KVTC-V7 compressor is a prototype for structured technical logs, not a production telemetry platform.