AgentΩ
Governed Recursive Self-Improvement Control Plane
Live Portal · Install as App · Download Source · API Reference
- What is Agent-Omega
- Why it exists
- Who needs it
- Screenshots
- Install
- Quick Start
- How to Use
- Architecture
- Repository Structure
- Constitutional Governance
- API Reference
- Technologies
- Comparison with Alternatives
- Cross-Domain Applications
- Development Philosophy
- Architectural Decision Records
- Contributing
- License
Agent-Omega is a control plane for AI systems that modify themselves. It provides the governance machinery — constitutional constraints, multi-dimensional evaluation, staged deployment, and immutable audit trails — that allows an AI agent to propose, validate, evaluate, and deploy changes to its own structure without losing safety, accountability, or the ability to roll back.
This is not a coding assistant, a chatbot framework, or a model training pipeline. It is the governance layer that sits between an AI system's desire to self-improve and its actual ability to do so.
In plain terms: if you're building an AI agent that should be able to change its own prompts, tools, reasoning strategies, or architecture, Agent-Omega is the system that decides whether each proposed change is safe, tracks what changed and why, and deploys it through staged rollout with automatic rollback.
Unrestricted self-modification collapses accountability. If a system can change any part of itself at any time, there is no stable basis for:
- Knowing what changed and why
- Evaluating whether the change was an improvement
- Rolling back if it was not
- Preventing the system from disabling its own safeguards
- Attributing decisions to evidence rather than optimization pressure
Agent-Omega was invented to solve this problem. It separates the concerns of proposing changes, validating them against structural rules, evaluating them across multiple dimensions, deciding based on evidence thresholds, and deploying them through staged rollout — each step independently auditable.
The research motivation comes from the AI safety literature: Anthropic's constitutional AI framework (4-tier priority hierarchy), formal verification approaches to safe recursive self-improvement (LessWrong/MIT), and the emerging alignment-by-architecture paradigm where systems are structurally incapable of misalignment.
You need Agent-Omega if you are:
- Building an AI agent that modifies itself — any agent that changes its own prompts, tools, reasoning chains, or architecture needs governance to prevent unsafe drift
- An AI safety researcher — you need a concrete, running implementation of governed self-improvement to test theories against, not just papers
- An MLOps / platform team — you need structured model deployment with evidence-based approval, staged rollout, automatic rollback, and audit trails for compliance
- Building for regulated industries — healthcare, finance, government — where you need to demonstrate change management, evidence-based decisions, and immutable records (EU AI Act Article 12, SOC 2)
- A developer building agentic applications — you want your agent to improve over time, but you need guardrails so it doesn't break itself
You probably don't need Agent-Omega if:
- You're building a simple chatbot with static prompts
- You don't need your AI system to modify itself
- You're looking for a model training or fine-tuning framework
- You want a general-purpose task runner or CI/CD pipeline
Live health status, mutation budget, archivist summary, and quick actions.
Step-by-step walkthrough with 4 use-case examples for different audiences.
Submit proposals and run the full governance pipeline: constitutional check → validate → evaluate → decide.
6 immutable constraints, mutation budget status, and dry-run constraint checker.
Create and manage deployments through shadow → canary → promote with interactive controls.
All 10 environment variables documented, 4 setup recipes, live status panel.
Auto-generated Swagger UI with every endpoint documented.
Agent-Omega is a Progressive Web App. Install it directly from the live portal:
| Platform | How to install |
|---|---|
| Windows / macOS / Linux | Open the portal in Chrome or Edge → click install icon in address bar |
| Android | Open in Chrome → menu (⋮) → "Install app" |
| iPhone / iPad | Open in Safari → Share → "Add to Home Screen" |
git clone https://github.com/fredm23579/Agent-Omega.git
cd Agent-Omega
pip install -e ".[test]"

Or download the ZIP.
Requirements: Python 3.12+ only. No other system dependencies for in-memory mode.
# Start the control plane (no configuration required)
uvicorn apps.server.primary_runtime:app --reload
# Open http://localhost:8000 (redirects to web console)

That's it. The server starts in in-memory mode with all governance features active. Visit http://localhost:8000/console/guide for a guided walkthrough.
- Submit a mutation: Go to /console/mutations, fill in the form, click "Run Full Lifecycle"
- Test constitutional constraints: On the same page, click "Pre-Check Constitutional" with changed_module_ids: ["governance_core"] and watch it get blocked
- Create a deployment: Go to /console/deployments, create one, then walk it through shadow → canary → promote
- Check the audit trail: Go to /console/archivist to see recorded outcomes and patterns
curl
curl http://localhost:8000/api/v1/health
curl -X POST http://localhost:8000/api/v1/mutations/lifecycle \
-H "Content-Type: application/json" \
-d '{"mutation_class":"parameter_update","parent_system_version_id":"v1","hypothesis":"Improve quality"}'
curl http://localhost:8000/api/v1/budget/status

Python (httpx)
import httpx
client = httpx.Client(base_url="http://localhost:8000/api/v1")
# Health check
print(client.get("/health").json())
# Run mutation lifecycle
result = client.post("/mutations/lifecycle", json={
"mutation_class": "parameter_update",
"parent_system_version_id": "v1",
"hypothesis": "Improve response quality",
}).json()
print(result["decision"]) # {"decision": "quarantine", ...}
# Constitutional check (dry run)
check = client.post("/constitutional/check", json={
"changed_module_ids": ["governance_core"],
}).json()
print(check["passed"]) # False — governance modules are protected
# Budget status
print(client.get("/budget/status").json())

JavaScript (fetch)
const API = "http://localhost:8000/api/v1";
// Health check
const health = await fetch(`${API}/health`).then(r => r.json());
// Run mutation lifecycle
const result = await fetch(`${API}/mutations/lifecycle`, {
method: "POST",
headers: { "Content-Type": "application/json" },
body: JSON.stringify({
mutation_class: "parameter_update",
parent_system_version_id: "v1",
hypothesis: "Improve response quality",
}),
}).then(r => r.json());

python -m apps.cli.main health --json
python -m apps.cli.main init my-system --json
python -m apps.cli.main providers-health --json

A "system" is the AI agent you want to govern. Register it once:
curl -X POST http://localhost:8000/api/v1/systems -H "Content-Type: application/json" \
-d '{"name": "my-agent"}'
# Returns: {"id": "sys-1", "name": "my-agent", "status": "draft"}

When your agent wants to change itself (update a prompt, add a tool, modify architecture), it submits a proposal:
curl -X POST http://localhost:8000/api/v1/mutations/lifecycle \
-H "Content-Type: application/json" \
-d '{
"proposal_id": "improve-reasoning",
"parent_system_version_id": "v1",
"mutation_class": "parameter_update",
"hypothesis": "Adding chain-of-thought will improve accuracy",
"changed_module_ids": ["reasoning-engine"],
"payload": {"temperature": 0.7}
}'

The system will:
- Constitutional check — verify the proposal doesn't violate any of the 6 immutable rules
- Validate — run 7 structural checks (schema, graph, contracts, tiers, capabilities, resources, preflight)
- Evaluate — score across 7 dimensions (task improvement, generalization, risk, maintainability, calibration, efficiency, integrity)
- Decide — accept, reject, or quarantine based on evidence thresholds
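The evaluate and decide steps above can be pictured with a small sketch. The seven dimension names come from this README; the `Thresholds` values, the `decide` function, and the rule that every score is normalized to [0, 1] with higher meaning better are assumptions made for illustration, not Agent-Omega's actual decision engine.

```python
from __future__ import annotations

from dataclasses import dataclass

# Dimension names are taken from the README; everything else is hypothetical.
DIMENSIONS = (
    "task_improvement", "generalization", "risk", "maintainability",
    "calibration", "efficiency", "integrity",
)


@dataclass(frozen=True)
class Thresholds:
    accept: float = 0.7  # every dimension must reach this to accept
    reject: float = 0.3  # any dimension below this rejects outright


def decide(scores: dict[str, float], t: Thresholds = Thresholds()) -> str:
    """Return 'accept', 'reject', or 'quarantine' from per-dimension scores.

    Assumes scores are in [0, 1] and higher is better on every dimension
    (so a high 'risk' score means low risk).
    """
    if any(d not in scores for d in DIMENSIONS):
        # Incomplete evidence is never enough to accept.
        return "quarantine"
    worst = min(scores[d] for d in DIMENSIONS)
    if worst < t.reject:
        return "reject"
    if worst >= t.accept:
        return "accept"
    return "quarantine"


strong = {d: 0.9 for d in DIMENSIONS}
mixed = {**strong, "generalization": 0.5}
weak = {**strong, "integrity": 0.1}
print(decide(strong), decide(mixed), decide(weak))  # accept quarantine reject
```

Deciding on the worst dimension rather than an average is one way to ensure that no single strong score can compensate for a weak one.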
If accepted, deploy through staged rollout:
# Create deployment
curl -X POST http://localhost:8000/api/v1/deploy/create \
-H "Content-Type: application/json" \
-d '{"system_version_id": "v1"}'
# Shadow (test alongside production, no live traffic)
curl -X POST http://localhost:8000/api/v1/deploy/deploy-1/shadow
# Canary (10% of traffic)
curl -X POST http://localhost:8000/api/v1/deploy/deploy-1/canary
# Promote (100% of traffic)
curl -X POST http://localhost:8000/api/v1/deploy/deploy-1/promote
# Or rollback at any stage
curl -X POST http://localhost:8000/api/v1/deploy/deploy-1/rollback

# View detected patterns
curl http://localhost:8000/api/v1/archivist/patterns
# Get audit summary
curl http://localhost:8000/api/v1/archivist/summary
# Explore version lineage
curl http://localhost:8000/api/v1/lineage/version-mutation-1

| Variable | Purpose | Default |
|---|---|---|
| DATABASE_URL | PostgreSQL connection string for persistent storage | Not set (in-memory) |
| AGENT_OMEGA_USE_PERSISTENT_REGISTRIES | Enable database-backed registries | false |
| AGENT_OMEGA_USE_MODEL_EVALUATION | Use LLM-based evaluation instead of heuristics | false |
| AGENT_OMEGA_ENABLE_JUDGE | Enable independent judge verification | false |
| AGENT_OMEGA_CORS_ORIGINS | Allowed CORS origins (comma-separated) | * (all) |
| OPENAI_API_KEY | OpenAI API key (for model evaluation) | Not set |
| ANTHROPIC_API_KEY | Anthropic API key (for model evaluation) | Not set |
| OPENROUTER_API_KEY | OpenRouter API key (for model evaluation) | Not set |
See the Config page for setup recipes.
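As a rough illustration of how these variables and their defaults might be read, here is a minimal sketch. The variable names and defaults come from the table; the `Settings` class, the `load_settings` helper, and the set of strings treated as true are assumptions, and the real `apps/server/settings.py` may parse them differently.

```python
from __future__ import annotations

import os
from collections.abc import Mapping
from dataclasses import dataclass


def _flag(env: Mapping[str, str], name: str) -> bool:
    # Assumption: "1", "true", and "yes" (any case) count as enabled.
    return env.get(name, "false").strip().lower() in {"1", "true", "yes"}


@dataclass(frozen=True)
class Settings:
    database_url: str | None
    use_persistent_registries: bool
    use_model_evaluation: bool
    enable_judge: bool
    cors_origins: list[str]


def load_settings(env: Mapping[str, str] | None = None) -> Settings:
    """Build settings from environment variables, using the documented defaults."""
    env = os.environ if env is None else env
    raw_origins = env.get("AGENT_OMEGA_CORS_ORIGINS", "*")
    return Settings(
        database_url=env.get("DATABASE_URL") or None,
        use_persistent_registries=_flag(env, "AGENT_OMEGA_USE_PERSISTENT_REGISTRIES"),
        use_model_evaluation=_flag(env, "AGENT_OMEGA_USE_MODEL_EVALUATION"),
        enable_judge=_flag(env, "AGENT_OMEGA_ENABLE_JUDGE"),
        cors_origins=[o.strip() for o in raw_origins.split(",") if o.strip()],
    )


defaults = load_settings({})
print(defaults.database_url, defaults.cors_origins)  # None ['*']
```

Passing the environment in as a mapping keeps the loader easy to test without touching the real process environment.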
Mutation Proposal
│
▼
┌──────────────────┐
│ Constitutional │ 6 immutable rules (ADR-008)
│ Constraint Layer │ Cannot be bypassed or self-modified
└──────┬───────────┘
▼
┌──────────────────┐
│ Budget Check │ Rate limiting (ADR-010)
└──────┬───────────┘
▼
┌──────────────────┐
│ 7-Stage │ Schema, graph, contracts, tiers,
│ Validation │ capabilities, resources, preflight
└──────┬───────────┘
▼
┌──────────────────┐
│ 7-Dimension │ Task, generalization, risk, maintainability,
│ Evaluation │ calibration, efficiency, integrity (ADR-007)
└──────┬───────────┘
▼
┌──────────────────┐
│ Decision Engine │ Accept / reject / quarantine
│ + Judge │ Independent verification (optional)
└──────┬───────────┘
▼
┌──────────────────┐
│ Archivist │ Record outcome, detect patterns
└──────┬───────────┘
▼
┌──────────────────┐
│ Staged Deploy │ Shadow → canary → promote (ADR-006)
│ + Executor │ Health checks, traffic allocation, rollback
└──────────────────┘
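The last box in the diagram, staged deployment, behaves like a small state machine. The sketch below shows one way to model it, assuming hypothetical stage names and a hypothetical `Deployment` class; the real state machine in `services/deployment/service.py` may use different states and rules.

```python
from enum import Enum


class Stage(str, Enum):
    CREATED = "created"
    SHADOW = "shadow"
    CANARY = "canary"
    PROMOTED = "promoted"
    ROLLED_BACK = "rolled_back"


# Allowed transitions: no stage can be skipped, and rollback is
# available from every stage that has started serving.
_ALLOWED = {
    Stage.CREATED: {Stage.SHADOW},
    Stage.SHADOW: {Stage.CANARY, Stage.ROLLED_BACK},
    Stage.CANARY: {Stage.PROMOTED, Stage.ROLLED_BACK},
    Stage.PROMOTED: {Stage.ROLLED_BACK},
    Stage.ROLLED_BACK: set(),
}


class Deployment:
    def __init__(self, system_version_id: str) -> None:
        self.system_version_id = system_version_id
        self.stage = Stage.CREATED
        self.history = [Stage.CREATED]

    def advance(self, target: Stage) -> None:
        """Move to the target stage, or raise if the transition is not allowed."""
        if target not in _ALLOWED[self.stage]:
            raise ValueError(
                f"illegal transition: {self.stage.value} -> {target.value}"
            )
        self.stage = target
        self.history.append(target)


deployment = Deployment("v1")
for stage in (Stage.SHADOW, Stage.CANARY, Stage.PROMOTED):
    deployment.advance(stage)
print(deployment.stage.value)  # promoted
```

Encoding the transitions as data makes the "no direct path to production" rule a property of the table rather than of scattered conditionals.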
| Runtime | Module | Purpose |
|---|---|---|
| Primary (preferred) | apps/server/primary_runtime.py | Top-level entrypoint. Selects canonical or persistent composition. |
| Canonical | apps/server/canonical_runtime.py | In-memory-default composition. |
| Persistent | apps/server/persistent_runtime.py | All state through SQLAlchemy. Requires DATABASE_URL. |
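The selection between the canonical and persistent compositions could look roughly like the following. This is a sketch under assumptions: the `select_runtime` function is hypothetical, and whether the real primary runtime raises or silently falls back when DATABASE_URL is missing is not stated here.

```python
from collections.abc import Mapping


def select_runtime(env: Mapping[str, str]) -> str:
    """Pick 'canonical' (in-memory) or 'persistent' from environment settings."""
    wants_persistent = (
        env.get("AGENT_OMEGA_USE_PERSISTENT_REGISTRIES", "false").strip().lower()
        == "true"
    )
    if not wants_persistent:
        return "canonical"
    if not env.get("DATABASE_URL"):
        # Assumption: fail loudly rather than fall back to in-memory state.
        raise RuntimeError("persistent runtime requires DATABASE_URL")
    return "persistent"


print(select_runtime({}))  # canonical
```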
Agent-Omega/
├── apps/
│ ├── server/ # FastAPI control plane
│ │ ├── primary_runtime.py # Preferred entrypoint
│ │ ├── api_v2_router.py # All API routes (36 endpoints)
│ │ ├── middleware.py # CORS configuration
│ │ ├── settings.py # Environment configuration
│ │ └── *_factory.py # Service graph composition
│ ├── cli/ # Typer CLI (health, init, providers)
│ ├── github_app/ # GitHub App webhooks (ADR-004)
│ └── web/ # 13-page interactive web console
│ └── console.py
├── services/
│ ├── kernel/ # Core governance
│ │ ├── constitutional.py # 6 immutable constraints (ADR-008)
│ │ ├── validation.py # 7-stage validation pipeline
│ │ ├── evaluation_engine.py # Heuristic 7-dimension scoring
│ │ ├── model_evaluation_engine.py # LLM-based scoring
│ │ ├── decision_engine.py # Evidence-based decisions
│ │ ├── adaptive_thresholds.py # Feedback-driven thresholds (ADR-009)
│ │ ├── mutation_budget.py # Rate limiting (ADR-010)
│ │ └── canonical_service.py # Composed kernel service
│ ├── deployment/ # Shadow → canary → promote (ADR-006)
│ │ ├── service.py # State machine
│ │ ├── executor.py # Health checks + traffic allocation
│ │ └── persistent_service.py # SQLAlchemy-backed
│ ├── judge/ # Independent verification (3 modes)
│ ├── archivist/ # Outcome recording + pattern detection
│ ├── ir_compiler/ # 6-stage IR compilation pipeline
│ ├── sandbox/ # Capability/resource/secret isolation (ADR-005)
│ ├── lineage/ # Version ancestry tracking
│ ├── ir_registry/ # IR version storage
│ └── system_registry/ # System record management
├── packages/
│ ├── core_types/ # 32 Pydantic models, 6 enums
│ ├── provider_openai/ # OpenAI adapter (Responses API + SSE)
│ ├── provider_anthropic/ # Anthropic adapter (Messages API + SSE)
│ ├── provider_openrouter/ # OpenRouter adapter (Chat completions + SSE)
│ ├── openclaw_bridge/ # OpenClaw thin client
│ ├── github_integration/ # JWT auth, webhooks, event routing
│ └── storage/ # 9 SQLAlchemy models, 5 Alembic migrations
├── docs/
│ ├── adr/ # 10 Architectural Decision Records
│ ├── architecture/ # System design documents
│ ├── api/ # API reference
│ └── screenshots/ # Console screenshots
├── tests/ # 602 tests (unit + integration + stress)
├── .github/workflows/ # CI: ruff + pytest on 3.12/3.13
└── pyproject.toml # v2.0.0
As of v2.0, Agent-Omega enforces 6 immutable constitutional constraints checked before any mutation enters the governance pipeline. These constraints cannot be bypassed, weakened, or self-modified:
| ID | Constraint | What it prevents |
|---|---|---|
| C1 | Governance self-preservation | No mutation may target governance or constitutional components |
| C2 | Safety service protection | No mutation may disable evaluation, judge, archivist, or sandbox |
| C3 | Deployment discipline | No mutation may bypass shadow → canary → promote |
| C4 | Auditability preservation | No mutation may remove lineage tracking or audit trails |
| C5 | Tier escalation prevention | No mutation may self-grant authority-tier escalation |
| C6 | Mutation budget | Rate limiting prevents runaway proposal generation |
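A minimal sketch of how a pre-pipeline check for constraints such as C1, C2, and C4 might work is shown below. Only the `governance_core` module ID and the `passed` field appear in this README; the other module names, the `PROTECTED` mapping, and the `constitutional_check` function are hypothetical, and treating any change to a protected module as a violation is a simplification.

```python
from dataclasses import dataclass, field

# Hypothetical groupings of protected module IDs per constraint.
PROTECTED = {
    "C1": {"governance_core", "constitutional"},
    "C2": {"evaluation", "judge", "archivist", "sandbox"},
    "C4": {"lineage", "audit_trail"},
}


@dataclass
class CheckResult:
    passed: bool
    violations: list = field(default_factory=list)


def constitutional_check(changed_module_ids: list) -> CheckResult:
    """Dry-run check: report every constraint whose protected modules are touched."""
    changed = set(changed_module_ids)
    violations = [
        f"{constraint_id}: touches protected module(s) {sorted(changed & modules)}"
        for constraint_id, modules in PROTECTED.items()
        if changed & modules
    ]
    return CheckResult(passed=not violations, violations=violations)


blocked = constitutional_check(["governance_core"])
allowed = constitutional_check(["reasoning-engine"])
print(blocked.passed, allowed.passed)  # False True
```

Because the check is a pure function of the proposal, it can run as a dry run without side effects, which matches the pre-check behavior described for the console.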
Decision thresholds adapt over time from archivist pattern feedback (ADR-009), within constitutional floors that prevent unsafe relaxation.
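The idea of adapting within a floor can be sketched as a clamped update. The `adapt_threshold` function, its step size, and the failure-rate signal are assumptions for illustration; ADR-009 may define the feedback rule differently.

```python
def adapt_threshold(
    current: float,
    observed_failure_rate: float,
    *,
    floor: float,
    step: float = 0.05,
    target_failure_rate: float = 0.1,
) -> float:
    """Tighten the acceptance threshold when accepted mutations fail too often,
    relax it otherwise, but never drop below the constitutional floor."""
    if observed_failure_rate > target_failure_rate:
        proposed = current + step  # too many bad outcomes: demand more evidence
    else:
        proposed = current - step  # outcomes look healthy: relax slightly
    return min(1.0, max(floor, proposed))


# Relaxation stops at the floor, however good recent outcomes look.
print(adapt_threshold(0.62, 0.0, floor=0.6))  # 0.6
```

The clamp is what makes the adaptation bounded: feedback can move the threshold, but the floor itself is not an input the feedback loop can change.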
36 endpoints across 10 groups. Full documentation at /docs (Swagger) when the server is running.
| Method | Path | Description |
|---|---|---|
| Health | | |
| GET | /api/v1/health | Server health check |
| Systems | | |
| POST | /api/v1/systems | Create a system record |
| GET | /api/v1/systems/{id} | Fetch a system record |
| IR Registry | | |
| POST | /api/v1/ir/register | Register an IR version |
| GET | /api/v1/ir/{id} | Fetch an IR version |
| GET | /api/v1/ir/{from}/diff/{to} | Diff two IR versions |
| POST | /api/v1/ir/compile | Compile IR to executable artifact |
| GET | /api/v1/ir/artifacts/{id} | Fetch a compiled artifact |
| Mutations | | |
| POST | /api/v1/mutations/propose | Submit a mutation proposal |
| POST | /api/v1/mutations/{id}/validate | Validate through 7-stage pipeline |
| POST | /api/v1/mutations/{id}/evaluate | Evaluate across 7 dimensions |
| POST | /api/v1/mutations/{id}/decide | Evidence-based decision |
| POST | /api/v1/mutations/lifecycle | Full lifecycle (constitutional → validate → evaluate → decide) |
| Lineage | | |
| POST | /api/v1/lineage/record | Record a lineage entry |
| GET | /api/v1/lineage/{version_id} | Walk lineage graph to roots |
| Deployment | | |
| POST | /api/v1/deploy/create | Create a deployment |
| POST | /api/v1/deploy/{id}/shadow | Start shadow deployment |
| POST | /api/v1/deploy/{id}/canary | Promote to canary |
| POST | /api/v1/deploy/{id}/promote | Promote to production |
| POST | /api/v1/deploy/{id}/rollback | Rollback deployment |
| GET | /api/v1/deploy/{id}/status | Detailed deployment status |
| GET | /api/v1/deploy/{id}/health | Health check results |
| Sandbox | | |
| POST | /api/v1/sandbox/execute | Execute artifact in sandbox |
| GET | /api/v1/sandbox/results/{id} | Fetch execution result |
| Providers | | |
| GET | /api/v1/providers/health | Aggregated provider status |
| POST | /api/v1/providers/{provider}/stream | Stream from provider (SSE) |
| Judge | | |
| POST | /api/v1/judge/verify | Independent evaluation verification |
| Archivist | | |
| POST | /api/v1/archivist/record | Record lifecycle outcome |
| GET | /api/v1/archivist/patterns | Detected patterns and anti-patterns |
| GET | /api/v1/archivist/summary | Transfer summary for time window |
| Governance | | |
| POST | /api/v1/constitutional/check | Dry-run constitutional check |
| GET | /api/v1/budget/status | Mutation budget status |
| Proposals | | |
| POST | /api/v1/proposals/generate | Generate autonomous proposals from pattern data |
| POST | /api/v1/proposals/generate-and-run | Generate + run each through full lifecycle |
| Category | Technology | Purpose |
|---|---|---|
| Language | Python 3.12+ | Type-safe, async-capable |
| Web Framework | FastAPI 0.115+ | API and web console serving |
| ASGI Server | Uvicorn 0.30+ | Production async server |
| Data Validation | Pydantic 2.7+ | 32 typed models, OpenAPI schema |
| CLI | Typer 0.12+ | Command-line operator surface |
| HTTP Client | httpx 0.27+ | Provider API calls, SSE streaming |
| ORM | SQLAlchemy 2.0+ | 9 persistence models |
| Migrations | Alembic 1.13+ | 5 database migrations |
| Linting | Ruff 0.4+ | Linting + formatting (120 char, py312) |
| Testing | pytest 8.0+ / pytest-asyncio / pytest-cov | 592 tests, 94% coverage, 70% gate |
| CI | GitHub Actions | Python 3.12 + 3.13 matrix |
| Type Checking | PEP 561 py.typed markers | All packages type-checkable |
| Feature | Agent-Omega | LangGraph | AutoGen | CrewAI |
|---|---|---|---|---|
| Constitutional constraints | 6 immutable rules, pre-validation | No | No | No |
| Multi-dimensional evaluation | 7 dimensions with threshold rules | No built-in | No built-in | No built-in |
| Staged deployment | Shadow → canary → promote | No | No | No |
| Independent judge verification | 3 verification modes | No | No | No |
| Immutable audit trail | Archivist + pattern detection | No | No | No |
| Mutation budget / rate limiting | Per-window budget with constitutional enforcement | No | No | No |
| Adaptive thresholds | Feedback-driven with constitutional floors | No | No | No |
| Version lineage tracking | Full ancestry graph | No | No | No |
| Sandbox execution | Capability/resource/secret isolation | No | Limited | No |
| Self-modification governance | Primary purpose | Not designed for this | Not designed for this | Not designed for this |
Agent-Omega is not a competitor to LangGraph, AutoGen, or CrewAI. Those are agent orchestration frameworks. Agent-Omega is the governance layer that sits on top of any agent — including agents built with those frameworks — to govern how they modify themselves.
Agent-Omega's governance model applies wherever an autonomous system needs to modify itself safely:
| Domain | Application |
|---|---|
| AI Agents | Govern prompt mutations, tool additions, reasoning strategy changes |
| MLOps | Model deployment with evidence-based approval and automatic rollback |
| Autonomous Vehicles | Govern software updates to self-driving systems with staged rollout |
| Healthcare AI | Change management for diagnostic AI with audit trails for regulatory compliance |
| Financial Trading | Govern strategy modifications with risk evaluation and lineage tracking |
| Robotics | Govern control-system parameter changes with safety constraints |
| Smart Contracts | Govern upgrades to on-chain logic with constitutional constraints |
| Cybersecurity | Govern rule changes to threat-detection systems with pattern monitoring |
The common pattern: any system where (a) autonomous modification is desirable for improvement, but (b) ungoverned modification is dangerous.
- Alignment by architecture: The system is structurally incapable of approving mutations that violate its constitutional constraints. Safety is not a behavior to be trained; it is a structural property.
- Evaluation-decision separation: The evaluation engine scores candidates; the decision engine applies threshold rules. These are independent stages with typed interfaces. No single score can dominate (ADR-007).
- Evidence over authority: Acceptance requires multi-dimensional evidence across 7 dimensions. Authority tier alone is not sufficient to bypass evaluation.
- Thin control surfaces: CLI, web console, GitHub App, and OpenClaw are all thin clients over one shared API (ADR-001). Business logic stays in services.
- Staged deployment discipline: No change goes directly to production. Shadow → canary → promote, with rollback available at every stage (ADR-006).
- Comprehensive auditability: Every decision is recorded with full evidence, lineage is tracked, patterns are detected. The archivist is the system's memory.
- Bounded adaptation: Thresholds adapt from historical data, but within constitutional floors that prevent unsafe relaxation. The system self-tunes, but cannot self-corrupt.
10 ADRs in docs/adr/ govern the architecture:
| ADR | Decision |
|---|---|
| ADR-001 | Thin control surfaces over one orchestration API |
| ADR-002 | Reflective IR is the canonical mutable object |
| ADR-003 | Provider access only through adapter interfaces |
| ADR-004 | GitHub integration through a GitHub App |
| ADR-005 | Candidate execution is sandboxed |
| ADR-006 | Deployment requires shadow then canary |
| ADR-007 | Evaluation evidence is multi-dimensional |
| ADR-008 | Constitutional constraints are immutable |
| ADR-009 | Decision thresholds adapt from archivist feedback |
| ADR-010 | Mutation budget rate limiting |
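ADR-010's mutation budget can be pictured as a sliding-window counter. The sketch below is an assumption about one reasonable shape for such a limiter; the `MutationBudget` class, its parameters, and the choice of a sliding window are hypothetical and not taken from `services/kernel/mutation_budget.py`.

```python
from collections import deque


class MutationBudget:
    """Allow at most max_proposals within any window of window_seconds."""

    def __init__(self, max_proposals: int, window_seconds: float) -> None:
        self.max_proposals = max_proposals
        self.window_seconds = window_seconds
        self._timestamps = deque()

    def _expire(self, now: float) -> None:
        # Drop proposals that have aged out of the window.
        while self._timestamps and now - self._timestamps[0] >= self.window_seconds:
            self._timestamps.popleft()

    def try_consume(self, now: float) -> bool:
        """Record a proposal at time `now`; return False if the budget is spent."""
        self._expire(now)
        if len(self._timestamps) >= self.max_proposals:
            return False
        self._timestamps.append(now)
        return True

    def remaining(self, now: float) -> int:
        self._expire(now)
        return self.max_proposals - len(self._timestamps)


budget = MutationBudget(max_proposals=2, window_seconds=60)
print([budget.try_consume(t) for t in (0, 1, 2)])  # [True, True, False]
```

Taking the clock as an argument instead of calling `time.monotonic()` inside the class keeps the limiter deterministic under test.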
See CONTRIBUTING.md. PRs should be bounded, tested, and documented.
# Development setup
pip install -e ".[test]"
make check # ruff check + format + pytest
make coverage # pytest with coverage report

MIT






