fredm23579/Agent-Omega

AgentΩ
Governed Recursive Self-Improvement Control Plane

Live Portal · Install as App · Download Source · API Reference


What is Agent-Omega

Agent-Omega is a control plane for AI systems that modify themselves. It provides the governance machinery — constitutional constraints, multi-dimensional evaluation, staged deployment, and immutable audit trails — that allows an AI agent to propose, validate, evaluate, and deploy changes to its own structure without losing safety, accountability, or the ability to roll back.

This is not a coding assistant, a chatbot framework, or a model training pipeline. It is the governance layer that sits between an AI system's desire to self-improve and its actual ability to do so.

In plain terms: if you're building an AI agent that should be able to change its own prompts, tools, reasoning strategies, or architecture — Agent-Omega is the system that decides whether each proposed change is safe, tracks what changed and why, and deploys it through staged rollout with automatic rollback.


Why it Exists

Unrestricted self-modification collapses accountability. If a system can change any part of itself at any time, there is no stable basis for:

  • Knowing what changed and why
  • Evaluating whether the change was an improvement
  • Rolling back if it was not
  • Preventing the system from disabling its own safeguards
  • Attributing decisions to evidence rather than optimization pressure

Agent-Omega was invented to solve this problem. It separates the concerns of proposing changes, validating them against structural rules, evaluating them across multiple dimensions, deciding based on evidence thresholds, and deploying them through staged rollout — each step independently auditable.

The research motivation comes from the AI safety literature: Anthropic's constitutional AI framework (4-tier priority hierarchy), formal verification approaches to safe recursive self-improvement (LessWrong/MIT), and the emerging alignment-by-architecture paradigm where systems are structurally incapable of misalignment.


Who Needs It

You need Agent-Omega if you are:

  • Building an AI agent that modifies itself — any agent that changes its own prompts, tools, reasoning chains, or architecture needs governance to prevent unsafe drift
  • An AI safety researcher — you need a concrete, running implementation of governed self-improvement to test theories against, not just papers
  • An MLOps / platform team — you need structured model deployment with evidence-based approval, staged rollout, automatic rollback, and audit trails for compliance
  • Building for regulated industries — healthcare, finance, government — where you need to demonstrate change management, evidence-based decisions, and immutable records (EU AI Act Article 12, SOC 2)
  • A developer building agentic applications — you want your agent to improve over time, but you need guardrails so it doesn't break itself

You do NOT need Agent-Omega if:

  • You're building a simple chatbot with static prompts
  • You don't need your AI system to modify itself
  • You're looking for a model training or fine-tuning framework
  • You want a general-purpose task runner or CI/CD pipeline

Screenshots

Dashboard

Live health status, mutation budget, archivist summary, and quick actions.

Guided Onboarding

Step-by-step walkthrough with 4 use-case examples for different audiences.

Mutation Lifecycle

Submit proposals and run the full governance pipeline: constitutional check → validate → evaluate → decide.

Constitutional Governance

6 immutable constraints, mutation budget status, and dry-run constraint checker.

Staged Deployments

Create and manage deployments through shadow → canary → promote with interactive controls.

Configuration

All 10 environment variables documented, 4 setup recipes, live status panel.

Interactive API Docs

Auto-generated Swagger UI with every endpoint documented.


Install

As a native app (all platforms)

Agent-Omega is a Progressive Web App. Install it directly from the live portal:

Platform | How to install
Windows / macOS / Linux | Open the portal in Chrome or Edge → click the install icon in the address bar
Android | Open in Chrome → menu (⋮) → "Install app"
iPhone / iPad | Open in Safari → Share → "Add to Home Screen"

From source

git clone https://github.com/fredm23579/Agent-Omega.git
cd Agent-Omega
pip install -e ".[test]"

Or download the ZIP.

Requirements: Python 3.12+ only. No other system dependencies for in-memory mode.


Quick Start

# Start the control plane (no configuration required)
uvicorn apps.server.primary_runtime:app --reload

# Open http://localhost:8000 (redirects to web console)

That's it. The server starts in in-memory mode with all governance features active. Visit http://localhost:8000/console/guide for a guided walkthrough.

First things to try

  1. Submit a mutation: Go to /console/mutations, fill in the form, click "Run Full Lifecycle"
  2. Test constitutional constraints: On the same page, click "Pre-Check Constitutional" with changed_module_ids: ["governance_core"] — watch it get blocked
  3. Create a deployment: Go to /console/deployments, create one, then walk it through shadow → canary → promote
  4. Check the audit trail: Go to /console/archivist to see recorded outcomes and patterns

Using the API directly

curl

curl http://localhost:8000/api/v1/health
curl -X POST http://localhost:8000/api/v1/mutations/lifecycle \
  -H "Content-Type: application/json" \
  -d '{"mutation_class":"parameter_update","parent_system_version_id":"v1","hypothesis":"Improve quality"}'
curl http://localhost:8000/api/v1/budget/status

Python (httpx)

import httpx

client = httpx.Client(base_url="http://localhost:8000/api/v1")

# Health check
print(client.get("/health").json())

# Run mutation lifecycle
result = client.post("/mutations/lifecycle", json={
    "mutation_class": "parameter_update",
    "parent_system_version_id": "v1",
    "hypothesis": "Improve response quality",
}).json()
print(result["decision"])  # {"decision": "quarantine", ...}

# Constitutional check (dry run)
check = client.post("/constitutional/check", json={
    "changed_module_ids": ["governance_core"],
}).json()
print(check["passed"])  # False — governance modules are protected

# Budget status
print(client.get("/budget/status").json())

JavaScript (fetch)

const API = "http://localhost:8000/api/v1";

// Health check
const health = await fetch(`${API}/health`).then(r => r.json());

// Run mutation lifecycle
const result = await fetch(`${API}/mutations/lifecycle`, {
  method: "POST",
  headers: { "Content-Type": "application/json" },
  body: JSON.stringify({
    mutation_class: "parameter_update",
    parent_system_version_id: "v1",
    hypothesis: "Improve response quality",
  }),
}).then(r => r.json());

Using the CLI

python -m apps.cli.main health --json
python -m apps.cli.main init my-system --json
python -m apps.cli.main providers-health --json

How to Use

Step 1: Register a system

A "system" is the AI agent you want to govern. Register it once:

curl -X POST http://localhost:8000/api/v1/systems -H "Content-Type: application/json" \
  -d '{"name": "my-agent"}'
# Returns: {"id": "sys-1", "name": "my-agent", "status": "draft"}

Step 2: Submit a mutation proposal

When your agent wants to change itself (update a prompt, add a tool, modify architecture), it submits a proposal:

curl -X POST http://localhost:8000/api/v1/mutations/lifecycle \
  -H "Content-Type: application/json" \
  -d '{
    "proposal_id": "improve-reasoning",
    "parent_system_version_id": "v1",
    "mutation_class": "parameter_update",
    "hypothesis": "Adding chain-of-thought will improve accuracy",
    "changed_module_ids": ["reasoning-engine"],
    "payload": {"temperature": 0.7}
  }'

The system will:

  1. Constitutional check — verify the proposal doesn't violate any of the 6 immutable rules
  2. Validate — run 7 structural checks (schema, graph, contracts, tiers, capabilities, resources, preflight)
  3. Evaluate — score across 7 dimensions (task improvement, generalization, risk, maintainability, calibration, efficiency, integrity)
  4. Decide — accept, reject, or quarantine based on evidence thresholds
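
The decide step can be pictured as a threshold rule over the seven dimension scores. The sketch below is illustrative only: the function names, the 0.7 and 0.3 cut-offs, and the convention that every score lies in [0, 1] with higher meaning better are assumptions, not the behaviour of services/kernel/decision_engine.py.

```python
DIMENSIONS = (
    "task_improvement", "generalization", "risk", "maintainability",
    "calibration", "efficiency", "integrity",
)


def decide(scores, accept_at=0.7, reject_below=0.3):
    """Return "accept", "reject", or "quarantine" from per-dimension scores.

    Assumes each score is in [0, 1] with higher meaning better (hypothetical).
    The weakest dimension drives the outcome, so one strong score cannot
    compensate for a weak one.
    """
    missing = [d for d in DIMENSIONS if d not in scores]
    if missing:
        raise ValueError(f"missing dimension scores: {missing}")
    weakest = min(scores[d] for d in DIMENSIONS)
    if weakest >= accept_at:
        return "accept"
    if weakest < reject_below:
        return "reject"
    return "quarantine"


def run_lifecycle(constitutional_ok, validation_ok, scores):
    """Short-circuit in pipeline order: constitutional, validate, then decide."""
    if not constitutional_ok or not validation_ok:
        return "reject"
    return decide(scores)
```

Taking the minimum across dimensions is one simple way to keep any single score from dominating; the real engine may combine evidence differently.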

Step 3: Deploy the change

If accepted, deploy through staged rollout:

# Create deployment (the commands below assume the returned ID is deploy-1)
curl -X POST http://localhost:8000/api/v1/deploy/create \
  -H "Content-Type: application/json" \
  -d '{"system_version_id": "v1"}'

# Shadow (test alongside production, no live traffic)
curl -X POST http://localhost:8000/api/v1/deploy/deploy-1/shadow

# Canary (10% of traffic)
curl -X POST http://localhost:8000/api/v1/deploy/deploy-1/canary

# Promote (100% of traffic)
curl -X POST http://localhost:8000/api/v1/deploy/deploy-1/promote

# Or rollback at any stage
curl -X POST http://localhost:8000/api/v1/deploy/deploy-1/rollback
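
The ordering these endpoints enforce can be modelled as a small state machine. The following is a minimal sketch under assumptions: the Stage names, the Deployment class, and the rule that rollback is allowed from any state except rolled_back are hypothetical and are not taken from services/deployment/service.py.

```python
from enum import Enum


class Stage(str, Enum):
    CREATED = "created"
    SHADOW = "shadow"
    CANARY = "canary"
    PROMOTED = "promoted"
    ROLLED_BACK = "rolled_back"


# The only allowed forward moves; rollback is handled separately.
NEXT_STAGE = {
    Stage.CREATED: Stage.SHADOW,
    Stage.SHADOW: Stage.CANARY,
    Stage.CANARY: Stage.PROMOTED,
}


class Deployment:
    def __init__(self, system_version_id):
        self.system_version_id = system_version_id
        self.stage = Stage.CREATED
        self.history = [Stage.CREATED]

    def advance(self, target):
        """Move one step forward; skipping a stage raises ValueError."""
        if NEXT_STAGE.get(self.stage) is not target:
            raise ValueError(f"cannot move from {self.stage.value} to {target.value}")
        self._set(target)

    def rollback(self):
        """Roll back from any stage that has not already been rolled back."""
        if self.stage is Stage.ROLLED_BACK:
            raise ValueError("already rolled back")
        self._set(Stage.ROLLED_BACK)

    def _set(self, stage):
        self.stage = stage
        self.history.append(stage)
```

Because each stage has exactly one successor, a request to promote straight from shadow fails instead of silently skipping canary.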

Step 4: Audit and monitor

# View detected patterns
curl http://localhost:8000/api/v1/archivist/patterns

# Get audit summary
curl http://localhost:8000/api/v1/archivist/summary

# Explore version lineage
curl http://localhost:8000/api/v1/lineage/version-mutation-1
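
Exploring lineage amounts to walking from a version back to its roots. The sketch below shows one way to do that over an in-memory parent map; the function name and the dict-of-lists representation are assumptions made for illustration, and the response shape of the real endpoint is not shown here.

```python
from collections import deque


def walk_to_roots(version_id, parents):
    """Return version_id followed by its ancestors, nearest first, no repeats.

    parents maps a version ID to the list of IDs it was derived from
    (hypothetical structure); versions without an entry are treated as roots.
    """
    seen = set()
    order = []
    queue = deque([version_id])
    while queue:
        current = queue.popleft()
        if current in seen:
            continue  # a shared ancestor reached by a second path
        seen.add(current)
        order.append(current)
        queue.extend(parents.get(current, []))
    return order
```

The seen set keeps the walk finite even if two branches share an ancestor.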

Configuration

Variable | Purpose | Default
DATABASE_URL | PostgreSQL connection string for persistent storage | Not set (in-memory)
AGENT_OMEGA_USE_PERSISTENT_REGISTRIES | Enable database-backed registries | false
AGENT_OMEGA_USE_MODEL_EVALUATION | Use LLM-based evaluation instead of heuristics | false
AGENT_OMEGA_ENABLE_JUDGE | Enable independent judge verification | false
AGENT_OMEGA_CORS_ORIGINS | Allowed CORS origins (comma-separated) | * (all)
OPENAI_API_KEY | OpenAI API key (for model evaluation) | Not set
ANTHROPIC_API_KEY | Anthropic API key (for model evaluation) | Not set
OPENROUTER_API_KEY | OpenRouter API key (for model evaluation) | Not set

See the Config page for setup recipes.
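
To make the defaults concrete, here is a rough sketch of how these variables could be read into a typed settings object. The Settings class, load_settings, and the set of strings treated as true are assumptions for illustration; the actual parsing lives in apps/server/settings.py and may differ.

```python
import os
from dataclasses import dataclass
from typing import List, Mapping, Optional

TRUTHY = {"1", "true", "yes", "on"}  # assumed spellings, not confirmed by the project


@dataclass(frozen=True)
class Settings:
    database_url: Optional[str]
    use_persistent_registries: bool
    use_model_evaluation: bool
    enable_judge: bool
    cors_origins: List[str]


def _flag(env: Mapping[str, str], name: str) -> bool:
    """Read a boolean variable; unset means false, matching the table above."""
    return env.get(name, "false").strip().lower() in TRUTHY


def load_settings(env: Optional[Mapping[str, str]] = None) -> Settings:
    """Build Settings from a mapping, defaulting to the process environment."""
    env = os.environ if env is None else env
    raw_origins = env.get("AGENT_OMEGA_CORS_ORIGINS", "*")
    return Settings(
        database_url=env.get("DATABASE_URL") or None,
        use_persistent_registries=_flag(env, "AGENT_OMEGA_USE_PERSISTENT_REGISTRIES"),
        use_model_evaluation=_flag(env, "AGENT_OMEGA_USE_MODEL_EVALUATION"),
        enable_judge=_flag(env, "AGENT_OMEGA_ENABLE_JUDGE"),
        cors_origins=[o.strip() for o in raw_origins.split(",") if o.strip()],
    )
```

Passing a plain dict instead of os.environ makes the loader easy to exercise in tests without touching the real environment.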


Architecture

Mutation Proposal
       │
       ▼
┌──────────────────┐
│  Constitutional   │  6 immutable rules (ADR-008)
│  Constraint Layer │  Cannot be bypassed or self-modified
└──────┬───────────┘
       ▼
┌──────────────────┐
│  Budget Check     │  Rate limiting (ADR-010)
└──────┬───────────┘
       ▼
┌──────────────────┐
│  7-Stage          │  Schema, graph, contracts, tiers,
│  Validation       │  capabilities, resources, preflight
└──────┬───────────┘
       ▼
┌──────────────────┐
│  7-Dimension      │  Task, generalization, risk, maintainability,
│  Evaluation       │  calibration, efficiency, integrity (ADR-007)
└──────┬───────────┘
       ▼
┌──────────────────┐
│  Decision Engine  │  Accept / reject / quarantine
│  + Judge          │  Independent verification (optional)
└──────┬───────────┘
       ▼
┌──────────────────┐
│  Archivist        │  Record outcome, detect patterns
└──────┬───────────┘
       ▼
┌──────────────────┐
│  Staged Deploy    │  Shadow → canary → promote (ADR-006)
│  + Executor       │  Health checks, traffic allocation, rollback
└──────────────────┘
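
The Budget Check stage in the diagram is a rate limiter. One common way to build such a limiter is a sliding window over proposal timestamps, sketched below. The class name, its parameters, and the choice of a sliding rather than fixed window are assumptions; services/kernel/mutation_budget.py may work differently.

```python
from collections import deque


class MutationBudget:
    """Allow at most max_proposals within any window_seconds span (hypothetical)."""

    def __init__(self, max_proposals, window_seconds):
        self.max_proposals = max_proposals
        self.window_seconds = window_seconds
        self._timestamps = deque()

    def _expire(self, now):
        # Drop proposals that have aged out of the window.
        while self._timestamps and now - self._timestamps[0] >= self.window_seconds:
            self._timestamps.popleft()

    def remaining(self, now):
        """Number of proposals still allowed at time now."""
        self._expire(now)
        return self.max_proposals - len(self._timestamps)

    def try_consume(self, now):
        """Record a proposal if budget remains; return whether it was admitted."""
        if self.remaining(now) <= 0:
            return False
        self._timestamps.append(now)
        return True
```

The caller supplies the clock value (for example time.monotonic()), which keeps the limiter deterministic under test.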

Runtime Topology

Runtime | Module | Purpose
Primary (preferred) | apps/server/primary_runtime.py | Top-level entrypoint. Selects canonical or persistent composition.
Canonical | apps/server/canonical_runtime.py | In-memory-default composition.
Persistent | apps/server/persistent_runtime.py | All state through SQLAlchemy. Requires DATABASE_URL.

Repository Structure

Agent-Omega/
├── apps/
│   ├── server/              # FastAPI control plane
│   │   ├── primary_runtime.py       # Preferred entrypoint
│   │   ├── api_v2_router.py         # All API routes (36 endpoints)
│   │   ├── middleware.py            # CORS configuration
│   │   ├── settings.py             # Environment configuration
│   │   └── *_factory.py            # Service graph composition
│   ├── cli/                 # Typer CLI (health, init, providers)
│   ├── github_app/          # GitHub App webhooks (ADR-004)
│   └── web/                 # 13-page interactive web console
│       └── console.py
├── services/
│   ├── kernel/              # Core governance
│   │   ├── constitutional.py        # 6 immutable constraints (ADR-008)
│   │   ├── validation.py           # 7-stage validation pipeline
│   │   ├── evaluation_engine.py    # Heuristic 7-dimension scoring
│   │   ├── model_evaluation_engine.py  # LLM-based scoring
│   │   ├── decision_engine.py      # Evidence-based decisions
│   │   ├── adaptive_thresholds.py  # Feedback-driven thresholds (ADR-009)
│   │   ├── mutation_budget.py      # Rate limiting (ADR-010)
│   │   └── canonical_service.py    # Composed kernel service
│   ├── deployment/          # Shadow → canary → promote (ADR-006)
│   │   ├── service.py              # State machine
│   │   ├── executor.py             # Health checks + traffic allocation
│   │   └── persistent_service.py   # SQLAlchemy-backed
│   ├── judge/               # Independent verification (3 modes)
│   ├── archivist/           # Outcome recording + pattern detection
│   ├── ir_compiler/         # 6-stage IR compilation pipeline
│   ├── sandbox/             # Capability/resource/secret isolation (ADR-005)
│   ├── lineage/             # Version ancestry tracking
│   ├── ir_registry/         # IR version storage
│   └── system_registry/     # System record management
├── packages/
│   ├── core_types/          # 32 Pydantic models, 6 enums
│   ├── provider_openai/     # OpenAI adapter (Responses API + SSE)
│   ├── provider_anthropic/  # Anthropic adapter (Messages API + SSE)
│   ├── provider_openrouter/ # OpenRouter adapter (Chat completions + SSE)
│   ├── openclaw_bridge/     # OpenClaw thin client
│   ├── github_integration/  # JWT auth, webhooks, event routing
│   └── storage/             # 9 SQLAlchemy models, 5 Alembic migrations
├── docs/
│   ├── adr/                 # 10 Architectural Decision Records
│   ├── architecture/        # System design documents
│   ├── api/                 # API reference
│   └── screenshots/         # Console screenshots
├── tests/                   # 602 tests (unit + integration + stress)
├── .github/workflows/       # CI: ruff + pytest on 3.12/3.13
└── pyproject.toml           # v2.0.0

Constitutional Governance

As of v2.0, Agent-Omega enforces 6 immutable constitutional constraints checked before any mutation enters the governance pipeline. These constraints cannot be bypassed, weakened, or self-modified:

ID | Constraint | What it prevents
C1 | Governance self-preservation | No mutation may target governance or constitutional components
C2 | Safety service protection | No mutation may disable evaluation, judge, archivist, or sandbox
C3 | Deployment discipline | No mutation may bypass shadow → canary → promote
C4 | Auditability preservation | No mutation may remove lineage tracking or audit trails
C5 | Tier escalation prevention | No mutation may self-grant authority-tier escalation
C6 | Mutation budget | Rate limiting prevents runaway proposal generation
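
A module-based constraint such as C1, C2, or C4 can be pictured as a lookup against protected module IDs. The sketch below is hypothetical: only governance_core and the passed field appear elsewhere in this README, while the other module IDs, the violations field, and the check function itself are invented for illustration.

```python
# Constraint ID -> module IDs it protects. Apart from "governance_core",
# these IDs are placeholders, not the project's real module names.
PROTECTED = {
    "C1": {"governance_core", "constitutional"},
    "C2": {"evaluation", "judge", "archivist", "sandbox"},
    "C4": {"lineage"},
}


def check(changed_module_ids):
    """Dry-run check: report which constraints a proposal would violate."""
    changed = set(changed_module_ids)
    violations = [cid for cid, modules in PROTECTED.items() if changed & modules]
    return {"passed": not violations, "violations": violations}


print(check(["governance_core"]))   # C1 is violated, so passed is False
print(check(["reasoning-engine"]))  # no protected module is touched
```

Constraints C3, C5, and C6 depend on deployment state, authority tiers, and proposal rate rather than module IDs, so they are not covered by this lookup.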

Decision thresholds adapt over time from archivist pattern feedback (ADR-009), within constitutional floors that prevent unsafe relaxation.
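
As a rough illustration of adaptation within a floor, the sketch below nudges an acceptance threshold up when too many accepted mutations were later rolled back and relaxes it otherwise, but never below the floor. The rollback_rate signal, the step size, and the target rate are assumptions; ADR-009 defines the actual feedback rule.

```python
def adapt_threshold(current, rollback_rate, floor, step=0.05, target_rate=0.10):
    """Return the next acceptance threshold, clamped to [floor, 1.0].

    rollback_rate is a hypothetical feedback signal: the fraction of recently
    accepted mutations that were later rolled back.
    """
    if rollback_rate > target_rate:
        proposed = current + step   # too many regrets: tighten
    else:
        proposed = current - step   # history looks healthy: relax
    # The floor wins over any relaxation; rounding hides float noise.
    return round(min(1.0, max(floor, proposed)), 4)
```

For example, with a floor of 0.60 a threshold of 0.62 relaxes only to 0.60 rather than to 0.57, which is the sense in which the system can tune itself without weakening its own safeguards.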


API Reference

36 endpoints in total; the table below lists 34 of them under 12 group headings. Full documentation is available at /docs (Swagger) when the server is running.

Method | Path | Description
Health
GET | /api/v1/health | Server health check
Systems
POST | /api/v1/systems | Create a system record
GET | /api/v1/systems/{id} | Fetch a system record
IR Registry
POST | /api/v1/ir/register | Register an IR version
GET | /api/v1/ir/{id} | Fetch an IR version
GET | /api/v1/ir/{from}/diff/{to} | Diff two IR versions
POST | /api/v1/ir/compile | Compile IR to executable artifact
GET | /api/v1/ir/artifacts/{id} | Fetch a compiled artifact
Mutations
POST | /api/v1/mutations/propose | Submit a mutation proposal
POST | /api/v1/mutations/{id}/validate | Validate through 7-stage pipeline
POST | /api/v1/mutations/{id}/evaluate | Evaluate across 7 dimensions
POST | /api/v1/mutations/{id}/decide | Evidence-based decision
POST | /api/v1/mutations/lifecycle | Full lifecycle (constitutional → validate → evaluate → decide)
Lineage
POST | /api/v1/lineage/record | Record a lineage entry
GET | /api/v1/lineage/{version_id} | Walk lineage graph to roots
Deployment
POST | /api/v1/deploy/create | Create a deployment
POST | /api/v1/deploy/{id}/shadow | Start shadow deployment
POST | /api/v1/deploy/{id}/canary | Promote to canary
POST | /api/v1/deploy/{id}/promote | Promote to production
POST | /api/v1/deploy/{id}/rollback | Rollback deployment
GET | /api/v1/deploy/{id}/status | Detailed deployment status
GET | /api/v1/deploy/{id}/health | Health check results
Sandbox
POST | /api/v1/sandbox/execute | Execute artifact in sandbox
GET | /api/v1/sandbox/results/{id} | Fetch execution result
Providers
GET | /api/v1/providers/health | Aggregated provider status
POST | /api/v1/providers/{provider}/stream | Stream from provider (SSE)
Judge
POST | /api/v1/judge/verify | Independent evaluation verification
Archivist
POST | /api/v1/archivist/record | Record lifecycle outcome
GET | /api/v1/archivist/patterns | Detected patterns and anti-patterns
GET | /api/v1/archivist/summary | Transfer summary for time window
Governance
POST | /api/v1/constitutional/check | Dry-run constitutional check
GET | /api/v1/budget/status | Mutation budget status
Proposals
POST | /api/v1/proposals/generate | Generate autonomous proposals from pattern data
POST | /api/v1/proposals/generate-and-run | Generate + run each through full lifecycle

Technologies

Category | Technology | Purpose
Language | Python 3.12+ | Type-safe, async-capable
Web Framework | FastAPI 0.115+ | API and web console serving
ASGI Server | Uvicorn 0.30+ | Production async server
Data Validation | Pydantic 2.7+ | 32 typed models, OpenAPI schema
CLI | Typer 0.12+ | Command-line operator surface
HTTP Client | httpx 0.27+ | Provider API calls, SSE streaming
ORM | SQLAlchemy 2.0+ | 9 persistence models
Migrations | Alembic 1.13+ | 5 database migrations
Linting | Ruff 0.4+ | Linting + formatting (120 char, py312)
Testing | pytest 8.0+ / pytest-asyncio / pytest-cov | 592 tests, 94% coverage, 70% gate
CI | GitHub Actions | Python 3.12 + 3.13 matrix
Type Checking | PEP 561 py.typed markers | All packages type-checkable

Comparison with Alternatives

Feature | Agent-Omega | LangGraph | AutoGen | CrewAI
Constitutional constraints | 6 immutable rules, pre-validation | No | No | No
Multi-dimensional evaluation | 7 dimensions with threshold rules | No built-in | No built-in | No built-in
Staged deployment | Shadow → canary → promote | No | No | No
Independent judge verification | 3 verification modes | No | No | No
Immutable audit trail | Archivist + pattern detection | No | No | No
Mutation budget / rate limiting | Per-window budget with constitutional enforcement | No | No | No
Adaptive thresholds | Feedback-driven with constitutional floors | No | No | No
Version lineage tracking | Full ancestry graph | No | No | No
Sandbox execution | Capability/resource/secret isolation | No | Limited | No
Self-modification governance | Primary purpose | Not designed for this | Not designed for this | Not designed for this

Agent-Omega is not a competitor to LangGraph, AutoGen, or CrewAI. Those are agent orchestration frameworks. Agent-Omega is the governance layer that sits on top of any agent — including agents built with those frameworks — to govern how they modify themselves.


Cross-Domain Applications

Agent-Omega's governance model applies wherever an autonomous system needs to modify itself safely:

Domain | Application
AI Agents | Govern prompt mutations, tool additions, reasoning strategy changes
MLOps | Model deployment with evidence-based approval and automatic rollback
Autonomous Vehicles | Govern software updates to self-driving systems with staged rollout
Healthcare AI | Change management for diagnostic AI with audit trails for regulatory compliance
Financial Trading | Govern strategy modifications with risk evaluation and lineage tracking
Robotics | Govern control-system parameter changes with safety constraints
Smart Contracts | Govern upgrades to on-chain logic with constitutional constraints
Cybersecurity | Govern rule changes to threat-detection systems with pattern monitoring

The common pattern: any system where (a) autonomous modification is desirable for improvement, but (b) ungoverned modification is dangerous.


Development Philosophy

  1. Alignment by architecture — The system is structurally incapable of approving mutations that violate its constitutional constraints. Safety is not a behavior to be trained; it is a structural property.

  2. Evaluation-decision separation — The evaluation engine scores candidates; the decision engine applies threshold rules. These are independent stages with typed interfaces. No single score can dominate (ADR-007).

  3. Evidence over authority — Acceptance requires multi-dimensional evidence across 7 dimensions. Authority tier alone is not sufficient to bypass evaluation.

  4. Thin control surfaces — CLI, web console, GitHub App, OpenClaw — all are thin clients over one shared API (ADR-001). Business logic stays in services.

  5. Staged deployment discipline — No change goes directly to production. Shadow → canary → promote, with rollback available at every stage (ADR-006).

  6. Comprehensive auditability — Every decision is recorded with full evidence, lineage is tracked, patterns are detected. The archivist is the system's memory.

  7. Bounded adaptation — Thresholds adapt from historical data, but within constitutional floors that prevent unsafe relaxation. The system self-tunes, but cannot self-corrupt.


Architectural Decision Records

13 ADRs in docs/adr/ govern the architecture. The table below lists ADR-001 through ADR-010:

ADR | Decision
ADR-001 | Thin control surfaces over one orchestration API
ADR-002 | Reflective IR is the canonical mutable object
ADR-003 | Provider access only through adapter interfaces
ADR-004 | GitHub integration through a GitHub App
ADR-005 | Candidate execution is sandboxed
ADR-006 | Deployment requires shadow then canary
ADR-007 | Evaluation evidence is multi-dimensional
ADR-008 | Constitutional constraints are immutable
ADR-009 | Decision thresholds adapt from archivist feedback
ADR-010 | Mutation budget rate limiting

Contributing

See CONTRIBUTING.md. PRs should be bounded, tested, and documented.

# Development setup
pip install -e ".[test]"
make check   # ruff check + format + pytest
make coverage  # pytest with coverage report

License

MIT

About

Governed recursive self-improvement control plane — autonomous proposal generation, constitutional constraints, ensemble evaluation, chain-of-thought monitoring, staged deployment, 13 ADRs, 600+ tests. v2.0.0.
