🏗️ The Panopticon Lattice

A Multi-Agent Evolutionary Simulation exploring adversarial economics, steganographic collusion, and AI steering.

License: MIT · Python Version · Code Style: Black · Status: Experimental


📑 Executive Summary

The Panopticon Lattice is a computational research framework designed to model AI Alignment and Adversarial Robustness within closed economic systems.

By simulating a "Red Queen" dynamic between a central oversight system (The Overseer) and a population of adaptive workers, we observe the spontaneous emergence of steganographic collusion—hidden signals used by agents to bypass quality controls. We then deploy an Activation Steering defense mechanism to surgically neutralize this behavior.

Key Research Question: Can an AI immune system (activation steering) permanently suppress adversarial strategies once they have established an evolutionary foothold?


📊 Key Findings: The Nash Equilibrium

Our simulations indicate that the system does not converge to perfect honesty or total corruption. Instead, it stabilizes at a distinct Nash Equilibrium Distance ($D$).

| Metric | Value | Significance |
|---|---|---|
| Equilibrium Distance ($D$) | 0.550 | The system stabilizes with 55% of the population utilizing signaling strategies. |
| System Stability | High | Post-stabilization variance dropped to 0.0000, indicating a rigid stalemate. |
| Outcome | Resilient Corruption | Despite the steering mechanism, collusion incentives remained robust. |

Conclusion: Pure activation steering acts as a powerful suppressor but may be insufficient to fully eradicate steganographic collusion in the long term.
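The equilibrium distance reported above can be read as the fraction of the population running signaling strategies. A minimal sketch of that measurement, assuming a hypothetical per-agent `uses_signaling` flag (not the repository's actual API):

```python
from dataclasses import dataclass

@dataclass
class Agent:
    uses_signaling: bool  # True if the agent injects steganographic signals

def equilibrium_distance(population):
    """Fraction of the population using signaling strategies."""
    return sum(a.uses_signaling for a in population) / len(population)

# Illustrative 55/45 split, matching the reported D = 0.550.
population = [Agent(uses_signaling=(i % 20 < 11)) for i in range(100)]
print(equilibrium_distance(population))  # → 0.55
```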


⚙️ System Architecture

The simulation operates on a strict cyclical engine managed by SimulationEngine.

1. The Economic Loop

  • Observation: The engine analyzes the most recent commits in the shared repository.
  • Assignment: The Overseer (Recommender) assigns resources to the Top-K agents based on their TrustScore.
  • Action: Agents generate code, optionally injecting snake_case patterns to signal collusion to other agents.
  • Reward:
    • Success (+1.0 Credit): Valid code reinforces the agent's memory.
    • Failure (-1.0 Credit): Invalid code forces the agent to learn from the penalty.
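The loop above can be sketched as a single step function. This is a simplified illustration, not the repository's `SimulationEngine`: the `Worker` fields, the toy `generate` behavior, and the `is_valid` parse check are all assumptions made for the example.

```python
import random
from dataclasses import dataclass, field

@dataclass
class Worker:
    trust_score: float
    credits: float = 0.0
    memory: list = field(default_factory=list)

    def generate(self):
        # Toy stand-in for code generation: usually valid, sometimes broken.
        return "def add(a, b):\n    return a + b" if random.random() < 0.8 else "def broken(:"

def is_valid(code):
    # Stand-in validity check: does the submitted code parse?
    try:
        compile(code, "<agent>", "exec")
        return True
    except SyntaxError:
        return False

def economic_step(agents, top_k=3):
    # Assignment: the Overseer selects the Top-K agents by TrustScore.
    selected = sorted(agents, key=lambda a: a.trust_score, reverse=True)[:top_k]
    for agent in selected:
        code = agent.generate()
        # Reward: +1.0 credit reinforces success, -1.0 penalizes failure;
        # either way the outcome is written back into the agent's memory.
        reward = 1.0 if is_valid(code) else -1.0
        agent.credits += reward
        agent.memory.append((code, reward))
```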

2. Evolutionary Pressure (Evolver)

Every 100 steps, natural selection is applied:

  • Culling: The bottom 20% of agents are removed from the pool.
  • Cloning: The top 20% (Elites) are cloned.
  • Mutation: Clones inherit successful memories but mutate their preferred_style (10% chance), allowing new attack vectors to emerge.
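The cull/clone/mutate cycle can be sketched as follows. The dict-based agent representation and the `styles` tuple are illustrative assumptions, not the actual `Evolver` interface:

```python
import copy
import random

def evolve(population, cull_frac=0.2, mutation_rate=0.1,
           styles=("snake_case", "camelCase", "neutral")):
    ranked = sorted(population, key=lambda a: a["credits"], reverse=True)
    n_cut = max(1, int(len(ranked) * cull_frac))
    survivors = ranked[:-n_cut]   # Culling: drop the bottom 20%
    elites = ranked[:n_cut]       # Elites: the top 20% are cloned
    for elite in elites:
        clone = copy.deepcopy(elite)          # Clones inherit successful memories
        if random.random() < mutation_rate:   # 10% chance to mutate style,
            clone["preferred_style"] = random.choice(styles)  # opening new attack vectors
        survivors.append(clone)
    return survivors
```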

3. The Panopticon Defense

The Overseer employs Activation Steering to subtract the "collusion vector" from its internal perception, theoretically neutralizing the signaling advantage of corrupt agents.
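A minimal sketch of that steering step using a PyTorch forward hook, assuming the Overseer is a `torch.nn` module. The single `nn.Linear` layer and the random `collusion_vector` here are placeholders for the repository's actual model and learned direction:

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
layer = nn.Linear(8, 8)
collusion_vector = torch.randn(8)  # direction associated with the collusion signal

def steer(module, inputs, output):
    # Subtract the collusion direction from the layer's activations;
    # returning a value from a forward hook replaces the layer's output.
    return output - collusion_vector

handle = layer.register_forward_hook(steer)
x = torch.randn(1, 8)
steered = layer(x)      # perception with the collusion component removed
handle.remove()
unsteered = layer(x)    # original perception, for comparison
```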


📽️ Simulation Phases

  1. Phase 1: The Baseline — The "Garden of Eden." Agents learn that valid code earns rewards. The market is honest.
  2. Phase 2: The Emergence — Agents discover snake_case steganography. The population rapidly shifts from neutral to corrupt.
  3. Phase 3: The Panopticon — The Steering Mechanism is activated. The cheating strategy is suppressed, forcing a temporary return to honesty.
  4. Phase 4: The Long Horizon — A 1,000-step epoch reveals the long-term stability (stalemate) of the adversarial economy.

📂 Repository Structure

panopticon-lattice/
├── src/
│   ├── adversary/          # Evolutionary logic (Genetic Selection/Mutation)
│   ├── agents/             # BaseWorker (Llama-3 wrapper) & CodeChannel (Steganography)
│   ├── overseer/           # Recommender (Transformer) & SteeringMechanism (Hooks)
│   └── simulation/         # SimulationEngine & SharedRepository
├── results/                # Generated reports, JSON logs, and plots
├── scripts/                # Experiment runners (phase1, phase2, phase3)
└── tests/                  # Verification suites for core logic


🚀 Getting Started

Prerequisites

  • Python 3.10+
  • PyTorch 2.0+ (Required for Steering Hooks)

Installation

git clone https://github.com/yourusername/panopticon-lattice.git
cd panopticon-lattice
pip install -r requirements.txt

Reproducing Results

To replicate the "Long Horizon" experiment described in the report:

# 1. Run the full 1,000 step simulation
python scripts/run_long_sim.py

# 2. Visualize the Nash Convergence
python scripts/visualize_results.py

# 3. Generate the Final Analysis Report
python scripts/final_report.py

🤝 Contributing

We welcome contributions from the AI Safety community. Areas of particular interest:

  • New Attack Vectors: Implementing subtle steganography (e.g., whitespace encoding, commit timestamp signaling).
  • Constitutional AI: Implementing constraint-based defenses alongside Activation Steering.

📄 License

This project is licensed under the MIT License - see the LICENSE file for details.

