
IntelligentDDS/AgentChaos

AgentChaos

Chaos Engineering for Agent Systems via Programmatic Fault Injection

Python 3.12 | License: MIT | PRs Welcome


AgentChaos is a chaos engineering framework that evaluates the robustness of LLM-based agent systems through controlled, runtime, non-intrusive fault injection at the LLM API layer.

All agent systems access LLMs through the same HTTP interface. AgentChaos exploits this shared layer by installing a fault injection wrapper on the HTTP client at runtime, so no source code modification is required.

  • 🎯 Non-Intrusive — Injects faults at the HTTP transport layer; works with any agent system without code changes
  • 🔬 Systematic — 65 fault configurations derived from a principled taxonomy covering crash, omission, and value faults
  • Runtime — Faults are injected into live running systems, capturing real dynamic behaviors (retries, early termination, error propagation)
  • 📊 Reproducible — Deterministic modification functions with configurable injection strategies and trigger verification

Overview

LLM APIs in production can return server errors, truncated responses, or corrupted content. When an agent system issues multiple LLM API calls per task, any such fault can propagate through downstream agents and cause task failure.

AgentChaos addresses this by:

  1. Defining a fault taxonomy adapted from classical distributed systems fault classification: 6 fault types × 2 target fields × 4 injection strategies give 48 base configurations, and the position and compound experiments bring the total to 65 fault configurations.
  2. Injecting faults at the HTTP layer by patching the HTTP client at runtime, intercepting and modifying LLM API responses according to the configured policy.
  3. Verifying trigger status by checking execution traces after task completion and filtering untriggered tasks from evaluation.
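The steps above can be sketched in a few lines of Python. This is a minimal illustration of runtime patching, not the project's actual implementation: `ToyLLMClient`, `install_fault_wrapper`, `empty_content`, and the trace format are hypothetical names invented for this example, and the toy client returns a canned reply instead of issuing a real HTTP request.

```python
import copy
from typing import Callable


class ToyLLMClient:
    """Hypothetical stand-in for the HTTP client an agent system uses."""

    def post(self, url: str, payload: dict) -> dict:
        # A real client would send an HTTP request; this returns a canned reply.
        return {
            "status": 200,
            "body": {"choices": [{"message": {"content": "def add(a, b): return a + b"},
                                  "finish_reason": "stop"}]},
        }


def install_fault_wrapper(client_cls, modify: Callable[[dict], dict], trace: list):
    """Patch client_cls.post so every response passes through `modify`.

    Returns a function that restores the original method.
    """
    original_post = client_cls.post

    def patched_post(self, url, payload):
        response = original_post(self, url, payload)
        faulty = modify(response)
        # Record whether the response was actually changed (trigger status).
        trace.append({"url": url, "injected": faulty != response})
        return faulty

    client_cls.post = patched_post

    def uninstall():
        client_cls.post = original_post

    return uninstall


def empty_content(response: dict) -> dict:
    """Omission-style fault: return a copy of the response with empty content."""
    faulty = copy.deepcopy(response)
    for choice in faulty["body"]["choices"]:
        choice["message"]["content"] = ""
    return faulty


if __name__ == "__main__":
    trace = []
    uninstall = install_fault_wrapper(ToyLLMClient, empty_content, trace)
    # The "agent" code is unchanged: it still just calls client.post(...).
    reply = ToyLLMClient().post("https://api.example.invalid/v1/chat/completions", {})
    print(repr(reply["body"]["choices"][0]["message"]["content"]))
    print("triggered:", any(entry["injected"] for entry in trace))
    uninstall()
```

The agent code keeps calling the client as before. Only the patched method changes, and the recorded trace can later be used to drop tasks in which no fault was triggered.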

AgentChaos Framework Overview



Fault Taxonomy

We enumerate all fault types by applying each classical fault category to each LLM API response field.

| Category | Fault Type | Content | Tool Call | Real-World Scenario |
|----------|------------|---------|-----------|---------------------|
| Crash    | Error      |         |           | Server overload, HTTP 5xx, rate limiting |
| Crash    | Timeout    |         |           | Network congestion, backend delay, API latency |
| Omission | Empty      |         |           | Safety filter, content policy rejection |
| Omission | Truncate   |         |           | Token limit, TCP interruption, incomplete completion |
| Value    | Corrupt    |         |           | Encoding error, garbled characters |
| Value    | Schema     |         |           | Parsing error, schema mismatch |
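To make the omission and value categories concrete, here is a minimal sketch of deterministic modification functions applied to the content field. It assumes an OpenAI-style chat completion dictionary (`choices[i].message.content`). The function names, the `keep_ratio` parameter, and the corruption rule (replace every third character with U+FFFD) are assumptions chosen for illustration, not the framework's actual definitions.

```python
import copy


def _map_content(response: dict, transform) -> dict:
    """Return a copy of the response with `transform` applied to each content string."""
    faulty = copy.deepcopy(response)
    for choice in faulty["choices"]:
        message = choice["message"]
        # Content can be None when the model only returns tool calls.
        message["content"] = transform(message.get("content") or "")
    return faulty


def inject_empty(response: dict) -> dict:
    """Omission / Empty: the content field comes back empty."""
    return _map_content(response, lambda text: "")


def inject_truncate(response: dict, keep_ratio: float = 0.5) -> dict:
    """Omission / Truncate: keep only the leading part of the content."""
    return _map_content(response, lambda text: text[: int(len(text) * keep_ratio)])


def inject_corrupt(response: dict, step: int = 3, filler: str = "\ufffd") -> dict:
    """Value / Corrupt: replace every `step`-th character with a filler character."""
    def garble(text: str) -> str:
        return "".join(
            filler if (i + 1) % step == 0 else ch for i, ch in enumerate(text)
        )
    return _map_content(response, garble)


if __name__ == "__main__":
    original = {"choices": [{"message": {"content": "abcdefgh"}, "finish_reason": "stop"}]}
    for inject in (inject_empty, inject_truncate, inject_corrupt):
        print(inject.__name__, repr(inject(original)["choices"][0]["message"]["content"]))
```

Because none of these functions use randomness and each works on a copy, the same input always yields the same faulty output and the original response is left intact.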

Injection Strategies

| Strategy     | Description |
|--------------|-------------|
| Single       | Inject once at the first matching LLM call, then stop |
| Persistent   | Inject at every matching LLM call throughout the entire task |
| Intermittent | Inject at each matching call independently with probability 0.3 |
| Burst        | Inject at the first 3 consecutive matching calls, then stop |
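The four strategies differ only in which matching calls receive a fault. The sketch below shows one way to express that decision. The `InjectionPolicy` class and its parameter names are hypothetical, and the seeded random generator is an assumption made here so that intermittent runs can be repeated.

```python
import random


class InjectionPolicy:
    """Decides, call by call, whether a matching LLM call should be faulted."""

    def __init__(self, strategy: str, probability: float = 0.3,
                 burst_length: int = 3, seed: int = 0):
        self.strategy = strategy
        self.probability = probability    # used by "intermittent"
        self.burst_length = burst_length  # used by "burst"
        self.rng = random.Random(seed)    # private generator for repeatable runs
        self.matching_calls = 0           # matching calls seen so far

    def should_inject(self) -> bool:
        index = self.matching_calls
        self.matching_calls += 1
        if self.strategy == "single":
            return index == 0                      # first matching call only
        if self.strategy == "persistent":
            return True                            # every matching call
        if self.strategy == "intermittent":
            return self.rng.random() < self.probability
        if self.strategy == "burst":
            return index < self.burst_length       # first N consecutive calls
        raise ValueError(f"unknown strategy: {self.strategy!r}")


if __name__ == "__main__":
    for name in ("single", "persistent", "intermittent", "burst"):
        policy = InjectionPolicy(name)
        print(name, [policy.should_inject() for _ in range(6)])
```

Keeping the call counter inside the policy object means one policy instance should be created per task, so that "first matching call" is counted from the start of that task.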

Compound Scenarios

| Scenario        | Description |
|-----------------|-------------|
| API degradation | Delay then return error response |
| Content filter  | Remove tool calls and replace content with filter message |
| Max tokens      | Truncate content and set finish_reason to length |
| Proxy HTML      | Replace content with an HTML error page |
| Stale cache     | Replay previous response on the next call |
| Stale data      | Replace tool call arguments with wrong values |
| Wrong entity    | Replace tool call arguments with ambiguous values |
| Slow response   | Add delay with no content change |
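Two of these scenarios are specified closely enough to sketch directly. The example below assumes an OpenAI-style chat completion dictionary with `content`, `tool_calls`, and `finish_reason` fields. The function names, the `keep_chars` cutoff, and the wording of the filter message are assumptions for illustration only.

```python
import copy

# Assumed wording; the actual filter message used by the framework is not shown here.
FILTER_MESSAGE = "The response was filtered due to content policy."


def max_tokens(response: dict, keep_chars: int = 20) -> dict:
    """Max tokens: truncate the content and set finish_reason to "length"."""
    faulty = copy.deepcopy(response)
    for choice in faulty["choices"]:
        message = choice["message"]
        message["content"] = (message.get("content") or "")[:keep_chars]
        choice["finish_reason"] = "length"
    return faulty


def content_filter(response: dict, message: str = FILTER_MESSAGE) -> dict:
    """Content filter: remove tool calls and replace the content with a filter message."""
    faulty = copy.deepcopy(response)
    for choice in faulty["choices"]:
        choice["message"].pop("tool_calls", None)
        choice["message"]["content"] = message
    return faulty


if __name__ == "__main__":
    original = {
        "choices": [{
            "message": {
                "content": "I will run the unit tests now and report back.",
                "tool_calls": [{"function": {"name": "run_tests", "arguments": "{}"}}],
            },
            "finish_reason": "tool_calls",
        }]
    }
    print(max_tokens(original)["choices"][0])
    print(content_filter(original)["choices"][0])
```

Both functions combine more than one field change in a single response, which is what distinguishes a compound scenario from the single-field faults in the taxonomy.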

Installation

Prerequisites

  • Python 3.12
  • uv (recommended) or pip

Steps

```bash
# Clone the repository
git clone https://github.com/YOUR_USERNAME/AgentChaos.git
cd AgentChaos

# Install dependencies with uv
uv sync

# Or with pip
pip install -e .
```

Configuration

Create scripts/.env with your LLM API credentials:

```
MODEL_PROVIDER="openai"
OPENAI_MODEL="gpt-4o"
OPENAI_BASE_URL="https://api.openai.com/v1"
OPENAI_API_KEY="sk-..."
```

Quick Start

1. Prepare Dataset

```bash
cd scripts

# Prepare a single dataset
python prepare_dataset.py --dataset_name HumanEval

# Prepare all supported datasets
for ds in MATH MMLU-Pro HumanEval "HumanEval+" MBPP "MBPP+"; do
    python prepare_dataset.py --dataset_name "$ds"
done
```

2. Run Agent Systems (Without Fault Injection)

```bash
cd scripts
python run_all_method_dataset.py \
    --methods autogen mad mapcoder evomac \
    --datasets HumanEval MBPP MATH MMLU-Pro
```

3. Run with Fault Injection

```bash
cd scripts
python run_all_method_dataset.py \
    --methods autogen mad mapcoder evomac \
    --datasets HumanEval MBPP MATH MMLU-Pro \
    --fault_inject
```

4. Evaluate Results

```bash
cd scripts

# Run evaluation
python run_all_eval.py --workers 50

# Extract raw results into CSV
python all_extract.py --force
```

5. Generate Figures and Tables

```bash
cd scripts

# RQ1: Overall robustness
python all_RQ1.py

# RQ2: Fault configuration impact
python all_RQ2.py

# RQ3: Fault diagnosis
python all_RQ3.py
```

Supported Agent Systems

All systems are reimplemented on Google ADK (Agent Development Kit) with unified tool interfaces, preserving each system's original interaction logic.

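| System   | Pattern                    | Agents | Tools | Script          |
|----------|----------------------------|--------|-------|-----------------|
| AutoGen  | Conversation               | 2      | 2     | run_autogen.py  |
| MAD      | Multi-agent debate         | 4      | 2     | run_mad.py      |
| MapCoder | Multi-stage pipeline       | 5      | 1     | run_mapcoder.py |
| EvoMAC   | Evolutionary decomposition | 4      | 1     | run_evomac.py   |
| Mini-SE  | Single-agent with tools    | 1      | 4     | run_mini_se.py  |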
