Symphony Experiments

This directory contains all experiments for the Symphony paper.

Overview

Experiment	Description	Type
Exp1	Efficiency & Cost Analysis	Simulation + Real
Exp2	Robustness & Recovery under Role Shock	Simulation + Real
Exp3	System Performance Optimization	Simulation

Directory Structure

experiments/
├── README.md                 # This file
├── pretrain.py               # Main experiment runner
├── configs/                  # Configuration files
│   ├── openrouter/           # OpenRouter model configs
│   └── *.yaml                # Basic configs
├── scripts/                  # Shell scripts
│   ├── run_all_datasets.sh
│   ├── run_gsm8k_pretrain.sh
│   ├── run_bbh_pretrain.sh
│   ├── run_balanced_pretrain.sh
│   └── run_select_only.sh
├── exp1/                     # Efficiency & Cost
│   ├── sim/
│   └── real/
├── exp2/                     # Robustness & Recovery
│   ├── sim/
│   ├── real/
│   ├── plot/
│   └── scripts/
├── exp3/                     # System Optimization
│   └── ...
└── common/                   # Shared utilities

Quick Start

Prerequisites

pip install -r requirements.txt
export OPENROUTER_API_KEY="sk-or-v1-..."

Running Experiments

# Exp1: Efficiency & Cost
python experiments/exp1/sim/sim_efficiency_cost.py --n 1000

# Exp2: Robustness
bash experiments/exp2/scripts/run_exp2_both.sh

# Exp3: System Optimization
bash experiments/exp3/run_exp3.sh

# Main pretrain experiments
bash experiments/scripts/run_gsm8k_pretrain.sh

Experiment Details

Experiment 1: Efficiency & Cost Analysis

Goal: Compare agent selection strategies on efficiency and cost.

Strategies:

Always-A: Always use the strongest agent
Static Rule: Select with a fixed, rule-based policy
Random: Sample uniformly at random from the candidates
LinUCB: Adaptive selection with a contextual bandit

See exp1/README.md for details.
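
As a rough illustration of the LinUCB strategy listed above, here is a minimal sketch of the standard disjoint LinUCB rule in pure Python. It is not taken from the experiment code: the class `LinUCBArm`, the function `select_agent`, the feature vector `x`, and the default `alpha` are all hypothetical names and values chosen for this example.

```python
import math


class LinUCBArm:
    """Per-agent LinUCB state (hypothetical helper, not the repo's class)."""

    def __init__(self, dim, alpha=1.0):
        self.alpha = alpha  # exploration weight (assumed value)
        # Inverse of A = I + sum(x x^T); starts as the identity matrix.
        self.A_inv = [[1.0 if i == j else 0.0 for j in range(dim)]
                      for i in range(dim)]
        self.b = [0.0] * dim  # running sum of reward * x

    def _apply_A_inv(self, v):
        return [sum(row[j] * v[j] for j in range(len(v))) for row in self.A_inv]

    def ucb(self, x):
        """Estimated reward plus an exploration bonus for context x."""
        theta = self._apply_A_inv(self.b)  # ridge estimate A^{-1} b
        A_inv_x = self._apply_A_inv(x)
        mean = sum(t * xi for t, xi in zip(theta, x))
        bonus = self.alpha * math.sqrt(sum(xi * v for xi, v in zip(x, A_inv_x)))
        return mean + bonus

    def update(self, x, reward):
        """Online update after observing a reward for context x."""
        # Sherman-Morrison rank-one update of A^{-1} for A <- A + x x^T.
        A_inv_x = self._apply_A_inv(x)
        denom = 1.0 + sum(xi * v for xi, v in zip(x, A_inv_x))
        for i in range(len(x)):
            for j in range(len(x)):
                self.A_inv[i][j] -= A_inv_x[i] * A_inv_x[j] / denom
            self.b[i] += reward * x[i]


def select_agent(arms, x):
    """Pick the agent ID whose upper confidence bound is highest."""
    return max(arms, key=lambda agent_id: arms[agent_id].ucb(x))
```

With this sketch, an agent that keeps earning rewards for a given context ends up with a higher score than one that does not, while rarely tried agents keep a larger exploration bonus.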

Experiment 2: Robustness & Recovery

Goal: Evaluate adaptation under non-stationary conditions.

Shock Types:

A_unavailable: Agent A becomes unavailable
A_degraded: Agent A's performance degrades

See exp2/README.md for details.
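
To make the two shock types concrete, the sketch below applies them to a toy table of per-agent success probabilities. This is an assumption about how a shock could be modeled in simulation, not the code in exp2/sim/: the function `apply_shock`, the agent key `"A"`, and the `degrade_factor` parameter are hypothetical.

```python
def apply_shock(success_prob, shock_type, degrade_factor=0.5):
    """Return a post-shock copy of {agent_id: success probability}.

    Hypothetical model: "A_unavailable" removes agent A from the pool,
    "A_degraded" scales A's success probability by degrade_factor.
    """
    shocked = dict(success_prob)  # leave the caller's table untouched
    if shock_type == "A_unavailable":
        shocked.pop("A", None)
    elif shock_type == "A_degraded":
        shocked["A"] = shocked["A"] * degrade_factor
    else:
        raise ValueError(f"unknown shock type: {shock_type}")
    return shocked
```

For instance, `apply_shock({"A": 0.9, "B": 0.7}, "A_unavailable")` leaves only agent B selectable, so a static Always-A policy has nothing to fall back on while an adaptive one can shift to B.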

Experiment 3: System Performance Optimization

Goal: Evaluate routing optimization based on latency and load.

Scenarios:

latency_heterogeneous: Different agent latencies
load_burst: Dynamic load spikes
combined: Both latency and load variations

See exp3/README.md for details.

Main Pretrain Runner

The pretrain.py script supports:

Multiple benchmarks (GSM8K, BBH, Medical QA)
Three-phase execution (cold start, pretrain, test)
LinUCB agent selection with online updates
Multi-CoT voting over several chain-of-thought paths per task (see --cot-count)
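
The README does not say how votes across CoT paths are aggregated. One common choice is a plain majority vote over the final answers, sketched below under that assumption. The function `majority_vote` and its normalization (trim whitespace, lowercase) are hypothetical and may differ from what pretrain.py does.

```python
from collections import Counter


def majority_vote(answers):
    """Return the most frequent non-empty answer, or None if there is none.

    Answers are normalized (whitespace trimmed, lowercased) before counting,
    which is an assumption made for this sketch.
    """
    normalized = [a.strip().lower() for a in answers if a and a.strip()]
    if not normalized:
        return None
    winner, _count = Counter(normalized).most_common(1)[0]
    return winner
```

With the default of three CoT paths per task, two agreeing paths are enough to decide the answer.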

Key Parameters

Parameter	Description	Default
`--task-pool`	Path to JSONL task file	Required
`--benchmark`	Benchmark name	None
`--n`	Total tasks	100
`--cold-n`	Number of tasks in the cold start phase	10
`--pretrain-n`	Number of tasks in the pretrain phase	70
`--test-n`	Number of tasks in the test phase	20
`--topL`	Top-L candidates	3
`--plan-k`	Number of plans	3
`--cot-count`	CoT paths per task	3
`--agents`	Agent IDs (comma-separated)	Required
`--runtime-dir`	Config directory	configs

Example

python experiments/pretrain.py \
  --task-pool data/gsm8k_full.jsonl \
  --benchmark gsm8k \
  --n 600 \
  --cold-n 200 \
  --pretrain-n 300 \
  --test-n 100 \
  --agents "deepseek-v3,openai-gpt-5-nano" \
  --runtime-dir experiments/configs
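
In the example above the phase sizes add up to --n (200 + 300 + 100 = 600), and the defaults do the same (10 + 70 + 20 = 100). The sketch below shows one way such a split could work, assuming the phases consume tasks in file order. The function `split_phases` is hypothetical and is not the logic in pretrain.py.

```python
def split_phases(tasks, cold_n, pretrain_n, test_n):
    """Slice a task list into consecutive cold start, pretrain, and test parts.

    Assumes phases take tasks in order without overlap (an assumption of
    this sketch, not a documented behavior of the runner).
    """
    needed = cold_n + pretrain_n + test_n
    if needed > len(tasks):
        raise ValueError(f"need {needed} tasks, got {len(tasks)}")
    cold = tasks[:cold_n]
    pretrain = tasks[cold_n:cold_n + pretrain_n]
    test = tasks[cold_n + pretrain_n:needed]
    return cold, pretrain, test
```

Checking the sizes up front catches a task pool that is smaller than the requested phases before any model calls are made.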

Configuration

Agent configs are in configs/openrouter/<model-name>/:

debug: false
role: "agent"
node_id: "agent-openrouter-016"
base_model: "openrouter:deepseek/deepseek-chat"
capabilities: [math, reasoning, code]
max_tokens: 512
temperature: 0.2
