Symphony

A Decentralized Multi-Agent Framework for Edge Devices with Beacon-Guided Task Routing and CoT Voting

Symphony is a decentralized multi-agent framework that enables intelligent agents to collaborate across heterogeneous edge devices through beacon-guided task routing and Chain-of-Thought (CoT) voting mechanisms.

Project Demo

A demo video is included as demo.mp4 in the repository root.

Overview

Symphony employs a three-stage pipeline:

Planning Phase: Multiple planning agents decompose complex queries into executable sub-tasks
Execution Phase: Beacon-guided routing matches sub-tasks to specialized agents using LinUCB-based selection
Voting Phase: CoT voting aggregates multiple agent responses for robust final answers
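The voting phase can be illustrated with a minimal majority-vote sketch. This is not the code in core/voting.py; the function name majority_vote and the normalization step (trimming whitespace and lowercasing) are assumptions made for illustration.

```python
from collections import Counter


def majority_vote(answers):
    """Return the most common normalized answer and its share of the votes."""
    if not answers:
        raise ValueError("need at least one answer to vote on")
    # Normalize so that " 925" and "925" count as the same answer.
    normalized = [answer.strip().lower() for answer in answers]
    counts = Counter(normalized)
    # most_common orders by count; ties keep first-seen order.
    winner, votes = counts.most_common(1)[0]
    return winner, votes / len(normalized)


# Three CoT paths answer "What is 25 * 37?"; two of them agree.
final_answer, share = majority_vote(["925", " 925", "935"])
print(final_answer, round(share, 2))
```

A vote share close to 1/len(answers) signals that the paths disagreed, which a caller could use to trigger additional CoT paths.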

Architecture

User Query
    │
    ▼
┌─────────────────────────────────────────┐
│  Planning Phase                         │
│  - Task decomposition (k plans)         │
│  - LinUCB plan selection                │
└─────────────────────────────────────────┘
    │
    ▼
┌─────────────────────────────────────────┐
│  Execution Phase                        │
│  - Beacon broadcast for each sub-task   │
│  - Top-L agent candidate selection      │
│  - LinUCB agent selection               │
│  - Parallel CoT execution               │
└─────────────────────────────────────────┘
    │
    ▼
┌─────────────────────────────────────────┐
│  Voting Phase                           │
│  - CoT voting across responses          │
│  - Final answer aggregation             │
└─────────────────────────────────────────┘
    │
    ▼
Final Result
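The Top-L candidate step in the execution phase can be sketched as a capability-overlap ranking. The sketch below is an assumption-laden illustration, not the logic in core/capability.py or core/routing.py: AgentProfile, match_score, top_l_candidates, and the "fraction of requirements covered" score are all hypothetical names and choices.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class AgentProfile:
    agent_id: str
    capabilities: frozenset


def match_score(requirements, capabilities):
    """Fraction of the required capabilities that the agent advertises."""
    if not requirements:
        return 0.0
    return len(requirements & capabilities) / len(requirements)


def top_l_candidates(requirements, agents, top_l=3):
    """Rank agents by match score and keep at most top_l with a non-zero score."""
    required = frozenset(requirements)
    scored = [(match_score(required, a.capabilities), a.agent_id) for a in agents]
    # Descending score, then agent id, so the ranking is deterministic.
    scored.sort(key=lambda pair: (-pair[0], pair[1]))
    return [agent_id for score, agent_id in scored[:top_l] if score > 0]


agents = [
    AgentProfile("agent-a", frozenset({"math", "reasoning"})),
    AgentProfile("agent-b", frozenset({"code"})),
    AgentProfile("agent-c", frozenset({"math"})),
]
candidates = top_l_candidates(["math", "reasoning"], agents, top_l=2)
print(candidates)
```

In the pipeline shown above, the surviving candidates would then be passed to the LinUCB selector, which picks among them using learned reward estimates.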

Key Features

Decentralized Architecture: No central orchestrator required, fault-tolerant
Intelligent Task Routing: Beacon-based capability matching with LinUCB learning
Advanced Reasoning: Multi-path CoT with majority voting
Edge-Optimized: Runs on consumer-grade hardware (RTX 3060/4090 GPUs, NVIDIA Jetson, Apple M-series Macs)
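For readers unfamiliar with LinUCB, the following is a dependency-free sketch of the standard disjoint LinUCB bandit: each arm keeps a ridge-regression estimate and is scored by its predicted reward plus an exploration bonus. It is not the code in core/linucb_selector.py; LinUCBArm, select_agent, the two-dimensional context vector, and alpha = 1.0 are assumptions for illustration.

```python
import math


class LinUCBArm:
    """One arm (agent) of disjoint LinUCB, storing A^-1 and b directly."""

    def __init__(self, dim):
        # A starts as the identity matrix, so its inverse is also the identity.
        self.a_inv = [[1.0 if i == j else 0.0 for j in range(dim)] for i in range(dim)]
        self.b = [0.0] * dim

    def _a_inv_times(self, vec):
        return [sum(row[j] * vec[j] for j in range(len(vec))) for row in self.a_inv]

    def ucb(self, x, alpha):
        """Predicted reward theta^T x plus alpha * sqrt(x^T A^-1 x)."""
        theta = self._a_inv_times(self.b)
        a_inv_x = self._a_inv_times(x)
        mean = sum(t * xi for t, xi in zip(theta, x))
        variance = sum(xi * v for xi, v in zip(x, a_inv_x))
        return mean + alpha * math.sqrt(max(variance, 0.0))

    def update(self, x, reward):
        """Apply A += x x^T via a Sherman-Morrison update of A^-1, and b += r x."""
        a_inv_x = self._a_inv_times(x)
        denom = 1.0 + sum(xi * v for xi, v in zip(x, a_inv_x))
        for i in range(len(x)):
            for j in range(len(x)):
                self.a_inv[i][j] -= a_inv_x[i] * a_inv_x[j] / denom
        self.b = [bi + reward * xi for bi, xi in zip(self.b, x)]


def select_agent(arms, x, alpha=1.0):
    """Pick the arm with the highest upper confidence bound for context x."""
    return max(sorted(arms), key=lambda name: arms[name].ucb(x, alpha))


arms = {"agent1": LinUCBArm(2), "agent2": LinUCBArm(2)}
context = [1.0, 0.0]  # hypothetical task features, e.g. "is a math task"
for _ in range(5):
    arms["agent1"].update(context, reward=1.0)  # agent1 keeps succeeding
    arms["agent2"].update(context, reward=0.0)  # agent2 keeps failing
print(select_agent(arms, context))
```

Larger values of alpha favor exploration of rarely tried agents; smaller values exploit the current reward estimates.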

Directory Structure

symphony/
├── README.md                    # This file
├── requirements.txt             # Python dependencies
├── pyproject.toml               # Package configuration
│
├── core/                        # Core algorithms
│   ├── capability.py            # Capability matching
│   ├── linucb_selector.py       # LinUCB bandit selector
│   ├── routing.py               # Task routing
│   └── voting.py                # CoT voting mechanisms
│
├── agents/                      # Agent implementations
│   ├── agent.py                 # Main Agent class
│   └── user.py                  # User client
│
├── protocol/                    # Protocol definitions
│   ├── task_contract.py         # Task data structures
│   └── beacon.py                # Beacon messages
│
├── infra/                       # Infrastructure
│   └── ISEP.py                  # Service exchange protocol
│
├── models/                      # Model loaders
│   └── base_loader.py           # LLM loading utilities
│
├── symphony.py                  # Core orchestrator
├── main.py                      # Simple entry point
├── agent_register.py            # Agent registration runner
├── user_register.py             # User registration runner
│
├── experiments/                 # All experiments
│   ├── README.md                # Experiments overview
│   ├── pretrain.py              # Main experiment runner
│   ├── configs/                 # Configuration files
│   ├── scripts/                 # Shell scripts
│   ├── exp1/                    # Exp1: Efficiency & Cost
│   ├── exp2/                    # Exp2: Robustness & Recovery
│   └── exp3/                    # Exp3: System Optimization
│
├── scripts/                     # Utility scripts
│   ├── plotting/                # Visualization
│   │   ├── paper_figures/       # Paper figure generation
│   │   └── routing/             # Routing analysis plots
│   └── analysis/                # Analysis utilities
│
├── symphony-data-generator/     # Benchmark data generation
│   ├── config/data_config.yaml  # Benchmark configurations
│   ├── src/data_generator.py    # Core difficulty scoring module
│   └── src/quick_start.py       # Quick start script
│
├── docs/                        # Documentation
├── examples/                    # Example configurations
└── tests/                       # Test suite

Installation

System Requirements

Requirement	Minimum	Recommended
Python	3.9	3.10 or 3.11
RAM	8 GB	16 GB
GPU	Optional	CUDA-compatible (RTX 3060+)
OS	Linux, macOS, Windows	Linux (Ubuntu 20.04+)

Step-by-Step Setup

# 1. Clone the repository
git clone https://github.com/anonymous/symphony.git
cd symphony

# 2. Create and activate virtual environment
python -m venv venv
source venv/bin/activate  # On Windows: venv\Scripts\activate

# 3. Upgrade pip
pip install --upgrade pip

# 4. Install core dependencies
pip install -r requirements.txt

# 5. Install Symphony in development mode
pip install -e .

# 6. Verify installation
python -c "import symphony; print('Symphony installed successfully')"

Dependencies Overview

The requirements.txt includes:

Core Dependencies (required):

torch>=2.0.0 - Deep learning framework
transformers>=4.30.0 - Hugging Face model library
numpy>=1.24.0 - Numerical computing
pyyaml>=6.0 - Configuration file parsing
requests>=2.28.0 - HTTP client for API calls
pyzmq>=25.0.0 - Distributed messaging
aiohttp>=3.8.0 - Async HTTP client

Optional Dependencies (for GPU acceleration):

accelerate>=0.20.0 - Distributed training
bitsandbytes>=0.41.0 - 8-bit quantization
peft>=0.4.0 - Parameter-efficient fine-tuning

API Key Setup (Required for Real Experiments)

Symphony uses OpenRouter for LLM API access:

# Option 1: Export in terminal (temporary)
export OPENROUTER_API_KEY="sk-or-v1-your-key-here"

# Option 2: Add to shell profile (persistent)
echo 'export OPENROUTER_API_KEY="sk-or-v1-your-key-here"' >> ~/.bashrc
source ~/.bashrc

# Option 3: Create .env file (recommended for development)
echo 'OPENROUTER_API_KEY=sk-or-v1-your-key-here' > .env

Verify API key is set:

python -c "import os; print('API Key configured' if os.getenv('OPENROUTER_API_KEY') else 'API Key NOT set')"

See docs/OPENROUTER_CONFIG_GUIDE.md for detailed API setup instructions.
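Note that the verification one-liner above reads the process environment, so a key stored only in a .env file (Option 3) will not be visible unless something loads that file first. Whether Symphony loads .env on its own is not stated here. The sketch below shows one dependency-free way to load it; load_env_file is a hypothetical helper, not part of Symphony.

```python
import os


def load_env_file(path=".env"):
    """Load KEY=VALUE pairs from a .env-style file into os.environ.

    Returns the pairs found in the file. Variables that are already set
    in the environment are left unchanged.
    """
    loaded = {}
    if not os.path.exists(path):
        return loaded
    with open(path, encoding="utf-8") as handle:
        for raw in handle:
            line = raw.strip()
            if line.startswith("export "):
                line = line[len("export "):]
            # Skip blanks, comments, and lines without an assignment.
            if not line or line.startswith("#") or "=" not in line:
                continue
            key, value = line.split("=", 1)
            key = key.strip()
            value = value.strip().strip('"').strip("'")
            os.environ.setdefault(key, value)
            loaded[key] = value
    return loaded


load_env_file()  # no-op when there is no .env in the working directory
print("API Key configured" if os.getenv("OPENROUTER_API_KEY") else "API Key NOT set")
```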

Quick Start

Running a Simple Task

from symphony import SymphonyOrchestrator
from agents.agent import Agent

# Initialize orchestrator
orchestrator = SymphonyOrchestrator(
    agents=["agent1", "agent2", "agent3"],
    topL=3,
    cot_count=3
)

# Execute a task
result = orchestrator.run_task(
    task_description="Solve: What is 25 * 37?",
    requirements=["math"]
)

print(f"Result: {result['final_answer']}")

Using OpenRouter API

# Set API key
export OPENROUTER_API_KEY="sk-or-v1-..."

# Run with OpenRouter models
python experiments/pretrain.py \
  --task-pool path/to/tasks.jsonl \
  --agents "deepseek-v3" \
  --runtime-dir experiments/configs \
  --n 100

Running Experiments

All experiments are in the experiments/ directory. See experiments/README.md for detailed documentation.

Overview of Experiments

Experiment	Description	Type	Estimated Time
Exp1	Efficiency & Cost Analysis	Simulation + Real	30 min (sim) / 2-4 hrs (real)
Exp2	Robustness & Recovery	Simulation + Real	1-2 hrs
Exp3	System Optimization	Simulation	30 min
Pretrain	Main benchmark evaluation	Real	4-8 hrs per benchmark

Experiment 1: Efficiency & Cost Analysis

Goal: Compare agent selection strategies (Always-A, Static Rule, Random, LinUCB).

Simulation Mode (no API key needed):

cd experiments/exp1/sim
python sim_efficiency_cost.py --n 1000 --seed 42

# Output: Results saved to exp1_sim_results/

Real Mode (requires OpenRouter API key):

cd experiments/exp1/real
python exp1_real_openrouter.py --n 100

# Output: Results saved to exp1_real_results/

Expected Output Files:

accuracy_by_strategy.csv - Accuracy comparison
cost_by_strategy.csv - API cost comparison
selection_trace.json - Agent selection decisions

Experiment 2: Robustness & Recovery

Goal: Evaluate adaptation when agents become unavailable or degraded.

Run both simulation and real:

bash experiments/exp2/scripts/run_exp2_both.sh

# Or run separately:
python experiments/exp2/sim/exp2_sim.py --shock-type A_unavailable
python experiments/exp2/real/exp2_real.py --shock-type A_degraded

Shock Types:

A_unavailable: Agent suddenly becomes unavailable
A_degraded: Agent performance drops significantly

Expected Output Files:

recovery_curve.csv - Accuracy over time after shock
adaptation_metrics.json - Recovery time and final accuracy

Experiment 3: System Optimization

Goal: Evaluate routing optimization under latency and load variations.

bash experiments/exp3/run_exp3.sh

# Or run directly:
python experiments/exp3/sim_system_optimization.py --scenario latency_heterogeneous

Scenarios (defined in experiments/exp3/configs/scenarios.yaml):

latency_heterogeneous: Agents with different response latencies
load_burst: Dynamic load spikes
combined: Both latency and load variations

Expected Output Files:

latency_comparison.csv - Response time metrics
load_balance_metrics.csv - Task distribution across agents

Main Pretrain Experiments (Benchmark Evaluation)

Goal: Evaluate Symphony on standard benchmarks (GSM8K, BBH, Medical QA).

Run individual benchmarks:

# GSM8K (math reasoning)
bash experiments/scripts/run_gsm8k_pretrain.sh

# BBH (Big-Bench Hard)
bash experiments/scripts/run_bbh_pretrain.sh

# Balanced sampling across all tasks
bash experiments/scripts/run_balanced_pretrain.sh

# All datasets sequentially
bash experiments/scripts/run_all_datasets.sh

Run with custom parameters:

python experiments/pretrain.py \
  --task-pool data/gsm8k_full.jsonl \
  --benchmark gsm8k \
  --n 600 \
  --cold-n 200 \
  --pretrain-n 300 \
  --test-n 100 \
  --topL 3 \
  --plan-k 3 \
  --cot-count 3 \
  --agents "deepseek-v3,openai-gpt-5-nano,openai-gpt-4-1-nano" \
  --runtime-dir experiments/configs

Expected Output (saved to pretrain_results/<timestamp>/):

accuracy_summary.csv - Per-phase accuracy
ucb_trace.md - LinUCB arm selection trace
progress_state.json - Checkpoint for resumption

Benchmark Data Generation

Symphony includes a unified data generator for creating experiment-ready task pools with difficulty scoring across 5 benchmarks.

Quick Start

cd symphony-data-generator
pip install -r requirements.txt
python src/quick_start.py

Supported Benchmarks

Benchmark	Source	Tasks	Type
HumanEval	`openai_humaneval`	164	Code Generation
GSM8K	`gsm8k`	1,319	Mathematical Reasoning
BBH	`lukaemon/bbh`	2,437	Multi-hop Reasoning
AMC	`AI-MO/aimo-validation-amc`	83	Competition Math
MedicalQA	`GBaker/MedQA-USMLE-4-options`	1,273	Domain-Specific QA

Difficulty Scoring Formulas

Each benchmark uses a domain-specific difficulty scoring function:

HumanEval (Code Generation): $$d_{\text{code}} = 0.6 \cdot \frac{n_{\text{asserts}}}{\hat{a}} + 0.4 \cdot \frac{|\text{prompt}|}{\hat{p}}$$

GSM8K (Mathematical Reasoning): $$d_{\text{math}} = \frac{\text{reasoning\_steps}}{\hat{s}}$$

BBH (Multi-hop Reasoning): $$d_{\text{BBH}} = c_{\text{task}} + 0.3 \cdot \frac{|\text{input}|}{\hat{i}}$$

AMC (Competition Mathematics): $$d_{\text{AMC}} = 0.7 \cdot \frac{|\text{problem}|}{\hat{p}} + 0.3 + 0.12 \cdot \mathbb{1}[\text{math\_notation}]$$

Medical QA (Domain-Specific): $$d_{\text{med}} = 0.4 \cdot \bar{q} + 0.3 \cdot \bar{k} + 0.2 \cdot \bar{o} + 0.2 \cdot \mathbb{1}[\text{clinical}]$$

Here a hat ($\hat{\cdot}$) denotes the 95th-percentile normalizer of the corresponding quantity, computed over the full dataset.
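As a worked example of the HumanEval formula, the sketch below computes the two 95th-percentile normalizers from a toy dataset and scores one task. It is not the code in src/data_generator.py: the nearest-rank percentile definition, measuring |prompt| in characters, and the names percentile and humaneval_difficulty are assumptions.

```python
def percentile(values, q):
    """Nearest-rank percentile of a non-empty list, for an integer q in 1..100."""
    ordered = sorted(values)
    # Integer ceiling of q * n / 100, which avoids floating-point rounding.
    rank = max(1, (q * len(ordered) + 99) // 100)
    return ordered[rank - 1]


def humaneval_difficulty(n_asserts, prompt_len, a_hat, p_hat):
    """d_code = 0.6 * n_asserts / a_hat + 0.4 * |prompt| / p_hat."""
    return 0.6 * n_asserts / a_hat + 0.4 * prompt_len / p_hat


# Toy dataset statistics (made up for the example).
assert_counts = [2, 4, 6, 8, 10]
prompt_lengths = [100, 200, 300, 400, 500]

a_hat = percentile(assert_counts, 95)
p_hat = percentile(prompt_lengths, 95)
score = humaneval_difficulty(n_asserts=5, prompt_len=250, a_hat=a_hat, p_hat=p_hat)
print(a_hat, p_hat, round(score, 3))
```

Because the normalizers are percentiles rather than maxima, a task above the 95th percentile on either term contributes more than its nominal weight, so scores are not strictly bounded by 1.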

Difficulty Binning

Tasks are categorized using percentile-based thresholds (P20/P80):

Easy: score ≤ P20
Medium: P20 < score < P80
Hard: score ≥ P80
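The thresholds above translate directly into a small binning function. This is a sketch under assumptions: the nearest-rank percentile definition and the names percentile and bin_difficulty are illustrative, not taken from the generator's source.

```python
from collections import Counter


def percentile(values, q):
    """Nearest-rank percentile of a non-empty list, for an integer q in 1..100."""
    ordered = sorted(values)
    rank = max(1, (q * len(ordered) + 99) // 100)
    return ordered[rank - 1]


def bin_difficulty(score, p20, p80):
    """Map a difficulty score to easy / medium / hard using P20 and P80."""
    if score <= p20:
        return "easy"
    if score >= p80:
        return "hard"
    return "medium"


scores = [0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9, 1.0]
p20 = percentile(scores, 20)
p80 = percentile(scores, 80)
labels = [bin_difficulty(s, p20, p80) for s in scores]
print(Counter(labels))
```

If P20 and P80 coincide (for example, when many tasks share one score), this sketch assigns the tied tasks to "easy" because that check runs first.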

Generating Task Pools

from src.data_generator import DatasetBuilder

builder = DatasetBuilder('config/data_config.yaml')

# Preprocess all benchmarks (one-time)
builder.preprocess_all_benchmarks(output_dir='data/benchmarks/full')

# Generate experiment stream
tasks = builder.build_task_stream(
    benchmarks_to_include=['humaneval', 'gsm8k'],
    difficulty_split='80:20',  # 80% easy, 20% hard
    n_total_tasks=1000,
    random_seed=2025,
)

builder.save_task_pool(tasks, 'data/exp1/task_pool.jsonl')
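A saved task pool is a JSONL file (one JSON object per line), so it can be inspected with the standard library alone. The sketch below writes a tiny pool to a temporary directory and reads it back; load_task_pool and the record fields task_id, benchmark, and difficulty are hypothetical, since the real schema is defined by the data generator.

```python
import json
import tempfile
from collections import Counter
from pathlib import Path


def load_task_pool(path):
    """Read a JSONL file into a list of dicts, skipping blank lines."""
    tasks = []
    with open(path, encoding="utf-8") as handle:
        for line in handle:
            line = line.strip()
            if line:
                tasks.append(json.loads(line))
    return tasks


# Hypothetical records standing in for a generated pool.
records = [
    {"task_id": "gsm8k-0", "benchmark": "gsm8k", "difficulty": "easy"},
    {"task_id": "gsm8k-1", "benchmark": "gsm8k", "difficulty": "hard"},
    {"task_id": "humaneval-0", "benchmark": "humaneval", "difficulty": "easy"},
]

with tempfile.TemporaryDirectory() as tmp:
    pool_path = Path(tmp) / "task_pool.jsonl"
    pool_path.write_text(
        "\n".join(json.dumps(record) for record in records) + "\n",
        encoding="utf-8",
    )
    tasks = load_task_pool(pool_path)

difficulty_counts = Counter(task["difficulty"] for task in tasks)
print(len(tasks), dict(difficulty_counts))
```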

Reproducing Paper Results

This section provides step-by-step instructions to reproduce all results in the paper.

Step 1: Environment Setup

# Create fresh environment
python -m venv venv && source venv/bin/activate

# Install dependencies
pip install --upgrade pip
pip install -r requirements.txt
pip install -e .

# Set API key
export OPENROUTER_API_KEY="sk-or-v1-your-key"

# Verify setup
python -c "import symphony; import os; print('Ready!' if os.getenv('OPENROUTER_API_KEY') else 'Missing API key')"

Step 2: Run All Experiments

# Exp1: Efficiency & Cost Analysis (Table 2 in paper)
python experiments/exp1/real/exp1_real_openrouter.py --n 2000

# Exp2: Robustness & Recovery (Figure 4 in paper)
bash experiments/exp2/scripts/run_all_experiments.sh

# Exp3: System Optimization (Figure 5 in paper)
bash experiments/exp3/run_exp3.sh

# Main Benchmark Results (Table 1 in paper)
bash experiments/scripts/run_all_datasets.sh

Step 3: Generate Paper Figures

# Figure 3: Robustness bar charts
python scripts/plotting/paper_figures/plot_robustness_bars.py

# Figure 4: 3D robustness surface
python scripts/plotting/paper_figures/plot_robustness_3d_surface.py

# Figure 5: Gap analysis
python scripts/plotting/paper_figures/plot_gap_analysis.py

# Figure 6: Parallel coordinates
python scripts/plotting/paper_figures/plot_parallel_coordinates.py

# Routing analysis visualizations
python scripts/plotting/routing/plot_from_json.py pretrain_results/<your-result-dir>
python scripts/plotting/routing/plot_agent_donut.py pretrain_results/<your-result-dir>

Expected Results Summary

Experiment	Key Metric	Expected Range
Exp1 (Efficiency)	LinUCB vs Always-A cost reduction	15-25%
Exp2 (Robustness)	Recovery time after shock	< 50 tasks
Exp3 (Latency)	Load-balanced vs naive improvement	10-20%
GSM8K	Test accuracy (LinUCB)	75-85%
BBH	Macro-average accuracy	60-70%

Troubleshooting

Common Issues

1. ModuleNotFoundError: No module named 'symphony'

# Ensure you're in the project root and installed in dev mode
pip install -e .

2. OPENROUTER_API_KEY not set

# Check if key is exported
echo $OPENROUTER_API_KEY

# If empty, set it
export OPENROUTER_API_KEY="sk-or-v1-your-key"

3. CUDA out of memory

# Use CPU-only mode or reduce batch size
export CUDA_VISIBLE_DEVICES=""  # Force CPU

4. Connection timeout or Rate limit exceeded

# Reduce concurrent requests in config
# Edit experiments/configs/openrouter/<model>/config_*.yaml
# Add: rate_limit_delay: 1.0

5. FileNotFoundError: task-pool not found

# Ensure task data files exist
# Download from paper supplementary materials or generate:
python scripts/analysis/balanced_task_pool.py --output data/tasks.jsonl

Getting Help

Check experiments/README.md for experiment-specific issues
Check docs/OPENROUTER_CONFIG_GUIDE.md for API setup
Verify Python version: python --version (requires 3.9+)

Configuration Guide

Agent Configuration

Configs in experiments/configs/openrouter/<model>/:

debug: false
role: "agent"
node_id: "agent-openrouter-016"
base_model: "openrouter:deepseek/deepseek-chat"
capabilities: [math, reasoning, code]
max_tokens: 512
temperature: 0.2

Key Experiment Parameters

Parameter	Description	Default
`--task-pool`	Task JSONL file	Required
`--n`	Total tasks	100
`--topL`	Top-L candidates	3
`--plan-k`	Plans to generate	3
`--cot-count`	CoT paths	3
`--agents`	Agent IDs	Required
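The parameters in this table map naturally onto an argparse definition. The sketch below mirrors the documented flags and defaults only; it is not the parser in experiments/pretrain.py, and build_parser and parse_agent_list are hypothetical names.

```python
import argparse


def parse_agent_list(value):
    """Split a comma-separated agent string into a list of agent IDs."""
    return [agent.strip() for agent in value.split(",") if agent.strip()]


def build_parser():
    parser = argparse.ArgumentParser(description="Sketch of the documented experiment flags")
    parser.add_argument("--task-pool", required=True, help="Task JSONL file")
    parser.add_argument("--n", type=int, default=100, help="Total tasks")
    parser.add_argument("--topL", type=int, default=3, help="Top-L candidates")
    parser.add_argument("--plan-k", type=int, default=3, help="Plans to generate")
    parser.add_argument("--cot-count", type=int, default=3, help="CoT paths")
    parser.add_argument("--agents", required=True, type=parse_agent_list, help="Agent IDs")
    return parser


# Parse an explicit argument list instead of sys.argv so the example is reproducible.
args = build_parser().parse_args(
    ["--task-pool", "data/tasks.jsonl", "--agents", "deepseek-v3,openai-gpt-4-1-nano"]
)
print(args.task_pool, args.n, args.topL, args.plan_k, args.cot_count, args.agents)
```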

See docs/OPENROUTER_CONFIG_GUIDE.md for detailed setup.

Citation

If you use Symphony in your research, please cite:

@article{symphony2025,
  title={Symphony: A Decentralized Multi-Agent Framework for Edge Devices with Beacon-Guided Task Routing and CoT Voting},
  author={Anonymous},
  journal={arXiv preprint},
  year={2025}
}
