A modular framework for orchestrating structured debates between multiple large language models (LLMs) with specialized judge evaluation. This project implements an adversarial training approach to enhance LLM argumentative reasoning.
- Multi-Agent Architecture: Orchestrates debates between opposing LLM agents
- Structured Debate Protocol: Implements formal opening, rebuttal, and closing rounds
- Adversarial Critique System: Agents analyze and critique opposing arguments
- Evidence Self-Check Mechanism: Ensures factual accuracy and reduces source fabrication
- Multi-Dimensional Judge Framework: Seven specialized judges evaluate different aspects of argument quality
- Local Execution: Compatible with locally hosted Ollama models, so debates run entirely on your own hardware
- Python 3.8+
- Ollama for local model hosting
- YAML for configuration files
- Required Python packages (see Environment Setup)
1. Clone this repository:

   ```shell
   git clone https://github.com/[username]/multi-agent-llm-debate.git
   cd multi-agent-llm-debate
   ```

2. Create and activate the conda environment:

   ```shell
   conda env create -f debate-env.yml
   conda activate debate-env
   ```

3. Install Ollama following the instructions at ollama.ai

4. Download the required models via Ollama. The download commands live in the first notebook cell, which can be edited to pull models selectively or all at once.
.
```
.
├── .ipynb_checkpoints/             # Jupyter notebook checkpoints
├── prompts/                        # YAML configuration files for debate prompts
│   ├── debate_prompts.yml          # Core debate prompts
│   └── judge_prompts.yml           # Judge evaluation prompts
├── results/                        # Debate outputs and judge evaluations
│   ├── agent_records/              # Saved debate transcripts
│   ├── judge_records/              # Evaluation results
│   └── perfect_debate_transcripts/ # Curated debate examples for the judgement pipeline
├── debate-env.yml                  # Conda environment configuration
├── MultiLLM Debate.ipynb           # Main notebook for running debates
└── OLLAMA EDA, Test Scripts.ipynb  # Ollama exploration and testing scripts
```
`PromptManager` class loads and formats debate prompts from YAML files
- Modular design allows testing different prompt strategies
- Phase-specific guidance for opening, rebuttal, and closing rounds
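A minimal sketch of how such a loader could work, assuming a simple one-template-per-phase YAML layout (the actual class internals and YAML keys in this repo may differ):

```python
# Illustrative PromptManager-style loader; YAML keys are assumptions.
import yaml  # PyYAML

class PromptManager:
    def __init__(self, path):
        # Load all prompt templates from a YAML file
        with open(path) as f:
            self.prompts = yaml.safe_load(f)

    def format_prompt(self, phase, **kwargs):
        # Look up the phase-specific template (e.g. "opening",
        # "rebuttal", "closing") and fill in debate details
        return self.prompts[phase].format(**kwargs)
```

Swapping in a different YAML file is all it takes to test an alternative prompt strategy.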
`MultiAgentDebate` class orchestrates structured interactions
- Implements preparation, critique, and rebuttal phases
- Manages context and maintains debate state
- Generates enhanced arguments based on adversarial feedback
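The orchestration loop can be sketched roughly as follows; the agent interface, method names, and round order here are illustrative assumptions, not the repo's exact API:

```python
# Illustrative debate orchestration sketch (not the repo's exact API).
class MultiAgentDebate:
    def __init__(self, agent_pro, agent_con):
        # Each agent is a callable: (phase, transcript) -> argument text
        self.agents = {"pro": agent_pro, "con": agent_con}
        self.transcript = []  # shared debate state

    def run_round(self, phase):
        # Each side sees the transcript so far, so rebuttals can
        # respond to the opponent's prior arguments and critiques
        for side, agent in self.agents.items():
            argument = agent(phase, list(self.transcript))
            self.transcript.append({"phase": phase, "side": side, "text": argument})

    def run(self):
        # Formal rounds: opening, rebuttal, closing
        for phase in ("opening", "rebuttal", "closing"):
            self.run_round(phase)
        return self.transcript
```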
`JudgeEvaluator` class assesses debate quality across multiple dimensions
- Specialized judges for logical, factual, rhetorical, and ethical aspects
- Meta-judge synthesizes evaluations into composite assessment
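The meta-judge synthesis step could look something like the weighted aggregation below; the dimension names beyond the four listed above, and the equal default weights, are assumptions for illustration:

```python
# Hypothetical meta-judge aggregation; dimension names and weights
# are illustrative, not the repo's actual judge configuration.
DIMENSIONS = ["logical", "factual", "rhetorical", "ethical",
              "evidence", "clarity", "responsiveness"]

def meta_judge(scores, weights=None):
    """Combine per-dimension judge scores into one composite score."""
    weights = weights or {d: 1.0 for d in DIMENSIONS}
    total_weight = sum(weights[d] for d in scores)
    return sum(scores[d] * weights[d] for d in scores) / total_weight
```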
Edit the YAML files in the prompts/ directory to customize:
- Debate instructions and structure
- Critique guidelines
- Evidence check parameters
- Judge evaluation criteria
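As a hypothetical example of what such a file might contain (the keys below are illustrative; check `prompts/debate_prompts.yml` for the actual schema):

```yaml
# Illustrative structure only; the repo's real keys may differ.
opening:
  instruction: "Present your strongest case for the {stance} position on {topic}."
rebuttal:
  instruction: "Address the critique points below and strengthen your argument."
critique:
  guidelines: "Identify logical gaps, unsupported claims, and weak evidence."
evidence_check:
  flag_unverified_sources: true
```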
Update the `OllamaDebateManager.models` dictionary to include new models:

```python
self.models = {
    "custom_model": "model_name:tag",
    # Add more models here
}
```

Debate results and judge evaluations are saved to:
- `results/agent_records/` - full debate transcripts
- `results/judge_records/` - judge evaluations and scores
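A sketch of how a transcript might be persisted to `results/agent_records/`; the file-naming scheme and record schema here are assumptions, not necessarily what the notebook produces:

```python
# Illustrative transcript persistence; naming and schema are assumed.
import json
import os
import time

def save_transcript(transcript, out_dir="results/agent_records"):
    """Write a debate transcript as a timestamped JSON file."""
    os.makedirs(out_dir, exist_ok=True)
    path = os.path.join(out_dir, f"debate_{int(time.time())}.json")
    with open(path, "w") as f:
        json.dump(transcript, f, indent=2)
    return path
```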
If you use this framework in your research, please cite:
```bibtex
@misc{markapudi2025socraiticcircle,
  title={SocrAItic Circle: Enhancing LLM Reasoning Through Multi-Agent Debate Frameworks},
  author={Markapudi, Joel},
  year={2025},
  institution={Northeastern University}
}
```
TBD
TBD