Configuration Overview
PrismBench is configured through a set of YAML files, plus an `apis.key` file for credentials. Together they control agents, environments, MCTS phases, the search space, and experiments. This document gives an overview of each configuration file and its purpose.
```mermaid
graph TB
    subgraph "Configuration Files"
        Agent[configs/agents/<br/>Agent Definitions]
        Env[configs/environment_config.yaml<br/>Environment Setup]
        Phase[configs/phase_configs.yaml<br/>MCTS Parameters]
        Tree[configs/tree_configs.yaml<br/>Search Space]
        Exp[configs/experiment_configs.yaml<br/>Experiment Setup]
        ApiKeys[apis.key<br/>API Credentials]
    end

    subgraph "Services"
        LLM[LLM Interface Service]
        Environment[Environment Service]
        Search[Search Service]
    end

    Agent --> LLM
    Env --> Environment
    Phase --> Search
    Tree --> Search
    Exp --> Search
    ApiKeys --> LLM
```
| File | Purpose | Service | Description |
|---|---|---|---|
| `configs/agents/*.yml` | Agent Definitions | LLM Interface | Individual agent role configurations |
| `configs/environment_config.yaml` | Environment Setup | Environment | Available environments and their agents |
| `configs/phase_configs.yaml` | MCTS Parameters | Search | Phase-specific algorithm parameters |
| `configs/tree_configs.yaml` | Search Space | Search | Concepts and difficulties to explore |
| `configs/experiment_configs.yaml` | Experiment Setup | Search | Experiment configuration and phase sequences |
| `apis.key` | API Credentials | LLM Interface | API keys for LLM providers |
Agent configurations (the files under `configs/agents/`) define the behavior, prompts, and model settings for each specialized agent.
- Define agent roles and responsibilities
- Configure model parameters (temperature, max_tokens, etc.)
- Set up prompt templates and interaction patterns
- Specify output formatting requirements
```yaml
agent_name: challenge_designer

configs:
  model_name: gpt-4o-mini
  provider: openai
  params:
    temperature: 0.8
    max_tokens: 5120
  local: false

system_prompt: >
  You are an expert computer science educator...

interaction_templates:
  - name: basic
    required_keys: [concepts, difficulty_level]
    template: >
      Generate a coding problem for: {concepts}...
    output_format:
      response_begin: <problem_description>
      response_end: </problem_description>
```
- `agent_name`: Unique identifier for the agent
  - Used throughout the framework to reference this agent
- `configs`: Model settings
  - `model_name`: Specific model to use (e.g., "gpt-4o-mini")
  - `provider`: LLM provider ("openai", "anthropic", "deepseek", etc.)
  - `params`: Model-specific parameters
    - `temperature`: Creativity level (typically 0.0-1.0; the accepted range depends on the provider)
    - `max_tokens`: Maximum response length
  - `local`: Whether this is a local model
- `system_prompt`: Core instructions defining the agent's role
  - Defines expertise, behavior patterns, and output requirements
  - Critical for consistent agent performance
- `interaction_templates`: Prompt templates, each with:
  - `name`: Template identifier
  - `required_keys`: Input parameters needed
  - `template`: Formatted prompt template with placeholders
  - `output_format`: Expected response delimiters
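The template fields above suggest how a prompt might be assembled and how a reply might be parsed. The following is a minimal Python sketch under stated assumptions, not PrismBench's actual implementation: `render_template` and `extract_response` are hypothetical helpers, and the template text is an illustrative stand-in for the elided prompt in the example.

```python
# Hypothetical helpers for illustration; PrismBench's own loader may differ.

# Mirrors one entry of `interaction_templates`; the template text is made up.
TEMPLATE = {
    "name": "basic",
    "required_keys": ["concepts", "difficulty_level"],
    "template": "Generate a coding problem for: {concepts} (difficulty: {difficulty_level})",
    "output_format": {
        "response_begin": "<problem_description>",
        "response_end": "</problem_description>",
    },
}


def render_template(template: dict, **values: str) -> str:
    """Fill the placeholders after checking that every required key is present."""
    missing = [key for key in template["required_keys"] if key not in values]
    if missing:
        raise KeyError(f"missing required keys: {missing}")
    return template["template"].format(**values)


def extract_response(template: dict, reply: str) -> str:
    """Return the text between the configured begin and end delimiters."""
    begin = template["output_format"]["response_begin"]
    end = template["output_format"]["response_end"]
    start = reply.find(begin)
    if start == -1:
        raise ValueError("reply lacks the opening delimiter")
    start += len(begin)
    stop = reply.find(end, start)
    if stop == -1:
        raise ValueError("reply lacks the closing delimiter")
    return reply[start:stop].strip()


prompt = render_template(TEMPLATE, concepts="recursion", difficulty_level="easy")
reply = "Sure. <problem_description>Sum a nested list.</problem_description>"
problem = extract_response(TEMPLATE, reply)
```

Checking `required_keys` before formatting turns a missing input into an explicit error instead of a malformed prompt, and the delimiters let the caller ignore any text the model adds around the payload.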
| Agent | Purpose | Key Features |
|---|---|---|
| `challenge_designer` | Creates coding problems | Problem generation, difficulty scaling |
| `challenge_designer_advanced` | Creates diverse problems | Duplicate avoidance, variation generation |
| `test_generator` | Develops test cases | Comprehensive testing, edge cases |
| `test_validator` | Validates test quality | Test coverage analysis, quality assurance |
| `problem_solver` | Implements solutions | Algorithm implementation, code generation |
| `problem_fixer` | Debugs failing code | Error analysis, code repair |
| `test_error_analyzer` | Analyzes test failures | Failure categorization, root cause analysis |
| `solution_pattern_analyzer` | Analyzes code patterns | Pattern recognition, complexity analysis |
`configs/environment_config.yaml` defines the available execution environments and the agents that make up each one.
- Configure available evaluation environments
- Define agent teams for different environment types
- Set environment-specific parameters
- Control resource allocation and timeouts
```yaml
environment_coding_challenge:
  agents:
    - "challenge_designer"
    - "test_generator"
    - "problem_solver"
    - "problem_fixer"
  max_attempts: 3
  timeout: 300
  num_problems: 1

environment_enhanced_coding_challenge:
  agents:
    - "challenge_designer_advanced"
    - "test_generator"
    - "problem_solver"
    - "problem_fixer"
    - "test_validator"
    - "test_error_analyzer"
  max_attempts: 3
  timeout: 600
  num_problems: 5
```

| Parameter | Type | Purpose | Default |
|---|---|---|---|
| `agents` | List[str] | Agent names for this environment | Required |
| `max_attempts` | int | Maximum solution attempts per problem | 3 |
| `timeout` | int | Request timeout in seconds | 300 |
| `num_problems` | int | Number of problems to generate | 1 |
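The defaults in the table can be applied with a small merge step. This is a sketch only: `resolve_environment` is a hypothetical helper, and the dictionaries stand in for parsed YAML.

```python
# Defaults copied from the parameter table; the helper itself is hypothetical.
ENV_DEFAULTS = {"max_attempts": 3, "timeout": 300, "num_problems": 1}


def resolve_environment(raw: dict) -> dict:
    """Return a complete environment config, filling optional fields with defaults."""
    agents = raw.get("agents")
    if not agents:
        raise ValueError("'agents' is required and must not be empty")
    # Values supplied in `raw` override the defaults.
    resolved = {**ENV_DEFAULTS, **raw}
    resolved["agents"] = list(agents)
    for key in ENV_DEFAULTS:
        value = resolved[key]
        if not isinstance(value, int) or value < 1:
            raise ValueError(f"{key} must be a positive integer, got {value!r}")
    return resolved


basic = resolve_environment({"agents": ["challenge_designer", "problem_solver"]})
enhanced = resolve_environment(
    {"agents": ["challenge_designer_advanced"], "timeout": 600, "num_problems": 5}
)
```

Here `basic` picks up all three defaults, while `enhanced` keeps its own `timeout` and `num_problems` and inherits only `max_attempts`.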
`environment_coding_challenge`:
- Basic four-agent workflow
- Single problem per evaluation
- Standard timeout settings
- Good for baseline evaluations

`environment_enhanced_coding_challenge`:
- Extended six-agent workflow
- Multiple problems per evaluation
- Additional validation and analysis
- Comprehensive evaluation mode
`configs/phase_configs.yaml` contains the parameters for each MCTS phase, controlling algorithm behavior and convergence criteria.
- Configure MCTS algorithm parameters
- Define phase-specific objectives and strategies
- Set convergence criteria and exploration policies
- Control scoring and evaluation methods
```yaml
phase_1:
  phase_params:
    max_depth: 5
    max_iterations: 100
    performance_threshold: 0.4
    value_delta_threshold: 0.3
    convergence_checks: 10
    exploration_probability: 0.25
    num_nodes_per_iteration: 5
  search_params:
    max_attempts: 3
    discount_factor: 0.9
    learning_rate: 0.9
  scoring_params:
    penalty_per_failure: 2
    penalty_per_error: 3
    penalty_per_attempt: 1
    fixed_by_problem_fixer_penalty: 5
    max_num_passed: 10
  environment:
    name: 'environment_coding_challenge'
```

`phase_params` control the core MCTS algorithm behavior:
| Parameter | Purpose | Impact |
|---|---|---|
| `max_depth` | Maximum tree depth | Prevents infinite expansion |
| `max_iterations` | Maximum algorithm iterations | Controls runtime |
| `performance_threshold` | Node expansion threshold | Higher = more conservative |
| `value_delta_threshold` | Convergence detection | Higher = easier convergence |
| `convergence_checks` | Stable iterations needed | Higher = more confidence |
| `exploration_probability` | Random exploration rate | Higher = more exploration |
| `num_nodes_per_iteration` | Concurrent evaluations | Higher = faster but more resources |
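One way to read `value_delta_threshold` and `convergence_checks` together is that the search stops once the tracked value has changed by less than the threshold for the required number of consecutive iterations. The sketch below encodes that reading; it is an assumption about the semantics, and `has_converged` is a hypothetical function rather than part of PrismBench.

```python
def has_converged(values: list[float], delta_threshold: float, checks: int) -> bool:
    """True when the last `checks` consecutive changes are all below the threshold.

    `values` is the per-iteration history of the tracked value (oldest first).
    """
    if len(values) < checks + 1:
        return False  # not enough history to observe `checks` changes
    recent = values[-(checks + 1):]
    deltas = [abs(later - earlier) for earlier, later in zip(recent, recent[1:])]
    return all(delta < delta_threshold for delta in deltas)


history = [0.10, 0.60, 0.65, 0.70, 0.72]
stable_for_three = has_converged(history, delta_threshold=0.3, checks=3)
stable_for_four = has_converged(history, delta_threshold=0.3, checks=4)
```

Under this reading, raising `value_delta_threshold` makes convergence easier to reach, and raising `convergence_checks` demands a longer stable run, which matches the Impact column above.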
`search_params` control learning and value updates:

| Parameter | Purpose | Range |
|---|---|---|
| `max_attempts` | Solution attempts per problem | 1-10 |
| `discount_factor` | Future reward discounting | 0.0-1.0 |
| `learning_rate` | New vs old value weighting | 0.0-1.0 |
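To make "new vs old value weighting" and "future reward discounting" concrete, here is a sketch of a common update scheme: an exponential moving average for the node value, with the reward discounted once per level as it travels toward the root. This is an assumed formulation for illustration; the function names are hypothetical and PrismBench's actual update rule may differ.

```python
def update_value(old_value: float, reward: float, learning_rate: float) -> float:
    """Blend the old estimate with the new reward; learning_rate weights the new one."""
    return (1.0 - learning_rate) * old_value + learning_rate * reward


def backpropagate(path_values: list[float], reward: float,
                  learning_rate: float, discount_factor: float) -> list[float]:
    """Update values from the evaluated node (last item) up to the root (first item).

    The reward is multiplied by discount_factor once per level climbed.
    """
    updated = list(path_values)
    for steps_up, index in enumerate(range(len(updated) - 1, -1, -1)):
        discounted = reward * (discount_factor ** steps_up)
        updated[index] = update_value(updated[index], discounted, learning_rate)
    return updated


# Root, child, evaluated leaf: all start at 0.0 and the leaf earns a reward of 1.0.
new_values = backpropagate([0.0, 0.0, 0.0], reward=1.0,
                           learning_rate=0.9, discount_factor=0.9)
```

With `learning_rate: 0.9` the new reward dominates the old estimate, and with `discount_factor: 0.9` each ancestor sees 90% of the reward its child saw.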
`scoring_params` define how performance is calculated:

| Parameter | Purpose | Impact |
|---|---|---|
| `penalty_per_failure` | Test failure penalty | Higher = harsher scoring |
| `penalty_per_error` | Runtime error penalty | Higher = error intolerance |
| `penalty_per_attempt` | Multiple attempt penalty | Higher = efficiency emphasis |
| `fixed_by_problem_fixer_penalty` | Fixing needed penalty | Higher = correctness emphasis |
| `max_num_passed` | Maximum tests contributing to score | Caps score inflation |
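This page does not give the scoring formula, so the following sketch shows only one plausible way the penalties could combine: passed tests (capped at `max_num_passed`) earn credit, penalties are subtracted, and the result is normalized to the range 0 to 1. The formula, `ScoringParams`, and `score_run` are all assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass
class ScoringParams:
    # Defaults copied from the phase_1 example.
    penalty_per_failure: float = 2
    penalty_per_error: float = 3
    penalty_per_attempt: float = 1
    fixed_by_problem_fixer_penalty: float = 5
    max_num_passed: int = 10


def score_run(params: ScoringParams, tests_passed: int, tests_failed: int,
              tests_errored: int, attempts: int, fixed_by_fixer: bool) -> float:
    """Assumed formula: capped credit minus penalties, normalized and clamped to [0, 1]."""
    credit = min(tests_passed, params.max_num_passed)
    penalty = (
        params.penalty_per_failure * tests_failed
        + params.penalty_per_error * tests_errored
        # Only attempts beyond the first are penalized in this sketch.
        + params.penalty_per_attempt * max(attempts - 1, 0)
        + (params.fixed_by_problem_fixer_penalty if fixed_by_fixer else 0)
    )
    raw = (credit - penalty) / params.max_num_passed
    return max(0.0, min(1.0, raw))


params = ScoringParams()
clean_run = score_run(params, tests_passed=10, tests_failed=0,
                      tests_errored=0, attempts=1, fixed_by_fixer=False)
second_try = score_run(params, tests_passed=8, tests_failed=1,
                       tests_errored=0, attempts=2, fixed_by_fixer=False)
```

Whatever the real formula is, the table implies the same qualitative behavior: failures, errors, retries, and fixer intervention each lower the score, and passing more than `max_num_passed` tests earns no extra credit.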
`phase_1`:
- Lower performance threshold for broader exploration
- Higher convergence checks for thorough evaluation
- Standard environment for baseline assessment

`phase_2`:
- Higher challenge threshold for difficulty identification
- UCB exploration weight for balanced search
- Focus on identifying problematic areas

`phase_3`:
- Enhanced environment for detailed analysis
- Variation generation for thorough testing
- Node selection threshold for quality focus
`configs/tree_configs.yaml` defines the search space: the concepts and difficulty levels available for exploration.
- Define available CS concepts for testing
- Set difficulty levels and progression
- Control search space size and scope
- Enable domain-specific customization
```yaml
tree_configs:
  concepts:
    - loops
    - conditionals
    - functions
    - data_structures
    - algorithms
    - error_handling
    - recursion
    - sorting
    - searching
    - dynamic_programming
  difficulties:
    - very easy
    - easy
    - medium
    - hard
    - very hard
```

`concepts` defines the CS topics to evaluate:
- Basic concepts: loops, conditionals, functions
- Data structures: arrays, lists, trees, graphs
- Algorithms: sorting, searching, graph algorithms
- Advanced topics: dynamic programming, recursion
Progressive difficulty levels:
- Very Easy: Basic syntax and simple operations
- Easy: Common patterns and straightforward logic
- Medium: Multiple concepts and edge cases
- Hard: Complex algorithms and optimization
- Very Hard: Advanced techniques and corner cases
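The size of the search space follows directly from the two lists. As a rough sketch, the snippet below enumerates every single-concept and difficulty pair from the default configuration; `enumerate_pairs` is a hypothetical helper, and it is an assumption that a node covers one concept at a time (the framework may also combine concepts, which would enlarge the space).

```python
from itertools import product

# Mirrors the default tree configuration shown above.
TREE_CONFIG = {
    "concepts": [
        "loops", "conditionals", "functions", "data_structures", "algorithms",
        "error_handling", "recursion", "sorting", "searching", "dynamic_programming",
    ],
    "difficulties": ["very easy", "easy", "medium", "hard", "very hard"],
}


def enumerate_pairs(tree_config: dict) -> list[tuple[str, str]]:
    """Return every (concept, difficulty) combination, concepts varying slowest."""
    return list(product(tree_config["concepts"], tree_config["difficulties"]))


pairs = enumerate_pairs(TREE_CONFIG)
pair_count = len(pairs)  # 10 concepts x 5 difficulties
```

Each added concept grows this baseline by one full row of difficulties, which is worth keeping in mind when tuning `max_iterations`.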
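Domain-specific concepts and difficulties can replace the defaults:

```yaml
concepts:
  - machine_learning
  - neural_networks
  - computer_vision
  - natural_language_processing
```

```yaml
difficulties:
  - beginner
  - intermediate
  - advanced
  - expert
  - research_level
```

`configs/experiment_configs.yaml` defines experiment parameters and phase execution sequences.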
- Configure experiment metadata
- Define phase execution order
- Set up experiment variations
- Control experimental workflow
```yaml
name: "default"
description: "Default experiment configuration"
phase_sequences:
  - phase_1
  - phase_2
  - phase_3
```

- `name`: Experiment identifier
- `description`: Human-readable description
- `version`: Experiment version for tracking
- Order of phase execution
- Flexible sequencing for different objectives
- Support for custom phase combinations
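Because a phase sequence refers to phases defined in `configs/phase_configs.yaml`, it is natural to check the names before running. The function below is a hypothetical sketch of such a check, with plain dictionaries and sets standing in for the parsed files.

```python
def validate_experiment(experiment: dict, defined_phases: set[str]) -> list[str]:
    """Return the phase order, or raise if it is empty or names an unknown phase."""
    sequence = experiment.get("phase_sequences") or []
    if not sequence:
        raise ValueError("phase_sequences must list at least one phase")
    unknown = [phase for phase in sequence if phase not in defined_phases]
    if unknown:
        raise ValueError(f"unknown phases in {experiment.get('name')!r}: {unknown}")
    return list(sequence)


# Phase names as they would appear as top-level keys in the phase configuration.
DEFINED_PHASES = {"phase_1", "phase_2", "phase_3"}

default_order = validate_experiment(
    {"name": "default", "phase_sequences": ["phase_1", "phase_2", "phase_3"]},
    DEFINED_PHASES,
)
```

The returned list preserves the configured order, since the order is what determines how the phases execute.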
```yaml
name: "capability_only"
phase_sequences:
  - phase_1
```

```yaml
name: "challenge_focus"
phase_sequences:
  - phase_1
  - phase_2
```

`apis.key` stores authentication credentials for LLM providers.
- Secure API key storage
- Multi-provider authentication
- Local model configuration
- Environment isolation
```
OPENAI_API_KEY = sk-your-openai-key-here
ANTHROPIC_API_KEY = your-anthropic-key-here
DEEPSEEK_API_KEY = your-deepseek-key-here
CHATLAMMA_API_KEY = your-chatlamma-key-here
LOCAL = your-local-api-key-or-token
```

| Provider | Key Format | Usage |
|---|---|---|
| OpenAI | `sk-...` | GPT models |
| Anthropic | `sk-ant-...` | Claude models |
| DeepSeek | Custom | DeepSeek models |
| Local | Custom | Ollama / LM Studio |
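The `NAME = value` layout shown above is simple enough to parse with the standard library. The parser below is a sketch, not the framework's loader: `parse_key_file` is a hypothetical name, and skipping blank lines and `#` comments is an assumption about the format.

```python
def parse_key_file(text: str) -> dict[str, str]:
    """Parse `NAME = value` lines into a dict, ignoring blank lines and # comments."""
    keys: dict[str, str] = {}
    for line_number, line in enumerate(text.splitlines(), start=1):
        stripped = line.strip()
        if not stripped or stripped.startswith("#"):
            continue
        # Split on the first "=" only, so values may themselves contain "=".
        name, separator, value = stripped.partition("=")
        if not separator or not name.strip():
            raise ValueError(f"line {line_number}: expected 'NAME = value'")
        keys[name.strip()] = value.strip()
    return keys


SAMPLE = """
OPENAI_API_KEY = sk-your-openai-key-here
ANTHROPIC_API_KEY = your-anthropic-key-here
LOCAL = your-local-api-key-or-token
"""

credentials = parse_key_file(SAMPLE)
```

Since the file holds secrets, it is usually best kept out of version control and never printed or logged after parsing.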
The framework includes built-in validation for:
- Required configuration fields
- Agent existence verification
- Model availability checking
Recommended checks:
- API key validity
- Model access permissions
- Resource allocation limits
- Network connectivity
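Checks such as agent existence amount to following references between files. The sketch below shows the idea for two kinds of reference: environments naming agents, and phases naming environments. `validate_references` is a hypothetical function and does not reproduce the framework's built-in validation.

```python
def validate_references(agents: set[str], environments: dict, phases: dict) -> list[str]:
    """Return a list of human-readable problems; an empty list means all names resolve."""
    problems: list[str] = []
    # Every agent listed by an environment must have an agent definition.
    for env_name, env in environments.items():
        for agent in env.get("agents", []):
            if agent not in agents:
                problems.append(f"{env_name}: unknown agent '{agent}'")
    # Every phase must point at a defined environment.
    for phase_name, phase in phases.items():
        env_name = phase.get("environment", {}).get("name")
        if env_name not in environments:
            problems.append(f"{phase_name}: unknown environment '{env_name}'")
    return problems


issues = validate_references(
    agents={"challenge_designer", "problem_solver"},
    environments={
        "environment_coding_challenge": {
            "agents": ["challenge_designer", "problem_fixer"],
        },
    },
    phases={
        "phase_1": {"environment": {"name": "environment_coding_challenge"}},
        "phase_3": {"environment": {"name": "environment_enhanced_coding_challenge"}},
    },
)
```

Collecting every problem before reporting, rather than stopping at the first, lets a user fix a misconfigured setup in one pass.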
Next Steps:
- 🤖 Agent Configurations - Detailed agent configuration guide
- 🌍 Environment Configurations - Environment setup details
- 🔍 Phase Configurations - MCTS parameter tuning
- 🌳 Tree Configurations - Search space configuration
- 🤖 Agent System - Understanding agent configuration
- 🌍 Environment System - Environment configuration details
- 🧩 Custom Agents - Creating custom agent configurations
- 🌐 Custom Environments - Building custom environment configs
- ⚡ Quick Start - Basic configuration setup
- 🏗️ Architecture Overview - System design and components
- 🆘 Troubleshooting - Configuration-related issues
- 🔍 Custom MCTS Phases - Phase configuration parameters
- 🔗 Extension Combinations - Configuring complex setups
- 🔧 Extending PrismBench - Framework extension configuration