-
Notifications
You must be signed in to change notification settings - Fork 0
Architecture Overview
This document provides a comprehensive overview of PrismBench's architecture, design principles, and component interactions.
PrismBench follows a microservices architecture pattern designed for scalability, modularity, and extensibility. The system has four services that communicate via REST APIs.
graph TB
subgraph "PrismBench Framework"
subgraph "Core Services"
GUI[GUI Service<br/>Web UI<br/>Port 3000]
Search[Search Service<br/>MCTS Engine<br/>Port 8002]
Environment[Environment Service<br/>Challenge Execution<br/>Port 8001]
LLM[LLM Interface Service<br/>Model Communication<br/>Port 8000]
end
subgraph "Data Layer"
Redis[Redis<br/> Session Storage]
FileSystem[File System<br/> Results & Logs]
end
GUI <--> Search
Search <--> Environment
Environment <--> LLM
LLM <--> Redis
Search --> FileSystem
end
subgraph "External Services"
OpenAI[OpenAI API]
Anthropic[Anthropic API]
DeepSeek[DeepSeek API]
Local[Local Models<br/>ollama/LMstudio]
end
LLM <--> OpenAI
LLM <--> Anthropic
LLM <--> DeepSeek
LLM <--> Local
Purpose: Provides a web interface for starting and monitoring Search runs.
Key Responsibilities:
- Submit sessions and runs to Search
- Poll task status and show phase progress
- Display completed run summaries
Key Components:
- Next.js Frontend: App router + client UI
- API Client: Axios calls to Search API
- Job State Hooks: Polling and job lifecycle management
Purpose: Orchestrates the Monte Carlo Tree Search algorithm and manages evaluation phases.
Key Responsibilities:
- MCTS algorithm execution across multiple phases
- Tree structure management and traversal
- Node selection, expansion, and evaluation orchestration
- Session management for search experiments
- Task coordination and progress tracking
Key Components:
- Phase Registry: Pluggable phase strategy system
- MCTS Service: Core algorithm implementation
- Tree Framework: Search tree data structures
- Session Management: Experiment lifecycle management
Purpose: Executes coding challenges and manages the agent-based evaluation workflow.
Key Responsibilities:
- Challenge generation through specialized agents
- Test case creation and validation
- Solution generation and debugging
- Code execution in isolated environments
- Multi-agent workflow orchestration
Key Components:
- Environment Registry: Pluggable environment implementations
- Agent Orchestration: Multi-agent workflow management
- Code Execution: Isolated Python execution environment
- Challenge Management: Problem generation and evaluation
Purpose: Provides unified access to multiple LLM providers and manages agent interactions.
Key Responsibilities:
- Multi-provider LLM abstraction
- Session-based conversation management
- Role-based template routing
- Agent role and prompt management
- Response parsing and formatting
Key Components:
- Provider Abstraction: Unified interface for multiple LLM APIs
- Session Management: Multi-turn conversation handling
- Agent Framework: Role-based prompt management
- History Store: Redis-backed role/session histories
Each service is independently deployable and scalable. Services communicate only through well-defined REST APIs, allowing for:
- Independent development and testing
- Technology stack flexibility
- Horizontal scaling of individual components
- Easy replacement or enhancement of services
The framework supports extension at multiple levels:
- Pluggable Agents: Add new agent types without code changes
- Custom Environments: Implement domain-specific evaluation environments
- Phase Strategies: Create new MCTS phases with different objectives
- Model Providers: Integrate new LLM providers seamlessly
All services are built with async-first design:
- Non-blocking operations for better resource utilization
- Concurrent processing of multiple evaluations
- Task-based processing with status tracking
- Event-driven communication patterns
Behavior is controlled through external configuration:
- YAML-based configuration files
- Runtime parameter adjustment
- Environment-specific settings
- Agent role definitions
The search tree is the core data structure representing the exploration space:
class ChallengeNode:
"""Represents a node in the MCTS tree"""
def __init__(self, concepts: List[str], difficulty: str):
self.concepts = concepts # CS concepts tested
self.difficulty = difficulty # Difficulty level
self.visits = 0 # MCTS visit count
self.value = 0.0 # Average performance score
self.children = [] # Child nodes
self.parent = None # Parent node
self.phase = 1 # Which phase created this node
self.run_results = [] # Historical evaluation resultsTree Growth Pattern:
- Root nodes: Single concepts at various difficulties
- Child nodes: Concept combinations or difficulty progressions
- Leaf nodes: Unexplored combinations awaiting evaluation
The agent system provides specialized AI assistants for different tasks:
graph LR
subgraph "Agent Workflow"
CD[Challenge Designer<br/> Problem Creation]
TG[Test Generator<br/> Test Cases]
PS[Problem Solver<br/> Solutions]
PF[Problem Fixer<br/> Debugging]
CD --> TG
TG --> PS
PS --> PF
end
subgraph "Enhanced Workflow"
CDA[Challenge Designer Advanced<br/> Diverse Problems]
TV[Test Validator<br/> Quality Assurance]
TEA[Test Error Analyzer<br/> Failure Analysis]
CDA --> TV
PS --> TEA
end
The environment registry enables pluggable evaluation strategies:
@environment_registry.register_environment_method("custom_env", "execute_node")
async def execute_node(self: "BaseEnvironment", **kwargs) -> Dict:
"""Custom environment execution logic"""
# Environment-specific implementation
passThis pattern allows:
- Runtime environment discovery
- Zero-configuration environment loading
- Polymorphic environment behavior
- Easy testing and development
Similar pattern for MCTS phases:
@phase_registry.register_phase_method("phase_1", "select_node")
async def select_node(self: "BasePhase") -> ChallengeNode:
"""Phase-specific node selection strategy"""
# Selection algorithm implementation
passsequenceDiagram
participant Client
participant Search
participant Environment
participant LLM
Client->>Search: Start Evaluation
Search->>Search: Initialize Tree
Search->>Search: Phase 1: Select Node
Search->>Environment: Execute Challenge
Environment->>LLM: Generate Problem
LLM-->>Environment: Problem Description
Environment->>LLM: Generate Tests
LLM-->>Environment: Test Cases
Environment->>LLM: Solve Problem
LLM-->>Environment: Solution Code
Environment->>Environment: Execute Tests
Environment-->>Search: Evaluation Results
Search->>Search: Update Tree
Search->>Search: Check Convergence
Search-->>Client: Phase Complete
sequenceDiagram
participant Environment as Environment Service
participant LLM as LLM Interface
participant Provider as LLM Provider
Environment->>LLM: Initialize Agent Session
LLM-->>Environment: Session ID
Environment->>LLM: Agent Request (async)
LLM-->>Environment: Task ID
LLM->>Provider: LLM API Call
Provider-->>LLM: Response
LLM->>LLM: Parse & Format
Environment->>LLM: Check Task Status
LLM-->>Environment: Completed Result
- Graceful degradation on service failures
- Comprehensive logging and monitoring
- Input validation at all service boundaries
- Rollback capabilities for failed operations
- Data consistency checks
- Default configurations: Built-in sensible defaults
- Environment-specific: Development, staging, production
- Service configuration: Port, host, logging levels
- Algorithm parameters: MCTS settings, convergence thresholds
- Agent definitions: Prompts, models, parameters
- Environment setup: Available environments and their agents
- Agent System - Deep dive into the agent architecture
- Environment System - Environment framework details
- Tree Structure - Search tree implementation
- MCTS Algorithm - Monte Carlo Tree Search details
- Configuration Overview - Configuration system details
- Quick Start - Getting started guide
- Troubleshooting - Common issues and solutions
- Extending PrismBench - Framework extensibility
- Custom Agents - Creating custom agents
- Custom Environments - Building custom environments
- Custom MCTS Phases - Implementing search strategies
MCTS System
- MCTS Algorithm
- Core MCTS Process
- Key Components
- PrismBench's Three-Phase MCTS
- Tree Structure
- Node Structure
Agent System
Environment System
- Environment Overview
- Environment Types
- Environment Registry
- Agent Integration
- Environment Configuration
Main Configuration
- Configuration Overview
- Agent Configurations
- Environment Configurations
- Phase Configurations
- Tree Configurations
Extension