-
Notifications
You must be signed in to change notification settings - Fork 0
MCTS Algorithm
This guide provides a comprehensive explanation of how PrismBench uses Monte Carlo Tree Search to evaluate LLM capabilities.
Monte Carlo Tree Search (MCTS) is a probabilistic search algorithm that combines tree search principles with random sampling to find optimal decisions. PrismBench uses MCTS to systematically explore the space of computer science concepts and difficulty levels to map LLM capabilities.
MCTS iteratively builds a search tree through four key steps:
Starting from the root, navigate through the tree using a selection policy (like UCB1) that balances exploration and exploitation until reaching a leaf node.
Create one or more child nodes from the selected leaf node to explore new states.
From the new node, perform a "rollout" by simulating random actions until reaching a terminal state or predefined depth.
Update the statistics (visits and values) of all nodes along the path from the simulated node back to the root.
Balances exploration vs exploitation using:
UCB1 = average_value + C * sqrt(ln(total_parent_visits) / node_visits)
Where:
-
average_valueis the node's historical performance -
Cis an exploration constant (typically β2) -
total_parent_visitsis the parent node's visit count -
node_visitsis this node's visit count
Each node maintains:
- Visit count: Number of times node was traversed
- Value: Aggregated results from simulations
- Children: Available next states/concepts
Objective: Understand the model's basic capabilities across different concepts.
Process:
flowchart TD
Start1[Start Phase 1] --> SelectNode1[Select Node]
SelectNode1 --> Simulate1[Simulate Challenge]
Simulate1 --> Score1[Calculate Score]
Score1 --> Backprop1[Backpropagate Results]
Backprop1 --> Expand1{Should Expand?}
Expand1 -->|Yes| CreateChild1[Create Child Node]
CreateChild1 --> CheckConv1[Check Convergence]
Expand1 -->|No| CheckConv1
CheckConv1 -->|Not Converged| SelectNode1
CheckConv1 -->|Converged| End1[End Phase 1]
Key Features:
- Runs until value convergence is achieved
- Node selection uses inverse probability distribution to favor less-explored nodes
- Score calculation considers:
- Challenge success/failure
- Number of tests passed
- Penalties for failures, errors, and multiple attempts
- Difficulty level weighting
- Node expansion occurs when:
- Node value exceeds performance threshold
- Node depth is below maximum
- Expansion happens by either:
- Combining with another node to add new concepts
- Increasing difficulty level
Objective: Identify areas where the model struggles.
Process:
flowchart TD
Start2[Start Phase 2] --> LoadP1[Load Phase 1 Tree]
LoadP1 --> SelectNode2[Select Low Performance Node]
SelectNode2 --> Simulate2[Simulate Variations]
Simulate2 --> Score2[Calculate Challenge Score]
Score2 --> Backprop2[Backpropagate Results]
Backprop2 --> Expand2{Challenge Score > Threshold?}
Expand2 -->|Yes| Record[Record Challenge Combination]
Record --> CheckMore2[Check More Nodes]
Expand2 -->|No| CheckMore2
CheckMore2 -->|Continue| SelectNode2
CheckMore2 -->|Complete| End2[End Phase 2]
Key Features:
- Uses UCB (Upper Confidence Bound) with challenge metrics
- Challenge score calculation considers:
- Inverse success rate
- Number of attempts needed
- Whether fixes were required
- Node expansion prioritizes:
- Combinations exceeding challenge threshold
- Increasing difficulty for challenging nodes
- Adding new concepts for non-challenging nodes
- Maintains a record of challenging combinations with their scores
Objective: Deeply understand the model's challenges.
Process:
flowchart LR
Start3[Start Phase 3] --> LoadChallenges[Load Challenge Combinations]
LoadChallenges --> GenVar[Generate Problem Variations]
GenVar --> RunTests[Run Enhanced Tests]
RunTests --> Analyze[Analyze Performance]
Analyze --> CollectMetrics[Collect Metrics]
CollectMetrics --> GenReport[Generate Report]
GenReport --> End3[End Phase 3]
Key Features:
- Takes challenging nodes identified in Phase 2
- Generates multiple problem variations per concept combination
- Records detailed challenge descriptions and results
- Maintains full history of:
- Problem statements
- Solution attempts
- Test cases
- Performance metrics
base_score = (tests_passed / total_tests) * difficulty_multiplierfinal_score = base_score - (
num_failures * penalty_per_failure +
num_errors * penalty_per_error +
(attempts - 1) * penalty_per_attempt +
(1 if fixed_by_fixer else 0) * fixer_penalty
)- Easy: 1.0x
- Medium: 1.5x
- Hard: 2.0x
- Expert: 3.0x
Nodes can expand by combining with other concepts:
[Arrays] + [Sorting] β [Arrays, Sorting]
Nodes can expand to higher difficulty levels:
[Dynamic Programming, Easy] β [Dynamic Programming, Medium]
Achieved when node values stabilize:
convergence = abs(current_value - previous_value) < thresholdEnsures sufficient exploration:
min_visits_per_node = total_simulations / num_nodes * min_ratio- UCB1 C value: β2 (standard) to 2.0 (high exploration)
- Exploration probability: 0.1-0.3 for random node selection
- Value delta: 0.01-0.05 for stability detection
- Convergence checks: 3-10 consecutive stable iterations
- Expansion threshold: 0.6-0.8 minimum performance
- Challenge threshold: 0.2-0.4 maximum performance for challenges
class MCTSNode:
def __init__(self):
self.concepts = [] # CS concepts in this node
self.difficulty = "easy" # Difficulty level
self.visits = 0 # Visit count
self.value = 0.0 # Average performance
self.children = [] # Child nodes
self.parent = None # Parent nodedef select_node(node):
if not node.children:
return node
best_child = max(node.children, key=lambda c: ucb1_score(c))
return select_node(best_child)def backpropagate(node, score):
while node:
node.visits += 1
node.value = (node.value * (node.visits - 1) + score) / node.visits
node = node.parentFor implementation details, see the Search Service README.
- π³ Tree Structure - Search tree implementation details
- π Custom MCTS Phases - Implementing custom search strategies
- π Results Analysis - Analyzing MCTS results
- ποΈ Architecture Overview - How MCTS fits in the system
- π Environment System - Evaluation environments
- π€ Agent System - Agent integration
- π§ Extending PrismBench - Framework extensions
- π Extension Combinations - Combining MCTS with custom components
- π Configuration Overview - MCTS configuration parameters
π§ MCTS System
- π MCTS Algorithm
- π Core MCTS Process
- π Key Components
- π PrismBench's Three-Phase MCTS
- π³ Tree Structure
- π³ Node Structure
π€ Agent System
- π€ Agent Overview
- π Agent Roles
- π§ Agent Configuration
- π§ Agent Workflows
- π§ Agent Communication
π Environment System
- π Environment Overview
- ποΈ Environment Types
- π§ Environment Registry
- π§ Agent Integration
- π§ Environment Configuration
π Main Configuration
- βοΈ Configuration Overview
- π Agent Configurations
- π Environment Configurations
- π Phase Configurations
- π³ Tree Configurations
π§ Extension
- π Extending PrismBench
- π€ Custom Agents
- π Custom Environments
- π Custom MCTS Phases
- π Extension Combinations
- π‘ Basic Examples (Coming Soon)
- ποΈ Advanced Examples (Coming Soon)
- π Step-by-Step Tutorials (Coming Soon)