# System Prompt Learning (SPL) Plugin for OptiLLM

This plugin implements Andrej Karpathy's [proposed](https://x.com/karpathy/status/1921368644069765486) "third paradigm" for LLM learning, enabling large language models to learn and improve their problem-solving strategies over time through experience and reflection.

## Introduction: The Evolution of LLM Learning

Large Language Models (LLMs) have traditionally learned in two primary ways:

1. **Pretraining**: Learning facts, patterns, and language from massive text corpora
2. **Finetuning**: Learning behaviors through supervised or reinforcement learning

System Prompt Learning introduces a third paradigm:

3. **Strategy Learning**: The model learns explicit problem-solving strategies through experience, maintains them in a growing knowledge base, and applies them selectively based on problem types

This approach addresses a fundamental limitation of current LLMs: their inability to learn cumulatively from experience. While LLMs can solve individual problems impressively, they typically approach each new problem from scratch rather than building on past successes.

## The SPL Paradigm

System Prompt Learning represents a significant shift in how LLMs approach problem-solving:

- **Experience-Driven Learning**: Rather than relying solely on pretraining or supervised finetuning, SPL enables models to learn from their own problem-solving experiences
- **Strategy Formalization**: The system explicitly generates, evaluates, and refines problem-solving strategies
- **Performance Tracking**: SPL tracks which strategies work well for different problem types, creating a dynamic feedback loop
- **Selective Application**: When faced with a new problem, the system selects the most relevant strategies based on similarity and past performance

This approach mirrors how human experts develop expertise: accumulating strategies through experience and applying them selectively to new situations.
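
To make the selection step concrete, here is a minimal sketch of how stored strategies might be ranked for a new problem. This is an illustration rather than the plugin's actual code: the `Strategy` fields mirror the example shown later in this README, but `select_strategies`, the scoring weights, and the `similarity` callback are assumptions.

```python
from dataclasses import dataclass
from typing import Callable, List

@dataclass
class Strategy:
    strategy_id: str
    problem_type: str
    strategy_text: str
    success_count: int = 0
    total_attempts: int = 0

    @property
    def success_rate(self) -> float:
        # Raw success rate; 0.0 for strategies that have never been tried.
        return self.success_count / self.total_attempts if self.total_attempts else 0.0

def select_strategies(strategies: List[Strategy], problem_type: str,
                      similarity: Callable[[Strategy], float],
                      k: int = 3) -> List[Strategy]:
    """Rank strategies by type match, similarity to the new problem, and track record."""
    def score(s: Strategy) -> float:
        type_match = 1.0 if s.problem_type == problem_type else 0.0
        # Illustrative weights: favor exact type matches, then similarity, then history.
        return 0.5 * type_match + 0.3 * similarity(s) + 0.2 * s.success_rate

    return sorted(strategies, key=score, reverse=True)[:k]
```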

## Experimental Results

We conducted extensive experiments using the SPL plugin with gemini-2.0-flash-lite on various benchmarks. The learning phase used the OptILLMBench training split (400 instances), while evaluation was performed on the test split (100 instances) and additional popular mathematical benchmarks.

The results demonstrate consistent improvements across all benchmarks:

| Benchmark | Baseline | With SPL | Improvement (points) |
|-----------|----------|----------|----------------------|
| OptILLMBench | 61% | 65% | +4 |
| MATH-500 | 85% | 85.6% | +0.6 |
| Arena Hard Auto | 29% | 37.6% | +8.6 |
| AIME24 | 23.33% | 30% | +6.67 |

These results are particularly notable for the challenging Arena Hard Auto and AIME24 benchmarks, where traditional approaches often struggle. The improvements suggest that SPL is especially effective for complex problem-solving tasks that benefit from strategic approaches.

## Usage

The plugin maintains two separate limits:

- **Storage Limit** (`MAX_STRATEGIES_PER_TYPE`): Controls how many strategies can be stored in the database per problem type
- **Inference Limit** (`MAX_STRATEGIES_FOR_INFERENCE`): Controls how many strategies are used during inference for system prompt augmentation
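
As a rough illustration of how these two limits interact, the sketch below keeps a larger pool of strategies in storage while injecting only the strongest few at inference time. It reuses the `Strategy` shape from the earlier sketch; the constant values, `store_strategy`, and `strategies_for_prompt` are assumptions for illustration, and only the two constant names come from the plugin.

```python
MAX_STRATEGIES_PER_TYPE = 10      # storage limit (example value, not the plugin default)
MAX_STRATEGIES_FOR_INFERENCE = 3  # inference limit (example value, not the plugin default)

def store_strategy(db: dict, strategy) -> None:
    """Cap how many strategies are kept per problem type."""
    bucket = db.setdefault(strategy.problem_type, [])
    bucket.append(strategy)
    if len(bucket) > MAX_STRATEGIES_PER_TYPE:
        # Evict the weakest performers once the storage limit is exceeded.
        bucket.sort(key=lambda s: s.success_rate, reverse=True)
        del bucket[MAX_STRATEGIES_PER_TYPE:]

def strategies_for_prompt(db: dict, problem_type: str) -> list:
    """Use only the top strategies when augmenting the system prompt."""
    bucket = sorted(db.get(problem_type, []),
                    key=lambda s: s.success_rate, reverse=True)
    return bucket[:MAX_STRATEGIES_FOR_INFERENCE]
```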


## Learning Metrics

After training on the OptILLMBench dataset, the system developed a rich knowledge base of strategies:

- **Total queries processed**: 500
- **Strategies created**: 129
- **Strategies refined**: 97
- **Successful resolutions**: 346
- **Strategies merged**: 28

These metrics indicate a healthy learning process with a balance between creation, refinement, and merging of similar strategies.
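
The merge step can be pictured as a similarity test over stored strategies. The README does not describe the actual merge criterion, so the threshold and text-similarity measure below are assumptions:

```python
from difflib import SequenceMatcher

MERGE_THRESHOLD = 0.85  # assumed cutoff for treating two strategies as near-duplicates

def should_merge(a, b) -> bool:
    """Candidates for merging: same problem type and highly similar strategy text."""
    if a.problem_type != b.problem_type:
        return False
    ratio = SequenceMatcher(None, a.strategy_text, b.strategy_text).ratio()
    return ratio >= MERGE_THRESHOLD
```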
## Data Storage

Strategies are stored in JSON format in the `spl_data` directory.
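
Because strategies are plain JSON, they are easy to inspect outside the plugin. The exact file layout inside `spl_data` is not listed here, so the file name below is a placeholder:

```python
import json
from pathlib import Path

def load_strategies(path: str = "spl_data/strategies.json") -> list:
    # `strategies.json` is an assumed file name for illustration.
    return json.loads(Path(path).read_text())

for s in load_strategies():
    rate = s["success_count"] / max(s["total_attempts"], 1)
    print(f'{s["strategy_id"]} ({s["problem_type"]}): {rate:.1%} success rate')
```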
## Example Strategy

Below is an example of a strategy learned by the system for word problems:

```json
{
  "strategy_id": "strategy_3",
  "problem_type": "word_problem",
  "strategy_text": "**Refined Strategy for Solving Word Problems:**\n\n1. **Understand:**\n * Read the problem carefully (multiple times).\n * Identify the question (what are you trying to find?).\n * List all given information (facts, numbers, units).\n * Clarify ambiguous terms/units.\n\n2. **Organize Information & Identify Unknowns:**\n * Choose an organization method: (e.g., table, diagram, list, drawing).\n * Clearly identify the unknowns (what you need to solve for).\n\n3. **Plan and Translate:**\n * Define *all* variables with units (e.g., `p = number of pennies`, `c = number of compartments`).\n * Identify relationships between knowns and unknowns.\n * Convert units if necessary.\n * Write equations or expressions, including units, that relate the knowns and unknowns.\n * Ensure units are consistent throughout the equations.\n * Outline the solution steps.\n\n4. **Solve:**\n * Show work step-by-step.\n * Track units throughout calculations.\n * Calculate accurately.\n * Solve for the unknowns.\n\n5. **Evaluate and Verify:**\n * Check if the answer is reasonable.\n * Verify the answer.\n\n6. **Summarize:**\n * State the answer with units.",
  "success_count": 85,
  "total_attempts": 192,
  "confidence": 0.425
}
```

This strategy was developed through multiple refinement cycles and has a success rate of 44.3% (85/192). The system continuously updates these metrics as the strategy is applied to new problems.
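
The bookkeeping behind those numbers can be sketched as a simple per-use update. Note that the stored `confidence` (0.425) is close to, but not exactly, the raw success rate (0.443), so some adjustment is evidently applied; the smoothed estimate below is an assumption, not the plugin's actual formula:

```python
def record_attempt(strategy: dict, success: bool, smoothing: int = 2) -> None:
    """Update a strategy's metrics after it is applied to a problem."""
    strategy["total_attempts"] += 1
    if success:
        strategy["success_count"] += 1
    # Laplace-smoothed success rate as an assumed stand-in for `confidence`.
    strategy["confidence"] = round(
        (strategy["success_count"] + smoothing / 2)
        / (strategy["total_attempts"] + smoothing), 3)

s = {"success_count": 85, "total_attempts": 192, "confidence": 0.425}
record_attempt(s, success=True)  # one more successful application
print(s)  # total_attempts -> 193, success_count -> 86, confidence recomputed
```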
## Motivations and Broader Impact

### The System Prompt Gap

Most LLM providers like Anthropic (Claude) and OpenAI (GPT) employ elaborate system prompts that encode sophisticated problem-solving strategies. However, the majority of users interact with these models using very basic or empty system prompts, missing out on the benefits of strategic guidance.

SPL bridges this gap by automatically learning and applying effective strategies, democratizing access to the benefits of well-crafted system prompts without requiring expertise in prompt engineering.

### Learning from Experience

Current LLMs are often described as "one-shot learners": they can solve individual problems but don't accumulate knowledge from these experiences. SPL represents a step toward models that improve through use, similar to how humans develop expertise through practice and reflection.

### Human-Readable Learning

Unlike black-box learning approaches, SPL produces human-readable strategies that can be inspected, understood, and even manually edited. This transparency allows for:

- Understanding how the model approaches different problems
- Identifying potential biases or flaws in reasoning
- Transferring strategies between models or domains
## Benefits

1. **Cumulative Learning**: The LLM improves on specific problem types over time
2. **Explicit Knowledge**: Strategies are human-readable and provide insight into the LLM's reasoning
3. **Efficiency**: Reuses successful approaches rather than solving each problem from scratch
4. **Adaptability**: Different strategies for different problem types
5. **Transparency**: Learning process and outcomes can be inspected and understood
## Conclusion and Future Work

System Prompt Learning represents a promising new direction for enabling LLMs to learn from experience in a transparent and interpretable way. Our experiments demonstrate significant performance improvements across multiple benchmarks, particularly for complex problem-solving tasks.

Future work will focus on:

1. Expanding the range of problem types the system can recognize
2. Improving the strategy refinement process
3. Enabling cross-domain strategy transfer
4. Developing mechanisms for human feedback on strategies
5. Exploring hybrid approaches that combine SPL with other learning paradigms