Merged
2 changes: 1 addition & 1 deletion .github/workflows/ci-cd.yml
@@ -34,7 +34,7 @@ jobs:
     strategy:
       matrix:
         os: [ubuntu-latest, windows-latest, macos-latest]
-        python-version: ["3.12", "3.13", "3.14"]
+        python-version: ["3.12", "3.13"]
     steps:
       - uses: actions/checkout@ff7abcd0c3c05ccf6adc123a8cd1fd4fb30fb493
       - name: Set up Python ${{ matrix.python-version }}
1 change: 1 addition & 0 deletions .gitignore
@@ -146,3 +146,4 @@ cython_debug/

 # Runtime Logs
 logs/
+optimized_manifest.json
107 changes: 67 additions & 40 deletions README.md
@@ -1,42 +1,69 @@
# coreason-optimizer

coreason-optimizer

[![CI/CD](https://github.com/CoReason-AI/coreason_optimizer/actions/workflows/ci-cd.yml/badge.svg)](https://github.com/CoReason-AI/coreason_optimizer/actions/workflows/ci-cd.yml)
[![PyPI](https://img.shields.io/pypi/v/coreason_optimizer.svg)](https://pypi.org/project/coreason_optimizer/)
[![PyPI - Python Version](https://img.shields.io/pypi/pyversions/coreason_optimizer.svg)](https://pypi.org/project/coreason_optimizer/)
[![License](https://img.shields.io/github/license/CoReason-AI/coreason_optimizer)](https://github.com/CoReason-AI/coreason_optimizer/blob/main/LICENSE)
[![Codecov](https://codecov.io/gh/CoReason-AI/coreason_optimizer/branch/main/graph/badge.svg)](https://codecov.io/gh/CoReason-AI/coreason_optimizer)
[![Downloads](https://static.pepy.tech/badge/coreason_optimizer)](https://pepy.tech/project/coreason_optimizer)
[![Ruff](https://img.shields.io/endpoint?url=https://raw.githubusercontent.com/astral-sh/ruff/main/assets/badge/v2.json)](https://github.com/astral-sh/ruff)
[![Pre-commit](https://img.shields.io/badge/pre--commit-enabled-brightgreen?logo=pre-commit)](https://github.com/pre-commit/pre-commit)

## Getting Started

### Prerequisites

- Python 3.12+
- Poetry

### Installation

1. Clone the repository:
```sh
git clone https://github.com/CoReason-AI/coreason_optimizer.git
cd coreason_optimizer
```
2. Install dependencies:
```sh
poetry install
```

### Usage

- Run the linter:
```sh
poetry run pre-commit run --all-files
```
- Run the tests:
```sh
poetry run pytest
```
**Automated Prompt Engineering / LLM Compilation / DSPy Integration for CoReason-AI**

[![License: Prosperity 3.0](https://img.shields.io/badge/license-Prosperity%203.0-blue)](https://prosperitylicense.com/versions/3.0.0)
[![CI Status](https://github.com/CoReason-AI/coreason-optimizer/actions/workflows/main.yml/badge.svg)](https://github.com/CoReason-AI/coreason-optimizer/actions)
[![Code Style: Ruff](https://img.shields.io/endpoint?url=https://raw.githubusercontent.com/astral-sh/ruff/main/assets/badge/v2.json)](https://github.com/astral-sh/ruff)
[![Documentation](https://img.shields.io/badge/docs-product_requirements-blue)](docs/product_requirements.md)

**coreason-optimizer** is the "Compiler" for the CoReason Agentic Platform. It automates prompt engineering by treating prompts as trainable weights, optimizing them against ground-truth datasets to maximize performance metrics.

---

## Installation

```bash
pip install coreason-optimizer
```

## Features

- **Automated Optimization:** Rewrites instructions and selects examples to maximize a score, not human intuition.
- **Model-Specific Compilation:** Generates optimized prompts specifically tuned for target models (e.g., GPT-4, Claude 3.5).
- **Continuous Learning:** Re-runs optimization on recent logs to patch prompts against data drift.
- **Mutate-Evaluate Loop:** Systematic cycle of drafting, evaluating, diagnosing, mutating, and selecting prompts.
- **Strategies:** Includes BootstrapFewShot (mining successful traces) and MIPRO (Multi-prompt Instruction PRoposal Optimizer).
- **Integration:** Works seamlessly with `coreason-construct`, `coreason-archive`, and `coreason-assay`.

For full product requirements, see [docs/product_requirements.md](docs/product_requirements.md).

## Usage

Here is how to initialize and use the library to compile an agent:

```python
from coreason_optimizer import OptimizerConfig, PromptOptimizer
from coreason_optimizer.core.interfaces import Construct
from coreason_optimizer.data import Dataset

# 1. Configuration
config = OptimizerConfig(
    target_model="gpt-4o",
    metric="exact_match",
    max_rounds=10,
)

# 2. Load Data
dataset = Dataset.from_csv("data/gold_set.csv")
train_set, val_set = dataset.split(test_size=0.2)

# 3. Load Agent (Construct)
# In a real scenario, this would be imported from your agent code
# from src.agents.analyst import analyst_agent
class MockAgent(Construct):
    inputs = ["question"]
    outputs = ["answer"]
    system_prompt = "You are a helpful assistant."

agent = MockAgent()

# 4. Compile
optimizer = PromptOptimizer(config=config)
optimized_manifest = optimizer.compile(
    agent=agent,
    trainset=train_set,
    valset=val_set,
)

print(f"Optimization complete. New Score: {optimized_manifest.performance_metric}")
print(f"Optimized Instruction: {optimized_manifest.optimized_instruction}")
```
69 changes: 69 additions & 0 deletions VIGNETTE.md
@@ -0,0 +1,69 @@
# The Architecture and Utility of coreason-optimizer

## 1. The Philosophy (The Why)

The prevailing method of interacting with Large Language Models (LLMs), manual "prompt engineering", is artisan work: fragile, unscalable, and often reliant on "magic words" that break when models update. The author of `coreason-optimizer` recognizes that prompts are not merely text; they are **trainable parameters** of a software system.

This package exists to replace intuition with optimization. Instead of a developer guessing which few-shot examples might help, `coreason-optimizer` empirically selects them. Instead of rewriting instructions hoping for better JSON compliance, it uses a meta-learner to rewrite them for you. It shifts the paradigm from "Prompt Whisperer" to "Prompt Compiler," treating the agent definition as source code and the deployed prompt as a compiled, frozen binary.

## 2. Under the Hood (The Dependencies & Logic)

The engine runs on a focused stack designed for iterative evaluation:

* **Pydantic** enforces the rigorous schema definitions (`OptimizerConfig`, `OptimizedManifest`) required for a compiler that must output deterministic artifacts.
* **OpenAI** & **Numpy/Scikit-Learn** power the semantic search and generation capabilities. The package doesn't just call LLMs; it uses embeddings to find "nearest neighbor" successful examples to inject into prompts (`SemanticSelector`).
* **Loguru** provides the observability backbone. When an optimization run takes 4 hours and spends $10, you need structured, searchable logs to understand *why* a specific mutation was rejected.
* **Click** exposes the compiler interface to CI/CD pipelines, allowing optimization to be a step in the build process, not a manual task.
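
The nearest-neighbor idea behind `SemanticSelector` can be illustrated with a minimal sketch. It assumes embeddings have already been computed; the function names `cosine_similarity` and `select_nearest`, and the toy three-dimensional vectors, are hypothetical stand-ins and not the package's actual API.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def select_nearest(query_vec, candidates, k=2):
    """Return the k candidates whose embeddings are most similar to the query."""
    ranked = sorted(
        candidates,
        key=lambda c: cosine_similarity(query_vec, c["embedding"]),
        reverse=True,
    )
    return ranked[:k]


# Toy pool of previously successful examples (hypothetical data).
pool = [
    {"text": "headache after dose", "embedding": [1.0, 0.0, 0.0]},
    {"text": "invoice overdue", "embedding": [0.0, 1.0, 0.0]},
    {"text": "nausea after dose", "embedding": [0.9, 0.1, 0.0]},
]
nearest = select_nearest([1.0, 0.05, 0.0], pool, k=2)
print([example["text"] for example in nearest])
```

In a real run the vectors would come from an embedding model, and the selected examples would be injected into the prompt as few-shot demonstrations.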

The core logic revolves around the **Mutate-Evaluate Loop**. Inspired by DSPy, the `MiproOptimizer` (Multi-prompt Instruction PRoposal Optimizer) generates candidate instructions using a "Teacher" model. Simultaneously, it selects sets of few-shot examples. It then performs a grid search across these combinations, scoring them against a ground-truth dataset using a defined `Metric` (like `exact_match`). The result is the best-scoring instruction and example combination among the candidates evaluated for that specific dataset and model.
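
The grid search over instruction candidates and example sets can be sketched as follows. This is an illustration under stated assumptions, not the `MiproOptimizer` implementation: `grid_search`, `fake_agent`, and the toy data are hypothetical, and `fake_agent` stands in for a real LLM call.

```python
from itertools import product


def exact_match(prediction, reference):
    """Score 1.0 when the prediction equals the reference, else 0.0."""
    return 1.0 if prediction.strip() == reference.strip() else 0.0


def grid_search(instructions, example_sets, valset, run_agent):
    """Score every (instruction, example set) pair and return the best one."""
    best = None
    for instruction, examples in product(instructions, example_sets):
        scores = [
            exact_match(run_agent(instruction, examples, question), reference)
            for question, reference in valset
        ]
        mean = sum(scores) / len(scores)
        # Strict comparison keeps the first pair seen when scores tie.
        if best is None or mean > best[0]:
            best = (mean, instruction, examples)
    return best


def fake_agent(instruction, examples, question):
    """Hypothetical stand-in for an LLM: upper-cases only when told to."""
    answer = {"capital of France?": "paris", "capital of Peru?": "lima"}[question]
    return answer.upper() if "upper" in instruction else answer


valset = [("capital of France?", "PARIS"), ("capital of Peru?", "LIMA")]
score, instruction, examples = grid_search(
    ["Answer briefly.", "Answer in upper case."],
    [(), (("capital of Spain?", "MADRID"),)],
    valset,
    fake_agent,
)
print(score, instruction)
```

The cost of an exhaustive grid grows with the product of the two candidate counts, which is why a budget limit matters in practice.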

## 3. In Practice (The How)

Here is how `coreason-optimizer` transforms a raw agent definition into a deployed artifact.

### Compiling an Agent

The `compile` method is the heart of the system. It takes your agent logic and training data, runs the optimization strategies (like BootstrapFewShot or MIPRO), and returns a frozen manifest.

```python
from coreason_optimizer.core.config import OptimizerConfig
from coreason_optimizer.strategies.mipro import MiproOptimizer
from coreason_optimizer.core.metrics import MetricFactory

# Assumed to be defined elsewhere: `client` (an LLM client), `my_agent_construct`,
# `training_examples`, and `validation_examples`.

# 1. Configuration: Define the target environment
config = OptimizerConfig(
    target_model="gpt-4o",
    budget_limit_usd=5.00,  # Safety first
    max_rounds=10,
)

# 2. Instantiate the Optimizer with a specific Metric
# "exact_match" ensures the output strictly adheres to the reference
optimizer = MiproOptimizer(
    llm_client=client, metric=MetricFactory.get("exact_match"), config=config
)

# 3. The Compilation Step
# This runs the "Mutate-Evaluate" loop, finding the best instruction/example pair
manifest = optimizer.compile(
    agent=my_agent_construct,
    trainset=training_examples,
    valset=validation_examples,
)

print(f"Optimization improved score to: {manifest.performance_metric}")
```

### The Optimized Artifact

The output is a portable JSON manifest. This file allows the runtime to execute the optimized agent without needing the optimizer or the training data again.

```python
# The manifest contains the "compiled" prompt logic
print(manifest.optimized_instruction)
# > "Extract adverse events from the text. Format as JSON. [Optimized Instructions...]"

# It also holds the empirically selected few-shot examples
for example in manifest.few_shot_examples:
    print(f"Input: {example.inputs} -> Output: {example.reference}")
```
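
To show how a runtime might consume the manifest without the optimizer, here is a minimal sketch that loads the JSON and assembles a prompt. The shape of each few-shot entry (`inputs.text` and `reference`) and the `build_prompt` helper are assumptions made for illustration; the actual manifest layout may differ.

```python
import json

# Hypothetical manifest content; the few-shot entry shape is an assumption.
manifest_json = """
{
  "agent_id": "adverse_event_extractor",
  "base_model": "gpt-4o",
  "optimized_instruction": "Extract adverse events from the text. Format as JSON.",
  "few_shot_examples": [
    {"inputs": {"text": "Patient reported nausea."}, "reference": "nausea"}
  ]
}
"""


def build_prompt(manifest, new_input):
    """Join the instruction, the few-shot examples, and the new input."""
    parts = [manifest["optimized_instruction"], ""]
    for example in manifest["few_shot_examples"]:
        parts.append(f"Input: {example['inputs']['text']}")
        parts.append(f"Output: {example['reference']}")
        parts.append("")
    parts.append(f"Input: {new_input}")
    parts.append("Output:")
    return "\n".join(parts)


manifest = json.loads(manifest_json)
prompt = build_prompt(manifest, "Patient developed a rash.")
print(prompt)
```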
129 changes: 129 additions & 0 deletions docs/product_requirements.md
@@ -0,0 +1,129 @@
# Product Requirements Document: coreason-optimizer

**Domain:** Automated Prompt Engineering / LLM Compilation / DSPy Integration
**Package Name:** coreason-optimizer

---

## 1. Executive Summary

**coreason-optimizer** is the "Compiler" for the CoReason Agentic Platform.

In the current SOTA (State-of-the-Art), writing static prompts by hand is considered technical debt. **coreason-optimizer** automates prompt writing by treating prompts (instructions and few-shot examples) as **trainable weights**. It ingests a "Draft Agent" defined in `coreason-construct` and iterates on it against a ground-truth dataset (validated by `coreason-assay`) to maximize the chosen performance metrics. It outputs a "Frozen Manifest" that is deployed to production, ensuring GxP stability.

## 2. Problem Statement & Rationale

| Problem | Impact | The coreason-optimizer Solution |
| :---- | :---- | :---- |
| **The "Prompt Whisperer" Bottleneck** | Engineers spend hours tweaking words ("Please be careful") with unpredictable results. | **Automated Optimization:** A meta-algorithm rewrites instructions and selects examples to maximize a score, not human intuition. |
| **Brittleness** | A prompt that works for GPT-4 often fails for Claude 3.5 or Llama 3. | **Model-Specific Compilation:** The optimizer can run separate jobs to generate optimized prompts specifically tuned for the target model. |
| **Drift** | Agents degrade over time as data distributions change (e.g., new medical slang). | **Continuous Learning:** Re-running the optimizer on recent "Gold" logs from `coreason-archive` automatically patches the prompt. |

## 3. Architectural Design

### 3.1 The "Mutate-Evaluate" Loop

The package implements a systematic optimization cycle (inspired by DSPy):

1. **Draft:** Start with the developer's base intention.
2. **Evaluate:** Run the agent on a training set.
3. **Diagnose:** Identify failing examples using `coreason-assay` metrics.
4. **Mutate:**
   * **Bootstrap Few-Shot:** Find historical examples where the agent *succeeded* on similar hard cases and inject them into the prompt.
   * **Instruction Induction:** Use a Meta-LLM to rewrite the System Prompt to explicitly address the observed failures.
5. **Select:** Keep the mutation that yields the highest metric score.
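
The five steps above can be sketched as a small loop. This is only an illustration of the cycle, not the package's implementation: `optimize`, `evaluate`, the two mutators, and `toy_agent` (a stand-in for an LLM call) are all hypothetical.

```python
def exact_match(prediction, reference):
    return 1.0 if prediction == reference else 0.0


def evaluate(prompt, dataset, run_agent, metric):
    """Return the mean score and the examples the agent got wrong."""
    failures, total = [], 0.0
    for question, reference in dataset:
        score = metric(run_agent(prompt, question), reference)
        total += score
        if score < 1.0:
            failures.append((question, reference))
    return total / len(dataset), failures


def optimize(draft, dataset, run_agent, metric, mutators, max_rounds=3):
    best_prompt = draft  # 1. Draft
    best_score, failures = evaluate(best_prompt, dataset, run_agent, metric)  # 2. Evaluate
    for _ in range(max_rounds):
        if not failures:  # 3. Diagnose: nothing left to fix
            break
        candidates = [mutate(best_prompt, failures) for mutate in mutators]  # 4. Mutate
        scored = [(evaluate(c, dataset, run_agent, metric), c) for c in candidates]
        (score, new_failures), prompt = max(scored, key=lambda item: item[0][0])
        if score <= best_score:  # 5. Select: keep only improvements
            break
        best_prompt, best_score, failures = prompt, score, new_failures
    return best_prompt, best_score


def toy_agent(prompt, question):
    """Hypothetical LLM stand-in: answers in digits only when the prompt asks."""
    words = {"two plus two": "four", "three plus three": "six"}
    digits = {"two plus two": "4", "three plus three": "6"}
    return digits[question] if "digits" in prompt else words[question]


def add_politeness(prompt, failures):
    return prompt + " Please be careful."


def add_format_rule(prompt, failures):
    return prompt + " Answer with digits only."


dataset = [("two plus two", "4"), ("three plus three", "6")]
best_prompt, best_score = optimize(
    "Solve the sum.", dataset, toy_agent, exact_match, [add_politeness, add_format_rule]
)
print(best_prompt, best_score)
```

In this toy run the "Please be careful" mutation does not change the score, while the format rule fixes both failures and is kept.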

### 3.2 Integration Map

* **Input (Schema):** `coreason-construct` defines the Agent structure (Inputs/Outputs).
* **Input (Data):** `coreason-archive` provides historical logs to mine for training examples.
* **Feedback (Loss Function):** `coreason-assay` provides the scoring function (e.g., accuracy, json_validity, f1_score).
* **Output (Artifact):** Produces a versioned `OptimizedManifest.json` used by the runtime.

## 4. Functional Specifications

### 4.1 The Optimization Engine

* **Strategy: BootstrapFewShot:**
  * Automatically mines the "Teacher" model's successful traces to create few-shot examples for the "Student" prompt.
* **Strategy: MIPRO (Multi-prompt Instruction PRoposal Optimizer):**
  * Generates 10 candidates for the System Instruction and 5 combinations of Few-Shot examples, finding the optimal pair via Bayesian optimization or simple grid search.
* **Cost Awareness:**
  * Must implement a `BudgetManager` to halt optimization if the token spend exceeds a defined limit (e.g., $10.00).
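
One way the `BudgetManager` requirement could be met is sketched below. The class shape, the `charge` method, the `BudgetExceededError` exception, and the flat per-token price are assumptions for illustration; real pricing usually differs between input and output tokens and between models.

```python
class BudgetExceededError(RuntimeError):
    """Raised when the next charge would push spend past the limit."""


class BudgetManager:
    def __init__(self, limit_usd, usd_per_1k_tokens):
        self.limit_usd = limit_usd
        self.usd_per_1k_tokens = usd_per_1k_tokens  # assumed flat price
        self.spent_usd = 0.0

    def charge(self, tokens):
        """Record token spend, refusing any charge that would exceed the limit."""
        cost = tokens / 1000 * self.usd_per_1k_tokens
        if self.spent_usd + cost > self.limit_usd:
            raise BudgetExceededError(
                f"charge of ${cost:.2f} would exceed the ${self.limit_usd:.2f} limit"
            )
        self.spent_usd += cost


budget = BudgetManager(limit_usd=10.00, usd_per_1k_tokens=0.01)
completed_rounds = 0
try:
    for _ in range(10):
        budget.charge(tokens=300_000)  # each simulated round costs $3.00
        completed_rounds += 1
except BudgetExceededError as exc:
    print(f"Halted after {completed_rounds} rounds: {exc}")
```

Checking before recording the charge means the limit is never exceeded, at the price of stopping slightly early.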

### 4.2 Data Management

* **Dataset Loader:** Standardizes inputs from CSV, JSONL, or `coreason-archive` SQL queries into a `TrainingExample` object.
* **Splitter:** Automatically creates Train/Dev/Test splits to prevent overfitting the prompt to the training data.
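
A seeded Train/Dev/Test split could look like the sketch below. The function name `split_dataset`, its parameters, and the example fields are hypothetical; the point is that a fixed seed makes the split reproducible and the three parts never overlap.

```python
import random


def split_dataset(examples, dev_fraction=0.1, test_fraction=0.1, seed=0):
    """Shuffle a copy with a fixed seed, then cut it into train/dev/test."""
    shuffled = list(examples)
    random.Random(seed).shuffle(shuffled)
    n_test = int(len(shuffled) * test_fraction)
    n_dev = int(len(shuffled) * dev_fraction)
    test = shuffled[:n_test]
    dev = shuffled[n_test:n_test + n_dev]
    train = shuffled[n_test + n_dev:]
    return train, dev, test


examples = [{"question": f"q{i}", "answer": f"a{i}"} for i in range(20)]
train, dev, test = split_dataset(examples, dev_fraction=0.2, test_fraction=0.2, seed=42)
print(len(train), len(dev), len(test))
```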

### 4.3 The Manifest Serializer

* The output must be deterministic and immutable.
* **Schema:**
```json
{
  "agent_id": "adverse_event_extractor",
  "base_model": "gpt-4o",
  "optimized_instruction": "Extract adverse events... [Modified by Optimizer]",
  "few_shot_examples": [ ... ],
  "performance_metric": "0.94",
  "optimization_run_id": "opt_20250119_xyz"
}
```
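
The determinism requirement can be illustrated with the standard library: serializing with sorted keys and fixed separators gives the same bytes for the same content, so a hash of the output can serve as a fingerprint. The helper names `serialize_manifest` and `manifest_digest` are hypothetical, and this is a sketch of one possible approach, not the package's serializer.

```python
import hashlib
import json


def serialize_manifest(manifest):
    """Serialize with sorted keys and fixed separators for stable output."""
    return json.dumps(manifest, sort_keys=True, separators=(",", ":"), ensure_ascii=True)


def manifest_digest(manifest):
    """SHA-256 of the canonical serialization, usable as a content fingerprint."""
    return hashlib.sha256(serialize_manifest(manifest).encode("utf-8")).hexdigest()


# Same content, different key order.
a = {"agent_id": "adverse_event_extractor", "base_model": "gpt-4o", "performance_metric": "0.94"}
b = {"performance_metric": "0.94", "base_model": "gpt-4o", "agent_id": "adverse_event_extractor"}

print(serialize_manifest(a) == serialize_manifest(b))
print(manifest_digest(a) == manifest_digest(b))
```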

## 5. Technical Specifications (API)

### 5.1 The Interface

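```python
from __future__ import annotations

from abc import ABC, abstractmethod
from typing import List

from pydantic import BaseModel

# Construct, Example, and OptimizedManifest are assumed to be defined
# elsewhere in the package.


class OptimizerConfig(BaseModel):
    target_model: str = "gpt-4o"
    metric: str = "exact_match"
    max_bootstrapped_demos: int = 4
    max_rounds: int = 10


class PromptOptimizer(ABC):
    @abstractmethod
    def compile(
        self,
        agent: Construct,
        trainset: List[Example],
        valset: List[Example],
    ) -> OptimizedManifest:
        """Run the optimization loop."""
        pass
```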

### 5.2 The CLI (coreason-opt)

The package should expose a command-line interface for CI/CD integration:

* `coreason-opt tune --agent src/agents/analyst.py --dataset data/gold_set.csv`
* `coreason-opt evaluate --manifest dist/analyst_v2.json --dataset data/test_set.csv`
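
The two subcommands above could be wired up as in the sketch below. It uses the standard library's `argparse` purely as a dependency-free stand-in for illustration; it is not the package's CLI code, and the `build_parser` helper and help strings are hypothetical.

```python
import argparse


def build_parser():
    """Build a parser with `tune` and `evaluate` subcommands."""
    parser = argparse.ArgumentParser(prog="coreason-opt")
    subcommands = parser.add_subparsers(dest="command", required=True)

    tune = subcommands.add_parser("tune", help="Optimize an agent against a dataset")
    tune.add_argument("--agent", required=True)
    tune.add_argument("--dataset", required=True)

    evaluate = subcommands.add_parser("evaluate", help="Score a manifest against a dataset")
    evaluate.add_argument("--manifest", required=True)
    evaluate.add_argument("--dataset", required=True)
    return parser


# Parse an explicit argument list so the sketch runs without a real command line.
args = build_parser().parse_args(
    ["tune", "--agent", "src/agents/analyst.py", "--dataset", "data/gold_set.csv"]
)
print(args.command, args.agent, args.dataset)
```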

## 6. Implementation Plan: Atomic Units of Change (AUC)

### Phase 1: Foundation

* **AUC-1: Scaffold & Configuration:** Project structure, `pyproject.toml`, and `OptimizerConfig` Pydantic models.
* **AUC-2: Abstract Base Classes:** Define `BaseOptimizer`, `BaseSelector` (for examples), and `BaseMutator` (for instructions).

### Phase 2: Data & Metrics

* **AUC-3: Dataset Loader:** Implement `Dataset` class that handles loading/splitting from CSV and `coreason-archive`.
* **AUC-4: Metric Adapter:** Create a wrapper that adapts `coreason-assay` functions into the format required by the optimization loop.
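
A metric adapter for AUC-4 might look like the sketch below. The signatures of the `coreason-assay` functions are not given here, so the external scorer (`case_insensitive_equal`, taking `expected=` and `actual=` and returning a bool) is a hypothetical example of a function whose shape differs from what the loop expects.

```python
from typing import Callable

# The shape assumed for the optimization loop: (prediction, reference) -> score in [0, 1].
Metric = Callable[[str, str], float]


def exact_match(prediction: str, reference: str) -> float:
    return 1.0 if prediction.strip() == reference.strip() else 0.0


def adapt_bool_scorer(scorer: Callable[..., bool]) -> Metric:
    """Wrap a scorer that takes (expected=, actual=) and returns a bool."""

    def metric(prediction: str, reference: str) -> float:
        return 1.0 if scorer(expected=reference, actual=prediction) else 0.0

    return metric


def case_insensitive_equal(expected: str, actual: str) -> bool:
    """Hypothetical external scorer with a different argument order and return type."""
    return expected.lower() == actual.lower()


METRICS = {
    "exact_match": exact_match,
    "case_insensitive": adapt_bool_scorer(case_insensitive_equal),
}
print(METRICS["exact_match"]("Paris", "paris"), METRICS["case_insensitive"]("Paris", "paris"))
```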

### Phase 3: The Strategies

* **AUC-5: Few-Shot Selector:** Implement logic to select examples using Semantic Similarity (via `coreason-foundry` embeddings) or Random Sampling.
* **AUC-6: Bootstrap Logic:** Implement the "Teacher-Student" loop where the model generates its own training data from input questions.
* **AUC-7: Instruction Mutator:** Implement the Meta-Prompt that analyzes failures and rewrites the system prompt.
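
The bootstrap step in AUC-6 can be sketched as follows: run a teacher over the training questions and keep only the outputs that score perfectly, up to a cap. The function `bootstrap_demos`, the demo dictionary shape, and `toy_teacher` (a stand-in for a teacher LLM) are assumptions for illustration.

```python
def exact_match(prediction, reference):
    return 1.0 if prediction == reference else 0.0


def bootstrap_demos(teacher, trainset, metric, max_demos=4):
    """Keep teacher outputs that score perfectly, up to max_demos."""
    demos = []
    for question, reference in trainset:
        prediction = teacher(question)
        if metric(prediction, reference) == 1.0:
            demos.append({"inputs": question, "output": prediction})
        if len(demos) == max_demos:
            break
    return demos


def toy_teacher(question):
    """Hypothetical teacher model; it gets one of the three answers wrong."""
    return {"2+2": "4", "3+3": "6", "5+5": "11"}[question]


trainset = [("2+2", "4"), ("5+5", "10"), ("3+3", "6")]
demos = bootstrap_demos(toy_teacher, trainset, exact_match, max_demos=4)
print(demos)
```

Filtering by the metric is what keeps incorrect teacher outputs out of the student's few-shot examples.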

### Phase 4: The Loop & Artifacts

* **AUC-8: The Compile Loop:** Connect the Mutators and Selectors into the main `compile()` orchestration method.
* **AUC-9: Manifest Serializer:** Logic to dump the final state to JSON.
* **AUC-10: CLI Entrypoint:** Build the `coreason-opt` command line tool.

## 7. Compliance & Safety

* **Audit Trail:** Every optimization run must log the `trace_id` of the experiments to `coreason-veritas`. We must be able to explain *why* the prompt changed.
* **Human-in-the-Loop Gate:** The `OptimizedManifest` is not automatically deployed. It is saved as a "Candidate" that requires a human to review the score improvement before promotion to production.