# DSPy Prompt Optimization

Offline prompt optimization for any repo with LLM calls.

## How It Works

```
Your app runs → logs LLM calls (input, output, score) → DSPy optimizes → better prompts → repeat
```

DSPy does NOT run in your app's runtime. It runs offline, reads your logs, and exports optimized prompts that you paste back into your code.

## Adding to a New Repo

### Step 1: Instrument your LLM calls

Log every LLM call as a record with these fields:
```json
{
  "id": "uuid",
  "timestamp": "ISO",
  "prompt_name": "query_router",  // which prompt template
  "input": { ... },               // what went in
  "output": { ... },              // what came out
  "score": 0.85,                  // quality signal (0-1)
  "latency_ms": 340,
  "tokens_used": 1200
}
```

Store the records anywhere you can query them later (SQLite, Postgres, JSON files).

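As a rough illustration of Step 1, here is a minimal sketch that writes one record per call into SQLite. The table name `llm_logs` matches the query used in Step 3; the helper name `log_llm_call`, the column types, and storing `input`/`output` as JSON text are assumptions, not part of this repo.

```python
import json
import sqlite3
import uuid
from datetime import datetime, timezone

# Assumed schema: one column per field of the log record above.
SCHEMA = """
CREATE TABLE IF NOT EXISTS llm_logs (
    id TEXT PRIMARY KEY,
    timestamp TEXT NOT NULL,
    prompt_name TEXT NOT NULL,
    input TEXT NOT NULL,      -- JSON-encoded
    output TEXT NOT NULL,     -- JSON-encoded
    score REAL,               -- may be filled in later, once feedback arrives
    latency_ms INTEGER,
    tokens_used INTEGER
)
"""

def log_llm_call(conn, prompt_name, input_data, output_data,
                 score=None, latency_ms=None, tokens_used=None):
    """Insert one LLM call record and return its id (hypothetical helper)."""
    conn.execute(SCHEMA)
    record_id = str(uuid.uuid4())
    conn.execute(
        "INSERT INTO llm_logs VALUES (?, ?, ?, ?, ?, ?, ?, ?)",
        (record_id, datetime.now(timezone.utc).isoformat(), prompt_name,
         json.dumps(input_data), json.dumps(output_data),
         score, latency_ms, tokens_used),
    )
    conn.commit()
    return record_id

conn = sqlite3.connect(":memory:")  # use a file path in a real app
call_id = log_llm_call(
    conn, "query_router",
    {"user_input": "top customers this week?", "context": "orders table"},
    {"response": "Acme and Globex lead by revenue.", "confidence": 0.9},
    score=0.85, latency_ms=340, tokens_used=1200,
)
```

Leaving `score` nullable lets the app log the call immediately and attach the quality signal once user feedback or an automated check is available.
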
### Step 2: Define DSPy Signatures

Map each LLM call to a signature:

```python
# signatures.py
import dspy

class YourPrompt(dspy.Signature):
    """One-line description of what this LLM call does."""
    user_input: str = dspy.InputField(desc="what the user asked")
    context: str = dspy.InputField(desc="retrieved context")
    response: str = dspy.OutputField(desc="the answer")
    confidence: float = dspy.OutputField(desc="0-1 confidence")
```

### Step 3: Write a data loader

```python
# data.py
import sqlite3
import dspy

def load_examples(db_path, min_score=0.7):
    """Load high-quality examples from your logs."""
    # Assumes a SQLite log table whose columns match the signature's field names.
    with sqlite3.connect(db_path) as conn:
        conn.row_factory = sqlite3.Row
        rows = conn.execute("SELECT * FROM llm_logs WHERE score >= ?", (min_score,)).fetchall()
    return [dspy.Example(**dict(row)).with_inputs("user_input", "context") for row in rows]
```

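The loader unpacks each row straight into `dspy.Example`, which only lines up with the signature if the row's keys are the field names. Step 1's record nests those fields under `input` and `output`, so a flattening step may be needed first. A minimal sketch, where `flatten_record` and the sample row are hypothetical:

```python
import json

def flatten_record(row):
    """Merge a log row's input and output payloads into one flat dict."""
    fields = {}
    for column in ("input", "output"):
        value = row[column]
        # Values may arrive as JSON text (e.g. from SQLite) or as decoded dicts.
        fields.update(json.loads(value) if isinstance(value, str) else value)
    return fields

row = {
    "id": "uuid",
    "prompt_name": "query_router",
    "input": '{"user_input": "top customers?", "context": "orders table"}',
    "output": {"response": "Acme leads.", "confidence": 0.9},
    "score": 0.85,
}
fields = flatten_record(row)
# With DSPy installed: dspy.Example(**fields).with_inputs("user_input", "context")
```

Bookkeeping columns such as `id` and `latency_ms` are dropped on purpose, so the example carries only what the signature declares.
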
### Step 4: Define a metric

```python
# optimize.py
# DSPy calls metrics as metric(example, prediction, trace)
def my_metric(example, prediction, trace=None):
    """How good is this prediction? Return 0-1."""
    score = 0.0
    if prediction.response and len(prediction.response) > 10:
        score += 0.5  # non-trivial response
    # some_quality_check is a placeholder for your own domain check
    if some_quality_check(prediction.response, example):
        score += 0.5  # domain-specific quality
    return score
```

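To make the quality check concrete, here is one possible stand-in: keyword overlap between the prediction and the logged response. The function names and the 0.5 threshold are illustrative assumptions. DSPy passes the gold example first and the prediction second, and the sketch follows that order. `SimpleNamespace` objects stand in for `dspy.Example` and `dspy.Prediction` so the snippet runs without DSPy.

```python
from types import SimpleNamespace

def keyword_overlap(response, expected, threshold=0.5):
    """Hypothetical quality check: share of expected words found in the response."""
    expected_words = set(expected.lower().split())
    if not expected_words:
        return False
    found = expected_words & set(response.lower().split())
    return len(found) / len(expected_words) >= threshold

def overlap_metric(example, prediction, trace=None):
    """Score 0-1: half for a non-trivial response, half for keyword overlap."""
    response = prediction.response or ""
    score = 0.0
    if len(response) > 10:
        score += 0.5
    if keyword_overlap(response, example.response):
        score += 0.5
    return score

# Stand-ins for dspy.Example / dspy.Prediction, which expose fields as attributes.
example = SimpleNamespace(response="refunds take five business days")
good = SimpleNamespace(response="Refunds usually take five business days to post.")
empty = SimpleNamespace(response="")

good_score = overlap_metric(example, good)
empty_score = overlap_metric(example, empty)
```

Word overlap is a crude proxy. It is meant only to show where a domain-specific check plugs in.
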
### Step 5: Run

```bash
./scripts/dspy/setup.sh
source scripts/dspy/.venv/bin/activate
python scripts/dspy/optimize.py
# → outputs optimized_state.json
# → paste improved prompt back into your code
```

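The contents of `optimize.py` are not shown here, so the following is only a sketch of what such a script might do, assuming a single `dspy.Predict` program and the `BootstrapFewShot` optimizer. The `DSPY_MODEL` environment variable, the 80/20 split, and both function names are assumptions. DSPy is imported inside `optimize` so the split helper can be tried without it.

```python
import os
import random

def split_examples(examples, dev_fraction=0.2, seed=0):
    """Shuffle deterministically and split into (trainset, devset)."""
    shuffled = list(examples)
    random.Random(seed).shuffle(shuffled)
    n_dev = max(1, int(len(shuffled) * dev_fraction))
    return shuffled[n_dev:], shuffled[:n_dev]

def optimize(signature, trainset, metric, out_path="optimized_state.json"):
    """Compile a dspy.Predict program with BootstrapFewShot and save its state."""
    import dspy  # imported lazily; requires DSPy and provider credentials
    from dspy.teleprompt import BootstrapFewShot

    # DSPY_MODEL is a hypothetical env var holding a provider/model identifier.
    dspy.configure(lm=dspy.LM(os.environ["DSPY_MODEL"]))
    optimizer = BootstrapFewShot(metric=metric, max_bootstrapped_demos=4)
    compiled = optimizer.compile(dspy.Predict(signature), trainset=trainset)
    compiled.save(out_path)  # a .json path stores the program state as JSON
    return compiled

# Placeholder data; in practice pass the examples returned by the Step 3 loader.
trainset, devset = split_examples(range(10))
```

Holding out a dev set gives the regression check something to score against that the optimizer never saw.
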
## File Structure

```
scripts/dspy/
├── setup.sh              # Create venv, install deps
├── requirements.txt      # dspy + provider SDK
├── signatures.py         # DSPy I/O contracts (per-repo)
├── data.py               # Load training data from your DB (per-repo)
├── optimize.py           # Run optimization + export
├── eval.py               # CI regression detection
├── loop.sh               # Cron wrapper with threshold gating
├── optimized_state.json  # Output (gitignored)
└── .last_run             # Tracks audit row count (gitignored)
```

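`eval.py` is described only as CI regression detection. One way such a check could work is to score the current prompts on a held-out set and fail the build when the average falls below a stored baseline. In this sketch the function names and the 0.02 tolerance are assumptions, and `predict` stands for whatever callable runs the compiled program.

```python
def average_score(metric, examples, predict):
    """Mean metric value of predict(example) over a dev set."""
    scores = [metric(ex, predict(ex)) for ex in examples]
    return sum(scores) / len(scores) if scores else 0.0

def has_regressed(current, baseline, tolerance=0.02):
    """True when the current score is more than `tolerance` below baseline."""
    return current < baseline - tolerance

def exit_code(current, baseline, tolerance=0.02):
    """Process exit code for CI: 0 passes the job, 1 fails it."""
    return 1 if has_regressed(current, baseline, tolerance) else 0

# Toy stand-ins: examples are ints, and the "model" gets one of three wrong.
toy_metric = lambda ex, pred: 1.0 if pred == ex else 0.0
toy_predict = lambda ex: ex if ex != 2 else 0
current = average_score(toy_metric, [1, 2, 3], toy_predict)
```

A CI step would then end with something like `sys.exit(exit_code(current, baseline))`, with the baseline read from a committed file.
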
## Per-Repo Examples

### ProvenantAI (Node/Express/Postgres)

```python
# signatures.py
import dspy

class QueryRoute(dspy.Signature):
    """Route a natural language query to data sources and generate a response."""
    prompt: str = dspy.InputField()
    org_context: str = dspy.InputField()
    response: str = dspy.OutputField()
    sources_used: str = dspy.OutputField()

class LeadScore(dspy.Signature):
    """Score a sales lead 0-100 based on signals."""
    name: str = dspy.InputField()
    title: str = dspy.InputField()
    company: str = dspy.InputField()
    signals: str = dspy.InputField(desc="visit count, page views, engagement")
    score: int = dspy.OutputField(desc="0-100 lead score")
    reasoning: str = dspy.OutputField()

class BriefingGeneration(dspy.Signature):
    """Generate a daily executive briefing from org data."""
    kpis: str = dspy.InputField()
    alerts: str = dspy.InputField()
    recent_activity: str = dspy.InputField()
    briefing: str = dspy.OutputField()

# data.py
import os

import dspy
import psycopg2

DATABASE_URL = os.environ["DATABASE_URL"]

def load_from_postgres():
    conn = psycopg2.connect(DATABASE_URL)
    # psycopg2 connections have no execute(); queries go through a cursor
    with conn.cursor() as cur:
        cur.execute("""
            SELECT prompt, response, feedback_score
            FROM query_logs
            WHERE feedback_score IS NOT NULL
            ORDER BY created_at DESC LIMIT 200
        """)
        rows = cur.fetchall()
    conn.close()
    return [dspy.Example(...) for row in rows]
```

### Generic RAG App

```python
import dspy

class RAGAnswer(dspy.Signature):
    """Answer a question using retrieved documents."""
    question: str = dspy.InputField()
    documents: str = dspy.InputField()
    answer: str = dspy.OutputField()
    citations: str = dspy.OutputField()

# Metric: answer cites at least one source document and is non-trivial
def rag_metric(ex, pred, trace=None):
    has_citation = any(doc_id in pred.citations for doc_id in ex.doc_ids)
    # bool() keeps an empty or missing answer from breaking the arithmetic below
    is_relevant = bool(pred.answer) and len(pred.answer) > 20
    return 0.5 * has_citation + 0.5 * is_relevant
```

## Cron Installation

```bash
# Add to crontab (every 4h at minute 23; loop.sh skips the run if there is not enough new data)
(crontab -l 2>/dev/null; echo "23 */4 * * * /path/to/repo/scripts/dspy/loop.sh") | crontab -
```

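The threshold gating in `loop.sh` is not spelled out, so here is a sketch of the idea in Python: compare the current log row count with the count stored in `.last_run`, and proceed only when enough new rows have arrived. The threshold of 50 rows, the function names, and the use of SQLite are assumptions.

```python
import sqlite3
import tempfile
from pathlib import Path

def read_last_count(state_path):
    """Row count recorded by the previous run, or 0 on the first run."""
    path = Path(state_path)
    return int(path.read_text().strip()) if path.exists() else 0

def should_run(current_count, last_count, min_new_rows=50):
    """Gate: run only when enough new rows have accumulated."""
    return current_count - last_count >= min_new_rows

def gate(conn, state_path, min_new_rows=50):
    """Return True and record the new row count when a run is due."""
    current = conn.execute("SELECT COUNT(*) FROM llm_logs").fetchone()[0]
    if not should_run(current, read_last_count(state_path), min_new_rows):
        return False
    Path(state_path).write_text(str(current))
    return True

# Demo with an in-memory log table holding 60 rows.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE llm_logs (id INTEGER PRIMARY KEY)")
conn.executemany("INSERT INTO llm_logs (id) VALUES (?)", [(i,) for i in range(60)])
state_path = Path(tempfile.mkdtemp()) / ".last_run"

first = gate(conn, state_path)   # 60 new rows, so a run is due
second = gate(conn, state_path)  # no new rows since the first call, so it skips
```

For brevity the sketch records the count before optimization runs. A real wrapper would probably update `.last_run` only after a successful run, so a failed run is retried on the next tick.
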
## Cost

| Model  | Per Run | Daily (6 runs) | Monthly (30 days) |
|--------|---------|----------------|-------------------|
| Haiku  | ~$0.05  | ~$0.30         | ~$9               |
| Sonnet | ~$0.50  | ~$3.00         | ~$90              |

Use Haiku for routine optimization, Sonnet for periodic deep optimization.
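
The table's figures follow from the 4-hour cron schedule: 6 runs per day over a 30-day month. The short calculation below reproduces them and estimates a mixed schedule. The mix itself (one Sonnet run per week on top of routine Haiku runs) is a hypothetical, and the per-run prices are the approximate values from the table.

```python
RUNS_PER_DAY = 24 // 4  # cron fires every 4 hours
DAYS_PER_MONTH = 30

def monthly_cost(per_run_usd, runs_per_day=RUNS_PER_DAY, days=DAYS_PER_MONTH):
    """Approximate monthly spend in USD for a fixed per-run cost."""
    return round(per_run_usd * runs_per_day * days, 2)

haiku = monthly_cost(0.05)
sonnet = monthly_cost(0.50)

# Hypothetical mix: Haiku on the 4-hour schedule plus one Sonnet run per week.
mixed = round(haiku + 0.50 * 4, 2)
```

Under these assumptions, the mixed schedule costs only a little more than Haiku alone.
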