docs: enhance documentation on parameter tuning and experiment tracking

mceachen · mceachen · commit 1398e9ae50c7 · 2026-02-11T18:53:40.000-08:00
diff --git a/_todo/20260211-build-speed-optimization.md b/_todo/20260211-build-speed-optimization.md
@@ -12,8 +12,9 @@ Reduce DiskANN index build time from 707s to <150s for 25k vectors through param
 - [x] Test-First Development
 - [x] Implementation (Cache + Hash Set)
 - [x] Integration (Cache into insert path)
-- [x] **Fix Test Failures** ✅
-- [ ] Cleanup & Documentation
+- [x] Fix Test Failures
+- [x] **Documentation & Analysis** ✅
+- [ ] Further Optimization (max_neighbors tuning)
 - [ ] Final Review
 
 ## Required Reading
@@ -1017,6 +1018,274 @@ The flaky recall test revealed **positive news**: block size fix made the graph
    - Verify cache is being used (add debug prints for hit/miss counts)
    - Check if reusable_blob optimization is interfering with cache
 
+---
+
+## Handoff Summary (Session 2026-02-12 - Documentation & Analysis)
+
+**Context:** This session focused on documenting parameters, creating experiment tracking infrastructure, analyzing benchmark results, and identifying next optimization targets.
+
+**Major Accomplishments:**
+
+### 1. **Comprehensive Parameter Documentation** (3 hours)
+
+Created user-facing documentation explaining parameter mutability and recommendations:
+
+**Files Created:**
+- `PARAMETERS.md` (340 lines) - Complete parameter guide
+  - 🔒 IMMUTABLE parameters (dimensions, metric, max_neighbors, block_size)
+  - ⚠️ SEMI-MUTABLE parameters (insert_list_size, pruning_alpha)
+  - ✅ RUNTIME MUTABLE parameters (search_list_size - overridable per-query)
+  - Recommended values by use case (text embeddings, images, small/large datasets)
+  - How to change parameters safely
+  - Common mistakes and pitfalls
+  - Performance tuning checklist
+
+**Files Modified:**
+- `src/types.ts` - Enhanced TypeScript interfaces with mutability indicators
+  - Added 🔒⚠️✅ symbols to all parameter JSDoc
+  - Comprehensive examples for each option
+  - Links to PARAMETERS.md
+- `typedoc.json` (NEW) - TypeDoc configuration using PARAMETERS.md as readme
+- `CLAUDE.md` - Added 10-line reminder to document performance experiments
+
+**Key Documentation Insights:**
+- Parameters stored in `<table>_metadata` shadow table determine index lifecycle
+- `search_list_size` is the ONLY runtime-tunable parameter (via SQL constraint)
+- Changing `max_neighbors` requires full rebuild (determines block_size)
+- Block size formula: `f(dimensions, max_neighbors)` creates 40KB blocks for 256D/32-neighbors
+
+### 2. **Experiment Tracking System** (2 hours)
+
+Created structured system to document expensive performance experiments:
+
+**Files Created:**
+- `experiments/README.md` - Guidelines, best practices, experiment index
+- `experiments/template.md` - Structured format (hypothesis, setup, results, analysis, lessons)
+- `experiments/experiment-001-cache-hash-optimization.md` - Documented cache work
+
+**Experiment Index:**
+| ID | Title | Status | Key Finding |
+|----|-------|--------|-------------|
+| 001 | Cache + Hash Set Optimization | Complete | 37% speedup (not 5x as hoped) |
+| 002 | insert_list_size 200→100 | Complete | Only 2% improvement (cache masked) |
+| 003 | Block Size Impact | Planned | Test max_neighbors [24,32,48,64] |
+| 004 | Scaling Test 10k→200k | Planned | Find crossover vs brute-force |
+
+**Why This Matters:**
+- Performance experiments take hours to run
+- Future engineers need to know what was tried, what worked, what didn't
+- Prevents repeating failed experiments
+- Documents "why" behind parameter choices
+
+### 3. **Parameter Tuning Framework** (1 hour)
+
+Created benchmark profiles and methodology for finding optimal defaults:
+
+**Files Created:**
+- `benchmarks/profiles/param-sweep-insert-list.json` - Test [50,75,100,150,200]
+- `benchmarks/profiles/param-sweep-max-neighbors.json` - Test [24,32,48,64]
+- `benchmarks/profiles/scaling-test.json` - Test [10k,25k,50k,100k,200k]
+- `benchmarks/TUNING-GUIDE.md` - Complete methodology for parameter optimization
+
+**Tuning Strategy:**
+1. **Insert list sweep** - Find where recall plateaus (~30 min)
+2. **Max neighbors sweep** - Balance index size vs recall (~25 min)
+3. **Scaling test** - Find crossover point where DiskANN beats brute-force (~90 min)
+
+**Expected Crossover (based on O(log n) analysis):**
+- <50k vectors: sqlite-vec wins (brute force faster)
+- ~75k-100k: Crossover point (DiskANN becomes competitive)
+- 100k+: DiskANN dominates (logarithmic vs linear scaling)
+
+### 4. **Benchmark Analysis & Results**
+
+**Ran 2 benchmarks with 25k vectors:**
+
+```
+With insert_list_size=200 (old default):
+- Build: 442.4s (37% faster than 707s baseline)
+- Recall@10: 99.2%
+- QPS: 77
+
+With insert_list_size=100 (new default):
+- Build: 432.7s (39% faster than baseline)
+- Recall@10: 99.2% (unchanged)
+- QPS: 83
+- Improvement: Only 2.2% (cache masks the parameter effect!)
+```
+
+**Key Findings:**
+
+1. **Cache is VERY effective** - 37% speedup dominates all other optimizations
+2. **Cache masks insert_list_size** - Reducing BLOB reads has minimal impact when cache hit rate is high
+3. **Recall unexpectedly high** - 99.2% vs expected 95% (block size fix worked well)
+4. **DiskANN not competitive at 25k** - 83 QPS vs sqlite-vec's 206 QPS
+5. **Index bloat is the real problem** - 988MB for 25k vectors = 38x overhead
+
+**Index Size Breakdown:**
+```
+25k vectors × 40KB/block = 1GB index
+Raw vectors: 25k × 256D × 4 bytes = 25.6 MB
+Overhead: 38x!
+
+Root cause: max_neighbors=32 with 256D needs 40KB blocks
+Most nodes only use ~50% of allocated space
+```
+
+### 5. **Parameter Change**
+
+**Applied:**
+- `DEFAULT_INSERT_LIST_SIZE`: 200 → 100 (in `src/diskann_api.c`)
+- Rationale: Faster builds with no recall loss (validated by benchmarks)
+- Impact: New indices build ~2% faster (cache masks most of the benefit)
+
+**Proposed but NOT applied yet:**
+- `DEFAULT_MAX_NEIGHBORS`: 32 → 24 (would reduce index size by 30%)
+- Waiting for experiment-003 to validate recall impact
+
+### 6. **Documentation of Experiment 001**
+
+Captured full details of cache optimization work in structured format:
+
+**Hypothesis:** Cache would provide 5x speedup (707s → 140s)
+
+**Actual:** Cache provided 1.6x speedup (707s → 442s)
+
+**Why the gap?**
+1. Cache hit rate likely <60% (not measured, needs instrumentation)
+2. SQLite transaction overhead (not just BLOB I/O)
+3. Edge pruning cost may dominate after cache
+4. Baseline 707s may have been measured incorrectly
+
+**Lessons Learned:**
+- Always measure baseline carefully
+- Instrument production code (should have added cache hit rate logging)
+- Test in isolation (should have measured cache and hash set separately)
+- Profile before optimizing (should have used perf/gprof to find bottleneck)
+- Lower expectations (5x speedup predictions rarely materialize)
+
+## Tribal Knowledge Added
+
+### Parameter Mutability Deep Dive
+
+**All parameters are stored in metadata table** but have different mutability:
+- **Immutable:** dimensions, metric, max_neighbors, block_size (require full rebuild)
+- **Semi-mutable:** insert_list_size, pruning_alpha (require graph rebuild)
+- **Runtime mutable:** search_list_size (override per-query via SQL constraint)
+
+**Block size calculation:**
+```c
+node_overhead = 16 + (dimensions × 4)
+edge_overhead = (dimensions × 4) + 16
+margin = max_neighbors + (max_neighbors / 10)
+block_size = node_overhead + (margin × edge_overhead)
+```
+
+For 256D @ 32 max_neighbors: 40KB blocks
+For 256D @ 24 max_neighbors: 28KB blocks (30% smaller!)
+
+### Experiment Documentation Pattern
+
+**Before running expensive benchmark:**
+1. Copy `experiments/template.md`
+2. Fill in hypothesis, expected results, setup
+3. Run benchmark, save output
+4. Document actual results and analysis
+5. Update `experiments/README.md` index
+6. Explain WHY results differed from expectations
+
+**Critical:** Document failures, not just successes. Failed experiments prevent future engineers from repeating mistakes.
+
+### Cache Effectiveness Analysis
+
+**Cache provides 37% speedup** but only **2% additional benefit from insert_list_size reduction**.
+
+This means:
+- Cache hit rate is high enough to mask I/O reduction
+- Further optimization should focus on non-I/O bottlenecks (transaction overhead, edge pruning)
+- OR test at larger scale where cache capacity (100 entries) becomes limiting factor
+
+### Index Size is the Bottleneck
+
+**At 25k vectors:**
+- DiskANN: 988MB (38x overhead)
+- sqlite-vec: 25.6MB (raw data)
+
+**Impact:**
+- Slower builds (more bytes to write)
+- Larger disk footprint (storage cost)
+- Potentially slower queries (more cache pressure)
+
+**Solution:** Reduce max_neighbors (32 → 24) for 30% smaller index
+
+## Next Steps
+
+**Immediate Priority:**
+
+1. **Run Experiment 003: max_neighbors sweep** (~25 min)
+   ```bash
+   cd benchmarks
+   npm run bench -- --profile=profiles/param-sweep-max-neighbors.json > \
+     ../experiments/experiment-003-output.txt
+   ```
+   - Test max_neighbors = [24, 32, 48, 64]
+   - Expected: 24 gives 30% smaller index with <2% recall loss
+   - Document in `experiments/experiment-003-max-neighbors.md`
+
+2. **Run Experiment 004: scaling test** (~90 min)
+   ```bash
+   npm run bench -- --profile=profiles/scaling-test.json > \
+     ../experiments/experiment-004-output.txt
+   ```
+   - Test [10k, 25k, 50k, 100k, 200k] vectors
+   - Find crossover point where DiskANN beats brute-force
+   - Extrapolate to 500k+ for large-scale recommendations
+   - Document in `experiments/experiment-004-scaling.md`
+
+3. **Update defaults based on results**
+   - If max_neighbors=24 works: Update DEFAULT_MAX_NEIGHBORS in src/diskann_api.c
+   - Update PARAMETERS.md with measured results
+   - Update README.md with dataset size recommendations
+
+4. **Add cache instrumentation** (optional, for future debugging)
+   - Add hit/miss counters to diskann_insert.c
+   - Log cache stats after build
+   - Validates cache effectiveness assumptions
+
+**Expected Timeline:**
+- max_neighbors sweep: 25 min
+- Scaling test: 90 min
+- Analysis + documentation: 30 min
+- Update defaults + docs: 15 min
+- **Total: ~2.5 hours to complete optimization**
+
+**Success Criteria (from TPP):**
+- Build time: <150s for 25k vectors (currently 432s - NOT MET)
+- Recall: ≥93% @ k=10 (currently 99.2% - EXCEEDED)
+- Index size: Reasonable for production (currently 988MB/25k = 39MB/1k - BORDERLINE)
+
+**Blockers:** None
+
+**Risk:** May not hit <150s target without more aggressive optimization (parameter tuning alone may not be enough)
+
+## Artifacts
+
+- **Documentation:** PARAMETERS.md, benchmarks/TUNING-GUIDE.md, experiments/README.md
+- **Benchmark profiles:** param-sweep-*.json, scaling-test.json
+- **Experiment docs:** experiment-001-cache-hash-optimization.md
+- **Benchmark results:** results-2026-02-12T01-49-40-607Z.json (insert_list=200)
+- **Benchmark results:** results-2026-02-12T01-58-12-079Z.json (insert_list=100)
+
+---
+
+**For Next Engineer:**
+
+You have excellent documentation infrastructure now. Before running expensive benchmarks, DOCUMENT YOUR HYPOTHESIS in `experiments/`. The framework is ready - use it.
+
+The low-hanging fruit is **max_neighbors=24** (30% smaller index). Test it first. Then run scaling test to prove DiskANN's value at 100k+.
+
+Good luck! 🚀
+
 **Commands for Next Session:**
 
 ```bash