feat: add SQLite persistent storage for benchmark results (#26)#34
feat: add SQLite persistent storage for benchmark results (#26)#34Sugaria0427 wants to merge 1 commit into
Conversation
Replaces flat JSON/CSV log storage with a queryable SQLite database while maintaining full backward compatibility. New files: - utils/storage.py — Storage class wrapping Python stdlib sqlite3. Schema: runs (run_id, timestamp, config_json) + results (run_id FK, backend, turn, metric, value). API: save_run, get_run, list_runs, compare_runs. - utils/migrate_legacy_logs.py — one-shot idempotent migration script that imports existing experiment_logs/*.json into SQLite. Modified files: - evaluation/logger.py — log_run() now calls Storage().save_run() alongside existing JSON+CSV writes. list_runs() queries SQLite first, falls back to filesystem scan. Fixes pre-existing has_llm_eval bug in _append_csv_summary. - tests/test_pipeline.py — 6 new tests (4 Storage CRUD + 2 logger integration). - CHANGELOG.md — documented the new feature. - .gitignore — added experiment_logs/memorylens.db. Closes Neal006#26
Code Review — PR #34 (SQLite persistent storage)Thanks for the contribution! The overall design is clean and the backward-compatibility approach (JSON/CSV unchanged, SQLite as opt-in) is exactly right. I found 3 bugs that will cause crashes in production and 2 lower-priority issues that need addressing before merge. 🔴 Bug 1 — Duplicate
|
| # | Severity | File | Issue |
|---|---|---|---|
| 1 | 🔴 High | utils/storage.py |
Duplicate results rows corrupt get_run() on repeated run_id |
| 2 | 🔴 High | evaluation/logger.py |
IndexError when rows is empty in _append_csv_summary |
| 3 | 🔴 High | evaluation/logger.py |
Storage() connections never closed → file handle leak |
| 4 | 🟡 Medium | utils/storage.py |
None values in metric arrays break arithmetic downstream |
| 5 | 🟡 Low | tests/test_pipeline.py |
Integration tests leak connections and use raw SQL for cleanup |
Once the three 🔴 issues are fixed I'm happy to approve. The design and test coverage are solid — these are all mechanical fixes.
Summary
Replace flat JSON/CSV log storage with a queryable SQLite database for benchmark results, while maintaining full backward compatibility.
Related issue
Closes #26
Type of change
How was this tested?
Checklist