Problem
Current logging captures Quarto processing output but misses detailed Python execution information:
Missing from logs:
- Processing progress (file counts, validation results)
- Timing metrics (start/end times, duration per phase)
- Performance statistics (items/second, parallel worker utilization)
- Error details and warnings
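The "processing progress" and "error details and warnings" gaps could be covered by a per-file loop like the one below. This is a minimal sketch that assumes the scripts run some per-file validation step. `validate_files`, the `validate` callable and the `log_every` parameter are hypothetical names, not existing code in the .qmd scripts. Only `logging.Logger.warning`, `.exception` and `.info` are real standard-library calls.

```python
import logging
from pathlib import Path
from typing import Callable, Dict, Iterable


def validate_files(
    paths: Iterable[Path],
    validate: Callable[[Path], bool],  # hypothetical caller-supplied check
    logger: logging.Logger,
    log_every: int = 100,
) -> Dict[str, int]:
    """Validate each file, logging progress, skipped files and full error tracebacks."""
    counts = {"total": 0, "valid": 0, "skipped": 0, "errors": 0}
    for path in paths:
        counts["total"] += 1
        try:
            if validate(path):
                counts["valid"] += 1
            else:
                counts["skipped"] += 1
                logger.warning("skipped invalid file: %s", path)
        except Exception:
            counts["errors"] += 1
            # logger.exception logs at ERROR level and appends the traceback.
            logger.exception("error while validating %s", path)
        # Periodic progress line so long runs show they are still moving.
        if log_every > 0 and counts["total"] % log_every == 0:
            logger.info("progress: %d files processed", counts["total"])
    logger.info("validation finished: %s", counts)
    return counts
```

The returned counts map directly onto the `total_files`, `valid_files` and `errors` columns proposed below.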
Why it matters:
- Hard to debug issues in production runs
- Can't track performance improvements over time
- No baseline for optimization work
- Missing operational monitoring data
Proposed Solution
Add the Python logging module to both .qmd scripts for structured logging and benchmark tracking.
Key components:
- Python logging module with file and console handlers
- Benchmark tracking CSV: benchmarks/performance_log.csv
  - Columns: date, mode, total_files, valid_files, errors, duration_sec, items_per_sec
  - Append one row after each run for historical comparison
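The two key components could look roughly like this with only the standard-library `logging` and `csv` modules. This is a sketch, not the planned implementation. The CSV path and column list come from this issue. The logger name `stac_pipeline`, the function names and any log-file path are hypothetical.

```python
import csv
import logging
from pathlib import Path

# Column order follows the proposed benchmarks/performance_log.csv schema.
BENCHMARK_COLUMNS = [
    "date", "mode", "total_files", "valid_files",
    "errors", "duration_sec", "items_per_sec",
]


def setup_logging(log_file: Path, level: int = logging.INFO) -> logging.Logger:
    """Configure a logger that writes to both a log file and the console."""
    logger = logging.getLogger("stac_pipeline")  # hypothetical logger name
    logger.setLevel(level)
    # Don't also pass records to root handlers (avoids duplicate console lines).
    logger.propagate = False
    # Remove and close handlers from a previous call, e.g. when a cell is re-run.
    for handler in list(logger.handlers):
        logger.removeHandler(handler)
        handler.close()

    fmt = logging.Formatter("%(asctime)s %(levelname)s %(name)s: %(message)s")
    log_file.parent.mkdir(parents=True, exist_ok=True)
    file_handler = logging.FileHandler(log_file, encoding="utf-8")
    file_handler.setFormatter(fmt)
    console_handler = logging.StreamHandler()  # writes to stderr by default
    console_handler.setFormatter(fmt)

    logger.addHandler(file_handler)
    logger.addHandler(console_handler)
    return logger


def append_benchmark(csv_path: Path, row: dict) -> None:
    """Append one run's metrics, writing the header only when the file is new or empty."""
    csv_path.parent.mkdir(parents=True, exist_ok=True)
    write_header = not csv_path.exists() or csv_path.stat().st_size == 0
    with csv_path.open("a", newline="", encoding="utf-8") as fh:
        writer = csv.DictWriter(fh, fieldnames=BENCHMARK_COLUMNS)
        if write_header:
            writer.writeheader()
        writer.writerow(row)  # raises ValueError if row has keys outside the schema
```

Each script could call `setup_logging` once near the top and `append_benchmark` once at the end of a run. The `date` value could come from `datetime.now(timezone.utc).isoformat()`. Appending instead of overwriting keeps every run available for comparison.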
Capture:
- Phase timings (validation, item creation, collection update)
- File counts (total, valid, skipped, errors)
- Performance metrics (throughput, worker efficiency)
- Memory/resource usage (optional)
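Phase timings and the derived throughput and worker-efficiency figures could be collected with a small helper like the one below. This sketch rests on several assumptions. The phase names mirror the list above, and `RunMetrics` and `worker_efficiency` are hypothetical. Worker efficiency is taken as total busy time divided by (workers * wall-clock time), which is one common definition and not something this issue specifies.

```python
from __future__ import annotations

import logging
import time
from contextlib import contextmanager
from dataclasses import dataclass, field


@dataclass
class RunMetrics:
    """Hypothetical per-run container for phase timings and file counts."""

    phase_seconds: dict[str, float] = field(default_factory=dict)
    counts: dict[str, int] = field(
        default_factory=lambda: {"total": 0, "valid": 0, "skipped": 0, "errors": 0}
    )

    @contextmanager
    def phase(self, name: str, logger: logging.Logger | None = None):
        """Time a named phase; the duration is recorded even if the phase raises."""
        start = time.perf_counter()
        try:
            yield
        finally:
            elapsed = time.perf_counter() - start
            # Accumulate, so a phase entered more than once sums its durations.
            self.phase_seconds[name] = self.phase_seconds.get(name, 0.0) + elapsed
            if logger is not None:
                logger.info("phase=%s duration_sec=%.3f", name, elapsed)

    def total_seconds(self) -> float:
        # Assumes phases run sequentially and do not nest; nested phases would double count.
        return sum(self.phase_seconds.values())

    def items_per_sec(self, items: int) -> float:
        total = self.total_seconds()
        return items / total if total > 0 else 0.0


def worker_efficiency(busy_seconds: list[float], n_workers: int, wall_seconds: float) -> float:
    """Share of available worker time spent busy: sum(busy) / (n_workers * wall)."""
    if n_workers <= 0 or wall_seconds <= 0:
        return 0.0
    return sum(busy_seconds) / (n_workers * wall_seconds)
```

A run would wrap each step in `with metrics.phase("validation", logger): ...`. It would then pass `metrics.total_seconds()` and `metrics.items_per_sec(n)` into the benchmark row, where `n` is whichever count the scripts treat as processed items.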
Implementation Plan
- stac_create_collection.qmd
- stac_create_item.qmd
- benchmarks/ directory and CSV schema
Related
- Phase 4: Benchmarking & Monitoring (task_plan.md)
- NewGraphEnvironment/sred-2025-2026#8 (performance tracking)
Priority
Medium - Not blocking current work, but important for production monitoring and for gathering optimization evidence. Best tackled during Phases 3-4 (VM automation setup).