Add structured logging and performance benchmarking #6

@NewGraphEnvironment

Problem

Current logging captures Quarto processing output but not the details of Python execution.

Missing from logs:

  • Processing progress (file counts, validation results)
  • Timing metrics (start/end times, duration per phase)
  • Performance statistics (items/second, parallel worker utilization)
  • Error details and warnings

Why it matters:

  • Hard to debug issues in production runs
  • Can't track performance improvements over time
  • No baseline for optimization work
  • Missing operational monitoring data

Proposed Solution

Use Python's standard logging module in both .qmd scripts (stac_create_collection.qmd and stac_create_item.qmd) for structured logging and benchmark tracking.

Key components:

  • Python logging module with file and console handlers
  • Benchmark tracking CSV: benchmarks/performance_log.csv
      • Columns: date, mode, total_files, valid_files, errors, duration_sec, items_per_sec
      • One row appended after each run for historical comparison
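
The file-and-console handler setup listed above could look roughly like the sketch below. This is a minimal illustration, not the project's implementation: the function name `setup_logger`, the log format, and any log file path are assumptions chosen for the example. The guard against re-attaching handlers is included because notebook-style cells are often re-executed in the same session.

```python
import logging
import sys
from pathlib import Path


def setup_logger(name: str, log_path: Path, level: int = logging.INFO) -> logging.Logger:
    """Return a logger that writes to both a log file and the console.

    `setup_logger` and its arguments are hypothetical names for this sketch.
    """
    logger = logging.getLogger(name)
    logger.setLevel(level)
    logger.propagate = False  # avoid duplicate lines via the root logger

    # A cell may be re-run in the same session; attach handlers only once.
    if logger.handlers:
        return logger

    formatter = logging.Formatter(
        "%(asctime)s %(levelname)s %(name)s %(message)s",
        datefmt="%Y-%m-%dT%H:%M:%S",
    )

    # File handler: persistent record of the run.
    log_path.parent.mkdir(parents=True, exist_ok=True)
    file_handler = logging.FileHandler(log_path, encoding="utf-8")
    file_handler.setFormatter(formatter)
    logger.addHandler(file_handler)

    # Console handler: same messages in the rendered/terminal output.
    console_handler = logging.StreamHandler(sys.stdout)
    console_handler.setFormatter(formatter)
    logger.addHandler(console_handler)

    return logger
```

A call such as `log = setup_logger("stac_create_item", Path("logs/stac_create_item.log"))` (the path is an assumed example) followed by `log.info("total_files=%d valid_files=%d", total, valid)` would then write key=value style messages to both destinations.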

Capture:

  • Phase timings (validation, item creation, collection update)
  • File counts (total, valid, skipped, errors)
  • Performance metrics (throughput, worker efficiency)
  • Memory/resource usage (optional)
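
One possible way to capture the phase timings and throughput listed above is a small context manager around each phase. The names `timed_phase`, `items_per_sec`, and the logger name `stac_benchmark` are hypothetical and only serve this sketch; the phase names passed in would be whatever the scripts actually use.

```python
import logging
import time
from contextlib import contextmanager

log = logging.getLogger("stac_benchmark")  # hypothetical logger name


@contextmanager
def timed_phase(name: str, timings: dict):
    """Log the start and end of a phase and store its duration in `timings`."""
    start = time.perf_counter()
    log.info("phase=%s status=start", name)
    try:
        yield
    finally:
        # Recorded even if the phase raises, so failed runs still have timings.
        timings[name] = time.perf_counter() - start
        log.info("phase=%s status=end duration_sec=%.3f", name, timings[name])


def items_per_sec(n_items: int, duration_sec: float) -> float:
    """Throughput, guarding against a zero-length duration."""
    return n_items / duration_sec if duration_sec > 0 else 0.0
```

Wrapping each step as `with timed_phase("validation", timings): ...` leaves a dictionary of per-phase durations that can be summed for the run total and passed to `items_per_sec`.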

Implementation Plan

  • Add logging module to stac_create_collection.qmd
  • Add logging module to stac_create_item.qmd
  • Create benchmarks/ directory and CSV schema
  • Log key events: start, phase transitions, completion
  • Append performance summary to benchmarks CSV
  • Update CLAUDE.md with logging patterns
  • Test with Phase 1 (10 items) to verify log output
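
The "append performance summary" step could be sketched as below, using the column list from the proposed CSV schema. The function name `append_benchmark` is hypothetical, and two details are assumptions rather than decisions stated in this issue: `date` is written as a UTC ISO 8601 timestamp, and `items_per_sec` is computed from `valid_files`.

```python
import csv
from datetime import datetime, timezone
from pathlib import Path

# Column order taken from the schema proposed in this issue.
COLUMNS = [
    "date", "mode", "total_files", "valid_files",
    "errors", "duration_sec", "items_per_sec",
]


def append_benchmark(csv_path, mode, total_files, valid_files, errors, duration_sec):
    """Append one run summary row, writing the header if the file is new."""
    csv_path = Path(csv_path)
    csv_path.parent.mkdir(parents=True, exist_ok=True)
    write_header = not csv_path.exists() or csv_path.stat().st_size == 0

    row = {
        # Assumption: UTC ISO 8601 timestamp for the date column.
        "date": datetime.now(timezone.utc).isoformat(timespec="seconds"),
        "mode": mode,
        "total_files": total_files,
        "valid_files": valid_files,
        "errors": errors,
        "duration_sec": round(duration_sec, 3),
        # Assumption: throughput counts valid files only.
        "items_per_sec": round(valid_files / duration_sec, 3) if duration_sec > 0 else 0.0,
    }

    with csv_path.open("a", newline="", encoding="utf-8") as f:
        writer = csv.DictWriter(f, fieldnames=COLUMNS)
        if write_header:
            writer.writeheader()
        writer.writerow(row)
    return row
```

For the 10-item test run, a call might look like `append_benchmark("benchmarks/performance_log.csv", "test", 10, 10, 0, duration)`, where the `mode` value `"test"` is only a placeholder.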

Related

  • Phase 4: Benchmarking & Monitoring (task_plan.md)
  • NewGraphEnvironment/sred-2025-2026#8 (performance tracking)

Priority

Medium - Not blocking current work, but important for production monitoring and as evidence for optimization work. Best tackled during Phases 3-4 (VM automation setup).

Labels

enhancement
