fix: Bus event first ingestion skips historical events due to high-water mark #108

Description

@evansenter

Problem

The ingest_bus_events() function uses MAX(event_id) from the bus_events table as a high-water mark. On the first run, it applies a timestamp-based cutoff (days=7 default), ingests only recent events, then sets the high-water mark to the latest event ID. All subsequent runs — even with days=365 — only look for events with id > high_water_mark, permanently skipping older events.

This was discovered immediately after merging #107: startup ingestion with days=7 ingested 109 of 2,696 events, and a follow-up days=365 call found nothing new.
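
The failure mode can be reproduced in miniature. The following is only a sketch under assumptions: the real `ingest_bus_events()` body is not shown in this issue, so `naive_ingest`, the `SOURCE` list, and the two-column `bus_events` schema are hypothetical stand-ins that mimic the described behaviour (timestamp cutoff combined with a `MAX(event_id)` high-water mark).

```python
import sqlite3
from datetime import datetime, timedelta, timezone

# Fixed "now" so the example is deterministic (assumption for illustration).
NOW = datetime(2025, 1, 1, tzinfo=timezone.utc)

# Hypothetical upstream source: 10 events, one per day, ids 1..10 (oldest first).
SOURCE = [(i, (NOW - timedelta(days=10 - i)).isoformat()) for i in range(1, 11)]


def naive_ingest(conn, days):
    """Hypothetical stand-in: applies both the timestamp cutoff and the high-water mark."""
    cutoff = (NOW - timedelta(days=days)).isoformat()
    high_water = conn.execute(
        "SELECT COALESCE(MAX(event_id), 0) FROM bus_events"
    ).fetchone()[0]
    # Only events newer than the high-water mark AND inside the time window qualify.
    rows = [(eid, ts) for eid, ts in SOURCE if eid > high_water and ts >= cutoff]
    conn.executemany("INSERT OR IGNORE INTO bus_events VALUES (?, ?)", rows)
    conn.commit()
    return len(rows)


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE bus_events (event_id INTEGER PRIMARY KEY, ts TEXT)")

first = naive_ingest(conn, days=3)     # ingests only the 4 most recent events
second = naive_ingest(conn, days=365)  # finds nothing: ids 1..6 are below the mark
print(first, second)
```

After the first call the high-water mark sits at the newest event ID, so widening `days` on the second call cannot reach the older rows.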

Workaround

Clear bus_events and re-ingest:

storage.execute_write('DELETE FROM bus_events')
ingest_bus_events(storage, days=365)

Fix options

  1. Use a larger startup default — Change startup ingestion to days=365 so first run captures full history
  2. Separate first-run logic — When bus_events is empty, ignore the days parameter and ingest everything
  3. Always use timestamp cutoff — Don't use high-water mark at all; rely on INSERT OR IGNORE for dedup (slower but correct)

Option 2 seems cleanest: if the table is empty, ingest everything regardless of days.
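
A minimal sketch of what Option 2 might look like, assuming a plain `sqlite3` connection and an in-memory list of `(event_id, ts)` tuples as the source. The function name `ingest_with_first_run_backfill`, its parameters, and the schema are hypothetical and not taken from the codebase; the actual fix would go inside `ingest_bus_events()`.

```python
import sqlite3
from datetime import datetime, timedelta, timezone


def ingest_with_first_run_backfill(conn, source, now, days=7):
    """Hypothetical sketch: ignore `days` when bus_events is empty."""
    count, high_water = conn.execute(
        "SELECT COUNT(*), COALESCE(MAX(event_id), 0) FROM bus_events"
    ).fetchone()
    if count == 0:
        # First run: backfill the full history regardless of `days`.
        rows = list(source)
    else:
        # Later runs: keep the cheap high-water-mark path plus the cutoff.
        cutoff = (now - timedelta(days=days)).isoformat()
        rows = [(eid, ts) for eid, ts in source if eid > high_water and ts >= cutoff]
    conn.executemany(
        "INSERT OR IGNORE INTO bus_events (event_id, ts) VALUES (?, ?)", rows
    )
    conn.commit()
    return len(rows)


now = datetime(2025, 1, 1, tzinfo=timezone.utc)
source = [(i, (now - timedelta(days=10 - i)).isoformat()) for i in range(1, 11)]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE bus_events (event_id INTEGER PRIMARY KEY, ts TEXT)")

first = ingest_with_first_run_backfill(conn, source, now, days=3)   # full backfill
second = ingest_with_first_run_backfill(conn, source, now, days=3)  # nothing new
source.append((11, now.isoformat()))
third = ingest_with_first_run_backfill(conn, source, now, days=3)   # only event 11
print(first, second, third)
```

One trade-off of this approach: an empty table always triggers a full backfill, so the `days` parameter only takes effect once at least one event has been stored.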

Metadata

Labels: bug (Something isn't working)