A redesign of the replay log is needed so it is actually useful for audit, debugging, and re-running, and so it remains safe when multiple harmonization runs happen in parallel.
Current behavior and gaps:
The replay logger is a global logger named “ReplayLogger” that clears handlers every time it’s configured, so parallel runs can interfere. The event format is minimal—only { action, dataset }—with no run metadata, schema version, timestamps, or before/after values. Logging occurs once per rule per dataset (before transform), which is insufficient for step-by-step replay or debugging. The replay helper in utils/transformations.py replays rules without preserving ordering across datasets, and the log does not record input/output paths or other audit-relevant metadata.
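To make the interference concrete, the fragile pattern looks roughly like this (a sketch reconstructed from the description above, not the actual code; function names are illustrative):

```python
import logging

# Illustrative sketch of the current fragile pattern: a single
# process-wide logger whose handlers are wiped on every configuration.
def configure_replay_logger(log_path: str) -> logging.Logger:
    logger = logging.getLogger("ReplayLogger")   # shared global name
    logger.handlers.clear()                      # detaches handlers set up by other runs
    logger.addHandler(logging.FileHandler(log_path))
    logger.setLevel(logging.INFO)
    return logger

# Two "parallel" runs: the second clear() detaches run A's handler,
# so run A's later events land in run B's file (or nowhere).
run_a = configure_replay_logger("run_a.log")
run_b = configure_replay_logger("run_b.log")
```

Because `logging.getLogger` returns the same object for the same name, both runs share one logger, and whichever run configures it last wins.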
Why this matters:
A replay log should be self-contained, robust, and parallel-safe, and it should support auditing, debugging, and re-running. The current implementation does not.
Desired behavior:
- Replay log is self-contained and robust for audit/debug/re-run.
- Safe for multiple parallel runs (no shared global logger, no shared file).
- Includes input/output file paths and rules file path in metadata.
- Can optionally include before/after values (toggleable).
- Uses row identifiers (preferred) rather than row index.
- Schema version naming format: vN.N.N.
- Log is sufficient on its own for replay, though the rules file may exist.
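As one possible shape for such a log, the snippet below emits example JSONL events that satisfy the desired behavior (field names like `run_id`, `row_id`, and the paths are hypothetical placeholders, not a final schema):

```python
import json
import uuid
from datetime import datetime, timezone

# Hypothetical JSONL events; field names are illustrative, not final.
run_id = str(uuid.uuid4())
now = datetime.now(timezone.utc).isoformat()

events = [
    {"event": "run_start", "schema_version": "v1.0.0", "run_id": run_id,
     "timestamp": now, "input_path": "data/in.csv",
     "output_path": "data/out.csv", "rules_path": "rules.yaml"},
    {"event": "operation", "run_id": run_id, "timestamp": now,
     "dataset": "patients",
     "rule": {"action": "rename", "from": "dob", "to": "birth_date"},
     "row_id": "patient-0042",            # row identifier, not row index
     "before": "dob", "after": "birth_date"},  # optional, toggleable
    {"event": "run_end", "run_id": run_id, "timestamp": now, "status": "ok"},
]

log_text = "\n".join(json.dumps(e) for e in events)
print(log_text)
```

One event per line keeps the log appendable and parseable even if a run is interrupted mid-write.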
Potential direction (not final):
- Introduce a per-run ReplayLogger class (no shared global handlers).
- One log file per run; fail if it already exists unless overwrite is allowed.
- JSONL schema with run_start, operation, and run_end events; operation events include dataset, rule serialization, row identifier, and optional before/after values.
- Update replay tool to use this schema and validate version.
Acceptance criteria:
- Parallel runs do not interfere.
- Log schema is versioned and documented.
- Tests cover overwrite behavior, optional before/after values, and replay parsing.