fix: batch citations resolve against the universal ledger by KadenMc · Pull Request #25 · KadenMc/agentic-experiments

KadenMc · 2026-05-26T18:31:22Z

Summary

In 0.6.0 the job-id citation check was correctly taught to consult both the local signac project AND .aexp/ledger/<id>.json, but the batch-citation check still went through aexp.linking.list_batches(), which walks the local signac project only. The result: a finding citing {type: batch, experiment_id: E005, selector: {condition: X}} spuriously emitted finding.empty_batch whenever the matching terminal runs lived only in the ledger (e.g., backfilled from another machine).

The fix is purely additive: build an (experiment_id, condition) -> [job_ids] lookup from ledger entries and count those matches in the batch-resolution branch before falling through to the elsewhere/empty paths.

Surfaced by

The electricrag laptop validate during the 0.6 migration. After the cluster backfilled 20 ledger entries (including the 3 E005 batch members F005 cites), the laptop validator still emitted 3 finding.empty_batch errors — exactly because none of the cited E005 jobs had local workspaces; they existed only in the just-pulled ledger.

No behavioral change for existing cases

Local-only batch matches still resolve via list_batches.
Elsewhere (per-machine index) matches still emit finding.absent_batch_runs warnings.
Empty/broken batches with no matches anywhere still emit finding.empty_batch errors (or warnings under --strict-runs=warn).

Only new: ledger-only batch matches now correctly count as "here."

Regression test

tests/test_validate_cross_machine.py::test_validator_batch_citation_resolves_via_ledger_only writes a ledger entry with no corresponding local workspace and asserts a batch citation pointing at that entry's (experiment_id, condition) resolves cleanly.

The test's registered_machine field is documented inline as informational only — the validator does not consult it. This makes the Phase 2 design choice explicit: the ledger is universal; machine identity is provenance, not a query filter. (If we ever want machine-aware filtering for batch citations, it would be a selector extension — {type: batch, ..., selector: {condition: X, registered_machine: cluster}} — handled in _selector_matches. Separate feature, separate discussion.)

Version

0.6.0 → 0.6.1 (patch). PyPI history goes 0.5.0 → 0.6.1 since 0.6.0 was tagged in source but never published.

Test plan

poetry run pytest tests/test_validate_cross_machine.py tests/test_validate.py tests/test_ledger.py — 69 passed locally
poetry run ruff check src/ tests/ — clean
After merge: on the consumer (electricrag laptop), aexp validate drops from 4 errors → 1 error (just the remaining d84ce97... never-ran job citation, which is a data issue, not a validator bug).

When 0.6.0 shipped the cross-machine ledger, the job-id citation check was correctly taught to consult both the local signac project AND `.aexp/ledger/<id>.json` (via `known_job_ids = ledger_ids | local_ids`). But the batch-citation check still went through `aexp.linking.list_batches()`, which walks the local signac project only. The result: a finding citing a batch selector like `{type: batch, experiment_id: E005, selector: {condition: X}}` would spuriously emit `finding.empty_batch` whenever the matching terminal runs lived only in the ledger (e.g., backfilled from another machine). This was the exact symptom surfaced when migrating electricrag from aexp 0.2.1 -> 0.6.0: after the cluster backfilled 20 ledger entries including 3 E005 batch members, the laptop validator continued to emit `finding.empty_batch` for all 3 F005 batch citations because none of the cited E005 jobs had local workspaces — they existed only in the just-pulled ledger. The fix is purely additive in `_check_finding_citations`: 1. Build a `(experiment_id, condition) -> [job_ids]` lookup from ledger entries (one pass through `list_ledger_job_ids` + `load_ledger_entry`). 2. In the batch resolution branch, count ledger matches before falling through to the elsewhere/empty paths. No behavioral change for any existing case: - Local-only batch matches still resolve as before via list_batches. - Elsewhere (per-machine index) matches still emit `finding.absent_batch_runs` warnings. - Empty/broken batches with no matches anywhere still emit `finding.empty_batch` errors (or warnings under --strict-runs=warn). Only new: ledger-only batch matches now correctly count as "here." The new regression test `test_validator_batch_citation_resolves_via_ledger_only` writes a ledger entry with NO corresponding local workspace and asserts a batch citation pointing at that entry's (experiment_id, condition) resolves cleanly. The test's `registered_machine` field is informational only (commented inline) — the validator does not consult it during resolution. This makes the Phase 2 design choice explicit: the ledger is universal; machine identity is provenance, not a query filter. Version: 0.6.0 -> 0.6.1 (patch). PyPI history goes 0.5.0 -> 0.6.1 since 0.6.0 was tagged in source but never published.

KadenMc merged commit c27e2f6 into main May 26, 2026
6 checks passed

KadenMc deleted the fix/validator-batch-ledger-resolution branch May 26, 2026 18:35

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

fix: batch citations resolve against the universal ledger#25

fix: batch citations resolve against the universal ledger#25
KadenMc merged 1 commit into
mainfrom
fix/validator-batch-ledger-resolution

KadenMc commented May 26, 2026

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

Conversation

KadenMc commented May 26, 2026

Summary

Surfaced by

No behavioral change for existing cases

Regression test

Version

Test plan

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant