fix: batch citations resolve against the universal ledger#25
Merged
Conversation
When 0.6.0 shipped the cross-machine ledger, the job-id citation check
was correctly taught to consult both the local signac project AND
`.aexp/ledger/<id>.json` (via `known_job_ids = ledger_ids | local_ids`).
But the batch-citation check still went through
`aexp.linking.list_batches()`, which walks the local signac project
only. The result: a finding citing a batch selector like
`{type: batch, experiment_id: E005, selector: {condition: X}}` would
spuriously emit `finding.empty_batch` whenever the matching terminal
runs lived only in the ledger (e.g., backfilled from another machine).
This was the exact symptom surfaced when migrating electricrag from
aexp 0.2.1 -> 0.6.0: after the cluster backfilled 20 ledger entries
including 3 E005 batch members, the laptop validator continued to
emit `finding.empty_batch` for all 3 F005 batch citations because
none of the cited E005 jobs had local workspaces — they existed only
in the just-pulled ledger.
The fix is purely additive in `_check_finding_citations`:
1. Build a `(experiment_id, condition) -> [job_ids]` lookup from
ledger entries (one pass through `list_ledger_job_ids` +
`load_ledger_entry`).
2. In the batch resolution branch, count ledger matches before
falling through to the elsewhere/empty paths.
No behavioral change for any existing case:
- Local-only batch matches still resolve as before via list_batches.
- Elsewhere (per-machine index) matches still emit
`finding.absent_batch_runs` warnings.
- Empty/broken batches with no matches anywhere still emit
`finding.empty_batch` errors (or warnings under --strict-runs=warn).
Only new: ledger-only batch matches now correctly count as "here."
The new regression test
`test_validator_batch_citation_resolves_via_ledger_only` writes a
ledger entry with NO corresponding local workspace and asserts a
batch citation pointing at that entry's (experiment_id, condition)
resolves cleanly. The test's `registered_machine` field is
informational only (commented inline) — the validator does not
consult it during resolution. This makes the Phase 2 design choice
explicit: the ledger is universal; machine identity is provenance,
not a query filter.
Version: 0.6.0 -> 0.6.1 (patch). PyPI history goes 0.5.0 -> 0.6.1
since 0.6.0 was tagged in source but never published.
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Summary
In 0.6.0 the job-id citation check was correctly taught to consult both the local signac project AND
.aexp/ledger/<id>.json, but the batch-citation check still went throughaexp.linking.list_batches(), which walks the local signac project only. The result: a finding citing{type: batch, experiment_id: E005, selector: {condition: X}}spuriously emittedfinding.empty_batchwhenever the matching terminal runs lived only in the ledger (e.g., backfilled from another machine).The fix is purely additive: build an
(experiment_id, condition) -> [job_ids]lookup from ledger entries and count those matches in the batch-resolution branch before falling through to the elsewhere/empty paths.Surfaced by
The electricrag laptop validate during the 0.6 migration. After the cluster backfilled 20 ledger entries (including the 3 E005 batch members F005 cites), the laptop validator still emitted 3
finding.empty_batcherrors — exactly because none of the cited E005 jobs had local workspaces; they existed only in the just-pulled ledger.No behavioral change for existing cases
list_batches.finding.absent_batch_runswarnings.finding.empty_batcherrors (or warnings under--strict-runs=warn).Only new: ledger-only batch matches now correctly count as "here."
Regression test
tests/test_validate_cross_machine.py::test_validator_batch_citation_resolves_via_ledger_onlywrites a ledger entry with no corresponding local workspace and asserts a batch citation pointing at that entry's(experiment_id, condition)resolves cleanly.The test's
registered_machinefield is documented inline as informational only — the validator does not consult it. This makes the Phase 2 design choice explicit: the ledger is universal; machine identity is provenance, not a query filter. (If we ever want machine-aware filtering for batch citations, it would be a selector extension —{type: batch, ..., selector: {condition: X, registered_machine: cluster}}— handled in_selector_matches. Separate feature, separate discussion.)Version
0.6.0 → 0.6.1 (patch). PyPI history goes 0.5.0 → 0.6.1 since 0.6.0 was tagged in source but never published.
Test plan
poetry run pytest tests/test_validate_cross_machine.py tests/test_validate.py tests/test_ledger.py— 69 passed locallypoetry run ruff check src/ tests/— cleanaexp validatedrops from 4 errors → 1 error (just the remainingd84ce97...never-ran job citation, which is a data issue, not a validator bug).