Skip to content

DM-54879: Add --ignore-existing-metadata-for for when upstream outputs are selectively retained#561

Open
hsinfang wants to merge 4 commits into
mainfrom
tickets/DM-54879
Open

DM-54879: Add --ignore-existing-metadata-for for when upstream outputs are selectively retained#561
hsinfang wants to merge 4 commits into
mainfrom
tickets/DM-54879

Conversation

@hsinfang
Copy link
Copy Markdown
Contributor

@hsinfang hsinfang commented May 12, 2026

Checklist

  • ran Jenkins
  • ran and inspected package-docs build
  • added a release note for user-visible changes to doc/changes

@hsinfang hsinfang force-pushed the tickets/DM-54879 branch from d4f69bd to 4bf23e3 Compare May 12, 2026 22:31
@codecov
Copy link
Copy Markdown

codecov Bot commented May 12, 2026

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 88.80%. Comparing base (6cec3c9) to head (52abd2a).
⚠️ Report is 6 commits behind head on main.
✅ All tests successful. No failed tests found.

Additional details and impacted files
@@            Coverage Diff             @@
##             main     #561      +/-   ##
==========================================
+ Coverage   88.77%   88.80%   +0.02%     
==========================================
  Files         159      160       +1     
  Lines       22053    22217     +164     
  Branches     2623     2631       +8     
==========================================
+ Hits        19578    19729     +151     
- Misses       1839     1849      +10     
- Partials      636      639       +3     

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

@hsinfang hsinfang force-pushed the tickets/DM-54879 branch from 4bf23e3 to d6ef7aa Compare May 12, 2026 22:45
return zstandard.ZstdCompressionDict(b"")
self.comms.log.info("Training compression dictionary.")
training_inputs: list[bytes] = []
training_inputs: list[bytes | bytearray | memoryview[int]] = []
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm curious where this is coming from; AFAIK we don't use bytearray or memoryview[int] for any of these.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I took it from mypy's suggestion directly, but this has now been fixed by ac0bcb1

Comment thread tests/test_graphBuilder.py Outdated
Comment thread python/lsst/pipe/base/quantum_graph_builder.py Outdated
Comment thread python/lsst/pipe/base/quantum_graph_builder.py Outdated
self.butler = butler.clone(collections=input_collections)
self.output_run = output_run
self.skip_existing_in = skip_existing_in
self.ignore_metadata_for: frozenset[str] = frozenset(ignore_metadata_for)
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We probably want to raise here if ignore_metadata_for is not empty but skip_existing_in empty (since then ignore_metadata_for would have no effect).

@hsinfang hsinfang force-pushed the tickets/DM-54879 branch 2 times, most recently from 7481f5b to cf8760b Compare June 1, 2026 22:37
The skip_existing_in behavior of QuantumGraphBuilder was previously
only covered through test_separable_pipeline_executor.py, where
SeparablePipelineExecutor drives AllDimensionsQuantumGraphBuilder.
No tests exercised the builder directly at the unit level.
@hsinfang hsinfang force-pushed the tickets/DM-54879 branch from cf8760b to 3e4cff4 Compare June 3, 2026 17:39
hsinfang added 3 commits June 3, 2026 11:13
Daytime AP runs against data produced by Prompt Processing, which does not
retain all output types.  With --skip-existing-in, tasks whose metadata exists
are skipped even when their outputs are absent, confusing downstream tasks.

For tasks listed in the new retry_missing_outputs_for, the completion signal
is changed from task metadata to all non-metadata, non-log outputs existing
in skip_existing_in. If any are absent the quantum runs and regenerates them.

When all outputs are present the quantum is skipped, and
discard_output_in_the_way removes any copies already in output_run, so
the behavior is identical to the standard skip path in that case.
When outputs are absent, the quantum is not skipped and they cannot be
"in the way" in a fresh output_run.

_skip_quantum_if_metadata_exists is renamed to _skip_quantum_if_done,
because the old name baked in the implementation detail (checking metadata).
After retry_missing_outputs_for the function can also skip based on
regular outputs, so the name no longer described the behavior.
SeparablePipelineExecutor is not used by pipetask, but we might
as well extend the same option and get tested there.
@hsinfang hsinfang force-pushed the tickets/DM-54879 branch from 3e4cff4 to 52abd2a Compare June 3, 2026 18:23
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants