Problem
Dagu already captures the exact YAML used for a run, and operators can either retry that stored snapshot or start a fresh run with the same parameters. What is missing is a way to correct the captured run spec itself and then continue from the same partially completed run context.
This creates a gap when a failed run only needs a small, compatible change in the stored spec, such as adjusting a command, env value, or executor setting on an existing step. The current workaround is to edit the YAML and start a new run, which loses partial progress and existing run context. Manual step-status edits are also not enough, because they do not update the captured spec or provide a clean retry path from the edited definition.
This should be a scoped feature. Arbitrary edits are not safe once a run already has persisted step state, so the issue is specifically about allowing compatible edits to the captured run spec before retry, not unrestricted YAML mutation.
Expected Behavior
An operator should be able to open the stored spec for a DAG run, make compatible edits to that captured snapshot, and retry from the edited snapshot while preserving the current run context.
For v1, this should be limited to non-structural edits on existing steps. Changes that would make prior step state ambiguous, such as renaming steps, adding or removing steps, or changing dependencies, should be rejected. The retry should preserve prior progress and context from the existing run, including completed step state, outputs needed by downstream steps, run-scoped work artifacts, and step-level messages or approvals. Previous attempts should remain visible in history.
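The v1 rule above could be enforced by a check along these lines. This is only a minimal sketch: the `Step` type and the `checkCompatible` and `sameSet` functions are hypothetical names invented for illustration, not Dagu's actual types or API, and a real check would need to cover every field the project considers structural.

```go
package main

import (
	"fmt"
	"sort"
)

// Step is a hypothetical, simplified view of one step in a captured run spec.
// It is not Dagu's real step type.
type Step struct {
	Name    string
	Depends []string
	Command string
}

// checkCompatible returns nil when the edited steps differ from the original
// steps only in non-structural fields (here, Command). It rejects added,
// removed, renamed, or duplicated steps and any change to dependencies.
func checkCompatible(original, edited []Step) error {
	if len(original) != len(edited) {
		return fmt.Errorf("step count changed from %d to %d", len(original), len(edited))
	}
	remaining := make(map[string]Step, len(original))
	for _, s := range original {
		remaining[s.Name] = s
	}
	for _, e := range edited {
		o, ok := remaining[e.Name]
		if !ok {
			return fmt.Errorf("step %q is new, renamed, or duplicated", e.Name)
		}
		if !sameSet(o.Depends, e.Depends) {
			return fmt.Errorf("step %q changed its dependencies", e.Name)
		}
		// Each original step may be matched only once.
		delete(remaining, e.Name)
	}
	return nil
}

// sameSet reports whether a and b hold the same strings, ignoring order.
func sameSet(a, b []string) bool {
	if len(a) != len(b) {
		return false
	}
	x := append([]string(nil), a...)
	y := append([]string(nil), b...)
	sort.Strings(x)
	sort.Strings(y)
	for i := range x {
		if x[i] != y[i] {
			return false
		}
	}
	return true
}

func main() {
	original := []Step{
		{Name: "extract", Command: `./extract.sh "${date}"`},
		{Name: "transform", Depends: []string{"extract"}, Command: `./transform.sh "${date}"`},
		{Name: "load", Depends: []string{"transform"}, Command: `./load.sh "${date}"`},
	}

	// Compatible edit: only the command of an existing step changes.
	edited := append([]Step(nil), original...)
	edited[1].Command = `./transform.sh "${date}" --fix-null-handling`
	fmt.Println("command edit:", checkCompatible(original, edited))

	// Incompatible edit: renaming a step makes its prior state ambiguous.
	renamed := append([]Step(nil), original...)
	renamed[1].Name = "transform-v2"
	fmt.Println("rename:", checkCompatible(original, renamed))
}
```

Matching steps by name rather than by position means a reordered but otherwise identical step list would pass. Whether reordering should count as a structural change is a design decision this sketch leaves open.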
Example
name: nightly-import
params:
  - date
steps:
  - name: extract
    command: ./extract.sh "${date}"
  - name: transform
    depends: extract
    command: ./transform.sh "${date}" --fix-null-handling # edited before retry
  - name: load
    depends: transform
    command: ./load.sh "${date}"
Motivation
This closes an important operational gap between "retry the exact failed run" and "start over as a brand new run." It helps operators recover long-running workflows where earlier steps have already completed successfully and only a targeted fix is needed before continuing.
It also gives Dagu a clearer recovery model for run snapshots: compatible edits can continue the existing run context, while incompatible edits are rejected explicitly. That makes retry behavior more predictable and unblocks workflows where rerunning completed steps is expensive, slow, or has unwanted side effects.
Tests
- The PR must include integration tests under the intg package.