Support editing a captured DAG run spec before retrying a run #1876

@yottahmd

Problem

Dagu already captures the exact YAML used for a run, and operators can either retry that stored snapshot or start a fresh run with the same parameters. What is missing is a way to correct the captured run spec itself and then continue from the same partially completed run context.

This creates a gap when a failed run only needs a small, compatible change in the stored spec, such as adjusting a command, env value, or executor setting on an existing step. The current workaround is to edit the YAML and start a new run, which loses partial progress and existing run context. Manual step-status edits are also not enough, because they do not update the captured spec or provide a clean retry path from the edited definition.

This should be a scoped feature. Arbitrary edits are not safe once a run already has persisted step state, so the issue is specifically about allowing compatible edits to the captured run spec before retry, not unrestricted YAML mutation.

Expected Behavior

An operator should be able to open the stored spec for a DAG run, make compatible edits to that captured snapshot, and retry from the edited snapshot while preserving the current run context.

For v1, this should be limited to non-structural edits on existing steps. Changes that would make prior step state ambiguous, such as renaming steps, adding or removing steps, or changing dependencies, should be rejected. The retry should preserve prior progress and context from the existing run, including completed step state, outputs needed by downstream steps, run-scoped work artifacts, and step-level messages or approvals. Previous attempts should remain visible in history.
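The v1 rule above can be sketched as a structural comparison between the captured spec and the edited one. This is only an illustration under stated assumptions: the `Step` type and `CheckCompatible` function are hypothetical names, not part of Dagu's codebase, and a real implementation would compare whatever fields Dagu treats as structural.

```go
package main

import (
	"fmt"
	"sort"
)

// Step is a hypothetical, simplified view of one step in a run spec.
// It is not Dagu's actual step type.
type Step struct {
	Name    string
	Depends []string
	Command string
}

// CheckCompatible returns an error when edited changes the structure of
// original: a step was added, removed, renamed, or had its dependencies
// changed. Non-structural fields such as Command may differ freely.
// It assumes step names in original are unique.
func CheckCompatible(original, edited []Step) error {
	if len(original) != len(edited) {
		return fmt.Errorf("step count changed: %d -> %d", len(original), len(edited))
	}
	byName := make(map[string]Step, len(original))
	for _, s := range original {
		byName[s.Name] = s
	}
	seen := make(map[string]bool, len(edited))
	for _, e := range edited {
		o, ok := byName[e.Name]
		if !ok {
			return fmt.Errorf("step %q is not in the captured spec", e.Name)
		}
		if seen[e.Name] {
			return fmt.Errorf("step %q appears more than once", e.Name)
		}
		seen[e.Name] = true
		if !sameSet(o.Depends, e.Depends) {
			return fmt.Errorf("step %q: dependencies changed", e.Name)
		}
	}
	return nil
}

// sameSet reports whether a and b hold the same names, ignoring order.
func sameSet(a, b []string) bool {
	if len(a) != len(b) {
		return false
	}
	x := append([]string(nil), a...)
	y := append([]string(nil), b...)
	sort.Strings(x)
	sort.Strings(y)
	for i := range x {
		if x[i] != y[i] {
			return false
		}
	}
	return true
}

func main() {
	original := []Step{
		{Name: "extract", Command: "./extract.sh"},
		{Name: "transform", Depends: []string{"extract"}, Command: "./transform.sh"},
	}
	// Compatible edit: only the command of an existing step changes.
	edited := []Step{
		{Name: "extract", Command: "./extract.sh"},
		{Name: "transform", Depends: []string{"extract"}, Command: "./transform.sh --fix-null-handling"},
	}
	fmt.Println("command edit:", CheckCompatible(original, edited))

	// Incompatible edit: renaming a step makes prior step state ambiguous.
	edited[1].Name = "transform-v2"
	fmt.Println("rename:", CheckCompatible(original, edited))
}
```

Rejecting on the first structural difference keeps the error message specific, which matches the goal of rejecting incompatible edits explicitly rather than silently reinterpreting persisted step state.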

Example

name: nightly-import
params:
  - date

steps:
  - name: extract
    command: ./extract.sh "${date}"

  - name: transform
    depends: extract
    command: ./transform.sh "${date}" --fix-null-handling # edited before retry

  - name: load
    depends: transform
    command: ./load.sh "${date}"
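To make the intended retry behavior concrete, the sketch below selects which steps a retry of this example would execute. It assumes, purely for illustration, that `extract` succeeded, `transform` failed, and `load` never started; the issue does not state these statuses. The `Status` values and `StepsToRun` function are hypothetical and do not reflect Dagu's internal status model.

```go
package main

import "fmt"

// Status is a hypothetical step status, not Dagu's actual type.
type Status string

const (
	Succeeded  Status = "succeeded"
	Failed     Status = "failed"
	NotStarted Status = "not_started"
)

// StepsToRun returns, in spec order, the steps a retry would execute:
// every step whose previous status is not Succeeded. Steps missing from
// prev are treated as not yet run.
func StepsToRun(order []string, prev map[string]Status) []string {
	var run []string
	for _, name := range order {
		if prev[name] != Succeeded {
			run = append(run, name)
		}
	}
	return run
}

func main() {
	order := []string{"extract", "transform", "load"}
	prev := map[string]Status{
		"extract":   Succeeded, // kept as-is, together with its outputs
		"transform": Failed,    // rerun with the edited command
		"load":      NotStarted,
	}
	fmt.Println(StepsToRun(order, prev)) // prints [transform load]
}
```

Under these assumptions, `extract` is not rerun, so its completed state and outputs carry over to the edited `transform` step.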

Motivation

This closes an important operational gap between "retry the exact failed run" and "start over as a brand new run." It helps operators recover long-running workflows where earlier steps have already completed successfully and only a targeted fix is needed before continuing.

It also gives Dagu a clearer recovery model for run snapshots: compatible edits can continue the existing run context, while incompatible edits are rejected explicitly. That makes retry behavior more predictable and unblocks workflows where rerunning completed steps is expensive, slow, or has unwanted side effects.

Tests

  • The PR must include integration tests under the intg package.
