Dagu

Workflow Orchestration Engine

Single binary. No external dependencies. Scales from standalone to distributed cluster over gRPC.

(Screenshot: cockpit demo)

::: tip Try It Live
Explore without installing: Live Demo

Credentials: demouser / demouser
:::

What Dagu Does

Dagu is a workflow orchestration engine that runs as a single binary with no external databases or message brokers. Workflows are defined as DAGs (Directed Acyclic Graphs) in YAML. It supports local execution, cron scheduling, queue-based concurrency control, and distributed coordinator/worker execution across multiple machines over gRPC.

All state is stored in local files by default. There is nothing to install besides the binary itself.

Real-World Use Cases

Dagu is useful when scripts, containers, server jobs, or data tasks need visible dependencies, schedules, logs, retries, and a simple way to operate them.

Cron and Legacy Script Management

Run: existing shell scripts, Python scripts, HTTP calls, and scheduled jobs without rewriting them.

Why Dagu fits: dependencies, run status, logs, retries, and history become visible in the Web UI instead of being hidden across crontabs and server log files.

ETL and Data Operations

Run: PostgreSQL or SQLite queries, S3 transfers, jq transforms, validation steps, and reusable sub-workflows.

Why Dagu fits: daily data workflows stay declarative, observable, and easy to retry when one step fails.

Media Conversion

Run: ffmpeg, thumbnail extraction, audio normalization, image processing, and other compute-heavy jobs.

Why Dagu fits: conversion work can run across distributed workers while status, history, logs, and artifacts stay in one persistence layer for monitoring, debugging, and retries.

Infrastructure and Server Automation

Run: SSH backups, cleanup jobs, deploy scripts, patch windows, precondition checks, and lifecycle hooks.

Why Dagu fits: remote operations get schedules, retries, notifications, and per-step logs without requiring operators to SSH into servers for every recovery.

Container and Kubernetes Workflows

Run: Docker images, Kubernetes Jobs, shell glue, and follow-up validation steps.

Why Dagu fits: teams can compose image-based tasks and route them to the right workers without building a custom control plane.

Customer Support Automation

Run: diagnostics, account repair jobs, data checks, and approval-gated support actions.

Why Dagu fits: non-engineers can run reviewed workflows from the Web UI while engineers keep commands, logs, and results traceable.

IoT and Edge Workflows

Run: sensor polling, local cleanup, offline sync, health checks, and device maintenance jobs.

Why Dagu fits: the single binary and file-backed state work well on small devices while still providing visibility through the Web UI.

AI Agent Automation

Run: agent-authored YAML workflows, log analysis, repair steps, and human-reviewed automation.

Why Dagu fits: workflows are plain YAML files, so agents can create and debug them while humans review the definition and run history.

Architecture

Dagu runs in three configurations:

Standalone. A single dagu start-all process runs the HTTP server, scheduler, and executor. Suitable for single-machine deployments.

Coordinator/Worker. The scheduler enqueues jobs to a file-based queue, then dispatches them to a coordinator over gRPC. Workers long-poll the coordinator for tasks, execute DAGs locally, and report status back. Workers can run on separate machines, and tasks are routed to them based on labels. Mutual TLS secures gRPC communication between coordinator and workers.

Headless. Run without the web UI (DAGU_HEADLESS=true). Useful for CI/CD environments or when Dagu is managed through the CLI or API only.
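Each configuration maps to a small set of commands (all of which appear in the Quick Start and Distributed Execution sections); gathered here as a launch sketch:

```shell
# Standalone: HTTP server, scheduler, and executor in one process
dagu start-all

# Headless: same process, but without the web UI
DAGU_HEADLESS=true dagu start-all

# Coordinator/worker: start the coordinator on one machine...
dagu coord

# ...and workers on other machines, advertising capabilities via labels
DAGU_WORKER_LABELS=gpu=true dagu worker
```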

Standalone:

  ┌─────────────────────────────────────────┐
  │  dagu start-all                         │
  │  ┌───────────┐ ┌───────────┐ ┌────────┐│
  │  │ HTTP / UI │ │ Scheduler │ │Executor││
  │  └───────────┘ └───────────┘ └────────┘│
  │  File-based storage (logs, state, queue)│
  └─────────────────────────────────────────┘

Distributed:

  ┌────────────┐                   ┌────────────┐
  │ Scheduler  │                   │ HTTP / UI  │
  │            │                   │            │
  │ ┌────────┐ │                   └─────┬──────┘
  │ │ Queue  │ │  Dispatch (gRPC)        │
  │ │(file)  │ │─────────┐               │
  │ └────────┘ │         │               │
  └────────────┘         ▼               ▼
                    ┌─────────────────────────┐
                    │      Coordinator        │
                    │  (gRPC task dispatch,   │
                    │   worker registry,      │
                    │   health monitoring)    │
                    └────────┬────────────────┘
                             │
                   Poll (gRPC long-polling)
                             │
               ┌─────────────┼─────────────┐
               │             │             │
          ┌────▼───┐    ┌────▼───┐    ┌────▼───┐
          │Worker 1│    │Worker 2│    │Worker N│
          └────┬───┘    └────┬───┘    └────┬───┘
               │             │             │
               └─────────────┴─────────────┘
                 Heartbeat / ReportStatus /
                 StreamLogs (gRPC)

Quick Start

Install

::: code-group

# Linux / macOS (script installer)
curl -fsSL https://raw.githubusercontent.com/dagucloud/dagu/main/scripts/installer.sh | bash

# Windows (PowerShell installer)
irm https://raw.githubusercontent.com/dagucloud/dagu/main/scripts/installer.ps1 | iex

# Docker
docker run --rm -v ~/.dagu:/var/lib/dagu -p 8080:8080 ghcr.io/dagucloud/dagu:latest dagu start-all

# Homebrew
brew install dagu

# Helm
helm repo add dagu https://dagucloud.github.io/dagu
helm repo update
helm install dagu dagu/dagu --set persistence.storageClass=<your-rwx-storage-class>

:::

The script installers run a guided wizard that installs Dagu, adds it to your PATH, sets up a background service, and creates the initial admin account. Homebrew, Docker, and Helm install without the wizard. See the Installation Guide for all options.

Create and Run a Workflow

cat > hello.yaml << 'EOF'
steps:
  - command: echo "Hello from Dagu!"
  - command: echo "Step 2"
EOF

dagu start hello.yaml

Start the Server

dagu start-all

Visit http://localhost:8080

Built-in Step Types

Common built-in step types include:

| Step type | Purpose |
| --- | --- |
| `command`, `shell` | Local shell commands and scripts |
| `docker`, `container` | Run in a Docker container or exec into an existing container |
| `kubernetes`, `k8s` | Run a step as a Kubernetes workload |
| `harness` | Run CLI-based coding agents and custom harness adapters |
| `ssh` | Remote command execution |
| `sftp` | Remote file transfer |
| `http` | HTTP requests |
| `postgres`, `sqlite` | SQL queries |
| `redis` | Redis commands and scripts |
| `s3` | S3 object operations |
| `jq` | JSON transformation |
| `mail` | Email delivery |
| `archive` | Archive create/extract |
| `dag` | Sub-DAG execution |
| `router` | Route execution to downstream steps by value |
| `template` | Template rendering |
| `chat` | LLM chat completion |
| `agent` | Tool-using agent step |

DAGs can also declare reusable step_types that expand to built-in step types at load time. See Custom Step Types and Step Types for the exact configuration surface.

Scheduling and Reliability

| Feature | Details |
| --- | --- |
| Cron scheduling | Timezone support, multiple schedule entries per DAG |
| Overlap policies | skip (default), all (queue all), latest (keep only the most recent) |
| Catch-up scheduling | Automatically runs missed intervals when the scheduler was down |
| Zombie detection | Identifies and handles stalled DAG runs (configurable interval, default 45s) |
| Retry policies | Per-step retry with configurable limits, intervals, exit code filtering, and exponential/linear/constant backoff |
| Lifecycle hooks | onInit, onSuccess, onFailure, onAbort, onExit, onWait |
| Preconditions | Gate DAG or step execution on shell command results |
| Queue system | File-based persistent queue with configurable concurrency limits per queue |
| Scheduler HA | Lock with stale detection for failover across multiple scheduler instances |
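Preconditions gate execution on a shell command's output. A minimal sketch, assuming a `preconditions` list of condition/expected pairs; check the field names against the configuration reference:

```yaml
# Hedged sketch: only run the step when a readiness flag exists.
# The preconditions/condition/expected keys are assumptions to verify.
steps:
  - name: nightly-backup
    command: ./backup.sh
    preconditions:
      - condition: "test -f /data/ready.flag && echo ok"
        expected: "ok"
```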

Security and Access Control

Authentication

Four authentication modes, configured via DAGU_AUTH_MODE:

| Mode | Description |
| --- | --- |
| none | No authentication |
| basic | HTTP Basic authentication |
| builtin | JWT-based authentication with user management, API keys, and per-DAG webhook tokens |
| OIDC | OpenID Connect integration with any compliant identity provider |
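Switching modes is a single environment variable, as described above; for example:

```shell
# Pick one of the four documented modes before starting the server
export DAGU_AUTH_MODE=builtin
dagu start-all
```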

Role-Based Access Control

When using builtin auth, five roles control access:

| Role | Capabilities |
| --- | --- |
| admin | Full access including user management |
| manager | Create, edit, delete, run, stop DAGs; view audit logs |
| developer | Create, edit, delete, run, stop DAGs |
| operator | Run and stop DAGs only (no editing) |
| viewer | Read-only access |

API keys can be created with independent role assignments. Audit logging tracks all actions.

TLS and Secrets

  • TLS for the HTTP server (DAGU_CERT_FILE, DAGU_KEY_FILE)
  • Mutual TLS for gRPC coordinator/worker communication (DAGU_PEER_CERT_FILE, DAGU_PEER_KEY_FILE, DAGU_PEER_CLIENT_CA_FILE)
  • Secret management with three providers: environment variables, files, and HashiCorp Vault
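The variable names above come straight from the list; a sketch of a TLS-enabled deployment (certificate paths are illustrative placeholders):

```shell
# HTTPS for the web server (paths are examples, not defaults)
export DAGU_CERT_FILE=/etc/dagu/tls/server.crt
export DAGU_KEY_FILE=/etc/dagu/tls/server.key

# Mutual TLS for coordinator/worker gRPC
export DAGU_PEER_CERT_FILE=/etc/dagu/tls/peer.crt
export DAGU_PEER_KEY_FILE=/etc/dagu/tls/peer.key
export DAGU_PEER_CLIENT_CA_FILE=/etc/dagu/tls/ca.crt
```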

Observability

Prometheus Metrics

Dagu exposes Prometheus-compatible metrics at the /metrics endpoint:

| Metric | Description |
| --- | --- |
| dagu_dag_runs_total | Total DAG runs by status |
| dagu_dag_runs_total_by_dag | Per-DAG run counts |
| dagu_dag_run_duration_seconds | Histogram of run durations |
| dagu_dag_runs_currently_running | Active DAG runs |
| dagu_dag_runs_queued_total | Queued runs |
| dagu_queue_wait_time | Queue wait time histogram |
| dagu_uptime_seconds | Server uptime |
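On the Prometheus side, collecting these is a standard static scrape job; a minimal sketch, assuming the default port 8080:

```yaml
# prometheus.yml fragment: scrape Dagu's /metrics endpoint
scrape_configs:
  - job_name: dagu
    metrics_path: /metrics
    static_configs:
      - targets: ["localhost:8080"]
```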

OpenTelemetry

Per-DAG OpenTelemetry tracing configuration with OTLP endpoint, custom headers, resource attributes, and TLS options.

Structured Logging and Notifications

  • JSON or text format logging (DAGU_LOG_FORMAT), per-run log files with separate stdout/stderr capture per step
  • Slack and Telegram bot integration for run status events (succeeded, failed, aborted, waiting, rejected)
  • Email notifications on DAG success, failure, or wait status via SMTP
  • Per-DAG webhook endpoints with token authentication

Distributed Execution

The coordinator/worker architecture distributes DAG execution across multiple machines:

  • Coordinator: gRPC server managing task distribution, worker registry, and health monitoring
  • Workers: Connect to the coordinator, pull tasks via long-polling, execute DAGs locally, stream logs back
  • Worker labels: Route DAGs to specific workers based on labels (e.g., gpu=true, region=us-east-1)
  • Health checks: HTTP health endpoints on coordinator and workers for load balancer integration
  • Queue system: File-based persistent queue with configurable concurrency limits
# Start coordinator
dagu coord

# Start workers (on separate machines)
DAGU_WORKER_LABELS=gpu=true,memory=64G dagu worker

See the Distributed Execution documentation for setup details.
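On the DAG side, label routing is declared in the workflow definition. A hypothetical sketch, assuming a `workerSelector` key; confirm the exact field name in the Distributed Execution docs:

```yaml
# Hypothetical sketch: route this DAG to workers labeled gpu=true.
# The selector field name (workerSelector) is an assumption to verify.
workerSelector:
  gpu: "true"
steps:
  - name: train
    command: ./train.sh
```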

Workflow Examples

Parallel Execution with Dependencies

type: graph
steps:
  - id: extract
    command: ./extract.sh

  - id: transform_a
    command: ./transform_a.sh
    depends: [extract]

  - id: transform_b
    command: ./transform_b.sh
    depends: [extract]

  - id: load
    command: ./load.sh
    depends: [transform_a, transform_b]

Docker Step

steps:
  - name: build
    container:
      image: node:20-alpine
    command: npm run build

Retry with Exponential Backoff

steps:
  - name: flaky-api-call
    command: curl -f https://api.example.com/data
    retry_policy:
      limit: 3
      interval_sec: 10
      backoff: 2
      max_interval_sec: 120
    continue_on:
      failure: true

Scheduling with Overlap Control

schedule:
  - "0 */6 * * *"
overlap_policy: skip
timeout_sec: 3600
handler_on:
  failure:
    command: notify-team.sh
  exit:
    command: cleanup.sh

Sub-DAG Composition

steps:
  - name: extract
    call: etl/extract
    params: "SOURCE=s3://bucket/data.csv"

  - name: transform
    call: etl/transform
    params: "INPUT=${extract.outputs.result}"
    depends: [extract]

  - name: load
    call: etl/load
    params: "DATA=${transform.outputs.result}"
    depends: [transform]

SSH Remote Execution

steps:
  - name: deploy
    type: ssh
    config:
      host: prod-server.example.com
      user: deploy
      key: ~/.ssh/id_rsa
    command: cd /var/www && git pull && systemctl restart app

See Examples for more patterns.

Version-Controlled Workflows

Dagu supports Git sync to keep DAG definitions, agent markdown files, and managed documents under version control. Set DAGU_GITSYNC_ENABLED=true and provide a repository URL, and Dagu pulls tracked files from a Git branch. Optional auto-sync polls the repository at a configurable interval (default 300s). Both token and SSH authentication are supported.
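A sketch of a Git sync setup. Only DAGU_GITSYNC_ENABLED appears in this document; the other variable names below are hypothetical placeholders to replace with the real keys from the Git Sync docs:

```shell
# Documented flag
export DAGU_GITSYNC_ENABLED=true

# Hypothetical variable names for repository, branch, and poll
# interval -- check the Git Sync documentation for the actual keys.
export DAGU_GITSYNC_REPO_URL=git@github.com:acme/workflows.git
export DAGU_GITSYNC_BRANCH=main
export DAGU_GITSYNC_SYNC_INTERVAL=300s
```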

See Git Sync for configuration.

CLI Reference

| Command | Description |
| --- | --- |
| `dagu start <dag>` | Execute a DAG |
| `dagu start-all` | Start HTTP server + scheduler |
| `dagu server` | Start HTTP server only |
| `dagu scheduler` | Start scheduler only |
| `dagu coord` | Start coordinator (distributed mode) |
| `dagu worker` | Start worker (distributed mode) |
| `dagu stop <dag>` | Stop a running DAG |
| `dagu restart <dag>` | Restart a DAG |
| `dagu retry <dag> <run-id>` | Retry a failed run |
| `dagu dry <dag>` | Dry run (show what would execute) |
| `dagu status <dag>` | Show DAG run status |
| `dagu history <dag>` | Show execution history |
| `dagu validate <dag>` | Validate DAG YAML |
| `dagu enqueue <dag>` | Add DAG to the execution queue |
| `dagu dequeue <dag>` | Remove DAG from the queue |
| `dagu cleanup` | Clean up old run data |
| `dagu migrate` | Run database migrations |

Full CLI and environment variable reference: CLI | Configuration Reference

Learn More

  • Architecture and core concepts
  • Installation and first workflow
  • YAML syntax, scheduling, execution control
  • All configuration options
  • All 18 executor types
  • Coordinator/worker setup
  • RBAC, OIDC, API keys, audit logging
  • Deployment, configuration, operations
  • Community