Dagu

Workflow Orchestration Engine

Single binary. No external dependencies. Scales from standalone to distributed cluster over gRPC.

(Screenshot: cockpit demo)

::: tip Try It Live
Explore without installing: Live Demo

Credentials: demouser / demouser
:::

What Dagu Does

Dagu is a workflow orchestration engine that runs as a single binary with no external databases or message brokers. Workflows are defined as DAGs (Directed Acyclic Graphs) in YAML. It supports local execution, cron scheduling, queue-based concurrency control, and distributed coordinator/worker execution across multiple machines over gRPC.

All state is stored in local files by default. There is nothing to install besides the binary itself.

Real-World Use Cases

Dagu is useful when scripts, containers, server jobs, or data tasks need visible dependencies, schedules, logs, retries, and a simple way to operate them.

Cron and Legacy Script Management

Run: existing shell scripts, Python scripts, HTTP calls, and scheduled jobs without rewriting them.

Why Dagu fits: dependencies, run status, logs, retries, and history become visible in the Web UI instead of being hidden across crontabs and server log files.

ETL and Data Operations

Run: PostgreSQL or SQLite queries, S3 transfers, jq transforms, validation steps, and reusable sub-workflows.

Why Dagu fits: daily data workflows stay declarative, observable, and easy to retry when one step fails.

Media Conversion

Run: ffmpeg, thumbnail extraction, audio normalization, image processing, and other compute-heavy jobs.

Why Dagu fits: conversion work can run across distributed workers while status, history, logs, and artifacts stay in one persistence layer for monitoring, debugging, and retries.

Infrastructure and Server Automation

Run: SSH backups, cleanup jobs, deploy scripts, patch windows, precondition checks, and lifecycle hooks.

Why Dagu fits: remote operations get schedules, retries, notifications, and per-step logs without requiring operators to SSH into servers for every recovery.

Container and Kubernetes Workflows

Run: Docker images, Kubernetes Jobs, shell glue, and follow-up validation steps.

Why Dagu fits: teams can compose image-based tasks and route them to the right workers without building a custom control plane.

Customer Support Automation

Run: diagnostics, account repair jobs, data checks, and approval-gated support actions.

Why Dagu fits: non-engineers can run reviewed workflows from the Web UI while engineers keep commands, logs, and results traceable.

IoT and Edge Workflows

Run: sensor polling, local cleanup, offline sync, health checks, and device maintenance jobs.

Why Dagu fits: the single binary and file-backed state work well on small devices while still providing visibility through the Web UI.

AI Agent Automation

Run: agent-authored YAML workflows, log analysis, repair steps, and human-reviewed automation.

Why Dagu fits: workflows are plain YAML files, so agents can create and debug them while humans review the definition and run history.

Architecture

Dagu runs in three configurations:

Standalone. A single dagu start-all process runs the HTTP server, scheduler, and executor. Suitable for single-machine deployments.

Coordinator/Worker. The scheduler enqueues jobs to a file-based queue, then dispatches them to a coordinator over gRPC. Workers long-poll the coordinator for tasks, execute DAGs locally, and report status back. Workers can run on separate machines, and tasks are routed to them based on labels. Mutual TLS secures gRPC communication between coordinator and workers.

Headless. Run without the web UI (DAGU_HEADLESS=true). Useful for CI/CD environments or when Dagu is managed through the CLI or API only.
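Each configuration maps to a small set of commands (all of which appear in the Quick Start and Distributed Execution sections); gathered here as a launch sketch:

```shell
# Standalone: HTTP server, scheduler, and executor in one process
dagu start-all

# Headless: same process, but without the web UI
DAGU_HEADLESS=true dagu start-all

# Coordinator/worker: start the coordinator on one machine...
dagu coord

# ...and workers on other machines, advertising capabilities via labels
DAGU_WORKER_LABELS=gpu=true dagu worker
```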

Standalone:

  ┌─────────────────────────────────────────┐
  │  dagu start-all                         │
  │  ┌───────────┐ ┌───────────┐ ┌────────┐│
  │  │ HTTP / UI │ │ Scheduler │ │Executor││
  │  └───────────┘ └───────────┘ └────────┘│
  │  File-based storage (logs, state, queue)│
  └─────────────────────────────────────────┘

Distributed:

  ┌────────────┐                   ┌────────────┐
  │ Scheduler  │                   │ HTTP / UI  │
  │            │                   │            │
  │ ┌────────┐ │                   └─────┬──────┘
  │ │ Queue  │ │  Dispatch (gRPC)        │
  │ │(file)  │ │─────────┐               │
  │ └────────┘ │         │               │
  └────────────┘         ▼               ▼
                    ┌─────────────────────────┐
                    │      Coordinator        │
                    │  (gRPC task dispatch,   │
                    │   worker registry,      │
                    │   health monitoring)    │
                    └────────┬────────────────┘
                             │
                   Poll (gRPC long-polling)
                             │
               ┌─────────────┼─────────────┐
               │             │             │
          ┌────▼───┐    ┌────▼───┐    ┌────▼───┐
          │Worker 1│    │Worker 2│    │Worker N│
          └────┬───┘    └────┬───┘    └────┬───┘
               │             │             │
               └─────────────┴─────────────┘
                 Heartbeat / ReportStatus /
                 StreamLogs (gRPC)

Quick Start

Install

::: code-group

# Linux / macOS (script installer)
curl -fsSL https://raw.githubusercontent.com/dagucloud/dagu/main/scripts/installer.sh | bash

# Windows (PowerShell installer)
irm https://raw.githubusercontent.com/dagucloud/dagu/main/scripts/installer.ps1 | iex

# Docker
docker run --rm -v ~/.dagu:/var/lib/dagu -p 8080:8080 ghcr.io/dagucloud/dagu:latest dagu start-all

# Homebrew
brew install dagu

# Helm
helm repo add dagu https://dagucloud.github.io/dagu
helm repo update
helm install dagu dagu/dagu --set persistence.storageClass=<your-rwx-storage-class>

:::

The script installers run a guided wizard that installs Dagu, adds it to your PATH, sets up a background service, and creates the initial admin account. Homebrew, Docker, and Helm install without the wizard. See the Installation Guide for all options.

Create and Run a Workflow

cat > hello.yaml << 'EOF'
steps:
  - command: echo "Hello from Dagu!"
  - command: echo "Step 2"
EOF

dagu start hello.yaml

Start the Server

dagu start-all

Visit http://localhost:8080

Built-in Step Types

Common built-in step types include:

| Step type | Purpose |
| --- | --- |
| `command`, `shell` | Local shell commands and scripts |
| `docker`, `container` | Run in a Docker container or exec into an existing container |
| `kubernetes`, `k8s` | Run a step as a Kubernetes workload |
| `harness` | Run CLI-based coding agents and custom harness adapters |
| `ssh` | Remote command execution |
| `sftp` | Remote file transfer |
| `http` | HTTP requests |
| `postgres`, `sqlite` | SQL queries |
| `redis` | Redis commands and scripts |
| `s3` | S3 object operations |
| `jq` | JSON transformation |
| `mail` | Email delivery |
| `archive` | Archive create/extract |
| `dag` | Sub-DAG execution |
| `router` | Route execution to downstream steps by value |
| `template` | Template rendering |
| `chat` | LLM chat completion |
| `agent` | Tool-using agent step |

DAGs can also declare reusable step_types that expand to built-in step types at load time. See Custom Step Types and Step Types for the exact configuration surface.

Scheduling and Reliability

| Feature | Details |
| --- | --- |
| Cron scheduling | Timezone support, multiple schedule entries per DAG |
| Overlap policies | skip (default), all (queue all), latest (keep only the most recent) |
| Catch-up scheduling | Automatically runs missed intervals when the scheduler was down |
| Zombie detection | Identifies and handles stalled DAG runs (configurable interval, default 45s) |
| Retry policies | Per-step retry with configurable limits, intervals, exit code filtering, and exponential/linear/constant backoff |
| Lifecycle hooks | onInit, onSuccess, onFailure, onAbort, onExit, onWait |
| Preconditions | Gate DAG or step execution on shell command results |
| Queue system | File-based persistent queue with configurable concurrency limits per queue |
| Scheduler HA | Lock with stale detection for failover across multiple scheduler instances |
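Preconditions gate execution on a shell command's output. A minimal sketch, assuming a `preconditions` list of condition/expected pairs; check the field names against the configuration reference:

```yaml
# Hedged sketch: only run the step when a readiness flag exists.
# The preconditions/condition/expected keys are assumptions to verify.
steps:
  - name: nightly-backup
    command: ./backup.sh
    preconditions:
      - condition: "test -f /data/ready.flag && echo ok"
        expected: "ok"
```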

Security and Access Control

Authentication

Four authentication modes, configured via DAGU_AUTH_MODE:

| Mode | Description |
| --- | --- |
| none | No authentication |
| basic | HTTP Basic authentication |
| builtin | JWT-based authentication with user management, API keys, and per-DAG webhook tokens |
| OIDC | OpenID Connect integration with any compliant identity provider |
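Switching modes is a single environment variable, as described above; for example:

```shell
# Pick one of the four documented modes before starting the server
export DAGU_AUTH_MODE=builtin
dagu start-all
```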

Role-Based Access Control

When using builtin auth, five roles control access:

| Role | Capabilities |
| --- | --- |
| admin | Full access including user management |
| manager | Create, edit, delete, run, stop DAGs; view audit logs |
| developer | Create, edit, delete, run, stop DAGs |
| operator | Run and stop DAGs only (no editing) |
| viewer | Read-only access |

API keys can be created with independent role assignments. Audit logging tracks all actions.

TLS and Secrets

  • TLS for the HTTP server (DAGU_CERT_FILE, DAGU_KEY_FILE)
  • Mutual TLS for gRPC coordinator/worker communication (DAGU_PEER_CERT_FILE, DAGU_PEER_KEY_FILE, DAGU_PEER_CLIENT_CA_FILE)
  • Secret management with three providers: environment variables, files, and HashiCorp Vault
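The variable names above come straight from the list; a sketch of a TLS-enabled deployment (certificate paths are illustrative placeholders):

```shell
# HTTPS for the web server (paths are examples, not defaults)
export DAGU_CERT_FILE=/etc/dagu/tls/server.crt
export DAGU_KEY_FILE=/etc/dagu/tls/server.key

# Mutual TLS for coordinator/worker gRPC
export DAGU_PEER_CERT_FILE=/etc/dagu/tls/peer.crt
export DAGU_PEER_KEY_FILE=/etc/dagu/tls/peer.key
export DAGU_PEER_CLIENT_CA_FILE=/etc/dagu/tls/ca.crt
```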

Observability

Prometheus Metrics

Dagu exposes Prometheus-compatible metrics at the /metrics endpoint:

| Metric | Description |
| --- | --- |
| dagu_dag_runs_total | Total DAG runs by status |
| dagu_dag_runs_total_by_dag | Per-DAG run counts |
| dagu_dag_run_duration_seconds | Histogram of run durations |
| dagu_dag_runs_currently_running | Active DAG runs |
| dagu_dag_runs_queued_total | Queued runs |
| dagu_queue_wait_time | Queue wait time histogram |
| dagu_uptime_seconds | Server uptime |
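On the Prometheus side, collecting these is a standard static scrape job; a minimal sketch, assuming the default port 8080:

```yaml
# prometheus.yml fragment: scrape Dagu's /metrics endpoint
scrape_configs:
  - job_name: dagu
    metrics_path: /metrics
    static_configs:
      - targets: ["localhost:8080"]
```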

OpenTelemetry

Per-DAG OpenTelemetry tracing configuration with OTLP endpoint, custom headers, resource attributes, and TLS options.

Structured Logging and Notifications

  • JSON or text format logging (DAGU_LOG_FORMAT), per-run log files with separate stdout/stderr capture per step
  • Slack and Telegram bot integration for run status events (succeeded, failed, aborted, waiting, rejected)
  • Email notifications on DAG success, failure, or wait status via SMTP
  • Per-DAG webhook endpoints with token authentication

Distributed Execution

The coordinator/worker architecture distributes DAG execution across multiple machines:

  • Coordinator: gRPC server managing task distribution, worker registry, and health monitoring
  • Workers: Connect to the coordinator, pull tasks via long-polling, execute DAGs locally, stream logs back
  • Worker labels: Route DAGs to specific workers based on labels (e.g., gpu=true, region=us-east-1)
  • Health checks: HTTP health endpoints on coordinator and workers for load balancer integration
  • Queue system: File-based persistent queue with configurable concurrency limits
# Start coordinator
dagu coord

# Start workers (on separate machines)
DAGU_WORKER_LABELS=gpu=true,memory=64G dagu worker

See the Distributed Execution documentation for setup details.
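On the DAG side, label routing is declared in the workflow definition. A hypothetical sketch, assuming a `workerSelector` key; confirm the exact field name in the Distributed Execution docs:

```yaml
# Hypothetical sketch: route this DAG to workers labeled gpu=true.
# The selector field name (workerSelector) is an assumption to verify.
workerSelector:
  gpu: "true"
steps:
  - name: train
    command: ./train.sh
```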

Workflow Examples

Parallel Execution with Dependencies

type: graph
steps:
  - id: extract
    command: ./extract.sh

  - id: transform_a
    command: ./transform_a.sh
    depends: [extract]

  - id: transform_b
    command: ./transform_b.sh
    depends: [extract]

  - id: load
    command: ./load.sh
    depends: [transform_a, transform_b]

Docker Step

steps:
  - name: build
    container:
      image: node:20-alpine
    command: npm run build

Retry with Exponential Backoff

steps:
  - name: flaky-api-call
    command: curl -f https://api.example.com/data
    retry_policy:
      limit: 3
      interval_sec: 10
      backoff: 2
      max_interval_sec: 120
    continue_on:
      failure: true

Scheduling with Overlap Control

schedule:
  - "0 */6 * * *"
overlap_policy: skip
timeout_sec: 3600
handler_on:
  failure:
    command: notify-team.sh
  exit:
    command: cleanup.sh

Sub-DAG Composition

steps:
  - name: extract
    call: etl/extract
    params: "SOURCE=s3://bucket/data.csv"

  - name: transform
    call: etl/transform
    params: "INPUT=${extract.outputs.result}"
    depends: [extract]

  - name: load
    call: etl/load
    params: "DATA=${transform.outputs.result}"
    depends: [transform]

SSH Remote Execution

steps:
  - name: deploy
    type: ssh
    config:
      host: prod-server.example.com
      user: deploy
      key: ~/.ssh/id_rsa
    command: cd /var/www && git pull && systemctl restart app

See Examples for more patterns.

Version-Controlled Workflows

Dagu supports Git sync to keep DAG definitions, agent markdown files, and managed documents under version control. Set DAGU_GITSYNC_ENABLED=true and provide a repository URL, and Dagu pulls tracked files from a Git branch. Optional auto-sync polls the repository at a configurable interval (default 300s). Both token and SSH authentication are supported.
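A sketch of a Git sync setup. Only DAGU_GITSYNC_ENABLED appears in this document; the other variable names below are hypothetical placeholders to replace with the real keys from the Git Sync docs:

```shell
# Documented flag
export DAGU_GITSYNC_ENABLED=true

# Hypothetical variable names for repository, branch, and poll
# interval -- check the Git Sync documentation for the actual keys.
export DAGU_GITSYNC_REPO_URL=git@github.com:acme/workflows.git
export DAGU_GITSYNC_BRANCH=main
export DAGU_GITSYNC_SYNC_INTERVAL=300s
```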

See Git Sync for configuration.

CLI Reference

| Command | Description |
| --- | --- |
| `dagu start <dag>` | Execute a DAG |
| `dagu start-all` | Start HTTP server + scheduler |
| `dagu server` | Start HTTP server only |
| `dagu scheduler` | Start scheduler only |
| `dagu coord` | Start coordinator (distributed mode) |
| `dagu worker` | Start worker (distributed mode) |
| `dagu stop <dag>` | Stop a running DAG |
| `dagu restart <dag>` | Restart a DAG |
| `dagu retry <dag> <run-id>` | Retry a failed run |
| `dagu dry <dag>` | Dry run (show what would execute) |
| `dagu status <dag>` | Show DAG run status |
| `dagu history <dag>` | Show execution history |
| `dagu validate <dag>` | Validate DAG YAML |
| `dagu enqueue <dag>` | Add DAG to the execution queue |
| `dagu dequeue <dag>` | Remove DAG from the queue |
| `dagu cleanup` | Clean up old run data |
| `dagu migrate` | Run database migrations |

Full CLI and environment variable reference: CLI | Configuration Reference

Learn More

  • Architecture and core concepts
  • Installation and first workflow
  • YAML syntax, scheduling, execution control
  • All configuration options
  • All 18 executor types
  • Coordinator/worker setup
  • RBAC, OIDC, API keys, audit logging
  • Deployment, configuration, operations
  • Community