---
layout: doc
---
Single binary. No external dependencies. Scales from standalone to distributed cluster over gRPC.
::: tip Try It Live
Explore without installing: Live Demo
Credentials: demouser / demouser
:::
Dagu is a workflow orchestration engine that runs as a single binary with no external databases or message brokers. Workflows are defined as DAGs (Directed Acyclic Graphs) in YAML. It supports local execution, cron scheduling, queue-based concurrency control, and distributed coordinator/worker execution across multiple machines over gRPC.
All state is stored in local files by default. There is nothing to install besides the binary itself.
Dagu is useful when scripts, containers, server jobs, or data tasks need visible dependencies, schedules, logs, retries, and a simple way to operate them.
- **Run:** existing shell scripts, Python scripts, HTTP calls, and scheduled jobs without rewriting them. **Why Dagu fits:** dependencies, run status, logs, retries, and history become visible in the Web UI instead of being hidden across crontabs and server log files.
- **Run:** PostgreSQL or SQLite queries, S3 transfers, jq transforms, validation steps, and reusable sub-workflows. **Why Dagu fits:** daily data workflows stay declarative, observable, and easy to retry when one step fails.
- **Run:** ffmpeg, thumbnail extraction, audio normalization, image processing, and other compute-heavy jobs. **Why Dagu fits:** conversion work can run across distributed workers while status, history, logs, and artifacts stay in one persistence layer for monitoring, debugging, and retries.
- **Run:** SSH backups, cleanup jobs, deploy scripts, patch windows, precondition checks, and lifecycle hooks. **Why Dagu fits:** remote operations get schedules, retries, notifications, and per-step logs without requiring operators to SSH into servers for every recovery.
- **Run:** Docker images, Kubernetes Jobs, shell glue, and follow-up validation steps. **Why Dagu fits:** teams can compose image-based tasks and route them to the right workers without building a custom control plane.
- **Run:** diagnostics, account repair jobs, data checks, and approval-gated support actions. **Why Dagu fits:** non-engineers can run reviewed workflows from the Web UI while engineers keep commands, logs, and results traceable.
- **Run:** sensor polling, local cleanup, offline sync, health checks, and device maintenance jobs. **Why Dagu fits:** the single binary and file-backed state work well on small devices while still providing visibility through the Web UI.
Dagu runs in three configurations:
- **Standalone.** A single `dagu start-all` process runs the HTTP server, scheduler, and executor. Suitable for single-machine deployments.
- **Coordinator/Worker.** The scheduler enqueues jobs to a file-based queue, then dispatches them to a coordinator over gRPC. Workers long-poll the coordinator for tasks, execute DAGs locally, and report status back. Workers can run on separate machines and are routed tasks based on labels. Mutual TLS secures gRPC communication between coordinator and workers.
- **Headless.** Run without the web UI (`DAGU_HEADLESS=true`). Useful for CI/CD environments or when Dagu is managed through the CLI or API only.
Standalone:

```
┌─────────────────────────────────────────┐
│              dagu start-all             │
│ ┌───────────┐ ┌───────────┐ ┌────────┐ │
│ │ HTTP / UI │ │ Scheduler │ │Executor│ │
│ └───────────┘ └───────────┘ └────────┘ │
│ File-based storage (logs, state, queue) │
└─────────────────────────────────────────┘
```
Distributed:

```
┌────────────┐        ┌────────────┐
│ Scheduler  │        │ HTTP / UI  │
│            │        │            │
│ ┌────────┐ │        └─────┬──────┘
│ │ Queue  │ │   Dispatch (gRPC)
│ │ (file) │ │─────────┐    │
│ └────────┘ │         │    │
└────────────┘         ▼    ▼
           ┌─────────────────────────┐
           │       Coordinator       │
           │  (gRPC task dispatch,   │
           │   worker registry,      │
           │   health monitoring)    │
           └────────┬────────────────┘
                    │
          Poll (gRPC long-polling)
                    │
      ┌─────────────┼─────────────┐
      │             │             │
 ┌────▼───┐    ┌────▼───┐    ┌────▼───┐
 │Worker 1│    │Worker 2│    │Worker N│
 └────┬───┘    └────┬───┘    └────┬───┘
      │             │             │
      └─────────────┴─────────────┘
        Heartbeat / ReportStatus /
            StreamLogs (gRPC)
```
::: code-group

```bash [Linux/macOS]
curl -fsSL https://raw.githubusercontent.com/dagucloud/dagu/main/scripts/installer.sh | bash
```

```powershell [Windows]
irm https://raw.githubusercontent.com/dagucloud/dagu/main/scripts/installer.ps1 | iex
```

```bash [Docker]
docker run --rm -v ~/.dagu:/var/lib/dagu -p 8080:8080 ghcr.io/dagucloud/dagu:latest dagu start-all
```

```bash [Homebrew]
brew install dagu
```

```bash [Helm]
helm repo add dagu https://dagucloud.github.io/dagu
helm repo update
helm install dagu dagu/dagu --set persistence.storageClass=<your-rwx-storage-class>
```

:::
The script installers run a guided wizard that installs Dagu, adds it to your PATH, sets up a background service, and creates the initial admin account. Homebrew, Docker, and Helm install without the wizard. See the Installation Guide for all options.
```bash
cat > hello.yaml << 'EOF'
steps:
  - command: echo "Hello from Dagu!"
  - command: echo "Step 2"
EOF

dagu start hello.yaml
```

To browse runs in the Web UI, start the server and scheduler:

```bash
dagu start-all
```

Visit http://localhost:8080.
Common built-in step types include:
| Step type | Purpose |
|---|---|
| `command`, `shell` | Local shell commands and scripts |
| `docker`, `container` | Run in a Docker container or exec into an existing container |
| `kubernetes`, `k8s` | Run a step as a Kubernetes workload |
| `harness` | Run CLI-based coding agents and custom harness adapters |
| `ssh` | Remote command execution |
| `sftp` | Remote file transfer |
| `http` | HTTP requests |
| `postgres`, `sqlite` | SQL queries |
| `redis` | Redis commands and scripts |
| `s3` | S3 object operations |
| `jq` | JSON transformation |
| `mail` | Email delivery |
| `archive` | Archive create/extract |
| `dag` | Sub-DAG execution |
| `router` | Route execution to downstream steps by value |
| `template` | Template rendering |
| `chat` | LLM chat completion |
| `agent` | Tool-using agent step |
DAGs can also declare reusable `step_types` that expand to built-in step types at load time. See Custom Step Types and Step Types for the exact configuration surface.
| Feature | Details |
|---|---|
| Cron scheduling | Timezone support, multiple schedule entries per DAG |
| Overlap policies | skip (default), all (queue all), latest (keep only the most recent) |
| Catch-up scheduling | Automatically runs missed intervals when the scheduler was down |
| Zombie detection | Identifies and handles stalled DAG runs (configurable interval, default 45s) |
| Retry policies | Per-step retry with configurable limits, intervals, exit code filtering, exponential/linear/constant backoff |
| Lifecycle hooks | onInit, onSuccess, onFailure, onAbort, onExit, onWait |
| Preconditions | Gate DAG or step execution on shell command results |
| Queue system | File-based persistent queue with configurable concurrency limits per queue |
| Scheduler HA | Lock with stale detection for failover across multiple scheduler instances |
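These controls compose in a single DAG file. The sketch below uses field names from the examples on this page (`schedule`, `overlap_policy`, `retry_policy`, `handler_on`); the `preconditions` key and its shape are assumptions, so check the configuration reference for the exact spelling:

```yaml
schedule:
  - "0 */6 * * *"
overlap_policy: skip         # don't start while a previous run is still active
steps:
  - name: sync
    command: ./sync.sh       # illustrative script
    preconditions:           # assumed key name: gate the step on a shell check
      - condition: "test -f /data/ready"
    retry_policy:
      limit: 2
      interval_sec: 30
handler_on:
  failure:
    command: notify-team.sh
```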
Four authentication modes, configured via DAGU_AUTH_MODE:
| Mode | Description |
|---|---|
| `none` | No authentication |
| `basic` | HTTP Basic authentication |
| `builtin` | JWT-based authentication with user management, API keys, and per-DAG webhook tokens |
| OIDC | OpenID Connect integration with any compliant identity provider |
When using builtin auth, five roles control access:
| Role | Capabilities |
|---|---|
| `admin` | Full access including user management |
| `manager` | Create, edit, delete, run, stop DAGs; view audit logs |
| `developer` | Create, edit, delete, run, stop DAGs |
| `operator` | Run and stop DAGs only (no editing) |
| `viewer` | Read-only access |
API keys can be created with independent role assignments. Audit logging tracks all actions.
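As a sketch, the auth mode is selected through the environment before starting the server. Only `DAGU_AUTH_MODE` is named here; any further credential or key variables belong to the configuration reference and are not assumed:

```shell
# Select an authentication mode before starting the server.
# Valid values per the table above: none, basic, builtin, oidc.
export DAGU_AUTH_MODE=builtin

# With builtin auth, users, roles, and API keys are managed in Dagu itself;
# the server is then started normally, e.g.: dagu start-all
```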
- TLS for the HTTP server (`DAGU_CERT_FILE`, `DAGU_KEY_FILE`)
- Mutual TLS for gRPC coordinator/worker communication (`DAGU_PEER_CERT_FILE`, `DAGU_PEER_KEY_FILE`, `DAGU_PEER_CLIENT_CA_FILE`)
- Secret management with three providers: environment variables, files, and HashiCorp Vault
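A sketch of wiring up both TLS layers with the variable names listed above; the file paths are illustrative:

```shell
# HTTPS for the web server.
export DAGU_CERT_FILE=/etc/dagu/tls/server.crt   # server certificate
export DAGU_KEY_FILE=/etc/dagu/tls/server.key    # server private key

# Mutual TLS between coordinator and workers: each peer presents a
# certificate and verifies the other against the client CA.
export DAGU_PEER_CERT_FILE=/etc/dagu/tls/peer.crt
export DAGU_PEER_KEY_FILE=/etc/dagu/tls/peer.key
export DAGU_PEER_CLIENT_CA_FILE=/etc/dagu/tls/ca.crt
```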
Dagu exposes Prometheus-compatible metrics at the `/metrics` endpoint:

| Metric | Description |
|---|---|
| `dagu_dag_runs_total` | Total DAG runs by status |
| `dagu_dag_runs_total_by_dag` | Per-DAG run counts |
| `dagu_dag_run_duration_seconds` | Histogram of run durations |
| `dagu_dag_runs_currently_running` | Active DAG runs |
| `dagu_dag_runs_queued_total` | Queued runs |
| `dagu_queue_wait_time` | Queue wait time histogram |
| `dagu_uptime_seconds` | Server uptime |
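Because this is standard Prometheus exposition format, collecting the metrics takes only a scrape job pointed at the server; the target below assumes the default port 8080 from the quick start:

```yaml
# prometheus.yml (fragment)
scrape_configs:
  - job_name: dagu
    metrics_path: /metrics
    static_configs:
      - targets: ["localhost:8080"]
```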
Per-DAG OpenTelemetry tracing configuration with OTLP endpoint, custom headers, resource attributes, and TLS options.
- JSON or text format logging (`DAGU_LOG_FORMAT`), per-run log files with separate stdout/stderr capture per step
- Slack and Telegram bot integration for run status events (`succeeded`, `failed`, `aborted`, `waiting`, `rejected`)
- Email notifications on DAG success, failure, or wait status via SMTP
- Per-DAG webhook endpoints with token authentication
The coordinator/worker architecture distributes DAG execution across multiple machines:
- Coordinator: gRPC server managing task distribution, worker registry, and health monitoring
- Workers: Connect to the coordinator, pull tasks via long-polling, execute DAGs locally, stream logs back
- Worker labels: Route DAGs to specific workers based on labels (e.g., `gpu=true`, `region=us-east-1`)
- Health checks: HTTP health endpoints on coordinator and workers for load balancer integration
- Queue system: File-based persistent queue with configurable concurrency limits
```bash
# Start coordinator
dagu coord

# Start workers (on separate machines)
DAGU_WORKER_LABELS=gpu=true,memory=64G dagu worker
```

See the Distributed Execution documentation for setup details.
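On the DAG side, label routing would look roughly like the sketch below; `worker_selector` is an assumed field name for illustration, so see the Distributed Execution docs for the exact key:

```yaml
worker_selector:     # assumed field name for label-based routing
  gpu: "true"        # only workers started with this label pick up the run
steps:
  - name: transcode
    command: ffmpeg -i input.mp4 output.webm
```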
Fan-out/fan-in dependencies:

```yaml
type: graph
steps:
  - id: extract
    command: ./extract.sh
  - id: transform_a
    command: ./transform_a.sh
    depends: [extract]
  - id: transform_b
    command: ./transform_b.sh
    depends: [extract]
  - id: load
    command: ./load.sh
    depends: [transform_a, transform_b]
```

Run a step in a container:

```yaml
steps:
  - name: build
    container:
      image: node:20-alpine
    command: npm run build
```

Retries with backoff:

```yaml
steps:
  - name: flaky-api-call
    command: curl -f https://api.example.com/data
    retry_policy:
      limit: 3
      interval_sec: 10
      backoff: 2
      max_interval_sec: 120
    continue_on:
      failure: true
```

Scheduling and lifecycle handlers:

```yaml
schedule:
  - "0 */6 * * *"
overlap_policy: skip
timeout_sec: 3600
handler_on:
  failure:
    command: notify-team.sh
  exit:
    command: cleanup.sh
```

Sub-workflows:

```yaml
steps:
  - name: extract
    call: etl/extract
    params: "SOURCE=s3://bucket/data.csv"
  - name: transform
    call: etl/transform
    params: "INPUT=${extract.outputs.result}"
    depends: [extract]
  - name: load
    call: etl/load
    params: "DATA=${transform.outputs.result}"
    depends: [transform]
```

Remote execution over SSH:

```yaml
steps:
  - name: deploy
    type: ssh
    config:
      host: prod-server.example.com
      user: deploy
      key: ~/.ssh/id_rsa
    command: cd /var/www && git pull && systemctl restart app
```

See Examples for more patterns.
Dagu supports Git sync to keep DAG definitions, agent markdown files, and managed documents version-controlled. Enable `DAGU_GITSYNC_ENABLED=true` with a repository URL, and Dagu pulls tracked files from a Git branch. Optional auto-sync polls the repository at a configurable interval (default 300s). Supports token and SSH authentication.
See Git Sync for configuration.
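A sketch of enabling Git sync: only `DAGU_GITSYNC_ENABLED` is named in the text above; the repository variable below is an assumed name for illustration, so check the Git Sync docs for the real one:

```shell
export DAGU_GITSYNC_ENABLED=true
# Assumed variable name and illustrative repository URL:
export DAGU_GITSYNC_REPO_URL=git@example.com:acme/dagu-dags.git
```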
| Command | Description |
|---|---|
| `dagu start <dag>` | Execute a DAG |
| `dagu start-all` | Start HTTP server + scheduler |
| `dagu server` | Start HTTP server only |
| `dagu scheduler` | Start scheduler only |
| `dagu coord` | Start coordinator (distributed mode) |
| `dagu worker` | Start worker (distributed mode) |
| `dagu stop <dag>` | Stop a running DAG |
| `dagu restart <dag>` | Restart a DAG |
| `dagu retry <dag> <run-id>` | Retry a failed run |
| `dagu dry <dag>` | Dry run (show what would execute) |
| `dagu status <dag>` | Show DAG run status |
| `dagu history <dag>` | Show execution history |
| `dagu validate <dag>` | Validate DAG YAML |
| `dagu enqueue <dag>` | Add DAG to the execution queue |
| `dagu dequeue <dag>` | Remove DAG from the queue |
| `dagu cleanup` | Clean up old run data |
| `dagu migrate` | Run database migrations |
Full CLI and environment variable reference: CLI | Configuration Reference
- Architecture and core concepts
- Installation and first workflow
- YAML syntax, scheduling, execution control
- All configuration options
- All 18 executor types
- Coordinator/worker setup
- RBAC, OIDC, API keys, audit logging
- Deployment, configuration, operations

