feat(runbooks): runbook execution engine + pup workflows PoC #146

Merged
platinummonkey merged 10 commits into main from feat/runbooks-poc on Mar 31, 2026

@platinummonkey platinummonkey commented Mar 3, 2026

Summary

Runbook execution engine + pup workflows command, proposed in #143. Adds two new command groups with no new dependencies.

pup runbooks

YAML runbooks live in ~/.config/pup/runbooks/. Each file defines sequential steps that mix pup commands, shell tools, Datadog Workflow triggers, HTTP calls, and interactive confirm gates.

pup runbooks list [--tag=key:value ...]   # discover by tag
pup runbooks describe <name>              # show steps + vars
pup runbooks run <name> [--arg K=V ...]   # execute
pup runbooks validate <name>              # lint without running
pup runbooks import <path-or-url>         # fetch into runbooks dir

Example runbook (~/.config/pup/runbooks/hello.yaml):

name: hello
description: Test runbook
vars:
  NAME:
    default: world
steps:
  - name: Say hello
    kind: shell
    run: echo "Hello, {{ NAME }}!"
  - name: List monitors
    kind: pup
    run: monitors list --limit=3

Output while running (with --arg NAME=pup):

runbook: hello  (2 steps)  2026-03-03 00:16:08 UTC
  Test runbook

[1/2] Say hello  (shell)  2026-03-03 00:16:08 UTC
  $ echo "Hello, pup!"
  stdout:
Hello, pup!
  ✓ done  10ms  ·  next: step 2/2 — List monitors (pup)

[2/2] List monitors  (pup)  2026-03-03 00:16:08 UTC
  $ monitors list --limit=3
  stdout:
{ ... }
  ✓ done  320ms  ·  last step

✓ done  hello  2/2 steps  330ms  2026-03-03 00:16:08 UTC
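
The run above shows NAME resolved to pup instead of the declared default world, which is what passing --arg NAME=pup would produce. A minimal sketch of that resolution, assuming defaults come from the runbook's vars: block and each --arg K=V overrides them; parse_arg and resolve_vars are hypothetical names, not the engine's actual functions:

```rust
use std::collections::HashMap;

/// Splits one `K=V` argument at the first '='; the value may itself contain '='.
pub fn parse_arg(raw: &str) -> Result<(String, String), String> {
    match raw.split_once('=') {
        Some((k, v)) if !k.is_empty() => Ok((k.to_string(), v.to_string())),
        _ => Err(format!("expected KEY=VALUE, got '{}'", raw)),
    }
}

/// Starts from the runbook's declared defaults and lets each --arg override them.
pub fn resolve_vars(
    defaults: &HashMap<String, String>,
    args: &[&str],
) -> Result<HashMap<String, String>, String> {
    let mut vars = defaults.clone();
    for raw in args {
        let (k, v) = parse_arg(raw)?;
        vars.insert(k, v);
    }
    Ok(vars)
}
```

With defaults {NAME: world} and args ["NAME=pup"], resolve_vars yields NAME = pup; with no args it keeps world.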

pup workflows

Full CRUD + execution for Datadog Workflow Automation via the typed SDK client:

pup workflows get <workflow-id>
pup workflows create --file=workflow.json
pup workflows update <workflow-id> --file=workflow.json
pup workflows delete <workflow-id>
pup workflows run <workflow-id> [--payload '{"k":"v"}'] [--payload-file=f.json] [--wait] [--timeout=5m]
pup workflows instances list <workflow-id> [--limit=10] [--page=0]
pup workflows instances get <workflow-id> <instance-id>
pup workflows instances cancel <workflow-id> <instance-id>

--wait polls every 2 s until terminal state. Requires DD_API_KEY + DD_APP_KEY.
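
For reviewers, the --wait behavior can be pictured as a plain poll loop. This is a sketch under assumptions, not the code in this PR: the State enum and wait_for_terminal are hypothetical, and treating --timeout as an upper bound on the whole wait is an assumption about how the flag is applied.

```rust
use std::thread::sleep;
use std::time::{Duration, Instant};

/// Hypothetical instance states; the real SDK type may differ.
#[derive(Debug, Clone, Copy, PartialEq)]
pub enum State {
    Running,
    Succeeded,
    Failed,
}

impl State {
    /// Anything other than Running counts as terminal in this sketch.
    fn is_terminal(self) -> bool {
        !matches!(self, State::Running)
    }
}

/// Calls `fetch` every `interval` until it reports a terminal state,
/// or returns an error once `timeout` has elapsed.
pub fn wait_for_terminal<F>(
    mut fetch: F,
    interval: Duration,
    timeout: Duration,
) -> Result<State, String>
where
    F: FnMut() -> State,
{
    let start = Instant::now();
    loop {
        let state = fetch();
        if state.is_terminal() {
            return Ok(state);
        }
        if start.elapsed() >= timeout {
            return Err(format!("timed out after {:?}", timeout));
        }
        sleep(interval);
    }
}
```

With the values from this description, the call would use Duration::from_secs(2) as the interval and Duration::from_secs(300) for --timeout=5m.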

Step kinds

| Kind | What it does |
|---|---|
| pup | Shells out to the current pup binary with --output json; supports poll: loops |
| shell | sh -c "..." with template rendering; surfaces stderr even on success |
| datadog-workflow | POST trigger + auto-poll to terminal state |
| confirm | Prompts [y/N]; bypassed in agent mode |
| http | Authenticated requests to DD API paths or external URLs |

HTTP step

Supports all methods (GET, POST, PUT, PATCH, DELETE, HEAD, OPTIONS), configurable content types for both request and response, file-based bodies, and binary response handling.

- name: Upload CSV
  kind: http
  method: POST
  url: /api/v2/reference-tables/{{ TABLE_ID }}/rows
  body_file: /tmp/rows.csv        # reads raw bytes from file
  content_type: text/csv

- name: Fetch JSON
  kind: http
  method: GET
  url: /api/v1/slo/{{ SLO_ID }}
  # content_type and accept default to application/json

- name: Download binary report
  kind: http
  method: GET
  url: https://internal/reports/{{ ID }}.pdf
  accept: application/pdf
  output_file: /tmp/report-{{ ID }}.pdf   # writes raw bytes to disk

Request fields:

| Field | Default | Purpose |
|---|---|---|
| method | GET | Any HTTP method |
| body | | Inline body template (rendered before sending) |
| body_file | | Read body from file (template-rendered path); takes precedence over body |
| content_type | application/json when body set; application/octet-stream when body_file set | Request Content-Type |
| accept | application/json | Request Accept header |
| headers | | Additional headers; values are template-rendered, merged with template headers |

Response decoding:

  • output_file set → raw bytes written to disk; step output is "written N bytes to <path>"
  • *json* → pretty-printed JSON
  • text/*, *yaml*, *csv*, *xml*, *html* → raw UTF-8 string
  • Binary without output_file → actionable error suggesting output_file
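
The decoding rules above amount to a small decision function over the response Content-Type. A minimal sketch, assuming substring matching on the lowercased header; Decoded and classify are illustrative names and not the PR's decode_http_response:

```rust
/// How a response body would be surfaced as step output.
#[derive(Debug, PartialEq)]
pub enum Decoded {
    /// Raw bytes go to output_file; the step reports the byte count.
    WrittenToFile,
    /// Any content type containing "json": pretty-printed JSON.
    Json,
    /// text/*, yaml, csv, xml, html: raw UTF-8 string.
    Text,
}

/// Applies the rules in order: output_file first, then JSON, then text-like types.
/// Anything else is treated as binary and rejected with a hint.
pub fn classify(content_type: &str, has_output_file: bool) -> Result<Decoded, String> {
    let ct = content_type.to_ascii_lowercase();
    if has_output_file {
        return Ok(Decoded::WrittenToFile);
    }
    if ct.contains("json") {
        return Ok(Decoded::Json);
    }
    let text_like = ["yaml", "csv", "xml", "html"];
    if ct.starts_with("text/") || text_like.iter().any(|t| ct.contains(t)) {
        return Ok(Decoded::Text);
    }
    Err(format!(
        "binary response ({}); set output_file to save it",
        content_type
    ))
}
```

Checking output_file first means a JSON response can still be saved byte-for-byte when a file is requested.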

For DD API paths (starting with /), auth headers are added automatically. External URLs are sent without auth.
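
The request-side defaults and the auth rule can be sketched the same way. Both function names are hypothetical, and returning None when neither body nor body_file is set is an assumption of this sketch rather than documented behavior:

```rust
/// Default request Content-Type per the request fields above.
/// body_file takes precedence over body, so it is checked first.
pub fn default_content_type(has_body: bool, has_body_file: bool) -> Option<&'static str> {
    if has_body_file {
        Some("application/octet-stream")
    } else if has_body {
        Some("application/json")
    } else {
        // Assumption: no default header when the request carries no body.
        None
    }
}

/// DD API paths start with '/'; only those get auth headers added.
pub fn needs_dd_auth(url: &str) -> bool {
    url.starts_with('/')
}
```

An explicit content_type on the step would override whatever default_content_type returns.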

Reusable step templates

Common step patterns live in ~/.config/pup/runbooks/_templates/<name>.yaml. Steps reference them with template:; step fields override template fields; headers are merged per-key.

# _templates/dd-get.yaml
kind: http
method: GET
on_failure: warn

# runbook step
steps:
  - name: Fetch SLO
    template: dd-get
    url: /api/v1/slo/{{ SLO_ID }}
    on_failure: fail          # overrides template

Templates support all step fields including the new content_type, accept, body_file, and output_file. The _templates/ directory and any _-prefixed files are excluded from pup runbooks list.
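
The precedence rule (step fields win, headers merge per key) can be sketched as below. StepFields and merge_step are simplified, hypothetical stand-ins for this illustration; the real Step and StepTemplate types carry more fields.

```rust
use std::collections::HashMap;

/// Simplified stand-in for the fields a step or template may set.
#[derive(Debug, Default, Clone, PartialEq)]
pub struct StepFields {
    pub kind: Option<String>,
    pub method: Option<String>,
    pub url: Option<String>,
    pub on_failure: Option<String>,
    pub headers: HashMap<String, String>,
}

/// Fills unset scalar fields from the template and merges headers per key,
/// with the step's value winning when both define the same header.
pub fn merge_step(step: StepFields, template: &StepFields) -> StepFields {
    let mut headers = template.headers.clone();
    headers.extend(step.headers); // later inserts overwrite, so the step wins
    StepFields {
        kind: step.kind.or_else(|| template.kind.clone()),
        method: step.method.or_else(|| template.method.clone()),
        url: step.url.or_else(|| template.url.clone()),
        on_failure: step.on_failure.or_else(|| template.on_failure.clone()),
        headers,
    }
}
```

Applied to the dd-get example, the merged step keeps kind: http and method: GET from the template while on_failure becomes fail.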

Control flow

  • {{ VAR }} and {{ VAR | default: "x" }} template substitution in all string fields
  • on_failure: warn | confirm | fail per step
  • when: always | on_success per step; always lets cleanup steps run even after an earlier failure
  • optional: true to swallow errors silently
  • capture: VAR_NAME to pipe step stdout into a variable for later steps
  • poll: { interval, timeout, until } with conditions: empty, status == X, value < N, decreasing
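
To illustrate the first bullet, here is a rough sketch of {{ VAR }} and {{ VAR | default: "x" }} substitution using only the standard library. It is not the code in src/runbooks/template.rs; render is a hypothetical name, erroring on an undefined variable without a default is an assumption, and a '|' inside a quoted default is not handled.

```rust
use std::collections::HashMap;

/// Replaces each `{{ NAME }}` or `{{ NAME | default: "x" }}` with its value.
pub fn render(input: &str, vars: &HashMap<String, String>) -> Result<String, String> {
    let mut out = String::new();
    let mut rest = input;
    while let Some(start) = rest.find("{{") {
        out.push_str(&rest[..start]);
        let after = &rest[start + 2..];
        let end = after.find("}}").ok_or_else(|| "unclosed {{".to_string())?;
        let expr = after[..end].trim();
        // Split `NAME | default: "x"` into the name and an optional fallback.
        let (name, fallback) = match expr.split_once('|') {
            Some((n, filter)) => {
                let value = filter
                    .trim()
                    .strip_prefix("default:")
                    .ok_or_else(|| format!("unsupported filter in '{}'", expr))?
                    .trim()
                    .trim_matches('"');
                (n.trim(), Some(value))
            }
            None => (expr, None),
        };
        match (vars.get(name), fallback) {
            (Some(v), _) => out.push_str(v),
            (None, Some(d)) => out.push_str(d),
            // Assumption: a missing variable with no default is an error.
            (None, None) => return Err(format!("undefined variable '{}'", name)),
        }
        rest = &after[end + 2..];
    }
    out.push_str(rest);
    Ok(out)
}
```

Rendering echo "Hello, {{ NAME }}!" with NAME = pup gives echo "Hello, pup!", matching the command preview in the sample output.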

Reference runbooks

Three annotated examples in docs/examples/runbooks/:

  • deploy-service.yaml — SLO check → incident gate → DD Workflow trigger → monitor poll → Slack notify
  • incident-triage.yaml — fetch incident → search logs → check monitors → auto-mitigation workflow → shell diagnostics
  • maintenance-window.yaml — create downtime (capture ID) → drain → metric poll → confirm → delete downtime

Not in this PoC

  • Parallel step execution
  • Step retry logic
  • Remote runbook registry / sync
  • Web UI

Platform support

pup runbooks is native-only, excluded from wasm builds via #[cfg(not(target_arch = "wasm32"))]. pup workflows uses the typed SDK and works on all targets.
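
The gating pattern looks roughly like this. The function names are hypothetical and only show how the cfg attribute removes the runbooks path on wasm32 while leaving workflows available everywhere:

```rust
/// Native-only: this item does not exist when compiling for wasm32.
#[cfg(not(target_arch = "wasm32"))]
pub fn runbooks_command() -> &'static str {
    "runbooks"
}

/// Available on every target.
pub fn workflows_command() -> &'static str {
    "workflows"
}

/// Lists the command groups compiled into the current build.
pub fn supported_commands() -> Vec<&'static str> {
    let mut cmds = vec![workflows_command()];
    // The same cfg guards the call site, so wasm builds never reference
    // the missing function.
    #[cfg(not(target_arch = "wasm32"))]
    cmds.push(runbooks_command());
    cmds
}
```

On a native build supported_commands() returns both groups; on wasm32 it returns only workflows.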

Testing

cargo build

mkdir -p ~/.config/pup/runbooks
cat > ~/.config/pup/runbooks/hello.yaml <<'YAML'
name: hello
description: Test runbook
vars:
  NAME:
    default: world
steps:
  - name: Say hello
    kind: shell
    run: echo "Hello, {{ NAME }}!"
YAML

pup runbooks list
pup runbooks validate hello
pup runbooks run hello --arg NAME=pup

All existing tests pass (cargo test, cargo clippy -- -D warnings, cargo fmt --check).


Discussion: #143

🤖 Generated with Claude Code

platinummonkey and others added 4 commits March 2, 2026 18:03
Implements the runbooks PoC as specified:

- `pup runbooks list/describe/run/validate/import` — execute YAML
  runbooks from ~/.config/pup/runbooks/ with {{ VAR }} templating,
  sequential step execution, poll loops, and confirm gates
- `pup workflows run/instances list/get` — trigger Datadog Workflows
  and poll to completion via raw REST (POST/GET /api/v2/workflows/...)

New files:
- src/runbooks/mod.rs     — Runbook, Step, VarDef, PollConfig types
- src/runbooks/template.rs — {{ VAR }} and | default: "x" rendering
- src/runbooks/loader.rs  — scan runbooks dir, load/import by name
- src/runbooks/engine.rs  — sequential executor with polling, confirm,
                             on_failure handling, variable capture
- src/commands/runbooks.rs — list/describe/run/validate/import CLI
- src/commands/workflows.rs — trigger + watch, instances list/get
- docs/examples/runbooks/  — deploy-service, incident-triage,
                              maintenance-window reference templates

Co-Authored-By: Claude Sonnet 4.6 (1M context) <noreply@anthropic.com>
…hints

Each step now shows:
- Header: ► Step N/M  ·  <name>  ·  <kind>  [HH:MM:SS]  with command preview
- Labeled sections: ── stdout ── and ── stderr ── blocks wrapping output
- Footer: ✓/✗/⊘  <elapsed>  ·  next: step N/M — <name> (<kind>)
- Summary line with total elapsed time and pass/fail count

Shell steps surface non-empty stderr even on success so warnings
from curl, grep, etc. aren't silently dropped.

Co-Authored-By: Claude Sonnet 4.6 (1M context) <noreply@anthropic.com>
- Remove RULE constant and all long separator lines
- Timestamps now show full UTC date+time: 2026-03-02 18:11:01 UTC
- Step header: [N/M] <name>  (<kind>)  <timestamp>
- Output labeled with indented "stdout:" / "stderr:" markers
- Summary line: ✓/⚠ done  <name>  N/M steps  <elapsed>  <timestamp>

Co-Authored-By: Claude Sonnet 4.6 (1M context) <noreply@anthropic.com>
…ch = "wasm32"))]

Neither feature is compatible with wasm targets:
- loader.rs uses std::fs and reqwest::Client::new()
- engine.rs uses tokio::process::Command and chrono::Utc
- both require dirs-based config path resolution (native-only)

Gated items:
- mod runbooks; in main.rs
- pub mod runbooks/workflows; in commands/mod.rs
- Commands::Runbooks/Workflows variants and their subcommand enums
- dispatch arms in main_inner()

Verified: native build, wasm32-wasip2 (wasi feature), and
wasm32-unknown-unknown --lib (browser feature) all pass.

Co-Authored-By: Claude Sonnet 4.6 (1M context) <noreply@anthropic.com>
platinummonkey and others added 5 commits March 30, 2026 18:37
- workflows.rs: take main's typed SDK implementation (get/create/update/
  delete/run/instance_list/instance_get/instance_cancel) over the
  runbooks branch's raw HTTP prototype
- commands/mod.rs: drop wasm32 cfg gate on pub mod workflows (no longer
  needed with typed SDK impl)
- main.rs: combine module declarations (runbooks + skills + tunnel),
  keep Runbooks command variant, use main's Workflows doc + dispatch,
  add LlmObs and ReferenceTables variants from main; remove duplicate
  WorkflowActions/WorkflowInstanceActions enums and stale Workflows
  dispatch arm introduced by HEAD
- runbooks/loader.rs: replace serde_yaml with serde_norway (project std)
- runbooks/engine.rs: pass empty query slice to raw_get (signature
  gained a query param in main)

Co-Authored-By: Claude Sonnet 4.6 (1M context) <noreply@anthropic.com>
…headers

- Step gains `body` (JSON template string) and `headers` (key/value map,
  templates rendered) fields alongside the existing `url` and `method`
- client: add raw_request(cfg, method, path, body, extra_headers) covering
  GET, POST, PUT, PATCH, DELETE, HEAD, OPTIONS — any method reqwest supports;
  returns Null for 204/empty responses
- engine: rewrite execute_http to render body + header templates, dispatch
  to raw_request for DD API paths (/...) and plain reqwest for external URLs;
  both paths honour all methods and extra headers

Co-Authored-By: Claude Sonnet 4.6 (1M context) <noreply@anthropic.com>
Suggested in #143 (comment): --arg SERVICE=payments reads
more naturally than --set for passing runtime variables to a runbook.

Co-Authored-By: Claude Sonnet 4.6 (1M context) <noreply@anthropic.com>
Templates live in ~/.config/pup/runbooks/_templates/<name>.yaml and define
any subset of step fields. A step references one with `template: <name>`;
its own fields take precedence over the template's (step wins on all scalar
fields; headers are merged per-key so step headers override template headers
with the same key).

  # _templates/check-slo.yaml
  kind: pup
  run: slos get --id={{ SLO_ID }}
  on_failure: warn

  # runbook step
  steps:
    - name: Verify production SLO
      template: check-slo          # inherits kind + run + on_failure
      on_failure: fail             # overrides template's on_failure

- StepTemplate: new all-optional struct for template deserialization
- Step: kind now has #[serde(default)] so it can be omitted when a
  template supplies it; new `template: Option<String>` field
- loader: templates_dir(), load_template(), apply_template(), resolve_steps()
  helpers; load_runbook() now resolves templates before returning;
  list_runbooks() skips the _templates/ subdir and any _-prefixed files

Co-Authored-By: Claude Sonnet 4.6 (1M context) <noreply@anthropic.com>
- Prefix unused error binding with _ in runbooks engine match arm
- fmt reformatted loader.rs

Co-Authored-By: Claude Sonnet 4.6 (1M context) <noreply@anthropic.com>
@platinummonkey platinummonkey marked this pull request as ready for review March 30, 2026 23:47
Request side:
- `content_type` — sets Content-Type header (default: application/json
  when `body` is set, application/octet-stream when `body_file` is set)
- `body_file` — read request body from a file (template-rendered path);
  takes precedence over `body`; intended for binary or large payloads
- `accept` — controls the Accept header (default: application/json)

Response side:
- JSON responses are pretty-printed as before
- YAML, CSV, plain-text and other text/* types are returned as-is UTF-8
- Binary responses require `output_file` to be set; an actionable error
  is returned if it isn't
- `output_file` — write the raw response bytes to a file (template-rendered
  path); returns "written N bytes to <path>" as step output

client: raw_request now takes raw bytes + explicit content_type/accept and
returns HttpResponse { content_type, bytes } instead of serde_json::Value.
engine: execute_http builds body bytes per content-type then delegates
response decoding to decode_http_response.
Both StepTemplate and the fill! macro in loader updated with the four new fields.

Co-Authored-By: Claude Sonnet 4.6 (1M context) <noreply@anthropic.com>
@platinummonkey platinummonkey merged commit 03dbf37 into main Mar 31, 2026
11 checks passed
@platinummonkey platinummonkey deleted the feat/runbooks-poc branch March 31, 2026 01:35
Labels

enhancement New feature or request product:automation
