Durable workflow orchestration in Go. Write workflows as plain functions and the engine takes care of persistence, retries, signals, timers, and replay across restart and worker failure.
client.RegisterWorkflow("MonthlyReport", func(ctx sdk.WorkflowContext, _ any) (any, error) {
ctx.QueueActivity("LoadRows", "users")
ctx.Sleep(24 * time.Hour)
ctx.QueueActivity("Mail", "report-ready")
return "queued", nil
})Inspired by Temporal and Cadence. Single Go SDK, single binary, Postgres plus Redis. Apache 2.0.
Status: alpha, working toward
v0.1.0. SeeBACKLOG.mdfor a current breakdown of what is shipped, what is in progress, and what is planned. The full design lives indocs/PRD.md.
- State survives anything. Every workflow transition is persisted to Postgres before the RPC returns. Engine restart, worker crash, network blip: the workflow resumes.
- Replay on yield. Workflow functions are deterministic over their event history. When a worker dies the engine re-dispatches the task; the function re-runs against the same history with recorded commands turned into no-ops. The standard Temporal-style model, minus the platform.
- Small to operate. No coordinator, no sidecar. Active-passive HA is a Postgres advisory lock. Observability is a
/metricsendpoint and an OTLP exporter.
Three processes, two stateful dependencies.
- Engine. gRPC frontend, durable state in Postgres, task dispatch through Redis Streams, timer manager, metrics on
:9090/metrics. Run more than one for active-passive HA; the Postgres advisory lock elects the leader. - Worker. Your code. Imports the SDK, registers workflows and activities, opens a bidi gRPC stream to the engine, executes tasks, acks completion. Stateless and horizontally scalable.
- Dashboard. A Go binary that serves the embedded React SPA. The SPA talks to the engine over gRPC; there is no direct database access from the dashboard.
The two data stores hold the entire system of record:
- Postgres keeps
workflow_executions,workflow_events(the event log),timers, and the leader advisory lock. Schema lives undermigrations/. - Redis Streams carries work in flight. Consumer groups give workers exclusive task delivery; the engine's reclaim loop catches anything a crashed worker did not ack.
The longer walkthrough is in the docs site (web/apps/site/src/content/docs/architecture.mdx), and the design intent is in docs/PRD.md.
How a single workflow lives across the engine's event log, two dispatches, and a one-hour Sleep:
The workflow function runs twice. The first dispatch yields on Sleep; the engine writes TimerScheduled and frees the worker. When the timer fires, the engine appends TimerFired and re-dispatches the task. On the second run, the SDK replays past the recorded commands and reaches the next side-effecting call. Detail in docs/architecture/replay.
Requires Docker and Docker Compose. Go 1.25+ and pnpm only needed for local development.
git clone https://github.com/edaywalid/sched.git
cd sched
make upmake up brings up Postgres, Redis, runs migrations, then starts the engine, a worker, and the dashboard. Open http://localhost:8080.
To run a demo workflow on boot, set AUTO_START_TEST=true on the worker service.
make deps # tidy + generate
make proto # protobuf
make migrate-up # apply schema (needs SCHED_POSTGRES_DSN)
make run-engine # in one terminal
make run-worker # in another
make web-dev # dashboard SPA (Vite, port 5173)Leave SCHED_POSTGRES_DSN unset to fall back to an in-memory store. Useful for unit experiments; state is lost on restart.
A workflow is a Go function. Inside it the SDK exposes a few durable primitives.
client.RegisterWorkflow("ApprovalGate", func(ctx sdk.WorkflowContext, input any) (any, error) {
name, payload, err := ctx.WaitForSignal(24 * time.Hour)
if err != nil {
return nil, err
}
if name != "approve" {
return "rejected", nil
}
ctx.QueueActivity("Process", payload)
return "approved", nil
})Activities are plain functions too:
client.RegisterActivity("Process", func(ctx sdk.ActivityContext, input any) (any, error) {
ctx.Heartbeat(nil)
return doWork(input), nil
})Retries follow the engine's default RetryPolicy. Per-activity policy
overrides are planned (roadmap).
Start one over gRPC or the dashboard:
curl -X POST http://localhost:8080/api/workflows \
-H 'Content-Type: application/json' \
-d '{"workflowName":"ApprovalGate","input":"req-42"}'Run multiple engine processes against the same Postgres + Redis. Each instance tries to acquire a Postgres advisory lock; whichever holds it is the leader. Standbys idle. Loss of the leader releases the lock; a standby promotes within the retry interval.
services:
engine-1:
image: sched-engine
environment:
SCHED_POSTGRES_DSN: postgres://...
SCHED_LEADER_LOCK_KEY: "6005075820067160625"
engine-2:
image: sched-engine
environment:
SCHED_POSTGRES_DSN: postgres://...
SCHED_LEADER_LOCK_KEY: "6005075820067160625"True multi-active sharded ownership is on the roadmap.
- Logs: structured slog. Default text, set
SCHED_LOG_FORMAT=jsonfor JSON. Level viaSCHED_LOG_LEVEL=debug|info|warn|error. - Metrics: Prometheus exposition on
:${SCHED_METRICS_PORT:-9090}/metrics. Counters for workflows started/completed by status, activities executed by status, queue poll latency, activity duration. - Tracing: OpenTelemetry OTLP/gRPC. Set
SCHED_OTLP_ENDPOINT=otel-collector:4317andSCHED_OTEL_SERVICE_NAME=sched-engine.docker compose --profile tracing upbrings Jaeger up on:16686for a quick look.
Workers can opt in to bidi-streamed task delivery instead of long-polling by setting SCHED_WORKER_STREAMING=true. The engine exposes StreamWorkflowTasks and StreamActivityTasks; the worker opens both streams in parallel.
The engine cancels in-flight polls and drains streams on SIGTERM. Bump the grace period with SCHED_SHUTDOWN_GRACE_SECONDS (default 8).
All flags are environment variables.
| Variable | Component | Default | Purpose |
|---|---|---|---|
ENGINE_PORT |
engine | 50051 |
gRPC listen port |
SCHED_POSTGRES_DSN |
engine | (in-memory) | Postgres DSN; leave unset for ephemeral mode |
SCHED_LEADER_LOCK_KEY |
engine | 0x53636845644c6431 |
Postgres advisory lock key for leader election |
SCHED_METRICS_PORT |
engine | 9090 |
Prometheus listen port |
SCHED_LOG_LEVEL |
all | info |
debug, info, warn, error |
SCHED_LOG_FORMAT |
all | text |
text or json |
SCHED_OTLP_ENDPOINT |
all | (unset, no traces) | OTLP/gRPC collector endpoint |
SCHED_OTEL_SERVICE_NAME |
all | per-binary | Service name in traces |
SCHED_SHUTDOWN_GRACE_SECONDS |
engine | 8 |
Grace period on SIGTERM |
ENGINE_ADDRESS |
worker, dashboard | localhost:50051 |
gRPC address of the engine |
TASK_QUEUE |
worker | default |
Task queue name to poll |
SCHED_WORKER_STREAMING |
worker | false |
Opt in to bidi-streamed task dispatch |
AUTO_START_TEST |
worker | false |
Demo mode: register and start test workflows |
DASHBOARD_PORT |
dashboard | 8080 |
HTTP port |
cmd/
engine/ # gRPC server binary
sdk/ # demo worker binary; consumes the SDK
dashboard/ # embeds the React SPA, proxies to the engine
engine/ # core orchestrator (workflow + activity + timer logic)
sdk/ # Go SDK consumed by workers
queue/ # Redis Streams task queue
proto/ # gRPC contract
internal/store/ # sqlc-generated persistence layer
migrations/ # golang-migrate up/down SQL
web/
apps/dashboard # React SPA embedded in cmd/dashboard
apps/site # TanStack Start landing + docs + blog (host yourself)
packages/design # shared design tokens, fonts, Logo, Motion helpers
docs/PRD.md # north-star design doc; partially implemented
make deps # go mod tidy + go generate
make proto # regenerate gRPC stubs
make sqlc-gen # regenerate Postgres queries
make migrate-new NAME=add_shards
make test # unit + integration (needs Postgres + Redis)
make lint # golangci-lint
make web-dev # dashboard dev server (port 5173)
make site-dev # landing + docs dev server (port 3000)
make web-build # production SPA bundle into web_distCI runs go build, go vet, go test -race, and golangci-lint against Postgres 16 and Redis 7 services on every push and PR.
See BACKLOG.md for the current state of every workstream
and docs/PRD.md for the north-star architecture.
Apache License 2.0. See LICENSE.