
Arq Signals

Arq Signals is a read-only PostgreSQL diagnostic collector. It runs on your infrastructure, collects statistics from your databases, and packages them as portable snapshots. No data leaves your machine. No AI. No cloud. Just structured evidence from the views PostgreSQL already exposes.

From Elevarq — PostgreSQL tools for engineering teams.


Read-only by design — three independent enforcement layers prevent any write operations. Unsafe roles (superuser, replication) are blocked before collection starts. Every SQL query is in the source.

No cloud, no phone-home — all data stays on your machine. No telemetry, no analytics, no external network calls.

No AI inside — Arq Signals is a pure data collector. No language models, no scoring, no recommendations. What you collect is what you get.

Built for restricted environments — runs airgapped, as a non-root container, with no internet dependency. Suitable for networks where third-party monitoring agents are not permitted.


Try it in 2 minutes

git clone https://github.com/elevarq/arq-signals.git
cd arq-signals
docker compose -f examples/docker-compose.yml up -d

This starts Arq Signals alongside PostgreSQL 16 with a pre-configured monitoring role. Collection begins automatically.

# Trigger an immediate collection
curl -X POST http://localhost:8081/collect/now \
  -H "Authorization: Bearer test-token"

# Download your first snapshot
curl -o snapshot.zip http://localhost:8081/export \
  -H "Authorization: Bearer test-token"

# Inspect the contents
unzip -l snapshot.zip

Your snapshot contains raw PostgreSQL statistics in structured JSON — nothing more. See examples/snapshot-example/ for what the output looks like.
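If you script against snapshots rather than unzip them by hand, the metadata can be read straight from the archive. A minimal Python sketch (the helper name is ours; it assumes only the metadata.json entry described under Snapshot format below):

```python
import io
import json
import zipfile

def read_snapshot_metadata(path_or_file):
    """Return the parsed metadata.json from an arq-snapshot ZIP."""
    with zipfile.ZipFile(path_or_file) as zf:
        with zf.open("metadata.json") as f:
            return json.load(f)

# Demonstrate on an in-memory stand-in shaped like a real snapshot.
buf = io.BytesIO()
with zipfile.ZipFile(buf, "w") as zf:
    zf.writestr("metadata.json", json.dumps({"schema_version": "arq-snapshot.v1"}))

meta = read_snapshot_metadata(buf)
print(meta["schema_version"])  # arq-snapshot.v1
```

Point the helper at a real snapshot.zip to check its schema_version before feeding it into downstream tooling.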


Why Arq Signals exists

Every PostgreSQL instance exposes diagnostic data through built-in statistics views. But collecting this data consistently, safely, and in a format you can actually use takes tooling that most teams end up building themselves.

Arq Signals handles the collection part so you don't have to. It connects with a read-only role, runs approved SQL queries on a schedule, and writes structured results to local storage. When you need the data elsewhere, it packages everything as a portable ZIP snapshot.

The project is open source because we think data collection should be transparent. You can read every SQL query Arq Signals will run. You can audit the binary. You own the output.

What Arq Signals does

  • Connects to one or more PostgreSQL instances (14+)
  • Runs 73 read-only diagnostic collectors covering:
    • Server configuration, identity, and cluster fingerprint
    • Session activity and connection pressure
    • Table, index, and I/O statistics (incl. pg_stat_io / pg_stat_wal)
    • Schema metadata: columns, constraints, indexes, sequences, triggers, views, materialised views, partitions, functions
    • Storage placement: tablespaces, per-relation storage, per-attribute storage
    • Query intelligence (via pg_stat_statements, self-filtered to exclude Signals' own probe queries and scoped to the connected database)
    • Transaction wraparound risk and prepared-transaction age
    • Vacuum and autovacuum health
    • In-flight operation progress — six pg_stat_progress_* collectors covering vacuum, analyze, create_index, cluster, basebackup, copy
    • Index hygiene — derived findings for unused, invalid, redundant, and duplicate indexes
    • Bloat estimation — statistical table and index bloat without pgstattuple (runs on managed PG)
    • Replication: physical (pg_stat_replication), slot risk (pg_replication_slots), and logical-slot health (pg_stat_replication_slots)
    • Checkpoint, background writer, and checkpointer pressure
    • Storage growth, largest relations, temp I/O pressure
    • Per-role / per-database configuration overrides
    • Foreign data wrappers and partition topology
    • Vector / pgvector column inventory
    • Role capabilities and login-role surface
  • Stores results locally in SQLite as structured NDJSON
  • Schedules collection with configurable cadences (5m to 7d per query)
  • Packages snapshots as portable ZIP archives
  • Exposes a local HTTP API for triggering collection, pausing / resuming targets, and reloading configuration
  • Provides a CLI (arqctl) for operations, including pre-flight diagnostics (arqctl doctor) and classified connection tests (arqctl connect test)
  • Supports per-target sensitivity profiles, per-class retention, and a per-target circuit breaker for operator safety during incidents

Specification & Test-Driven Development (STDD)

Arq Signals is developed using STDD — a methodology where the specification and tests define the system, and code is a replaceable artifact that must satisfy both.

The repository contains:

  • Formal specification — 104 numbered requirements covering collection, safety, configuration, API, persistence, and diagnostics (specification.md)
  • Acceptance tests — 240+ test cases derived directly from the specification (per-collector acceptance files under specifications/collectors/, plus the cross-cutting acceptance-tests.md)
  • Traceability matrix — every requirement mapped to executable tests with evidence classification (behavioral, structural, or integration) (traceability.md)
  • Language-neutral contracts — API and configuration schemas defined as appendices, independent of the Go implementation (Appendix A, Appendix B)

This approach matters for a tool that connects to production databases. Every safety guarantee — read-only enforcement, role validation, credential handling — is formally specified, tested, and traceable. You can verify the claims without reading the implementation.

Why DBAs trust Arq Signals

  • All PostgreSQL queries execute inside READ ONLY transactions, enforced at three independent layers
  • Role safety validation blocks superuser, replication, and bypassrls roles before any query runs
  • Defensive session timeouts (statement_timeout, lock_timeout, idle_in_transaction_session_timeout) prevent runaway queries
  • The collector never performs write operations on PostgreSQL — this is enforced by static SQL linting, session configuration, and transaction access mode
  • Credentials are never stored in snapshots, export metadata, API responses, or log output
  • If an unsafe role override is used, it is explicitly recorded in export metadata with the specific bypassed checks
  • The entire safety model is formally specified and covered by 800+ automated tests across the module

For the full safety model, see docs/runtime-safety-model.md.

Examples

| Example | Description |
| --- | --- |
| Local safe role | Recommended production setup with arq_signals monitoring role |
| Local superuser override | Dev/test setup with postgres superuser (unsafe override) |
| Docker | Container build, run, and export workflow |
| Docker Compose | Quick start with PostgreSQL 16 |
| Helm | Kubernetes deployment with the starter Helm chart |
| Snapshot inspection | How to inspect and understand export output |
| Snapshot example | Static reference snapshot for offline review |

Supported PostgreSQL versions

Arq Signals has first-class catalog support for PostgreSQL 14, 15, 16, 17, and 18. Each major version has its own catalog file (internal/pgqueries/catalog_pgN.go) that carries the SQL needed when a pg_stat_* view's column shape differs from the version-agnostic default. Logical collector IDs (e.g. pg_stat_io_v1) stay stable across majors — only the SQL underneath changes.

A per-cycle discovery probe runs first on each target and returns the server's version, server_version_num, installed extensions, current database, and current user. Catalog selection is driven by that probe, not by configured assumption. Version-specific collectors (e.g. checkpointer_stats_v1 on PG 17+) and extension-gated collectors (e.g. pg_stat_statements_v1) are included or skipped automatically.

PostgreSQL 19 is treated as experimental: the daemon falls back to the highest supported catalog (PG 18) and logs a startup warning so the experimental status is visible. PostgreSQL versions below 14 are out of scope.
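The probe-driven catalog selection can be pictured as a small lookup. This is an illustrative Python sketch, not the daemon's code: the function name and return shape are invented here, and only the documented rules are encoded (14 through 18 supported, 19 falls back to the PG 18 catalog with a warning, below 14 rejected):

```python
SUPPORTED_MAJORS = (14, 15, 16, 17, 18)  # from the compatibility list above

def select_catalog(server_version_num: int):
    """Map a probed server_version_num (e.g. 170004) to a catalog major.

    Returns (catalog_major, experimental). Majors above the highest
    supported fall back to it; majors below 14 are rejected.
    """
    major = server_version_num // 10000
    if major < SUPPORTED_MAJORS[0]:
        raise ValueError(f"PostgreSQL {major} is out of scope (need 14+)")
    if major > SUPPORTED_MAJORS[-1]:
        # e.g. PG 19: use the PG 18 catalog, flagged experimental
        return SUPPORTED_MAJORS[-1], True
    return major, False

print(select_catalog(170004))  # (17, False)
print(select_catalog(190001))  # (18, True)
```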

Security model

Arq Signals is local-first by design:

  • No data egress. The daemon writes snapshots only to local storage (a SQLite file under database.path) and serves them via the HTTP API on the operator's listener. There is no outbound telemetry, no LLM call, no analytics ping. The repository's boundary tests assert this at every build.
  • Read-only PostgreSQL access. Three layers: a static SQL linter rejects DDL/DML at registration; the session is set to default_transaction_read_only=on; every collector query runs inside a BEGIN ... READ ONLY transaction. Roles with rolsuper, rolreplication, or rolbypassrls are refused.
  • No secrets in artifacts. Passwords, API tokens, and DSNs never appear in logs, exports, or the metrics endpoint. A central audit-event denylist filters secret-shaped attribute keys before any slog record is emitted (R078).
  • Off-by-default surfaces. The Prometheus /metrics endpoint (R079), the high-sensitivity collector pack (R075), and the per-collector export view (R080) are all opt-in and have no effect unless explicitly enabled in signals.yaml.

Verifying a release

Releases are published as multi-arch container images at ghcr.io/elevarq/arq-signals with cosign keyless signatures, an SPDX SBOM (OCI attestation and downloadable file), and a SLSA build provenance attestation (mode=max).

Quick signature verification:

cosign verify ghcr.io/elevarq/arq-signals:<VERSION> \
  --certificate-identity-regexp='github.com/Elevarq/Arq-Signals/.github/workflows/release.yml@' \
  --certificate-oidc-issuer='https://token.actions.githubusercontent.com'

Inspect the SBOM (registry attestation):

cosign download sbom ghcr.io/elevarq/arq-signals:<VERSION> > sbom.spdx.json

Confirm the image is multi-arch:

docker buildx imagetools inspect ghcr.io/elevarq/arq-signals:<VERSION>

Full operator checklist (provenance, Trivy re-scan, OCI labels, etc.): docs/release-verification.md.

Installation

Docker Compose (recommended for trying)

git clone https://github.com/elevarq/arq-signals.git
cd arq-signals
docker compose -f examples/docker-compose.yml up -d

Docker (bring your own PostgreSQL)

docker run -d --name arq-signals \
  -e ARQ_SIGNALS_TARGET_HOST=host.docker.internal \
  -e ARQ_SIGNALS_TARGET_USER=arq_monitor \
  -e ARQ_SIGNALS_TARGET_DBNAME=postgres \
  -e ARQ_SIGNALS_TARGET_PASSWORD_ENV=PG_PASSWORD \
  -e PG_PASSWORD=your_password \
  -e ARQ_ALLOW_INSECURE_PG_TLS=true \
  -e ARQ_ENV=dev \
  -v arq-data:/data \
  -p 8081:8081 \
  ghcr.io/elevarq/arq-signals:latest

Build from source

git clone https://github.com/elevarq/arq-signals.git
cd arq-signals
make build    # produces bin/arq-signals and bin/arqctl
./bin/arq-signals --config signals.yaml

See examples/signals.yaml for a complete annotated configuration file.

Recommended PostgreSQL role

Arq Signals is designed to run using a dedicated monitoring role, not the PostgreSQL superuser. For production use, create a role such as arq_signals and grant the pg_monitor predefined role:

CREATE ROLE arq_signals LOGIN;
GRANT pg_monitor TO arq_signals;
GRANT CONNECT ON DATABASE your_database TO arq_signals;

-- Optional: enable query-level statistics
CREATE EXTENSION IF NOT EXISTS pg_stat_statements;

The default postgres role is a superuser and will be rejected by the safety model unless the operator explicitly enables unsafe override mode (ARQ_SIGNALS_ALLOW_UNSAFE_ROLE=true). This behavior is intentional — it prevents accidental execution with elevated privileges in production.

For a full discussion of the role posture — including what additional access (if any) the high-sensitivity collector pack needs and how to audit the role — see docs/postgres-role.md.

Optional: Prometheus metrics

Arq Signals can expose an opt-in /metrics endpoint with operational counters and gauges (collection outcomes, export outcomes, SQLite persistence health, high-sensitivity gate state). The endpoint never publishes collected PostgreSQL data. It is off by default; enable it with signals.metrics_enabled: true. See docs/prometheus.md for the safety-scope guarantees and docs/metrics-consumer-guide.md for the full metric inventory, scrape configuration, and recommended alerting rules.

Using Arq Signals

Trigger a collection

# Via CLI
arqctl collect now

# Via API
curl -X POST http://localhost:8081/collect/now \
  -H "Authorization: Bearer $ARQ_SIGNALS_API_TOKEN"

Export snapshots

# Via CLI
arqctl export --output snapshot.zip

# Via API
curl -o snapshot.zip http://localhost:8081/export \
  -H "Authorization: Bearer $ARQ_SIGNALS_API_TOKEN"

Pre-flight diagnostics

# Check config, store path, target reachability, role safety,
# collector prerequisites, and snapshot freshness in one pass.
arqctl doctor

# Test one connection (or an ad-hoc DSN) with classified
# failure reasons: ok / dns / tcp / tls / auth / startup / role /
# password_resolve / config.
arqctl connect test prod-db
arqctl connect test --dsn "host=db.example.com port=5432 dbname=app user=arq sslmode=require password_env=APP_DB_PW"

Pause / resume a target during an incident

# Stop collecting from a target without taking the daemon down.
# State is in-memory; daemon restart resumes all targets.
arqctl collect pause --target=prod-db --reason="investigating incident #4321"

# Bring it back.
arqctl collect resume --target=prod-db

The daemon also auto-pauses (state open) a target that fails 3 consecutive collection cycles and auto-recovers after a cooldown (default 5 minutes). See R097 in features/arq-signals/specification.md for the full state-machine spec.
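The auto-pause behaviour can be sketched as a tiny state machine. This Python sketch is illustrative only (class and method names are ours); it encodes just the documented defaults of 3 consecutive failures to open and a 5-minute cooldown to recover — the normative spec is R097:

```python
import time

class TargetBreaker:
    """Illustrative per-target circuit breaker: open after `threshold`
    consecutive failures, auto-recover after `cooldown` seconds."""

    def __init__(self, threshold=3, cooldown=300.0, clock=time.monotonic):
        self.threshold = threshold
        self.cooldown = cooldown
        self.clock = clock
        self.failures = 0
        self.opened_at = None  # None means the breaker is closed

    def allow(self) -> bool:
        if self.opened_at is None:
            return True
        if self.clock() - self.opened_at >= self.cooldown:
            # Cooldown elapsed: close the breaker and reset the counter.
            self.opened_at, self.failures = None, 0
            return True
        return False

    def record(self, ok: bool):
        if ok:
            self.failures = 0
        else:
            self.failures += 1
            if self.failures >= self.threshold:
                self.opened_at = self.clock()

# A fake clock makes the open/recover transitions visible.
now = [0.0]
b = TargetBreaker(clock=lambda: now[0])
for _ in range(3):
    b.record(ok=False)
print(b.allow())   # False (breaker open)
now[0] += 301
print(b.allow())   # True (cooldown elapsed, breaker closed again)
```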

Reload configuration without restart

# SIGHUP path
kill -HUP $(pidof arq-signals)

# HTTP path
curl -X POST http://localhost:8081/reload \
  -H "Authorization: Bearer $ARQ_SIGNALS_API_TOKEN"

v1 reload scope is the target list — add / remove / modify connection params or collectors.profile. poll_interval, retention, and circuit thresholds remain set-at-construction (documented as future scope).

Check status

arqctl status

Snapshot format

Arq Signals produces snapshots in the arq-snapshot.v1 format:

snapshot.zip
├── metadata.json          # collector version, timestamp, PG version
├── query_catalog.json     # which queries were executed
├── query_runs.ndjson      # execution metadata (timing, row counts, errors)
├── query_results.ndjson   # the actual data (one JSON object per row)
└── snapshots.ndjson       # legacy combined format

Example metadata.json:

{
  "schema_version": "arq-snapshot.v1",
  "collector_version": "0.1.0",
  "collector_commit": "abc1234",
  "collected_at": "2026-03-14T10:30:00Z",
  "instance_id": "a1b2c3d4e5f6"
}

Example query_results.ndjson (one line per query):

{"run_id":"01JD...","payload":[{"name":"max_connections","setting":"100","unit":"","source":"configuration file"},{"name":"shared_buffers","setting":"16384","unit":"8kB","source":"configuration file"}]}

The format is versioned. Breaking changes will bump schema_version.

A complete example snapshot is available at examples/snapshot-example/ — you can inspect exactly what Arq Signals collects without running it.
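Because query_results.ndjson is plain newline-delimited JSON, consuming it from your own tooling is straightforward. A minimal Python sketch (the helper is ours; it assumes only the run_id/payload line shape shown above):

```python
import json

def iter_payload_rows(ndjson_text):
    """Yield (run_id, row) pairs from query_results.ndjson content,
    where each line carries a run_id and a payload array of row objects."""
    for line in ndjson_text.splitlines():
        if not line.strip():
            continue
        record = json.loads(line)
        for row in record.get("payload", []):
            yield record["run_id"], row

sample = '{"run_id":"01JD","payload":[{"name":"max_connections","setting":"100"}]}'
for run_id, row in iter_payload_rows(sample):
    print(run_id, row["name"], row["setting"])  # 01JD max_connections 100
```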

Collected signals

Arq Signals includes 73 read-only collectors. Grouped by domain:

  • Baseline & runtime — server config, sessions, databases, tables, indexes, table / index I/O, query stats (pg_stat_statements)
  • Schema model — columns, constraints, indexes, partitions, sequences, schemas, triggers, views, materialised views, functions, planner stats, extended statistics, vector columns
  • Definitions — view, materialised-view, function, and trigger definitions (DDL bodies)
  • Storage placement — tablespaces, per-relation storage, per-attribute storage
  • In-flight operations — six pg_stat_progress_* collectors (vacuum, analyze, create_index, cluster, basebackup, copy)
  • Index hygiene — derived findings: unused, invalid, redundant, duplicate
  • Bloat estimation — statistical table-bloat and index-bloat estimates without pgstattuple
  • Wraparound risk — XID age at database / relation level, freeze blockers, prepared-transaction age
  • Vacuum / checkpointer / bgwriter — autovacuum health, checkpointer stats (PG 17+), bgwriter pressure
  • Replication — pg_stat_replication, pg_replication_slots, pg_stat_replication_slots (logical slot health)
  • Operational pressure — connection utilisation, blocking locks, long-running transactions, idle-in-transaction offenders, temp I/O, lock summary
  • Identity & configuration — server identity, cluster identity (network fingerprint), extension inventory, role capabilities, login roles, per-role / per-database GUC overrides
  • Foreign data wrappers — wrappers, servers, user mappings, foreign tables

Collectors that require an unavailable extension or an unsupported PostgreSQL version are skipped without error and surface with a reason in collector_status.json. Replication collectors return empty results on standalone instances.

See docs/collectors.md for the full inventory with query IDs, PostgreSQL sources, and cadences. Every query is visible in internal/pgqueries/.

API

| Method | Path | Auth | Description |
| --- | --- | --- | --- |
| GET | /health | No | Liveness probe, always 200 |
| GET | /status | Bearer | Collector status, targets, last collection |
| POST | /collect/now | Bearer | Trigger immediate collection (optional JSON body to narrow targets) |
| GET | /export | Bearer | Download snapshot ZIP |

Set ARQ_SIGNALS_API_TOKEN to configure the bearer token. If unset, a random token is generated at startup and only its fingerprint is logged; the token value itself is never logged.

POST /collect/now examples

The body is optional. An empty / missing body keeps the historical "collect every enabled target" behaviour. When present, the body may carry an optional targets subset, an optional request_id correlation identifier, and an optional reason label.

# 1. No body — collect every enabled target.
curl -s -X POST http://127.0.0.1:8081/collect/now \
  -H "Authorization: Bearer ${ARQ_SIGNALS_API_TOKEN}"

# 2. Narrow to a subset of configured targets.
curl -s -X POST http://127.0.0.1:8081/collect/now \
  -H "Authorization: Bearer ${ARQ_SIGNALS_API_TOKEN}" \
  -H "Content-Type: application/json" \
  -d '{"targets":["prod-main"]}'

# 3. Caller-supplied correlation id and reason.
curl -s -X POST http://127.0.0.1:8081/collect/now \
  -H "Authorization: Bearer ${ARQ_SIGNALS_API_TOKEN}" \
  -H "Content-Type: application/json" \
  -d '{
        "targets":    ["prod-main", "prod-reporting"],
        "request_id": "scheduled_run_2026_04_25",
        "reason":     "automated_cycle"
      }'

A successful response (HTTP 202):

{
  "status": "collection triggered",
  "request_id": "scheduled_run_2026_04_25",
  "accepted_targets": ["prod-main", "prod-reporting"]
}

A rejection (HTTP 400) — invalid target name:

{
  "error": "one or more targets cannot be collected",
  "accepted_targets": ["prod-main"],
  "rejected_targets": [
    {"name": "does-not-exist", "reason": "unknown_target"}
  ]
}

No cycle is triggered if any target is rejected.

For the full request schema, validation rules, and audit-trace behaviour, see docs/control-plane.md.

Control plane support

POST /collect/now accepts an optional JSON body that lets a caller narrow the cycle to a configured + enabled subset of targets. The configured target list in signals.yaml is the authoritative ceiling — no caller can introduce a database name that wasn't already configured.

Two optional correlation fields ride along with the request:

  • request_id (regex ^[A-Za-z0-9_-]{1,32}$) — caller-supplied correlation identifier. When absent, Arq Signals generates a ULID.
  • reason (regex ^[A-Za-z0-9_-]{1,64}$) — short tag-style label surfaced in audit events.
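A caller can pre-validate those fields against the documented patterns before sending a request. The helper below is an illustrative client-side sketch, not the daemon's validator:

```python
import re

# Patterns as documented for /collect/now correlation fields.
REQUEST_ID_RE = re.compile(r"^[A-Za-z0-9_-]{1,32}$")
REASON_RE = re.compile(r"^[A-Za-z0-9_-]{1,64}$")

def validate_body(body: dict):
    """Return a list of validation errors for an optional /collect/now body."""
    errors = []
    if "request_id" in body and not REQUEST_ID_RE.match(body["request_id"]):
        errors.append("invalid request_id")
    if "reason" in body and not REASON_RE.match(body["reason"]):
        errors.append("invalid reason")
    return errors

print(validate_body({"request_id": "scheduled_run_2026_04_25"}))  # []
print(validate_body({"reason": "has spaces!"}))                   # ['invalid reason']
```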

Every accepted request produces a deterministic audit trace keyed by request_id:

collect_now_requested  →  collection_started  →  collection_completed   (per target)

Validation failures emit collect_now_rejected; requests that queue but can't run (channel full, or cycle overlap) emit collect_now_dropped. See "Audit guarantees" below.

Operators who want the commercial Arq control plane to drive this endpoint additionally enable Mode B authentication — see the next section. The endpoint itself works in both modes.

Reference: docs/control-plane.md.

Authentication modes

Arq Signals supports two modes, configured by signals.mode in signals.yaml (default standalone).

Standalone mode (default)

A single bearer token (api.token) authorises every request. Matched-token audit events carry actor=local_operator. This is the only mode every open-source deployment needs to know about.

Managed mode (mode: arq_managed)

Adds a second bearer token, the Arq control-plane token, distinct from api.token. The matched token determines the audit identity:

| Bearer matched | actor |
| --- | --- |
| api.token | local_operator |
| arq_control_plane_token | arq_control_plane |

The actor is sourced from which token matched — it is never inferred from request shape. A caller holding only api.token cannot acquire the arq_control_plane identity by adding a request_id or any other body field.

The control-plane token is supplied via file (preferred) or environment-variable indirection:

signals:
  mode: arq_managed
  arq_control_plane_token_file: /etc/arq/control-plane.token
  # or:
  # arq_control_plane_token_env: ARQ_CONTROL_PLANE_TOKEN

The file is re-read on every authentication attempt so rotation is a single file-write — no daemon restart required. Token length floor is 32 characters; the two tokens must be distinct (constant-time check at startup).

Mode B has no licence-validation surface in Arq Signals. The collector remains open source; the commercial value lives in the Arq control plane's analysis layer, not in obscured collector behaviour. See docs/authentication.md for the full Mode B model, rotation behaviour, and security posture.

Reference: docs/authentication.md.

Audit guarantees

Arq Signals emits structured slog records keyed by audit_event=<name> for every operationally significant lifecycle moment. The contract:

No silent request loss. Every accepted /collect/now request reaches a terminal outcome for its request_id along exactly one of three branches:

| Branch | Terminal records | When |
| --- | --- | --- |
| rejected | one collect_now_rejected | validation failed; cycle never queued |
| dropped | one collect_now_dropped | queued but cycle never ran (channel full, or cycle overlap) |
| ran | one collection_started per target + one collection_completed per target | cycle ran |

The "ran" branch is per-target: a request that narrows to two targets emits two started/completed pairs sharing the same request_id; a request that omits targets emits one pair per enabled target. There is no aggregate "cycle complete" record. If a request_id appears on collect_now_requested but the audit log shows no records on any of the three branches, that's a bug.

Token values never logged. A centralised denylist filter in internal/safety/audit.go rejects audit attributes whose key contains password, secret, api_token, token, dsn, connection_string, payload, or query_result. A small hand-curated allow-list overrides the substring match for keys that carry only metadata about a configured value (booleans / fingerprints), never the secret value itself — as of today the allow-list has exactly one entry, the boolean arq_control_plane_token_configured on the mode_configured startup event.
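The filter logic reduces to a substring denylist with an exact-key allow-list override. A Python sketch of the idea (illustrative only; the real filter lives in internal/safety/audit.go and is written in Go):

```python
DENYLIST = ("password", "secret", "api_token", "token", "dsn",
            "connection_string", "payload", "query_result")
# Exact keys that carry only metadata about a configured value, never the value.
ALLOWLIST = {"arq_control_plane_token_configured"}

def attr_permitted(key: str) -> bool:
    """Drop any audit attribute whose key contains a denylisted substring,
    unless the exact key is allow-listed."""
    if key in ALLOWLIST:
        return True
    lowered = key.lower()
    return not any(bad in lowered for bad in DENYLIST)

print(attr_permitted("api_token"))                           # False
print(attr_permitted("arq_control_plane_token_configured"))  # True
print(attr_permitted("target_name"))                         # True
```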

Correlation by request_id. When a caller supplies (or the daemon generates) a request_id, that value is propagated through to every per-target collection_started / collection_completed audit record so the full sequence is greppable as one trail.

For the full event catalogue, attribute schemas, and the secret-handling proof points, see docs/audit-model.md.

Security and data handling

Read-only enforcement (three layers)

  1. Static linting — every SQL query is validated at startup. DDL (CREATE, ALTER, DROP), DML (INSERT, UPDATE, DELETE), and dangerous functions (pg_terminate_backend, pg_sleep) cause the process to abort immediately.
  2. Session-level — all connections set default_transaction_read_only=on.
  3. Per-query — each query runs inside BEGIN ... READ ONLY.
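The fail-fast idea behind layer 1 can be sketched in a few lines. This toy Python linter is illustrative only; the real linter is stricter and lives in the Go implementation:

```python
import re

# Keyword classes named in the safety model above.
FORBIDDEN = ("CREATE", "ALTER", "DROP", "INSERT", "UPDATE", "DELETE",
             "PG_TERMINATE_BACKEND", "PG_SLEEP")

def lint_query(sql: str):
    """Return the forbidden keywords found as whole words in a query.
    A non-empty result would abort the process at startup."""
    found = []
    for word in FORBIDDEN:
        if re.search(rf"\b{word}\b", sql, re.IGNORECASE):
            found.append(word)
    return found

print(lint_query("SELECT name, setting FROM pg_settings"))  # []
print(lint_query("DROP TABLE t"))                           # ['DROP']
```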

Role safety validation (fail-closed)

Before collecting from any target, Arq Signals validates the connected role's safety posture. Collection is blocked if the role has:

  • Superuser privileges (rolsuper=true)
  • Replication privileges (rolreplication=true)
  • Bypass RLS privileges (rolbypassrls=true)

This is enforced by default with no configuration needed. Use a dedicated monitoring role with pg_monitor for safe collection. See docs/runtime-safety-model.md for details.

Credentials

  • Passwords are read from file or environment variable at connection time
  • Passwords are never cached in memory beyond a single connection attempt
  • Passwords are never written to SQLite
  • Passwords never appear in snapshots or exports
  • Password rotation is supported (re-read on each new connection)

API tokens

  • Both bearer tokens (the local api.token and the optional Mode B arq_control_plane_token) are compared in constant time via crypto/subtle.
  • Token values never appear in audit logs, metrics, error messages, or HTTP responses. The auto-generated api.token logs only its SHA-256 fingerprint at startup.
  • Audit-attribute filtering is centralised: a denylist on attribute key names (password, secret, api_token, token, dsn, connection_string, payload, query_result) drops any record whose key contains a denylisted substring before it leaves the process. A small hand-curated allow-list permits a single configuration-status boolean (arq_control_plane_token_configured) on the mode_configured startup event — never a token value.
  • The control-plane token (when configured) is re-read from file on every authentication attempt. Rotation is a single file-write; no daemon restart is required. See docs/authentication.md.
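The same constant-time comparison is available in most standard libraries. A Python sketch of the idea, where hmac.compare_digest plays the role that crypto/subtle plays in the Go implementation:

```python
import hmac

def token_matches(presented: str, configured: str) -> bool:
    """Compare bearer tokens in constant time to avoid timing side channels."""
    return hmac.compare_digest(presented.encode(), configured.encode())

print(token_matches("a-long-token-value", "a-long-token-value"))  # True
print(token_matches("a-long-token-value", "different"))           # False
```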

Network

  • Arq Signals makes no outbound network connections except to your PostgreSQL targets
  • No telemetry, no analytics, no phone-home
  • The HTTP API binds to loopback by default (127.0.0.1:8081)

Container hardening

When deployed via Docker, Arq Signals runs as a non-root user (UID 10001) on a minimal Alpine 3.21 base. The image contains BusyBox (used by the wget-based healthcheck and the tini init) and no full shell beyond BusyBox's ash applet — no Bash, no standalone sh. For deployments that require a shell-free runtime, build against a distroless base: the binary is statically linked and CGO-free, so it runs without glibc.

Configuration reference

Arq Signals reads configuration from (in order):

  1. --config flag
  2. /etc/arq/signals.yaml
  3. ./signals.yaml

Environment variables override file-based config. See examples/signals.yaml for a complete annotated example.

| Environment variable | Description | Default |
| --- | --- | --- |
| ARQ_ENV | Environment: dev, lab, prod | dev |
| ARQ_ALLOW_INSECURE_PG_TLS | Allow weak TLS in non-prod | false |
| ARQ_SIGNALS_ALLOW_UNSAFE_ROLE | Allow unsafe role attributes (lab/dev only) | false |
| ARQ_SIGNALS_TARGET_HOST | PostgreSQL host | -- |
| ARQ_SIGNALS_TARGET_PORT | PostgreSQL port | 5432 |
| ARQ_SIGNALS_TARGET_DBNAME | Database name | postgres |
| ARQ_SIGNALS_TARGET_USER | Username | -- |
| ARQ_SIGNALS_TARGET_NAME | Target name | default |
| ARQ_SIGNALS_TARGET_PASSWORD_FILE | Path to password file | -- |
| ARQ_SIGNALS_TARGET_PASSWORD_ENV | Env var containing the password | -- |
| ARQ_SIGNALS_TARGET_PGPASS_FILE | Path to pgpass file | -- |
| ARQ_SIGNALS_TARGET_SSLMODE | TLS mode | -- |
| ARQ_SIGNALS_POLL_INTERVAL | Collection interval | 5m |
| ARQ_SIGNALS_RETENTION_DAYS | Days to retain data | 30 |
| ARQ_SIGNALS_LOG_LEVEL | Log level: debug, info, warn, error | info |
| ARQ_SIGNALS_LOG_JSON | JSON log format | false |
| ARQ_SIGNALS_MAX_CONCURRENT_TARGETS | Max parallel targets | 4 |
| ARQ_SIGNALS_TARGET_TIMEOUT | Per-target timeout | 60s |
| ARQ_SIGNALS_QUERY_TIMEOUT | Per-query timeout | 10s |
| ARQ_SIGNALS_LISTEN_ADDR | API listen address | 127.0.0.1:8081 |
| ARQ_SIGNALS_DB_PATH | SQLite database path | /data/arq-signals.db |
| ARQ_SIGNALS_WRITE_TIMEOUT | API write timeout | 180s |
| ARQ_SIGNALS_API_TOKEN | Bearer token for API auth | auto-generated |

Architecture and scope

Arq Signals is the open-source collection layer of the Arq platform. It is a complete, standalone tool — not a crippled free tier.

┌─────────────────┐
│   Arq Signals   │  Collects diagnostic signals from PostgreSQL.
│  (open source)  │  Produces portable snapshots. This repository.
└────────┬────────┘
         │ snapshot (ZIP / NDJSON)
         ▼
┌─────────────────┐
│       Arq       │  Analyzes signals. Scores health. Generates
│    (private)    │  findings and recommendations.
└────────┬────────┘
         │ findings
         ▼
┌─────────────────┐
│ Arq Workbench   │  Presents results to engineers.
│    (private)    │  Interactive UI for DBA workflows.
└─────────────────┘

The snapshot format (arq-snapshot.v1) is the stable contract between layers. Each layer is independently deployable and separately maintained.

Arq Signals is fully usable on its own. You do not need Arq or Arq Workbench to collect, export, or inspect your PostgreSQL diagnostics. Many teams use Arq Signals purely for data collection, feeding the snapshots into their own scripts, dashboards, or analysis workflows.

What stays out of Arq Signals — by design

The boundary between Signals and the rest of the platform is intentional, not accidental:

| Capability | Where it lives | Why not in Signals |
| --- | --- | --- |
| Database analysis | Arq | Interpretation is a separate concern from evidence collection |
| Health scoring | Arq | Scoring requires domain judgment that evolves independently |
| AI / LLM | Arq | Language models are not needed for safe data collection |
| Recommendations | Arq | Remediation advice requires analysis context |
| Cloud services | None | No component phones home or uploads data |
| Telemetry | None | No usage tracking exists anywhere in the platform |

This separation keeps the collector small, auditable, and safe to run in restricted environments where third-party analysis tools may not be permitted.

Project status

Arq Signals v0.5.0 — the collection engine, safety model, and snapshot format are stable and tested (800+ automated tests, 104 STDD requirements). Smoke-tested against PostgreSQL 14, 15, 16, 17, and 18. Released container images are published to GHCR and Docker Hub with SBOM (SPDX) and SLSA provenance.

Roadmap:

  • Kubernetes deployment examples
  • Community-contributed collectors
  • bloat_exact_v1 / index_bloat_exact_v1 — pgstattuple-gated precision variants of the existing statistical bloat collectors

Development methodology

This project follows STDD — Specification & Test-Driven Development. Specifications and tests define correct behavior. Implementation is written to satisfy those rules. The development policy is defined in CLAUDE.md.

Contributing

We welcome contributions. See CONTRIBUTING.md for guidelines and GOVERNANCE.md for project governance.

In scope: new collectors, bug fixes, performance, documentation. Out of scope: analysis, scoring, AI (those belong in a downstream analyzer).

Project resources

Related

  • Elevarq — PostgreSQL tools for engineering teams
  • Arq — commercial PostgreSQL intelligence platform; Arq Signals is its open-source collection layer
  • pgAgroal Container — production-ready container distribution of pgagroal, a high-performance PostgreSQL connection pooler

License

BSD-3-Clause. See LICENSE.

Free to use, modify, and distribute for any purpose, including commercial use.
