
observability: declarative tags + baggage propagation across traces and audit #192

Description

@initializ-mk

Summary

Add a `forge.yaml` `observability.tags` block for static deployment metadata (team, cost center, environment, region, owner, deploy SHA, etc.) and propagate the same KVs onto both OTel resource attributes (every span) and audit events (new top-level `tags` field). Optionally stamp inbound W3C `Baggage` header values into `audit.fields.baggage` so per-request metadata (tenant, case_id, priority) is filterable in both observability streams.

This closes a small but real gap: operators today can pivot trace ↔ audit by `trace_id`, but can't filter audit by "team" or "tenant" without joining against trace data. After this, the same KVs are present in both streams natively.

Background — what Forge has today

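Already wired:

  • Inbound `Baggage` header extraction via the composite propagator.
  • `trace_id` on audit events, so operators can pivot trace ↔ audit.
  • Curated `OTEL_*` env passthrough to the Strands and Node sidecars, so a hand-written `OTEL_RESOURCE_ATTRIBUTES` already reaches every span.
  • `AuditLogger.WithTenancy` (FWS-7) for stamping tenancy identity on audit events.

Gaps: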

  1. No ergonomic forge.yaml block — operators hand-write `OTEL_RESOURCE_ATTRIBUTES=team=platform,cost_center=ABC-123,...` as a comma-delimited env string.
  2. Baggage values aren't stamped onto audit events, so trace ↔ audit cross-filter on "team" / "tenant" doesn't work natively. You can pivot by `trace_id` but not by the tag itself.

Design

Three OTel mechanisms, picked by metadata lifetime

| Lifetime | Mechanism | Example |
|---|---|---|
| Static per deployment | OTel resource attributes | team, cost_center, environment, region, deploy_sha |
| Dynamic per request | OTel baggage | tenant, case_id, priority |
| Per operation | Span attributes (code-coupled) | retry attempt #, chosen model version |

This issue adds declarative config for the first two. Span attributes are out of scope (operators who need them write Go code).

forge.yaml shape

```yaml
observability:
  tags:                              # static deployment metadata
    team: platform
    cost_center: ABC-123
    environment: production
    region: us-east-1
    owner: alice@example.com
    deploy_sha: 7d2f1ab
  tracing:
    enabled: true
    propagate_baggage_to_audit: true # NEW: stamp baggage KVs into audit.fields.baggage; default false (opt-in)
```

At runtime Forge:

  1. Builds `OTEL_RESOURCE_ATTRIBUTES` from `observability.tags` (`k=v,k=v,...`) and appends it to any value already in the operator's env, skipping keys the operator already set (preserve operator overrides; don't overwrite). Strands and Node sidecars inherit the result via the curated `OTEL_*` passthrough.
  2. Stamps `tags` on every `AuditEvent` (new top-level field, `omitempty`, identical `map[string]string` shape). One audit-side helper `AuditLogger.WithTags(map[string]string)` mirrors the existing `WithTenancy` pattern from FWS-7.
  3. Extracts the inbound `Baggage` header via the composite propagator (this already happens). When `propagate_baggage_to_audit: true`, marshals the baggage KVs into `audit.fields.baggage` for the request scope. Default off, which keeps audit JSON byte-stable for pre-fix consumers.
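Step 1 can be sketched as a pure string merge. This is a minimal sketch, assuming a hypothetical helper `mergeResourceAttributes` and a config validator that has already rejected `,` and `=` in tag keys and values, so no percent-encoding is attempted. Keys the operator already set win; the remaining tags are appended in sorted key order so the env value is deterministic.

```go
package main

import (
	"fmt"
	"sort"
	"strings"
)

// mergeResourceAttributes appends observability.tags to an existing
// OTEL_RESOURCE_ATTRIBUTES value. Any key the operator already set is kept
// and the matching tag is skipped, so operator overrides always win.
func mergeResourceAttributes(existing string, tags map[string]string) string {
	seen := make(map[string]bool)
	var parts []string
	for _, pair := range strings.Split(existing, ",") {
		pair = strings.TrimSpace(pair)
		if pair == "" {
			continue
		}
		if k, _, ok := strings.Cut(pair, "="); ok {
			seen[strings.TrimSpace(k)] = true
		}
		parts = append(parts, pair) // operator entries are preserved verbatim
	}

	// Sort tag keys so the generated value is stable across restarts.
	keys := make([]string, 0, len(tags))
	for k := range tags {
		keys = append(keys, k)
	}
	sort.Strings(keys)

	for _, k := range keys {
		if seen[k] {
			continue // operator override wins
		}
		parts = append(parts, k+"="+tags[k])
	}
	return strings.Join(parts, ",")
}

func main() {
	existing := "foo=bar,team=ops"
	tags := map[string]string{"team": "platform", "region": "us-east-1"}
	fmt.Println(mergeResourceAttributes(existing, tags))
	// foo=bar,team=ops,region=us-east-1
}
```

At startup the runner would then export the merged value (for example with `os.Setenv("OTEL_RESOURCE_ATTRIBUTES", merged)`) before initializing the OTel SDK and launching sidecars.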

Cardinality + size guardrails

  • Resource attributes are stamped on every span, so high-cardinality values inflate index sizes in Tempo/Honeycomb/Datadog. Validate at config-load time with a cardinality hint (target: ≤ ~12 known values per tag key) and document the trade-off.
  • Baggage rides in an HTTP header. Limits: 8 KB total and ≤ 64 entries (W3C spec), ≤ 256-byte values. Forge rejects an inbound `Baggage` header over these limits with a typed error (do NOT silently truncate: that breaks downstream services that size-checked correctly).
  • Audit `tags` budget — default 4 KB, hard cap 16 KB. Tag-explosion attempts (huge multi-value tags) are truncated, and the event carries a `tags_truncated: true` field so consumers can detect it.
  • PII concerns — baggage propagates downstream uncontrolled. Document that operators should use opaque IDs (`case_id=CASE-7821`), not raw PII. Add a `tags`-side warning in the validator when a value looks email-shaped.

Concept separation (don't conflate fields)

| Concept | Field | Use for |
|---|---|---|
| Hard tenancy identity | `org_id`, `workspace_id` | SIEM partitioning, tenant boundary |
| Agent identity | `entity_id`, `entity_type` | Which deployed agent emitted this |
| Workflow correlation | `workflow_id`, `stage_id`, `step_id`, `invocation_caller` | FWS-2 orchestrator coordination |
| Free-form deployment metadata | `tags` (new) | Team / cost / env / region / owner / SHA |
| Per-request operator metadata | `fields.baggage` (new) | Tenant / case / priority; varies per call |

Tags = static, low-cardinality, deployment-defined. Baggage = dynamic, per-request, propagates downstream.
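The audit side of this split can be sketched with a trimmed, hypothetical `AuditEvent` carrying only a few of the fields above; the `event` field and the `Build` method are illustrative, not Forge's schema or API. `WithTags` here returns a tagged copy of the logger, which is an assumption about how the `WithTenancy` pattern works. Baggage is passed in as a plain map to stay stdlib-only; in Forge it would come from `baggage.FromContext(ctx)`, e.g. by ranging over `Members()` and reading each member's `Key()` and `Value()`. The three outputs show that `omitempty` keeps the no-tags, flag-off event byte-identical.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// AuditEvent is a trimmed, hypothetical stand-in for Forge's audit event.
type AuditEvent struct {
	Event       string            `json:"event"`
	OrgID       string            `json:"org_id,omitempty"`
	WorkspaceID string            `json:"workspace_id,omitempty"`
	EntityID    string            `json:"entity_id,omitempty"`
	Tags        map[string]string `json:"tags,omitempty"`   // static deployment metadata
	Fields      map[string]any    `json:"fields,omitempty"` // fields.baggage lives here
}

// AuditLogger is a hypothetical sketch of the WithTags pattern.
type AuditLogger struct {
	tags             map[string]string
	propagateBaggage bool // mirrors tracing.propagate_baggage_to_audit
}

func NewAuditLogger(propagateBaggage bool) AuditLogger {
	return AuditLogger{propagateBaggage: propagateBaggage}
}

// WithTags returns a copy of the logger that stamps tags on every event.
func (l AuditLogger) WithTags(tags map[string]string) AuditLogger {
	cp := make(map[string]string, len(tags))
	for k, v := range tags {
		cp[k] = v
	}
	l.tags = cp
	return l
}

// Build stamps tags and, only when opted in, the request's baggage KVs.
func (l AuditLogger) Build(event string, baggage map[string]string) AuditEvent {
	ev := AuditEvent{Event: event}
	if len(l.tags) > 0 {
		ev.Tags = l.tags
	}
	if l.propagateBaggage && len(baggage) > 0 {
		ev.Fields = map[string]any{"baggage": baggage}
	}
	return ev
}

func mustJSON(ev AuditEvent) string {
	b, err := json.Marshal(ev)
	if err != nil {
		panic(err)
	}
	return string(b)
}

func main() {
	bag := map[string]string{"tenant": "acme"}
	tags := map[string]string{"team": "platform"}

	fmt.Println(mustJSON(NewAuditLogger(false).Build("tool_call", bag)))
	// {"event":"tool_call"}
	fmt.Println(mustJSON(NewAuditLogger(false).WithTags(tags).Build("tool_call", bag)))
	// {"event":"tool_call","tags":{"team":"platform"}}
	fmt.Println(mustJSON(NewAuditLogger(true).WithTags(tags).Build("tool_call", bag)))
	// {"event":"tool_call","tags":{"team":"platform"},"fields":{"baggage":{"tenant":"acme"}}}
}
```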

Implementation scope

  • `forge-core/types/config.go` — `ObservabilityConfig.Tags map[string]string` + `Tracing.PropagateBaggageToAudit bool` fields.
  • `forge-core/validate/forge_config.go` — validate keys (lowercase snake_case, matching example keys such as `cost_center` and `deploy_sha`; non-PII-shaped), values (≤ 256 bytes), total size (≤ 4 KB), cardinality hint.
  • `forge-cli/runtime/runner.go` — at startup, build `OTEL_RESOURCE_ATTRIBUTES` from tags, append to existing env value; call `auditLogger.WithTags(tags)`.
  • `forge-core/runtime/audit_schema.go` — ``AuditEvent.Tags map[string]string `json:"tags,omitempty"` ``; `AuditLogger.WithTags(...)` helper.
  • `forge-core/runtime/audit.go` — when `PropagateBaggageToAudit` set, `EmitFromContext` reads `baggage.FromContext(ctx)` and marshals into `fields.baggage`.
  • Schema version stays at `"1.0"` per FWS-8 policy (additive optional fields).

Tests

  • `TestObservabilityTags_StampedOnAuditEvents` — every event in a fake invocation carries the configured tags.
  • `TestObservabilityTags_StampedOnSpans` — verify resource attributes pass through (via `OTEL_RESOURCE_ATTRIBUTES`) using a `tracetest.SpanRecorder`.
  • `TestObservabilityTags_AppendedToExistingEnv` — operator-set `OTEL_RESOURCE_ATTRIBUTES=foo=bar` not clobbered; Forge appends.
  • `TestObservabilityTags_ValidatedAtConfigLoad` — bad keys, oversized values, and total-budget overruns each return a typed config error.
  • `TestBaggagePropagation_FlowsIntoAuditFields` — inbound `Baggage` header lands in `audit.fields.baggage` when `propagate_baggage_to_audit: true`.
  • `TestBaggagePropagation_OffByDefault` — same case with the flag off; `fields.baggage` absent (back-compat pin).
  • `TestBaggagePropagation_RejectsOversizedHeader` — 8 KB+ baggage header returns 400, no silent truncation.

Docs

  • `docs/security/audit-logging.md` — new "Tags" section + "Baggage propagation" subsection. Update event-field table to include `tags` and `fields.baggage`.
  • `docs/core-concepts/observability-tracing.md` — new "Resource attributes via observability.tags" subsection; cross-link to baggage handling in "End-to-end propagation".
  • `docs/reference/forge-yaml-schema.md` — `observability.tags` shape + `tracing.propagate_baggage_to_audit` flag.
  • Recipe: how operators search by tag in Tempo / Honeycomb / Datadog / Grafana.

Out of scope

  • Span attributes from config. Per-operation metadata stays in code (it's by definition not declarative).
  • Tag-based egress / auth decisions. Tags are observability metadata only; never an authorization signal.
  • Auto-derivation of tags from K8s downward API. Operators can wire that via env interpolation in forge.yaml if they want (`team: ${POD_LABEL_TEAM}`); first-class K8s integration is a separate enhancement.
  • Per-tag access control (which audit consumers see which tags). Tags are uniformly visible to anyone with access to the audit stream. A redaction layer would be its own feature.

Risk

Low. Both new fields are additive, optional, and `omitempty`: pre-fix audit consumers see byte-identical JSON when the operator sets no tags and leaves the baggage flag off. Schema version unchanged. The behavior changes are the extra `OTEL_RESOURCE_ATTRIBUTES` entries, which OTel SDKs handle natively, and the 400 returned for an inbound `Baggage` header over the size limit.

Metadata

  • Assignees: no one assigned
  • Labels: enhancement (New feature or request)
  • Type: no type
  • Projects: none
  • Milestone: none
  • Relationships: none yet
  • Development: no branches or pull requests