Add opt-in Sentry error tracking #25

@geekforbrains

Goal

Give Operator users visibility into runtime errors without tailing logs. Errors like transport crashes, LLM failures after exhausting the fallback chain, tool execution exceptions, and memory system failures should surface automatically when Sentry is configured.

Design

Opt-in via environment variable

If SENTRY_DSN is set (typically in ~/.operator/.env), Sentry initializes automatically. No config file changes required for the common case.

Optional config for power users

Add an optional sentry block under runtime in operator.yaml for fine-tuning:

  • dsn_env — env var name to read DSN from (defaults to SENTRY_DSN)
  • environment — Sentry environment tag (defaults to production)
  • sample_rate — error sample rate (defaults to 1.0)
  • traces_sample_rate — performance tracing (defaults to 0.0, off)
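One way to model these defaults in code is sketched below. The `SentryConfig` class and its `from_runtime` helper are hypothetical names used only for illustration; the sketch also assumes the `runtime` section of operator.yaml has already been parsed into a plain mapping.

```python
from dataclasses import dataclass
from typing import Any, Mapping, Optional


@dataclass(frozen=True)
class SentryConfig:
    """Hypothetical container for the optional runtime.sentry block."""

    dsn_env: str = "SENTRY_DSN"       # env var name that holds the DSN
    environment: str = "production"   # Sentry environment tag
    sample_rate: float = 1.0          # fraction of errors reported
    traces_sample_rate: float = 0.0   # performance tracing, off by default

    @classmethod
    def from_runtime(cls, runtime: Optional[Mapping[str, Any]]) -> "SentryConfig":
        """Build the config from the parsed runtime mapping.

        A missing or empty sentry block yields the defaults, so users who
        only set the DSN env var need no config changes.
        """
        block = dict((runtime or {}).get("sentry") or {})
        # Ignore unknown keys instead of failing on them (an assumption).
        known = {k: block[k] for k in cls.__dataclass_fields__ if k in block}
        return cls(**known)
```

With this shape, `SentryConfig.from_runtime({})` gives the documented defaults, and a block such as `{"sentry": {"environment": "staging"}}` overrides only that one field.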

Optional dependency

Add sentry-sdk as an optional dependency (pip install operator-ai[sentry]). Operator runs fine without it — if the SDK isn't installed, Sentry init is silently skipped even if a DSN is present.
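A minimal sketch of the opt-in and optional-dependency behavior described above. The function name `init_sentry` and its `env` parameter are assumptions made for illustration and testability; the `sentry_sdk.init` keyword arguments are the SDK's documented options.

```python
import os
from typing import Mapping, Optional


def init_sentry(
    dsn_env: str = "SENTRY_DSN",
    environment: str = "production",
    sample_rate: float = 1.0,
    traces_sample_rate: float = 0.0,
    env: Optional[Mapping[str, str]] = None,
) -> bool:
    """Start Sentry if possible; return True only when the SDK was initialized."""
    env = os.environ if env is None else env
    dsn = env.get(dsn_env)
    if not dsn:
        return False  # opt-in: no DSN means no Sentry

    try:
        import sentry_sdk  # optional dependency (the [sentry] extra)
    except ImportError:
        return False  # SDK not installed: skip silently, even with a DSN set

    sentry_sdk.init(
        dsn=dsn,
        environment=environment,
        sample_rate=sample_rate,
        traces_sample_rate=traces_sample_rate,
        send_default_pii=False,
    )
    return True
```

Importing `sentry_sdk` inside the function keeps the rest of Operator importable when the extra is not installed.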

Implementation

New module: sentry.py
A small module (~50 lines) that handles SDK initialization, before_send PII stripping, and a helper to capture exceptions with contextual tags.

Init point
Call the init function early in async_main(), after the config is loaded and the env file is processed but before transports start.

Contextual tags on every captured error:

  • agent — which agent was active
  • model — which LLM model was being called
  • transport — transport type (slack, etc.)
  • conversation_id — for correlating related errors (not message content)
  • tool — tool name if the error occurred during tool execution

These align with the existing set_run_context() pattern.
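The capture helper could look roughly like the sketch below. `build_tags` and `capture_exception` are hypothetical names, and the scope handling assumes sentry-sdk 2.x, where `sentry_sdk.new_scope()` is available. How this hooks into set_run_context() is not shown because that function's signature isn't given here.

```python
from typing import Any, Dict


def build_tags(
    agent: Any = None,
    model: Any = None,
    transport: Any = None,
    conversation_id: Any = None,
    tool: Any = None,
) -> Dict[str, str]:
    """Return only the tags that are known, as strings; never message content."""
    candidates = {
        "agent": agent,
        "model": model,
        "transport": transport,
        "conversation_id": conversation_id,
        "tool": tool,
    }
    return {key: str(value) for key, value in candidates.items() if value is not None}


def capture_exception(exc: BaseException, **context: Any) -> None:
    """Report exc with contextual tags; a no-op if the SDK is not installed."""
    try:
        import sentry_sdk  # optional dependency
    except ImportError:
        return
    # Assumes sentry-sdk 2.x: tags set on the forked scope apply to this event only.
    with sentry_sdk.new_scope() as scope:
        for key, value in build_tags(**context).items():
            scope.set_tag(key, value)
        sentry_sdk.capture_exception(exc)
```

A tool failure would then be reported as `capture_exception(exc, agent="support", tool="web_search")`, where both values are made-up examples.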

Capture points (5 locations):

  1. _run_conversation catch-all in main.py — unhandled agent errors
  2. _on_done transport crash callback in main.py — transport failures
  3. run_agent in agent.py after all models in the fallback chain fail — LLM exhaustion
  4. Tool execution error handler in agent.py — individual tool failures
  5. Memory search failure in main.py — memory system errors
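As an illustration of capture point 3, the sketch below reports once, after every model in the chain has failed, rather than once per failed model. All names here (`run_with_fallback`, `call_model`, `report`, `ModelsExhaustedError`) are hypothetical and not taken from agent.py.

```python
from typing import Any, Callable, Optional, Sequence


class ModelsExhaustedError(RuntimeError):
    """Hypothetical error raised when every model in the chain has failed."""


def run_with_fallback(
    models: Sequence[str],
    call_model: Callable[[str], Any],
    report: Callable[[BaseException, Optional[str]], None],
) -> Any:
    """Try each model in order; report and raise only if all of them fail."""
    last_error: Optional[BaseException] = None
    for model in models:
        try:
            return call_model(model)
        except Exception as exc:
            last_error = exc  # keep going: the next model may succeed

    error = ModelsExhaustedError(f"all {len(models)} models failed")
    error.__cause__ = last_error  # preserve the final underlying failure
    report(error, models[-1] if models else None)
    raise error
```

Reporting only on exhaustion keeps transient failures that the fallback chain absorbs out of Sentry, which matches the "after all models in the fallback chain fail" wording above.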

Privacy

  • send_default_pii set to False
  • before_send hook strips message content from exception context — only metadata (agent name, model, conversation ID) is sent
  • API keys and tokens are never attached to events
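A before_send hook along these lines might look like the sketch below. Which event fields Operator would actually strip is an assumption: this version drops stack-frame local variables and breadcrumbs, and keeps only allow-listed metadata keys under `extra`. The hook returns the event because returning None from before_send would drop the event entirely.

```python
from typing import Any, Dict, Optional

# Metadata keys considered safe to send (assumed allow-list).
ALLOWED_EXTRA_KEYS = {"agent", "model", "transport", "conversation_id", "tool"}


def strip_message_content(
    event: Dict[str, Any], hint: Dict[str, Any]
) -> Optional[Dict[str, Any]]:
    """before_send hook: remove fields likely to carry message content."""
    # Local variables in stack frames can hold prompts or user messages.
    for exc in (event.get("exception") or {}).get("values") or []:
        for frame in (exc.get("stacktrace") or {}).get("frames") or []:
            frame.pop("vars", None)

    # Keep only allow-listed metadata in the free-form extra section.
    extra = event.get("extra")
    if isinstance(extra, dict):
        event["extra"] = {k: v for k, v in extra.items() if k in ALLOWED_EXTRA_KEYS}

    # Breadcrumbs may include log lines that quote message text.
    event.pop("breadcrumbs", None)
    return event
```

It would be passed to the SDK as `sentry_sdk.init(..., before_send=strip_message_content)`.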

Out of scope

  • Performance tracing (off by default, can be enabled via config but no custom spans planned)
  • Sentry cron monitoring for jobs (could be a follow-up)
  • Any dashboard or alerting setup — that's the user's Sentry project config
