Goal
Give Operator users visibility into runtime errors without tailing logs. Errors like transport crashes, LLM failures after exhausting the fallback chain, tool execution exceptions, and memory system failures should surface automatically when Sentry is configured.
Design
Opt-in via environment variable
If SENTRY_DSN is set (typically in ~/.operator/.env), Sentry initializes automatically. No config file changes required for the common case.
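A minimal sketch of the opt-in check, assuming a blank value should count as "not set". The helper name resolve_dsn and its parameters are hypothetical, not part of the existing codebase.

```python
import os
from typing import Mapping, Optional


def resolve_dsn(
    env: Mapping[str, str] = os.environ,
    var_name: str = "SENTRY_DSN",
) -> Optional[str]:
    """Return the DSN if the variable is set and non-blank, else None."""
    value = env.get(var_name, "").strip()
    return value or None
```

A None result means Sentry stays off, so users who never set the variable see no behavior change.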
Optional config for power users
Add an optional sentry block under runtime in operator.yaml for fine-tuning:
dsn_env — env var name to read DSN from (defaults to SENTRY_DSN)
environment — Sentry environment tag (defaults to production)
sample_rate — error sample rate (defaults to 1.0)
traces_sample_rate — performance tracing (defaults to 0.0, off)
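One way the optional block could be read, sketched with a dataclass whose defaults mirror the list above. SentryConfig and load_sentry_config are hypothetical names, and ignoring unknown keys is an assumption rather than a decided behavior.

```python
from dataclasses import dataclass, fields
from typing import Any, Mapping

# Expected shape in operator.yaml (every key optional):
#   runtime:
#     sentry:
#       environment: staging
#       traces_sample_rate: 0.2


@dataclass(frozen=True)
class SentryConfig:
    dsn_env: str = "SENTRY_DSN"
    environment: str = "production"
    sample_rate: float = 1.0
    traces_sample_rate: float = 0.0


def load_sentry_config(raw: Mapping[str, Any]) -> SentryConfig:
    """Build a SentryConfig from the parsed YAML, falling back to defaults."""
    block = (raw.get("runtime") or {}).get("sentry") or {}
    known = {f.name for f in fields(SentryConfig)}
    # Assumption: keys outside the documented set are ignored, not rejected.
    return SentryConfig(**{k: v for k, v in block.items() if k in known})
```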
Optional dependency
Add sentry-sdk as an optional dependency (pip install operator-ai[sentry]). Operator runs fine without it — if the SDK isn't installed, Sentry init is silently skipped even if a DSN is present.
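The silent skip could look roughly like the guarded import below. init_sentry is a hypothetical name. The keyword arguments passed to sentry_sdk.init are standard SDK options, but the final signature in sentry.py may differ.

```python
from typing import Optional

try:
    import sentry_sdk  # only present with the [sentry] extra
except ImportError:
    sentry_sdk = None


def init_sentry(
    dsn: Optional[str],
    environment: str = "production",
    sample_rate: float = 1.0,
    traces_sample_rate: float = 0.0,
) -> bool:
    """Initialize Sentry if possible; return True only when init ran."""
    if not dsn or sentry_sdk is None:
        # No DSN or no SDK: skip without raising or logging.
        return False
    sentry_sdk.init(
        dsn=dsn,
        environment=environment,
        sample_rate=sample_rate,
        traces_sample_rate=traces_sample_rate,
        send_default_pii=False,
    )
    return True
```

Returning a bool lets the caller decide whether to mention Sentry in startup output without importing the SDK itself.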
Implementation
New module: sentry.py
A small module (~50 lines) that handles SDK initialization, before_send PII stripping, and a helper to capture exceptions with contextual tags.
Init point
Call the init function early in async_main(), after config is loaded and env file is processed, but before transports start.
Contextual tags on every captured error:
agent — which agent was active
model — which LLM model was being called
transport — transport type (slack, etc.)
conversation_id — for correlating related errors (not message content)
tool — tool name if the error occurred during tool execution
These align with the existing set_run_context() pattern.
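A sketch of the capture helper, assuming tags with no value are omitted rather than sent empty. build_tags and capture_with_tags are hypothetical names. The scope lookup prefers new_scope and falls back to the older push_scope, since which one is available depends on the installed SDK version.

```python
from typing import Dict, Optional

try:
    import sentry_sdk
except ImportError:
    sentry_sdk = None

TAG_NAMES = ("agent", "model", "transport", "conversation_id", "tool")


def build_tags(
    agent: Optional[str] = None,
    model: Optional[str] = None,
    transport: Optional[str] = None,
    conversation_id: Optional[str] = None,
    tool: Optional[str] = None,
) -> Dict[str, str]:
    """Collect the tags that have a value; None entries are dropped."""
    values = (agent, model, transport, conversation_id, tool)
    return {name: str(v) for name, v in zip(TAG_NAMES, values) if v is not None}


def capture_with_tags(exc: BaseException, tags: Dict[str, str]) -> None:
    """Send exc to Sentry with tags on an isolated scope; no-op without the SDK."""
    if sentry_sdk is None:
        return
    scope_factory = getattr(sentry_sdk, "new_scope", None) or sentry_sdk.push_scope
    with scope_factory() as scope:
        for name, value in tags.items():
            scope.set_tag(name, value)
        sentry_sdk.capture_exception(exc)
```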
Capture points (5 locations):
_run_conversation catch-all in main.py — unhandled agent errors
_on_done transport crash callback in main.py — transport failures
run_agent in agent.py after all models in the fallback chain fail — LLM exhaustion
Tool execution error handler in agent.py — individual tool failures
Memory search failure in main.py — memory system errors
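To illustrate the LLM exhaustion capture point, here is a simplified stand-in for the fallback loop. It reports once, only after every model has failed, so individual fallbacks do not create noise. AllModelsFailed, call_with_fallback, and the report callback are hypothetical. The real run_agent is likely async and structured differently.

```python
from typing import Callable, Dict, Optional, Sequence, TypeVar

T = TypeVar("T")


class AllModelsFailed(RuntimeError):
    """Raised when every model in the fallback chain has errored."""


def call_with_fallback(
    models: Sequence[str],
    call_model: Callable[[str], T],
    report: Callable[[BaseException, Dict[str, str]], None],
) -> T:
    last_error: Optional[BaseException] = None
    for model in models:
        try:
            return call_model(model)
        except Exception as exc:
            # A single model failing is expected; keep going down the chain.
            last_error = exc
    error = AllModelsFailed(f"all {len(models)} models failed")
    error.__cause__ = last_error
    tags = {"model": models[-1]} if models else {}
    report(error, tags)  # single capture for the whole chain
    raise error
```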
Privacy
send_default_pii set to False
before_send hook strips message content from exception context — only metadata (agent name, model, conversation ID) is sent
API keys and tokens are never attached to events
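A sketch of what the before_send hook might strip. It assumes message content can reach an event through stack-frame local variables and the extra, request, and breadcrumbs sections. That field list is an assumption and should be checked against real events before relying on it.

```python
from typing import Any, Dict, Optional

# Assumption: these event sections may carry message content.
SENSITIVE_TOP_LEVEL_KEYS = ("request", "extra", "breadcrumbs")


def before_send(event: Dict[str, Any], hint: Dict[str, Any]) -> Optional[Dict[str, Any]]:
    """Drop content-bearing fields; tags and exception types are left intact."""
    for key in SENSITIVE_TOP_LEVEL_KEYS:
        event.pop(key, None)
    for exc in (event.get("exception") or {}).get("values") or []:
        for frame in (exc.get("stacktrace") or {}).get("frames") or []:
            # Frame locals can hold prompts or user messages.
            frame.pop("vars", None)
    return event
```

The hook would be passed as before_send= to sentry_sdk.init. Returning the event keeps it; returning None would drop it entirely.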
Out of scope
- Performance tracing (off by default, can be enabled via config but no custom spans planned)
- Sentry cron monitoring for jobs (could be a follow-up)
- Any dashboard or alerting setup — that's the user's Sentry project config