Generic ProviderToolError → session cleanup hook #25

@hrhrng

Description

Problem

Across providers, an agent can lose access to a resource silently, with no webhook to tell us. Every provider currently hits this, and the cleanup is either missing or ad hoc.

Concrete cases observed:

| Provider | Loss-of-access scenario | Webhook? | Current behavior |
| --- | --- | --- | --- |
| Slack | /kick @bot from a channel (per_channel session) | ❌ Slack doesn't deliver member_left_channel for bot self-leave (verified end-to-end on staging in #24) | Channel-session row stays active, scheduleWakeup keeps firing, every wake → not_in_channel 401, agent burns tokens |
| Linear | Bot unassigned from issue / issue archived under bot | partial (some events) | not yet wired |
| GitHub | App removed from repo / repo deleted | partial | not yet wired |
| MCP servers (generic) | Token revoked upstream | ❌ no webhook | every tool call 401s; no cleanup |

The unifying observation: the agent's tool call IS the signal. When a provider returns an error semantically equivalent to "you no longer have access to this resource", that should:

  1. Mark the session/scope row closed in the provider's persistence (e.g. Slack: slack_thread_sessions.status = completed)
  2. Cancel any pending scheduleWakeup for that session
  3. Stop the agent for this turn (don't just retry the same call into a wall)
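
The three effects above can be sketched as a single harness-side handler. This is only a minimal TypeScript sketch under stated assumptions: `lostAccessSessions`, `lostAccessWakeups`, `TurnOutcome`, and `handleLostAccess` are hypothetical names and in-memory stand-ins, not existing code. The real row lives in provider persistence, and the real wakeups live in whatever backs scheduleWakeup.

```typescript
// Hypothetical in-memory stand-ins for provider persistence and the wakeup queue.
type SessionStatus = "active" | "completed";

interface SessionRow {
  id: string;
  status: SessionStatus;
}

interface PendingWakeup {
  sessionId: string;
  cancelled: boolean;
}

// What the harness hands back to the turn loop (assumed shape).
interface TurnOutcome {
  kind: "stop";
  reason: "lost_access";
  detail: string;
}

const lostAccessSessions = new Map<string, SessionRow>();
const lostAccessWakeups: PendingWakeup[] = [];

function handleLostAccess(sessionId: string, detail: string): TurnOutcome {
  // 1. Mark the session/scope row closed.
  const row = lostAccessSessions.get(sessionId);
  if (row) {
    row.status = "completed";
  }

  // 2. Cancel every pending wakeup owned by this session, and only this session.
  for (const wakeup of lostAccessWakeups) {
    if (wakeup.sessionId === sessionId) {
      wakeup.cancelled = true;
    }
  }

  // 3. Tell the turn loop to stop instead of retrying the same call.
  return { kind: "stop", reason: "lost_access", detail };
}

// Usage: one session loses access, an unrelated session is left alone.
lostAccessSessions.set("C123", { id: "C123", status: "active" });
lostAccessSessions.set("C999", { id: "C999", status: "active" });
lostAccessWakeups.push({ sessionId: "C123", cancelled: false });
lostAccessWakeups.push({ sessionId: "C999", cancelled: false });

const lostAccessOutcome = handleLostAccess("C123", "not_in_channel");
```

Returning an outcome value rather than throwing is one possible design choice here; it keeps "stop this turn" explicit in the turn loop, but an exception type would work equally well.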

Proposed shape

A generic harness hook, not provider-specific RPC. Sketch:

  • Each provider tool wrapper declares a small predicate: "does this error mean lost-access for the current resource?" (e.g. Slack: error code in {channel_not_found, not_in_channel, account_inactive, token_revoked}; Linear: 404 on the assigned issue + bot not in members; etc.)
  • When the predicate matches, the harness calls a single internal API: closeSession({ reason: 'lost_access', detail })
  • That handler routes to the provider's own closeSessionForScope(...) (already implemented in Slack via sessionScopes.updateStatus(... 'completed') + clearPendingScan(...)); other providers would add an equivalent
  • closeSession also cancels pending scheduleWakeups owned by this session
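
One way the predicate and routing described above could fit together is sketched below. It is an illustration, not the proposed interface: `ToolError`, `ProviderHooks`, `providerHooks`, and `onToolError` are hypothetical names, and the `closeSessionForScope` signature is assumed, since the issue elides its arguments. Only the four Slack error codes and the `closeSession({ reason: 'lost_access', detail })` payload shape come from the text.

```typescript
// Hypothetical normalized error shape passed from a tool wrapper to the harness.
interface ToolError {
  provider: string; // e.g. "slack"
  code?: string; // provider error code, e.g. "not_in_channel"
  httpStatus?: number;
}

// Payload shape taken from the issue: closeSession({ reason: 'lost_access', detail }).
interface CloseSessionRequest {
  reason: "lost_access";
  detail: string;
}

// What each provider would register with the harness (assumed shape).
interface ProviderHooks {
  // "Does this error mean lost access for the current resource?"
  isLostAccess: (err: ToolError) => boolean;
  // Provider-owned cleanup; the real signature is not shown in the issue.
  closeSessionForScope: (scopeId: string, req: CloseSessionRequest) => void;
}

// Slack codes listed in the issue.
const SLACK_LOST_ACCESS_CODES = new Set<string>([
  "channel_not_found",
  "not_in_channel",
  "account_inactive",
  "token_revoked",
]);

// Stand-in for persistence so the sketch can be observed.
const closedScopes: string[] = [];

const providerHooks = new Map<string, ProviderHooks>();
providerHooks.set("slack", {
  isLostAccess: (err) =>
    err.code !== undefined && SLACK_LOST_ACCESS_CODES.has(err.code),
  closeSessionForScope: (scopeId, req) => {
    closedScopes.push(`${scopeId}:${req.detail}`);
  },
});

// Harness entry point: returns true when the error closed the session.
function onToolError(scopeId: string, err: ToolError): boolean {
  const hooks = providerHooks.get(err.provider);
  if (hooks === undefined || !hooks.isLostAccess(err)) {
    return false; // unknown provider or an ordinary error: leave it to normal handling
  }
  hooks.closeSessionForScope(scopeId, {
    reason: "lost_access",
    detail: err.code ?? "unknown",
  });
  return true;
}

// Usage: a lost-access code closes the scope, a rate limit does not.
const closedOnKick = onToolError("C123", { provider: "slack", code: "not_in_channel" });
const closedOnRateLimit = onToolError("C123", { provider: "slack", code: "ratelimited" });
```

In this shape the harness holds no Slack-specific code: adding Linear or GitHub would mean registering another `ProviderHooks` entry rather than changing `onToolError`.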

This avoids:

  • Each provider re-inventing a service-binding RPC back to integrations
  • Slack-specific code paths in the agent harness
  • Three rewrites (Slack → Linear → GitHub) before being forced to abstract

Why not now / why a separate PR

PR #24 is per_channel-scoped and end-to-end green on staging. The right design for this hook needs the Linear and GitHub cases at the table — doing a Slack-only version now means rewriting on the second provider. Better to:

Acceptance criteria

  • Predicate interface defined in agent harness (likely apps/agent/src/harness/)
  • Slack tool wrapper implements predicate for at least: channel_not_found, not_in_channel, account_inactive, token_revoked
  • Linear tool wrapper implements predicate for the equivalent loss-of-access errors
  • GitHub tool wrapper implements predicate for the equivalent loss-of-access errors
  • When predicate matches, session is marked closed in provider persistence
  • Pending scheduleWakeups for the closed session are cancelled
  • Unit tests per provider for the predicate; integration test in agent harness for the full close path
  • Verified in staging: bot kicked from Slack channel → on next wakeup, agent's call 401s, scope row flips to completed, no further wakeups fire

Related
