Proposal: GuardrailProvider protocol for tool call interception #7405

@uchibeke

Summary

Propose a GuardrailProvider protocol that intercepts tool calls before execution, enabling policy-based approval, audit logging, and argument sanitization. This plugs into the existing BaseTool.run_json() and Workbench.call_tool() paths without breaking backward compatibility.

Motivation

AutoGen currently has no standardized hook point between an agent deciding to call a tool and the tool executing. The community has raised this gap from multiple angles:

Issue #5891 tackles the approval surface specifically but scopes it to a boolean gate. A GuardrailProvider generalizes this to support argument rewriting, structured denial reasons, audit metadata, and composable policy chains -- all concerns raised across the issues above.

Proposed Interface

from __future__ import annotations

from abc import abstractmethod
from dataclasses import dataclass, field
from enum import Enum
from typing import Any, Mapping, Protocol, Sequence, runtime_checkable

from autogen_core import CancellationToken


class Decision(Enum):
    ALLOW = "allow"
    DENY = "deny"
    MODIFY = "modify"


@dataclass
class GuardrailResult:
    """Outcome of a guardrail evaluation."""

    decision: Decision
    reason: str | None = None
    modified_args: Mapping[str, Any] | None = None  # only when Decision.MODIFY
    metadata: dict[str, Any] = field(default_factory=dict)  # audit trail data


@runtime_checkable
class GuardrailProvider(Protocol):
    """Intercepts tool calls before execution for policy enforcement."""

    @abstractmethod
    async def evaluate(
        self,
        *,
        tool_name: str,
        args: Mapping[str, Any],
        agent_name: str | None = None,
        call_id: str | None = None,
        cancellation_token: CancellationToken | None = None,
    ) -> GuardrailResult:
        """Evaluate whether a tool call should proceed.

        Args:
            tool_name: Name of the tool being invoked.
            args: Arguments the agent wants to pass.
            agent_name: Identity of the calling agent, if known.
            call_id: Correlation ID for the tool call.
            cancellation_token: For cooperative cancellation.

        Returns:
            GuardrailResult indicating allow, deny, or modify.
        """
        ...

Integration Points

1. BaseTool.run_json() -- tool-level guard

Minimal change to run_json() in BaseTool:

async def run_json(
    self,
    args: Mapping[str, Any],
    cancellation_token: CancellationToken,
    call_id: str | None = None,
) -> Any:
    # Arguments may be rewritten by any provider that returns MODIFY.
    effective_args = args

    # Providers run in registration order; the first DENY short-circuits.
    for provider in self._guardrail_providers:
        result = await provider.evaluate(
            tool_name=self._name,
            args=effective_args,
            call_id=call_id,
            cancellation_token=cancellation_token,
        )
        if result.decision == Decision.DENY:
            # The denial is returned as the tool result so the agent can see the reason.
            return f"Tool call denied: {result.reason or 'policy violation'}"
        if result.decision == Decision.MODIFY and result.modified_args is not None:
            effective_args = result.modified_args

    # Validation happens after guardrails, so rewritten arguments are validated too.
    validated = self._args_type.model_validate(effective_args)
    return await self.run(validated, cancellation_token)

2. Workbench.call_tool() -- workbench-level guard

For MCP and dynamic tool sources, guardrails can wrap call_tool() at the workbench layer, covering tools that do not subclass BaseTool.
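
As a rough illustration of the workbench-level guard, the sketch below wraps any object exposing call_tool() and runs the providers first. It is a minimal sketch under assumptions: GuardedWorkbench, ToolCaller, EchoWorkbench and BlockShell are hypothetical names, Decision and GuardrailResult are local stand-ins for the proposed types, and the call_tool() signature here is simplified rather than a copy of the real Workbench API.

```python
from __future__ import annotations

import asyncio
from dataclasses import dataclass, field
from enum import Enum
from typing import Any, Mapping, Protocol, Sequence


# Local stand-ins for the proposed types so the sketch runs on its own.
class Decision(Enum):
    ALLOW = "allow"
    DENY = "deny"
    MODIFY = "modify"


@dataclass
class GuardrailResult:
    decision: Decision
    reason: str | None = None
    modified_args: Mapping[str, Any] | None = None
    metadata: dict[str, Any] = field(default_factory=dict)


class ToolCaller(Protocol):
    """Hypothetical, simplified workbench surface: anything with call_tool()."""

    async def call_tool(self, name: str, arguments: Mapping[str, Any] | None = None) -> Any: ...


class GuardedWorkbench:
    """Hypothetical wrapper that consults providers before delegating call_tool()."""

    def __init__(self, inner: ToolCaller, providers: Sequence[Any]) -> None:
        self._inner = inner
        self._providers = list(providers)

    async def call_tool(self, name: str, arguments: Mapping[str, Any] | None = None) -> Any:
        effective: Mapping[str, Any] = arguments or {}
        for provider in self._providers:
            result = await provider.evaluate(tool_name=name, args=effective)
            if result.decision == Decision.DENY:
                # Same denial message shape as the run_json() integration.
                return f"Tool call denied: {result.reason or 'policy violation'}"
            if result.decision == Decision.MODIFY and result.modified_args is not None:
                effective = result.modified_args
        return await self._inner.call_tool(name, effective)


class EchoWorkbench:
    """Toy inner workbench that echoes the call it received."""

    async def call_tool(self, name: str, arguments: Mapping[str, Any] | None = None) -> Any:
        return {"tool": name, "args": dict(arguments or {})}


class BlockShell:
    """Toy provider that denies any tool named 'shell'."""

    async def evaluate(self, *, tool_name: str, args: Mapping[str, Any], **_: Any) -> GuardrailResult:
        if tool_name == "shell":
            return GuardrailResult(Decision.DENY, reason="shell disabled")
        return GuardrailResult(Decision.ALLOW)


if __name__ == "__main__":
    wb = GuardedWorkbench(EchoWorkbench(), [BlockShell()])
    print(asyncio.run(wb.call_tool("shell", {"cmd": "ls"})))
    print(asyncio.run(wb.call_tool("search", {"q": "autogen"})))
```

Because the wrapper only depends on call_tool(), the same approach would apply to tools that never pass through BaseTool.run_json().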

3. AssistantAgent -- agent-level guard

Pass providers to AssistantAgent which forwards them to its tools, consistent with the pattern proposed in #5891 for approval_func.

Constructor Addition to BaseTool

def __init__(
    self,
    args_type: Type[ArgsT],
    return_type: Type[ReturnT],
    name: str,
    description: str,
    strict: bool = False,
    guardrail_providers: Sequence[GuardrailProvider] = (),  # new, optional
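) -> None:
    # Existing initialization is unchanged; the providers are stored for run_json().
    self._guardrail_providers = tuple(guardrail_providers)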

Fully backward compatible -- existing tools and subclasses are unaffected.

Design Rationale

| Decision | Why |
| --- | --- |
| Protocol, not ABC | Matches AutoGen's use of runtime_checkable protocols; avoids forcing inheritance |
| Decision enum with MODIFY | Addresses #5891's open question about parameter modification vs. simple approval |
| metadata on result | Supports audit trail requirements from #5921 (AIAM) |
| Keyword-only evaluate() args | Future-proof; new fields can be added without breaking implementations |
| Composable chain | Multiple providers run in sequence; any DENY short-circuits |
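
The composable chain can also be packaged as a provider in its own right, which may be convenient when one policy bundle is shared across tools. The following is a minimal sketch, not part of the proposal: ChainGuardrail, DenyAll and Recorder are hypothetical names, and Decision and GuardrailResult are local stand-ins for the proposed types.

```python
from __future__ import annotations

import asyncio
from dataclasses import dataclass, field
from enum import Enum
from typing import Any, Mapping, Sequence


# Local stand-ins mirroring the proposed types so the sketch runs on its own.
class Decision(Enum):
    ALLOW = "allow"
    DENY = "deny"
    MODIFY = "modify"


@dataclass
class GuardrailResult:
    decision: Decision
    reason: str | None = None
    modified_args: Mapping[str, Any] | None = None
    metadata: dict[str, Any] = field(default_factory=dict)


class ChainGuardrail:
    """Hypothetical composite: runs providers in order and stops at the first DENY."""

    def __init__(self, providers: Sequence[Any]) -> None:
        self._providers = list(providers)

    async def evaluate(
        self, *, tool_name, args, agent_name=None, call_id=None, cancellation_token=None
    ) -> GuardrailResult:
        effective = args
        modified = False
        metadata: dict[str, Any] = {}
        for provider in self._providers:
            result = await provider.evaluate(
                tool_name=tool_name,
                args=effective,
                agent_name=agent_name,
                call_id=call_id,
                cancellation_token=cancellation_token,
            )
            metadata.update(result.metadata)  # merge audit data from every provider seen
            if result.decision is Decision.DENY:
                return GuardrailResult(Decision.DENY, reason=result.reason, metadata=metadata)
            if result.decision is Decision.MODIFY and result.modified_args is not None:
                effective = result.modified_args  # later providers see rewritten args
                modified = True
        if modified:
            return GuardrailResult(Decision.MODIFY, modified_args=effective, metadata=metadata)
        return GuardrailResult(Decision.ALLOW, metadata=metadata)


class DenyAll:
    """Toy provider that denies everything."""

    async def evaluate(self, *, tool_name, args, **_: Any) -> GuardrailResult:
        return GuardrailResult(Decision.DENY, reason="blocked", metadata={"denied_by": "DenyAll"})


class Recorder:
    """Toy provider that records the calls it sees, to show it is skipped after a DENY."""

    def __init__(self) -> None:
        self.seen: list[str] = []

    async def evaluate(self, *, tool_name, args, **_: Any) -> GuardrailResult:
        self.seen.append(tool_name)
        return GuardrailResult(Decision.ALLOW)


if __name__ == "__main__":
    recorder = Recorder()
    chain = ChainGuardrail([DenyAll(), recorder])
    outcome = asyncio.run(chain.evaluate(tool_name="delete_file", args={"path": "/tmp/x"}))
    print(outcome.decision, outcome.reason, recorder.seen)
```

Merging metadata across providers is one possible design choice; an implementation could equally keep only the metadata of the deciding provider.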

Example: Rate-Limiting Provider

import time
from collections import defaultdict

class RateLimitGuardrail:
    def __init__(self, max_calls: int = 10, window_seconds: float = 60.0):
        self._max = max_calls
        self._window = window_seconds
        self._calls: dict[str, list[float]] = defaultdict(list)

    async def evaluate(
        self, *, tool_name, args, agent_name=None, call_id=None, cancellation_token=None
    ) -> GuardrailResult:
        now = time.monotonic()
        recent = [t for t in self._calls[tool_name] if now - t < self._window]
        if len(recent) >= self._max:
            return GuardrailResult(
                decision=Decision.DENY,
                reason=f"Rate limit: {self._max} calls per {self._window}s exceeded",
            )
        self._calls[tool_name] = [*recent, now]
        return GuardrailResult(decision=Decision.ALLOW)
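
The rate limiter only exercises ALLOW and DENY. To illustrate the MODIFY variant mentioned in the summary (argument sanitization), here is a minimal sketch of a provider that caps one integer argument. ClampArgGuardrail and the "limit" argument are hypothetical, and Decision and GuardrailResult are repeated as local stand-ins so the block runs on its own.

```python
from __future__ import annotations

import asyncio
from dataclasses import dataclass, field
from enum import Enum
from typing import Any, Mapping


# Local stand-ins mirroring the proposed types so the sketch runs on its own.
class Decision(Enum):
    ALLOW = "allow"
    DENY = "deny"
    MODIFY = "modify"


@dataclass
class GuardrailResult:
    decision: Decision
    reason: str | None = None
    modified_args: Mapping[str, Any] | None = None
    metadata: dict[str, Any] = field(default_factory=dict)


class ClampArgGuardrail:
    """Hypothetical provider: caps one integer argument at a maximum value."""

    def __init__(self, key: str, maximum: int) -> None:
        self._key = key
        self._maximum = maximum

    async def evaluate(
        self, *, tool_name, args, agent_name=None, call_id=None, cancellation_token=None
    ) -> GuardrailResult:
        value = args.get(self._key)
        if isinstance(value, int) and value > self._maximum:
            # Build a new mapping instead of mutating the caller's arguments.
            patched = {**args, self._key: self._maximum}
            return GuardrailResult(
                decision=Decision.MODIFY,
                reason=f"{self._key} clamped to {self._maximum}",
                modified_args=patched,
                metadata={"original_value": value},  # kept for the audit trail
            )
        return GuardrailResult(decision=Decision.ALLOW)


if __name__ == "__main__":
    guard = ClampArgGuardrail("limit", 100)
    result = asyncio.run(guard.evaluate(tool_name="search", args={"q": "x", "limit": 5000}))
    print(result.decision, result.modified_args)
```

Returning a fresh mapping keeps the original arguments intact, so an audit log can record both the requested and the effective values.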

Relationship to Existing Work

A reference implementation of policy-based tool guardrails using this interface pattern is available in the APort Agent Guardrails project.

Scope and Non-Goals

This proposal covers tool call interception only. It does not cover:

Next Steps

  1. Gather feedback on Protocol vs. ABC and the MODIFY decision variant.
  2. Decide whether to integrate with the approval_func from #5891 (Support Approval Func in BaseTool in AgentChat) or supersede it.
  3. Prototype in autogen-core with tests against FunctionTool and McpWorkbench.
