
Add guided decision points — hybrid execution with contextual tool narrowing #102

@dgenio

Description


Context / Problem

ChainWeaver currently plans two execution modes for tool chains:

  1. Fully deterministic — schema compatibility proves the next step, no LLM needed (Add conditional branching support with safe predicate evaluation #9, Design and implement DAG-based flow model with topological execution #10)
  2. Full agent routing — fall back to standard LLM-driven tool selection (all tools visible)

Routing is currently all-or-nothing: either ChainWeaver handles it entirely, or the LLM does. But there's a powerful middle ground: guided decisions, where ChainWeaver narrows the LLM's choice set and optimizes the decision prompt based on schema analysis.

The problem with full LLM routing

When an agent has 100 tools and calls Tool A, the LLM must:

  • Read all 100 tool descriptions (context window bloat)
  • Interpret Tool A's output
  • Choose the next tool from 100 candidates (high error surface)

In reality, if ChainWeaver knows the schema compatibility graph (#77), it can determine that only 6 of those 100 tools accept Tool A's output type. Presenting 6 choices instead of 100 to the LLM is:

  • Fewer tokens in context (6 descriptions vs 100)
  • Lower error rate (LLM is empirically better at picking from fewer candidates)
  • Faster inference (smaller context = faster generation)
  • Still flexible (the LLM makes a real decision based on output content)

Proposal

1. Three-tier execution model

Extend the executor to support a hybrid execution model:

| Tier | Condition | ChainWeaver behavior |
| --- | --- | --- |
| Deterministic | Single valid successor (schema match + flow definition) | Execute next step directly, zero LLM involvement |
| Guided | Multiple valid successors known from schema analysis | Present narrowed tool set + optimized prompt to LLM for decision |
| Open | No schema analysis possible or tool graph unknown | Fall back to full agent routing (all tools visible) |

2. GuidedDecisionPoint model

```python
from pydantic import BaseModel, Field

class GuidedDecisionPoint(BaseModel):
    """
    A point in a flow where the LLM must choose the next step,
    but the choice set is narrowed by schema compatibility.
    """
    after_step: str                                          # Step ID after which the decision occurs
    candidate_tools: list[str]                               # Valid next tools (schema-compatible)
    decision_prompt: str | None = None                       # Optimized prompt for the LLM
    context_fields: list[str] = Field(default_factory=list)  # Output fields relevant to the decision
    fallback_tool: str | None = None                         # Default if LLM fails to choose
```

3. Narrowed choice presentation

When the executor hits a GuidedDecisionPoint:

```python
# Instead of:
# "Here are 100 tools. Pick one."

# ChainWeaver presents:
# "Tool 'fetch_customer' returned a customer record with fields
#  {name, email, tier, last_order_date}.
#  Choose the next action from these compatible tools:
#  1. enrich_customer — Adds demographic data to a customer record
#  2. send_email — Sends a templated email to a customer
#  3. calculate_ltv — Computes lifetime value from order history
#  4. check_churn_risk — Predicts churn probability from activity
#  5. update_crm — Updates CRM record with new customer data
#  6. generate_report — Creates a formatted customer summary"
```
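Rendering the narrowed prompt can be a simple template over the candidate list. A minimal sketch, assuming a `build_decision_prompt` helper that takes `(name, description)` pairs; the real implementation would draw these from tool metadata:

```python
def build_decision_prompt(
    tool_name: str,
    output_fields: list[str],
    candidates: list[tuple[str, str]],  # (name, description) pairs
) -> str:
    """Render the narrowed-choice prompt shown above (illustrative template)."""
    lines = [
        f"Tool '{tool_name}' returned a record with fields "
        f"{{{', '.join(output_fields)}}}.",
        "Choose the next action from these compatible tools:",
    ]
    for i, (name, desc) in enumerate(candidates, start=1):
        lines.append(f"{i}. {name} — {desc}")
    return "\n".join(lines)


prompt = build_decision_prompt(
    "fetch_customer",
    ["name", "email", "tier"],
    [
        ("enrich_customer", "Adds demographic data to a customer record"),
        ("send_email", "Sends a templated email to a customer"),
    ],
)
```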

4. Decision callback interface

```python
from dataclasses import dataclass
from typing import Any, Protocol

@dataclass
class ToolCandidate:
    name: str
    description: str
    input_schema_summary: dict[str, str]  # field_name → type description
    compatibility_score: float            # How well the upstream output matches

class DecisionCallback(Protocol):
    """
    Interface for LLM-backed decision-making at guided decision points.
    ChainWeaver provides the narrowed context; the callback returns the choice.
    """
    def decide(
        self,
        context: dict[str, Any],
        candidates: list[ToolCandidate],
        prompt: str,
    ) -> str:
        """Return the tool_name to execute next."""
        ...
```
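Any object with a matching `decide` method satisfies the protocol. A toy implementation (illustrative only, not part of the proposal) that picks the highest-scoring candidate instead of calling an LLM:

```python
from dataclasses import dataclass


@dataclass
class ToolCandidate:
    name: str
    description: str
    input_schema_summary: dict[str, str]
    compatibility_score: float


class HighestScoreCallback:
    """Toy DecisionCallback: picks the highest-scoring candidate instead of
    asking an LLM. A real implementation would send `prompt` to a provider
    and parse the reply into a tool name."""

    def decide(self, context, candidates, prompt):
        return max(candidates, key=lambda c: c.compatibility_score).name


choice = HighestScoreCallback().decide(
    context={},
    candidates=[
        ToolCandidate("send_email", "Sends a templated email", {"to": "str"}, 0.6),
        ToolCandidate("enrich_customer", "Adds demographic data", {"id": "str"}, 0.9),
    ],
    prompt="",
)
```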

5. Auto-generation from compatibility graph

ChainAnalyzer (#77) already computes which tools can follow which. Extend it:

```python
class ChainAnalyzer:
    def guided_decision_points(
        self,
        after_tool: str,
        *,
        max_candidates: int = 10,
        min_compatibility: float = 0.5,
    ) -> GuidedDecisionPoint:
        """
        Generate a GuidedDecisionPoint for a given tool,
        including optimized prompt and candidate list.
        """
        ...
```

6. Executor integration

```python
class FlowExecutor:
    def __init__(
        self,
        registry: FlowRegistry,
        *,
        decision_callback: DecisionCallback | None = None,  # NEW
    ) -> None: ...
```

When a flow step has a GuidedDecisionPoint (or when the executor detects that the next step requires a choice), it invokes decision_callback.decide() with the narrowed context.

7. Prompt optimization

The guided decision prompt can be further optimized by:

  • Including only the output fields relevant to the decision (not the full context)
  • Using the description optimizer (Add offline LLM-assisted tool description optimizer #100) to make candidate descriptions maximally discriminative within this specific choice set
  • Adding schema type information so the LLM understands structural compatibility
  • Providing a confidence threshold: "If no tool clearly fits, return 'none' to fall back to open routing"
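The first optimization above — projecting the upstream output down to the decision-relevant fields — is a one-liner. A minimal sketch; `project_context` is a hypothetical helper:

```python
def project_context(output: dict, context_fields: list[str]) -> dict:
    """Keep only the output fields relevant to the decision, so the
    decision prompt carries no irrelevant payload."""
    return {k: v for k, v in output.items() if k in context_fields}


slim = project_context(
    {"name": "Ada", "email": "ada@example.com", "tier": "gold", "raw_html": "..."},
    ["tier", "last_order_date"],
)
```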

Acceptance Criteria

  • GuidedDecisionPoint model exists with candidate list, optional prompt, and context fields
  • DecisionCallback protocol defines the interface for LLM-backed decisions
  • ToolCandidate includes name, description, and compatibility score
  • ChainAnalyzer.guided_decision_points() generates narrowed choice sets from the compatibility graph
  • FlowExecutor accepts an optional decision_callback and invokes it at guided decision points
  • Narrowed candidate lists include compatibility scores (from schema analysis)
  • Default prompt template is generated automatically from tool metadata
  • Custom prompt override is supported via GuidedDecisionPoint.decision_prompt
  • Fallback to open routing when decision_callback is not set or fails to return a valid candidate
  • StepRecord records whether a step was deterministic, guided, or open
  • At least 8 test cases: deterministic path (no callback invoked), guided path (callback invoked with narrowed set), open fallback, custom prompt, compatibility filtering, max candidates limit, no valid candidates, callback error handling
  • No LLM dependency — DecisionCallback is a protocol, not tied to any provider

Out of Scope

  • Specific LLM provider implementations of DecisionCallback
  • Auto-tuning of compatibility thresholds
  • Multi-step lookahead (choosing based on 2+ steps ahead)
  • Caching of LLM decisions for identical contexts

Dependencies

Relationship to Existing Issues

This is a new execution paradigm that sits between the two existing modes: fully deterministic execution (#9, #10) and full LLM-driven agent routing.

It extends the value proposition from "eliminate LLM calls entirely" to "make unavoidable LLM calls cheaper, faster, and more accurate."

Notes

  • The three-tier model (deterministic → guided → open) mirrors how compilers optimize: fully resolved → constrained choice → runtime dispatch.
  • Empirically, LLMs make significantly better tool selection decisions when choosing from 5-10 candidates vs 50-100+. This is the same principle behind RAG retrieval narrowing.
  • The compatibility_score enables the LLM to trust ChainWeaver's analysis: "These 6 tools are structurally compatible; pick the semantically best one."
  • This feature makes ChainWeaver useful even when flows *aren't* fully deterministic — which dramatically expands the addressable use cases.
  • Consider tracking "guided decision accuracy" over time: if the LLM consistently picks the same tool at a decision point, that's a signal to promote it to a deterministic step.

Metadata


Labels

  • ai-friendly — Designed for AI-assisted implementation
  • area:compiler — Flow compilation and optimization
  • area:executor — Flow execution engine
  • complexity:complex — Significant effort, design review needed
  • priority:high — Must address first within the milestone
  • size:L — Large effort (3-5 days)
  • type:feature — New feature or capability
