[Improvement] Enforce send_message tool use; evaluate dropped_text_retries necessity #30

@Rustam-Z

Summary

Two related questions about how pyclaudir ensures outbound messages actually reach the user:

  1. How can we enforce that Luna always calls send_message (or reply_to_message) instead of producing bare text content blocks that silently disappear?
  2. Is dropped_text_retries still necessary once enforcement is in place?

Problem

Claude Code turns can end with:

  • A send_message / reply_to_message tool call → message delivered to user ✓
  • A plain text content block → silently dropped, user sees nothing ✗

The current workaround is dropped_text_retries: when the harness detects a turn ended with text but no outbound tool call, it retries the turn. This is a recovery mechanism, not prevention.

Issues with relying on retries:

  • Burns tokens on a second (and possibly third) turn for something that should have worked the first time
  • The retry may still produce text if the model is confused about its role
  • Adds latency
  • Doesn't surface the failure clearly in logs
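The last point could be addressed by logging every dropped-text event explicitly. A minimal sketch, assuming a logger name (`pyclaudir.harness`) and a `turn_id` field that are both hypothetical and not taken from the current codebase:

```python
import logging

# Hypothetical logger name; pyclaudir's real logger may be named differently.
logger = logging.getLogger("pyclaudir.harness")


def log_dropped_text(turn_id: str, text: str, attempt: int) -> str:
    """Warn about a turn that ended with text but no outbound tool call.

    Returns the formatted message so callers and tests can inspect it.
    """
    # Collapse newlines and truncate so one event stays on one log line.
    preview = text.strip().replace("\n", " ")[:120]
    message = (
        f"dropped_text turn_id={turn_id} attempt={attempt} "
        f"chars={len(text)} preview={preview!r}"
    )
    logger.warning(message)
    return message
```

A fixed `dropped_text` prefix keeps these events easy to grep and count, which also helps answer how often the retry path is actually hit.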

Option A — tool_choice enforcement (API-level)

The Anthropic API supports a tool_choice parameter that forces the model to call a tool, either any available tool or one specific tool:

# Force any tool call (model picks which one)
tool_choice={"type": "any"}

# Force a specific tool
tool_choice={"type": "tool", "name": "send_message"}

Tradeoff: Forcing send_message specifically would break turns where the model legitimately needs to call read_memory, query_db, or other tools before sending. Forcing {"type": "any"} only guarantees that some tool is called, not that a message is eventually sent.
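For context, here is a sketch of where tool_choice would sit in a Messages API request if the harness called the Anthropic API directly. The tool schemas and the `build_request` helper are illustrative assumptions, and whether Claude Code exposes this parameter at all is still an open question.

```python
from typing import Any, Dict, List, Optional

# Illustrative tool definitions; pyclaudir's real schemas may differ.
TOOLS = [
    {
        "name": "send_message",
        "description": "Deliver a message to the user.",
        "input_schema": {
            "type": "object",
            "properties": {"text": {"type": "string"}},
            "required": ["text"],
        },
    },
    {
        "name": "read_memory",
        "description": "Read a stored memory entry.",
        "input_schema": {
            "type": "object",
            "properties": {"key": {"type": "string"}},
            "required": ["key"],
        },
    },
]


def build_request(
    model: str, messages: List[Dict[str, Any]], force: Optional[str] = None
) -> Dict[str, Any]:
    """Build Messages API kwargs. force is None, "any", or a tool name."""
    if force is None:
        tool_choice = {"type": "auto"}  # the model may answer with plain text
    elif force == "any":
        tool_choice = {"type": "any"}  # must call some tool
    else:
        if force not in {tool["name"] for tool in TOOLS}:
            raise ValueError(f"unknown tool: {force}")
        tool_choice = {"type": "tool", "name": force}
    return {
        "model": model,
        "max_tokens": 1024,
        "tools": TOOLS,
        "tool_choice": tool_choice,
        "messages": messages,
    }

# With the official SDK the result would be passed along as:
#   anthropic.Anthropic().messages.create(**build_request(model, messages, "any"))
```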

Possible hybrid: Use tool_choice: any only on the final turn after tool calls are complete — but this requires the harness to know when a turn is "final," which it currently doesn't.
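One way to approximate "final" is to treat a step as final when the model stops without requesting a tool. The sketch below assumes the harness can see the Messages API `stop_reason` and the names of tools called so far; `next_tool_choice` is a hypothetical helper, not existing pyclaudir code.

```python
from typing import Dict, Optional, Sequence

# Assumed set of tools that count as delivering something to the user.
OUTBOUND_TOOLS = frozenset({"send_message", "reply_to_message"})


def next_tool_choice(
    stop_reason: Optional[str], tools_called: Sequence[str]
) -> Dict[str, str]:
    """Pick tool_choice for the next model step within a turn.

    If the model tried to end the turn ("end_turn") and nothing outbound
    has been sent yet, force a tool call on the follow-up step. Otherwise
    leave the model free to choose.
    """
    outbound_sent = any(name in OUTBOUND_TOOLS for name in tools_called)
    if stop_reason == "end_turn" and not outbound_sent:
        return {"type": "any"}
    return {"type": "auto"}
```

This still forces a tool call on turns where staying silent would have been correct, so it would need a way for the model to signal "no reply warranted".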


Option B — System prompt enforcement

Add an explicit rule to system.md:

If you produce a text content block instead of send_message, the user sees nothing. Always deliver via send_message or reply_to_message.

This is already partially in the system prompt ("If you produce a text content block instead of send_message, the user sees nothing") but the model still occasionally drifts.

Improvement: Add a post-turn self-check step — before ending the turn, confirm a send_message or reply_to_message was called if a reply was warranted. This is behavioral, not structural.


Option C — Harness-level post-turn check (keep retries, improve detection)

Instead of retrying blindly, make the harness smarter:

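# Tools whose effect is visible to the user. Whether send_photo and
# add_reaction belong here is still open (see "Questions to resolve").
OUTBOUND_TOOLS = ("send_message", "reply_to_message", "send_photo", "add_reaction")

def check_turn_completion(turn_result):
    # Assumes the harness provides retry_with_context(note) and that
    # turn_result exposes .tool_calls and .text_content.
    has_outbound = any(
        call.tool_name in OUTBOUND_TOOLS for call in turn_result.tool_calls
    )
    dropped_text = (turn_result.text_content or "").strip()

    if dropped_text and not has_outbound:
        # Inject the dropped text back as a system note and retry once
        return retry_with_context(
            "Your previous turn produced text but no send_message call. "
            f"The text was: {dropped_text[:500]}. Please call send_message now."
        )

    return turn_result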

This makes retries targeted (inject the lost text) rather than blind (just re-run the turn).
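To make "retry once" explicit, the check can be wrapped in a bounded loop. The following is a self-contained sketch under stated assumptions: `ToolCall`, `TurnResult`, and `run_turn` are hypothetical stand-ins for whatever the harness actually uses.

```python
from dataclasses import dataclass, field
from typing import Callable, List

OUTBOUND_TOOLS = frozenset(
    {"send_message", "reply_to_message", "send_photo", "add_reaction"}
)


@dataclass
class ToolCall:
    tool_name: str


@dataclass
class TurnResult:
    text_content: str = ""
    tool_calls: List[ToolCall] = field(default_factory=list)


def needs_retry(turn: TurnResult) -> bool:
    """True when the turn produced text but no user-visible tool call."""
    has_outbound = any(c.tool_name in OUTBOUND_TOOLS for c in turn.tool_calls)
    return bool(turn.text_content.strip()) and not has_outbound


def run_with_targeted_retry(
    run_turn: Callable[[str], TurnResult], prompt: str, max_retries: int = 1
) -> TurnResult:
    """Run a turn, re-running at most max_retries times with the lost text."""
    turn = run_turn(prompt)
    for _ in range(max_retries):
        if not needs_retry(turn):
            break
        note = (
            "Your previous turn produced text but no send_message call. "
            f"The text was: {turn.text_content.strip()[:500]}. "
            "Please call send_message now."
        )
        turn = run_turn(note)
    return turn
```

If the retry also ends in bare text, the caller gets that result back unchanged and can log it or fall back, instead of looping.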


Questions to resolve

  1. Does Claude Code's API support tool_choice passthrough from the harness, or does CC manage tool calling internally?
  2. Is dropped_text_retries currently catching real failures at a meaningful rate, or is it rarely triggered?
  3. Should add_reaction, send_photo, and edit_message count as valid outbound calls (the user does see something), or should only send_message / reply_to_message count?

Recommendation (tentative)

  • Short term: Keep dropped_text_retries but make retries targeted (Option C) — inject the lost text, retry once with context.
  • Medium term: If CC supports tool_choice: any, enable it to guarantee at minimum some tool call fires per turn.
  • Long term: Evaluate whether dropped_text_retries rate drops to near-zero after the targeted retry improvement — if yes, remove it.
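Evaluating the long-term step needs a measured rate. A minimal sketch of a counter the harness could update once per turn; `DroppedTextStats` is a hypothetical name, and the threshold for "near-zero" is left to whoever reviews the numbers.

```python
from dataclasses import dataclass


@dataclass
class DroppedTextStats:
    """Running counts of turns, dropped-text events, and successful retries."""

    turns: int = 0
    dropped: int = 0
    recovered: int = 0

    def record(self, dropped: bool, recovered: bool = False) -> None:
        """Record one turn; recovered only counts when dropped is True."""
        self.turns += 1
        if dropped:
            self.dropped += 1
            if recovered:
                self.recovered += 1

    @property
    def drop_rate(self) -> float:
        """Fraction of turns that ended with dropped text."""
        return self.dropped / self.turns if self.turns else 0.0

    @property
    def recovery_rate(self) -> float:
        """Fraction of dropped-text turns fixed by the retry."""
        return self.recovered / self.dropped if self.dropped else 0.0
```

Tracking the recovery rate alongside the drop rate shows whether targeted retries actually fix the failures they catch.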

Reported by

Rustam, 2026-05-10.

Metadata

Assignees

No one assigned

    Labels

    enhancement: New feature or request

    Projects

    Status

    Backlog

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests
