feat: asymmetric trigger detection — reduce false positives for negative queries#5

Merged
melanie531 merged 1 commit into aws-samples:main from melanie531:feat/trigger-detection-improvements
Mar 19, 2026

Problem

Skills with common-word names (e.g. text-summary, acme-compliance) suffer high false-positive rates in trigger evaluation. When a should_trigger=false query is evaluated, the agent often casually mentions the skill name in its text output (e.g. "I could use text-summary here") without actually activating the skill. The current code treats any text mention as a trigger, causing negative queries to fail.

This was observed in CI pipelines where:

  • acme-compliance trigger score dropped to 50/100
  • text-summary trigger eval failed entirely

Solution

Refactor _detect_skill_trigger_from_parsed() into _classify_trigger_signal() that returns signal strength:

| Signal | Meaning | Examples |
|---|---|---|
| "tool" | Strong — agent used a tool to activate the skill | Read SKILL.md, Bash script execution, Skill tool |
| "text" | Weak — agent mentioned skill name in text output | "Using text-summary to..." |
| "none" | No trigger detected | |

Asymmetric detection logic:

  • should_trigger=true queries: both tool and text signals count as triggers (unchanged)
  • should_trigger=false queries: only tool signals count — text mentions are ignored

This means a negative query won't fail just because the agent casually mentioned the skill name.
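
The asymmetric rule can be expressed as a small decision function. This is a sketch under assumptions: the helper names `counts_as_trigger` and `query_passes` are hypothetical and do not appear in the PR. Only the signal values and the `should_trigger` flag come from the description above.

```python
from typing import Literal

Signal = Literal["tool", "text", "none"]


def counts_as_trigger(signal: Signal, should_trigger: bool) -> bool:
    """Decide whether a classified signal counts as a trigger for this query."""
    if should_trigger:
        # Positive query: both tool and text signals are accepted (unchanged).
        return signal in ("tool", "text")
    # Negative query: only an actual tool activation counts.
    return signal == "tool"


def query_passes(signal: Signal, should_trigger: bool) -> bool:
    """A query passes when the detected outcome matches the expectation."""
    return counts_as_trigger(signal, should_trigger) == should_trigger
```

Under this sketch, a "text" signal on a should_trigger=false query is not a trigger, so the query passes, while a "tool" signal on the same query is still reported as a false positive.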

Backward Compatibility

  • _detect_skill_trigger_from_parsed() still exists and returns bool (wraps _classify_trigger_signal)
  • _detect_skill_trigger() unchanged
  • All 663 existing tests pass + 12 new tests added

Tests Added

  • TestClassifyTriggerSignal (8 tests) — signal classification for tool/text/none
  • TestAsymmetricTriggerDetection (3 tests) — asymmetric behavior for positive vs negative queries

…gative queries

Refactor _detect_skill_trigger_from_parsed into _classify_trigger_signal
that returns signal strength ('tool', 'text', or 'none') instead of a
boolean.

Key change: for should_trigger=false queries, only 'tool' signals
(Read SKILL.md, Bash script execution, Skill tool invocation) count as
triggers. Text-only mentions of the skill name are ignored, since agents
commonly mention skill names (e.g. 'text-summary', 'compliance') in
their output without intending to activate the skill.

For should_trigger=true queries, both 'tool' and 'text' signals count,
preserving existing behavior.

This fixes false positives that caused trigger eval failures for skills
with common-word names (like text-summary, acme-compliance).
@melanie531 melanie531 merged commit c5639a5 into aws-samples:main Mar 19, 2026
2 checks passed