
fix: copy skill resources to workspace and improve trigger detection #2

Merged
melanie531 merged 1 commit into aws-samples:main from melanie531:fix/workspace-skill-files
Mar 18, 2026

Conversation

@melanie531
Contributor

Problem

Two issues significantly lower functional and trigger evaluation scores:

1. Skill scripts/references not available in functional eval workspace

_execute_eval_pair() in functional.py creates a temporary workspace directory for each eval run. However, only the eval case files are copied into this workspace — the skill's own scripts/, references/, and assets/ directories are not copied.

When SKILL.md instructs the agent to run a script (e.g., python3 scripts/check.py), the file does not exist in the workspace. This causes:

  • Script execution failures
  • Lower functional scores because assertions checking for script output fail
  • No meaningful difference between with-skill and without-skill runs

2. Trigger detection fails for all --append-system-prompt injected skills

ClaudeRunner injects skill content via --append-system-prompt, meaning the agent receives the skill instructions in its system prompt. However, _detect_skill_trigger_from_parsed() primarily looks for Read tool calls targeting SKILL.md to detect skill activation.

Since the agent already has the skill content in its system prompt, it never needs to read SKILL.md from disk. This results in:

  • All positive trigger queries failing (agent used the skill knowledge but didn't read the file)
  • All negative trigger queries passing (correctly detected as not triggered)
  • Exactly 50% trigger score for any skill with balanced positive/negative queries
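
The 50% figure follows from simple arithmetic. A minimal sketch, assuming the trigger score is the fraction of queries whose detection result matches the expected result; the `trigger_score` helper and the query counts are hypothetical and not taken from the codebase:

```python
def trigger_score(expected, detected):
    """Fraction of queries where detection matches the expectation."""
    matches = sum(1 for e, d in zip(expected, detected) if e == d)
    return matches / len(expected)


# Hypothetical balanced set: 5 queries that should trigger, 5 that should not.
expected = [True] * 5 + [False] * 5

# A detector that never fires, as when SKILL.md is never read from disk.
detected = [False] * 10

print(trigger_score(expected, detected))  # 0.5
```

With a balanced set, a detector that never fires gets every negative query right and every positive query wrong, so the score lands on 0.5 regardless of the skill.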

Fix

Functional workspace (functional.py)

  • Copy scripts/, references/, and assets/ from the skill directory into the temp workspace
  • Also copy SKILL.md so the agent can read it if needed
  • This allows the agent to actually execute the scripts described in SKILL.md
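
The copy step could look roughly like the following. This is a sketch under stated assumptions, not the code in functional.py: the `copy_skill_resources` helper, its signature, and the demo paths are hypothetical; only the directory names and SKILL.md come from the description above.

```python
import shutil
import tempfile
from pathlib import Path
from typing import List

# Directory names come from the PR description; everything else is illustrative.
RESOURCE_DIRS = ("scripts", "references", "assets")


def copy_skill_resources(skill_dir: Path, workspace: Path) -> List[str]:
    """Copy skill resource directories and SKILL.md into the workspace.

    Returns the names of the entries that were copied.
    """
    copied = []
    for name in RESOURCE_DIRS:
        src = skill_dir / name
        if src.is_dir():
            # dirs_exist_ok lets the copy merge into a directory that the
            # eval case files may already have created (Python 3.8+).
            shutil.copytree(src, workspace / name, dirs_exist_ok=True)
            copied.append(name)
    skill_md = skill_dir / "SKILL.md"
    if skill_md.is_file():
        shutil.copy2(skill_md, workspace / "SKILL.md")
        copied.append("SKILL.md")
    return copied


# Demo with a throwaway skill that only has scripts/ and SKILL.md.
with tempfile.TemporaryDirectory() as tmp:
    skill = Path(tmp) / "skill"
    (skill / "scripts").mkdir(parents=True)
    (skill / "scripts" / "check.py").write_text("print('ok')\n")
    (skill / "SKILL.md").write_text("# demo skill\n")
    workspace = Path(tmp) / "workspace"
    workspace.mkdir()
    print(copy_skill_resources(skill, workspace))  # ['scripts', 'SKILL.md']
```

Missing directories are skipped silently, since not every skill ships all three.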

Trigger detection (trigger.py)

  • Add word-boundary matching for skill name in agent text output
  • Add detection of skill script filenames referenced in agent text output
  • These complement the existing tool-call detection, covering cases where the agent demonstrates skill awareness through its response text
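
A text-based check along these lines might be sketched as follows. The `text_mentions_skill` helper, its parameters, and the case-insensitive match are assumptions for illustration, not the code in trigger.py. The script check requires the `scripts/` prefix, following the path-level matching described in this PR's commit message.

```python
import re
from typing import Iterable


def text_mentions_skill(text: str, skill_name: str, script_names: Iterable[str]) -> bool:
    """Return True if the agent's text names the skill or one of its scripts."""
    # Word-boundary match, so a skill called "lint" does not match "linting".
    # Case-insensitive matching is an assumption made for this sketch.
    name_pattern = rf"\b{re.escape(skill_name)}\b"
    if re.search(name_pattern, text, flags=re.IGNORECASE):
        return True
    # Path-level match: require the scripts/ prefix, not a bare filename.
    return any(f"scripts/{name}" in text for name in script_names)


print(text_mentions_skill("Running python3 scripts/check.py", "lint-helper", ["check.py"]))  # True
print(text_mentions_skill("There is a check.py file", "lint-helper", ["check.py"]))  # False
```

Requiring the `scripts/` prefix trades some recall for fewer false positives on common filenames.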

Testing

  • 6 new unit tests added (1 functional + 5 trigger)
  • All 651 existing tests continue to pass
  • No changes to public API or CLI interface

Evidence

Tested with 4 skills in a CI/CD pipeline. Before this fix:

| Metric     | Score  | Reason                                          |
|------------|--------|-------------------------------------------------|
| Trigger    | 50%    | All positive queries fail (SKILL.md never read) |
| Functional | 28-36% | Scripts not found in workspace                  |

A trigger score of exactly 50% is the signature of this bug: the detector never fires, so every negative query passes and every positive query fails.

…d workspace hint

Three issues that significantly impact functional and trigger scores:

1. Functional eval creates a temp workspace but doesn't copy the skill's
   scripts/, references/, or assets/ directories into it. When the agent
   tries to execute skill scripts (e.g., 'python3 scripts/check.py'),
   the files don't exist, causing script-based assertions to fail.

   Fix: Copy scripts/, references/, assets/, and SKILL.md from the skill
   directory into the with-skill workspace. Use separate workspaces for
   with-skill and without-skill runs to prevent contamination.

2. Trigger detection only recognizes skill activation through Read tool
   calls targeting SKILL.md. However, ClaudeRunner injects skill content
   via --append-system-prompt, so the agent never reads SKILL.md from disk.

   Fix: Add word-boundary matching for skill name in agent text output,
   and path-level matching (scripts/{filename}) for script references.
   Bare filenames like 'check.py' no longer trigger false positives.

3. The system prompt injection doesn't tell the agent that skill scripts
   are available in the working directory, so the agent may not attempt
   to execute them even when they're present.

   Fix: Append a workspace hint to the injected system prompt informing
   the agent that scripts/ is available in the working directory.
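
The workspace hint in item 3 could be appended roughly as sketched below. The hint wording, the `build_system_prompt` helper, and the choice to add the hint only when the skill has scripts are all assumptions; the PR does not show the actual text.

```python
# Hypothetical hint wording; the real text is not shown in this PR.
WORKSPACE_HINT = (
    "\n\nThe skill's scripts are available under scripts/ in the current "
    "working directory."
)


def build_system_prompt(skill_content: str, has_scripts: bool) -> str:
    """Return the text to inject, adding the hint only when scripts exist."""
    prompt = skill_content.rstrip()
    if has_scripts:
        prompt += WORKSPACE_HINT
    return prompt


print(build_system_prompt("Run the checker before answering.\n", has_scripts=True))
```
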

Also removes 'Skill' from --allowedTools since it refers to Claude Code's
~/.claude/commands/ mechanism which is not used with --append-system-prompt.

All 652 tests pass (8 new tests added).
@melanie531 force-pushed the fix/workspace-skill-files branch from 8a03bde to 274ee1f on March 18, 2026 23:02
@melanie531 merged commit b1c01aa into aws-samples:main on Mar 18, 2026
2 checks passed