fix: copy skill resources to workspace, improve trigger detection, add workspace hint
This commit fixes three issues that significantly impact functional and
trigger scores:
1. Functional eval creates a temp workspace but doesn't copy the skill's
scripts/, references/, or assets/ directories into it. When the agent
tries to execute skill scripts (e.g., 'python3 scripts/check.py'),
the files don't exist, causing script-based assertions to fail.
Fix: Copy scripts/, references/, assets/, and SKILL.md from the skill
directory into the with-skill workspace. Use separate workspaces for
with-skill and without-skill runs to prevent contamination.
2. Trigger detection only recognizes skill activation through Read tool
calls targeting SKILL.md. However, ClaudeRunner injects skill content
via --append-system-prompt, so the agent never reads SKILL.md from disk.
Fix: Add word-boundary matching for the skill name in the agent's text
output, and path-level matching (scripts/{filename}) for script references.
Bare filenames like 'check.py' no longer trigger false positives.
3. The system prompt injection doesn't tell the agent that skill scripts
are available in the working directory, so the agent may not attempt
to execute them even when they're present.
Fix: Append a workspace hint to the injected system prompt informing
the agent that scripts/ is available in the working directory.
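
The workspace changes in items 1 and 3 can be sketched roughly as follows.
This is a minimal illustration, not the code from this commit: the function
names prepare_workspace and add_workspace_hint, the temp-directory prefix,
and the exact hint wording are all assumptions. Only the copied resource
names (scripts/, references/, assets/, SKILL.md) come from the description
above.

```python
import shutil
import tempfile
from pathlib import Path
from typing import Optional

# Resource names come from the commit description; the rest is illustrative.
SKILL_DIRS = ("scripts", "references", "assets")
SKILL_FILES = ("SKILL.md",)


def prepare_workspace(skill_dir: Optional[Path]) -> Path:
    """Create a fresh temp workspace (hypothetical helper).

    With a skill_dir, copy the skill's resources in. With None, leave the
    workspace empty so the without-skill run cannot see skill files.
    """
    workspace = Path(tempfile.mkdtemp(prefix="skill-eval-"))
    if skill_dir is None:
        return workspace
    for name in SKILL_DIRS:
        src = skill_dir / name
        if src.is_dir():  # a skill may omit any of these directories
            shutil.copytree(src, workspace / name)
    for name in SKILL_FILES:
        src = skill_dir / name
        if src.is_file():
            shutil.copy2(src, workspace / name)
    return workspace


def add_workspace_hint(system_prompt: str, workspace: Path) -> str:
    """Append a hint about scripts/ (hypothetical wording) when it exists."""
    if not (workspace / "scripts").is_dir():
        return system_prompt
    hint = "The skill's scripts/ directory is available in the working directory."
    return f"{system_prompt}\n\n{hint}"
```

Calling prepare_workspace once per run gives the with-skill and without-skill
runs separate directories, which is what prevents contamination.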
Also removes 'Skill' from --allowedTools, since it refers to Claude Code's
~/.claude/commands/ mechanism, which is not used with --append-system-prompt.
All 652 tests pass (8 new tests added).
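
For reference, the text-based trigger check from item 2 might look something
like the sketch below. The function name mentions_skill and its signature are
hypothetical, and the matching is assumed to be case-sensitive. It also
assumes skill names and script filenames begin and end with a word character,
so that \b anchors behave as expected.

```python
import re
from typing import Iterable


def mentions_skill(text: str, skill_name: str,
                   script_names: Iterable[str] = ()) -> bool:
    """Return True if text names the skill or references scripts/<filename>.

    Hypothetical helper: illustrates word-boundary and path-level matching.
    """
    # Word-boundary match on the skill name, so "lint-check" does not
    # match inside a longer token such as "prelint-checker".
    if re.search(rf"\b{re.escape(skill_name)}\b", text):
        return True
    # Path-level match: require the "scripts/" prefix, so a bare
    # filename like "check.py" does not count as a trigger.
    for filename in script_names:
        if re.search(rf"\bscripts/{re.escape(filename)}\b", text):
            return True
    return False
```

For example, mentions_skill("Running python3 scripts/check.py", "lint-check",
["check.py"]) is true, while the same call on "I edited check.py" is false.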