fix: add --dangerously-skip-permissions for CI/CD environments by melanie531 · Pull Request #3 · aws-samples/sample-agent-skill-eval

melanie531 · 2026-03-19T02:18:22Z

Problem

In CI/CD environments (CodeBuild, GitHub Actions, etc.), Claude Code has no user configuration (~/.claude/settings.json). Without permission bypass, Claude Code prompts for confirmation before executing Bash commands.

In -p (print) mode, this causes tool calls to be silently skipped, so the agent never executes Bash commands like python3 scripts/check.py. As a result:

- Agent cannot run skill scripts → functional score is artificially low
- With-skill and without-skill runs produce nearly identical results
- Functional scores stay stuck around 28-36% regardless of skill quality

Evidence

Same skill (acme-compliance) tested in two environments:

| Environment | Permission Config | Functional Score |
| --- | --- | --- |
| EC2 (has `~/.claude/settings.json`) | `skipDangerousModePermissionPrompt: true` | 90% |
| CodeBuild (fresh install, no config) | Default (prompts for permission) | 31% |

The 59-point gap is entirely explained by Claude Code's inability to execute Bash commands in the CI/CD environment.
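To check which of the two situations a given machine is in, a small diagnostic along these lines could be run before the eval. This is a minimal sketch, not part of this PR: the helper name `permission_prompt_skipped` is hypothetical, and it assumes `skipDangerousModePermissionPrompt` sits at the top level of the JSON file, which may not match the real settings layout.

```python
import json
from pathlib import Path


def permission_prompt_skipped(settings_path: Path) -> bool:
    """Return True if the settings file exists and enables the skip flag.

    Assumption: the key lives at the top level of the JSON document.
    """
    try:
        settings = json.loads(settings_path.read_text(encoding="utf-8"))
    except (OSError, ValueError):
        # Missing, unreadable, or malformed file: a fresh CI container
        # typically lands here, so no bypass is configured.
        return False
    if not isinstance(settings, dict):
        return False
    return settings.get("skipDangerousModePermissionPrompt") is True


if __name__ == "__main__":
    path = Path.home() / ".claude" / "settings.json"
    print(f"{path}: prompt skipped = {permission_prompt_skipped(path)}")
```

On a fresh CodeBuild image with no config this would be expected to report `False`, which is the case the fix below targets.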

Fix

Add --dangerously-skip-permissions to both _build_cmd_with_skill() and _build_cmd_without_skill() in ClaudeRunner.

This is safe because:

- Eval runs in isolated temp workspaces (no access to real user data)
- The agent is already sandboxed by --allowedTools
- CI/CD environments are ephemeral containers
- This matches Claude Code docs: "Recommended only for sandboxes with no internet access"
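The shape of the change can be sketched as follows. This is an illustration only: `build_claude_cmd` is a hypothetical stand-in, since the actual signatures of `_build_cmd_with_skill()` and `_build_cmd_without_skill()` are not shown in this description, and passing the tool allowlist as a single comma-joined value is an assumption about how the runner formats it.

```python
from typing import List, Sequence

SKIP_PERMISSIONS_FLAG = "--dangerously-skip-permissions"


def build_claude_cmd(prompt: str, allowed_tools: Sequence[str]) -> List[str]:
    """Build an argv list for a non-interactive Claude Code run.

    Hypothetical helper; the real ClaudeRunner builders may take other
    arguments (skill path, model, output format, and so on).
    """
    cmd = ["claude", "-p", prompt]
    if allowed_tools:
        # Keep the explicit tool allowlist in place.
        cmd += ["--allowedTools", ",".join(allowed_tools)]
    # Nobody can answer a permission prompt in print mode, so bypass it.
    cmd.append(SKIP_PERMISSIONS_FLAG)
    return cmd
```

For example, `build_claude_cmd("Run the compliance check", ["Bash", "Read"])` yields `["claude", "-p", "Run the compliance check", "--allowedTools", "Bash,Read", "--dangerously-skip-permissions"]`. Appending the flag unconditionally in one place keeps the with-skill and without-skill invocations symmetric, so the comparison between them stays fair.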

Testing

- 652 tests pass (1 updated to verify the flag is present)
- No changes to public API or CLI interface
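A test that verifies the flag might look roughly like the sketch below. `FakeRunner` is a stand-in written for this example, not the real `ClaudeRunner`, and the single `prompt` argument is an assumption; the actual updated test in the suite may assert something different.

```python
import unittest

FLAG = "--dangerously-skip-permissions"


class FakeRunner:
    """Stand-in for ClaudeRunner; the real builder signatures are assumed."""

    def _build_cmd_with_skill(self, prompt):
        return ["claude", "-p", prompt, FLAG]

    def _build_cmd_without_skill(self, prompt):
        return ["claude", "-p", prompt, FLAG]


class SkipPermissionsFlagTest(unittest.TestCase):
    def test_both_builders_include_flag(self):
        runner = FakeRunner()
        builders = (runner._build_cmd_with_skill, runner._build_cmd_without_skill)
        for build in builders:
            with self.subTest(builder=build.__name__):
                cmd = build("hello")
                # The flag should appear exactly once in each command.
                self.assertEqual(cmd.count(FLAG), 1)
```

Saved to a file, this can be run with `python -m unittest`. Checking both builders in one test guards against the flag being added to only one of the two invocations.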

In CI/CD environments (CodeBuild, GitHub Actions, etc.), Claude Code has no user configuration (~/.claude/settings.json). Without permission bypass, Claude Code prompts for confirmation before executing Bash commands, which in -p (print) mode causes the tool call to be skipped. This results in:

- Agent never executing skill scripts (e.g., python3 scripts/check.py)
- Functional scores being artificially low (agent can't use skill tools)
- Identical with-skill and without-skill results

Fix: Add --dangerously-skip-permissions to both with-skill and without-skill Claude Code invocations.

This is safe because:

1. Eval runs in isolated temp workspaces (no access to real data)
2. The agent is already sandboxed by --allowedTools
3. CI/CD environments are ephemeral containers

This matches the Claude Code documentation recommendation for sandboxed environments without internet access.

melanie531 merged commit e38fedf into aws-samples:main Mar 19, 2026
2 checks passed

melanie531 merged 1 commit into aws-samples:main from melanie531:fix/bypass-permissions
