scaffold,validator: stop fabricating example values, guard against URL/ID hallucinations #5

Open
jakejimenez wants to merge 1 commit into main from fix/scaffold-hallucinations

@jakejimenez
Owner

Summary

Fixes the hallucinations surfaced in the failing transcript where:

  • `nlci show me the ecli commands` produced `ecli update flags 987654321`
  • `nlci send a get request to https://slackdown.com/` produced `curl --json '{"key":"value"}' https://api.example.com`
  • `nlci ecli help` produced `ecli config get https://example.com`

Root cause: the scaffold enricher fabricated few-shot example values (https://api.example.com, 987654321, ripgrep, localhost:8080) and the system prompt told the model to copy examples verbatim when they look close to the intent. The model obeyed; users got garbage.

This PR implements Fix A + Fix C from the analysis. Fix B (--help passthrough for Issue 3) is deferred.

Changes

Fix A — scaffold uses bracketed placeholders (internal/definition/scaffold_enrichment.go)

Every fabricated literal becomes a <placeholder>:

| Before | After |
| --- | --- |
| `https://example.com`, `https://api.example.com` | `<url>` |
| `987654321`, `42` | `<id>` |
| `ripgrep`, `nginx`, `postgresql` | `<package>`, `<image>`, `<service>` |
| `index.html`, `upload.txt` | `<file>` |
| `localhost:8080` | `<host>` |
| `'{"key":"value"}'` | `<json>` |
| `"sample"`, `"json"` (search queries) | `<query>` |

These placeholders are self-policing: the existing placeholderREs in validator.go reject any <...> token in generated output, so if the model copies a placeholder verbatim, validation fails and the agent's retry loop steers it to substitute the user's actual values.
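The self-policing behavior can be sketched roughly as follows. This is an illustration only: `placeholderRE` and `findPlaceholders` are hypothetical names, and the pattern is an assumption, not the actual placeholderREs from validator.go.

```go
package main

import (
	"fmt"
	"regexp"
)

// placeholderRE is a hypothetical stand-in for the validator's placeholder
// patterns. It matches bracketed tokens such as <url> or <id>.
var placeholderRE = regexp.MustCompile(`<[a-z][a-z0-9_-]*>`)

// findPlaceholders returns every bracketed placeholder still present in a
// generated command. A non-empty result means the command should be rejected.
func findPlaceholders(command string) []string {
	return placeholderRE.FindAllString(command, -1)
}

func main() {
	for _, cmd := range []string{
		"curl <url>",                  // placeholder copied verbatim
		"curl https://slackdown.com/", // placeholder substituted
	} {
		if left := findPlaceholders(cmd); len(left) > 0 {
			fmt.Printf("reject %q: unresolved placeholders %v\n", cmd, left)
			continue
		}
		fmt.Printf("accept %q\n", cmd)
	}
}
```

A rejected command would then be fed back through the retry loop with the list of unresolved placeholders, which is what nudges the model toward the user's own values.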

Also added a flagless GET example to curl's request capability so the model has a non-JSON template available when intent doesn't ask for a body.

Fix C — validator catches identifier hallucinations (internal/validator/validator.go)

Validate now takes intent and rejects:

  • URLs (https?://…) in the command that don't appear in the intent
  • Long digit runs (≥4 digits) in the command that don't appear in the intent

The agent already has the intent in scope at the validation call site. SDK callers get the better behavior transparently — passing an empty intent skips the check.
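A minimal sketch of the check described above, assuming a standalone helper rather than the real Validate signature. The names `unsupportedIdentifiers`, `urlRE`, and `digitRE` are hypothetical, and the substring matching is a simplification of whatever the validator actually does.

```go
package main

import (
	"fmt"
	"regexp"
	"strings"
)

var (
	// urlRE and digitRE are assumed patterns, not copied from validator.go.
	urlRE   = regexp.MustCompile(`https?://[^\s'"]+`)
	digitRE = regexp.MustCompile(`\d{4,}`)
)

// unsupportedIdentifiers returns URLs and runs of four or more digits that
// appear in command but not in intent. An empty intent skips the check.
func unsupportedIdentifiers(command, intent string) []string {
	if strings.TrimSpace(intent) == "" {
		return nil
	}
	lowerIntent := strings.ToLower(intent)
	var missing []string
	for _, u := range urlRE.FindAllString(command, -1) {
		// Compare case-insensitively and ignore a trailing slash.
		needle := strings.TrimRight(strings.ToLower(u), "/")
		if !strings.Contains(lowerIntent, needle) {
			missing = append(missing, u)
		}
	}
	// Blank out URLs so digits inside a URL are not reported a second time.
	rest := urlRE.ReplaceAllString(command, " ")
	for _, d := range digitRE.FindAllString(rest, -1) {
		if !strings.Contains(intent, d) {
			missing = append(missing, d)
		}
	}
	return missing
}

func main() {
	intent := "send a get request to https://slackdown.com/"
	for _, cmd := range []string{
		`curl --json '{"key":"value"}' https://api.example.com`,
		"curl https://slackdown.com/",
	} {
		fmt.Printf("%q -> unsupported: %v\n", cmd, unsupportedIdentifiers(cmd, intent))
	}
}
```

Short numbers such as `80` or `443` fall under the four-digit threshold, so they pass through untouched in this sketch.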

Verification

go test ./...                    # all green
go vet ./...                     # clean

End-to-end against the failing transcript (with stale scaffolds removed):

| Intent | Before | After |
| --- | --- | --- |
| `show me the ecli commands` | `ecli update flags 987654321` | `ecli update flags --help` |
| `send a get request to https://slackdown.com/` | `curl --json '{"key":"value"}' https://api.example.com` | URL is now slackdown.com; `--json` retention is a model-bias residual outside this scope |
| `ecli help` | `ecli config get https://example.com` | now error-loops cleanly (no fabricated output); deterministic help passthrough deferred to Fix B |

Scaffold sanity:

rm -f ~/.config/nlci/definitions/{ecli,curl}.nlci.yaml
nlci ecli show me the commands     # auto-init produces clean scaffold
grep -E '987654321|example\.com|localhost:8080|ripgrep|nginx|owner/repo' \
  ~/.config/nlci/definitions/{ecli,curl}.nlci.yaml
# (no output — placeholder-only)

Tests added

  • internal/validator/validator_test.go — 7 new cases: URL not in intent rejected, URL in intent accepted (case-insensitive), digit-ID rejected, digit-ID in intent accepted, short port numbers ignored, empty intent skips check.
  • internal/definition/scaffold_enrichment_test.go: TestCommandTreeExamples_NoFabricatedValues (every action family), TestCommandTreeExamples_UsePlaceholders, TestCapabilityExamples_NoFabricatedValues (every curl-shaped capability). Regex-asserts no example.com, no localhost:, no 987654321, no fabricated package names.
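The regex assertion in the scaffold tests could look something like the sketch below. The helper `fabricatedExamples` is hypothetical, and the pattern is borrowed from the grep command in the scaffold sanity check above; the actual tests may use a different pattern and structure.

```go
package main

import (
	"fmt"
	"regexp"
)

// fabricatedRE reuses the grep pattern from the scaffold sanity check.
var fabricatedRE = regexp.MustCompile(
	`987654321|example\.com|localhost:8080|ripgrep|nginx|owner/repo`)

// fabricatedExamples returns the examples that still contain a fabricated
// literal. A test would fail if the result is non-empty.
func fabricatedExamples(examples []string) []string {
	var bad []string
	for _, ex := range examples {
		if fabricatedRE.MatchString(ex) {
			bad = append(bad, ex)
		}
	}
	return bad
}

func main() {
	// Illustrative example strings, not taken from a real scaffold.
	examples := []string{
		"curl <url>",
		"curl https://api.example.com",
		"ecli update flags <id>",
	}
	fmt.Println(fabricatedExamples(examples))
}
```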

🤖 Generated with Claude Code

…L/ID hallucinations

The scaffold enricher fabricated few-shot example values like
'https://api.example.com', '987654321', 'ripgrep', and 'localhost:8080'.
The model copied those values verbatim into generated commands, producing
real-looking but completely wrong output (e.g., curl POSTing JSON to
api.example.com when the user asked for a GET to a different URL; ecli
emitting 'update flags 987654321' for any vaguely matching intent).

Two changes:

1. internal/definition/scaffold_enrichment.go — every fabricated literal
   becomes a bracketed placeholder (<url>, <id>, <package>, <file>,
   <json>, <key>, etc.). The validator already rejects any <...> token
   in generated output, so the placeholders are self-policing: if the
   model copies them verbatim, validation fails and the agent's retry
   loop steers the model to substitute the user's actual values. Also
   added a flagless GET example to curl's "request" capability so the
   model has a non-JSON template to copy when intent doesn't ask for a
   body.

2. internal/validator/validator.go — Validate now takes intent as a
   parameter and rejects URLs or long digit runs (≥4) in the command
   that don't appear in the intent. This is the runtime safety net for
   any hallucination that slips past Fix A — including drift in
   third-party scaffolds and SDK callers.

Verified end-to-end against the user's failing transcript:
  - 'nlci show me the ecli commands' no longer hallucinates "987654321"
  - 'nlci send a get request to https://slackdown.com/' produces a
    command targeting slackdown.com instead of api.example.com

Co-Authored-By: Claude Sonnet 4.6 (1M context) <noreply@anthropic.com>