scaffold,validator: stop fabricating example values, guard against URL/ID hallucinations #5
Open
jakejimenez wants to merge 1 commit into
…L/ID hallucinations

The scaffold enricher fabricated few-shot example values like 'https://api.example.com', '987654321', 'ripgrep', and 'localhost:8080'. The model copied those values verbatim into generated commands, producing real-looking but completely wrong output (e.g., curl POSTing JSON to api.example.com when the user asked for a GET to a different URL; ecli emitting 'update flags 987654321' for any vaguely matching intent).

Two changes:

1. internal/definition/scaffold_enrichment.go: every fabricated literal becomes a bracketed placeholder (<url>, <id>, <package>, <file>, <json>, <key>, etc.). The validator already rejects any <...> token in generated output, so the placeholders are self-policing: if the model copies them verbatim, validation fails and the agent's retry loop steers the model to substitute the user's actual values. Also added a flagless GET example to curl's "request" capability so the model has a non-JSON template to copy when intent doesn't ask for a body.

2. internal/validator/validator.go: Validate now takes intent as a parameter and rejects URLs or long digit runs (≥4) in the command that don't appear in the intent. This is the runtime safety net for any hallucination that slips past Fix A, including drift in third-party scaffolds and SDK callers.

Verified end-to-end against the user's failing transcript:

- 'nlci show me the ecli commands' no longer hallucinates "987654321"
- 'nlci send a get request to https://slackdown.com/' produces a command targeting slackdown.com instead of api.example.com

Co-Authored-By: Claude Sonnet 4.6 (1M context) <noreply@anthropic.com>
Summary
Fixes the hallucinations surfaced in the failing transcript, where:

- `nlci show me the ecli commands` produced `ecli update flags 987654321`
- `nlci send a get request to https://slackdown.com/` produced `curl --json '{"key":"value"}' https://api.example.com`
- `nlci ecli help` produced `ecli config get https://example.com`

Root cause: the scaffold enricher fabricated few-shot example values (`https://api.example.com`, `987654321`, `ripgrep`, `localhost:8080`) and the system prompt told the model to copy examples verbatim when they look close to the intent. The model obeyed; users got garbage.

This PR implements Fix A + Fix C from the analysis. Fix B (`--help` passthrough for Issue 3) is deferred.

Changes
Fix A: scaffold uses bracketed placeholders (`internal/definition/scaffold_enrichment.go`)

Every fabricated literal becomes a `<placeholder>`:

| Fabricated literal | Placeholder |
| --- | --- |
| `https://example.com`, `https://api.example.com` | `<url>` |
| `987654321`, `42` | `<id>` |
| `ripgrep`, `nginx`, `postgresql` | `<package>`, `<image>`, `<service>` |
| `index.html`, `upload.txt` | `<file>` |
| `localhost:8080` | `<host>` |
| `'{"key":"value"}'` | `<json>` |
| `"sample"`, `"json"` (search queries) | `<query>` |

These placeholders are self-policing: the existing `placeholderREs` in `validator.go` rejects any `<...>` token in generated output, so if the model copies the placeholder verbatim, validation fails and the agent's retry loop steers it to substitute the user's actual values.

Also added a flagless GET example to curl's `request` capability so the model has a non-JSON template available when intent doesn't ask for a body.

Fix C: validator catches identifier hallucinations (`internal/validator/validator.go`)

`Validate` now takes `intent` and rejects:

- URLs (`https?://…`) in the command that don't appear in the intent
- long digit runs (≥4) in the command that don't appear in the intent

The agent already has the intent in scope at the validation call site. SDK callers get the better behavior transparently; passing an empty intent skips the check.
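The two validator checks described above (placeholder rejection from Fix A, intent grounding from Fix C) can be sketched roughly as follows. This is a minimal illustration, not the code in `validator.go`: the function name `checkCommand`, the three regular expressions, and the error messages are all assumptions made for the example, and the real `Validate` may handle cases (such as short port numbers) differently.

```go
package main

import (
	"fmt"
	"regexp"
	"strings"
)

// Hypothetical patterns for illustration; the PR's actual regexes may differ.
var (
	placeholderRE = regexp.MustCompile(`<[A-Za-z][A-Za-z0-9_-]*>`)
	urlRE         = regexp.MustCompile(`https?://[^\s'"]+`)
	digitRunRE    = regexp.MustCompile(`[0-9]{4,}`)
)

// checkCommand is an illustrative stand-in for Validate(command, intent).
// It returns an error when the command contains an unsubstituted placeholder,
// or a URL or long digit run that the intent never mentions.
func checkCommand(command, intent string) error {
	// Fix A safety net: a copied placeholder such as <url> always fails.
	if tok := placeholderRE.FindString(command); tok != "" {
		return fmt.Errorf("unsubstituted placeholder %s", tok)
	}
	// An empty intent skips the grounding checks.
	if intent == "" {
		return nil
	}
	lowerIntent := strings.ToLower(intent)
	// Fix C: every URL in the command must appear in the intent (case-insensitive).
	for _, u := range urlRE.FindAllString(command, -1) {
		want := strings.ToLower(strings.TrimRight(u, "/"))
		if !strings.Contains(lowerIntent, want) {
			return fmt.Errorf("URL %s does not appear in the intent", u)
		}
	}
	// Fix C: every run of four or more digits must appear in the intent.
	for _, d := range digitRunRE.FindAllString(command, -1) {
		if !strings.Contains(intent, d) {
			return fmt.Errorf("identifier %s does not appear in the intent", d)
		}
	}
	return nil
}

func main() {
	intent := "send a get request to https://slackdown.com/"
	fmt.Println(checkCommand("curl https://slackdown.com/", intent))  // accepted
	fmt.Println(checkCommand("curl https://api.example.com", intent)) // rejected: URL not in intent
	fmt.Println(checkCommand("curl <url>", intent))                   // rejected: placeholder
	fmt.Println(checkCommand("ecli update flags 987654321", "show me the ecli commands")) // rejected: ID not in intent
}
```

Running the placeholder check before the empty-intent shortcut keeps the Fix A guarantee intact even for callers that pass no intent.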
Verification
End-to-end against the failing transcript (with stale scaffolds removed):
| Intent | Before | After |
| --- | --- | --- |
| `show me the ecli commands` | `ecli update flags 987654321` | `ecli update flags --help` |
| `send a get request to https://slackdown.com/` | `curl --json '{"key":"value"}' https://api.example.com` | targets `slackdown.com`; `--json` retention is a model-bias residual outside this scope |
| `ecli help` | `ecli config get https://example.com` | |

Scaffold sanity:
Tests added
- `internal/validator/validator_test.go`: 7 new cases (URL not in intent rejected, URL in intent accepted (case-insensitive), digit-ID rejected, digit-ID in intent accepted, short port numbers ignored, empty intent skips check).
- `internal/definition/scaffold_enrichment_test.go`: `TestCommandTreeExamples_NoFabricatedValues` (every action family), `TestCommandTreeExamples_UsePlaceholders`, `TestCapabilityExamples_NoFabricatedValues` (every curl-shaped capability). Regex-asserts no `example.com`, no `localhost:`, no `987654321`, no fabricated package names.

🤖 Generated with Claude Code
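For readers unfamiliar with the regex-assertion style mentioned in the test list, the idea can be sketched as a small scan over example strings. Everything here is hypothetical: `findFabricated`, the pattern, and the sample examples are invented for illustration and are not taken from `scaffold_enrichment_test.go`.

```go
package main

import (
	"fmt"
	"regexp"
)

// fabricatedRE covers three of the literals the PR says the tests assert
// against; the exact pattern used in the real tests is an assumption.
var fabricatedRE = regexp.MustCompile(`example\.com|localhost:|987654321`)

// findFabricated returns every example that still contains a fabricated literal.
func findFabricated(examples []string) []string {
	var bad []string
	for _, ex := range examples {
		if fabricatedRE.MatchString(ex) {
			bad = append(bad, ex)
		}
	}
	return bad
}

func main() {
	// Hypothetical scaffold examples: two use placeholders, one is stale.
	examples := []string{
		"curl <url>",
		"ecli update flags <id>",
		"curl https://api.example.com",
	}
	fmt.Println(findFabricated(examples)) // only the stale example is reported
}
```

A real test would fail when the returned slice is non-empty, which is one way to keep fabricated values from creeping back into scaffolds.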