feat(tools): generic input-shape repair for tool calls (validate-then-repair) #2635
Merged
trungutt merged 4 commits into docker:main on May 6, 2026
Conversation
Open-weights models (DeepSeek, Qwen, GLM) repeat the same small set of shape mistakes when calling tools whose schema includes an array field: sending a JSON-stringified array, a bare scalar, or a single-key object placeholder where the schema expects an array. Today every tool surface that doesn't have a custom UnmarshalJSON fails strictly on these and the model wastes a turn retrying with the same mistake. This generalises the validate-then-repair design we already use for edit_file (PRs docker#2452 and docker#2144) to every tool that goes through tools.NewHandler. Strict json.Unmarshal still runs first; only when it fails does the repair layer walk the destination type's fields and attempt four narrow fixes at the exact paths the schema disagreed at. Successful repairs emit a tool_input_repaired log entry tagged with the tool name and the repairs applied so per-(model, tool) repair rates can be tracked.
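The flow described above can be sketched in a few dozen lines. This is a minimal illustration of the validate-then-repair shape, not the PR's actual code: the names `unmarshalWithRepair`, `repairValue`, and `Args` are invented for the example, and only two of the four repairs (stringified-array unwrap and bare-scalar wrap) are shown.

```go
package main

import (
	"encoding/json"
	"fmt"
	"reflect"
	"strings"
)

// Args stands in for any tool's argument struct; the name is illustrative.
type Args struct {
	Paths []string `json:"paths"`
}

// unmarshalWithRepair tries a strict parse first and, only on failure,
// attempts shape repairs at slice-typed fields before a single retry.
func unmarshalWithRepair(raw []byte, dst any) (repairs []string, err error) {
	if err = json.Unmarshal(raw, dst); err == nil {
		return nil, nil // valid input: zero overhead, no behaviour change
	}
	origErr := err

	var fields map[string]json.RawMessage
	if json.Unmarshal(raw, &fields) != nil {
		return nil, origErr // not a JSON object at all; nothing to repair
	}

	t := reflect.TypeOf(dst).Elem()
	for _, f := range reflect.VisibleFields(t) {
		if f.Anonymous || f.Type.Kind() != reflect.Slice {
			continue
		}
		if rawVal, ok := fields[jsonName(f)]; ok {
			if fixed, kind, ok := repairValue(rawVal); ok {
				fields[jsonName(f)] = fixed
				repairs = append(repairs, kind+":"+jsonName(f))
			}
		}
	}
	if len(repairs) == 0 {
		return nil, origErr
	}

	remarshaled, _ := json.Marshal(fields)
	if json.Unmarshal(remarshaled, dst) != nil {
		return nil, origErr // surface the schema's complaint, not ours
	}
	return repairs, nil // caller would emit tool_input_repaired here
}

// repairValue shows two of the four repairs, in the load-bearing order:
// unwrap a stringified array first, then wrap a bare scalar.
func repairValue(raw json.RawMessage) (json.RawMessage, string, bool) {
	var s string
	if json.Unmarshal(raw, &s) != nil {
		return nil, "", false // already an array/object/number: leave it
	}
	var arr []any
	if json.Unmarshal([]byte(s), &arr) == nil {
		return json.RawMessage(s), "unwrap_stringified_array", true
	}
	wrapped, _ := json.Marshal([]string{s})
	return wrapped, "wrap_bare_scalar", true
}

func jsonName(f reflect.StructField) string {
	tag, _, _ := strings.Cut(f.Tag.Get("json"), ",")
	if tag == "" {
		return f.Name
	}
	return tag
}

func main() {
	var a Args
	repairs, err := unmarshalWithRepair([]byte(`{"paths": "foo"}`), &a)
	fmt.Println(repairs, a.Paths, err) // [wrap_bare_scalar:paths] [foo] <nil>
}
```

Note the structure, not the details: valid inputs take the fast path untouched, repairs only fire after a confirmed failure, and a failed retry returns the original error.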
Replace direct NumField/Field iteration with reflect.VisibleFields, which walks promoted fields from embedded structs the same way encoding/json does when marshaling. The previous loop silently skipped them. Affects ReferencesArgs/RenameArgs/CallHierarchyArgs/TypeHierarchyArgs in pkg/tools/builtin/lsp.go which all embed PositionArgs. Promoted fields are primitive scalars today so only repairDropNull would have applied, but the loop should still visit them so future []string fields lifted through embedding inherit the repair layer automatically. Also resolves the modernize lint complaint on the same loop.
rumpl approved these changes on May 6, 2026
Today every tool in docker-agent has a single, strict, one-shot JSON parse for its arguments (`tools.NewHandler`, `pkg/tools/tools.go:17`). If a model sends arguments that don't match the Go struct exactly, the call fails and the model retries — usually with the same mistake, because the error message it sees is unreadable JSON noise.

This PR teaches `NewHandler` a small, named set of repairs for the four shape mistakes open-weights models repeat across every tool that has an array field. It does not change any tool's contract or schema. Valid inputs are not touched.

Why now
We already do this — but only for `edit_file`. PR #2452 and PR #2144 added validate-then-repair logic for the `edits` field and for malformed JSON syntax. Those repairs are bespoke. The patterns they catch are general — the same four shape errors show up across every `[]string` field we expose:

- `read_multiple_files` (`paths`)
- `create_directory` / `remove_directory` (`paths`)
- `search_files_content` (`excludePatterns`)
- `fetch` (`urls`)
- `create_todos` (`descriptions`, `dependencies`)

Each is silently brittle today: any model that sends `paths: "foo"` instead of `paths: ["foo"]` gets a schema error and a wasted turn. The framing in this thread argues — credibly, in our experience — that this is harness design, not model capability. Anthropic and OpenAI eat the cost of strict contracts invisibly because they've memorised every JSON contract during pretraining; open models pay it loudly and get dismissed for it.

This PR generalises the pattern we already use for `edit_file` so every tool benefits.

How it works
The handler does the obvious thing first: try a strict `json.Unmarshal` into the typed struct. If that succeeds — the 95% case for valid input — the typed function is called immediately, with zero overhead and no behaviour change.

If the strict parse fails, the repair layer walks the destination struct's fields by reflection and looks for a small, fixed catalogue of mismatches between each typed field and the corresponding raw JSON value:

- Stringified array: `paths: "[\"a\",\"b\"]"` becomes `paths: ["a","b"]`. We try `json.Unmarshal` on the string and accept it only if it parses as an array.
- Bare scalar: `paths: "foo"` becomes `paths: ["foo"]`. Only fires for primitive element kinds; we don't guess struct construction.
- Single-key object placeholder: `paths: {"path": "foo"}` becomes `paths: ["foo"]`. Restricted to objects with exactly one entry whose value matches the slice's element kind.
- Explicit null: `n: null` is dropped from the payload so the type's zero value wins. (Mostly a no-op in Go's stdlib but matters when a field has a custom `UnmarshalJSON`.)

If at least one repair fires we re-marshal the payload, retry the strict parse, and on success emit a `tool_input_repaired` log entry tagged with the tool name and the list of repairs applied. If the retry still fails we surface the original error so the model sees the schema's complaint rather than a synthesised one from the repair layer.

Why validate-then-repair (not preprocess-then-validate)
The naive approach would be to walk every input and normalise things that look broken before parsing. That's silently corrupting: imagine a `write_file` call where the model wants to write `[1,2,3]` to a file. Preprocessing strings-that-parse-as-JSON would turn `content` into a real array and we'd write garbage to disk.

Validate-then-repair avoids this because the schema is the prior. The repair only runs at field paths where the strict parse already failed. If `content` is typed as `string`, parsing succeeds with `"[1,2,3]"` as a string and we never touch it. If `paths` is typed as `[]string`, the parse fails with a type mismatch at `paths`, and only there do we try unwrapping. The validator localises the bug for us; the repair layer only spends budget at confirmed mismatches.

Why ordering matters
The four repairs are run in a fixed order. The load-bearing one is that the stringified-array unwrap must run before the bare-string wrap, otherwise a literal stringified array would be wrapped as a single element and we'd silently corrupt the input (`'["a","b"]'` would become `['["a","b"]']`). There's a dedicated test (`TestRepair_OrderingPreventsDoubleWrap`) that pins this invariant.

What this is not
This is not content repair. An earlier proposal repaired `edit_file`'s `oldText` and was closed as too dangerous; that objection does not apply here because input-shape repair never modifies content fields.