Feat/tool descriptions and confidence, MCP server by bs258q · Pull Request #21 · cactus-compute/needle

bs258q · 2026-05-13T19:48:31Z

feat: tool descriptions, confidence tracking, and no-match fallback

What

Three independent improvements to constrained decoding and inference:

1. Confidence threshold + no-match fallback

Always return best guess (default — no behavior change)

result = generate(model, params, tokenizer, query, tools)

Return no-match sentinel when model is uncertain

result = generate(..., threshold=0.5)

→ '{"match":false,"confidence":0.31}' when confidence < threshold

Get confidence score alongside result

result, conf = generate(..., return_confidence=True)
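A caller might act on the no-match sentinel as sketched below. This is only an illustration: `is_no_match` and `route` are hypothetical helpers, not part of this PR, and the shape of a successful result (`local_call`) is assumed. The only format taken from the description is the sentinel `{"match":false,"confidence":...}`.

```python
import json
from typing import Callable


def is_no_match(result: str) -> bool:
    """Return True if `result` looks like the no-match sentinel.

    Assumes the sentinel is a JSON object with "match" set to false,
    as in '{"match":false,"confidence":0.31}'.
    """
    try:
        parsed = json.loads(result)
    except json.JSONDecodeError:
        return False
    return isinstance(parsed, dict) and parsed.get("match") is False


def route(result: str, escalate: Callable[[], str]) -> str:
    """Keep the local result unless it is the sentinel; otherwise escalate.

    `escalate` is a hypothetical callback, e.g. a call to a cloud LLM.
    """
    return escalate() if is_no_match(result) else result


# Hand-written strings standing in for generate() output.
sentinel = '{"match":false,"confidence":0.31}'
local_call = '{"name": "get_weather", "arguments": {"location": "Paris"}}'  # assumed shape

routed_sentinel = route(sentinel, lambda: "escalated")
routed_local = route(local_call, lambda: "escalated")
```

Checking `match` rather than comparing whole strings keeps the caller independent of the exact confidence value in the sentinel.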

- needle playground: launches a local web UI for testing and fine-tuning.
- … (browser/workbench/workbench.html): starts Needle as a local MCP tool router for agent workflows.
- … (browser/workbench/workbench.html): runs a single inference request with tool support.
#6. Tool description support

Include a "description" field in each tool object for better semantic
matching. The encoder already tokenizes the full tools JSON string, so
descriptions flow through automatically — no structural change required.

{"name": "get_weather", "description": "Get current weather for a location", "parameters": {"location": "string"}}

Confidence = softmax max probability over valid name tokens at the first
constrained step. Useful for production: escalate uncertain calls to a
cloud LLM instead of silently returning a wrong tool.
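The confidence definition above can be sketched as follows. This is not the PR's implementation: `first_step_confidence` is a hypothetical name, and the sketch assumes the softmax is taken over the valid name tokens only (as if invalid tokens were masked out before normalization). If the decoder normalizes over the full vocabulary instead, the values would differ.

```python
import math
from typing import Sequence


def first_step_confidence(logits: Sequence[float], valid_ids: Sequence[int]) -> float:
    """Max softmax probability, with the softmax restricted to `valid_ids`.

    `logits` are the raw scores at the first constrained step;
    `valid_ids` are the token ids that can start a valid tool name.
    """
    if not valid_ids:
        raise ValueError("valid_ids must not be empty")
    valid = [logits[i] for i in valid_ids]
    peak = max(valid)
    # Subtract the peak before exponentiating for numerical stability.
    exps = [math.exp(x - peak) for x in valid]
    return max(exps) / sum(exps)


# Token 3 has the highest logit overall but is not a valid name token,
# so it does not affect the score.
conf = first_step_confidence([2.0, 0.0, 0.0, 5.0], valid_ids=[0, 1, 2])
uniform = first_step_confidence([1.0, 1.0, 1.0], valid_ids=[0, 1, 2])
```

With this definition a single valid candidate always scores 1.0, so a threshold only discriminates when at least two tools compete at the first step.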

- Fix constrained decoding for flat parameter format {"key": "string"}; previously only JSON Schema {"properties": {"key": {...}}} worked; now both formats populate the param trie correctly
- Add confidence tracking to ConstrainedDecoder: captures softmax max probability over valid name tokens at the first IN_NAME step
- Add threshold/no-match fallback to generate(): returns {"match":false,"confidence":0.31} when confidence < threshold
- Add return_confidence param to generate() for tuple return
- Tool descriptions already work via encoder (no structural change needed)
- Add comprehensive unit tests covering Trie, ToolConstraints, JsonStateMachine, apply_constraints, and ConstrainedDecoder confidence
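The flat-versus-JSON-Schema fix in the first item can be illustrated with a small sketch. `param_names` and `build_trie` are hypothetical helpers, not the functions in this PR, and the trie here is character-level for readability; the real param trie presumably works on tokenizer tokens.

```python
from typing import Any, Dict, Iterable, List

END = "$end"  # marker key for "a complete name ends here"


def param_names(parameters: Dict[str, Any]) -> List[str]:
    """Extract parameter names from either supported format.

    JSON Schema style: {"properties": {"key": {...}}}
    Flat style:        {"key": "string"}
    """
    props = parameters.get("properties")
    if isinstance(props, dict):
        return list(props)
    return list(parameters)


def build_trie(words: Iterable[str]) -> Dict[str, Any]:
    """Build a nested-dict trie, one level per character."""
    root: Dict[str, Any] = {}
    for word in words:
        node = root
        for ch in word:
            node = node.setdefault(ch, {})
        node[END] = {}
    return root


flat = {"location": "string"}
schema = {"type": "object", "properties": {"location": {"type": "string"}}}

flat_names = param_names(flat)
schema_names = param_names(schema)
trie = build_trie(["ab", "ac"])
```

Testing that `properties` is a dict keeps a flat tool that happens to have a parameter literally named "properties" (with a string type) on the flat path.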

Signed-off-by: bs258q <bs258q@gmail.com>

bs258q added 2 commits May 13, 2026 12:50

chore: remove test files

7f98329

Signed-off-by: bs258q <bs258q@gmail.com>

bs258q force-pushed the feat/tool-descriptions-and-confidence branch from 984f7a4 to 7f98329 Compare May 13, 2026 19:50

fix: for showing confidence scores

10619b9

Signed-off-by: bs258q <bs258q@gmail.com>

bs258q force-pushed the feat/tool-descriptions-and-confidence branch from 9753265 to 10619b9 Compare May 13, 2026 19:59

HTTP MCP endpoint and MCP server to be used by orchestrators for tool…

c181690

… calling Signed-off-by: bs258q <bs258q@gmail.com>

bs258q changed the title ~~Feat/tool descriptions and confidence~~ Feat/tool descriptions and confidence, MCP server May 13, 2026

bs258q wants to merge 4 commits into cactus-compute:main from bs258q:feat/tool-descriptions-and-confidence
