fix(bookstack): suppress Qwen3 think blocks, fix citations, prevent flicker #103
Merged
…INK_END to module level
Root cause
Diagnosed by testing Qwen3_5-122B-A10B directly against the gateway. The model outputs its reasoning inside a `<think>...</think>` block as regular text content, not as a separate Anthropic thinking block. The gateway strips the opening `<think>` tag but passes `</think>` through as a plain text delta. This caused three problems:

- Flicker: text was streamed and then wiped by `text_clear`, which the client couldn't suppress fast enough
- `</think>` tag appearing in the answer: the closing tag passed through to the final rendered response
- Spurious `(page N)` numbers: the model saw `"id": 61` in tool responses and used it as a page number; custom models follow the system-prompt instruction less reliably than Claude

Fixes

`src/aieng_bot/bookstack/agent.py`: thinking-aware streaming

`ask_stream` now buffers text silently until `</think>` is found in the stream, then switches to real-time token streaming for everything after it. A `skip_leading_nl` flag drops the `\n\n` separator that Qwen3 emits as a separate chunk immediately after `</think>`.

- Tool-use turn: `</think>` → empty content → tool block → no text ever shown, no `text_clear` needed
- Final-answer turn: `</think>` → answer streams live, token by token
- No `</think>` seen → full text buffered → burst-emitted as fast 20-char chunks after completion

`bookstack_agent/ui/app/components/chat-page.tsx`: RAF buffering (safety net)

`text_chunk` events are buffered in a ref and flushed once per animation frame. If `text_clear` arrives before the RAF callback fires, the buffer is discarded silently. This handles any edge case where a small amount of text is emitted before the model switches to a tool call.

`src/aieng_bot/bookstack/tools.py`: strip numeric IDs from `get_page`

Removed `id`, `book_id`, `chapter_id`, and `updated_at` from the `get_page` tool response. The model now only sees `name`, `markdown`, and `url`, so it can't accidentally cite a numeric ID as a page number.

`src/aieng_bot/bookstack/prompts.py`: stronger citation + no-preamble rules

Explicit ban on preamble phrases (`"Now I have all the information…"` etc.) and a strengthened citation instruction: "NEVER include page numbers, numeric IDs, or any internal identifiers."

Verified live against Qwen3_5-122B-A10B

Tool-use turns emit only `tool_use` events (no thinking text). The final answer streams token-by-token from the first post-think token.

Test plan

- `pytest tests/bookstack/test_agent.py`
- `## Sources` section uses page titles + URLs, no `(page N)` annotation
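
For reviewers who want to reason about the thinking-aware streaming described under Fixes, here is a minimal standalone sketch of the buffering logic. It is not the code in `agent.py`: the function name `stream_after_think` and the constants `THINK_CLOSE` and `BURST_SIZE` are hypothetical, and it assumes the text deltas arrive as a plain iterable of strings rather than as gateway events.

```python
from typing import Iterable, Iterator

# Hypothetical names for illustration; the real module may differ.
THINK_CLOSE = "</think>"  # closing tag the gateway passes through as plain text
BURST_SIZE = 20           # chunk size for the no-tag fallback described above


def stream_after_think(deltas: Iterable[str]) -> Iterator[str]:
    """Yield only the text that follows the first </think> tag."""
    buffer = ""              # text held back while the tag has not been seen
    seen_close = False
    skip_leading_nl = False  # drop the newline separator right after the tag

    for delta in deltas:
        if not seen_close:
            buffer += delta  # the tag may be split across chunks, so search the buffer
            idx = buffer.find(THINK_CLOSE)
            if idx == -1:
                continue     # still inside the reasoning: keep buffering silently
            seen_close = True
            skip_leading_nl = True
            delta = buffer[idx + len(THINK_CLOSE):]
            buffer = ""
        if skip_leading_nl:
            delta = delta.lstrip("\n")
            if not delta:
                continue     # separator-only (or empty) chunk: keep skipping
            skip_leading_nl = False
        if delta:
            yield delta      # live streaming of everything after the tag

    if not seen_close:
        # No </think> at all: emit the buffered text in small bursts.
        for start in range(0, len(buffer), BURST_SIZE):
            yield buffer[start:start + BURST_SIZE]
```

With the deltas `["plan", " steps</th", "ink>", "\n\n", "Hello", " world"]` the generator yields `"Hello"` and `" world"`. A tool-use turn such as `["reasoning</think>", "\n\n"]` yields nothing, which matches the "no text ever shown" case above.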