
fix(bookstack): suppress Qwen3 think blocks, fix citations, prevent flicker #103

Merged
amrit110 merged 4 commits into main from fix/bookstack-flicker-and-citations on Jun 16, 2026

@amrit110

Root cause

Diagnosed by testing Qwen3_5-122B-A10B directly against the gateway. The model outputs reasoning inside a <think>...</think> block as regular text content — not a separate Anthropic thinking block. The gateway strips the opening <think> tag but passes </think> through as a plain text delta. This caused three problems:

  1. Thinking text visible / flickering: thinking text streamed to the UI before text_clear arrived, and the client could not suppress it fast enough
  2. </think> tag appearing in the answer: the closing tag passed through to the final rendered response
  3. Page citations with (page N) numbers: the model saw "id": 61 in tool responses and used it as a page number; custom models follow the system-prompt instruction less reliably than Claude

Fixes

src/aieng_bot/bookstack/agent.py — thinking-aware streaming

ask_stream now buffers text silently until </think> is found in the stream, then switches to real-time token streaming for everything after it. A skip_leading_nl flag drops the \n\n separator that Qwen3 emits as a separate chunk immediately after </think>.

  • Qwen3 tool-use turns: thinking buffered → </think> → empty content → tool block → no text ever shown, no text_clear needed
  • Qwen3 final answer turns: thinking buffered → </think> → answer streams live, token by token
  • Non-thinking models (Claude, GPT): no </think> seen → full text buffered → emitted in a rapid burst of 20-character chunks once the turn completes
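
The buffering behaviour described above can be sketched roughly as follows. This is a minimal illustration, not the actual ask_stream code: the function name filter_thinking, the burst_size parameter, and the plain iterable-of-strings input are all assumptions, and tool-call handling (where buffered text would be dropped rather than burst-emitted) is left out.

```python
from typing import Iterable, Iterator

THINK_END = "</think>"


def filter_thinking(deltas: Iterable[str], burst_size: int = 20) -> Iterator[str]:
    """Yield only user-visible text from a stream of text deltas (hypothetical helper)."""
    buffer = ""              # text held back until we know whether it is thinking
    seen_end = False         # True once </think> has been found
    skip_leading_nl = False  # drop the newline separator that follows </think>

    for delta in deltas:
        if seen_end:
            if skip_leading_nl:
                delta = delta.lstrip("\n")
                if not delta:
                    continue  # chunk was only the separator; keep skipping
                skip_leading_nl = False
            yield delta  # live, token-by-token streaming
            continue

        # Accumulate so a closing tag split across chunks is still detected.
        buffer += delta
        idx = buffer.find(THINK_END)
        if idx == -1:
            continue

        seen_end = True
        rest = buffer[idx + len(THINK_END):].lstrip("\n")
        buffer = ""  # discard the thinking text
        if rest:
            yield rest
        else:
            skip_leading_nl = True

    # No </think> at all: treat the whole buffer as the answer and burst-emit it.
    if not seen_end:
        for i in range(0, len(buffer), burst_size):
            yield buffer[i:i + burst_size]
```

Searching the accumulated buffer rather than each delta means a </think> tag that arrives split across two chunks is still caught, at the cost of holding all pre-tag text in memory for the turn.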

bookstack_agent/ui/app/components/chat-page.tsx — RAF buffering (safety net)

text_chunk events are buffered in a ref and flushed once per animation frame. If text_clear arrives before the RAF fires, the buffer is discarded silently. Handles any edge case where a small amount of text is emitted before the model switches to a tool call.

src/aieng_bot/bookstack/tools.py — strip numeric IDs from get_page

Removed id, book_id, chapter_id, updated_at from the get_page tool response. Model now only sees name, markdown, url — can't accidentally cite a numeric ID as a page number.
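
One way to express this kind of filtering is an allowlist, sketched below. The helper name slim_page and the shape of the input dict are assumptions for illustration; the real tools.py may remove the four fields explicitly instead.

```python
from typing import Any

# Fields the model is allowed to see, per the description above.
ALLOWED_FIELDS = ("name", "markdown", "url")


def slim_page(page: dict[str, Any]) -> dict[str, Any]:
    """Return a copy of a page record containing only the allowlisted fields (hypothetical helper)."""
    return {key: page[key] for key in ALLOWED_FIELDS if key in page}
```

An allowlist has the side benefit that any numeric field added to the upstream response later stays hidden from the model by default.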

src/aieng_bot/bookstack/prompts.py — stronger citation + no-preamble rules

Explicit ban on preamble phrases ("Now I have all the information…" etc.) and strengthened citation instruction: "NEVER include page numbers, numeric IDs, or any internal identifiers."

Verified live against Qwen3_5-122B-A10B

think text in chunks:   False
</think> tag in chunks: False
</think> in answer:     False

Tool-use turns emit only tool_use events (no thinking text). Final answer streams token-by-token from the first post-think token.

Test plan

  • All 14 agent unit tests pass (pytest tests/bookstack/test_agent.py)
  • Ask a multi-step question in the UI — no thinking text visible, no flicker
  • Final answer streams smoothly token-by-token
  • ## Sources section uses page titles + URLs, no (page N) annotation

amrit110 force-pushed the fix/bookstack-flicker-and-citations branch from 6677be8 to 322ed85 on June 16, 2026 18:53
amrit110 merged commit 77aff79 into main on Jun 16, 2026
9 checks passed
amrit110 deleted the fix/bookstack-flicker-and-citations branch on June 16, 2026 19:52