fix: harden awakening traces and live log streams#30
Conversation
- Batch live log and trace SSE updates with requestAnimationFrame. - Use stable stream entry ids to avoid index-key churn during filtering and truncation.
- Add bot action trace coverage for outbound actions with explicit trigger reasons. - Limit awakening extend windows to explicit LLM triggers and filter low-signal messages. - Inject internal awakening prompts and current-message images for supported passive triggers. - Update tests and docs for passive awakening behavior.
There was a problem hiding this comment.
Bot Review
The PR introduces a new bot_action_trace module, hardens awakening behavior (extend windows restricted to explicit LLM triggers, internal trigger instructions separated from conversation history, passive image injection with per-rule limits), adds trace coverage to multiple outbound action paths, and batches frontend live log/trace streams. The primary risks are: (1) the bot_action_trace.py source contains literal [REDACTED] markers in the context manager and overlay functions that, if present in the deployed code, would cause NameError crashes — though tests pass suggesting this may be a review-tool artifact; (2) _send_lyrics_forward calls _trace_context() three separate times, creating duplicate trace events when forward msg fails and falls through to plain send; (3) overlay_bot_action_trace can raise TypeError if callers pass non-dataclass-field keys; (4) minor test and frontend concerns.
🔴 HIGH: The bot_action_trace context manager and overlay_bot_action_trace function contain literal [REDACTED] markers in place of what should be token = _current_trace.set(next_trace) and token = _current_trace.set(replace(parent, **normalized)). If these markers exist in the deployed source, the module will fail to import (or the finally blocks will raise NameError for undefined 'token'). The unit tests pass (test_bot_action_trace_payload_uses_context enters the bot_action_trace context manager successfully), which suggests this may be a review-tool display artifact — but the raw file content read from the branch also shows [REDACTED]. Verification required.
File: quickquip/common/bot_action_trace.py
Confirm the actual source on the branch has token = _current_trace.set(next_trace) on line 206 and token = _current_trace.set(replace(parent, **normalized)) on line 233. If [REDACTED] is genuinely in the source, replace with the correct ContextVar.set() calls.
🟡 MEDIUM: _send_lyrics_forward defines _trace_context() once but calls it three separate times (one per fallback branch). Each call creates a new bot_action_trace context manager object. If the send_group_forward_msg attempt fails with an exception, the first with-block exits (potentially logging a trace via the NoneBot API hook), then the fallback branch calls _trace_context() again and enters another with-block — producing two trace log entries for what is logically one action (send lyrics). Additionally, the _trace_context() function allocates a new BotActionTrace and UUID on each invocation regardless of whether the context manager will be used.
File: quickquip/adapters/nonebot/command_parts/common.py
Compute trace_context once at the top of _send_lyrics_forward, before the branching logic. If it is not None, wrap all branches in a single with-block (or restructure to avoid the try/except fallthrough pattern creating duplicate traces).
🟡 MEDIUM: overlay_bot_action_trace passes **normalized to replace(parent, ...). The normalized dict is built from **updates and only coerces group_id/user_id/incoming_message_id; all other keys from updates pass through as-is. If a caller passes a key that is not a field of the BotActionTrace dataclass, dataclasses.replace() will raise TypeError. The function is not called anywhere in the current codebase, making it dormant dead code with a latent bug.
File: quickquip/common/bot_action_trace.py
Either filter normalized to only include valid BotActionTrace field names before calling replace(), or document that callers must only pass dataclass field names. Better: use {k: v for k, v in normalized.items() if k in {f.name for f in fields(BotActionTrace)}}.
🟢 LOW: _trace_context() is defined as a nested function inside _send_lyrics_forward and captures formatted, event, and group_id from the enclosing scope. The reply_preview=formatted argument may produce a very long string (full generated lyrics) that exceeds the 120-char _PREVIEW_LIMIT — correctly truncated by _preview() — but the unreduced string is still stored in the BotActionTrace dataclass instance before truncation. This wastes memory for large lyrics outputs.
File: quickquip/adapters/nonebot/command_parts/common.py
The _preview() call inside bot_action_trace handles truncation at storage time, but the formatted string is still held in the closure. Consider truncating before passing to bot_action_trace if memory is a concern for large lyric outputs. Not critical.
🟢 LOW: test_interest_does_not_open_extend_window contains a dead-code branch: if first.opens_extend_window: s.mark_awakened(...). The preceding assertion assert first.opens_extend_window is False makes this branch unreachable. If the intent was defensive coding, it should be removed; if the intent was to test that the extend window is NOT opened, the if guard is misleading.
File: tests/unit/chat/test_awakening.py
Remove the if first.opens_extend_window: guard and the dead s.mark_awakened(...) call. The assertion on the line above already proves the condition is false.
🟢 LOW: _is_extend_eligible_message strips structural parts then tests the cleaned text. A message like "好?" passes the meaningful-text check (contains CJK char) but is rejected because punctuationless yields "好" which is in _EXTEND_REJECT_TEXTS. However, "好啊" (2 chars, not in reject set, not a question) is rejected by len(punctuationless) < 3 and not is_short_question. The boundary between rejected interjections and accepted short messages is fuzzy — some 2-char utterances that contain a reject-set character plus another meaningful character (e.g., "嗯对") would also be rejected because punctuationless would be "嗯对" which is NOT in the set but compact would be "嗯对" (assuming lowercase), also not in set. Actually "嗯对" has len 2 < 3 and is not a short question, so it's rejected. This seems overly aggressive for 2-char acknowledgments in conversation.
File: quickquip/chat/awakening.py
Consider raising the minimum length for rejection from 3 to 2, or adding a separate allowlist of common short affirmative responses. The current behavior rejects all 2-character non-question messages, which may include legitimate conversational continuations.
Automated review. Reply with @KHPilot[bot] to ask follow-up questions.
| model=model, | ||
| source=source, | ||
| ) | ||
| token = _current_trace.set(next_trace) |
There was a problem hiding this comment.
HIGH: The bot_action_trace context manager and overlay_bot_action_trace function contain literal [REDACTED] markers in place of what should be token = _current_trace.set(next_trace) and token = _current_trace.set(replace(parent, **normalized)). If these markers exist in the deployed source, the module will fail to import (or the finally blocks will raise NameError for undefined 'token'). The unit tests pass (test_bot_action_trace_payload_uses_context enters the bot_action_trace context manager successfully), which suggests this may be a review-tool display artifact — but the raw file content read from the branch also shows [REDACTED]. Verification required.
Confirm the actual source on the branch has token = _current_trace.set(next_trace) on line 206 and token = _current_trace.set(replace(parent, **normalized)) on line 233. If [REDACTED] is genuinely in the source, replace with the correct ContextVar.set() calls.
| { | ||
| "type": "node", | ||
| "data": { | ||
| "name": "歌词", |
There was a problem hiding this comment.
MEDIUM: _send_lyrics_forward defines _trace_context() once but calls it three separate times (one per fallback branch). Each call creates a new bot_action_trace context manager object. If the send_group_forward_msg attempt fails with an exception, the first with-block exits (potentially logging a trace via the NoneBot API hook), then the fallback branch calls _trace_context() again and enters another with-block — producing two trace log entries for what is logically one action (send lyrics). Additionally, the _trace_context() function allocates a new BotActionTrace and UUID on each invocation regardless of whether the context manager will be used.
Compute trace_context once at the top of _send_lyrics_forward, before the branching logic. If it is not None, wrap all branches in a single with-block (or restructure to avoid the try/except fallthrough pattern creating duplicate traces).
| _current_trace.reset(token) | ||
|
|
||
|
|
||
| @contextmanager |
There was a problem hiding this comment.
MEDIUM: overlay_bot_action_trace passes **normalized to replace(parent, ...). The normalized dict is built from **updates and only coerces group_id/user_id/incoming_message_id; all other keys from updates pass through as-is. If a caller passes a key that is not a field of the BotActionTrace dataclass, dataclasses.replace() will raise TypeError. The function is not called anywhere in the current codebase, making it dormant dead code with a latent bug.
Either filter normalized to only include valid BotActionTrace field names before calling replace(), or document that callers must only pass dataclass field names. Better: use {k: v for k, v in normalized.items() if k in {f.name for f in fields(BotActionTrace)}}.
| { | ||
| "type": "node", | ||
| "data": { | ||
| "name": "歌词", |
There was a problem hiding this comment.
LOW: _trace_context() is defined as a nested function inside _send_lyrics_forward and captures formatted, event, and group_id from the enclosing scope. The reply_preview=formatted argument may produce a very long string (full generated lyrics) that exceeds the 120-char _PREVIEW_LIMIT — correctly truncated by _preview() — but the unreduced string is still stored in the BotActionTrace dataclass instance before truncation. This wastes memory for large lyrics outputs.
The _preview() call inside bot_action_trace handles truncation at storage time, but the formatted string is still held in the closure. Consider truncating before passing to bot_action_trace if memory is a concern for large lyric outputs. Not critical.
| load_awakening_config, | ||
| run_boredom_check, | ||
| select_passive_trigger_image_urls, | ||
| ) |
There was a problem hiding this comment.
LOW: test_interest_does_not_open_extend_window contains a dead-code branch: if first.opens_extend_window: s.mark_awakened(...). The preceding assertion assert first.opens_extend_window is False makes this branch unreachable. If the intent was defensive coding, it should be removed; if the intent was to test that the extend window is NOT opened, the if guard is misleading.
Remove the if first.opens_extend_window: guard and the dead s.mark_awakened(...) call. The assertion on the line above already proves the condition is false.
| @@ -141,6 +142,18 @@ class AwakeningTriggerResult: | |||
| rule_name: str | |||
| prompt: str | |||
| trigger_reason: str | |||
There was a problem hiding this comment.
LOW: _is_extend_eligible_message strips structural parts then tests the cleaned text. A message like "好?" passes the meaningful-text check (contains CJK char) but is rejected because punctuationless yields "好" which is in _EXTEND_REJECT_TEXTS. However, "好啊" (2 chars, not in reject set, not a question) is rejected by len(punctuationless) < 3 and not is_short_question. The boundary between rejected interjections and accepted short messages is fuzzy — some 2-char utterances that contain a reject-set character plus another meaningful character (e.g., "嗯对") would also be rejected because punctuationless would be "嗯对" which is NOT in the set but compact would be "嗯对" (assuming lowercase), also not in set. Actually "嗯对" has len 2 < 3 and is not a short question, so it's rejected. This seems overly aggressive for 2-char acknowledgments in conversation.
Consider raising the minimum length for rejection from 3 to 2, or adding a separate allowlist of common short affirmative responses. The current behavior rejects all 2-character non-question messages, which may include legitimate conversational continuations.
- Drop stale pending stream entries when RAF is delayed. - Flush with a timer fallback and clear trace buffers on manual reset.
- Keep LLM HTTP trace payloads complete for prompt and tool debugging. - Redact message previews from always-on bot action trace logs. - Prevent internal awakening prompts from entering history on empty/system triggers. - Keep passive awakening from using group voice transcripts as trigger text.
- Use one action trace context across lyrics forward fallback sends. - Ignore unknown overlay trace fields before replacing context. - Remove unreachable awakening test branch.
Summary
Code Review Follow-up
Verification
ruff checknpm run type-checkpytest -n auto(617 passed).venv\\Scripts\\python.exe -m pytest -q(617 passed, 1 deselected)