
[Refactor] improve context compression budgeting #874

Merged
dingyi222666 merged 9 commits into v1-dev from feat/context-compression-refactor on May 24, 2026

@dingyi222666 (Member) commented May 23, 2026

This PR refactors ChatLuna's context compression to better account for chat history, tool messages, and agent scratchpad tokens.

New Features

  • Add smarter context compression flow for chat history token budgeting.
  • Support agent scratchpad compression in the legacy executor path.
  • Use actual usage_metadata.input_tokens as the scratchpad compression trigger baseline.

Bug fixes

  • Count AI and tool messages in the same round when deriving baseline token usage.
  • Adjust scratchpad compression threshold so compression triggers closer to the configured token budget.
  • Clean up stale token counter and unused prompt imports found by lint.

Other Changes

  • Streamline context compression formatting and type annotations.
  • Apply review-feedback formatting and warning text cleanup.
  • Validation: yarn lint-fix completed with no errors. Existing max-len warnings remain in read_chat_message.ts.

…rt token counting

- Rewrite infinite_context.ts: class -> function, structured output (summary + recent messages)
- Rewrite infinite_context_chain.ts: class -> simple compressChunk function
- Add scratchpad compression in agent loop (legacy-executor.ts)
- Extract shared countMessageTokens/countMessagesTokens to utils/count_tokens.ts
  with usage_metadata baseline optimization
- Update chat_history.ts and model.ts cropMessages to use baseline optimization
- Fix multimodal warning: 'chatluna-multimodal-service' -> 'multimodal-service'
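The linked issue reports wrong token counts when images are sent, and the CodeRabbit walkthrough notes that the extracted count_tokens.ts helpers strip base64 image fragments before counting. A minimal sketch of that idea follows. The regex, the `[image]` marker, the chars/4 estimator, and the function names are assumptions for illustration, not ChatLuna's actual code.

```typescript
// Hypothetical sketch, not ChatLuna's count_tokens.ts.
// Matches inline data URLs such as "data:image/png;base64,iVBOR..."
const BASE64_IMAGE_RE = /data:image\/[a-zA-Z0-9.+-]+;base64,[A-Za-z0-9+\/=]+/g

// Rough stand-in for a real tokenizer: about 4 characters per token.
function estimateTokens(text: string): number {
  return Math.ceil(text.length / 4)
}

// Replace each base64 image payload with a short marker so the encoded
// bytes are not counted as if they were ordinary text tokens.
function stripBase64Images(text: string): string {
  return text.replace(BASE64_IMAGE_RE, '[image]')
}

function countMessageTokensSketch(content: string): number {
  return estimateTokens(stripBase64Images(content))
}

const payload = 'A'.repeat(40000)
const message = `look at this data:image/png;base64,${payload}`
console.log(estimateTokens(message)) // 10009: payload counted as text
console.log(countMessageTokensSketch(message)) // 5: payload stripped
```

A real counter would probably still add whatever the provider charges per image; the sketch only avoids treating the encoded bytes as text.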
@coderabbitai (Contributor, bot) commented May 23, 2026


No actionable comments were generated in the recent review. 🎉

ℹ️ Recent review info
⚙️ Run configuration

Configuration used: Repository UI

Review profile: CHILL

Plan: Pro

Run ID: b7931988-4764-4c84-9f32-0de8a4ca5d16

📥 Commits

Reviewing files that changed from the base of the PR and between ba1ccd6 and 28562b3.

📒 Files selected for processing (1)
  • packages/extension-usage/client/charts/token-line.ts

Walkthrough

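This PR refactors ChatLuna's context compression pipeline: it replaces the class-based InfiniteContextManager with a functional compressIfNeeded API, adds token-counting utilities with a baseline-driven truncation optimization, integrates scratchpad compression into agent execution, and updates the multimodal plugin name.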

Changes

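Infinite context and scratchpad compression system refactor

Layer | File(s) | Summary
Token-counting helpers extracted and re-exported | packages/core/src/llm-core/utils/count_tokens.ts, packages/core/src/llm-core/prompt/system_prompts.ts | Adds and exports countMessageTokens and countMessagesTokens (with base64 image-fragment removal and baseline optimization); system_prompts drops its local implementation and re-exports these functions.
Compression chain changed from class-based to function-based | packages/core/src/llm-core/chain/infinite_context_chain.ts | Reduces ChatLunaInfiniteContextChain to an exported compressChunk function, adds CompressChunkResult, and strengthens the compression prompt so it requires summarizing tool calls and their results.
Infinite context manager refactored into a function API | packages/core/src/llm-core/chat/infinite_context.ts | Removes InfiniteContextManager and exports compressIfNeeded with related interfaces; implements placeholders for stale tool results, round splitting (keeping at most 3 rounds), transcript formatting (including tool_calls truncation), and re-counting the compressed result's tokens before returning it.
ChatInterface integrates the new compression API | packages/core/src/llm-core/chat/app.ts | Removes _infiniteContextManager and its factory method; calls compressIfNeeded directly in processChat and compressContext, with separate error handling and the follow-up replace/log logic at each call site.
Scratchpad compression implemented and integrated | packages/core/src/llm-core/agent/legacy-executor.ts | After each round of tool calls, runAgent checks whether to compress: when a model is available, the scratchpad exceeds the threshold, and the latest input tokens exceed 85% of the model's context, it builds a transcript, calls compressChunk, replaces chat_history with a HumanMessage carrying name: 'infinite_context', and trims the earlier scratchpad (keeping the 3 most recent entries).
Baseline optimization in token truncation | packages/core/src/llm-core/platform/model.ts | cropMessages now counts per round and adds baseline search and top-up logic: it takes the last AI message with a known usage_metadata.input_tokens as the baseline and additionally counts that round to optimize round-based truncation.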
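The infinite_context.ts row describes splitting history into rounds and keeping at most 3 recent rounds verbatim. A rough sketch of that split follows, assuming a round begins at each human message. The `Msg` shape and all function names are hypothetical; the real code operates on LangChain message objects.

```typescript
// Hypothetical message shape; the real code uses LangChain message classes.
type Role = 'system' | 'human' | 'ai' | 'tool'
interface Msg {
  role: Role
  content: string
}

// A round starts at each human message and collects the AI replies and
// tool results that follow it, up to the next human message.
function splitRounds(messages: Msg[]): Msg[][] {
  const rounds: Msg[][] = []
  for (const m of messages) {
    if (m.role === 'human' || rounds.length === 0) {
      rounds.push([m])
    } else {
      rounds[rounds.length - 1].push(m)
    }
  }
  return rounds
}

function flatten(rounds: Msg[][]): Msg[] {
  return ([] as Msg[]).concat(...rounds)
}

// Older rounds go to the summarizer; the last `keep` rounds stay verbatim.
function splitForCompression(
  messages: Msg[],
  keep = 3
): { older: Msg[]; recent: Msg[] } {
  const rounds = splitRounds(messages)
  const cut = Math.max(0, rounds.length - keep)
  return {
    older: flatten(rounds.slice(0, cut)),
    recent: flatten(rounds.slice(cut))
  }
}
```

A fixed round count is simple, but as a reviewer points out later in this thread, long recent rounds can still exceed the budget.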

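Multimodal plugin name update

Layer | File(s) | Summary
Plugin name update in middleware | packages/core/src/middlewares/chat/read_chat_message.ts | In the plugin checks and warning messages for image, GIF, and audio handling, the suggested plugin name changes from chatluna-multimodal-service to multimodal-service.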

Frontend tweak

Layer | File(s) | Summary
token-line tooltip behavior change | packages/extension-usage/client/charts/token-line.ts | Changes the tooltip's skipZero default to true, so entries whose value is 0 no longer render a tooltip row.
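The effect of defaulting skipZero to true can be shown with a small filter. The `TooltipItem` shape, the series names, and the `tooltipRows` helper are hypothetical; the PR does not show the chart library's tooltip API.

```typescript
// Hypothetical tooltip entry; not the chart library's real type.
interface TooltipItem {
  seriesName: string
  value: number
}

// With skipZero (now defaulting to true), zero-valued series produce no row.
function tooltipRows(items: TooltipItem[], skipZero = true): string[] {
  return items
    .filter((item) => !(skipZero && item.value === 0))
    .map((item) => `${item.seriesName}: ${item.value}`)
}

const items: TooltipItem[] = [
  { seriesName: 'input', value: 1200 },
  { seriesName: 'output', value: 0 }
]
console.log(tooltipRows(items)) // [ 'input: 1200' ]
console.log(tooltipRows(items, false)) // [ 'input: 1200', 'output: 0' ]
```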

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~60 minutes

Possibly related PRs

Poem

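🐰 I roll the conversation into a slender thread,
Summaries stand in for the weight of distant turns,
A baseline guards the edges of each round,
The scratchpad keeps just three light breezes,
A new name, and the hops feel lighter.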

🚥 Pre-merge checks | ✅ 4 | ❌ 1

❌ Failed checks (1 warning)

Check name | Status | Explanation | Resolution
Docstring Coverage | ⚠️ Warning | Docstring coverage is 62.50%, which is below the required threshold of 80.00%. | Write docstrings for the functions missing them to satisfy the coverage threshold.
✅ Passed checks (4 passed)
Check name | Status | Explanation
Title check | ✅ Passed | The title accurately reflects the PR's main goal: refactoring and improving the budgeting mechanism for context compression.
Description check | ✅ Passed | The description details the PR's new features, bug fixes, and other changes; it is closely tied to the code changes and sufficiently complete.
Linked Issues check | ✅ Passed | Check skipped because no linked issues were found for this pull request.
Out of Scope Changes check | ✅ Passed | Check skipped because no linked issues were found for this pull request.


@gemini-code-assist (Contributor, bot) left a comment

Code Review

This pull request refactors the infinite context management system, moving from a class-based manager to functional utilities like compressIfNeeded and compressChunk. It introduces scratchpad compression for the agent executor to handle long tool-call loops and optimizes token counting by leveraging usage_metadata from previous AI responses as a baseline. Feedback focuses on ensuring that AbortSignal is correctly propagated through the new asynchronous compression paths to prevent unnecessary background processing and addressing a logic error in the token counting optimization that skips valid baseline messages. Additionally, it was noted that compression thresholds should be unified across the codebase.

Comment thread packages/core/src/llm-core/utils/count_tokens.ts Outdated
Comment thread packages/core/src/llm-core/agent/legacy-executor.ts Outdated
Comment thread packages/core/src/llm-core/agent/legacy-executor.ts
Comment thread packages/core/src/llm-core/agent/legacy-executor.ts Outdated
Comment thread packages/core/src/llm-core/chat/infinite_context.ts
Comment thread packages/core/src/llm-core/chat/infinite_context.ts Outdated
Comment thread packages/core/src/llm-core/chat/app.ts
Comment thread packages/core/src/llm-core/agent/legacy-executor.ts Outdated
…n trigger

Instead of estimating tokens by formatting scratchpad text, use the real
input_tokens from the AI message's usage_metadata returned by the LLM call.
This is accurate since it's what the model actually consumed.
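The trigger described in this commit compares the provider-reported prompt size of the latest AI step against the model's context limit. A minimal sketch follows, using the 0.85 ratio and the keep-3-entries rule described elsewhere in this PR. `AiStep`, `inputTokens` (a stand-in for usage_metadata.input_tokens), and both helper names are hypothetical.

```typescript
// Hypothetical AI step; inputTokens stands in for usage_metadata.input_tokens.
interface AiStep {
  content: string
  inputTokens?: number
}

const COMPRESSION_RATIO = 0.85 // unified threshold mentioned in this PR
const KEEP_RECENT_STEPS = 3 // scratchpad entries kept verbatim

// The reported input_tokens reflects the whole prompt the model consumed
// (system prompt, chat history, scratchpad), so no separate text-based
// estimate of the scratchpad is needed.
function shouldCompressScratchpad(
  lastAi: AiStep | undefined,
  maxTokenLimit: number,
  scratchpadLength: number
): boolean {
  const measured = lastAi ? lastAi.inputTokens : undefined
  if (measured === undefined) return false // no real measurement yet
  if (scratchpadLength <= KEEP_RECENT_STEPS) return false // nothing old to fold
  return measured > maxTokenLimit * COMPRESSION_RATIO
}

// Older steps go to the summarizer; the most recent ones stay as-is.
function trimScratchpad<T>(steps: T[]): { toSummarize: T[]; kept: T[] } {
  const cut = Math.max(0, steps.length - KEEP_RECENT_STEPS)
  return { toSummarize: steps.slice(0, cut), kept: steps.slice(cut) }
}
```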
@chatgpt-codex-connector (bot) left a comment

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 9256c50b33


Comment thread packages/core/src/llm-core/chat/infinite_context.ts Outdated
Comment thread packages/core/src/llm-core/prompt/chat_history.ts Outdated
Comment thread packages/core/src/llm-core/prompt/chat_history.ts Outdated
@coderabbitai (Contributor, bot) left a comment

Actionable comments posted: 5

🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@packages/core/src/llm-core/agent/legacy-executor.ts`:
- Around line 381-395: The compression condition uses only scratchpadTokens
(from formatScratchpadForCount and tokenCounter) against maxTokenLimit * 0.84,
but the actual prompt also includes input['chat_history']; update the check in
legacy-executor.ts to include chat history tokens: format and count
input['chat_history'] (using the same tokenCounter), then compute either
totalTokens = scratchpadTokens + chatHistoryTokens and compare totalTokens to
maxTokenLimit * 0.84, or compute remainingBudget = maxTokenLimit -
chatHistoryTokens and compare scratchpadTokens to remainingBudget * 0.84;
trigger compression when the combined/remaining-based threshold is exceeded
(adjust the existing if that currently tests scratchpadTokens).

In `@packages/core/src/llm-core/chat/infinite_context.ts`:
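- Around line 111-123: The current implementation uses splitMessages() to keep a fixed number of recent rounds (1 to 3), and compressIfNeeded() only records outputTokens without re-checking threshold/maxTokenLimit. If the retained recent rounds are long, the result can still exceed the budget and fail on the next call. Instead, backfill recent rounds from newest to oldest under a token budget: in splitMessages or compressIfNeeded, introduce a budget computed from threshold/maxTokenLimit (using threshold, maxTokenLimit, inputTokens, and outputTokens), and add the most recent complete rounds one at a time until the accumulated tokens reach the budget limit. After building resultMessages, recompute and set outputTokens, the compressed flag, remainingMessageCount, and messages so they reflect the actual compression result (symbols: splitMessages, compressIfNeeded, resultMessages, outputTokens, threshold, maxTokenLimit, remainingMessageCount).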

In `@packages/core/src/llm-core/platform/model.ts`:
- Around line 835-891: Adding baselineTokens to totalTokens in a single step (when baselineIdx/baselineRoundIdx are used) underestimates the cost of the AI reply and tool messages that follow the baseline in the same round. Fix: do not use baselineTokens as the cost of all rounds 0..baselineRoundIdx. In the branch where i <= baselineRoundIdx and selectedRounds is empty, call countRoundTokens(conversationRounds[j]) for each round to sum the actual token count of every round in 0..baselineRoundIdx, use that sum to decide exceedsLimit/truncated, and then unshift those rounds into selectedRounds one by one (instead of adding baselineTokens directly and unshifting the repeated baselineRoundIdx all at once). Symbols: baselineIdx, baselineRoundIdx, baselineTokens, conversationRounds, selectedRounds, totalTokens, countRoundTokens, maxTokenLimit.

In `@packages/core/src/llm-core/prompt/chat_history.ts`:
- Around line 72-137: The baseline calculation underestimates historical tokens
because findBaseline/baseline.tokens is treated as the full cost up to baseline
while runtime.usedTokens has already subtracted the current request
(input/scratchpad) and the baseline AI reply token count is not added back; this
causes selectedRounds to include too much history when chatHistory ends with an
AI message. Fix by computing the true baseline cost as baseline.tokens plus the
token count of the baseline AI message if that message is not already included
in runtime.usedTokens (i.e., when current request tokens were removed), or
alternatively recompute the baseline segment by calling countMessagesTokens on
rounds[0..baselineRoundIdx] instead of trusting baseline.tokens; update the
logic in the loop that unwraps the bulkRounds (the block using baselineRoundIdx,
baseline.tokens, runtime.usedTokens, selectedRounds, availableLimit and
countMessagesTokens) and likewise apply the same correction in the analogous
code at lines 198-217 so usedTokens correctly reflects all messages up to and
including the baseline AI message before comparing to availableLimit.

In `@packages/core/src/middlewares/chat/read_chat_message.ts`:
- Line 252: The warning strings reference the old plugin name
"chatluna-multimodal-service" while the code checks for
ctx.chatluna.getPlugin('multimodal-service'); update all warning/error messages
in this file that mention "chatluna-multimodal-service" (the messages near the
checks around ctx.chatluna.getPlugin('multimodal-service')) to use
"multimodal-service" so the logged/printed plugin name matches the actual plugin
id the code looks up (apply to the other similar messages in the same file).
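The infinite_context.ts comment in this review suggests filling the recent window by token budget rather than by a fixed round count. A minimal sketch under stated assumptions: each complete round's token cost is already known, and the budget is taken as `maxTokenLimit * threshold` minus the summary's tokens. `Round`, the budget formula, and both function names are assumptions, not code from this PR.

```typescript
// Hypothetical round record: one token cost per complete round.
interface Round {
  label: string
  tokens: number
}

// Assumed budget: the threshold fraction of the limit, minus what the
// summary already occupies. Not taken from the PR.
function recentRoundBudget(
  maxTokenLimit: number,
  threshold: number,
  summaryTokens: number
): number {
  return Math.max(0, Math.floor(maxTokenLimit * threshold) - summaryTokens)
}

// Walk from newest to oldest, keeping whole rounds while they fit.
// The result is a contiguous suffix of `rounds`, still in original order.
function backfillRecentRounds(
  rounds: Round[],
  budget: number
): { kept: Round[]; usedTokens: number } {
  const kept: Round[] = []
  let usedTokens = 0
  for (let i = rounds.length - 1; i >= 0; i--) {
    const cost = rounds[i].tokens
    if (usedTokens + cost > budget) break
    usedTokens += cost
    kept.unshift(rounds[i])
  }
  return { kept, usedTokens }
}
```

If even the newest round does not fit, `kept` is empty; a caller would need some fallback (for example truncating long tool output), which this sketch leaves out.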

ℹ️ Review info
⚙️ Run configuration

Configuration used: Repository UI

Review profile: CHILL

Plan: Pro

Run ID: b2d3d4ce-1642-4765-a201-18f725a050f0

📥 Commits

Reviewing files that changed from the base of the PR and between f5422c9 and 9256c50.

📒 Files selected for processing (9)
  • packages/core/src/llm-core/agent/legacy-executor.ts
  • packages/core/src/llm-core/chain/infinite_context_chain.ts
  • packages/core/src/llm-core/chat/app.ts
  • packages/core/src/llm-core/chat/infinite_context.ts
  • packages/core/src/llm-core/platform/model.ts
  • packages/core/src/llm-core/prompt/chat_history.ts
  • packages/core/src/llm-core/prompt/system_prompts.ts
  • packages/core/src/llm-core/utils/count_tokens.ts
  • packages/core/src/middlewares/chat/read_chat_message.ts

Comment thread packages/core/src/llm-core/agent/legacy-executor.ts Outdated
Comment thread packages/core/src/llm-core/chat/infinite_context.ts Outdated
Comment thread packages/core/src/llm-core/platform/model.ts Outdated
Comment thread packages/core/src/llm-core/prompt/chat_history.ts Outdated
Comment thread packages/core/src/middlewares/chat/read_chat_message.ts
- count_tokens.ts: allow baseline when it's the last message (baselineIdx >= 0)
- Pass AbortSignal through compression chain (app.ts -> infinite_context -> compressChunk, legacy-executor -> compressScratchpad -> compressChunk)
- Unify compression threshold to 0.85
- Fix compacted messages detection: use reference equality (compacted !== messages) instead of length comparison
- Revert chat_history.ts baseline optimization (unreliable in prompt pipeline context where system tokens differ between calls)
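Two items in this commit, the single 0.85 threshold and detecting compaction by reference equality, fit together: if the compaction step returns the input array untouched when nothing is compressed, `compacted !== messages` is an exact change test, whereas a length comparison misses cases where the message count happens to stay the same. A hypothetical sketch; `Msg`, `compactIfOverBudget`, and the summary text are illustrative only.

```typescript
// Hypothetical message shape for illustration.
interface Msg {
  role: string
  content: string
}

// One shared constant instead of separate per-file ratios.
const COMPRESSION_THRESHOLD = 0.85

// Returns the *same* array when no compression happens, so callers can
// use reference equality to detect whether anything changed.
function compactIfOverBudget(
  messages: Msg[],
  usedTokens: number,
  maxTokenLimit: number
): Msg[] {
  if (usedTokens <= maxTokenLimit * COMPRESSION_THRESHOLD) return messages
  if (messages.length <= 1) return messages
  const summary: Msg = {
    role: 'human',
    content: `[summary of ${messages.length - 1} earlier messages]`
  }
  return [summary, messages[messages.length - 1]]
}

const history: Msg[] = [
  { role: 'human', content: 'a long question' },
  { role: 'ai', content: 'a long answer' }
]
const compacted = compactIfOverBudget(history, 900, 1000)
console.log(compacted !== history) // true: a new array was produced
console.log(compacted.length === history.length) // true: a length check would miss it
```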
…text

- cropMessages baseline now counts the AI message itself and subsequent
  tool messages in the same round (usage_metadata.input_tokens only covers
  messages before the AI response)
- Update warning messages to show both plugin names for clarity
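The first bullet of this commit, counting the baseline AI message and the tool messages after it, can be sketched as follows. This assumes a `Msg` shape where `inputTokens` stands in for usage_metadata.input_tokens and uses a chars/4 estimate in place of the real tokenizer; `countWithBaseline` is a hypothetical name, not the exported countMessagesTokens.

```typescript
// Hypothetical message shape.
interface Msg {
  role: 'system' | 'human' | 'ai' | 'tool'
  content: string
  inputTokens?: number // stand-in for usage_metadata.input_tokens
}

// Rough chars/4 estimate standing in for a real tokenizer.
function estimate(m: Msg): number {
  return Math.ceil(m.content.length / 4)
}

// input_tokens on an AI message measures only what was sent *before* that
// reply. The reply itself and everything after it (such as its tool
// results) still has to be counted on top of the baseline.
function countWithBaseline(messages: Msg[]): number {
  for (let i = messages.length - 1; i >= 0; i--) {
    const baseline = messages[i].inputTokens
    if (messages[i].role !== 'ai' || baseline === undefined) continue
    let total = baseline
    for (let j = i; j < messages.length; j++) total += estimate(messages[j])
    return total
  }
  // No usable baseline: fall back to counting every message.
  return messages.reduce((sum, m) => sum + estimate(m), 0)
}
```

This only holds if the messages before the baseline are the same ones the provider saw; that is why the earlier commit in this PR reverted the chat_history.ts variant, where system tokens differ between calls.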
@dingyi222666 dingyi222666 changed the title [Refactor] streamline context compression [Refactor] improve context compression budgeting May 23, 2026
@dingyi222666 dingyi222666 linked an issue May 24, 2026 that may be closed by this pull request
@coderabbitai (Contributor, bot) left a comment

🧹 Nitpick comments (1)
packages/core/src/middlewares/chat/read_chat_message.ts (1)

295-295: 💤 Low value

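The warning string exceeds the maximum line length.

Static analysis flags Line 295 (180 characters) and Line 740 (176 characters) as exceeding the 160-character limit. Consider splitting the long strings across multiple lines to follow the code style guidelines.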

♻️ Suggested fix
-                    logger.warn(
-                        `Detected GIF image, which is not supported by most models. Please install chatluna-multimodal-service (multimodal-service) plugin to parse GIF animations.`
-                    )
+                    logger.warn(
+                        'Detected GIF image, which is not supported by most models. ' +
+                            'Please install chatluna-multimodal-service (multimodal-service) plugin to parse GIF animations.'
+                    )

Also applies to: 740-740

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@packages/core/src/middlewares/chat/read_chat_message.ts` at line 295, The GIF
warning string in read_chat_message.ts is over the 160-char line limit; locate
the long message literal (`Detected GIF image, which is not supported by most
models...multimodal-service) plugin to parse GIF animations.`) in the read chat
message handler and break it into shorter pieces (either concatenate shorter
string literals, use a template literal with explicit line breaks, or join an
array of strings) so no single source line exceeds 160 characters while
preserving the exact message content.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Repository UI

Review profile: CHILL

Plan: Pro

Run ID: 39b10d9c-69d7-4325-97e0-19358e12b729

📥 Commits

Reviewing files that changed from the base of the PR and between 9256c50 and ba1ccd6.

📒 Files selected for processing (7)
  • packages/core/src/llm-core/agent/legacy-executor.ts
  • packages/core/src/llm-core/chat/app.ts
  • packages/core/src/llm-core/chat/infinite_context.ts
  • packages/core/src/llm-core/platform/model.ts
  • packages/core/src/llm-core/utils/count_tokens.ts
  • packages/core/src/middlewares/chat/read_chat_message.ts
  • packages/extension-usage/client/charts/token-line.ts

@dingyi222666 dingyi222666 force-pushed the feat/context-compression-refactor branch from ba1ccd6 to 28562b3 Compare May 24, 2026 06:19
@dingyi222666 dingyi222666 merged commit b31502e into v1-dev May 24, 2026
4 of 5 checks passed
@dingyi222666 dingyi222666 deleted the feat/context-compression-refactor branch May 24, 2026 07:18

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

[Bug] Incorrect token usage count when sending images

1 participant