[Refactor] improve context compression budgeting by dingyi222666 · Pull Request #874 · ChatLunaLab/chatluna

dingyi222666 · 2026-05-23T13:48:34Z

This PR refactors ChatLuna context compression to better account for chat history, tool messages, and agent scratchpad tokens.

New Features

Add smarter context compression flow for chat history token budgeting.
Support agent scratchpad compression in the legacy executor path.
Use actual usage_metadata.input_tokens as the scratchpad compression trigger baseline.

Bug fixes

Count AI and tool messages in the same round when deriving baseline token usage.
Adjust scratchpad compression threshold so compression triggers closer to the configured token budget.
Clean up stale token counter and unused prompt imports found by lint.
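The scratchpad trigger listed above (compare the reported `input_tokens` against a fraction of the context window) can be sketched as follows. This is a minimal illustration, not ChatLuna's actual code. The `AiStep` and `UsageMetadata` types are hypothetical simplifications whose field names mirror LangChain's `usage_metadata`. The 0.85 ratio is the value this PR eventually unified on.

```typescript
// Hypothetical, simplified shapes; field names mirror LangChain's usage_metadata.
interface UsageMetadata {
    input_tokens: number
    output_tokens: number
}

interface AiStep {
    role: 'ai'
    usage_metadata?: UsageMetadata
}

// Fraction of the context window at which compression starts (0.85 in this PR).
const COMPRESSION_RATIO = 0.85

/**
 * Decide whether to compress the agent scratchpad, using the input token
 * count the provider reported for the last call instead of a local estimate.
 * Returns false when no usage data is available.
 */
function shouldCompressScratchpad(
    lastAi: AiStep | undefined,
    maxTokenLimit: number,
    ratio: number = COMPRESSION_RATIO
): boolean {
    const inputTokens = lastAi?.usage_metadata?.input_tokens
    if (inputTokens === undefined || maxTokenLimit <= 0) {
        return false
    }
    return inputTokens > maxTokenLimit * ratio
}
```

With a 128,000-token window the cutoff is 108,800 tokens. A step that reported 110,000 input tokens triggers compression, and one that reported 100,000 does not. Because `input_tokens` is what the model actually consumed, it already includes chat history, the system prompt, and the scratchpad. No separate estimate of each part is needed.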

Other Changes

Streamline context compression formatting and type annotations.
Apply review-feedback formatting and warning text cleanup.
Validation: yarn lint-fix completed with no errors. Existing max-len warnings remain in read_chat_message.ts.

…rt token counting
- Rewrite infinite_context.ts: class -> function, structured output (summary + recent messages)
- Rewrite infinite_context_chain.ts: class -> simple compressChunk function
- Add scratchpad compression in agent loop (legacy-executor.ts)
- Extract shared countMessageTokens/countMessagesTokens to utils/count_tokens.ts with usage_metadata baseline optimization
- Update chat_history.ts and model.ts cropMessages to use baseline optimization
- Fix multimodal warning: 'chatluna-multimodal-service' -> 'multimodal-service'
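According to the CodeRabbit walkthrough, the extracted `countMessageTokens`/`countMessagesTokens` helpers also remove base64 image fragments before counting. Without this step, an inline image's data URL would be counted as tens of thousands of text tokens. The sketch below illustrates the idea. The `ContentPart` shape loosely resembles LangChain's multimodal content parts. The 4-characters-per-token counter, the `[image]` placeholder, and the flat per-image cost are assumptions for illustration only.

```typescript
// Hypothetical counter type; a real implementation would use the model's tokenizer.
type TokenCounter = (text: string) => number

// Rough fallback of ~4 characters per token (an assumption, not a real tokenizer).
const approxCounter: TokenCounter = (text) => Math.ceil(text.length / 4)

// Simplified multimodal content parts, loosely modeled on LangChain's format.
type ContentPart =
    | { type: 'text'; text: string }
    | { type: 'image_url'; image_url: { url: string } }

// Matches inline data URLs such as "data:image/png;base64,iVBORw0KGgo..."
const BASE64_IMAGE = /data:image\/[a-zA-Z0-9.+-]+;base64,[A-Za-z0-9+/=]+/g

/** Replace inline base64 image payloads with a short placeholder. */
function stripBase64Images(text: string): string {
    return text.replace(BASE64_IMAGE, '[image]')
}

/**
 * Count tokens for one message's content. Text is counted after stripping
 * base64 payloads; image parts cost a flat, caller-supplied amount.
 */
function countContentTokens(
    content: string | ContentPart[],
    counter: TokenCounter = approxCounter,
    imageCost = 0
): number {
    if (typeof content === 'string') {
        return counter(stripBase64Images(content))
    }
    let total = 0
    for (const part of content) {
        total +=
            part.type === 'text'
                ? counter(stripBase64Images(part.text))
                : imageCost
    }
    return total
}
```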

coderabbitai · 2026-05-23T13:48:49Z

No actionable comments were generated in the recent review. 🎉

ℹ️ Recent review info

⚙️ Run configuration

Configuration used: Repository UI

Review profile: CHILL

Plan: Pro

Run ID: b7931988-4764-4c84-9f32-0de8a4ca5d16

📥 Commits

Reviewing files that changed from the base of the PR and between ba1ccd6 and 28562b3.

📒 Files selected for processing (1)

packages/extension-usage/client/charts/token-line.ts

Walkthrough

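This PR refactors ChatLuna's context compression pipeline. It replaces the class-based InfiniteContextManager with a functional compressIfNeeded API and adds token counting utilities with a baseline-driven truncation optimization. It also integrates scratchpad compression into agent execution and updates the multimodal plugin name.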

Changes

Infinite context and scratchpad compression system refactor

Layer / File(s)	Summary
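Extract and re-export token counting utilities `packages/core/src/llm-core/utils/count_tokens.ts`, `packages/core/src/llm-core/prompt/system_prompts.ts`	Add and export `countMessageTokens` and `countMessagesTokens` (with base64 image fragment removal and baseline optimization); remove the local implementation from system_prompts and re-export these functions.
Compression chain redesigned from class to function `packages/core/src/llm-core/chain/infinite_context_chain.ts`	Reduce `ChatLunaInfiniteContextChain` to an exported function `compressChunk`, add `CompressChunkResult`, and strengthen the compression prompt to require summarizing tool calls and their results.
Infinite context manager refactored into a function API `packages/core/src/llm-core/chat/infinite_context.ts`	Remove `InfiniteContextManager`; export `compressIfNeeded` and related interfaces; implement placeholders for expired tool results, round splitting (keeping at most 3 rounds), transcript formatting (including tool_calls truncation), and a token recount of the compression result before returning it.
ChatInterface integrates the new compression API `packages/core/src/llm-core/chat/app.ts`	Remove `_infiniteContextManager` and its factory method; call `compressIfNeeded` directly in `processChat` and `compressContext`, with separate error handling at each call site followed by replacement/logging logic.
Scratchpad compression implementation and integration `packages/core/src/llm-core/agent/legacy-executor.ts`	In `runAgent`, add a compression trigger after each round of tool calls. It fires when a model is available, the scratchpad exceeds the threshold, and the latest input tokens exceed 85% of the model's context window. It then builds a transcript and calls `compressChunk`, replaces `chat_history` with a `HumanMessage` carrying `name: 'infinite_context'`, and trims the early scratchpad (keeping the latest 3 entries).
Baseline optimization in token truncation `packages/core/src/llm-core/platform/model.ts`	`cropMessages` now counts per round and adds baseline search/recount logic. It locates the last AI message with a known `usage_metadata.input_tokens` as the baseline and additionally counts that round to optimize round truncation.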
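The scratchpad step described for `legacy-executor.ts` above can be sketched as follows. In that step, a summary replaces `chat_history` and only the latest 3 scratchpad entries survive. The `Msg` and `AgentInput` types are hypothetical stand-ins for LangChain's `HumanMessage` and the executor's input object. The summary string is assumed to come from `compressChunk`, which is not reproduced here.

```typescript
// Hypothetical, simplified message shape standing in for LangChain messages.
interface Msg {
    role: 'human' | 'ai' | 'tool' | 'system'
    content: string
    name?: string
}

// Hypothetical shape of the executor input; only chat_history matters here.
interface AgentInput {
    chat_history: Msg[]
    [key: string]: unknown
}

/**
 * Apply a compression summary: chat_history becomes a single human message
 * named 'infinite_context', and only the last `keep` scratchpad steps survive.
 * Inputs are not mutated; new objects are returned.
 */
function applyScratchpadCompression<Step>(
    input: AgentInput,
    steps: Step[],
    summary: string,
    keep = 3
): { input: AgentInput; steps: Step[] } {
    const summaryMessage: Msg = {
        role: 'human',
        name: 'infinite_context',
        content: summary
    }
    return {
        input: { ...input, chat_history: [summaryMessage] },
        // slice(-0) would return every step, so handle keep <= 0 explicitly
        steps: keep > 0 ? steps.slice(-keep) : []
    }
}
```

Keeping the newest steps verbatim presumably preserves the tool results the model is most likely to act on next. Older steps are then represented only through the summary.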

Multimodal plugin name update

Layer / File(s)	Summary
Plugin name update in middleware `packages/core/src/middlewares/chat/read_chat_message.ts`	In the plugin detection and warning text for image/GIF/audio handling, update the suggested plugin name from `chatluna-multimodal-service` to `multimodal-service`.

Frontend tweaks

Layer / File(s)	Summary
token-line tooltip behavior adjustment `packages/extension-usage/client/charts/token-line.ts`	Change the tooltip's `skipZero` default to `true`, so items with a value of 0 no longer render a tooltip row.

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~60 minutes

Possibly related PRs

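ChatLunaLab/chatluna#664: Touches the pre-compression processing and tool-message handling in compressIfNeeded; related at the code level.
ChatLunaLab/chatluna#820: Like this PR, modifies the return values and integration points of compressChunk / compressIfNeeded.
ChatLunaLab/chatluna#656: Also changes infinite_context segmentation/summarization and tool-message handling; related along the same feature path.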

Poem

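🐰 I roll the conversation into a slender thread,
Summaries stand in for the distant weight,
A baseline guards the edge of every round,
The scratchpad keeps just three clear breezes,
A new name, and every hop feels lighter.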

🚥 Pre-merge checks | ✅ 4 | ❌ 1

❌ Failed checks (1 warning)

Check name	Status	Explanation	Resolution
Docstring Coverage	⚠️ Warning	Docstring coverage is 62.50% which is insufficient. The required threshold is 80.00%.	Write docstrings for the functions missing them to satisfy the coverage threshold.

✅ Passed checks (4 passed)

Check name	Status	Explanation
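Title check	✅ Passed	The title accurately reflects the PR's main goal: refactoring and improving context compression budget management.
Description check	✅ Passed	The description details the PR's new features, bug fixes, and other changes; it closely matches the code changes and is sufficiently complete.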
Linked Issues check	✅ Passed	Check skipped because no linked issues were found for this pull request.
Out of Scope Changes check	✅ Passed	Check skipped because no linked issues were found for this pull request.


gemini-code-assist

Code Review

This pull request refactors the infinite context management system, moving from a class-based manager to functional utilities like compressIfNeeded and compressChunk. It introduces scratchpad compression for the agent executor to handle long tool-call loops and optimizes token counting by leveraging usage_metadata from previous AI responses as a baseline. Feedback focuses on ensuring that AbortSignal is correctly propagated through the new asynchronous compression paths to prevent unnecessary background processing and addressing a logic error in the token counting optimization that skips valid baseline messages. Additionally, it was noted that compression thresholds should be unified across the codebase.

…n trigger

Instead of estimating tokens by formatting scratchpad text, use the real input_tokens from the AI message's usage_metadata returned by the LLM call. This is accurate since it's what the model actually consumed.

chatgpt-codex-connector

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 9256c50b33


coderabbitai

Actionable comments posted: 5

🤖 Prompt for all review comments with AI agents

Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@packages/core/src/llm-core/agent/legacy-executor.ts`:
- Around line 381-395: The compression condition uses only scratchpadTokens
(from formatScratchpadForCount and tokenCounter) against maxTokenLimit * 0.84,
but the actual prompt also includes input['chat_history']; update the check in
legacy-executor.ts to include chat history tokens: format and count
input['chat_history'] (using the same tokenCounter), then compute either
totalTokens = scratchpadTokens + chatHistoryTokens and compare totalTokens to
maxTokenLimit * 0.84, or compute remainingBudget = maxTokenLimit -
chatHistoryTokens and compare scratchpadTokens to remainingBudget * 0.84;
trigger compression when the combined/remaining-based threshold is exceeded
(adjust the existing if that currently tests scratchpadTokens).
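The two checks proposed in this comment are not equivalent. Option 1 compares `scratchpadTokens + chatHistoryTokens` against the limit. Option 2 compares the scratchpad against the remaining budget. The sketch below contrasts them. It assumes plain-string message lists and a 4-characters-per-token counter. The ratio defaults to 0.85, the value the PR later unified on, rather than the 0.84 quoted in the comment.

```typescript
// Hypothetical counter; ~4 characters per token is only a rough approximation.
type TokenCounter = (text: string) => number
const approxCounter: TokenCounter = (text) => Math.ceil(text.length / 4)

function sumTokens(parts: string[], counter: TokenCounter): number {
    return parts.reduce((total, part) => total + counter(part), 0)
}

/** Option 1: compare history + scratchpad against ratio * limit. */
function exceedsCombinedBudget(
    chatHistory: string[],
    scratchpad: string[],
    maxTokenLimit: number,
    ratio = 0.85,
    counter: TokenCounter = approxCounter
): boolean {
    const total =
        sumTokens(chatHistory, counter) + sumTokens(scratchpad, counter)
    return total > maxTokenLimit * ratio
}

/** Option 2: compare scratchpad against ratio * (limit - history). */
function exceedsRemainingBudget(
    chatHistory: string[],
    scratchpad: string[],
    maxTokenLimit: number,
    ratio = 0.85,
    counter: TokenCounter = approxCounter
): boolean {
    const remaining = maxTokenLimit - sumTokens(chatHistory, counter)
    return sumTokens(scratchpad, counter) > remaining * ratio
}
```

Consider a 1,000-token limit with 400 tokens of history and 500 of scratchpad. Option 1 fires (900 > 850), but option 2 does not (500 is not greater than 0.85 × 600 = 510). Option 2 applies the safety margin only to the space left after history, so it tolerates more total usage before compressing.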

In `@packages/core/src/llm-core/chat/infinite_context.ts`:
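- Around line 111-123: The current implementation uses splitMessages() to keep
a fixed number of recent rounds (1 to 3), and compressIfNeeded() only records
outputTokens without re-checking threshold/maxTokenLimit. If the retained
recent rounds are long, the result can still exceed the budget and fail on the
next call. Instead, backfill recent rounds from newest to oldest according to
the token budget: in splitMessages or compressIfNeeded, introduce a budget
computed from threshold/maxTokenLimit (using threshold, maxTokenLimit,
inputTokens, outputTokens) and add the most recent complete rounds one at a
time until the accumulated tokens reach the budget limit. After generating
resultMessages, recompute and set outputTokens, the compressed flag,
remainingMessageCount, and the messages field to reflect the actual compression
result (symbols referenced: splitMessages, compressIfNeeded, resultMessages,
outputTokens, threshold, maxTokenLimit, remainingMessageCount).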
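A minimal sketch of the budget-based backfill suggested here follows. It assumes a round starts at each human message and uses a hypothetical 4-characters-per-token counter. The names `splitIntoRounds` and `selectRecentRounds` are illustrative only and are not the PR's `splitMessages`. The `maxRounds` cap of 3 mirrors the at-most-3-rounds retention described in the walkthrough.

```typescript
// Hypothetical, simplified message shape.
interface Msg {
    role: 'human' | 'ai' | 'tool'
    content: string
}

type TokenCounter = (text: string) => number
// Rough approximation (~4 characters per token); an assumption, not a tokenizer.
const approxCounter: TokenCounter = (text) => Math.ceil(text.length / 4)

/** Group messages into rounds; a new round starts at every human message. */
function splitIntoRounds(messages: Msg[]): Msg[][] {
    const rounds: Msg[][] = []
    for (const message of messages) {
        if (message.role === 'human' || rounds.length === 0) {
            rounds.push([message])
        } else {
            rounds[rounds.length - 1].push(message)
        }
    }
    return rounds
}

/**
 * Walk rounds newest-first and keep whole rounds while their total stays
 * within `budget` (and at most `maxRounds` rounds). Everything older is
 * returned in `toCompress` for summarization.
 */
function selectRecentRounds(
    messages: Msg[],
    budget: number,
    maxRounds = 3,
    counter: TokenCounter = approxCounter
): { toCompress: Msg[]; recent: Msg[]; recentTokens: number } {
    const rounds = splitIntoRounds(messages)
    let used = 0
    let start = rounds.length
    while (start > 0 && rounds.length - start < maxRounds) {
        const cost = rounds[start - 1].reduce(
            (total, m) => total + counter(m.content),
            0
        )
        if (used + cost > budget) break
        used += cost
        start--
    }
    const flatten = (rs: Msg[][]): Msg[] => ([] as Msg[]).concat(...rs)
    return {
        toCompress: flatten(rounds.slice(0, start)),
        recent: flatten(rounds.slice(start)),
        recentTokens: used
    }
}
```

The loop stops at the first round that does not fit, so a single oversized latest round is sent to the summarizer and `recent` stays empty. Keeping that round anyway, at the cost of going over budget, would be an equally valid policy. After compression, `recentTokens` plus the summary's token count is the figure to re-check against the limit, as the comment asks.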

In `@packages/core/src/llm-core/platform/model.ts`:
- Around line 835-891: Adding baselineTokens to totalTokens in one shot (when
baselineIdx/baselineRoundIdx are used) underestimates the cost of the AI replies
and tool messages that follow the baseline within the same round. Fix: do not
use baselineTokens as the cost of all of rounds 0..baselineRoundIdx. In the
branch where i <= baselineRoundIdx and selectedRounds is empty, call
countRoundTokens(conversationRounds[j]) for each round in 0..baselineRoundIdx to
accumulate its real token count, use that to decide exceedsLimit/truncated, and
then unshift those rounds into selectedRounds one by one (instead of adding
baselineTokens directly and unshifting everything up to baselineRoundIdx in a
single batch). Symbols referenced: baselineIdx, baselineRoundIdx,
baselineTokens, conversationRounds, selectedRounds, totalTokens,
countRoundTokens, maxTokenLimit.

In `@packages/core/src/llm-core/prompt/chat_history.ts`:
- Around line 72-137: The baseline calculation underestimates historical tokens
because findBaseline/baseline.tokens is treated as the full cost up to baseline
while runtime.usedTokens has already subtracted the current request
(input/scratchpad) and the baseline AI reply token count is not added back; this
causes selectedRounds to include too much history when chatHistory ends with an
AI message. Fix by computing the true baseline cost as baseline.tokens plus the
token count of the baseline AI message if that message is not already included
in runtime.usedTokens (i.e., when current request tokens were removed), or
alternatively recompute the baseline segment by calling countMessagesTokens on
rounds[0..baselineRoundIdx] instead of trusting baseline.tokens; update the
logic in the loop that unwraps the bulkRounds (the block using baselineRoundIdx,
baseline.tokens, runtime.usedTokens, selectedRounds, availableLimit and
countMessagesTokens) and likewise apply the same correction in the analogous
code at lines 198-217 so usedTokens correctly reflects all messages up to and
including the baseline AI message before comparing to availableLimit.

In `@packages/core/src/middlewares/chat/read_chat_message.ts`:
- Line 252: The warning strings reference the old plugin name
"chatluna-multimodal-service" while the code checks for
ctx.chatluna.getPlugin('multimodal-service'); update all warning/error messages
in this file that mention "chatluna-multimodal-service" (the messages near the
checks around ctx.chatluna.getPlugin('multimodal-service')) to use
"multimodal-service" so the logged/printed plugin name matches the actual plugin
id the code looks up (apply to the other similar messages in the same file).


ℹ️ Review info

⚙️ Run configuration

Configuration used: Repository UI

Review profile: CHILL

Plan: Pro

Run ID: b2d3d4ce-1642-4765-a201-18f725a050f0

📥 Commits

Reviewing files that changed from the base of the PR and between f5422c9 and 9256c50.

📒 Files selected for processing (9)

packages/core/src/llm-core/agent/legacy-executor.ts
packages/core/src/llm-core/chain/infinite_context_chain.ts
packages/core/src/llm-core/chat/app.ts
packages/core/src/llm-core/chat/infinite_context.ts
packages/core/src/llm-core/platform/model.ts
packages/core/src/llm-core/prompt/chat_history.ts
packages/core/src/llm-core/prompt/system_prompts.ts
packages/core/src/llm-core/utils/count_tokens.ts
packages/core/src/middlewares/chat/read_chat_message.ts

- count_tokens.ts: allow baseline when it's the last message (baselineIdx >= 0)
- Pass AbortSignal through compression chain (app.ts -> infinite_context -> compressChunk, legacy-executor -> compressScratchpad -> compressChunk)
- Unify compression threshold to 0.85
- Fix compacted messages detection: use reference equality (compacted !== messages) instead of length comparison
- Revert chat_history.ts baseline optimization (unreliable in prompt pipeline context where system tokens differ between calls)
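The switch to reference equality matters because compaction can change messages without changing their count. For example, replacing expired tool results with placeholders, as the walkthrough describes for `compressIfNeeded`, keeps the array length the same. A minimal sketch follows. The `Msg` type, the placeholder text, and the `keepRecent` parameter are hypothetical.

```typescript
// Hypothetical, simplified message shape.
interface Msg {
    role: 'human' | 'ai' | 'tool'
    content: string
}

// Hypothetical placeholder text for tool output that is no longer needed.
const PLACEHOLDER = '[tool result omitted]'

/**
 * Replace all but the last `keepRecent` tool results with a placeholder.
 * Returns the original array (same reference) when nothing changes, so
 * callers can detect compaction with `result !== messages`.
 */
function placeholderExpiredToolResults(messages: Msg[], keepRecent = 2): Msg[] {
    const toolIndices: number[] = []
    messages.forEach((m, i) => {
        if (m.role === 'tool') toolIndices.push(i)
    })
    const expired = new Set(
        toolIndices.slice(0, Math.max(0, toolIndices.length - keepRecent))
    )
    let changed = false
    const result = messages.map((m, i) => {
        if (!expired.has(i) || m.content === PLACEHOLDER) return m
        changed = true
        return { ...m, content: PLACEHOLDER }
    })
    return changed ? result : messages
}
```

In a history with three tool results, the oldest one is replaced while the length stays the same. A `compacted.length !== messages.length` check would therefore report "no change". `compacted !== messages` correctly reports the change, and a second pass over the result returns the same reference.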

…text
- cropMessages baseline now counts the AI message itself and subsequent tool messages in the same round (usage_metadata.input_tokens only covers messages before the AI response)
- Update warning messages to show both plugin names for clarity
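The corrected baseline rule can be written as a small estimator. The last AI message's `usage_metadata.input_tokens` covers everything before that message, so the AI message itself and any messages after it still have to be counted locally. This is a sketch only. The `Msg` type, the 4-characters-per-token counter, and the function names are assumptions, not ChatLuna's `countMessagesTokens`.

```typescript
// Hypothetical, simplified message shape; usage_metadata mirrors LangChain's field names.
interface Msg {
    role: 'system' | 'human' | 'ai' | 'tool'
    content: string
    usage_metadata?: { input_tokens: number; output_tokens: number }
}

type TokenCounter = (text: string) => number
// Rough approximation (~4 characters per token); an assumption.
const approxCounter: TokenCounter = (text) => Math.ceil(text.length / 4)

/** Index of the last AI message with reported input tokens, or -1. */
function findBaselineIndex(messages: Msg[]): number {
    for (let i = messages.length - 1; i >= 0; i--) {
        const m = messages[i]
        if (m.role === 'ai' && m.usage_metadata?.input_tokens !== undefined) {
            return i
        }
    }
    return -1
}

/**
 * Estimate prompt tokens for `messages`. With a baseline at index b, use
 * input_tokens for messages[0..b-1] and count messages[b..] locally;
 * without one, count everything locally.
 */
function estimateTokens(
    messages: Msg[],
    counter: TokenCounter = approxCounter
): number {
    const countFrom = (start: number) =>
        messages
            .slice(start)
            .reduce((total, m) => total + counter(m.content), 0)
    const baselineIdx = findBaselineIndex(messages)
    if (baselineIdx < 0) return countFrom(0)
    const reported = messages[baselineIdx].usage_metadata?.input_tokens ?? 0
    return reported + countFrom(baselineIdx)
}
```

A baseline is still found when the AI message is the last entry, which is the `baselineIdx >= 0` case from the earlier commit. The estimate is only trustworthy if the prefix is identical to the one the provider counted. This is why the chat_history.ts variant, where system tokens differ between calls, was reverted.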

coderabbitai

🧹 Nitpick comments (1)

packages/core/src/middlewares/chat/read_chat_message.ts (1)

295-295: 💤 Low value

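Warning strings exceed the maximum line length.

Static analysis flags Line 295 (180 characters) and Line 740 (176 characters) as exceeding the 160-character limit. Consider splitting the long strings across multiple lines to conform to the code style.

♻️ Suggested fix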

-                    logger.warn(
-                        `Detected GIF image, which is not supported by most models. Please install chatluna-multimodal-service (multimodal-service) plugin to parse GIF animations.`
-                    )
+                    logger.warn(
+                        'Detected GIF image, which is not supported by most models. ' +
+                            'Please install chatluna-multimodal-service (multimodal-service) plugin to parse GIF animations.'
+                    )

Also applies to: 740-740

🤖 Prompt for AI Agents

Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@packages/core/src/middlewares/chat/read_chat_message.ts` at line 295, The GIF
warning string in read_chat_message.ts is over the 160-char line limit; locate
the long message literal (`Detected GIF image, which is not supported by most
models...multimodal-service) plugin to parse GIF animations.`) in the read chat
message handler and break it into shorter pieces (either concatenate shorter
string literals, use a template literal with explicit line breaks, or join an
array of strings) so no single source line exceeds 160 characters while
preserving the exact message content.


ℹ️ Review info

⚙️ Run configuration

Configuration used: Repository UI

Review profile: CHILL

Plan: Pro

Run ID: 39b10d9c-69d7-4325-97e0-19358e12b729

📥 Commits

Reviewing files that changed from the base of the PR and between 9256c50 and ba1ccd6.

📒 Files selected for processing (7)

packages/core/src/llm-core/agent/legacy-executor.ts
packages/core/src/llm-core/chat/app.ts
packages/core/src/llm-core/chat/infinite_context.ts
packages/core/src/llm-core/platform/model.ts
packages/core/src/llm-core/utils/count_tokens.ts
packages/core/src/middlewares/chat/read_chat_message.ts
packages/extension-usage/client/charts/token-line.ts

dingyi222666 added 3 commits May 23, 2026 21:42

fix: scratchpad compression threshold 50% -> 84%

384630d

[Refactor] streamline context compression

9256c50

gemini-code-assist Bot reviewed May 23, 2026


chatgpt-codex-connector Bot reviewed May 23, 2026


Comment thread packages/core/src/llm-core/chat/infinite_context.ts Outdated

Comment thread packages/core/src/llm-core/prompt/chat_history.ts Outdated

Comment thread packages/core/src/llm-core/prompt/chat_history.ts Outdated

coderabbitai Bot reviewed May 23, 2026


dingyi222666 added 3 commits May 23, 2026 22:00

[Fix] format context compression output

ff19d58

dingyi222666 changed the title ~~[Refactor] streamline context compression~~ [Refactor] improve context compression budgeting May 23, 2026

[Fix] simplify context token counting

c8b84c1

dingyi222666 linked an issue May 24, 2026 that may be closed by this pull request

[Bug] Incorrect token usage count when sending images #875

Closed

coderabbitai Bot reviewed May 24, 2026


fix(extension-usage): hide zero-value models in chart tooltips

28562b3

dingyi222666 force-pushed the feat/context-compression-refactor branch from ba1ccd6 to 28562b3 May 24, 2026 06:19

dingyi222666 merged commit b31502e into v1-dev May 24, 2026
4 of 5 checks passed

dingyi222666 deleted the feat/context-compression-refactor branch May 24, 2026 07:18

dingyi222666 mentioned this pull request May 24, 2026

chore(packages): bump context compression versions #876

Merged

coderabbitai Bot mentioned this pull request May 27, 2026

[Fix] count compacted prompt tokens directly #883

Merged
