Context window management lacks LLM-driven compression — _llm_summarize() is a placeholder #1806

@MervinPraison

Summary

When an agent's conversation grows beyond the model's context window, praisonaiagents falls back to heuristic message truncation — silently discarding the oldest turns. The context/optimizer.py module contains a _llm_summarize() method that was clearly planned to compress context via an LLM call, but the implementation is an empty placeholder that returns an empty string. Long-running sessions (research, multi-step code generation, customer support) lose tool results, prior decisions, and task intent that could be preserved through intelligent summarisation.

Current behaviour

The summarise strategy in the context optimiser is unimplemented:

# praisonaiagents/context/optimizer.py
def _llm_summarize(self, messages: List[dict]) -> str:
    # TODO: integrate LLM summarization
    return ""   # placeholder — returns empty string, never calls the LLM

Token estimation uses a crude heuristic with no model-specific tokeniser:

# praisonaiagents/compaction/compactor.py line 54
tokens = len(content) // 4   # 4 chars ≈ 1 token — inaccurate for code, CJK, tool outputs
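
To illustrate how far the four-characters-per-token estimate can drift, here is a minimal sketch of a pluggable token counter. The names `heuristic_tokens` and `make_token_counter` are hypothetical and not part of praisonaiagents; the per-character encoder is only a stand-in for a real tokeniser's encode function, which a caller would supply instead.

```python
from typing import Callable, Optional, Sequence


def heuristic_tokens(text: str) -> int:
    """Estimate in the style used today: roughly four characters per token."""
    return len(text) // 4


def make_token_counter(
    encode: Optional[Callable[[str], Sequence[int]]] = None,
) -> Callable[[str], int]:
    """Return a counter that uses `encode` when given, else the heuristic."""
    if encode is None:
        return heuristic_tokens
    return lambda text: len(encode(text))


# Stand-in encoder for demonstration only: one "token" per character.
# A model-specific tokeniser's encode function would be passed here instead.
per_char = make_token_counter(lambda text: [ord(ch) for ch in text])
fallback = make_token_counter()

sample = "上下文窗口管理"  # 7 CJK characters
print(fallback(sample), per_char(sample))  # prints "1 7"
```

Keeping the encoder injectable would let the optimiser fall back to the heuristic when no tokeniser is installed, while using exact counts when one is available.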

When context overflows, the Truncate strategy in context/optimizer.py removes the oldest non-system messages indiscriminately — with no protection for prior tool results, in-progress task state, or the framing exchange that gives the agent its purpose. There is no session lineage tracking, so a compressed conversation cannot be traced back to its origin or resumed.

OptimizerStrategy.SUMMARIZE is a defined enum value and its dispatch branch exists, but the branch routes to the empty _llm_summarize(). Selecting the summarise strategy today therefore produces the same result as Truncate, with no warning to the caller.

Desired behaviour

Long-running sessions should be managed via an LLM-driven compression pipeline:

  1. Protect head and tail: Preserve the system prompt + first framing exchange (task intent) and the most recent N tokens (active working context) without modification
  2. Compress the middle: Summarise the compressible region using an auxiliary LLM call with a structured prompt that preserves resolved tasks, in-progress tasks, key tool call/result pairs, and file/path references
  3. Session lineage: Record compression as a state event (end current logical session, open a child session with a parent pointer) so the conversation chain is traceable and resumable
  4. Fallback: If the auxiliary LLM call fails, emit a deterministic static summary (tools called, files touched, message counts) rather than silent truncation
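
Step 3 (session lineage) could be sketched roughly as follows. This assumes a simple in-memory store; `SessionRecord` and `SessionStore` are hypothetical names, and only the `parent_session_id` field name is taken from this issue. It is not a description of the existing session.py.

```python
import uuid
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class SessionRecord:
    """Hypothetical record; parent_session_id links to the pre-compression session."""
    session_id: str = field(default_factory=lambda: uuid.uuid4().hex)
    parent_session_id: Optional[str] = None
    closed: bool = False


class SessionStore:
    """Hypothetical in-memory store used only for this illustration."""

    def __init__(self) -> None:
        self._sessions: Dict[str, SessionRecord] = {}

    def open(self, parent: Optional[SessionRecord] = None) -> SessionRecord:
        record = SessionRecord(
            parent_session_id=parent.session_id if parent else None
        )
        self._sessions[record.session_id] = record
        return record

    def record_compression(self, current: SessionRecord) -> SessionRecord:
        """Close `current` and open a child session that points back at it."""
        current.closed = True
        return self.open(parent=current)

    def lineage(self, session_id: str) -> List[str]:
        """Walk parent pointers from `session_id` back to the root session."""
        chain: List[str] = []
        cursor: Optional[str] = session_id
        while cursor is not None:
            chain.append(cursor)
            cursor = self._sessions[cursor].parent_session_id
        return chain
```

With this shape, each compression event adds one link, so `lineage()` on the newest session returns the full chain back to the original conversation.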

Proposed approach

  • Layer: core SDK (praisonaiagents/context/)
  • Extension point: OptimizerStrategy.SUMMARIZE in context/optimizer.py, backed by a new ContextCompressor class
  • Minimal API sketch:
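# praisonaiagents/context/compressor.py  (new)
from dataclasses import dataclass
from typing import Any, List

@dataclass
class CompressResult:
    messages: List[dict]   # head + summary message + tail
    tokens_saved: int      # tokens in the middle minus tokens in the summary

class ContextCompressor:
    def __init__(self, llm: Any, tokenizer: Any) -> None:
        self._llm = llm
        self._tokenizer = tokenizer

    async def compress(
        self,
        messages: List[dict],
        *,
        protect_last_n_tokens: int = 20_000,
        summary_target_tokens: int = 750,
        auxiliary_model: str | None = None,
    ) -> CompressResult:
        # _protect_head, _find_tail_by_tokens, _summarize_with_llm and
        # token_count are helpers still to be written as part of this work.
        head = self._protect_head(messages)
        tail = self._find_tail_by_tokens(messages, protect_last_n_tokens)
        middle = messages[len(head) : len(messages) - len(tail)]
        summary_text = await self._summarize_with_llm(middle, summary_target_tokens, auxiliary_model)
        summary_msg = {"role": "user", "content": f"[Context summary]\n{summary_text}"}
        return CompressResult(
            messages=head + [summary_msg] + tail,
            tokens_saved=token_count(middle) - token_count([summary_msg]),
        )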
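
The head and tail helpers are not specified in the sketch above. One possible shape, written as a single free function, is shown below. It rests on several assumptions: the "first framing exchange" is taken to be the two messages that follow the leading system messages, every message has a string `content`, and `count_tokens` is supplied by the caller. The name `split_head_middle_tail` is hypothetical.

```python
from typing import Callable, Dict, List, Tuple

Message = Dict[str, str]


def split_head_middle_tail(
    messages: List[Message],
    protect_last_n_tokens: int,
    count_tokens: Callable[[str], int],
) -> Tuple[List[Message], List[Message], List[Message]]:
    # Head: leading system messages plus the next two messages
    # (assumed here to be the first user/assistant exchange).
    head_end = 0
    while head_end < len(messages) and messages[head_end]["role"] == "system":
        head_end += 1
    head_end = min(head_end + 2, len(messages))

    # Tail: walk backwards, keeping whole messages while they fit the budget.
    # The walk stops at the head boundary, so head and tail never overlap.
    tail_start = len(messages)
    used = 0
    while tail_start > head_end:
        cost = count_tokens(messages[tail_start - 1]["content"])
        if used + cost > protect_last_n_tokens:
            break
        used += cost
        tail_start -= 1

    return messages[:head_end], messages[head_end:tail_start], messages[tail_start:]
```

Stopping the tail at the head boundary means short conversations yield an empty middle rather than a negative slice. This sketch does not keep tool call/result pairs together at the tail boundary; a real implementation would likely need to.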

Resolution sketch

# Before (today) — Truncate strategy in context/optimizer.py
def optimize(self, messages: List[dict]) -> List[dict]:
    while self._token_count(messages) > self.budget:
        messages.pop(1)   # remove oldest non-system message, no preservation logic
    return messages

# After (proposed) — Summarize strategy wired to ContextCompressor
async def optimize(self, messages: List[dict]) -> List[dict]:
    if self._token_count(messages) <= self.budget:
        return messages
    compressor = ContextCompressor(llm=self._llm, tokenizer=self._tokenizer)
    result = await compressor.compress(
        messages,
        protect_last_n_tokens=20_000,
        summary_target_tokens=750,
        auxiliary_model=self.auxiliary_model,
    )
    return result.messages
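
The fallback from the desired behaviour is not shown in the sketch above. A minimal version might look like the following, assuming tool messages carry `role == "tool"` and a `name` key (an assumption about the message schema, not something confirmed in the codebase). `static_summary` and `summarize_with_fallback` are hypothetical names, and the "files touched" part is omitted because the issue does not say how file references appear in messages.

```python
import asyncio
from collections import Counter
from typing import Awaitable, Callable, List


def static_summary(messages: List[dict]) -> str:
    """Deterministic summary: message counts per role and tools called."""
    roles = Counter(m.get("role", "unknown") for m in messages)
    # Assumption: tool results are messages with role "tool" and a "name" key.
    tools = sorted(
        {m["name"] for m in messages if m.get("role") == "tool" and m.get("name")}
    )
    role_part = ", ".join(f"{role}={n}" for role, n in sorted(roles.items()))
    tool_part = ", ".join(tools) if tools else "none"
    return f"{len(messages)} messages compressed ({role_part}); tools called: {tool_part}"


async def summarize_with_fallback(
    messages: List[dict],
    summarize: Callable[[List[dict]], Awaitable[str]],
) -> str:
    """Try the LLM summariser; on error or empty output, use the static summary."""
    try:
        text = (await summarize(messages) or "").strip()
    except Exception:
        text = ""
    return text or static_summary(messages)
```

Treating an empty result the same as a failure means a summariser that returns "" would still produce a usable summary message instead of an empty one.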

Severity

High. Any agent used for multi-turn research, iterative code generation, or customer support will eventually exhaust the context window. Silent truncation deletes the task history the agent needs to complete its work, causing confusing regressions mid-session. The presence of the _llm_summarize() placeholder and the OptimizerStrategy.SUMMARIZE enum confirms this capability was planned; it simply needs a working implementation.

Validation

Confirmed by reading:

  • praisonaiagents/context/optimizer.py: _llm_summarize() returns "" with a TODO comment; OptimizerStrategy.SUMMARIZE branch calls it
  • praisonaiagents/compaction/compactor.py:54: len(content) // 4 heuristic, no tokeniser import
  • praisonaiagents/context/tokens.py: heuristic-only estimation, no tiktoken or model-native tokeniser
  • praisonaiagents/session.py: no parent_session_id or compression lineage field; sessions are isolated with no chain tracking
