Summary
When an agent's conversation grows beyond the model's context window, praisonaiagents falls back to heuristic message truncation — silently discarding the oldest turns. The context/optimizer.py module contains a _llm_summarize() method that was clearly planned to compress context via an LLM call, but the implementation is an empty placeholder that returns an empty string. Long-running sessions (research, multi-step code generation, customer support) lose tool results, prior decisions, and task intent that could be preserved through intelligent summarisation.
Current behaviour
The summarise strategy in the context optimiser is unimplemented:
```python
# praisonaiagents/context/optimizer.py
def _llm_summarize(self, messages: List[dict]) -> str:
    # TODO: integrate LLM summarization
    return ""  # placeholder: returns an empty string, never calls the LLM
```
Token estimation uses a crude heuristic with no model-specific tokeniser:
```python
# praisonaiagents/compaction/compactor.py line 54
tokens = len(content) // 4  # 4 chars ≈ 1 token; inaccurate for code, CJK and tool outputs
```
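A minimal sketch of a tokeniser hook that could replace the bare heuristic. estimate_tokens and count_message_tokens are hypothetical names, and the encode parameter is an assumption: any callable that maps text to a list of token ids. The chars/4 rule is kept only as a fallback, and the fallback itself shows the CJK problem: the three-character string "日本語" is estimated at 0 tokens.

```python
from typing import Callable, List, Optional

# Hypothetical hook type: maps text to token ids (e.g. a model tokeniser's encode method)
Encoder = Callable[[str], List[int]]


def estimate_tokens(text: str, encode: Optional[Encoder] = None) -> int:
    """Count tokens with a model tokeniser when one is supplied, else fall back to chars/4."""
    if encode is not None:
        return len(encode(text))
    # Same heuristic as compactor.py today; kept only as a last resort
    return len(text) // 4


def count_message_tokens(messages: List[dict], encode: Optional[Encoder] = None) -> int:
    """Sum token estimates over the string content of chat messages (per-message overhead ignored)."""
    total = 0
    for msg in messages:
        content = msg.get("content")
        if isinstance(content, str):
            total += estimate_tokens(content, encode)
    return total
```

With tiktoken installed, tiktoken.get_encoding("cl100k_base").encode could be passed as encode; which encoding matches a given model would need to be checked per provider.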
When context overflows, the Truncate strategy in context/optimizer.py removes the oldest non-system messages indiscriminately — with no protection for prior tool results, in-progress task state, or the framing exchange that gives the agent its purpose. There is no session lineage tracking, so a compressed conversation cannot be traced back to its origin or resumed.
OptimizerStrategy.SUMMARIZE is a defined enum value and the dispatch branch exists, but it routes to the empty _llm_summarize(), so selecting the summarise strategy today silently yields the same result as Truncate.
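The loss can be reproduced in isolation. This is not the library's code: heuristic_tokens and truncate_oldest are hypothetical names that mirror the chars/4 estimate and the pop(1) loop described in this issue, run on a made-up five-message history with a 40-token budget. Only the system prompt and the latest user turn survive; the task statement, the plan and the tool result are all dropped.

```python
from typing import List


def heuristic_tokens(messages: List[dict]) -> int:
    # Mirrors the chars/4 estimate used in compactor.py
    return sum(len(m["content"]) // 4 for m in messages)


def truncate_oldest(messages: List[dict], budget: int) -> List[dict]:
    """Drop the oldest non-system message until under budget, with no preservation logic."""
    messages = list(messages)  # copy so the caller's list is untouched
    # Stop once only the system message is left, to avoid popping from a 1-element list
    while heuristic_tokens(messages) > budget and len(messages) > 1:
        messages.pop(1)
    return messages


history = [
    {"role": "system", "content": "You are a research assistant."},
    {"role": "user", "content": "Task: compare three vector databases for our RAG stack."},
    {"role": "assistant", "content": "Plan: benchmark latency, recall and cost."},
    {"role": "tool", "content": "latency results: " + "x" * 200},
    {"role": "user", "content": "Now write the summary table."},
]
trimmed = truncate_oldest(history, budget=40)
```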
Desired behaviour
Long-running sessions should be managed via an LLM-driven compression pipeline:
- Protect head and tail: Preserve the system prompt + first framing exchange (task intent) and the most recent N tokens (active working context) without modification
- Compress the middle: Summarise the compressible region using an auxiliary LLM call with a structured prompt that preserves resolved tasks, in-progress tasks, key tool call/result pairs, and file/path references
- Session lineage: Record compression as a state event (end current logical session, open a child session with a parent pointer) so the conversation chain is traceable and resumable
- Fallback: If the auxiliary LLM call fails, emit a deterministic static summary (tools called, files touched, message counts) rather than silent truncation
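A minimal, self-contained sketch of the head/middle/tail split, assuming that "first framing exchange" means the leading system message(s), the first user turn and the assistant reply to it, and that a per-message token counter is injected. protect_head, find_tail_start and split_for_compression are hypothetical names, not existing praisonaiagents functions.

```python
from typing import Callable, List, Tuple

TokenCounter = Callable[[dict], int]


def protect_head(messages: List[dict]) -> int:
    """Return how many leading messages to keep: system prompt(s) plus the first user/assistant exchange."""
    i = 0
    while i < len(messages) and messages[i]["role"] == "system":
        i += 1
    # First framing exchange: the first user turn and, if present, the reply to it
    if i < len(messages) and messages[i]["role"] == "user":
        i += 1
        if i < len(messages) and messages[i]["role"] == "assistant":
            i += 1
    return i


def find_tail_start(messages: List[dict], start: int, budget: int, count: TokenCounter) -> int:
    """Walk backwards from the end, keeping whole messages while they fit in `budget` tokens."""
    used = 0
    j = len(messages)
    # Never move past `start`, so the tail cannot overlap the protected head
    while j > start and used + count(messages[j - 1]) <= budget:
        used += count(messages[j - 1])
        j -= 1
    return j


def split_for_compression(
    messages: List[dict], protect_last_n_tokens: int, count: TokenCounter
) -> Tuple[List[dict], List[dict], List[dict]]:
    head_end = protect_head(messages)
    tail_start = find_tail_start(messages, head_end, protect_last_n_tokens, count)
    return messages[:head_end], messages[head_end:tail_start], messages[tail_start:]
```

The tail is grown one whole message at a time, so the protected region never starts mid-message. A fuller version would probably also shift the boundary so that an assistant tool call is never separated from its tool result.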
Proposed approach
- Layer: core SDK (praisonaiagents/context/)
- Extension point: OptimizerStrategy.SUMMARIZE in context/optimizer.py, backed by a new ContextCompressor class
- Minimal API sketch:
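```python
# praisonaiagents/context/compressor.py (new)
from __future__ import annotations
from dataclasses import dataclass
from typing import Any, List

@dataclass
class CompressResult:
    messages: List[dict]
    tokens_saved: int

class ContextCompressor:
    def __init__(self, llm: Any, tokenizer: Any) -> None:
        self._llm = llm
        self._tokenizer = tokenizer

    async def compress(
        self,
        messages: List[dict],
        *,
        protect_last_n_tokens: int = 20_000,
        summary_target_tokens: int = 750,
        auxiliary_model: str | None = None,
    ) -> CompressResult:
        # _protect_head, _find_tail_by_tokens, _summarize_with_llm and token_count are still to be written
        head = self._protect_head(messages)  # system prompt + first framing exchange
        # Look for the tail only after the head, so the two regions cannot overlap
        tail = self._find_tail_by_tokens(messages[len(head):], protect_last_n_tokens)
        middle = messages[len(head) : len(messages) - len(tail)]
        if not middle:  # nothing compressible between head and tail
            return CompressResult(messages=list(messages), tokens_saved=0)
        summary_text = await self._summarize_with_llm(middle, summary_target_tokens, auxiliary_model)
        summary_msg = {"role": "user", "content": f"[Context summary]\n{summary_text}"}
        return CompressResult(
            messages=head + [summary_msg] + tail,
            tokens_saved=token_count(middle) - token_count([summary_msg]),
        )
```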
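One possible shape for the _summarize_with_llm step, combining the structured prompt and the deterministic fallback from Desired behaviour. It assumes a hypothetical async call_llm hook for the auxiliary model and that tool messages carry a "name" key, and it finds file paths with a rough regex. SUMMARY_PROMPT, static_summary and summarize_with_fallback are illustrative names only.

```python
import re
from typing import Awaitable, Callable, Dict, List

# Hypothetical hook: any async function that sends a prompt to the auxiliary model and returns its text
LLMCall = Callable[[str], Awaitable[str]]

# Structured prompt covering the sections listed under Desired behaviour
SUMMARY_PROMPT = """Summarise the conversation excerpt below in at most {target} tokens.
Use exactly these sections:
## Resolved tasks
## In-progress tasks
## Key tool calls and results
## Files and paths referenced

{transcript}"""

# Rough heuristic: anything that looks like dir/file (at least one slash)
PATH_RE = re.compile(r"(?:[\w.-]+/)+[\w.-]+")


def static_summary(messages: List[dict]) -> str:
    """Deterministic fallback: tools called, files touched, message counts per role."""
    tools: List[str] = []
    files: List[str] = []
    roles: Dict[str, int] = {}
    for m in messages:
        roles[m["role"]] = roles.get(m["role"], 0) + 1
        # Assumes tool messages carry the tool's name under "name"
        if m["role"] == "tool" and m.get("name") and m["name"] not in tools:
            tools.append(m["name"])
        for path in PATH_RE.findall(str(m.get("content") or "")):
            if path not in files:
                files.append(path)
    counts = ", ".join(f"{role}={n}" for role, n in sorted(roles.items()))
    return (
        f"[LLM summary unavailable] {len(messages)} messages compressed ({counts}). "
        f"Tools called: {', '.join(tools) or 'none'}. "
        f"Files touched: {', '.join(files) or 'none'}."
    )


async def summarize_with_fallback(messages: List[dict], target_tokens: int, call_llm: LLMCall) -> str:
    """Try the auxiliary LLM; on an exception or an empty reply, return the static summary."""
    transcript = "\n".join(f"{m['role']}: {m.get('content') or ''}" for m in messages)
    prompt = SUMMARY_PROMPT.format(target=target_tokens, transcript=transcript)
    try:
        text = await call_llm(prompt)
        if text and text.strip():
            return text
    except Exception:
        pass  # deliberate: fall through to the deterministic summary instead of truncating
    return static_summary(messages)
```

Treating an empty reply like a failure keeps the compressor from reproducing today's silent empty-string result.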
Resolution sketch
```python
# Before (today): Truncate strategy in context/optimizer.py
def optimize(self, messages: List[dict]) -> List[dict]:
    while self._token_count(messages) > self.budget:
        messages.pop(1)  # remove oldest non-system message, no preservation logic
    return messages


# After (proposed): Summarize strategy wired to ContextCompressor
async def optimize(self, messages: List[dict]) -> List[dict]:
    if self._token_count(messages) <= self.budget:
        return messages  # under budget: nothing to do
    compressor = ContextCompressor(llm=self._llm, tokenizer=self._tokenizer)
    result = await compressor.compress(
        messages,
        protect_last_n_tokens=20_000,
        summary_target_tokens=750,
        auxiliary_model=self.auxiliary_model,
    )
    return result.messages
```
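The Resolution sketch does not yet cover session lineage. A minimal in-memory sketch of the state event, assuming hypothetical SessionRecord and SessionStore types and an end_reason field of our own choosing: compression ends the current record and opens a child whose parent_session_id (the field Validation notes is missing from session.py) points back to it, so the chain can be walked to its root.

```python
import time
import uuid
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class SessionRecord:
    session_id: str
    parent_session_id: Optional[str] = None
    end_reason: Optional[str] = None  # set when the logical session is closed
    created_at: float = field(default_factory=time.time)
    messages: List[dict] = field(default_factory=list)


class SessionStore:
    """In-memory store; a real one would persist records so chains survive restarts."""

    def __init__(self) -> None:
        self._sessions: Dict[str, SessionRecord] = {}

    def open(self, messages: List[dict], parent_id: Optional[str] = None) -> SessionRecord:
        rec = SessionRecord(session_id=uuid.uuid4().hex, parent_session_id=parent_id, messages=list(messages))
        self._sessions[rec.session_id] = rec
        return rec

    def fork_on_compression(self, current_id: str, compressed: List[dict]) -> SessionRecord:
        """End the current logical session and open a child that points back to it."""
        self._sessions[current_id].end_reason = "compressed"
        return self.open(compressed, parent_id=current_id)

    def lineage(self, session_id: str) -> List[str]:
        """Session ids from the given session back to the root."""
        chain: List[str] = []
        cur: Optional[str] = session_id
        while cur is not None:
            chain.append(cur)
            cur = self._sessions[cur].parent_session_id
        return chain
```

In optimize(), the fork would happen right after compressor.compress(...) returns, with result.messages as the child's history.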
Severity
High — Any agent used for multi-turn research, iterative code generation, or customer support will eventually exhaust the context window. Silent truncation deletes the task history the agent needs to complete its work, causing confusing regressions mid-session. The presence of the _llm_summarize() placeholder and the OptimizerStrategy.SUMMARIZE enum confirm this capability was planned; it simply needs a working implementation.
Validation
Confirmed by reading:
- praisonaiagents/context/optimizer.py: _llm_summarize() returns "" with a TODO comment; the OptimizerStrategy.SUMMARIZE branch calls it
- praisonaiagents/compaction/compactor.py (line 54): len(content) // 4 heuristic, no tokeniser import
- praisonaiagents/context/tokens.py: heuristic-only estimation, no tiktoken or model-native tokeniser
- praisonaiagents/session.py: no parent_session_id or compression lineage field; sessions are isolated with no chain tracking