Context window management lacks LLM-driven compression — _llm_summarize() is a placeholder

## Summary

When an agent's conversation grows beyond the model's context window, `praisonaiagents` falls back to heuristic message truncation and silently discards the oldest turns. The `context/optimizer.py` module contains a `_llm_summarize()` method that was evidently meant to compress context via an LLM call, but it is a placeholder that returns an empty string. As a result, long-running sessions (research, multi-step code generation, customer support) lose tool results, prior decisions, and task intent that summarisation could preserve.

## Current behaviour

The summarise strategy in the context optimiser is unimplemented:

```python
# praisonaiagents/context/optimizer.py
def _llm_summarize(self, messages: List[dict]) -> str:
    # TODO: integrate LLM summarization
    return ""   # placeholder — returns empty string, never calls the LLM
```

Token estimation uses a crude heuristic with no model-specific tokeniser:

```python
# praisonaiagents/compaction/compactor.py line 54
tokens = len(content) // 4   # 4 chars ≈ 1 token — inaccurate for code, CJK, tool outputs
```
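A minimal sketch of how a pluggable counter might look, assuming an `encode` callable (for example a model tokeniser's encode function) can be injected. The names `count_tokens`, `heuristic_tokens`, and `Encoder` are illustrative and not existing `praisonaiagents` APIs. When no encoder is supplied, the sketch falls back to the current 4-characters-per-token estimate.

```python
from typing import Callable, List, Optional, Sequence

# Hypothetical type: any function that maps text to a sequence of tokens.
Encoder = Callable[[str], Sequence]


def heuristic_tokens(text: str) -> int:
    """Current estimate: roughly 4 characters per token."""
    return len(text) // 4


def count_tokens(messages: List[dict], encode: Optional[Encoder] = None) -> int:
    """Sum token counts over message contents.

    Uses `encode` when provided, otherwise the character heuristic.
    """
    total = 0
    for message in messages:
        content = message.get("content") or ""
        if not isinstance(content, str):
            content = str(content)  # tool outputs may be structured
        total += len(encode(content)) if encode else heuristic_tokens(content)
    return total


# Usage: a whitespace splitter stands in for a real tokeniser here.
messages = [{"role": "user", "content": "def add(a, b): return a + b"}]
print(count_tokens(messages))                              # heuristic path
print(count_tokens(messages, encode=lambda s: s.split()))  # injected encoder
```

Keeping the heuristic as the default would leave existing callers unchanged while letting a model-specific tokeniser be wired in where accuracy matters.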

When context overflows, the `Truncate` strategy in `context/optimizer.py` removes the oldest non-system messages indiscriminately — with no protection for prior tool results, in-progress task state, or the framing exchange that gives the agent its purpose. There is no session lineage tracking, so a compressed conversation cannot be traced back to its origin or resumed.

`OptimizerStrategy.SUMMARIZE` is a defined enum value and its dispatch branch exists, but the branch routes to the empty `_llm_summarize()`. Selecting the summarise strategy today therefore silently produces the same result as `Truncate`.

## Desired behaviour

Long-running sessions should be managed via an LLM-driven compression pipeline:

1. **Protect head and tail**: Preserve the system prompt + first framing exchange (task intent) and the most recent N tokens (active working context) without modification
2. **Compress the middle**: Summarise the compressible region using an auxiliary LLM call with a structured prompt that preserves resolved tasks, in-progress tasks, key tool call/result pairs, and file/path references
3. **Session lineage**: Record compression as a state event (end current logical session, open a child session with a parent pointer) so the conversation chain is traceable and resumable
4. **Fallback**: If the auxiliary LLM call fails, emit a deterministic static summary (tools called, files touched, message counts) rather than silent truncation
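The fallback in step 4 could be sketched as follows. This is an illustration under stated assumptions rather than the intended implementation: `static_summary`, `summarize_or_fallback`, and `PATH_RE` are hypothetical names, tool calls are assumed to use the OpenAI-style `{"function": {"name": ...}}` shape, and file references are detected with a deliberately conservative regular expression.

```python
import asyncio
import re
from collections import Counter
from typing import Awaitable, Callable, List

# Conservative pattern: at least one directory separator and a file extension.
PATH_RE = re.compile(r"(?:[\w.-]+/)+[\w.-]+\.\w+")


def static_summary(messages: List[dict]) -> str:
    """Build a deterministic summary from message metadata only (no LLM)."""
    roles = Counter(m.get("role", "unknown") for m in messages)
    tools, files = set(), set()
    for m in messages:
        # Assumption: OpenAI-style tool calls, {"function": {"name": ...}}.
        for call in m.get("tool_calls") or []:
            name = (call.get("function") or {}).get("name")
            if name:
                tools.add(name)
        files.update(PATH_RE.findall(str(m.get("content") or "")))
    role_part = ", ".join(f"{role}={n}" for role, n in sorted(roles.items()))
    return (
        f"[Static summary] {len(messages)} messages ({role_part}); "
        f"tools called: {', '.join(sorted(tools)) or 'none'}; "
        f"files referenced: {', '.join(sorted(files)) or 'none'}"
    )


async def summarize_or_fallback(
    messages: List[dict],
    summarize: Callable[[List[dict]], Awaitable[str]],
) -> str:
    """Try the auxiliary LLM; on error or empty output, use the static summary."""
    try:
        text = await summarize(messages)
    except Exception:
        text = ""
    return text.strip() or static_summary(messages)


async def failing_llm(_messages: List[dict]) -> str:
    """Stand-in for an auxiliary model call that is unavailable."""
    raise RuntimeError("auxiliary model unavailable")


history = [
    {"role": "user", "content": "Fix the bug in src/app/main.py"},
    {"role": "assistant", "content": None,
     "tool_calls": [{"function": {"name": "read_file"}}]},
    {"role": "tool", "content": "contents of src/app/main.py"},
]
print(asyncio.run(summarize_or_fallback(history, failing_llm)))
```

Treating an empty LLM response as a failure guards against the case described above, where a summariser that returns `""` quietly degrades into data loss.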

## Proposed approach

- **Layer**: core SDK (`praisonaiagents/context/`)
- **Extension point**: `OptimizerStrategy.SUMMARIZE` in `context/optimizer.py`, backed by a new `ContextCompressor` class
- **Minimal API sketch**:

```python
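# praisonaiagents/context/compressor.py  (new)
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class CompressResult:
    messages: List[dict]
    tokens_saved: int


class ContextCompressor:
    # _protect_head, _find_tail_by_tokens, _summarize_with_llm and
    # _token_count are helpers still to be written.
    def __init__(self, llm, tokenizer=None):
        self._llm, self._tokenizer = llm, tokenizer

    async def compress(
        self,
        messages: List[dict],
        *,
        protect_last_n_tokens: int = 20_000,
        summary_target_tokens: int = 750,
        auxiliary_model: Optional[str] = None,
    ) -> CompressResult:
        head = self._protect_head(messages)
        # Search for the tail only after the head so the two cannot overlap.
        tail = self._find_tail_by_tokens(messages[len(head):], protect_last_n_tokens)
        middle = messages[len(head) : len(messages) - len(tail)]
        if not middle:  # nothing compressible between head and tail
            return CompressResult(messages=messages, tokens_saved=0)
        summary_text = await self._summarize_with_llm(middle, summary_target_tokens, auxiliary_model)
        summary_msg = {"role": "user", "content": f"[Context summary]\n{summary_text}"}
        return CompressResult(
            messages=head + [summary_msg] + tail,
            tokens_saved=self._token_count(middle) - self._token_count([summary_msg]),
        )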
```
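The head and tail selection that the sketch above delegates to `_protect_head` and `_find_tail_by_tokens` might look like the following standalone function. It is a sketch under assumptions: the head is taken to be the leading system messages plus everything up to the first assistant reply, per-message cost uses the 4-characters-per-token heuristic via a hypothetical `default_count`, and tool call/result pairing is not handled.

```python
from typing import Callable, List, Tuple


def default_count(message: dict) -> int:
    """Hypothetical per-message cost using the 4-chars-per-token heuristic."""
    return len(str(message.get("content") or "")) // 4


def split_head_middle_tail(
    messages: List[dict],
    protect_last_n_tokens: int,
    count: Callable[[dict], int] = default_count,
) -> Tuple[List[dict], List[dict], List[dict]]:
    """Partition messages into a protected head, compressible middle, protected tail."""
    # Head: all leading system messages.
    head_end = 0
    while head_end < len(messages) and messages[head_end].get("role") == "system":
        head_end += 1
    # Extend the head through the first assistant reply (the framing exchange).
    for i in range(head_end, len(messages)):
        if messages[i].get("role") == "assistant":
            head_end = i + 1
            break

    # Tail: walk backwards until the token budget is spent, never entering the head.
    tail_start = len(messages)
    used = 0
    while tail_start > head_end:
        cost = count(messages[tail_start - 1])
        if used + cost > protect_last_n_tokens:
            break
        used += cost
        tail_start -= 1

    return messages[:head_end], messages[head_end:tail_start], messages[tail_start:]


conversation = [
    {"role": "system", "content": "You are a helpful agent."},
    {"role": "user", "content": "Refactor the parser."},
    {"role": "assistant", "content": "Sure, starting now."},
    {"role": "user", "content": "x" * 400},
    {"role": "assistant", "content": "y" * 400},
    {"role": "user", "content": "abcdefgh"},
    {"role": "assistant", "content": "abcdefghijkl"},
]
head, middle, tail = split_head_middle_tail(conversation, protect_last_n_tokens=5)
print(len(head), len(middle), len(tail))
```

Because the backward walk stops at the head boundary, the three slices always partition the input, and a short conversation simply yields an empty middle.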

## Resolution sketch

```python
# Before (today) — Truncate strategy in context/optimizer.py
def optimize(self, messages: List[dict]) -> List[dict]:
    while self._token_count(messages) > self.budget:
        messages.pop(1)   # remove oldest non-system message, no preservation logic
    return messages

# After (proposed) — Summarize strategy wired to ContextCompressor
async def optimize(self, messages: List[dict]) -> List[dict]:
    if self._token_count(messages) <= self.budget:
        return messages
    compressor = ContextCompressor(llm=self._llm, tokenizer=self._tokenizer)
    result = await compressor.compress(
        messages,
        protect_last_n_tokens=20_000,
        summary_target_tokens=750,
        auxiliary_model=self.auxiliary_model,
    )
    return result.messages
```
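The resolution sketch above does not cover the session lineage requirement from Desired behaviour. One possible shape is shown below, assuming an in-memory store: `SessionRecord` and `SessionLineage` are hypothetical names, and the only field taken from this issue is `parent_session_id`.

```python
import uuid
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class SessionRecord:
    session_id: str
    parent_session_id: Optional[str] = None
    closed_reason: Optional[str] = None  # set when the session is ended
    messages: List[dict] = field(default_factory=list)


class SessionLineage:
    """In-memory store linking each compressed session to its parent."""

    def __init__(self) -> None:
        self._sessions: Dict[str, SessionRecord] = {}

    def open(self, messages: List[dict],
             parent_session_id: Optional[str] = None) -> SessionRecord:
        record = SessionRecord(
            session_id=uuid.uuid4().hex,
            parent_session_id=parent_session_id,
            messages=list(messages),
        )
        self._sessions[record.session_id] = record
        return record

    def record_compression(self, session_id: str,
                           compressed_messages: List[dict]) -> SessionRecord:
        """Close the current session and open a child holding the compressed context."""
        self._sessions[session_id].closed_reason = "compressed"
        return self.open(compressed_messages, parent_session_id=session_id)

    def chain(self, session_id: str) -> List[str]:
        """Return session ids from the given session back to the root."""
        ids: List[str] = []
        current: Optional[str] = session_id
        while current is not None:
            ids.append(current)
            current = self._sessions[current].parent_session_id
        return ids


lineage = SessionLineage()
root = lineage.open([{"role": "user", "content": "original request"}])
child = lineage.record_compression(
    root.session_id,
    [{"role": "user", "content": "[Context summary]\n..."}],
)
print(lineage.chain(child.session_id) == [child.session_id, root.session_id])
```

Keeping the parent's original messages intact is what would make a compressed conversation traceable and resumable from any point in the chain.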

## Severity

**High** — Any agent used for multi-turn research, iterative code generation, or customer support will eventually exhaust the context window. Silent truncation deletes the task history the agent needs to complete its work, causing confusing regressions mid-session. The presence of the `_llm_summarize()` placeholder and the `OptimizerStrategy.SUMMARIZE` enum confirm this capability was planned; it simply needs a working implementation.

## Validation

Confirmed by reading:
- `praisonaiagents/context/optimizer.py` — `_llm_summarize()` returns `""` with a TODO comment; `OptimizerStrategy.SUMMARIZE` branch calls it
- `praisonaiagents/compaction/compactor.py:54` — `len(content) // 4` heuristic, no tokeniser import
- `praisonaiagents/context/tokens.py` — heuristic-only estimation, no tiktoken or model-native tokeniser
- `praisonaiagents/session.py` — no `parent_session_id` or compression lineage field; sessions are isolated with no chain tracking

