🚀 Feature Request
📋 Summary
Implementation of a Hierarchical Context Manager to enable infinite agent execution by managing LLM context windows through automatic summarization and memory swapping.
🎯 Problem Statement
Is your feature request related to a problem? Please describe:
- What problem does this solve? It solves the `ContextLengthExceeded` error in long-running autonomous agents. Currently, as the agent's history grows, the token count inevitably exceeds the model's limit, causing crashes or forcing the truncation of critical early context.
- What use case does this enable? It enables "Background Agents" or "Daemon Agents" that can run indefinitely (hours or days) performing tasks, monitoring systems, or coding, without losing the "train of thought" or the semantic meaning of earlier interactions.
💡 Proposed Solution
Describe the solution you'd like:
- How should this feature work? It should function like an operating system's virtual memory (swapping). When the context window reaches a defined threshold (e.g., 75%), the system should identify the oldest messages, generate a semantic summary using the LLM, move that summary to a `long_term_memory` buffer, and clear the immediate context (`short_term_memory`). This keeps the active token count low while preserving information density.
- What API or interface would you expect? A `HierarchicalContextManager` class that can be injected into the agent's core loop, exposing methods for `addMessage` and `getContext`.
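The swap mechanism described above can be sketched as follows. This is a minimal illustration, not the proposed implementation: the class and method names mirror the proposal, but the injected `countTokens` and `summarize` callbacks are assumptions standing in for the real AIOS tokenizer and LLM service.

```typescript
type Message = { role: string; content: string };

// Minimal sketch of the proposed manager. The tokenizer and summarizer
// are hypothetical injected callbacks, not a real AIOS API.
class HierarchicalContextManager {
  private shortTerm: Message[] = [];
  private longTerm: string[] = [];

  constructor(
    private opts: {
      maxTokens: number;
      summarizationThreshold: number; // e.g. 0.75
      countTokens: (text: string) => number;
      summarize: (messages: Message[]) => Promise<string>;
    }
  ) {}

  async addMessage(msg: Message): Promise<void> {
    this.shortTerm.push(msg);
    const used = this.shortTerm.reduce(
      (sum, m) => sum + this.opts.countTokens(m.content),
      0
    );
    // Swap when the threshold is crossed: summarize the oldest half
    // and move the summary to long-term memory.
    if (used >= this.opts.maxTokens * this.opts.summarizationThreshold) {
      const half = Math.ceil(this.shortTerm.length / 2);
      const oldest = this.shortTerm.splice(0, half);
      this.longTerm.push(await this.opts.summarize(oldest));
    }
  }

  getContext(): Message[] {
    // Long-term summaries are re-injected as a single system message,
    // so the "train of thought" survives in compressed form.
    const summary = this.longTerm.join('\n');
    return summary
      ? [{ role: 'system', content: `Memory summary:\n${summary}` }, ...this.shortTerm]
      : [...this.shortTerm];
  }
}
```

The key design point is that `getContext()` always returns a bounded message array: raw recent messages plus one compressed summary of everything older.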
📦 Package Scope
Which AIOS-FullStack package should this feature belong to?
📋 Code Example
What would the API look like? Provide a code example:
```javascript
import { HierarchicalContextManager } from '@synkra/aios-core/memory';

// 1. Dependency injection (wiring in the AIOS tooling)
const contextManager = new HierarchicalContextManager({
  maxTokens: 8192,
  summarizationThreshold: 0.75,
  tokenizer: aios.tokenizer,   // Uses the project's native tokenizer
  summarizer: aios.llmService  // Uses the project's LLM for summarization
});

// 2. Monitoring (optional - demonstrates the event API)
contextManager.on('swap:complete', (data) => {
  console.log(`Memory compacted: ${data.messagesRemoved} messages summarized.`);
});

// 3. Inside the agent loop
await contextManager.addMessage({ role: 'user', content: input });

// 4. Sending a safe context to the LLM
const safeContext = contextManager.getContext();
await llm.chat(safeContext);
```
🔄 Alternatives Considered
Describe alternatives you've considered:
- Sliding Window: simply dropping the oldest messages. This loses the "why" behind the agent's current state.
- Vector DB only: storing everything in a vector database. This is useful for retrieval but doesn't solve the immediate context window limit for the LLM's reasoning process.
- Hierarchical summarization is the hybrid approach: keeping a running summary (semantic compression) allows the agent to "remember" everything relevant in fewer tokens.
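For contrast, the sliding-window alternative is trivial to implement, which shows exactly what it sacrifices: the earliest messages are dropped outright rather than compressed. A minimal sketch (illustrative only):

```typescript
type Msg = { role: string; content: string };

// Sliding window: keep only the most recent N messages.
// The "why" (the earliest messages) is discarded, not summarized.
function slidingWindow(history: Msg[], maxMessages: number): Msg[] {
  return history.slice(-maxMessages);
}
```

With `maxMessages = 2`, an agent that received its original goal three messages ago has already forgotten it, which is precisely the failure mode the hierarchical approach avoids.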
🎨 Implementation Ideas
If you have ideas on how this could be implemented:
- Architecture: A standalone class that wraps the message array.
- Dependencies: A tokenizer library (such as `js-tiktoken`) is required to count tokens accurately on the client side before sending requests.
- Challenges: The main challenge is the summarization latency. This can be mitigated by running summarization asynchronously (predictive swapping) before the limit is hit.
- Performance: I have prepared a draft implementation attached below that uses Incremental Token Counting (O(1) complexity) to avoid re-calculating the entire history on every new message.
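The incremental token counting idea mentioned above can be sketched as a running total that is updated on each insert or removal, rather than re-tokenizing the whole history on every message. The word-based counter in the test is a stand-in assumption; a real implementation would inject a tokenizer such as `js-tiktoken`.

```typescript
// Incremental token counting: O(1) work per message instead of
// O(n) re-counting of the entire history on every addition.
class TokenBudget {
  private total = 0;

  constructor(private countTokens: (text: string) => number) {}

  // Called once when a message enters the context.
  add(content: string): number {
    this.total += this.countTokens(content);
    return this.total;
  }

  // Called once when a message is swapped out to long-term memory.
  remove(content: string): number {
    this.total -= this.countTokens(content);
    return this.total;
  }

  get used(): number {
    return this.total;
  }
}
```

Each message's content is tokenized exactly once, on entry and on eviction, so the threshold check in the hot loop is a single integer comparison.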
📊 Impact Assessment
How would this feature impact AIOS-FullStack?
🔧 Technical Requirements
- Performance: Adds minimal overhead to the main loop. Summarization is an extra LLM call but saves costs by reducing the size of every subsequent call.
- Security: No specific security implications, handles data already exposed to the agent.
- Dependencies: `js-tiktoken` or similar for precise token counting.
- Testing: Unit tests for token-counting accuracy and integration tests verifying that the context never exceeds `maxTokens`.
📖 Documentation
What documentation would be needed?
🌟 Priority
How important is this feature to you?
👥 Community Interest
🔗 Related Issues
✅ Checklist
♠ Complete File: Hierarchical Context Manager.js