[FEATURE] Hierarchical Context Management for Infinite Agent Execution #447

@LorenzaVolponi

Description

🚀 Feature Request

📋 Summary

Implementation of a Hierarchical Context Manager to enable infinite agent execution by managing LLM context windows through automatic summarization and memory swapping.

🎯 Problem Statement

Is your feature request related to a problem? Please describe:

  • What problem does this solve? It solves the ContextLengthExceeded error in long-running autonomous agents. Currently, as the agent's history grows, the token count inevitably exceeds the model's limit, causing crashes or forcing the truncation of critical early context.
  • What use case does this enable? It enables "Background Agents" or "Daemon Agents" that can run indefinitely (hours or days) performing tasks, monitoring systems, or coding, without losing the "train of thought" or the semantic meaning of earlier interactions.

💡 Proposed Solution

Describe the solution you'd like:

  • How should this feature work? It should function like an operating system's virtual memory (swapping). When the context window reaches a defined threshold (e.g., 75%), the system should identify the oldest messages, generate a semantic summary using the LLM, move that summary to a long_term_memory buffer, and clear them from the immediate context (short_term_memory). This keeps the active token count low while preserving information density.
  • What API or interface would you expect? A HierarchicalContextManager class that can be injected into the agent's core loop, exposing methods for addMessage and getContext.
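The mechanism described above can be sketched roughly as follows. This is a minimal illustration, not the attached draft: the summarize callback and the characters-per-token estimate are placeholder assumptions standing in for the real AIOS tokenizer and LLM service.

```javascript
// Sketch of the proposed swap mechanism. summarize() and estimateTokens()
// are placeholder assumptions, not the real AIOS services.
class HierarchicalContextManager {
  constructor({ maxTokens, summarizationThreshold, summarize }) {
    this.maxTokens = maxTokens;
    this.threshold = summarizationThreshold;
    this.summarize = summarize;   // async (messages) => summary string
    this.longTermMemory = [];     // compressed summaries
    this.shortTermMemory = [];    // active messages
    this.tokenCount = 0;
  }

  estimateTokens(text) {
    return Math.ceil(text.length / 4);  // rough heuristic stand-in
  }

  async addMessage(message) {
    this.shortTermMemory.push(message);
    this.tokenCount += this.estimateTokens(message.content);
    if (this.tokenCount >= this.maxTokens * this.threshold) {
      await this.swap();
    }
  }

  async swap() {
    // Summarize the older half of the active context and move it out.
    const half = Math.ceil(this.shortTermMemory.length / 2);
    const oldest = this.shortTermMemory.splice(0, half);
    const summary = await this.summarize(oldest);
    this.longTermMemory.push({ role: 'system', content: summary });
    this.tokenCount = this.shortTermMemory
      .reduce((sum, m) => sum + this.estimateTokens(m.content), 0);
  }

  getContext() {
    // Summaries first, then the live conversation tail.
    return [...this.longTermMemory, ...this.shortTermMemory];
  }
}
```

getContext() always returns summaries plus the recent tail, so the payload sent to the LLM stays bounded regardless of how long the agent runs.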

📦 Package Scope

Which AIOS-FullStack package should this feature belong to?

  • @synkra/aios-core/workspace (overall framework)
  • @synkra/aios-core/core (meta-agent, task management)
  • @synkra/aios-core/memory (vector storage, semantic search)
  • @synkra/aios-core/security (sanitization, vulnerability scanning)
  • @synkra/aios-core/performance (monitoring, profiling, optimization)
  • @synkra/aios-core/telemetry (analytics, error reporting, metrics)
  • New package: ________________

📋 Code Example

What would the API look like? Provide a code example:

import { HierarchicalContextManager } from '@synkra/aios-core/memory';

// 1. Dependency injection (wiring in the AIOS tooling)
const contextManager = new HierarchicalContextManager({
  maxTokens: 8192,
  summarizationThreshold: 0.75,
  tokenizer: aios.tokenizer,   // Uses the project's native tokenizer
  summarizer: aios.llmService  // Uses the project's LLM for summarization
});

// 2. Monitoring (optional; demonstrates the event API)
contextManager.on('swap:complete', (data) => {
  console.log(`Memory compacted: ${data.messagesRemoved} messages summarized.`);
});

// 3. Inside the agent loop
await contextManager.addMessage({ role: 'user', content: input });

// 4. Sending a safe, bounded context to the LLM
const safeContext = contextManager.getContext();
await llm.chat(safeContext);

🔄 Alternatives Considered

Describe alternatives you've considered:

  • Sliding Window: simply dropping the oldest messages. This is bad because it loses the "why" behind the agent's current state.
  • Vector DB only: storing everything in a vector database. This is useful for retrieval but doesn't solve the immediate context window limit for the LLM reasoning process.
  • Hierarchical summarization (the proposed hybrid): keeping a running summary (semantic compression) lets the agent "remember" everything relevant in far fewer tokens, combining the bounded context of a sliding window with the retention of a memory store.
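For contrast, the sliding-window alternative is trivial to implement, which is exactly why its failure mode matters: everything it evicts is lost outright, including early decisions the agent may still depend on (hypothetical sketch).

```javascript
// Naive sliding window: once the budget is exceeded, whole messages are
// dropped with no summary left behind.
function slideWindow(messages, maxMessages) {
  return messages.length > maxMessages
    ? messages.slice(messages.length - maxMessages)
    : messages;
}
```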

🎨 Implementation Ideas

If you have ideas on how this could be implemented:

  • Architecture: A standalone class that wraps the message array.
  • Dependencies: A tokenizer library (like js-tiktoken) is required to accurately count tokens locally before sending requests.
  • Challenges: The main challenge is the summarization latency. This can be mitigated by running summarization asynchronously (predictive swapping) before the limit is hit.
  • Performance: I have prepared a draft implementation attached below that uses incremental token counting (O(1) per message) to avoid re-tokenizing the entire history on every new message.
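The incremental counting idea from the last bullet can be sketched as follows. The countTokens callback is a stand-in assumption; the draft would use a real tokenizer such as js-tiktoken's enc.encode(text).length.

```javascript
// Incremental token counter: O(1) update per message instead of
// re-encoding the whole history. countTokens is a stand-in for a real
// tokenizer (e.g. js-tiktoken).
class IncrementalTokenCounter {
  constructor(countTokens) {
    this.countTokens = countTokens;
    this.perMessage = [];  // cached token count of each message, in order
    this.total = 0;
  }

  add(message) {
    const n = this.countTokens(message.content);
    this.perMessage.push(n);
    this.total += n;       // constant-time update, no full re-count
    return this.total;
  }

  removeOldest(count) {
    // Subtract the cached counts of evicted messages; no re-tokenization.
    const removed = this.perMessage.splice(0, count);
    this.total -= removed.reduce((a, b) => a + b, 0);
    return this.total;
  }
}
```

Caching per-message counts is what makes eviction cheap too: a swap only needs to subtract the cached values, never re-encode the surviving messages.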

📊 Impact Assessment

How would this feature impact AIOS-FullStack?

  • Breaking change - requires major version bump
  • New functionality - backward compatible
  • Enhancement to existing feature
  • Performance improvement
  • Developer experience improvement

🔧 Technical Requirements

  • Performance: Adds minimal overhead to the main loop. Summarization is an extra LLM call but saves costs by reducing the size of every subsequent call.
  • Security: No new security implications; the manager only handles data already exposed to the agent.
  • Dependencies: js-tiktoken or similar for precise token counting.
  • Testing: Unit tests for token counting accuracy and integration tests verifying that the context never exceeds maxTokens.
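The integration-test invariant from the last bullet could be expressed as a simple property check along these lines (the helper name and tokenizer callback are illustrative assumptions):

```javascript
// Property check for the integration tests: the context handed to the LLM
// must never exceed the configured budget. countTokens is a stand-in for
// the real tokenizer.
function contextWithinBudget(context, countTokens, maxTokens) {
  const total = context.reduce((sum, m) => sum + countTokens(m.content), 0);
  return total <= maxTokens;
}
```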

📖 Documentation

What documentation would be needed?

  • API documentation
  • Usage examples
  • Tutorial/guide
  • Migration guide (if breaking change)

🌟 Priority

How important is this feature to you?

  • Critical - Blocking my use case
  • High - Would significantly improve my workflow
  • Medium - Nice to have
  • Low - Minor improvement

👥 Community Interest

  • I would be willing to contribute to implementing this
  • I can help with documentation
  • I can help with testing
  • I would be interested in reviewing the implementation

🔗 Related Issues

  • N/A (Initial proposal)

✅ Checklist

  • I have searched existing issues and feature requests
  • I have provided a clear use case for this feature
  • I have considered the impact on existing functionality
  • I have provided enough detail for evaluation

Attached: Hierarchical Context Manager.js (complete draft implementation)

Metadata


    Labels

      • area: core (Core framework, .aios-core/core/)
      • priority: P3 (Medium, affects some users)
      • status: confirmed (Issue confirmed, ready for work)
      • type: feature (New feature request)
