## Summary
The chat agent currently uses a tool-based approach to memory: the LLM must explicitly call the `add_memory` and `search_memory` tools. This gives users control (the LLM asks permission before storing), but misses the automatic fact extraction that `memory-proxy` provides.
## Current Behavior
- Memory tools are available to the LLM
- LLM decides when to store (and asks user permission per system prompt)
- No automatic memory retrieval injected into context
- User doesn't see when memories are stored/retrieved
## Proposed Enhancement
Add an `--auto-memory` flag (or similar) that enables the full `memory-proxy` pipeline:
- Auto-retrieve: Inject relevant memories into LLM context before each turn
- Auto-extract: Extract and store facts from each conversation turn
- Visual feedback: Show user what memories were retrieved/stored
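The three bullets above can be sketched as one conversation turn. Everything here is illustrative: the `MemoryStore` class, its methods, and `auto_memory_turn` are hypothetical stand-ins, not the actual `agent_cli` API.

```python
# Hypothetical sketch of a single --auto-memory turn; names are illustrative.
from dataclasses import dataclass, field


@dataclass
class MemoryStore:
    """Toy in-memory store standing in for the real memory backend."""
    facts: list[str] = field(default_factory=list)

    def search(self, query: str) -> list[str]:
        # Naive keyword match; the real pipeline would use embeddings.
        words = query.lower().split()
        return [f for f in self.facts if any(w in f.lower() for w in words)]

    def add(self, fact: str) -> None:
        self.facts.append(fact)


def auto_memory_turn(store: MemoryStore, user_msg: str, llm) -> str:
    # 1. Auto-retrieve: pull relevant memories and inject them into context.
    retrieved = store.search(user_msg)
    context = "\n".join(f"[memory] {m}" for m in retrieved)
    print(f"Retrieved {len(retrieved)} memories")  # visual feedback

    # 2. Respond with retrieved memories prepended to the prompt.
    reply = llm(f"{context}\n\nUser: {user_msg}")

    # 3. Auto-extract: store facts from this turn (crudely stubbed here).
    store.add(f"user said: {user_msg}")
    print("Stored 1 new memory")  # visual feedback
    return reply
```

The `print` calls mark where real visual feedback (e.g. a status line in the chat UI) would surface what was retrieved and stored.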
## Possible Modes

| Mode | Retrieval | Storage | Use Case |
| --- | --- | --- | --- |
| `--no-memory` | None | None | Privacy, testing |
| `--memory` (current) | Tool-based | Tool-based | User control |
| `--auto-memory` | Automatic | Automatic | Seamless experience |
| `--auto-memory-retrieve` | Automatic | Tool-based | Hybrid (best of both?) |
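One way to keep the four modes from sprawling into ad-hoc conditionals is to resolve each flag to a (retrieval, storage) pair up front. The `Channel` enum and `resolve_mode` helper below are a hypothetical sketch mirroring the table, not the real CLI code.

```python
# Illustrative mapping from memory flags to (retrieval, storage) behavior.
from enum import Enum


class Channel(Enum):
    NONE = "none"
    TOOL = "tool-based"
    AUTO = "automatic"


# Flag names come from the table above; the dict itself is an assumption.
MODES: dict[str, tuple[Channel, Channel]] = {
    "--no-memory": (Channel.NONE, Channel.NONE),
    "--memory": (Channel.TOOL, Channel.TOOL),
    "--auto-memory": (Channel.AUTO, Channel.AUTO),
    "--auto-memory-retrieve": (Channel.AUTO, Channel.TOOL),
}


def resolve_mode(flag: str) -> tuple[Channel, Channel]:
    """Return the (retrieval, storage) channels for a memory flag."""
    return MODES[flag]
```

Downstream code then only branches on the two channels, so adding a future mode (say, tool-based retrieval with automatic storage) is one dict entry.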
## Implementation Notes
- Could leverage `memory_client.chat()`, which already implements the full pipeline
- Or call `augment_chat_request()` + `extract_and_store_facts_and_summaries()` directly
- Need to consider: should auto-mode still respect the "ask permission" guideline?
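A sketch of the second option, wrapping the LLM call with the two pipeline halves. The real signatures of `augment_chat_request()` and `extract_and_store_facts_and_summaries()` are not specified in this issue, so simplified stand-ins are defined here; only the function names come from the notes above.

```python
# Hedged sketch: calling the two pipeline halves directly around the LLM call.
# Both helpers below are simplified stand-ins with assumed signatures.

def augment_chat_request(prompt: str, memories: list[str]) -> str:
    """Stand-in: prepend retrieved memories to the outgoing prompt."""
    context = "\n".join(f"[memory] {m}" for m in memories)
    return f"{context}\n\n{prompt}"


def extract_and_store_facts_and_summaries(prompt: str, reply: str, store: list[str]) -> None:
    """Stand-in: record a one-line summary of the turn."""
    store.append(f"turn: {prompt[:40]} -> {reply[:40]}")


def chat_turn(prompt: str, llm, store: list[str], ask_permission: bool = False) -> str:
    augmented = augment_chat_request(prompt, store)
    reply = llm(augmented)
    # Open question from the notes: should auto mode still ask permission
    # before storing? Here it is exposed as a flag so both policies fit.
    if not ask_permission:
        extract_and_store_facts_and_summaries(prompt, reply, store)
    return reply
```

The `ask_permission` flag shows one way to reconcile auto-extraction with the existing "ask permission" guideline: gate only the storage half, leaving retrieval automatic.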
## Related
- `agent_cli/memory/engine.py`: `process_chat_request()` has the full pipeline