An experimental Next.js application demonstrating autonomous context window management for AI agents. Inspired by the blog post "Teaching AI Agents to Forget to Stop Forgetting".
Traditional AI agents are limited by fixed context windows. As conversations grow, older messages get dropped arbitrarily. This project explores a smarter approach: let the AI decide what to forget.
The agent:
- Sees its context budget: token usage is visible in the system prompt
- Tags messages with metadata: each message carries `[msg:XXX][tokens:N][tally:N]`
- Suggests what to prune: when appropriate, it outputs `<prune_suggestions>` with confidence scores
- Preserves context: it generates a summary of pruned content as a breadcrumb
- Filters server-side: approved prunes are filtered out on subsequent requests
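The tagging scheme above can be sketched as follows (a minimal illustration with hypothetical helper names; the project's real implementation lives in `lib/middleware/metadata-injector.ts` and counts tokens with tiktoken):

```typescript
// Sketch of metadata injection: each message gets a zero-padded ID, a token
// estimate, and a running tally so the model can see its own budget.
type TaggedMessage = { id: string; content: string };

// Crude estimate (~4 characters per token); the real project uses tiktoken.
const estimateTokens = (text: string): number => Math.ceil(text.length / 4);

function injectMetadata(messages: string[]): TaggedMessage[] {
  let tally = 0;
  return messages.map((content, i) => {
    const tokens = estimateTokens(content);
    tally += tokens;
    const id = `msg:${String(i + 1).padStart(3, "0")}`;
    // Prefix the message with [msg:XXX][tokens:N][tally:N] so the model
    // can cite stable IDs in its prune suggestions.
    return { id, content: `[${id}][tokens:${tokens}][tally:${tally}] ${content}` };
  });
}
```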
```
┌──────────────────────────────────────────────────────────────────┐
│                              Client                              │
│  ┌─────────────┐  ┌─────────────┐  ┌─────────────────────────┐   │
│  │   useChat   │  │ DebugPanel  │  │ Context Budget Display  │   │
│  └──────┬──────┘  └─────────────┘  └─────────────────────────┘   │
└─────────┼────────────────────────────────────────────────────────┘
          │
          ▼
┌──────────────────────────────────────────────────────────────────┐
│                      API Route (/api/chat)                       │
│                                                                  │
│  1. Inject metadata [msg:XXX][tokens:N][tally:N]                 │
│  2. Filter previously pruned messages                            │
│  3. Inject context summary breadcrumb                            │
│  4. Send to model                                                │
│  5. Parse <prune_suggestions> from response                      │
│  6. Store context summary + pruned IDs for next request          │
└──────────────────────────────────────────────────────────────────┘
```
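Steps 2 and 3 of that pipeline amount to a filter plus a prepend. A hedged sketch with hypothetical types and names (the real logic lives in `lib/middleware/prune-executor.ts`):

```typescript
// Sketch of request-side context preparation: drop messages the model agreed
// to forget, then inject the stored summary as a breadcrumb.
type Msg = { id: string; role: "user" | "assistant" | "system"; content: string };

function prepareContext(messages: Msg[], prunedIds: Set<string>, summary: string | null): Msg[] {
  // Step 2: filter previously pruned messages.
  const kept = messages.filter((m) => !prunedIds.has(m.id));
  // Step 3: prepend the [Context Summary] breadcrumb so pruned content
  // is still represented in compressed form.
  if (summary) {
    kept.unshift({ id: "summary", role: "system", content: `[Context Summary] ${summary}` });
  }
  return kept;
}
```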
```
app/src/
├── app/
│   ├── api/
│   │   ├── chat/route.ts         # Main chat API with pruning logic
│   │   └── usage/route.ts        # Token usage endpoint
│   └── page.tsx                  # Chat UI
├── components/
│   ├── context-budget.tsx        # Token budget progress bar
│   ├── debug-panel.tsx           # Event log viewer
│   ├── prune-archive.tsx         # Archived pruned messages
│   └── prune-settings.tsx        # Pruning configuration UI
├── lib/
│   ├── middleware/
│   │   ├── metadata-injector.ts  # Adds [msg:XXX] tags to messages
│   │   ├── prune-parser.ts       # Extracts <prune_suggestions> XML
│   │   ├── prune-executor.ts     # Filters pruned messages
│   │   └── token-counter.ts      # Tiktoken-based token counting
│   ├── prompts/
│   │   └── system.ts             # System prompt with pruning instructions
│   ├── usage-store.ts            # Global state for usage + prune tracking
│   └── config.ts                 # Configuration constants
└── hooks/
    └── use-prune-manager.ts      # Prune state management hook
```
- Node.js 18+
- pnpm (or npm/yarn)
- OpenAI API key
```bash
cd app
npm install
```

Create a `.env` file:

```
OPENAI_API_KEY=your_openai_api_key_here
```

Then start the dev server:

```bash
npm run dev
```

Try a conversation that builds and then closes a topic:

- Build context: "Get me the weather for New York, LA, Chicago, Miami, and Seattle"
- Synthesize: "Which city has the best weather? Give me a comparison."
- Close topic: "Perfect, I've noted that. The weather research is complete."
- Pivot: "Now, what's the square root of 144?"
You should see:
- ✂️ `PRUNE_SUGGESTION` events in the debug panel
- 🗑️ `PRUNE_EXECUTED` events for approved suggestions
- A `[Context Summary]` breadcrumb injected on subsequent requests
| Technology | Purpose |
|---|---|
| Next.js 16 | App framework |
| AI SDK v6 | LLM streaming + tool calling |
| OpenAI GPT-4o-mini | Language model |
| Tiktoken | Token counting |
| Tailwind CSS 4 | Styling |
| Radix UI | UI primitives |
| Zustand | State management |
- Metadata Injection: Messages tagged with `[msg:XXX][tokens:N][tally:N]`
- Context Budget Display: Visual progress bar showing token usage
- Prune Suggestions: Model outputs XML with message IDs and confidence scores
- Context Summarization: Full conversation summary preserved as breadcrumb
- Debug Panel: Real-time event logging for tool calls, messages, and pruning
- Server-Side Filtering: Pruned messages never sent to model on subsequent requests
The model receives instructions to:
- Monitor context budget
- Suggest pruning when topics close or data is synthesized
- Include a `<context_summary>` of the entire conversation
- Format suggestions as XML with confidence scores
```xml
<prune_suggestions>
  <context_summary>User researched weather for 5 cities. Best: LA at 85°F. Then searched Austin restaurants.</context_summary>
  <suggestion id="msg:002" confidence="0.9" tokens="114" reason="Weather data synthesized" />
  <suggestion id="msg:004" confidence="0.85" tokens="286" reason="Comparison no longer needed" />
</prune_suggestions>
```

- Parse suggestions from model response
- Filter by confidence threshold (default: 0.8)
- Store approved IDs + context summary in global state
- On next request, skip pruned messages and inject summary
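The parse-and-threshold steps can be sketched with a regex over the model's response (hypothetical helper names; the real parser in `lib/middleware/prune-parser.ts` may differ):

```typescript
// Sketch of suggestion parsing: extract the <prune_suggestions> block, pull
// out each self-closing <suggestion /> tag, then keep only IDs whose
// confidence clears the threshold (default 0.8, matching the project config).
type Suggestion = { id: string; confidence: number };

function parseSuggestions(response: string): Suggestion[] {
  const block = response.match(/<prune_suggestions>([\s\S]*?)<\/prune_suggestions>/);
  if (!block) return [];
  const out: Suggestion[] = [];
  // Assumes attributes appear in id-then-confidence order, as in the example above.
  const re = /<suggestion id="([^"]+)" confidence="([^"]+)"[^>]*\/>/g;
  let m: RegExpExecArray | null;
  while ((m = re.exec(block[1])) !== null) {
    out.push({ id: m[1], confidence: parseFloat(m[2]) });
  }
  return out;
}

const approve = (suggestions: Suggestion[], threshold = 0.8): string[] =>
  suggestions.filter((s) => s.confidence >= threshold).map((s) => s.id);
```

A production parser would want a real XML parser rather than a regex, but the regex keeps the sketch dependency-free.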
- Persist prune state to database (currently in-memory)
- User approval UI for prune suggestions
- Configurable confidence thresholds per message type
- Migrate to the `ToolLoopAgent` pattern for cleaner SDK integration
- A/B test prune accuracy vs. manual truncation
- Teaching AI Agents to Forget to Stop Forgetting - Inspiration blog post
- AI SDK Documentation - Vercel AI SDK
- Context Caching Strategies - OpenAI docs
MIT