An experimental Next.js application demonstrating autonomous context window management for AI agents. Inspired by the blog post "Teaching AI Agents to Forget to Stop Forgetting".
Traditional AI agents are limited by fixed context windows. As conversations grow, older messages get dropped arbitrarily. This project explores a smarter approach: let the AI decide what to forget.
The agent:
- Sees its context budget: token usage is visible in the system prompt
- Tags messages with metadata: each message carries `[msg:XXX][tokens:N][tally:N]`
- Suggests what to prune: when appropriate, it outputs `<prune_suggestions>` with confidence scores
- Preserves context: it generates a summary of pruned content as a breadcrumb
- Filters server-side: approved prunes are filtered out on subsequent requests
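The tagging scheme above can be sketched as follows (a minimal illustration with hypothetical helper names; the project's real implementation lives in `lib/middleware/metadata-injector.ts` and counts tokens with tiktoken):

```typescript
// Sketch of metadata injection: each message gets a zero-padded ID, a token
// estimate, and a running tally so the model can see its own budget.
type TaggedMessage = { id: string; content: string };

// Crude estimate (~4 characters per token); the real project uses tiktoken.
const estimateTokens = (text: string): number => Math.ceil(text.length / 4);

function injectMetadata(messages: string[]): TaggedMessage[] {
  let tally = 0;
  return messages.map((content, i) => {
    const tokens = estimateTokens(content);
    tally += tokens;
    const id = `msg:${String(i + 1).padStart(3, "0")}`;
    // Prefix the message with [msg:XXX][tokens:N][tally:N] so the model
    // can cite stable IDs in its prune suggestions.
    return { id, content: `[${id}][tokens:${tokens}][tally:${tally}] ${content}` };
  });
}
```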
```
┌──────────────────────────────────────────────────────────────────┐
│                              Client                              │
│  ┌─────────────┐  ┌─────────────┐  ┌─────────────────────────┐   │
│  │   useChat   │  │ DebugPanel  │  │ Context Budget Display  │   │
│  └──────┬──────┘  └─────────────┘  └─────────────────────────┘   │
└─────────┼────────────────────────────────────────────────────────┘
          │
          ▼
┌──────────────────────────────────────────────────────────────────┐
│                      API Route (/api/chat)                       │
│                                                                  │
│  1. Inject metadata [msg:XXX][tokens:N][tally:N]                 │
│  2. Filter previously pruned messages                            │
│  3. Inject context summary breadcrumb                            │
│  4. Send to model                                                │
│  5. Parse <prune_suggestions> from response                      │
│  6. Store context summary + pruned IDs for next request          │
└──────────────────────────────────────────────────────────────────┘
```
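Steps 2 and 3 of that pipeline amount to a filter plus a prepend. A hedged sketch with hypothetical types and names (the real logic lives in `lib/middleware/prune-executor.ts`):

```typescript
// Sketch of request-side context preparation: drop messages the model agreed
// to forget, then inject the stored summary as a breadcrumb.
type Msg = { id: string; role: "user" | "assistant" | "system"; content: string };

function prepareContext(messages: Msg[], prunedIds: Set<string>, summary: string | null): Msg[] {
  // Step 2: filter previously pruned messages.
  const kept = messages.filter((m) => !prunedIds.has(m.id));
  // Step 3: prepend the [Context Summary] breadcrumb so pruned content
  // is still represented in compressed form.
  if (summary) {
    kept.unshift({ id: "summary", role: "system", content: `[Context Summary] ${summary}` });
  }
  return kept;
}
```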
```
app/src/
├── app/
│   ├── api/
│   │   ├── chat/route.ts         # Main chat API with pruning logic
│   │   └── usage/route.ts        # Token usage endpoint
│   └── page.tsx                  # Chat UI
├── components/
│   ├── context-budget.tsx        # Token budget progress bar
│   ├── debug-panel.tsx           # Event log viewer
│   ├── prune-archive.tsx         # Archived pruned messages
│   └── prune-settings.tsx        # Pruning configuration UI
├── lib/
│   ├── middleware/
│   │   ├── metadata-injector.ts  # Adds [msg:XXX] tags to messages
│   │   ├── prune-parser.ts       # Extracts <prune_suggestions> XML
│   │   ├── prune-executor.ts     # Filters pruned messages
│   │   └── token-counter.ts      # Tiktoken-based token counting
│   ├── prompts/
│   │   └── system.ts             # System prompt with pruning instructions
│   ├── usage-store.ts            # Global state for usage + prune tracking
│   └── config.ts                 # Configuration constants
└── hooks/
    └── use-prune-manager.ts      # Prune state management hook
```
- Node.js 18+
- pnpm (or npm/yarn)
- OpenAI API key
```bash
cd app
npm install
```

Create a `.env` file:

```
OPENAI_API_KEY=your_openai_api_key_here
```

Then start the dev server:

```bash
npm run dev
```

Try a conversation that builds and then closes a topic:

- Build context: "Get me the weather for New York, LA, Chicago, Miami, and Seattle"
- Synthesize: "Which city has the best weather? Give me a comparison."
- Close topic: "Perfect, I've noted that. The weather research is complete."
- Pivot: "Now, what's the square root of 144?"
You should see:
- ✂️ `PRUNE_SUGGESTION` events in the debug panel
- 🗑️ `PRUNE_EXECUTED` events for approved suggestions
- A `[Context Summary]` breadcrumb injected on subsequent requests
| Technology | Purpose |
|---|---|
| Next.js 16 | App framework |
| AI SDK v6 | LLM streaming + tool calling |
| OpenAI GPT-4o-mini | Language model |
| Tiktoken | Token counting |
| Tailwind CSS 4 | Styling |
| Radix UI | UI primitives |
| Zustand | State management |
- Metadata Injection: Messages tagged with `[msg:XXX][tokens:N][tally:N]`
- Context Budget Display: Visual progress bar showing token usage
- Prune Suggestions: Model outputs XML with message IDs and confidence scores
- Context Summarization: Full conversation summary preserved as breadcrumb
- Debug Panel: Real-time event logging for tool calls, messages, and pruning
- Server-Side Filtering: Pruned messages never sent to model on subsequent requests
The model receives instructions to:
- Monitor context budget
- Suggest pruning when topics close or data is synthesized
- Include a `<context_summary>` of the entire conversation
- Format suggestions as XML with confidence scores
```xml
<prune_suggestions>
  <context_summary>User researched weather for 5 cities. Best: LA at 85°F. Then searched Austin restaurants.</context_summary>
  <suggestion id="msg:002" confidence="0.9" tokens="114" reason="Weather data synthesized" />
  <suggestion id="msg:004" confidence="0.85" tokens="286" reason="Comparison no longer needed" />
</prune_suggestions>
```

- Parse suggestions from model response
- Filter by confidence threshold (default: 0.8)
- Store approved IDs + context summary in global state
- On next request, skip pruned messages and inject summary
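The parse-and-threshold steps can be sketched with a regex over the model's response (hypothetical helper names; the real parser in `lib/middleware/prune-parser.ts` may differ):

```typescript
// Sketch of suggestion parsing: extract the <prune_suggestions> block, pull
// out each self-closing <suggestion /> tag, then keep only IDs whose
// confidence clears the threshold (default 0.8, matching the project config).
type Suggestion = { id: string; confidence: number };

function parseSuggestions(response: string): Suggestion[] {
  const block = response.match(/<prune_suggestions>([\s\S]*?)<\/prune_suggestions>/);
  if (!block) return [];
  const out: Suggestion[] = [];
  // Assumes attributes appear in id-then-confidence order, as in the example above.
  const re = /<suggestion id="([^"]+)" confidence="([^"]+)"[^>]*\/>/g;
  let m: RegExpExecArray | null;
  while ((m = re.exec(block[1])) !== null) {
    out.push({ id: m[1], confidence: parseFloat(m[2]) });
  }
  return out;
}

const approve = (suggestions: Suggestion[], threshold = 0.8): string[] =>
  suggestions.filter((s) => s.confidence >= threshold).map((s) => s.id);
```

A production parser would want a real XML parser rather than a regex, but the regex keeps the sketch dependency-free.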
- Persist prune state to database (currently in-memory)
- User approval UI for prune suggestions
- Configurable confidence thresholds per message type
- Migrate to the `ToolLoopAgent` pattern for cleaner SDK integration
- A/B test prune accuracy vs. manual truncation
- Teaching AI Agents to Forget to Stop Forgetting - Inspiration blog post
- AI SDK Documentation - Vercel AI SDK
- Context Caching Strategies - OpenAI docs
MIT