agenticloops-ai/agentic-apps-internals

🔬 Agentic Apps Internals

How AI coding agents actually work under the hood
System prompts, tool architectures, session traces, and implementation patterns — captured from real agent sessions.

by AgenticLoops.ai - for Engineers, from Engineers

Website · Substack · LinkedIn · Follow @agenticloops_ai

🔍 What's Inside

This repository contains captured API traffic, decoded system prompts, complete tool schemas, and turn-by-turn session traces from real AI coding agent sessions. Everything here comes from actual intercepted data — no guesswork, no assumptions.

For each agent you'll find:

  • System Prompts — the exact instructions that shape agent behavior, extracted from live API traffic
  • Tool Catalogs — complete tool definitions with full JSON schemas, descriptions, and categories
  • Session Traces — turn-by-turn breakdowns showing every LLM call, tool use, and token count
  • Prompt Engineering Analysis — how each agent structures its prompts, what sections they include, and how prompts change between modes

Interesting things you'll discover:

📖 The exact system prompt that tells Claude Code to "avoid over-engineering" — and the detailed rules that follow it
🔄 Why Copilot sends 8 overhead requests before your task even starts — the multi-model routing pipeline
⚖️ 5 tools vs 65 tools — same task, radically different approaches — Codex CLI's minimalism vs Copilot's comprehensive toolset
🧠 Codex CLI's explicit engineering values: "Clarity, Pragmatism, Rigor" — personality encoded directly in the system prompt

Useful for:

  • AI Engineers building coding assistants or agent systems
  • Researchers studying LLM agent architectures and prompt design
  • Developers learning advanced prompt engineering from production systems
  • Anyone curious about what happens behind /ask, /agent, or /plan

🗺️ Start Here

New to the repo? Follow this reading path:

  1. Pick an agent — Choose Claude Code, Codex CLI, Copilot, or OpenCode and read its README
  2. System Prompt — Read the agent's system-prompt.md to see the exact instructions it receives
  3. Prompt Engineering — Read PROMPT-ENGINEERING.md for how the prompt is structured and how it changes between modes
  4. Tool Catalog — Browse TOOL-USE.md for the full tool definitions with JSON schemas
  5. Session Traces — Check session.md for turn-by-turn breakdowns of real agent sessions

🤖 Agents Analyzed

| Agent | Type | Main Model | Overhead Model | Agent Tools | Status |
|---|---|---|---|---|---|
| Claude Code | CLI | claude-opus-4-6 | claude-haiku-4-5 | 24 | new |
| Codex CLI | CLI | gpt-5.3-codex | — | 5 | new |
| GitHub Copilot | VS Code | user-selected ¹ | gpt-4o-mini | 65 | new |
| OpenCode | CLI | gpt-5.3-codex | — | 10 | new |
| Cursor | IDE | | | | coming soon |
| Windsurf | IDE | | | | coming soon |
| Cline | VS Code | | | | coming soon |
| Aider | CLI | | | | coming soon |

¹ GitHub Copilot lets users select their main model. This analysis uses gpt-5.3-codex, which was selected during our capture session. The overhead model (gpt-4o-mini) is not user-selectable. Other model choices may produce different behavior.


🔬 Research Approach

All data in this repository was captured using AgentLens, an open-source MITM proxy that intercepts LLM API traffic during normal agent use.

How it works:

  1. Capture — AgentLens sits between the agent and the LLM API, recording every request and response (system prompts, tool definitions, messages, token usage, timing)
  2. Export — Raw session data is exported as structured JSON with per-request breakdowns
  3. Analyze — AgentLens exports are manually analyzed and processed into structured markdown: per-agent READMEs, prompt engineering analysis, tool catalogs, and session traces
  4. Verify — Tool counts, prompt completeness, and schema validity are spot-checked against the raw data
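The analysis step above boils down to aggregating per-request records from the exported JSON. The record fields below are an assumption for illustration — AgentLens's actual export schema may differ. A minimal sketch of tallying token usage per model:

```python
import json

# Hypothetical AgentLens export: a list of per-request records.
# Field names here are illustrative, not AgentLens's real schema.
raw = """
[
  {"turn": 1, "model": "main-model",     "input_tokens": 1200, "output_tokens": 300},
  {"turn": 2, "model": "main-model",     "input_tokens": 1650, "output_tokens": 80},
  {"turn": 2, "model": "overhead-model", "input_tokens": 40,   "output_tokens": 10}
]
"""

def tokens_by_model(session_json: str) -> dict:
    """Sum input/output tokens per model across all captured requests."""
    totals = {}
    for req in json.loads(session_json):
        in_t, out_t = totals.setdefault(req["model"], [0, 0])
        totals[req["model"]] = [in_t + req["input_tokens"],
                                out_t + req["output_tokens"]]
    return totals

print(tokens_by_model(raw))
```

The same aggregation, keyed differently, is how overhead requests are separated from main-model requests in the per-agent session summaries.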

See RESEARCH.md for the full methodology, deliverable structure, and analysis pipeline details.


⚙️ How Agents Work

Every AI coding agent in this repository follows the same fundamental loop — Reason → Act → Observe:

Agentic Loop: Reason → Act → Observe

  1. Reason — The LLM receives the user's task plus conversation context. It decides what to do next and whether it needs to use a tool.
  2. Act — If a tool is needed, the agent executes it (read a file, run a command, search code). The tool result is appended to the context.
  3. Observe — The updated context (with tool results) is sent back to the LLM for the next reasoning step. The loop repeats until the task is complete.
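The three steps above can be sketched in a few lines. Everything here is a stand-in — the stubbed `fake_llm` and the single `read_file` tool are illustrative, not any real agent's implementation:

```python
# Minimal sketch of the Reason → Act → Observe loop.

def fake_llm(context):
    """Reason: decide on a tool call, or finish. A real agent calls an LLM API here."""
    if not any(m["role"] == "tool" for m in context):
        return {"tool": "read_file", "args": {"path": "README.md"}}
    return {"answer": "done"}

TOOLS = {"read_file": lambda path: f"<contents of {path}>"}

def run_agent(task: str) -> str:
    context = [{"role": "user", "content": task}]
    while True:
        decision = fake_llm(context)                          # 1. Reason
        if "answer" in decision:
            return decision["answer"]
        result = TOOLS[decision["tool"]](**decision["args"])  # 2. Act
        context.append({"role": "tool", "content": result})   # 3. Observe
```

Note that the loop never mutates earlier context — each iteration appends tool results and resends the whole conversation, which is exactly why prompt caching (discussed below under Key Insights) matters so much.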

Every agent analyzed here implements this exact pattern — but the details differ dramatically: how many tools they expose (5 vs 65), how they restrict capabilities between modes, whether they cache prompts, and how they route requests through multiple models.

📚 Deep dive: How Agents Work: The Patterns Behind the Magic
🛠️ Build your own: Agentic AI Engineering Tutorials


💡 Key Insights

These are patterns we found interesting while analyzing the captured sessions — the kind of implementation details you won't find in product docs:

  1. Same loop, different philosophies — All three agents implement Reason-Act-Observe, but Codex CLI does it with 5 tools and a "just use the shell" philosophy, while Copilot provides 65 specialized tools for granular control. Claude sits in the middle with 24.

  2. Mode restrictions reveal design thinking — Claude Code keeps the same 24 tools in both Agent and Plan mode, controlling behavior through runtime-injected <system-reminder> tags in conversation turns ("Plan mode is active... you MUST NOT make any edits") rather than the system prompt itself (which is identical across modes). Copilot takes the opposite approach: it physically removes write/execute tools in Plan mode (65 → 22), making unsafe actions structurally impossible.

  3. Prompt caching is a major differentiator — Claude Code reads ~595K tokens from cache and writes ~77K tokens to cache per session. This means the large system prompt and tool definitions are sent once and reused across turns. Neither Codex CLI nor Copilot show prompt caching in their captured sessions.

  4. Multi-model pipelines hide overhead — Claude Code uses claude-haiku for warmup/quota checks and extracting file paths from command outputs (not titling or categorization). Copilot uses gpt-4o-mini for titling and activity summarization in agent mode, with request categorization only in ask mode. Codex CLI skips this entirely — zero overhead requests.

  5. System prompts encode engineering culture — Codex CLI's prompt opens with explicit values ("Clarity, Pragmatism, Rigor"). Claude Code's prompt includes detailed anti-over-engineering rules ("Don't add features beyond what was asked. Three similar lines of code is better than a premature abstraction"). These aren't just instructions — they shape the agent's personality.

  6. Tool design reflects trust boundaries — Codex CLI trusts a single exec_command tool for nearly everything (file ops, git, testing). Copilot separates concerns across 65 tools with distinct schemas and permissions. Claude provides dedicated tools (Glob, Grep, Read, Edit) but warns against falling back to shell equivalents.


📂 Repository Structure

Each agent follows the same directory pattern:

<agent-name>/
├── README.md                # Agent summary + session metrics
├── PROMPT-ENGINEERING.md    # System prompt analysis (structure, sections, mode differences)
├── TOOL-USE.md              # Complete tool catalog with full JSON schemas
├── agent-mode/
│   ├── system-prompt.md     # Exact system prompt text
│   ├── user-prompt.md       # User message with injected context (skills, project, task)
│   ├── session.md           # Session summary
│   ├── transcript.md        # Full session transcript
│   └── log/
│       ├── session.json     # Raw captured API traffic
│       └── session.csv      # Per-request metrics (tokens, cost, timing)
└── plan-mode/               # Same structure, repeated per mode
    └── ...

Agents: claude-code-cli/ · codex-cli/ · github-copilot/

Other files: .tools/alens (capture launcher) · RESEARCH.md (methodology) · CONTRIBUTING.md


⚠️ Disclaimer

This repository is for educational and research purposes only. All trademarks belong to their respective owners. The goal is to understand and learn from these systems, not to replicate proprietary services.

📜 Legal Notice

This analysis was conducted through observation of network traffic during normal use of publicly available software. No security measures were bypassed, no proprietary source code was accessed, and no terms of service were violated beyond what is necessary for standard interoperability research.

This is independent research and is not affiliated with, endorsed by, or connected to GitHub, Microsoft, Anthropic, OpenAI, or any other company analyzed.

⚖️ License

MIT — See individual agent analyses for their respective product licenses.