diff --git a/docs/README.skills.md b/docs/README.skills.md
index 046ca7502..e87c20259 100644
--- a/docs/README.skills.md
+++ b/docs/README.skills.md
@@ -48,6 +48,7 @@ See [CONTRIBUTING.md](../CONTRIBUTING.md#adding-skills) for guidelines on how to
| [arize-trace](../skills/arize-trace/SKILL.md) | INVOKE THIS SKILL when downloading or exporting Arize traces and spans. Covers exporting traces by ID, sessions by ID, and debugging LLM application issues using the ax CLI. | `references/ax-profiles.md`
`references/ax-setup.md` |
| [aspire](../skills/aspire/SKILL.md) | Aspire skill covering the Aspire CLI, AppHost orchestration, service discovery, integrations, MCP server, VS Code extension, Dev Containers, GitHub Codespaces, templates, dashboard, and deployment. Use when the user asks to create, run, debug, configure, deploy, or troubleshoot an Aspire distributed application. | `references/architecture.md`
`references/cli-reference.md`
`references/dashboard.md`
`references/deployment.md`
`references/integrations-catalog.md`
`references/mcp-server.md`
`references/polyglot-apis.md`
`references/testing.md`
`references/troubleshooting.md` |
| [aspnet-minimal-api-openapi](../skills/aspnet-minimal-api-openapi/SKILL.md) | Create ASP.NET Minimal API endpoints with proper OpenAPI documentation | None |
+| [async-profiler](../skills/async-profiler/SKILL.md) | Install, run, and analyze async-profiler for Java — low-overhead sampling profiler producing flamegraphs, JFR recordings, and allocation profiles. Use for: "install async-profiler", "set up Java profiling", "Failed to open perf_events", "what JVM flags for profiling", "capture a flamegraph", "profile CPU/memory/allocations/lock contention", "profile my Spring Boot app", "generate a JFR recording", "heap keeps growing", "what does this flamegraph mean", "how do I read a flamegraph", "interpret profiling results", "open a .jfr file", "what's causing my CPU hotspot", "wide frame in my profile", "I see a lot of GC / Hibernate / park in my profile". Use this skill any time a Java developer mentions profiling, flamegraphs, async-profiler, JFR, or wants to understand JVM performance. | `README.md`
`analyze`
`profile`
`scripts/analyze_collapsed.py`
`scripts/collect.sh`
`scripts/install.sh`
`scripts/run_profile.sh`
`setup` |
| [automate-this](../skills/automate-this/SKILL.md) | Analyze a screen recording of a manual process and produce targeted, working automation scripts. Extracts frames and audio narration from video files, reconstructs the step-by-step workflow, and proposes automation at multiple complexity levels using tools already installed on the user machine. | None |
| [autoresearch](../skills/autoresearch/SKILL.md) | Autonomous iterative experimentation loop for any programming task. Guides the user through defining goals, measurable metrics, and scope constraints, then runs an autonomous loop of code changes, testing, measuring, and keeping/discarding results. Inspired by Karpathy's autoresearch. USE FOR: autonomous improvement, iterative optimization, experiment loop, auto research, performance tuning, automated experimentation, hill climbing, try things automatically, optimize code, run experiments, autonomous coding loop. DO NOT USE FOR: one-shot tasks, simple bug fixes, code review, or tasks without a measurable metric. | None |
| [aws-cdk-python-setup](../skills/aws-cdk-python-setup/SKILL.md) | Setup and initialization guide for developing AWS CDK (Cloud Development Kit) applications in Python. This skill enables users to configure environment prerequisites, create new CDK projects, manage dependencies, and deploy to AWS. | None |
diff --git a/skills/async-profiler/README.md b/skills/async-profiler/README.md
new file mode 100644
index 000000000..99f89fc7f
--- /dev/null
+++ b/skills/async-profiler/README.md
@@ -0,0 +1,72 @@
+# async-profiler
+
+Install, run, and analyze async-profiler for Java — a low-overhead sampling profiler producing flamegraphs, JFR recordings, and allocation profiles.
+
+## What it does
+
+- Installs async-profiler automatically for macOS or Linux
+- Captures CPU time, heap allocations, wall-clock time, and lock contention
+- Produces interactive flamegraphs, JFR recordings, and collapsed stack traces
+- Interprets profiling output: identifies hotspots, GC pressure, lock contention, N+1 Hibernate patterns
+
+## Compatibility
+
+Requires Python 3.7+ for the analysis script. async-profiler works on macOS and Linux with a running JVM process.
+
+## Installation
+
+**GitHub Copilot CLI:**
+
+Point Copilot at the skill directory from within a session:
+```
+/skills add /path/to/async-profiler
+```
+
+Or copy manually to your personal skills directory (`~/.copilot/skills/` or `~/.agents/skills/` depending on your version):
+```bash
+cp -r async-profiler ~/.copilot/skills/
+# or
+cp -r async-profiler ~/.agents/skills/
+```
+
+**Claude Code:**
+```bash
+cp -r async-profiler ~/.claude/skills/async-profiler
+```
+
+**OpenCode:**
+```bash
+cp -r async-profiler ~/.config/opencode/skills/async-profiler
+```
+
+## Trigger phrases
+
+- "install async-profiler"
+- "capture a flamegraph"
+- "profile my Spring Boot app"
+- "heap keeps growing"
+- "what does this flamegraph mean"
+- "I see a lot of GC in my profile"
+
+## Bundled scripts
+
+| Script | Purpose |
+|---|---|
| `scripts/install.sh` | Auto-detects platform, downloads and verifies async-profiler |
| `scripts/run_profile.sh` | Wraps `asprof` with defaults, timestamps output |
+| `scripts/collect.sh` | Background collection: start all-event profiling, stop and retrieve flamegraphs |
+| `scripts/analyze_collapsed.py` | Ranked self-time/inclusive-time table for `.collapsed` files |
+
+## Directory structure
+
+```
+async-profiler/
+├── SKILL.md # Entry point — routes to sub-guides
+├── scripts/ # Bundled scripts
+├── setup/
+│ └── SKILL.md # Installation and configuration
+├── profile/
+│ └── SKILL.md # Running profiling sessions
+└── analyze/
+ └── SKILL.md # Interpreting profiling output
+```
diff --git a/skills/async-profiler/SKILL.md b/skills/async-profiler/SKILL.md
new file mode 100644
index 000000000..d1c2a7f56
--- /dev/null
+++ b/skills/async-profiler/SKILL.md
@@ -0,0 +1,136 @@
+---
+name: async-profiler
+description: 'Install, run, and analyze async-profiler for Java — low-overhead sampling profiler producing flamegraphs, JFR recordings, and allocation profiles. Use for: "install async-profiler", "set up Java profiling", "Failed to open perf_events", "what JVM flags for profiling", "capture a flamegraph", "profile CPU/memory/allocations/lock contention", "profile my Spring Boot app", "generate a JFR recording", "heap keeps growing", "what does this flamegraph mean", "how do I read a flamegraph", "interpret profiling results", "open a .jfr file", "what''s causing my CPU hotspot", "wide frame in my profile", "I see a lot of GC / Hibernate / park in my profile". Use this skill any time a Java developer mentions profiling, flamegraphs, async-profiler, JFR, or wants to understand JVM performance.'
+compatibility: Requires Python 3.7+ for the analyze_collapsed.py script.
+---
+
+# async-profiler
+
+async-profiler is a production-safe, low-overhead sampling profiler for Java
+that avoids the safepoint bias of standard JVM profilers. It can capture CPU
+time, heap allocations, wall-clock time, and lock contention, and produce
+interactive flamegraphs, JFR recordings, and collapsed stack traces.
+
+## Installing this skill
+
+### IntelliJ IDEA (Junie or GitHub Copilot)
+
+Skills live in a `.claude/skills/`, `.agents/skills/`, or `.github/skills/`
+directory, either in your project repo or in your home directory.
+
+**Project-level — recommended for teams** (commit so everyone gets it):
+```bash
+# From your project root:
+mkdir -p .github/skills
+cd .github/skills
+unzip /path/to/async-profiler.skill
+git add async-profiler
+git commit -m "Add async-profiler skill"
+```
+
+**Global — personal use across all projects:**
+```bash
+mkdir -p ~/.claude/skills
+cd ~/.claude/skills
+unzip /path/to/async-profiler.skill
+```
+
+> **Note for GitHub Copilot users:** There is a known issue where the Copilot
+> JetBrains plugin does not reliably pick up skills from the global `~/.copilot/skills`
+> directory. Use the project-level `.github/skills/` location to be safe.
+
+Alternatively, install the **Agent Skills Manager** plugin from the JetBrains
+Marketplace (*Settings → Plugins → Marketplace* → "Agent Skills Manager") for
+a UI that installs skills without unzipping manually.
+
+---
+
+## Using this skill in IntelliJ IDEA
+
+### With Junie (JetBrains AI)
+
+Junie is JetBrains' native coding agent, available in the AI Chat panel.
+
+1. Open the AI Chat panel (*View → Tool Windows → AI Chat*, or the chat icon
+ in the right toolbar)
+2. In the agent dropdown at the top of the chat, select **Junie**
+3. Choose a mode:
+ - **Code mode** — Junie can run terminal commands, write files, and execute
+ the profiling scripts directly. Use this when you want it to actually run
+ `scripts/install.sh` or `scripts/run_profile.sh` for you.
+ - **Ask mode** — read-only; Junie analyzes and explains but won't touch
+ files. Use this when you want help interpreting a flamegraph or JFR file.
+4. Just ask naturally — Junie loads the skill automatically when your question
+ matches the description. You don't need to invoke it by name.
+
+Example prompts that will trigger this skill in Junie:
+- *"My Spring Boot app is using too much CPU. Help me capture a flamegraph."*
+- *"I have this JFR file — open it and tell me what's slow."*
+- *"Install async-profiler on this machine and set up the JVM flags."*
+
+In Code mode, Junie will run `scripts/install.sh`, execute `scripts/run_profile.sh`
+with the right flags, and then walk you through the results — all without
+leaving IntelliJ.
+
+### With GitHub Copilot in IntelliJ
+
+1. Enable agent mode: *Settings → GitHub Copilot → Chat → Agent* → turn on
+ **Agent mode** and **Agent Skills**
+2. Open the Copilot Chat panel and make sure the mode selector shows **Agent**
+3. Ask naturally — Copilot loads the skill when your prompt matches
+
+Example prompts:
+- *"Profile my running Java app and show me where the CPU is going."*
+- *"Analyze this collapsed stack file and tell me what's allocating the most."*
+
+GitHub Copilot's agent mode can also run the bundled scripts on your behalf —
+it will propose the terminal command and ask for confirmation before executing.
+
+### GitHub Copilot CLI
+
+```bash
+# Copilot CLI
+mkdir -p ~/.copilot/skills
+cd ~/.copilot/skills
+unzip /path/to/async-profiler.skill
+
+# Or, if your version uses ~/.agents/skills/:
+mkdir -p ~/.agents/skills
+cd ~/.agents/skills
+unzip /path/to/async-profiler.skill
+```
+
+Run `/skills list` to confirm it loaded. Then just ask naturally in the terminal.
+
+---
+
+## Bundled scripts
+
+This skill includes four ready-to-run scripts in `scripts/`:
+
+| Script | What it does |
+|---|---|
+| `scripts/install.sh` | Auto-detects platform, downloads the right binary, verifies install |
+| `scripts/run_profile.sh` | Wraps `asprof` with defaults, timestamps output, prints opening instructions |
+| `scripts/collect.sh` | Agent-friendly background collection: start all-event profiling, do other work, then stop and get all flamegraphs |
+| `scripts/analyze_collapsed.py` | Ranked self-time / inclusive-time table for `.collapsed` files, with filters |
+
+Always offer to run these scripts on the user's behalf when relevant.
+
+## How to use this skill
+
+This skill has three sub-guides. Read the one that matches what the user needs:
+
+| Situation | Read |
+|---|---|
+| User needs to install or configure async-profiler, or is hitting setup errors | `setup/SKILL.md` |
+| User wants to run a profiling session (capture flamegraph, JFR, etc.) | `profile/SKILL.md` |
+| User has profiling output and wants to understand or interpret it | `analyze/SKILL.md` |
+
+**When the conversation spans multiple phases** (e.g., the user just ran a
+profile and now wants to understand the output), read whichever sub-guide is
+most relevant to the current question. If the user needs both setup *and*
+profiling guidance in one message, read `setup/SKILL.md` first and summarize
+the setup steps before moving to `profile/SKILL.md`.
+
+Read the relevant sub-guide now before responding.
diff --git a/skills/async-profiler/analyze/SKILL.md b/skills/async-profiler/analyze/SKILL.md
new file mode 100644
index 000000000..b7f11e414
--- /dev/null
+++ b/skills/async-profiler/analyze/SKILL.md
@@ -0,0 +1,364 @@
+---
+name: async-profiler-analyze
+description: 'Interpret and analyze async-profiler output: flamegraph HTML/SVG files, JFR recordings, and collapsed stack traces. Use this skill whenever a Java developer shares profiler output or wants help understanding profiling results. Trigger for: "what does this flamegraph mean", "how do I read this JFR", "what''s causing my CPU hotspot", "interpret my profiling results", "analyze this flamegraph", "what should I look for in my profile", "the wide frame in my flamegraph is X, what does that mean", "I see a lot of GC in my profile", "my profile shows 80% in X, is that bad", or whenever someone pastes or describes profiling output. Also trigger proactively when the async-profiler-profile skill just produced output and the user seems to want to understand it.'
+compatibility: Requires Python 3.7+ for the analyze_collapsed.py script.
+---
+
+# async-profiler Output Analysis
+
+The three main output formats — flamegraph HTML, JFR recordings, and collapsed
+stacks — each tell a different story. This skill walks you through reading each
+one and turning the visual patterns into concrete action.
+
+---
+
+## Flamegraphs (HTML or SVG)
+
+Open `.html` output in any browser. It's interactive: hover to see exact sample
+counts, click to zoom into a subtree, press Escape or click "Reset Zoom" to go back.
+
+### How to read a flamegraph
+
+```
+▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔ ← leaf frames (actual CPU consumers)
+ doWork()
+ processItem()
+ handleRequest()
+ run()
+▔▔▔▔▔▔▔▔▔▔▔▔▔ ← base (thread entry points)
+```
+
+- **Width = time (or allocation volume)**. A frame that's wide consumed a lot
+ of the profiled resource. This is the most important thing to look at.
+- **Height = call depth**. Taller stacks just mean more levels of method calls —
+ depth by itself isn't a problem.
+- **X-axis is NOT time**. The horizontal position has no meaning; frames are
+  sorted alphabetically so that identical call paths merge visually.
+- **Leaf frames (top of each column)** are where execution actually spent time.
+ Wide leaf frames = actual hotspots.
+- **Intermediate frames** show the call path to the hotspot. A wide intermediate
+ frame with a narrow leaf means the cost is spread across many callees, which
+ is harder to optimize than a single wide leaf.
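
To make the self-time vs. inclusive-time distinction concrete, here is a
minimal sketch (the input lines and frame names are made up) that computes
both per frame from collapsed stacks: a frame's flamegraph width corresponds
to its inclusive count, and wide leaf frames are those with high self counts:

```python
from collections import Counter

def self_and_inclusive(lines):
    """Per-frame self and inclusive sample counts from collapsed-stack
    lines of the form 'root;caller;leaf 42'."""
    self_t, incl = Counter(), Counter()
    for line in lines:
        stack, _, count = line.rpartition(" ")
        n = int(count)
        frames = stack.split(";")
        self_t[frames[-1]] += n      # leaf frame: direct cost
        for f in set(frames):        # every frame on the path: inclusive cost
            incl[f] += n
    return self_t, incl

self_t, incl = self_and_inclusive([
    "run;handleRequest;processItem;doWork 70",
    "run;handleRequest;processItem 10",
    "run;idle 20",
])
# doWork: self 70, inclusive 70  -> a wide leaf, the real hotspot
# handleRequest: self 0, inclusive 80 -> wide, but only as a call path
```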
+
+### What to look for first
+
+1. **Wide frames near the top** — these are your primary optimization targets.
+ If `serialize()` is 40% wide at the top, start there.
+
+2. **Plateau patterns** — a wide frame whose callees (the frames directly
+   above it) are much narrower, leaving a flat top. The plateau frame spends
+   most of its time in its own code rather than calling further. Classic hotspot.
+
+3. **Tall, narrow spikes** — deep call stacks that are thin. Usually framework
+ overhead (reflection, proxies, Spring AOP) or recursive algorithms. Often
+ hard to optimize directly.
+
+4. **Unexpected runtime/framework frames** — if 30% of your CPU flamegraph is
+ in GC or JIT compilation (`[Unknown]`, `Compiler::compile`, etc.), that's a
+ signal of memory pressure or cold-start behavior, not application logic.
+
+### Color coding
+
+Colors in the default scheme encode frame type, not performance severity:
+
+| Color | Frame type |
+|---|---|
+| Green | Java methods |
+| Yellow / orange | JVM internal / native |
+| Red | Kernel frames |
+| Purple | C++ (JVM internals) |
+| Grey | Inlined frames |
+
+Don't read too much into colors beyond type classification.
+
+### CPU vs allocation vs wall-clock flamegraphs
+
+The visual grammar is identical, but what "width" means differs:
+
+- **CPU flamegraph**: width = CPU sample count = time on-CPU
+- **Allocation flamegraph**: width = bytes allocated from that call path
+- **Wall-clock flamegraph**: width = wall-clock samples = total elapsed time
+ including blocking
+
+For latency investigations, compare CPU and wall-clock side by side:
+- Frames that appear wide in wall-clock but narrow in CPU are spending time
+ *blocked* (waiting on I/O, locks, sleep) — these are candidates for async
+ refactoring or reducing external dependencies.
+- Frames wide in CPU but narrow in wall-clock are actually compute-heavy.
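
As a rough sketch of that comparison, assuming per-frame sample counts have
already been extracted from each capture, and that the two captures cover the
same window so the counts are comparable (both simplifying assumptions):

```python
def blocked_estimate(cpu_samples, wall_samples):
    """Per-frame wall-minus-CPU difference: frames with a large positive
    value spent most of their elapsed time blocked, not computing."""
    return {f: wall_samples.get(f, 0) - cpu_samples.get(f, 0)
            for f in wall_samples}

cpu  = {"fetchRemote": 2,  "computeHash": 50}   # hypothetical frames
wall = {"fetchRemote": 60, "computeHash": 52}
blocked = blocked_estimate(cpu, wall)
# fetchRemote: 58 -> mostly waiting (I/O, lock); candidate for async work
# computeHash: 2  -> genuinely compute-bound; optimize the algorithm
```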
+
+### Common flamegraph patterns and what they mean
+
+**"My app is mostly GC"**
+Wide `GarbageCollector` or `ZGC` or `G1GC` frames in a CPU profile indicate
+the JVM is spending significant CPU collecting heap. Switch to an allocation
+profile (`-e alloc`) to find the code paths generating garbage.
+
+**"I see a lot of `Object.wait` or `LockSupport.park`"**
+These show up in wall-clock profiles as threads blocking. Look at the frames
+just below `park`/`wait` in the stack — those are the callers waiting on
+something (a queue, a lock, a CompletableFuture). That's where to investigate.
+
+**"Everything is in reflection or proxies"**
+Frames like `sun.reflect.GeneratedMethodAccessor`, Spring AOP proxies, or
+Jackson deserializers. This is usually framework overhead and often not worth
+optimizing unless it's genuinely dominant (>20%). Consider warming strategies
+or native compilation (GraalVM).
+
+**"Wide frame is in a library I don't control"**
+Look at *your* code just below it. Can you call this library less often? Can
+you cache results? Can you batch calls? The library frame tells you what's
+expensive; the frames below it tell you who's calling it.
+
+---
+
+## JFR Files (Java Flight Recorder)
+
+JFR files are richer than flamegraphs — they contain timestamped events,
+multiple event types, JVM metrics, and more. You need a viewer to explore them.
+
+### Opening JFR files in IntelliJ IDEA (recommended)
+
+IntelliJ IDEA is the richest viewer for async-profiler output and the
+recommended choice for day-to-day profiling work.
+
+**IntelliJ IDEA Ultimate — built-in profiler**
+
+Ultimate has async-profiler and JFR support built in, no plugins needed.
+
+Open a captured `.jfr` file:
+- *Run → Open Profiler Snapshot…* → select the file
+- Or drag the `.jfr` file directly onto the editor
+
+You'll see five views across the top tab bar. Start here:
+
+1. **Flame Graph** — same reading rules as HTML flamegraphs (width = time).
+ Use the search box (Ctrl/Cmd+F) to highlight all frames matching a class
+ or package. Right-click any frame to jump to source.
+
+2. **Call Tree** — hierarchical breakdown. Expand hotspots top-down to see
+ exactly which call path is responsible. The "%" column shows inclusive time.
+
+3. **Method List** — flat ranked list of methods by self-time. The fastest
+ way to answer "what is the single hottest method?" Sort by *Self* to find
+ direct CPU consumers; sort by *Total* for inclusive time.
+
+4. **Timeline** — thread activity over the profiling window. Each thread is a
+   row; colors show running vs. blocked vs. waiting state. Use this to spot
+ contention (many threads blocked at the same moment) or to correlate a
+ spike with a specific time window.
+
+5. **Events** — raw JFR event log. Useful for GC events, class loading, JIT
+ compilations, and socket I/O — things that don't show up in the flame graph.
+
+**IntelliJ IDEA Community — install the Java JFR Profiler plugin**
+
+Search the Marketplace for **"Java JFR Profiler"** (by parttimenerd). It adds
+full JFR and async-profiler support including flame graph, call tree, and
+Firefox Profiler integration:
+- *Settings → Plugins → Marketplace* → search "Java JFR Profiler" → Install
+- After restart: *Tools → Open JFR File…* or drag the file into the editor
+
+**Launching async-profiler directly from IntelliJ (Ultimate)**
+
+You can skip the terminal entirely and profile a run configuration from inside
+the IDE:
+- Open *Run → Edit Configurations…*
+- Select your run configuration → switch to the **Profiler** tab
+- Choose **Async Profiler** from the dropdown
+- Click the profile button (▶ with the flame icon) instead of the normal run
+- IntelliJ attaches async-profiler automatically and opens results when done
+
+Configure which events to capture at *Settings → Build, Execution, Deployment
+→ Java Profiler*.
+
+**Navigating from flamegraph frame → source**
+
+In IntelliJ's flamegraph view, right-clicking any frame shows:
+- *Navigate to Source* — jumps directly to the method in the editor
+- *Find Usages* — shows callers
+- *Filter* — narrows the flame to stacks containing this frame
+
+This makes it much faster to go from "this frame is hot" to "here's the code"
+compared to any other viewer.
+
+---
+
+### Other viewers
+
+**JDK Mission Control (JMC)**
+- Download from https://adoptium.net/jmc/
+- Strength: *Automated Analysis Report* — runs heuristics and flags findings
+ with explanations. Good for a second opinion or sharing with someone who
+ doesn't have IntelliJ.
+- *File → Open File* or drag-and-drop
+
+**Command-line `jfr` utility** (ships with JDK 14+)
+```bash
+jfr summary recording.jfr # what event types are present
+jfr print --events jdk.ExecutionSample recording.jfr # raw CPU samples
+```
+
+**`jfrconv`** (bundled with async-profiler — convert to flamegraph HTML)
+```bash
+jfrconv recording.jfr flamegraph.html # full flamegraph
+jfrconv --alloc recording.jfr alloc.html # allocation-only flamegraph
+jfrconv recording.jfr collapsed.txt # collapsed stacks for scripting
+```
+
+### What to examine in JMC / IntelliJ
+
+After opening a JFR file, prioritize these views:
+
+1. **Automated Analysis** (JMC only) — runs heuristics and flags findings
+ automatically. Always start here.
+
+2. **Method Profiling** → flame graph view of CPU samples
+
+3. **Memory** → allocation sites, heap occupancy over time, GC events
+
+4. **Threads** → thread states over time (runnable vs. blocked vs. waiting)
+ — useful for spotting lock contention
+
+5. **Lock Instances** → which monitors had the most contention
+
+6. **I/O** → socket and file read/write events with durations
+
+### Reading JFR from a `--all` combined profile
+
+When you capture with `--all`, the JFR contains multiple event streams. In JMC,
+each event type appears as a separate section. Compare:
+- CPU samples vs. wall-clock: identifies blocking vs. compute-bound time
+- Allocation events: find garbage-producing call paths
+- Lock events: find synchronization bottlenecks
+
+---
+
+## Collapsed Stacks
+
+Collapsed stack files are plain text in the format:
+```
+com/example/App.main;com/example/Service.process;java/util/HashMap.get 42
+com/example/App.main;com/example/Service.process;java/util/HashMap.put 18
+```
+
+Each line is a semicolon-separated call stack (bottom to top) followed by
+a sample count. They're the input format for the original
+[FlameGraph scripts](https://github.com/brendangregg/FlameGraph) and useful
+for programmatic analysis.
+
+### Quick analysis with the bundled script
+
+`scripts/analyze_collapsed.py` produces a ranked table of self-time and
+inclusive-time frames, with percentage bars and filter support:
+
+```bash
+# Top 20 self-time and inclusive frames
+python3 scripts/analyze_collapsed.py profile.collapsed
+
+# Filter to your own code only
+python3 scripts/analyze_collapsed.py profile.collapsed --grep 'com/yourcompany'
+
+# Group by package instead of method
+python3 scripts/analyze_collapsed.py profile.collapsed --packages
+
+# Exclude framework noise
+python3 scripts/analyze_collapsed.py profile.collapsed --exclude 'sun/reflect|\$\$Lambda'
+
+# Top 40 self-time frames as CSV (for further analysis)
+python3 scripts/analyze_collapsed.py profile.collapsed --self-time --top 40 --csv
+```
+
+### Manual analysis with grep/awk
+
+```bash
+# How much time in any HashMap operation?
+awk '{if ($1 ~ /HashMap/) total += $NF} END {print total}' profile.collapsed
+
+# Everything involving serialization
+grep -iE "serial|jackson|json" profile.collapsed | awk '{sum+=$NF} END{print sum}'
+```
+
+### Convert collapsed → flamegraph
+
+```bash
+# Using async-profiler's jfrconv
+jfrconv collapsed.txt flamegraph.html
+
+# Or using the original FlameGraph perl script (if installed)
+flamegraph.pl profile.collapsed > flamegraph.svg
+```
+
+---
+
+## Interpreting allocation profiles
+
+Allocation flamegraphs answer "where is memory being created?" not "what's
+alive on the heap?" (for live object analysis, use the `--live` option or look at JFR
+heap snapshots).
+
+Key things to look for:
+
+- **`byte[]` or `char[]` at the top** — string manipulation, serialization, or
+ logging are common culprits. Look at the callers.
+- **`Object[]` allocations** — often from collections growing (`ArrayList.grow`,
+ `HashMap.resize`). Pre-size collections if you know the expected cardinality.
+- **Allocation spikes in request-handling code** — objects created per-request
+ that could be pooled or cached.
+- **Framework allocations** — ORM, serialization libraries often allocate heavily.
+ Consider caching deserialized objects or using streaming APIs.
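
That "look at the callers" step can be scripted. A minimal sketch, with
hypothetical frame names, that sums allocation samples by the direct caller
of a given allocated type:

```python
from collections import Counter

def alloc_callers(lines, allocated_type="byte[]"):
    """Sum samples per direct caller of an allocated type, given
    allocation-collapsed lines ('root;caller;byte[] 4096')."""
    callers = Counter()
    for line in lines:
        stack, _, count = line.rpartition(" ")
        frames = stack.split(";")
        if frames[-1] == allocated_type and len(frames) > 1:
            callers[frames[-2]] += int(count)
    return callers

callers = alloc_callers([
    "main;toJson;StringBuilder.append;byte[] 4096",
    "main;readFile;byte[] 1024",
])
# StringBuilder.append accounts for 4096 samples, readFile for 1024:
# investigate the big one first.
```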
+
+---
+
+## Interpreting lock / wall-clock profiles
+
+When wall-clock shows threads blocked:
+
+- **`LockSupport.park` + `AbstractQueuedSynchronizer`** — JUC locks
+  (`ReentrantLock`, semaphores, etc.). Look a couple of frames below
+  `park` in the stack to see which lock is involved.
+- **`Object.wait`** — classic `synchronized` monitors. The caller is your target.
+- **`sun.nio.ch.EPoll.wait` or similar** — network I/O wait. Thread is blocked
+ on the network. Is connection pool exhausted? Is a remote service slow?
+- **`Thread.sleep`** — deliberate sleep (scheduled polling, backoff, etc.).
+ Usually expected, but verify the intervals are appropriate.
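
When triaging many blocked stacks at once, the mapping above can be turned
into a simple lookup. A sketch only; the marker list is illustrative, not
exhaustive:

```python
WAIT_MARKERS = {
    "LockSupport.park": "JUC lock / queue wait",
    "Object.wait": "synchronized monitor wait",
    "EPoll.wait": "network I/O wait",
    "Thread.sleep": "deliberate sleep",
}

def classify_wait(leaf_frame):
    """Map a blocking leaf frame to a likely wait category."""
    for marker, cause in WAIT_MARKERS.items():
        if marker in leaf_frame:
            return cause
    return "unclassified"

cause = classify_wait("java/util/concurrent/locks/LockSupport.park")
# -> "JUC lock / queue wait"
```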
+
+---
+
+## Worked example: reading a flamegraph
+
+Suppose you see this pattern in a CPU flamegraph:
+
+```
+processOrder() ← wide frame (45% of samples)
+ |
+ ├── ProductService.loadProduct() ← 30% (wide)
+ │ └── HibernateSession.find() ← 30% (leaf)
+ │
+ └── TaxCalculator.calculate() ← 10%
+ └── BigDecimal.multiply() ← 10% (leaf)
+```
+
+**Diagnosis:**
+- 30% of CPU in `HibernateSession.find()` — likely N+1 query problem.
+ Each `processOrder()` call loads a product via Hibernate one at a time.
+- 10% in `BigDecimal.multiply()` — tax calculations using high-precision
+ arithmetic. Often fine, but if this is called thousands of times per second,
+ consider pre-computing or caching tax rates.
+
+**Next steps:**
+1. Check if `loadProduct()` could be batched or pre-fetched (JPA `@BatchSize`,
+ fetch joins, or a bulk load before the loop).
+2. Profile with `-e alloc` to see if Hibernate is also creating a lot of garbage.
+3. If the fix is non-trivial, capture a JFR (`--all`) to get a fuller picture
+ before committing to an approach.
+
+---
+
+## When to reach for each output format
+
+| Situation | Best format |
+|---|---|
+| Quick overview, share with team | HTML flamegraph |
+| Need timestamped events, JVM metrics | JFR + JMC/IntelliJ |
+| Scripted / automated analysis | Collapsed stacks |
+| Multi-event combined analysis | JFR with `--all` |
+| Share with someone without a viewer | HTML flamegraph |
diff --git a/skills/async-profiler/profile/SKILL.md b/skills/async-profiler/profile/SKILL.md
new file mode 100644
index 000000000..0a83dc565
--- /dev/null
+++ b/skills/async-profiler/profile/SKILL.md
@@ -0,0 +1,414 @@
+---
+name: async-profiler-profile
+description: 'Run async-profiler against a live JVM process to capture CPU, memory allocation, wall-clock, or lock contention profiles and generate flamegraphs or JFR recordings. Use this skill whenever a Java developer wants to start a profiling session, capture a flamegraph, find CPU hotspots, identify memory allocation pressure, measure thread blocking or lock contention, or asks: "how do I profile my running Java app", "capture a flamegraph", "find what''s using CPU", "profile heap allocations", "measure lock contention", "generate a JFR recording", "profile for N seconds", "what''s slow in my app". Assumes async-profiler is already installed (see async-profiler-setup skill if not).'
+---
+
+# async-profiler — Running Profiles
+
+## Agent-driven background profiling
+
+Use `scripts/collect.sh` when you need to profile while simultaneously
+reproducing the workload — the standard blocking `run_profile.sh` would make
+that impossible because it holds the terminal for the full duration.
+
+`collect.sh` captures all event types (CPU, allocation, wall-clock, lock) in
+a single JFR recording and produces four separate flamegraphs when done.
+
+### When to use `collect.sh` vs `run_profile.sh`
+
+| Scenario | Use |
+|---|---|
+| You need the terminal free to run load, tests, or other commands during the capture | `collect.sh start` / `collect.sh stop` |
+| Fixed duration, you can background the call | `collect.sh timed -d &` |
+| Simple timed capture, terminal can block | `run_profile.sh --comprehensive` |
+
+### `start` / `stop` workflow — full agent control
+
+```bash
+# 1. Find the JVM process
+jps -l
+
+# 2. Start profiling (returns immediately, saves session state)
+bash scripts/collect.sh start
+
+# 3. Reproduce the problem — run load tests, make requests, etc.
+# The profiler is attached and collecting.
+
+# 4. Stop profiling and generate all flamegraphs
+bash scripts/collect.sh stop
+```
+
+> ⚠️ **macOS: `asprof stop -f <file>` silently ignores the output path.**
+> The JFR is written to a temporary directory under `/var/folders/` regardless
+> of the `-f` argument. `collect.sh` handles this automatically by creating a
+> sentinel file at `start` time and using `find -newer` to locate the JFR
+> after `stop`. If you call `asprof stop` directly, find the file with:
+> ```bash
+> find /var/folders -maxdepth 8 -name "*.jfr" 2>/dev/null
+> ```
+
+Output is written to a `profile-<pid>-<timestamp>/` directory under the
+current working directory.
+The directory contains:
+- `combined.jfr` — the raw multi-event recording
+- `profile-cpu.html`, `profile-alloc.html`, `profile-wall.html`, `profile-lock.html` — interactive flamegraphs
+
+### `timed` workflow — fixed-duration background capture
+
+Use this when you know exactly how long the workload takes:
+
+```bash
+# Start a 60-second capture in the background
+bash scripts/collect.sh timed -d 60 &
+PROF_PID=$!
+
+# Run your workload here while profiling is active
+./run-load-test.sh
+
+# Wait for the profiler to finish (if workload finished faster)
+wait $PROF_PID
+```
+
+`timed` blocks for the specified duration, so run it in the background with `&`.
+
+### After collecting
+
+Once `stop` or `timed` completes, offer to analyze the results immediately.
+Read `analyze/SKILL.md` before interpreting the flamegraphs. Each `.html` file
+can be opened directly in a browser; pass `.collapsed` files to
+`scripts/analyze_collapsed.py` for a ranked self-time table.
+
+---
+
+## IntelliJ IDEA Ultimate — no terminal needed
+
+If the process you want to profile was launched from IntelliJ, the fastest path
+is to use the built-in integration:
+
+1. Click the **flame icon** next to the run/debug buttons (▶🔥), or use
+ *Run → Profile '[configuration name]'*
+2. IntelliJ attaches async-profiler automatically and opens results when done
+3. To choose which events to capture (CPU, allocation, wall-clock):
+ *Settings → Build, Execution, Deployment → Java Profiler*
+
+Results open directly in IntelliJ's viewer — see `analyze/SKILL.md` for how to
+navigate the flame graph, call tree, and timeline tabs.
+
+Use the terminal approach below when you need to profile a process that wasn't
+started from IntelliJ (a remote server, a running Docker container, a
+production JVM, etc.).
+
+---
+
+## Always start with `--all`
+
+**`asprof start --all` records CPU, allocation, wall-clock, and lock contention
+simultaneously in a single JFR file.** There is no meaningful overhead penalty
+for capturing all events together compared to capturing just one. You then split
+the JFR into separate flamegraphs with `jfrconv` after the fact.
+
+**Never run separate captures for each event type.** Each capture requires
+reproducing the workload, which is disruptive and often impossible for realistic
+or intermittent problems. Capture once, analyze everything.
+
+```bash
+# Direct asprof — capture all events, produce a single JFR
+jps -l # find your PID
+asprof start --all <PID> # attach, collect everything
+# ... reproduce the problem ...
+asprof stop <PID> # stop; JFR written to disk
+ # ⚠️ macOS: see note below on output path
+
+# Then split into flamegraphs:
+jfrconv --cpu combined.jfr cpu.html
+jfrconv --alloc combined.jfr alloc.html
+jfrconv --wall combined.jfr wall.html
+jfrconv --lock combined.jfr lock.html
+```
+
+For agent-driven work, use `collect.sh` instead — it handles the macOS output
+path bug, session state, and the JFR split automatically:
+
+```bash
+bash scripts/collect.sh start
+# ... reproduce the problem ...
+bash scripts/collect.sh stop
+# → outputs cpu.html, alloc.html, wall.html, lock.html
+```
+
+---
+
+## Quick start (terminal / remote processes)
+
+### Using the bundled script
+
+`scripts/run_profile.sh` wraps `asprof` with sensible defaults and auto-timestamped
+output files.
+
+**Default — capture all events:**
+```bash
+# One 30s capture → four separate flamegraphs generated in parallel
+bash scripts/run_profile.sh --comprehensive -d 30
+```
+This runs a single `--all` JFR capture, then uses `jfrconv` in parallel to
+split it into separate CPU, allocation, wall-clock, and lock flamegraphs.
+On macOS all four open in the browser automatically.
+
+**When you already know which event type to focus on:**
+```bash
+# Allocation only (heap pressure / GC churn)
+bash scripts/run_profile.sh -e alloc -d 60
+
+# Wall-clock only (latency / blocking / I/O)
+bash scripts/run_profile.sh -e wall
+
+# Target by app name instead of PID
+bash scripts/run_profile.sh MyApplication
+```
+
+---
+
+## Choose the right flamegraph to read
+
+`--all` records everything — use `jfrconv` to pick the view that matches your
+symptom:
+
+| Symptom | `jfrconv` flag | What it shows |
+|---|---|---|
+| High CPU, slow throughput | `--cpu` | CPU time by call stack |
+| High GC pressure / heap churn | `--alloc` | Where objects are being allocated |
+| Threads are blocked / latency spikes | `--wall` | All threads regardless of state |
+| Slow synchronized methods | `--lock` | Java monitor contention time |
+
+All four are captured by `asprof start --all` — just open the flamegraph that
+matches your symptom. When in doubt, read the **wall-clock** view first: it
+shows blocked and sleeping threads that CPU profiling misses entirely.
+
+---
+
+## Common profiling scenarios
+
+### The standard approach — capture all, read what you need
+
+```bash
+# Capture everything in one session
+asprof start --all <PID>
+# ... reproduce the problem ...
+asprof stop <PID>
+# ⚠️ macOS: see output path note above — use collect.sh to handle this automatically
+
+# Generate whichever flamegraph(s) you need:
+jfrconv --cpu combined.jfr cpu.html # CPU hotspots
+jfrconv --alloc combined.jfr alloc.html # Garbage / allocation pressure
+jfrconv --wall combined.jfr wall.html # Latency / blocking / I/O
+jfrconv --lock combined.jfr lock.html # Lock contention
+```
+
+Open any `.html` in a browser. Wide frames at the top are your hotspots.
+
+**What each view shows:**
+- **CPU** — where the CPU is spending time; misses sleeping/blocked threads
+- **Alloc** — which call stacks produce the most heap; wide = large allocations
+- **Wall** — all threads regardless of state; best for latency/I/O investigations
+- **Lock** — time spent *waiting* to acquire monitors (not holding them)
+
+### Fixed-duration capture (blocks terminal)
+
+```bash
+# All events, 60 seconds, output to JFR
+asprof -d 60 --all -f combined.jfr <PID>
+```
+
+Use `collect.sh timed` or background with `&` if you need the terminal free.
+
+---
+
+## Key flags to know
+
+### Duration and output
+
+```bash
+-d N # Profile for N seconds (e.g., -d 30)
+-f FILE # Output file; extension sets format: .html, .jfr, .txt, .collapsed
+```
+
+File extension drives format automatically:
+- `.html` → interactive flamegraph (recommended for sharing)
+- `.jfr` → JFR recording (for IntelliJ / JDK Mission Control)
+- `.collapsed` → raw collapsed stacks (for FlameGraph scripts)
+- `.txt` → plain-text summary
+
+### Targeting
+
+```bash
+# Attach to a specific PID
+asprof -d 30 -f out.html 12345
+
+# Auto-detect if only one JVM is running
+asprof -d 30 -f out.html jps
+
+# Target by application name
+asprof -d 30 -f out.html MyApplication
+```
+
+### Thread-level breakdown
+
+```bash
+# Separate flame per thread (useful for pinpointing which thread is the culprit)
+asprof -d 30 -t -f out.html <PID>
+```
+
+### Sampling interval
+
+```bash
+# Sample every 1ms (default is ~10ms; lower = more detail but higher overhead)
+-i 1ms
+
+# Sample every N nanoseconds
+-i 500000 # 0.5ms
+```
+
+Note: on macOS with itimer, the minimum effective interval is ~10ms regardless
+of what you specify.
+
+### Stack depth and filtering
+
+```bash
+-j 512 # Max stack depth (default 2048; reduce if stacks are very deep)
+
+# Include only frames matching a pattern
+-I 'com/mycompany/*'
+
+# Exclude frames matching a pattern
+-X 'sun/reflect/*'
+```
+
+### Long-running or manual start/stop
+
+Sometimes you want to start profiling, do a specific action, then stop rather
+than time-boxing it:
+
+```bash
+# Start profiling (runs indefinitely)
+asprof start -e cpu <PID>
+
+# ... do your thing ...
+
+# Stop and write output
+asprof stop -f profile.html <PID>
+
+# Or dump a snapshot without stopping (live sampling continues)
+asprof dump -f snapshot.html <PID>
+```
+
+> ⚠️ **macOS only:** `asprof stop -f <file>` silently ignores the `-f` path.
+> Use `bash scripts/collect.sh start/stop` instead — it handles this automatically.
+> If calling `asprof` directly, find the output with:
+> ```bash
+> find /var/folders -maxdepth 8 -name "*.jfr" 2>/dev/null
+> ```
+
+---
+
+## Continuous profiling
+
+For finding intermittent regressions, profile in a loop and dump results
+periodically:
+
+```bash
+# Dump a new flamegraph every 60 seconds, cycling indefinitely
+# %t in the filename is replaced with a timestamp
+asprof -e cpu --loop 60s -f /tmp/profile-%t.html <PID>
+```
+
+---
+
+## Attach as Java agent (no dynamic attach)
+
+If the JVM doesn't allow dynamic attach (common in locked-down environments),
+use the agent at startup:
+
+```bash
+java -agentpath:/path/to/libasyncProfiler.so=start,event=cpu,interval=1ms,file=output.html,duration=60 \
+ -XX:+UnlockDiagnosticVMOptions -XX:+DebugNonSafepoints \
+ -jar myapp.jar
+```
+
+Agent options are comma-separated after the `=`. Duration is in seconds.
+
+---
+
+## macOS-specific notes
+
+- Default CPU engine is **itimer** — works without elevated privileges
+- No kernel frame collection (platform limitation, not a bug)
+- itimer has a known bias toward system calls; wall-clock (`--wall`) is often
+ more representative for latency investigations on macOS
+- Minimum sampling interval ~10ms (kernel timer resolution)
+
+These limitations don't make macOS profiling useless — CPU and wall-clock
+flamegraphs are still highly actionable for application-level code.
+
+---
+
+## Overhead and production use
+
+async-profiler is designed to be low-overhead:
+
+- **CPU profiling**: ~1-3% overhead at default intervals
+- **Allocation profiling**: ~1-5% depending on allocation rate (uses TLAB sampling)
+- **Wall-clock**: ~1% overhead (timer-based, not instruction-based)
+
+It's reasonable to run brief (30-60s) profiles in production. For longer sessions,
+use the `--memlimit` flag to cap memory usage:
+
+```bash
+asprof -d 300 --memlimit 256m -f profile.html <PID>
+```
+
+---
+
+## jfrconv syntax
+
+Convert a JFR recording to flamegraphs:
+
+```bash
+jfrconv --cpu combined.jfr cpu.html
+jfrconv --alloc combined.jfr alloc.html
+jfrconv --lock combined.jfr lock.html
+jfrconv --wall combined.jfr wall.html
+```
+
+> ⚠️ The event flag (`--cpu`, `--alloc`, etc.) must come **before** the input
+> file. The form `jfrconv input.jfr --event cpu output.html` does not work.
+
+---
+
+## Session layout (recommended)
+
+Store all output in one versioned directory per session:
+
+```
+profiling/
+ session-1/
+ combined.jfr
+ profile-cpu.html
+ profile-alloc.html
+ profile-wall.html
+ profile-lock.html
+ findings.md
+```
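+
+A sketch of starting a session with this layout (the `session-1` name and the
+`profile-*` glob are assumptions based on `collect.sh` output naming):
+
+```bash
+SESSION="profiling/session-1"   # bump the number for each new session
+mkdir -p "$SESSION"
+# Sweep the latest collect.sh outputs into the session directory, if present
+mv profile-*/combined.jfr profile-*/profile-*.html "$SESSION"/ 2>/dev/null || true
+touch "$SESSION/findings.md"    # notes and conclusions go here
+```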
+
+---
+
+## After profiling: always offer to analyze
+
+Once a profile capture completes, **always offer to analyze the results
+immediately** — don't wait for the user to ask. Say something like:
+
+> "The profile is saved at `profile-all-20250409-143201-cpu.html`. Want me to
+> analyze it and identify the bottlenecks?"
+
+Then read `analyze/SKILL.md` and interpret the output. If it's a JFR file,
+offer to run `jfrconv` to extract flamegraphs first. If it's collapsed stacks,
+offer to run `scripts/analyze_collapsed.py`. The user has already done the hard
+part (reproducing the problem) — close the loop for them.
diff --git a/skills/async-profiler/scripts/analyze_collapsed.py b/skills/async-profiler/scripts/analyze_collapsed.py
new file mode 100644
index 000000000..221636ef6
--- /dev/null
+++ b/skills/async-profiler/scripts/analyze_collapsed.py
@@ -0,0 +1,243 @@
+#!/usr/bin/env python3
+"""
+analyze_collapsed.py — Quick analysis of async-profiler collapsed stack output.
+
+Collapsed stack format: each line is a semicolon-separated call stack
+(bottom frame first) followed by a sample count:
+ com/example/App.main;com/example/Service.process;java/util/HashMap.get 42
+
+Usage:
+ python analyze_collapsed.py [options]
+
+Options:
+ --top N Show top N frames (default: 20)
+ --grep PATTERN Filter: only include stacks matching PATTERN
+ --exclude PATTERN Filter: exclude stacks matching PATTERN
+ --packages Group results by top-level package instead of method
+ --self-time Show only leaf (self-time) frames, not inclusive time
+ --csv Output as CSV instead of table
+"""
+
+from __future__ import annotations
+
+import sys
+import re
+from collections import defaultdict
+from pathlib import Path
+
+
+def parse_collapsed(path: str) -> list[tuple[list[str], int]]:
+ """Parse a collapsed stack file into (frames, count) tuples."""
+ stacks = []
+ with open(path, "r", encoding="utf-8", errors="replace") as f:
+        for line in f:
+ line = line.strip()
+ if not line or line.startswith("#"):
+ continue
+ # Last token is the count; everything before is the stack
+ parts = line.rsplit(" ", 1)
+ if len(parts) != 2:
+ continue
+ try:
+ count = int(parts[1])
+ except ValueError:
+ continue
+ frames = parts[0].split(";")
+ stacks.append((frames, count))
+ return stacks
+
+
+def top_leaf_frames(stacks, n=20, grep=None, exclude=None):
+ """Count samples where each frame is the leaf (top of stack = actual work)."""
+ counts = defaultdict(int)
+ for frames, count in stacks:
+ if not frames:
+ continue
+ stack_str = ";".join(frames)
+ if grep and not re.search(grep, stack_str, re.IGNORECASE):
+ continue
+ if exclude and re.search(exclude, stack_str, re.IGNORECASE):
+ continue
+ leaf = frames[-1]
+ counts[leaf] += count
+ return sorted(counts.items(), key=lambda x: x[1], reverse=True)[:n]
+
+
+def top_inclusive_frames(stacks, n=20, grep=None, exclude=None):
+ """Count samples where each frame appears anywhere in the stack (inclusive time)."""
+ counts = defaultdict(int)
+ for frames, count in stacks:
+ stack_str = ";".join(frames)
+ if grep and not re.search(grep, stack_str, re.IGNORECASE):
+ continue
+ if exclude and re.search(exclude, stack_str, re.IGNORECASE):
+ continue
+ seen = set()
+ for frame in frames:
+ if frame not in seen:
+ counts[frame] += count
+ seen.add(frame)
+ return sorted(counts.items(), key=lambda x: x[1], reverse=True)[:n]
+
+
+def top_packages(stacks, n=20, grep=None, exclude=None):
+ """Group inclusive time by top-level Java package."""
+ counts = defaultdict(int)
+ for frames, count in stacks:
+ stack_str = ";".join(frames)
+ if grep and not re.search(grep, stack_str, re.IGNORECASE):
+ continue
+ if exclude and re.search(exclude, stack_str, re.IGNORECASE):
+ continue
+ seen_pkgs = set()
+ for frame in frames:
+            # Extract package: everything up to the last '/' before the class name
+            #   e.g. "com/example/Service.process" → "com.example"
+            # Bracketed frames like "[vmlinux]" are kernel/JVM internals; keep as-is
+ if frame.startswith("["):
+ pkg = frame # kernel / JVM internal frame
+ elif "/" in frame:
+ pkg = frame.rsplit("/", 1)[0].replace("/", ".")
+ elif "." in frame:
+ pkg = frame.rsplit(".", 1)[0]
+ else:
+ pkg = frame
+ if pkg not in seen_pkgs:
+ counts[pkg] += count
+ seen_pkgs.add(pkg)
+ return sorted(counts.items(), key=lambda x: x[1], reverse=True)[:n]
+
+
+def print_table(rows, total, header_left, header_right="Samples", csv_mode=False):
+ if csv_mode:
+ print(f"{header_left},{header_right},Pct")
+ for name, count in rows:
+ pct = 100.0 * count / total if total else 0
+ print(f"{name},{count},{pct:.1f}")
+ return
+
+ if not rows:
+ print(" (no data)")
+ return
+
+ max_name = max(len(r[0]) for r in rows)
+ max_name = max(max_name, len(header_left))
+ col_w = min(max_name, 80)
+
+ bar_total = rows[0][1] if rows else 1
+ print(f" {'─' * (col_w + 32)}")
+ print(f" {header_left:<{col_w}} {header_right:>8} {'%':>6} {'bar'}")
+ print(f" {'─' * (col_w + 32)}")
+
+ for name, count in rows:
+ pct = 100.0 * count / total if total else 0
+ bar_len = int(30 * count / bar_total) if bar_total else 0
+ bar = "█" * bar_len
+ display = name if len(name) <= col_w else "…" + name[-(col_w - 1) :]
+ print(f" {display:<{col_w}} {count:>8,} {pct:>5.1f}% {bar}")
+
+ print(f" {'─' * (col_w + 32)}")
+
+
+def main():
+ import argparse
+
+ parser = argparse.ArgumentParser(
+ description="Analyze async-profiler collapsed stack output",
+ formatter_class=argparse.RawDescriptionHelpFormatter,
+ )
+ parser.add_argument("file", help="Path to .collapsed stack file")
+ parser.add_argument(
+ "--top", type=int, default=20, help="Number of top frames to show"
+ )
+ parser.add_argument(
+ "--grep", metavar="PATTERN", help="Only include stacks matching this regex"
+ )
+ parser.add_argument(
+ "--exclude", metavar="PATTERN", help="Exclude stacks matching this regex"
+ )
+ parser.add_argument(
+ "--packages", action="store_true", help="Group by package instead of method"
+ )
+ parser.add_argument(
+ "--self-time",
+ action="store_true",
+ dest="self_time",
+ help="Show only leaf frames (self-time), not inclusive",
+ )
+ parser.add_argument("--csv", action="store_true", help="Output as CSV")
+ args = parser.parse_args()
+
+ path = args.file
+ if not Path(path).exists():
+ print(f"❌ File not found: {path}", file=sys.stderr)
+ sys.exit(1)
+
+ print("\n📊 async-profiler collapsed stack analysis")
+ print(f" File: {path}\n")
+
+ stacks = parse_collapsed(path)
+ if not stacks:
+ print("❌ No stack data found. Is this a valid .collapsed file?")
+ sys.exit(1)
+
+ total_samples = sum(c for _, c in stacks)
+ total_stacks = len(stacks)
+
+ filters = ""
+ if args.grep:
+ filters += f" grep={args.grep}"
+ if args.exclude:
+ filters += f" exclude={args.exclude}"
+ if filters:
+ # count how many survive the filter
+ surviving = sum(
+ c
+ for frames, c in stacks
+ if (not args.grep or re.search(args.grep, ";".join(frames), re.IGNORECASE))
+ and (
+ not args.exclude
+ or not re.search(args.exclude, ";".join(frames), re.IGNORECASE)
+ )
+ )
+ matching_pct = 0.0 if total_samples == 0 else 100 * surviving / total_samples
+ print(f" Filters applied:{filters}")
+ print(
+ f" Matching samples: {surviving:,} / {total_samples:,} "
+ f"({matching_pct:.1f}%)\n"
+ )
+
+ print(f" Total samples : {total_samples:,}")
+ print(f" Unique stacks : {total_stacks:,}\n")
+
+ if args.packages:
+ rows = top_packages(stacks, args.top, args.grep, args.exclude)
+ print(f" Top {args.top} packages by inclusive time:\n")
+ print_table(rows, total_samples, "Package", csv_mode=args.csv)
+ elif args.self_time:
+ rows = top_leaf_frames(stacks, args.top, args.grep, args.exclude)
+ print(f" Top {args.top} methods by self-time (leaf frames):\n")
+ print_table(rows, total_samples, "Method (leaf / self-time)", csv_mode=args.csv)
+ else:
+ # Default: show both self-time and inclusive for context
+ leaf_rows = top_leaf_frames(stacks, args.top, args.grep, args.exclude)
+ incl_rows = top_inclusive_frames(stacks, args.top, args.grep, args.exclude)
+
+ print(f" Top {args.top} by self-time (leaf frames — actual CPU consumers):\n")
+ print_table(leaf_rows, total_samples, "Method (self-time)", csv_mode=args.csv)
+ print()
+ print(f" Top {args.top} by inclusive time (appears anywhere in stack):\n")
+ print_table(incl_rows, total_samples, "Method (inclusive)", csv_mode=args.csv)
+
+ print()
+ print(" Tips:")
+ print(" • High self-time → direct optimization target")
+ print(" • High inclusive but low self-time → dispatcher/framework overhead")
+ print(" • Filter to your code: --grep 'com/yourcompany'")
+ print(" • Exclude noise: --exclude 'sun/reflect|\\$\\$Lambda'")
+ print(" • Group by package: --packages")
+ print()
+
+
+if __name__ == "__main__":
+ main()
diff --git a/skills/async-profiler/scripts/collect.sh b/skills/async-profiler/scripts/collect.sh
new file mode 100755
index 000000000..cb5e40fba
--- /dev/null
+++ b/skills/async-profiler/scripts/collect.sh
@@ -0,0 +1,364 @@
+#!/usr/bin/env bash
+# collect.sh — Agent-friendly async-profiler background collection.
+#
+# Designed for coding agents that need to start profiling without blocking
+# so they can reproduce the problem, run load, or do other work while data
+# is being collected.
+#
+# Usage:
+#   bash scripts/collect.sh start <PID|name> [--asprof PATH]
+#   bash scripts/collect.sh stop  <PID|name> [--asprof PATH]
+#   bash scripts/collect.sh timed [-d N] <PID|name> [--asprof PATH]
+#
+# Subcommands:
+# start Attach asprof and begin recording all events; returns immediately.
+# Session state is saved in $XDG_RUNTIME_DIR when available, otherwise
+# under /tmp, so 'stop' knows where to write output.
+# stop Stop the active session, split the JFR into four per-event flamegraphs
+# in parallel (cpu, alloc, wall, lock), then print paths to all outputs.
+# timed Fixed-duration all-event capture that blocks for the duration.
+# Run with & to let the agent continue working; then: wait $PROF_PID
+#
+# Agent workflow — start/stop (full control):
+# bash scripts/collect.sh start 12345
+# # ... reproduce the problem, trigger load, wait for requests, etc. ...
+# bash scripts/collect.sh stop 12345
+#
+# Agent workflow — timed background:
+# bash scripts/collect.sh timed -d 30 12345 &
+# PROF_PID=$!
+# # ... trigger load while profiling runs ...
+# wait $PROF_PID
+#
+# Output layout:
+#   profile-<target>-<timestamp>/
+# combined.jfr — multi-event JFR (open in IntelliJ or JMC)
+# profile-cpu.html — CPU flamegraph
+# profile-alloc.html — allocation flamegraph
+# profile-wall.html — wall-clock flamegraph
+# profile-lock.html — lock contention flamegraph
+
+set -euo pipefail
+
+# ── Parse subcommand ──────────────────────────────────────────────────────────
+if [[ $# -eq 0 ]]; then
+ sed -n '2,35p' "$0" | grep '^#' | sed 's/^# \?//'
+ exit 0
+fi
+
+SUBCMD="$1"; shift
+
+# ── Parse options ─────────────────────────────────────────────────────────────
+DURATION=30
+TARGET=""
+ASPROF_ARG=""
+
+while [[ $# -gt 0 ]]; do
+ case "$1" in
+ -d|--duration) [[ $# -ge 2 ]] || { echo "❌ Missing value for $1" >&2; exit 1; }; DURATION="$2"; shift 2 ;;
+ --asprof) [[ $# -ge 2 ]] || { echo "❌ Missing value for $1" >&2; exit 1; }; ASPROF_ARG="$2"; shift 2 ;;
+ -h|--help)
+ sed -n '2,/^[^#]/p' "$0" | grep '^#' | sed 's/^# \?//'
+ exit 0
+ ;;
+ -*)
+ echo "❌ Unknown option: $1" >&2
+ exit 1
+ ;;
+ *)
+ TARGET="$1"; shift ;;
+ esac
+done
+
+if [[ -z "$TARGET" && "$SUBCMD" != "help" ]]; then
+ echo "❌ No target specified. Provide a PID or app name." >&2
+ echo " List Java processes: jps -l" >&2
+ exit 1
+fi
+
+# ── Helpers ───────────────────────────────────────────────────────────────────
+locate_asprof() {
+ local asprof=""
+ if [[ -n "$ASPROF_ARG" ]]; then
+ asprof="$ASPROF_ARG"
+ elif command -v asprof &>/dev/null; then
+ asprof="$(command -v asprof)"
+ else
+ for candidate in \
+ "$HOME/async-profiler-4.3/bin/asprof" \
+ "$HOME/async-profiler/bin/asprof" \
+ "/opt/async-profiler/bin/asprof" \
+ "/usr/local/bin/asprof"
+ do
+ if [[ -x "$candidate" ]]; then
+ asprof="$candidate"
+ break
+ fi
+ done
+ fi
+ if [[ -z "$asprof" ]]; then
+ echo "❌ asprof not found. Install with: bash scripts/install.sh" >&2
+ exit 1
+ fi
+ echo "$asprof"
+}
+
+locate_jfrconv() {
+ local asprof="$1"
+ if command -v jfrconv &>/dev/null; then
+ command -v jfrconv
+ elif [[ -x "$(dirname "$asprof")/jfrconv" ]]; then
+ echo "$(dirname "$asprof")/jfrconv"
+ else
+ echo ""
+ fi
+}
+
+# Session state file — stores output path and asprof path between start/stop.
+session_file() {
+ local safe uid state_dir
+ safe="${TARGET//[^a-zA-Z0-9_-]/_}"
+ uid="$(id -u)"
+
+ if [[ -n "${XDG_RUNTIME_DIR:-}" && -d "${XDG_RUNTIME_DIR}" && -w "${XDG_RUNTIME_DIR}" ]]; then
+ state_dir="${XDG_RUNTIME_DIR}"
+ else
+ state_dir="/tmp"
+ fi
+
+ echo "${state_dir}/asprof-session-${uid}-${safe}"
+}
+
+split_jfr() {
+ local jfrconv="$1"
+ local jfr_path="$2"
+ local base="$3"
+
+ local cpu_html="${base}-cpu.html"
+ local alloc_html="${base}-alloc.html"
+ local wall_html="${base}-wall.html"
+ local lock_html="${base}-lock.html"
+
+ echo "Splitting JFR into per-event flamegraphs in parallel..."
+ # jfrconv: event flag must come FIRST, before the input file
+ "$jfrconv" --cpu "$jfr_path" "$cpu_html" &
+ local pid_cpu=$!
+ "$jfrconv" --alloc "$jfr_path" "$alloc_html" &
+ local pid_alloc=$!
+ "$jfrconv" --wall "$jfr_path" "$wall_html" &
+ local pid_wall=$!
+ "$jfrconv" --lock "$jfr_path" "$lock_html" &
+ local pid_lock=$!
+ local wait_failed=0
+ local _pid _label
+ for _pid in "$pid_cpu" "$pid_alloc" "$pid_wall" "$pid_lock"; do
+ case "$_pid" in
+ "$pid_cpu") _label="cpu" ;;
+ "$pid_alloc") _label="alloc" ;;
+ "$pid_wall") _label="wall" ;;
+ "$pid_lock") _label="lock" ;;
+ esac
+ if ! wait "$_pid"; then
+ echo "ERROR: jfrconv ${_label} conversion failed." >&2
+ wait_failed=1
+ fi
+ done
+ if [[ "$wait_failed" -ne 0 ]]; then
+ return 1
+ fi
+
+ echo ""
+ echo "📊 Flamegraphs ready:"
+ echo " CPU time : $cpu_html"
+ echo " Allocations : $alloc_html"
+ echo " Wall-clock : $wall_html"
+ echo " Lock contention : $lock_html"
+ echo " Combined JFR : $jfr_path"
+
+ if [[ "$(uname)" == "Darwin" ]]; then
+ echo ""
+ echo "Opening all flamegraphs in browser..."
+ open "$cpu_html" "$alloc_html" "$wall_html" "$lock_html"
+ fi
+
+ echo ""
+ echo "💡 Next step: analyze results."
+ echo " For collapsed stack analysis (CPU):"
+ echo " jfrconv --cpu $jfr_path ${base}-cpu.collapsed"
+ echo " python3 scripts/analyze_collapsed.py ${base}-cpu.collapsed"
+}
+
+# ── start ─────────────────────────────────────────────────────────────────────
+cmd_start() {
+ local asprof; asprof="$(locate_asprof)"
+ local timestamp; timestamp="$(date +%Y%m%d-%H%M%S)"
+ local safe_target; safe_target="$(printf '%s' "$TARGET" | tr -c '[:alnum:]._-' '_')"
+ [[ -n "$safe_target" ]] || safe_target="unknown"
+ local outdir="profile-${safe_target}-${timestamp}"
+ mkdir -p "$outdir"
+ local jfr_path; jfr_path="$(pwd)/${outdir}/combined.jfr"
+ local sess; sess="$(session_file)"
+
+ echo "▶ Starting all-event async-profiler on target: $TARGET"
+ echo " Binary : $asprof"
+ echo " Output dir: $outdir/"
+ echo " Events : cpu + alloc + wall + lock (combined JFR)"
+ echo ""
+
+ # macOS: asprof stop ignores -f and writes to /var/folders instead.
+ # Create a sentinel so we can find the JFR after stop via find -newer.
+ local sentinel; sentinel="$(mktemp "/tmp/asprof-sentinel.XXXXXX")"
+ if [[ -L "$sentinel" ]]; then
+ echo "❌ mktemp created a symlink for the sentinel file: $sentinel" >&2
+ exit 1
+ fi
+
+ "$asprof" start --all "$TARGET"
+
+ # Save session state (jfr_path, asprof binary, sentinel path)
+ if [[ -L "$sess" ]]; then
+ echo "❌ Session file path is a symlink — refusing to use it." >&2
+ rm -f "$sentinel"; exit 1
+ fi
+ (umask 077; printf '%s\n%s\n%s\n' "$jfr_path" "$asprof" "$sentinel" > "$sess")
+
+ echo "✅ Profiling started. Session state: $sess"
+ echo ""
+ echo "Now reproduce the problem — make requests, run load, wait for the"
+ echo "slow operation, etc. asprof is collecting all event types."
+ echo ""
+ echo "When ready to collect results:"
+ echo " bash scripts/collect.sh stop $TARGET"
+}
+
+# ── stop ──────────────────────────────────────────────────────────────────────
+cmd_stop() {
+ local sess; sess="$(session_file)"
+
+ if [[ ! -f "$sess" ]]; then
+ echo "❌ No active session found for target '$TARGET'." >&2
+ echo " Expected state file: $sess" >&2
+ echo " Run first: bash scripts/collect.sh start $TARGET" >&2
+ exit 1
+ fi
+
+ local jfr_path; jfr_path="$(sed -n '1p' "$sess")"
+ local asprof; asprof="$(sed -n '2p' "$sess")"
+ local sentinel; sentinel="$(sed -n '3p' "$sess")"
+ [[ -n "$ASPROF_ARG" ]] && asprof="$ASPROF_ARG"
+
+ echo "⏹ Stopping profiler on target: $TARGET"
+ # Note: on macOS, -f is silently ignored by asprof stop — handled below.
+ "$asprof" stop -f "$jfr_path" "$TARGET"
+ # Session file is removed only after the JFR is confirmed written (see end of block).
+
+ # ── macOS JFR path workaround ────────────────────────────────────────────
+    # On macOS, asprof stop ignores -f and writes the JFR to a per-user
+    # temp directory under /var/folders instead.
+ # Use the sentinel (created at 'start') to find the file via find -newer.
+ if [[ "$(uname)" == "Darwin" ]] && [[ -n "$sentinel" ]] && [[ -f "$sentinel" ]]; then
+ echo ""
+ echo "⚠️ macOS: -f is ignored by asprof stop — locating JFR in /var/folders..."
+ local found_jfr=""
+ local -a jfr_matches=()
+ local jfr_candidate
+ while IFS= read -r -d '' jfr_candidate; do
+ jfr_matches+=("$jfr_candidate")
+ done < <(find /var/folders -maxdepth 8 -name "*.jfr" -newer "$sentinel" -print0 2>/dev/null)
+
+ # Sort by mtime (newest first) to avoid picking up an unrelated recording.
+ if [[ ${#jfr_matches[@]} -gt 0 ]]; then
+ found_jfr=$(ls -1t "${jfr_matches[@]}" 2>/dev/null | head -1)
+ fi
+ if [[ -n "$found_jfr" ]]; then
+ cp "$found_jfr" "$jfr_path"
+ rm -f "$sentinel"
+ echo " Found: $found_jfr"
+ echo " Copied to: $jfr_path"
+ else
+ echo "❌ Could not find JFR in /var/folders. Try:"
+ echo " find /var/folders -maxdepth 8 -name '*.jfr' -newer '$sentinel' 2>/dev/null"
+ echo " (The JFR may still be there — copy it manually to $jfr_path)"
+ echo " Sentinel preserved at: $sentinel for retry"
+ echo " Session state preserved at: $sess"
+ exit 1
+ fi
+ else
+ rm -f "$sentinel" 2>/dev/null || true
+ fi
+ # ────────────────────────────────────────────────────────────────────────
+ if [[ ! -s "$jfr_path" ]]; then
+ echo "❌ Profiling stopped but expected JFR output is missing or empty: $jfr_path"
+ echo " Session state preserved at: $sess"
+ exit 1
+ fi
+ rm -f "$sess"
+
+ echo ""
+ echo "✅ Capture saved: $jfr_path"
+ echo ""
+
+ local jfrconv; jfrconv="$(locate_jfrconv "$asprof")"
+ if [[ -z "$jfrconv" ]]; then
+ echo "⚠️ jfrconv not found — skipping flamegraph split."
+ echo " Convert manually: jfrconv --cpu $jfr_path cpu.html"
+ echo " Or open in IntelliJ IDEA or JDK Mission Control."
+ return
+ fi
+
+ local base; base="$(dirname "$jfr_path")/profile"
+ split_jfr "$jfrconv" "$jfr_path" "$base"
+}
+
+# ── timed ─────────────────────────────────────────────────────────────────────
+cmd_timed() {
+ local asprof; asprof="$(locate_asprof)"
+ local timestamp; timestamp="$(date +%Y%m%d-%H%M%S)"
+ local safe_target; safe_target="$(printf '%s' "$TARGET" | tr -c '[:alnum:]._-' '_')"
+ [[ -n "$safe_target" ]] || safe_target="unknown"
+ local outdir="profile-${safe_target}-${timestamp}"
+ mkdir -p "$outdir"
+ local jfr_path="${outdir}/combined.jfr"
+
+ echo "⏱ ${DURATION}s all-event capture on target: $TARGET"
+ echo " Binary : $asprof"
+ echo " Output : $jfr_path"
+ echo " Events : cpu + alloc + wall + lock"
+ echo ""
+ echo "Running for ${DURATION}s — trigger your workload now."
+ echo "(If called with &, the agent can do other work and then: wait \$PROF_PID)"
+ echo ""
+
+ "$asprof" -d "$DURATION" --all -f "$jfr_path" "$TARGET"
+
+ echo ""
+ echo "✅ Capture complete: $jfr_path"
+ echo ""
+
+ local jfrconv; jfrconv="$(locate_jfrconv "$asprof")"
+ if [[ -z "$jfrconv" ]]; then
+ echo "⚠️ jfrconv not found — skipping flamegraph split."
+ echo " Open $jfr_path in IntelliJ IDEA or JDK Mission Control."
+ return
+ fi
+
+ local base="${outdir}/profile"
+ split_jfr "$jfrconv" "$jfr_path" "$base"
+}
+
+# ── Dispatch ──────────────────────────────────────────────────────────────────
+case "$SUBCMD" in
+ start) cmd_start ;;
+ stop) cmd_stop ;;
+ timed) cmd_timed ;;
+ help|-h|--help)
+ sed -n '2,35p' "$0" | grep '^#' | sed 's/^# \?//'
+ exit 0
+ ;;
+ *)
+ echo "❌ Unknown subcommand: '$SUBCMD'" >&2
+ echo " Valid subcommands: start | stop | timed" >&2
+ exit 1
+ ;;
+esac
diff --git a/skills/async-profiler/scripts/install.sh b/skills/async-profiler/scripts/install.sh
new file mode 100644
index 000000000..2987eca76
--- /dev/null
+++ b/skills/async-profiler/scripts/install.sh
@@ -0,0 +1,147 @@
+#!/usr/bin/env bash
+# install.sh — Download and install async-profiler for the current platform.
+#
+# Usage:
+# ./install.sh # installs to ~/async-profiler-4.3
+# ./install.sh /opt/profilers # installs to /opt/profilers/async-profiler-4.3
+# ./install.sh --path-only # just prints the install path (for scripting)
+#
+# After install, the script prints the path to the asprof binary.
+
+set -euo pipefail
+
+VERSION="4.3"
+BASE_URL="https://github.com/async-profiler/async-profiler/releases/download/v${VERSION}"
+INSTALL_PARENT="${1:-$HOME}"
+
+# --path-only: don't install, just print where asprof would end up
+if [[ "${1:-}" == "--path-only" ]]; then
+ echo "$HOME/async-profiler-${VERSION}/bin/asprof"
+ exit 0
+fi
+
+# ── Detect platform ──────────────────────────────────────────────────────────
+OS="$(uname -s)"
+ARCH="$(uname -m)"
+
+case "$OS" in
+ Darwin)
+ PLATFORM="macos"
+ ;;
+ Linux)
+ PLATFORM="linux"
+ ;;
+ *)
+ echo "❌ Unsupported OS: $OS (async-profiler supports Linux and macOS)"
+ exit 1
+ ;;
+esac
+
+case "$ARCH" in
+ x86_64|amd64) ARCH_LABEL="x64" ;;
+ aarch64|arm64) ARCH_LABEL="arm64" ;;
+ *)
+ echo "❌ Unsupported architecture: $ARCH"
+ exit 1
+ ;;
+esac
+
+# macOS ships as a single universal binary (covers both x64 and arm64)
+if [[ "$PLATFORM" == "macos" ]]; then
+ ARCHIVE="async-profiler-${VERSION}-macos.zip"
+ EXTRACTED_DIR="async-profiler-${VERSION}-macos"
+else
+ ARCHIVE="async-profiler-${VERSION}-linux-${ARCH_LABEL}.tar.gz"
+ EXTRACTED_DIR="async-profiler-${VERSION}-linux-${ARCH_LABEL}"
+fi
+
+INSTALL_DIR="${INSTALL_PARENT}/async-profiler-${VERSION}"
+DOWNLOAD_URL="${BASE_URL}/${ARCHIVE}"
+
+# ── Already installed? ───────────────────────────────────────────────────────
+if [[ -x "${INSTALL_DIR}/bin/asprof" ]]; then
+ echo "✅ async-profiler ${VERSION} is already installed at: ${INSTALL_DIR}"
+ echo " Binary: ${INSTALL_DIR}/bin/asprof"
+ exit 0
+fi
+
+# Destination exists but is not a valid installation — refuse to clobber.
+if [[ -e "${INSTALL_DIR}" ]]; then
+ echo "❌ Install destination already exists but does not appear to be a valid async-profiler installation:"
+ echo " ${INSTALL_DIR}"
+ echo " Expected executable: ${INSTALL_DIR}/bin/asprof"
+ echo " Remove it manually and re-run, or choose a different parent directory:"
+ echo " bash scripts/install.sh /path/to/dir"
+ exit 1
+fi
+
+# ── Download ─────────────────────────────────────────────────────────────────
+echo "📦 Installing async-profiler ${VERSION} for ${PLATFORM}-${ARCH_LABEL}..."
+echo " Downloading: ${DOWNLOAD_URL}"
+
+TMP_DIR="$(mktemp -d)"
+trap 'rm -rf "$TMP_DIR"' EXIT
+
+cd "$TMP_DIR"
+
+if command -v curl &>/dev/null; then
+ curl -fsSL -o "$ARCHIVE" "$DOWNLOAD_URL"
+elif command -v wget &>/dev/null; then
+ wget -q -O "$ARCHIVE" "$DOWNLOAD_URL"
+else
+ echo "❌ Neither curl nor wget found. Install one and retry."
+ exit 1
+fi
+
+# ── Extract ──────────────────────────────────────────────────────────────────
+echo " Extracting..."
+if [[ "$ARCHIVE" == *.zip ]]; then
+ if ! command -v unzip &>/dev/null; then
+ echo "❌ 'unzip' is required to extract the macOS archive but was not found."
+ echo " Install it with: brew install unzip"
+ exit 1
+ fi
+ unzip -q "$ARCHIVE"
+else
+ tar xf "$ARCHIVE"
+fi
+
+# Move into place
+mkdir -p "$INSTALL_PARENT"
+mv "$EXTRACTED_DIR" "$INSTALL_DIR"
+chmod +x "${INSTALL_DIR}/bin/asprof"
+
+# macOS: remove quarantine flag so Gatekeeper doesn't block it
+if [[ "$PLATFORM" == "macos" ]]; then
+ xattr -dr com.apple.quarantine "${INSTALL_DIR}" 2>/dev/null || true
+fi
+
+# ── Verify ───────────────────────────────────────────────────────────────────
+ASPROF="${INSTALL_DIR}/bin/asprof"
+if ! "$ASPROF" --version &>/dev/null; then
+ echo "❌ Installed but 'asprof --version' failed. Check $INSTALL_DIR"
+ exit 1
+fi
+
+INSTALLED_VERSION="$("$ASPROF" --version 2>&1 | head -1)"
+
+echo ""
+echo "✅ async-profiler installed successfully!"
+echo " Version : $INSTALLED_VERSION"
+echo " Location: ${INSTALL_DIR}"
+echo " Binary : ${ASPROF}"
+echo ""
+echo "To add asprof to your PATH, add this to ~/.zshrc or ~/.bashrc:"
+echo " export PATH=\"${INSTALL_DIR}/bin:\$PATH\""
+echo ""
+
+# ── macOS: print limitation note ─────────────────────────────────────────────
+if [[ "$PLATFORM" == "macos" ]]; then
+ echo "ℹ️ macOS note: async-profiler uses the itimer CPU engine on macOS."
+ echo " Kernel stack frames are not available (platform limitation)."
+ echo " CPU and allocation profiles are still highly useful."
+ echo ""
+fi
+
+echo "Quick test (requires a running JVM — find PID with: jps -l):"
+echo " asprof -d 5 "
diff --git a/skills/async-profiler/scripts/run_profile.sh b/skills/async-profiler/scripts/run_profile.sh
new file mode 100644
index 000000000..09a6867a1
--- /dev/null
+++ b/skills/async-profiler/scripts/run_profile.sh
@@ -0,0 +1,253 @@
+#!/usr/bin/env bash
+# run_profile.sh — Wrapper around asprof for common profiling scenarios.
+#
+# Usage:
+#   ./run_profile.sh [options] <pid-or-app-name>
+#
+# Options:
+# -e, --event cpu|alloc|wall|lock Single event (default: cpu)
+# -d, --duration N Seconds to profile (default: 30)
+# -f, --format html|jfr|collapsed Output format for single-event (default: html)
+# -o, --output FILE Output path (default: auto-named)
+# -t, --threads Profile threads separately
+# --all Capture all events to a JFR file
+# --comprehensive Capture all events AND split into per-event
+# flamegraphs in parallel (recommended for
+# diagnosis when you don't know the cause)
+# --asprof PATH Path to asprof binary (auto-detected)
+# -h, --help Show this help
+#
+# Examples:
+# ./run_profile.sh 12345 # 30s CPU flamegraph
+# ./run_profile.sh --comprehensive 12345 # all events, split into flamegraphs
+# ./run_profile.sh -e alloc -d 60 MyApp # 60s allocation flamegraph
+# ./run_profile.sh -e wall -f jfr 12345 # wall-clock JFR recording
+# ./run_profile.sh --all -d 120 12345 # all events, single JFR file
+
+set -euo pipefail
+
+# ── Defaults ─────────────────────────────────────────────────────────────────
+EVENT="cpu"
+DURATION=30
+FORMAT="html"
+OUTPUT=""
+THREADS=false
+ALL_EVENTS=false
+COMPREHENSIVE=false
+ASPROF=""
+TARGET=""
+
+# ── Parse arguments ───────────────────────────────────────────────────────────
+while [[ $# -gt 0 ]]; do
+ case "$1" in
+ -e|--event) [[ $# -ge 2 ]] || { echo "❌ Missing value for $1" >&2; exit 1; }; EVENT="$2"; shift 2 ;;
+ -d|--duration) [[ $# -ge 2 ]] || { echo "❌ Missing value for $1" >&2; exit 1; }; DURATION="$2"; shift 2 ;;
+ -f|--format) [[ $# -ge 2 ]] || { echo "❌ Missing value for $1" >&2; exit 1; }; FORMAT="$2"; shift 2 ;;
+ -o|--output) [[ $# -ge 2 ]] || { echo "❌ Missing value for $1" >&2; exit 1; }; OUTPUT="$2"; shift 2 ;;
+ -t|--threads) THREADS=true; shift ;;
+ --all) ALL_EVENTS=true; FORMAT="jfr"; shift ;;
+ --comprehensive) COMPREHENSIVE=true; ALL_EVENTS=true; FORMAT="jfr"; shift ;;
+ --asprof) [[ $# -ge 2 ]] || { echo "❌ Missing value for $1" >&2; exit 1; }; ASPROF="$2"; shift 2 ;;
+ -h|--help)
+      sed -n '2,/^[^#]/p' "$0" | grep '^#' | sed -E 's/^# ?//'
+ exit 0
+ ;;
+ -*)
+ echo "❌ Unknown option: $1" >&2
+ exit 1
+ ;;
+ *)
+ TARGET="$1"
+ shift
+ ;;
+ esac
+done
+
+if [[ -z "$TARGET" ]]; then
+ echo "❌ No target specified. Provide a PID or app name."
+  echo "   Usage: $0 [options] <pid-or-app-name>"
+ echo " List Java processes: jps -l"
+ exit 1
+fi
+
+# ── Locate asprof ─────────────────────────────────────────────────────────────
+if [[ -z "$ASPROF" ]]; then
+ if command -v asprof &>/dev/null; then
+ ASPROF="$(command -v asprof)"
+ else
+ for candidate in \
+ "$HOME/async-profiler-4.3/bin/asprof" \
+ "$HOME/async-profiler/bin/asprof" \
+ "/opt/async-profiler/bin/asprof" \
+ "/usr/local/bin/asprof"
+ do
+ if [[ -x "$candidate" ]]; then
+ ASPROF="$candidate"
+ break
+ fi
+ done
+ fi
+fi
+
+if [[ -z "$ASPROF" ]]; then
+ echo "❌ asprof not found. Install with: bash scripts/install.sh"
+ echo " Or specify path: --asprof /path/to/asprof"
+ exit 1
+fi
+
+# ── Build output filename ─────────────────────────────────────────────────────
+TIMESTAMP="$(date +%Y%m%d-%H%M%S)"
+
+if [[ -z "$OUTPUT" ]]; then
+ if $ALL_EVENTS; then
+ OUTPUT="profile-all-${TIMESTAMP}.jfr"
+ else
+ EXT="$FORMAT"
+ OUTPUT="profile-${EVENT}-${TIMESTAMP}.${EXT}"
+ fi
+fi
+
+# ── Build asprof command ──────────────────────────────────────────────────────
+CMD=("$ASPROF" "-d" "$DURATION" "-f" "$OUTPUT")
+$ALL_EVENTS && CMD+=("--all") || CMD+=("-e" "$EVENT")
+$THREADS && CMD+=("-t")
+CMD+=("$TARGET")
+
+# ── Print plan ────────────────────────────────────────────────────────────────
+echo "🔍 async-profiler run"
+echo " Binary : $ASPROF"
+echo " Target : $TARGET"
+if $COMPREHENSIVE; then
+ echo " Mode : comprehensive (all events → JFR → split into flamegraphs)"
+elif $ALL_EVENTS; then
+ echo " Events : all (cpu + alloc + wall + lock)"
+else
+ echo " Event : $EVENT"
+fi
+echo " Duration: ${DURATION}s"
+echo " Output : $OUTPUT"
+$THREADS && echo " Threads : separate"
+echo ""
+echo "▶ ${CMD[*]}"
+echo "Press Ctrl+C to stop early (partial results will be saved)."
+echo ""
+
+# ── Execute ───────────────────────────────────────────────────────────────────
+"${CMD[@]}"
+
+echo ""
+echo "✅ Capture complete: $OUTPUT"
+echo ""
+
+# ── Comprehensive mode: split JFR into per-event flamegraphs in parallel ──────
+if $COMPREHENSIVE; then
+ if ! command -v jfrconv &>/dev/null; then
+ # jfrconv ships alongside asprof
+ JFRCONV="$(dirname "$ASPROF")/jfrconv"
+ if [[ ! -x "$JFRCONV" ]]; then
+ echo "⚠️ jfrconv not found — skipping flamegraph split."
+ echo " You can convert manually: jfrconv $OUTPUT flamegraph.html"
+ COMPREHENSIVE=false
+ fi
+ else
+ JFRCONV="jfrconv"
+ fi
+fi
+
+if $COMPREHENSIVE; then
+ BASE="${OUTPUT%.jfr}"
+ CPU_HTML="${BASE}-cpu.html"
+ ALLOC_HTML="${BASE}-alloc.html"
+ WALL_HTML="${BASE}-wall.html"
+ LOCK_HTML="${BASE}-lock.html"
+
+ echo "Splitting into per-event flamegraphs in parallel..."
+
+ "$JFRCONV" --cpu "$OUTPUT" "$CPU_HTML" & PID_CPU=$!
+ "$JFRCONV" --alloc "$OUTPUT" "$ALLOC_HTML" & PID_ALLOC=$!
+ "$JFRCONV" --wall "$OUTPUT" "$WALL_HTML" & PID_WALL=$!
+ "$JFRCONV" --lock "$OUTPUT" "$LOCK_HTML" & PID_LOCK=$!
+
+ CONVERSION_FAILED=false
+ for pid in "$PID_CPU" "$PID_ALLOC" "$PID_WALL" "$PID_LOCK"; do
+ if ! wait "$pid"; then
+ CONVERSION_FAILED=true
+ fi
+ done
+
+ if $CONVERSION_FAILED; then
+ echo "Error: one or more jfrconv conversions failed." >&2
+ exit 1
+ fi
+
+ echo ""
+ echo "📊 Flamegraphs ready:"
+ echo " CPU time : $CPU_HTML"
+ echo " Allocations : $ALLOC_HTML"
+ echo " Wall-clock : $WALL_HTML"
+ echo " Lock contention: $LOCK_HTML"
+ echo " Combined JFR : $OUTPUT (open in IntelliJ or JDK Mission Control)"
+ echo ""
+
+ # Open all flamegraphs at once if on macOS
+ if [[ "$(uname)" == "Darwin" ]]; then
+ echo "Opening all flamegraphs in browser..."
+ open "$CPU_HTML" "$ALLOC_HTML" "$WALL_HTML" "$LOCK_HTML"
+ else
+ echo "Open flamegraphs with:"
+ echo " xdg-open $CPU_HTML"
+ echo " xdg-open $ALLOC_HTML"
+ echo " xdg-open $WALL_HTML"
+ echo " xdg-open $LOCK_HTML"
+ fi
+
+ echo ""
+ echo "💡 Next step — analyze results:"
+ echo " Ask your AI assistant: 'Analyze these profiles and tell me where"
+ echo " to focus: $CPU_HTML, $ALLOC_HTML, $WALL_HTML, $LOCK_HTML'"
+ echo ""
+ echo " Or for collapsed stack analysis:"
+ echo " jfrconv $OUTPUT ${BASE}-cpu.collapsed"
+ echo " python3 scripts/analyze_collapsed.py ${BASE}-cpu.collapsed"
+
+else
+ # Single-event post-run guidance
+ case "$FORMAT" in
+ html)
+ echo "Open in browser:"
+ if [[ "$(uname)" == "Darwin" ]]; then
+ open "$OUTPUT"
+ else
+ echo " xdg-open $OUTPUT"
+ fi
+ echo ""
+ echo "What to look for:"
+ echo " • Wide frames near the top = hot code (primary optimization targets)"
+ echo " • Wide leaf frames = direct CPU/allocation consumers"
+ echo " • LockSupport.park / Object.wait (wall profile) = blocked threads"
+ echo ""
+ echo "💡 Next step — ask your AI assistant to analyze:"
+ echo " 'I have a flamegraph at $OUTPUT — what's causing the bottleneck?'"
+ ;;
+ jfr)
+ echo "Open in IntelliJ IDEA: File → Open → select $OUTPUT"
+ echo "Open in JDK Mission Control: File → Open File → select $OUTPUT"
+ echo ""
+ echo "Or convert to flamegraph:"
+ echo " jfrconv $OUTPUT flamegraph.html"
+ echo ""
+ echo "💡 Next step — ask your AI assistant to analyze:"
+ echo " 'I have a JFR recording at $OUTPUT — help me interpret it.'"
+ ;;
+ collapsed)
+ echo "Analyze with:"
+ echo " python3 scripts/analyze_collapsed.py $OUTPUT"
+ echo ""
+ echo "Or convert to flamegraph:"
+ echo " jfrconv $OUTPUT flamegraph.html"
+ echo ""
+ echo "💡 Next step — ask your AI assistant to analyze:"
+ echo " 'Run analyze_collapsed.py on $OUTPUT and tell me what's slow.'"
+ ;;
+ esac
+fi
diff --git a/skills/async-profiler/setup/SKILL.md b/skills/async-profiler/setup/SKILL.md
new file mode 100644
index 000000000..79f2fda40
--- /dev/null
+++ b/skills/async-profiler/setup/SKILL.md
@@ -0,0 +1,199 @@
+---
+name: async-profiler-setup
+description: 'Install, configure, and verify async-profiler for Java on macOS or Linux. Use this skill whenever a Java developer wants to profile their JVM and needs to get async-profiler installed first. Trigger for: "install async-profiler", "how do I set up async-profiler", "get started with Java profiling", "async-profiler not found", "profiler setup", "download asprof", or any question about system requirements, permissions, or JVM flags for profiling. Also trigger when someone says "I want to profile my Java app" and hasn''t mentioned having async-profiler installed yet.'
+---
+
+# async-profiler Setup
+
+async-profiler (v4.3+) is a low-overhead sampling profiler for Java. It avoids the
+"safepoint bias" of standard JVM profilers by using HotSpot-specific APIs, and it
+can profile CPU, memory allocation, wall-clock time, and lock contention.
+
+## Do you need to install anything?
+
+**If you're using IntelliJ IDEA Ultimate**, async-profiler is already bundled —
+no installation needed for profiling apps you run from the IDE. You can profile
+any run configuration right now by clicking the flame icon (▶🔥) next to the run
+button, or via *Run → Profile*. Jump straight to the **async-profiler-profile**
+skill if that's your use case.
+
+You do still need a standalone install if you want to:
+- Profile a process not launched from IntelliJ (remote server, Docker, SSH)
+- Use `asprof` from the terminal or CI pipeline
+- Run `scripts/run_profile.sh` or `scripts/analyze_collapsed.py`
+- Use IntelliJ IDEA Community (no built-in profiler)
+
+**Everyone else** (Community edition, terminal-only, production servers):
+continue below.
+
+---
+
+## Step 1 — Download
+
+The latest stable release is **v4.3** (January 2025). The skill includes an
+install script that handles everything automatically.
+
+### Option A — use the bundled install script (recommended)
+
+`scripts/install.sh` auto-detects the platform (macOS arm64/x64, Linux x64/arm64),
+downloads the right binary, removes the macOS Gatekeeper quarantine flag, and
+verifies the install:
+
+```bash
+bash scripts/install.sh # installs to ~/async-profiler-4.3/
+bash scripts/install.sh /opt # installs to /opt/async-profiler-4.3/
+```
+
+It prints the exact binary path and a one-liner to add it to your PATH.
+
+### Option B — manual install
+
+**macOS (Intel or Apple Silicon):**
+```bash
+# Using Homebrew (easiest)
+brew install async-profiler
+
+# Or download directly
+curl -LO https://github.com/async-profiler/async-profiler/releases/download/v4.3/async-profiler-4.3-macos.zip
+unzip async-profiler-4.3-macos.zip
+```
+
+**Linux x64:**
+```bash
+curl -LO https://github.com/async-profiler/async-profiler/releases/download/v4.3/async-profiler-4.3-linux-x64.tar.gz
+tar xf async-profiler-4.3-linux-x64.tar.gz
+```
+
+**Linux arm64:**
+```bash
+curl -LO https://github.com/async-profiler/async-profiler/releases/download/v4.3/async-profiler-4.3-linux-arm64.tar.gz
+tar xf async-profiler-4.3-linux-arm64.tar.gz
+```
+
+After extracting, add `bin/` to your PATH:
+```bash
+export PATH="$PWD/bin:$PATH"
+# Or permanently in ~/.zshrc / ~/.bashrc
+```
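+
+To make the change permanent, one possible sketch (assumes the default
+`~/async-profiler-4.3` install location and zsh; adjust `RC` and the path for
+your shell and install directory):
+
+```bash
+# Append the PATH export to the shell rc exactly once (idempotent)
+RC="${HOME}/.zshrc"                                       # or ~/.bashrc
+LINE='export PATH="$HOME/async-profiler-4.3/bin:$PATH"'
+grep -qxF "$LINE" "$RC" 2>/dev/null || echo "$LINE" >> "$RC"
+```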
+
+Verify:
+```bash
+asprof --version
+```
+
+## Step 2 — Platform-specific configuration
+
+### macOS
+
+On macOS, async-profiler works out of the box with no extra configuration. The
+default CPU sampling engine is **itimer**, which works without elevated privileges.
+
+**Important limitation to communicate to the user:** On macOS, async-profiler
+cannot collect kernel stack frames and the itimer engine has a known bias toward
+system calls. CPU profiles are still very useful, but they reflect user-space
+time more faithfully than kernel time. This is a platform constraint, not a bug.
+
+### Linux — enabling kernel stack traces (optional but recommended)
+
+On Linux, async-profiler prefers the **perf_events** engine, which gives the most
+accurate profiles and includes kernel frames. It requires:
+
+```bash
+# Allow non-root perf_events (set once, persists until reboot)
+sudo sysctl kernel.perf_event_paranoid=1
+sudo sysctl kernel.kptr_restrict=0
+```
+
+To make these permanent across reboots, add to `/etc/sysctl.d/99-perf.conf`:
+```
+kernel.perf_event_paranoid=1
+kernel.kptr_restrict=0
+```
+
+If perf_events isn't available (e.g., inside a container), async-profiler
+automatically falls back to **ctimer** — no action needed.
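+
+Before changing anything, it can help to inspect the current values. A small
+sketch that also degrades gracefully on non-Linux machines:
+
+```bash
+# Print the relevant sysctls if present; values of 2 or higher typically
+# block non-root perf_events on modern distros
+for KEY in perf_event_paranoid kptr_restrict; do
+  FILE="/proc/sys/kernel/${KEY}"
+  if [ -r "$FILE" ]; then
+    echo "${KEY} = $(cat "$FILE")"
+  else
+    echo "${KEY}: not available (not Linux, or restricted container)"
+  fi
+done
+```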
+
+### Linux — container / Docker
+
+In containers, perf_events is typically restricted by seccomp. async-profiler
+still works via the itimer/ctimer fallback. If you want full perf_events inside a
+container, the container needs `--cap-add SYS_ADMIN` or `--privileged` (use
+judiciously in production).
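+
+As a hedged illustration of the flags above, the sketch below only prints the
+command for review rather than executing it (`myapp:latest` is a placeholder
+image name, not something this skill provides):
+
+```bash
+# Build the elevated-capability run command and print it for review;
+# granting SYS_ADMIN is a real privilege escalation, so look before you run
+DOCKER_CMD="docker run --rm --cap-add SYS_ADMIN myapp:latest"
+echo "Review, then run: ${DOCKER_CMD}"
+```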
+
+## Step 3 — Configure the JVM for better profiles
+
+Add these flags when starting your Java application. They're optional but make
+profiles significantly more accurate by allowing the JVM to provide stack frames
+even between safepoints:
+
+```bash
+java -XX:+UnlockDiagnosticVMOptions -XX:+DebugNonSafepoints -jar myapp.jar
+```
+
+If you're using a framework that manages JVM startup (Spring Boot, Quarkus, etc.),
+set these in `JAVA_TOOL_OPTIONS`:
+```bash
+export JAVA_TOOL_OPTIONS="-XX:+UnlockDiagnosticVMOptions -XX:+DebugNonSafepoints"
+```
+
+## Step 4 — Verify everything works
+
+Find your Java process PID first:
+```bash
+jps -l # lists all JVM processes with their main class
+# or
+ps aux | grep java
+```
+
+Then run a quick 5-second test profile:
+```bash
+asprof -d 5 <pid>
+```
+
+You should see output like:
+```
+Profiling for 5 seconds
+--- Execution profile ---
+Total samples : 453
+...
+```
+
+If it works, you're ready to profile. If you hit errors, see the troubleshooting
+section below.
+
+## Troubleshooting common issues
+
+**"Could not attach to "**
+- The JVM may need `-XX:+PerfDataSaveToFile` or you may lack permissions. Run as
+ the same user that owns the JVM process, or use `sudo`.
+
+**"Failed to open perf_events"**
+- Run the sysctl commands in Step 2, or use `-e itimer` to force the itimer engine.
+
+**"No such process"**
+- Double-check the PID with `jps -l`. JVM processes can restart under a new PID.
+
+**Homebrew install on macOS says "permission denied" running asprof**
+- `chmod +x $(brew --prefix async-profiler)/bin/asprof`
+
+**macOS Gatekeeper blocks the binary**
+- `xattr -d com.apple.quarantine /path/to/asprof` (removes the quarantine flag)
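+
+The checks above can be rolled into a quick pre-flight sketch (heuristic only;
+the paranoid threshold of 1 follows the sysctl advice in Step 2):
+
+```bash
+# Heuristic pre-flight check: is asprof visible, and which CPU engine
+# is likely to work on this machine?
+if ! command -v asprof >/dev/null 2>&1; then
+  echo "asprof not on PATH: run scripts/install.sh or export PATH first"
+elif [ "$(uname -s)" = "Darwin" ]; then
+  echo "macOS detected: itimer engine, no kernel frames (expected)"
+elif [ -r /proc/sys/kernel/perf_event_paranoid ] &&
+     [ "$(cat /proc/sys/kernel/perf_event_paranoid)" -le 1 ]; then
+  echo "perf_events should work without extra configuration"
+else
+  echo "perf_events likely restricted; force itimer: asprof -e itimer -d 5 <pid>"
+fi
+```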
+
+## Using async-profiler as a Java agent
+
+If you can't attach dynamically (e.g., the JVM was started with
+`-XX:+DisableAttachMechanism`), use the Java agent mode:
+
+```bash
+java -agentpath:/path/to/libasyncProfiler.so=start,event=cpu,file=profile.html \
+ -jar myapp.jar
+```
+
+This starts profiling from the first moment the JVM launches, which is useful
+for capturing startup performance.
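+
+If a launcher script builds the `java` command for you, the agent can be
+injected via `JAVA_TOOL_OPTIONS` instead; a sketch, with the library path as a
+placeholder you must replace with your actual install location:
+
+```bash
+# Wire the agent through JAVA_TOOL_OPTIONS so wrappers that launch
+# 'java' on your behalf still pick it up (library path is a placeholder)
+AGENT_LIB="/path/to/libasyncProfiler.so"
+export JAVA_TOOL_OPTIONS="-agentpath:${AGENT_LIB}=start,event=cpu,file=profile.html"
+echo "JAVA_TOOL_OPTIONS=${JAVA_TOOL_OPTIONS}"
+```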
+
+## What's next
+
+Once installed, use the **async-profiler-profile** skill to run a profiling
+session and choose the right event type for your problem (CPU, memory, wall-clock,
+or lock contention).