
IntelliKit

LLM-Ready Profiling and Analysis Toolkit for AMD CPUs and GPUs

IntelliKit is a collection of intelligent tools designed to make CPU and GPU code development, profiling, and validation accessible to LLMs and human developers alike. Built for AMD ROCm, these tools provide clean abstractions over complex GPU internals.

Philosophy

Traditional CPU and GPU profiling and analysis tools expose raw hardware counters and assembly. IntelliKit tools are designed to:

  • Decode complexity: Turn hardware metrics into human-readable insights
  • Enable LLM integration: Provide clean APIs suitable for LLM-driven workflows (MCP-ready)

Tools

Accordo - Automated Kernel Validation

Automated correctness validation for GPU kernel optimizations.

Use cases:

  • Verify optimized kernels match reference implementation
  • Compare performance while ensuring correctness
  • Test multiple optimization candidates efficiently

Quick example:

from accordo import Accordo

# Create validator (auto-extracts kernel signature)
validator = Accordo(binary="./ref", kernel_name="reduce_sum")

# Capture snapshots from reference and optimized binaries
ref = validator.capture_snapshot(binary="./ref")
opt = validator.capture_snapshot(binary="./opt")

# Compare for correctness
result = validator.compare_snapshots(ref, opt, tolerance=1e-6)

if result.is_valid:
    print(f"✓ PASS: {result.num_arrays_validated} arrays matched")
else:
    print(result.summary())
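
Conceptually, the tolerance comparison is an elementwise absolute-difference check. Here is a minimal pure-Python sketch of that idea (not Accordo's actual implementation, which operates on captured memory snapshots):

```python
def arrays_match(ref, opt, tolerance=1e-6):
    """Return True if every element of opt is within tolerance of ref."""
    if len(ref) != len(opt):
        return False
    return all(abs(r - o) <= tolerance for r, o in zip(ref, opt))

# A value off by 1e-8 passes at tolerance=1e-6; off by 1e-3 fails.
print(arrays_match([1.0, 2.0, 3.0], [1.0, 2.0 + 1e-8, 3.0]))  # True
print(arrays_match([1.0, 2.0, 3.0], [1.0, 2.001, 3.0]))       # False
```

This is also why tolerance matters: bitwise-identical output is rarely achievable after reordering floating-point operations, so a small absolute tolerance is the practical correctness criterion.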

Linex - Source-Level GPU Performance Profiling

Maps GPU performance metrics to your source code lines.

Use cases:

  • Identify performance hotspots at source code granularity
  • Understand cycle-level timing for each line of code
  • Analyze stall patterns and execution bottlenecks

Quick example:

from linex import Linex

profiler = Linex()
profiler.profile("./my_app", kernel_filter="my_kernel")

# Show hotspots
for line in profiler.source_lines[:5]:
    print(f"{line.file}:{line.line_number}")
    print(f"  {line.total_cycles:,} cycles ({line.stall_percent:.1f}% stalled)")

Compile kernels with -g to enable source-line mapping (file:line). Without -g you still get per-instruction data (ISA + cycles), but source_lines is empty and inst.file/inst.line are empty/0. See linex/README.md for details.
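
If you want to rank lines yourself rather than rely on the order of source_lines, a plain sort by cycle count is enough. The SourceLine dataclass below is a hypothetical stand-in for Linex's per-line records (field names taken from the example above):

```python
from dataclasses import dataclass

@dataclass
class SourceLine:  # stand-in for Linex's per-line record
    file: str
    line_number: int
    total_cycles: int
    stall_percent: float

lines = [
    SourceLine("kernel.hip", 42, 1_200_000, 35.0),
    SourceLine("kernel.hip", 17, 4_800_000, 62.5),
    SourceLine("kernel.hip", 90, 300_000, 5.0),
]

# Rank hottest-first by total cycles and keep the top N.
hotspots = sorted(lines, key=lambda l: l.total_cycles, reverse=True)[:2]
for l in hotspots:
    print(f"{l.file}:{l.line_number}  {l.total_cycles:,} cycles "
          f"({l.stall_percent:.1f}% stalled)")
```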

Metrix - Human-Readable GPU Metrics

Decodes hardware counters into actionable performance insights.

Use cases:

  • Profile GPU kernels with clean, understandable metrics
  • Identify memory bandwidth bottlenecks
  • Analyze compute utilization patterns

Quick example:

from metrix import Metrix

profiler = Metrix()
results = profiler.profile("./my_app", metrics=["memory.hbm_bandwidth_utilization"])

for kernel in results.kernels:
    print(f"{kernel.name}: {kernel.duration_us.avg:.2f} μs")
    print(f"Memory BW: {kernel.metrics['memory.hbm_bandwidth_utilization'].avg:.1f}%")
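
The hbm_bandwidth_utilization metric is reported as a percentage. If you ever need to derive utilization from raw bandwidth numbers yourself, it is achieved over peak; this is a plausible reading of the metric, not Metrix's internal formula, and the figures below are made up for illustration:

```python
def bandwidth_utilization_percent(achieved_gbps, peak_gbps):
    """Achieved / peak HBM bandwidth, expressed as a percentage."""
    return 100.0 * achieved_gbps / peak_gbps

# e.g. 1,600 GB/s achieved against a hypothetical 3,200 GB/s peak
print(f"{bandwidth_utilization_percent(1600, 3200):.1f}%")  # 50.0%
```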

Nexus - HSA Packet Source Code Extractor

Intercepts GPU kernel launches and extracts source code + assembly from HSA packets.

Use cases:

  • Understand what code actually runs on the GPU
  • Debug kernel compilation and optimization
  • Trace HIP, Triton, and other GPU frameworks

Quick example:

from nexus import Nexus

nexus = Nexus(log_level=1)
trace = nexus.run(["python", "gpu_app.py"])

for kernel in trace:
    print(f"{kernel.name}: {len(kernel.assembly)} instructions")
    print(kernel.hip)  # Source code

ROCm-MCP - Model Context Protocol Servers for ROCm Tools

Enables LLMs to interact with ROCm tools via MCP.

Use cases:

  • Compile HIP code.
  • Access HIP reference guide.
  • Query device capabilities.

Quick example:

Add to your JSON MCP config:

{
  "mcpServers": {
    "hip-compiler-mcp": {
      "command": "uv",
      "args": ["run", "--directory", "/path/to/rocm_mcp", "hip-compiler-mcp"]
    }
  }
}
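
If you manage MCP configs programmatically, the same hip-compiler-mcp entry can be generated with Python's json module (the directory path is the same placeholder as above and must be replaced with your actual path):

```python
import json

config = {
    "mcpServers": {
        "hip-compiler-mcp": {
            "command": "uv",
            "args": ["run", "--directory", "/path/to/rocm_mcp", "hip-compiler-mcp"],
        }
    }
}

# Serialize with the same 2-space indentation as the hand-written config.
text = json.dumps(config, indent=2)
print(text)
```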

uprof-MCP - Model Context Protocol Server for uProf

Enables LLMs to interact with AMD uProf via MCP.

Use cases:

  • Profile applications using uProf.

Quick example:

Add to your JSON MCP config:

{
  "mcpServers": {
    "uprof-profiler-mcp": {
      "command": "uv",
      "args": ["run", "--directory", "/path/to/uprof_mcp", "uprof-profiler-mcp"]
    }
  }
}

Installation

Install each tool from its subdirectory; there is no top-level metapackage.

Install all tools from Git (one command):

curl -sSL https://raw.githubusercontent.com/AMDResearch/intellikit/main/install/tools/install.sh | bash

Options:

  • Custom pip command (for multiple Python versions):

    curl -sSL https://raw.githubusercontent.com/AMDResearch/intellikit/main/install/tools/install.sh | bash -s -- --pip-cmd pip3.12
    # or
    curl -sSL https://raw.githubusercontent.com/AMDResearch/intellikit/main/install/tools/install.sh | bash -s -- --pip-cmd "python3.12 -m pip"
  • Install from a specific branch/tag:

    curl -sSL https://raw.githubusercontent.com/AMDResearch/intellikit/main/install/tools/install.sh | bash -s -- --ref my-branch
  • Dry-run (preview commands):

    curl -sSL https://raw.githubusercontent.com/AMDResearch/intellikit/main/install/tools/install.sh | bash -s -- --dry-run
  • From a clone:

    ./install/tools/install.sh --pip-cmd pip3.12 --ref main --dry-run

Environment variables (CLI options take precedence): PIP_CMD, INTELLIKIT_REPO_URL, INTELLIKIT_REF

Install individual tools from Git:

pip install "git+https://github.com/AMDResearch/intellikit.git#subdirectory=accordo"
pip install "git+https://github.com/AMDResearch/intellikit.git#subdirectory=linex"
pip install "git+https://github.com/AMDResearch/intellikit.git#subdirectory=metrix"
pip install "git+https://github.com/AMDResearch/intellikit.git#subdirectory=nexus"
pip install "git+https://github.com/AMDResearch/intellikit.git#subdirectory=rocm_mcp"
pip install "git+https://github.com/AMDResearch/intellikit.git#subdirectory=uprof_mcp"

From a clone (editable installs from local paths):

git clone https://github.com/AMDResearch/intellikit.git
cd intellikit
pip install -e ./accordo
pip install -e ./linex
# ... or any subset of the tools

Agent Skills (AI agents)

Install IntelliKit skills so AI agents can discover and use Metrix, Accordo, Nexus, and Linex. Skills are installed as SKILL.md files under a single directory; agents that read that location get the instructions automatically.

Default: local, into the current workspace (the agents target)

# One-liner: installs into ./.agents/skills/ (current directory)
curl -sSL https://raw.githubusercontent.com/AMDResearch/intellikit/main/install/skills/install.sh | bash

Different agent targets (cursor, claude, codex, agents):

# Cursor
curl -sSL https://raw.githubusercontent.com/AMDResearch/intellikit/main/install/skills/install.sh | bash -s -- --target cursor

# Claude
curl -sSL https://raw.githubusercontent.com/AMDResearch/intellikit/main/install/skills/install.sh | bash -s -- --target claude

# Codex
curl -sSL https://raw.githubusercontent.com/AMDResearch/intellikit/main/install/skills/install.sh | bash -s -- --target codex

Global (all projects)

# Install into ~/.cursor/skills/ (or ~/.claude/skills/, etc.)
curl -sSL https://raw.githubusercontent.com/AMDResearch/intellikit/main/install/skills/install.sh | bash -s -- --target cursor --global

From a clone

git clone https://github.com/AMDResearch/intellikit.git
cd intellikit
./install/skills/install.sh                    # local: ./.agents/skills/
./install/skills/install.sh --target cursor     # local: ./.cursor/skills/
./install/skills/install.sh --target claude --global  # global: ~/.claude/skills/
./install/skills/install.sh --dry-run          # show what would be installed

Resulting layout:

Skills are installed inside the target directory (e.g., ./.agents/skills/ for local agents, ~/.cursor/skills/ for global cursor) with one subdirectory per tool: metrix/SKILL.md, accordo/SKILL.md, nexus/SKILL.md, linex/SKILL.md.

Requirements

  • Python: >= 3.10
  • ROCm: >= 6.0 (7.0+ for linex)
  • Hardware: MI300+ GPUs

Documentation

Each tool has its own detailed documentation in its subdirectory README (e.g., linex/README.md).

Example Workflow

# 1. Profile baseline kernel with Metrix
from metrix import Metrix
profiler = Metrix()
baseline_results = profiler.profile("./app_baseline")
baseline_bw = baseline_results.kernels[0].metrics['memory.hbm_bandwidth_utilization'].avg

# 2. Extract kernel source with Nexus
from nexus import Nexus
nexus = Nexus()
trace = nexus.run(["./app_baseline"])
for kernel in trace:
    print(kernel.hip)  # Source code

# 3. Apply optimization (external step)
# ... modify kernel ...

# 4. Validate with Accordo
from accordo import Accordo
validator = Accordo(binary="./app_baseline", kernel_name="my_kernel")

ref_snap = validator.capture_snapshot(binary="./app_baseline")
opt_snap = validator.capture_snapshot(binary="./app_opt")
result = validator.compare_snapshots(ref_snap, opt_snap, tolerance=1e-6)

if result.is_valid:
    opt_results = profiler.profile("./app_opt")
    opt_bw = opt_results.kernels[0].metrics['memory.hbm_bandwidth_utilization'].avg
    print(f"✓ PASS: {result.num_arrays_validated} arrays matched")
    print(f"BW Improvement: {opt_bw - baseline_bw:.1f}%")
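
The workflow above reports the absolute change in utilization (percentage points). If you also want the relative gain, it is a one-line computation; the numbers below are hypothetical:

```python
def bw_improvement(baseline_pct, optimized_pct):
    """Absolute (percentage-point) and relative (%) bandwidth improvement."""
    absolute = optimized_pct - baseline_pct
    relative = 100.0 * absolute / baseline_pct
    return absolute, relative

# Hypothetical: baseline 40% HBM utilization, optimized 50%
abs_gain, rel_gain = bw_improvement(40.0, 50.0)
print(f"BW Improvement: {abs_gain:.1f} points ({rel_gain:.1f}% relative)")
```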

Contributing

We welcome contributions and feedback! Open an issue or create a PR.

License

MIT License - Copyright (c) 2025-2026 Advanced Micro Devices, Inc.

See LICENSE for full details.

Support

Need help? Open an issue on the GitHub repository.


Made with 🧠 for the future of LLM-assisted GPU development
