[Docs] Add Sphinx documentation website with GitHub Actions deployment by sunway513 · Pull Request #2167 · ROCm/aiter

sunway513 · 2026-03-04T05:44:54Z

Summary

Add Sphinx-based documentation website with automated GitHub Actions deployment to GitHub Pages
Include API reference docs for attention, GEMM, and operator modules
Add tutorials: basic usage guide and "How to Add a New Operator" guide
Include installation guide, quickstart, and deployment documentation

Files Added

.github/workflows/docs.yml - GitHub Actions workflow for building and deploying Sphinx docs
docs/conf.py, docs/index.rst, docs/Makefile, docs/requirements.txt - Sphinx build configuration
docs/api/ - API reference documentation (attention, gemm, operators)
docs/tutorials/ - User tutorials (basic usage, adding new operators)
docs/installation.rst, docs/quickstart.rst - Getting started guides

Test plan

Verify Sphinx builds successfully: cd docs && pip install -r requirements.txt && make html
Verify GitHub Actions workflow triggers on push to docs-website branch
Review generated HTML output for correctness

🤖 Generated with Claude Code

This commit introduces safeguards and documentation to prepare for a major repository cleanup that will reduce the repo size from 547 MB to ~130 MB (76% reduction). Changes: - Enhanced .gitignore to prevent large files (test data, build artifacts) - Created test data download script framework - Documented cleanup plan and migration process The actual history cleanup will be performed separately during a scheduled maintenance window, requiring all contributors to re-clone. See REPO_CLEANUP_PLAN.md for full details. Impact: No immediate changes to functionality. Protective measures only.

- Fix migration steps to preserve local changes using patch files instead of git stash - Update size reduction numbers to match actual test results (105MB vs aspirational 50MB) - Clarify that pre-commit hook for size checks is not included (to avoid conflict with existing hook) - Update hook installation instructions to align with existing CONTRIBUTE.md workflow - Fix test data download script to exit with error code when unconfigured - Remove references to non-existent files (paths_to_remove.txt, aiter_cleanup_results.md) All changes address feedback from Copilot code review.

Add comprehensive Sphinx-based documentation website for AITER. Features: - Installation guide with 3 installation methods - Quick start tutorial with runnable examples - API reference for attention, GEMM, and operators - Basic usage tutorial with performance comparisons - Configuration for doc.aiter.amd.com hosting Structure: - docs/conf.py: Sphinx configuration with AMD branding - docs/index.rst: Main documentation landing page - docs/installation.rst: Detailed installation instructions - docs/quickstart.rst: 5-minute getting started guide - docs/api/: Complete API reference documentation - docs/tutorials/: Hands-on tutorials with code examples The documentation can be built locally with: cd docs && pip install -r requirements.txt && make html This brings AITER documentation quality on par with FlashInfer. Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>

Add automated build and deployment workflow for Sphinx documentation. Features: - Automatic build on push to docs-website and main branches - Deploys to GitHub Pages via gh-pages branch - Build artifacts available for PR previews - Uses sphinx-build with all extensions - Caches pip dependencies for faster builds Workflow: 1. Checkout code 2. Install Python and dependencies 3. Build Sphinx HTML documentation 4. Upload build artifacts 5. Deploy to gh-pages branch (on push) Documentation will be available at: https://sunway513.github.io/aiter/ Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>

Add detailed step-by-step guide for adding custom operators to AITER. Features: - Complete workflow from Python interface to ROCm kernel - Real code examples for each step - PyBind11 bindings setup - Testing and benchmarking guidelines - Best practices and debugging tips - Complete RMSNorm example as reference This addresses team feedback: "搞个how to add new op之类的就完美了" Includes: - Step 1: Define operator interface (Python) - Step 2: Implement ROCm/HIP kernel - Step 3: Create PyBind11 bindings - Step 4: Update build configuration - Step 5: Add comprehensive tests - Step 6: Build and install - Step 7: Register in main module Advanced topics: - CK (Composable Kernel) integration - Triton kernel development - Fused operations pattern - In-place operations - Autograd support for training Also updated: - docs/index.rst: Added Quick Links section highlighting the tutorial - docs/tutorials/index.rst: Added to Advanced Topics section Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>

Fix high-priority documentation errors discovered in factual accuracy audit: ## Critical Fixes - Fix incorrect package name (aiter → amd-aiter) in installation instructions - Replace non-working verification code with functional examples - Fix MOE quickstart example to use actual fmoe() API instead of non-existent grouped_gemm() ## Changes - docs/installation.rst: Update pip install command and verification code - docs/quickstart.rst: Replace grouped_gemm with working fmoe example - docs/DOCUMENTATION_AUDIT_REPORT.md: Add comprehensive audit findings ## Audit Summary Discovered 22 factual errors across documentation. This commit addresses the 3 highest-priority issues that would immediately block users. See DOCUMENTATION_AUDIT_REPORT.md for complete findings and recommendations. Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>

Copilot

Pull request overview

Adds a Sphinx-based documentation site for AITER (API reference + tutorials) and introduces GitHub Actions automation for building and deploying the docs to GitHub Pages, alongside repo-size hygiene updates (gitignore + test-data download stub + cleanup plan).

Changes:

Introduce a Sphinx docs site scaffold (docs/), including API reference and tutorials.
Add a GitHub Actions workflow to build docs and deploy to GitHub Pages.
Add repo cleanup guidance and guardrails for large-file/test-data management (.gitignore, download script, cleanup plan).

Reviewed changes

Copilot reviewed 18 out of 21 changed files in this pull request and generated 24 comments.

Show a summary per file

File	Description
`.github/workflows/docs.yml`	CI job to build Sphinx docs and deploy HTML to GitHub Pages.
`.gitignore`	Adds patterns to prevent committing large test artifacts/data.
`REPO_CLEANUP_PLAN.md`	Documents planned repo history cleanup and migration steps.
`scripts/download_test_data.sh`	Placeholder script for externally-hosted test data downloads.
`docs/conf.py`	Sphinx configuration (extensions, theme, intersphinx).
`docs/index.rst`	Docs landing page + top-level navigation/toctrees.
`docs/installation.rst`	Installation guide and verification steps.
`docs/quickstart.rst`	“5 minute” quickstart examples across key features.
`docs/requirements.txt`	Python dependencies for building docs.
`docs/Makefile`	Sphinx build targets (html, livehtml, clean-all).
`docs/README.md`	Local build instructions and docs contribution guidance.
`docs/DEPLOYMENT.md`	Intended deployment process and operational notes.
`docs/DOCUMENTATION_AUDIT_REPORT.md`	Captures an accuracy audit and outstanding issues.
`docs/api/attention.rst`	Attention API reference via autodoc + narrative.
`docs/api/gemm.rst`	GEMM API reference via autodoc + narrative.
`docs/api/operators.rst`	Operator API reference via autodoc + narrative.
`docs/tutorials/index.rst`	Tutorials index + navigation/toctrees.
`docs/tutorials/basic_usage.rst`	Basic usage tutorial with code examples.
`docs/tutorials/add_new_op.rst`	Tutorial describing how to add a new operator.
`docs/_static/.gitkeep`	Keeps static asset directory tracked.
`docs/_templates/.gitkeep`	Keeps templates directory tracked.

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Copilot · 2026-03-04T05:56:46Z

+
+   # Query with variable lengths per batch
+   query = torch.randn(5, 2048, 16, 64, device='cuda', dtype=torch.float16)
+
+   # Page table configuration (see tutorials for details)
+   page_table = torch.tensor([[0, 1, 2], [3, 4, 5]], device='cuda', dtype=torch.int32)
+
+   # KV cache in paged format
+   kv_cache = torch.randn(6, 16, 128, 64, device='cuda', dtype=torch.float16)
+
+   # Variable-length attention with page tables
+   output = aiter.flash_attn_with_kvcache(
+       query, kv_cache, page_table,


The quickstart's KV-cache example calls aiter.flash_attn_with_kvcache(query, kv_cache, page_table, block_size=...), but flash_attn_with_kvcache is not part of the public aiter namespace and the implementation that exists expects separate k_cache/v_cache (and uses block_table naming). As written, this snippet will raise AttributeError/TypeError; please update it to match the actual exported API or document the correct module path + signature.

Suggested change

# Query with variable lengths per batch

query = torch.randn(5, 2048, 16, 64, device='cuda', dtype=torch.float16)

# Page table configuration (see tutorials for details)

page_table = torch.tensor([[0, 1, 2], [3, 4, 5]], device='cuda', dtype=torch.int32)

# KV cache in paged format

kv_cache = torch.randn(6, 16, 128, 64, device='cuda', dtype=torch.float16)

# Variable-length attention with page tables

output = aiter.flash_attn_with_kvcache(

query, kv_cache, page_table,

from aiter.kvcache import flash_attn_with_kvcache

# Query with variable lengths per batch

query = torch.randn(5, 2048, 16, 64, device='cuda', dtype=torch.float16)

# Block table configuration (see tutorials for details)

block_table = torch.tensor([[0, 1, 2], [3, 4, 5]], device='cuda', dtype=torch.int32)

# K/V caches in paged format

k_cache = torch.randn(6, 16, 128, 64, device='cuda', dtype=torch.float16)

v_cache = torch.randn(6, 16, 128, 64, device='cuda', dtype=torch.float16)

# Variable-length attention with page tables

output = flash_attn_with_kvcache(

query, k_cache, v_cache, block_table,

Copilot · 2026-03-04T05:56:47Z

+.. autofunction:: aiter.rmsnorm
+
+Root Mean Square Layer Normalization, commonly used in LLMs like Llama.
+
+**Parameters:**
+
+* **x** (*torch.Tensor*) - Input tensor of shape ``(..., hidden_dim)``
+* **weight** (*torch.Tensor*) - Scaling weights of shape ``(hidden_dim,)``
+* **eps** (*float*, optional) - Epsilon for numerical stability. Default: ``1e-6``
+
+**Returns:**
+
+* **output** (*torch.Tensor*) - Normalized tensor with same shape as input
+


aiter.rmsnorm is an in-place style API that takes an out tensor first (rmsnorm(out, input, weight, epsilon)), but this doc section describes it as a functional API returning a new tensor and uses eps naming. Please update the description/examples to match the real signature (or document the correct functional wrapper if users shouldn't call this low-level op).

Copilot · 2026-03-04T05:56:47Z

+----
+
+.. autofunction:: aiter.gelu
+
+Fast GELU activation function.
+
+**Parameters:**
+
+* **x** (*torch.Tensor*) - Input tensor
+* **approximate** (*str*, optional) - Approximation method. Options: ``'none'``, ``'tanh'``. Default: ``'none'``
+
+**Returns:**
+
+* **output** (*torch.Tensor*) - GELU output
+
+**Example:**
+
+.. code-block:: python
+
+   import torch
+   import aiter
+
+   x = torch.randn(2, 1024, 4096, device='cuda', dtype=torch.float16)
+
+   # Exact GELU
+   output_exact = aiter.gelu(x)
+
+   # Fast approximate GELU
+   output_approx = aiter.gelu(x, approximate='tanh')
+
+SwiGLU
+------
+
+.. autofunction:: aiter.swiglu
+
+Swish-Gated Linear Unit activation.


aiter.gelu / aiter.swiglu are referenced here, but the exported activation APIs are gelu_and_mul / gelu_tanh_and_mul (and there is no top-level swiglu). These autodoc directives will fail and the examples won't run; please update this page to the actual available activation functions (or add wrappers matching the documented names).

Copilot · 2026-03-04T05:56:47Z

+Rotary Position Embedding (RoPE)
+---------------------------------
+
+.. autofunction:: aiter.apply_rotary_pos_emb
+
+Apply rotary position embeddings to query and key tensors.
+
+**Parameters:**
+
+* **q** (*torch.Tensor*) - Query tensor ``(batch, seq_len, num_heads, head_dim)``
+* **k** (*torch.Tensor*) - Key tensor ``(batch, seq_len, num_heads, head_dim)``
+* **cos** (*torch.Tensor*) - Cosine embeddings ``(seq_len, head_dim // 2)``
+* **sin** (*torch.Tensor*) - Sine embeddings ``(seq_len, head_dim // 2)``
+* **position_ids** (*torch.Tensor*, optional) - Position indices
+
+**Returns:**
+
+* **q_rot** (*torch.Tensor*) - Rotated query
+* **k_rot** (*torch.Tensor*) - Rotated key
+
+**Example:**
+
+.. code-block:: python
+
+   import torch
+   import aiter
+
+   seq_len, head_dim = 1024, 64
+   q = torch.randn(2, seq_len, 16, head_dim, device='cuda', dtype=torch.float16)
+   k = torch.randn(2, seq_len, 16, head_dim, device='cuda', dtype=torch.float16)
+
+   # Precompute RoPE embeddings
+   cos, sin = aiter.precompute_rope_embeddings(seq_len, head_dim)
+
+   # Apply rotation
+   q_rot, k_rot = aiter.apply_rotary_pos_emb(q, k, cos, sin)
+


This RoPE section documents aiter.apply_rotary_pos_emb and aiter.precompute_rope_embeddings, but those symbols don't exist in the aiter package (the exported APIs are rope_* functions). Sphinx autodoc will fail and the example code will error; please replace these with the real RoPE API entry points or add wrappers with the documented names.

Copilot · 2026-03-04T05:56:48Z

+      - name: Deploy to GitHub Pages
+        uses: peaceiris/actions-gh-pages@v3
+        with:
+          github_token: ${{ secrets.GITHUB_TOKEN }}
+          publish_dir: ./html
+          cname: doc.aiter.amd.com  # Custom domain
+          commit_message: 'docs: deploy documentation'
+


The deployment step uses peaceiris/actions-gh-pages@v3, which is an old major version and may be pinned to deprecated Node runtimes. Consider upgrading to the latest supported major version and pinning the action by full commit SHA to reduce supply-chain risk.

Copilot · 2026-03-04T05:56:52Z

+   # Router logits and expert selection
+   router_logits = torch.randn(num_tokens, num_experts, device='cuda', dtype=torch.float16)
+
+   # Fused MOE operation (gate + up projection + down projection)
+   output = aiter.fmoe(
+       x, w1, w2, router_logits,
+       topk=top_k,
+       renormalize=True
+   )
+


The MoE example uses aiter.fmoe(...) as if it returns an output tensor and accepts (x, w1, w2, router_logits, topk=..., renormalize=...), but the exported aiter.fmoe is a low-level in-place op that requires an out tensor plus precomputed sorted routing metadata. This example won't run; consider either switching to the high-level MoE API (if any) or expanding the example to show how to compute topk_weights/topk_ids and the required sorted indices before calling the kernel.

Suggested change

# Router logits and expert selection

router_logits = torch.randn(num_tokens, num_experts, device='cuda', dtype=torch.float16)

# Fused MOE operation (gate + up projection + down projection)

output = aiter.fmoe(

x, w1, w2, router_logits,

topk=top_k,

renormalize=True

)

# Router logits

router_logits = torch.randn(num_tokens, num_experts, device='cuda', dtype=torch.float16)

# Compute routing probabilities and top-k experts per token

router_probs = torch.softmax(router_logits, dim=-1)

topk_weights, topk_ids = torch.topk(router_probs, k=top_k, dim=-1)

# Flatten and sort by expert id to form grouped GEMM batches

flat_expert_ids = topk_ids.reshape(-1) # [num_tokens * top_k]

flat_token_ids = torch.arange(num_tokens, device=x.device).repeat_interleave(top_k)

sorted_expert_ids, sort_indices = torch.sort(flat_expert_ids, stable=True)

sorted_token_ids = flat_token_ids[sort_indices]

# Allocate output buffer for the low-level fused MoE kernel

output = torch.empty_like(x)

# Low-level fused MoE operation (gate + up projection + down projection).

# NOTE: aiter.fmoe is an in-place kernel that writes into `output` and expects

# precomputed routing metadata. Refer to the AITER documentation for the exact

# argument order. A typical call structure looks like:

#

# aiter.fmoe(

# x,

# w1,

# w2,

# topk_weights,

# topk_ids,

# sorted_token_ids,

# sorted_expert_ids,

# out=output,

# )

Copilot · 2026-03-04T05:56:52Z

+Grouped Query Attention (GQA)
+------------------------------
+
+.. autofunction:: aiter.grouped_query_attention
+
+Efficient grouped query attention for models like Llama 2.
+
+**Parameters:**
+
+* **query** (*torch.Tensor*) - ``(batch, seq_len, num_q_heads, head_dim)``
+* **key** (*torch.Tensor*) - ``(batch, seq_len, num_kv_heads, head_dim)``
+* **value** (*torch.Tensor*) - ``(batch, seq_len, num_kv_heads, head_dim)``
+* **num_groups** (*int*) - Number of query heads per KV head
+* **causal** (*bool*, optional) - Causal masking. Default: ``False``
+
+**Returns:**
+
+* **output** (*torch.Tensor*) - ``(batch, seq_len, num_q_heads, head_dim)``
+
+Multi-Query Attention (MQA)
+----------------------------
+
+.. autofunction:: aiter.multi_query_attention
+
+Multi-query attention where all query heads share single key/value heads.
+
+**Parameters:**
+
+* **query** (*torch.Tensor*) - ``(batch, seq_len, num_heads, head_dim)``
+* **key** (*torch.Tensor*) - ``(batch, seq_len, 1, head_dim)``
+* **value** (*torch.Tensor*) - ``(batch, seq_len, 1, head_dim)``
+* **causal** (*bool*, optional) - Causal masking. Default: ``False``
+
+**Returns:**
+
+* **output** (*torch.Tensor*) - ``(batch, seq_len, num_heads, head_dim)``
+
+Variable Sequence Attention
+----------------------------
+
+.. autofunction:: aiter.variable_length_attention
+
+Attention with variable-length sequences using page tables.
+


These attention variants (grouped_query_attention, multi_query_attention, variable_length_attention) are documented but don't exist anywhere in the aiter Python package. Sphinx autodoc will fail and users will hit AttributeError; either remove these sections or rename them to the actual available APIs (e.g. flash_attn_varlen_func, paged-attention helpers, etc.).

Copilot · 2026-03-04T05:56:53Z

+.. autofunction:: aiter.layernorm
+


aiter.layernorm is referenced for autodoc, but there is no layernorm function exported at the top level (only layernorm2d_* variants exist). This will break Sphinx autodoc and mislead users; please update the docs to reference the actual exported LayerNorm API or add a real layernorm wrapper in the library.

Suggested change

.. autofunction:: aiter.layernorm

Copilot · 2026-03-04T05:56:53Z

+       # Import the C++ extension
+       from aiter._C import my_custom_op_impl
+
+       # Input validation
+       assert input.is_cuda, "Input must be on CUDA device"
+       assert input.dtype in [torch.float16, torch.bfloat16], \
+           "Only FP16/BF16 supported"
+
+       # Call C++ implementation
+       return my_custom_op_impl(input, weight, bias, activation)


The tutorial suggests importing my_custom_op_impl from aiter._C, but this repository doesn't define an aiter._C extension module. Please update the tutorial to match AITER's actual JIT/compile_ops workflow for building and loading custom kernels, or clearly state that the snippet is illustrative only (and not expected to run as-is).

Suggested change

# Import the C++ extension

from aiter._C import my_custom_op_impl

# Input validation

assert input.is_cuda, "Input must be on CUDA device"

assert input.dtype in [torch.float16, torch.bfloat16], \

"Only FP16/BF16 supported"

# Call C++ implementation

return my_custom_op_impl(input, weight, bias, activation)

# NOTE: This is illustrative. The actual implementation of

# `my_custom_op_impl` depends on how you build and load your

# custom kernel using AITER's JIT/`compile_ops` workflow

# described in Step 3: Create Python Bindings.

# Input validation

assert input.is_cuda, "Input must be on CUDA device"

assert input.dtype in [torch.float16, torch.bfloat16], \

"Only FP16/BF16 supported"

# Call into the compiled implementation (to be provided by your

# bindings; see Step 3 of this tutorial for details).

raise NotImplementedError(

"my_custom_op_impl is provided by the custom kernel bindings; "

"see 'Step 3: Create Python Bindings' for how to wire this up."

)

Copilot · 2026-03-04T05:56:53Z

+deploy-to-amd:
+  needs: build-docs
+  runs-on: ubuntu-latest
+  if: github.event_name == 'push' && github.ref == 'refs/heads/main'
+
+  steps:
+    - name: Download documentation artifacts
+      uses: actions/download-artifact@v4
+      with:
+        name: documentation
+        path: ./html
+
+    - name: Deploy to AMD doc server
+      uses: easingthemes/ssh-deploy@v4
+      with:
+        SSH_PRIVATE_KEY: ${{ secrets.AMD_DOC_SERVER_KEY }}
+        REMOTE_HOST: doc.aiter.amd.com
+        REMOTE_USER: deploy
+        SOURCE: "html/"
+        TARGET: "/var/www/doc.aiter.amd.com/html"


This deployment guide recommends using the third-party GitHub Action easingthemes/ssh-deploy@v4 with a private SSH key passed via SSH_PRIVATE_KEY, but pins it only to the mutable v4 tag. If that action or its supply chain is compromised, an attacker could use your deployment key to modify content on doc.aiter.amd.com or pivot further into your infrastructure. To mitigate this, pin the action to a specific commit SHA (and update it deliberately over time) or replace it with a first-party/organization-maintained deployment mechanism that you control more directly.

sunway513 · 2026-03-09T01:36:29Z

Hi @valarLip @gyohuangxin — friendly ping for review on this documentation website PR. This adds Sphinx-based docs with GitHub Actions auto-deployment to GitHub Pages, covering API reference and tutorials. Would appreciate your review when you get a chance. Thanks!

gyohuangxin

docs.yml should only trigger when docs-related files are modified. Please add path filters:

on:
  push:
    branches:
      - main
      - docs-website
    paths:
      - 'docs/**'
      - '.github/workflows/docs.yml'
  pull_request:
    branches:
      - main
    paths:
      - 'docs/**'
      - '.github/workflows/docs.yml'
  workflow_dispatch:

This avoids unnecessary CI runs when non-doc files are changed.

gyohuangxin · 2026-03-09T02:47:34Z

@sunway513 Could you also fix code style issues listed here: https://github.com/ROCm/aiter/actions/runs/22835735370/job/66231715645?pr=2167

@gyohuangxin

Only trigger docs build/deploy when docs/** or the workflow file itself is modified, avoiding unnecessary CI runs on non-doc changes. Addresses review feedback from @gyohuangxin.

Replace single quotes with double quotes and add trailing commas to pass CI code style check.

@gyohuangxin

#2167) * Prepare repository for size optimization This commit introduces safeguards and documentation to prepare for a major repository cleanup that will reduce the repo size from 547 MB to ~130 MB (76% reduction). Changes: - Enhanced .gitignore to prevent large files (test data, build artifacts) - Created test data download script framework - Documented cleanup plan and migration process The actual history cleanup will be performed separately during a scheduled maintenance window, requiring all contributors to re-clone. See REPO_CLEANUP_PLAN.md for full details. Impact: No immediate changes to functionality. Protective measures only. * Address Copilot code review feedback - Fix migration steps to preserve local changes using patch files instead of git stash - Update size reduction numbers to match actual test results (105MB vs aspirational 50MB) - Clarify that pre-commit hook for size checks is not included (to avoid conflict with existing hook) - Update hook installation instructions to align with existing CONTRIBUTE.md workflow - Fix test data download script to exit with error code when unconfigured - Remove references to non-existent files (paths_to_remove.txt, aiter_cleanup_results.md) All changes address feedback from Copilot code review. * docs: add documentation website Add comprehensive Sphinx-based documentation website for AITER. Features: - Installation guide with 3 installation methods - Quick start tutorial with runnable examples - API reference for attention, GEMM, and operators - Basic usage tutorial with performance comparisons - Configuration for doc.aiter.amd.com hosting Structure: - docs/conf.py: Sphinx configuration with AMD branding - docs/index.rst: Main documentation landing page - docs/installation.rst: Detailed installation instructions - docs/quickstart.rst: 5-minute getting started guide - docs/api/: Complete API reference documentation - docs/tutorials/: Hands-on tutorials with code examples The documentation can be built locally with: cd docs && pip install -r requirements.txt && make html This brings AITER documentation quality on par with FlashInfer. Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com> * ci: add GitHub Actions workflow for documentation Add automated build and deployment workflow for Sphinx documentation. Features: - Automatic build on push to docs-website and main branches - Deploys to GitHub Pages via gh-pages branch - Build artifacts available for PR previews - Uses sphinx-build with all extensions - Caches pip dependencies for faster builds Workflow: 1. Checkout code 2. Install Python and dependencies 3. Build Sphinx HTML documentation 4. Upload build artifacts 5. Deploy to gh-pages branch (on push) Documentation will be available at: https://sunway513.github.io/aiter/ Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com> * ci: trigger documentation workflow * fix: trigger workflow on docs-website branch * fix: add missing _static and _templates directories for Sphinx * fix: simplify Sphinx build to avoid treating warnings as errors * docs: add comprehensive 'How to Add a New Operator' tutorial Add detailed step-by-step guide for adding custom operators to AITER. Features: - Complete workflow from Python interface to ROCm kernel - Real code examples for each step - PyBind11 bindings setup - Testing and benchmarking guidelines - Best practices and debugging tips - Complete RMSNorm example as reference This addresses team feedback: "搞个how to add new op之类的就完美了" Includes: - Step 1: Define operator interface (Python) - Step 2: Implement ROCm/HIP kernel - Step 3: Create PyBind11 bindings - Step 4: Update build configuration - Step 5: Add comprehensive tests - Step 6: Build and install - Step 7: Register in main module Advanced topics: - CK (Composable Kernel) integration - Triton kernel development - Fused operations pattern - In-place operations - Autograd support for training Also updated: - docs/index.rst: Added Quick Links section highlighting the tutorial - docs/tutorials/index.rst: Added to Advanced Topics section Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com> * docs: fix critical factual errors in documentation Fix high-priority documentation errors discovered in factual accuracy audit: ## Critical Fixes - Fix incorrect package name (aiter → amd-aiter) in installation instructions - Replace non-working verification code with functional examples - Fix MOE quickstart example to use actual fmoe() API instead of non-existent grouped_gemm() ## Changes - docs/installation.rst: Update pip install command and verification code - docs/quickstart.rst: Replace grouped_gemm with working fmoe example - docs/DOCUMENTATION_AUDIT_REPORT.md: Add comprehensive audit findings ## Audit Summary Discovered 22 factual errors across documentation. This commit addresses the 3 highest-priority issues that would immediately block users. See DOCUMENTATION_AUDIT_REPORT.md for complete findings and recommendations. Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com> * ci: add path filters to docs workflow Only trigger docs build/deploy when docs/** or the workflow file itself is modified, avoiding unnecessary CI runs on non-doc changes. Addresses review feedback from @gyohuangxin. * fix: apply black formatting to docs/conf.py Replace single quotes with double quotes and add trailing commas to pass CI code style check. --------- Co-authored-by: Claude Sonnet 4.5 <noreply@anthropic.com> Co-authored-by: Xin Huang <Xin.Huang@amd.com> Co-authored-by: valarLip <340077269@qq.com>

@gyohuangxin

ROCm#2167) * Prepare repository for size optimization This commit introduces safeguards and documentation to prepare for a major repository cleanup that will reduce the repo size from 547 MB to ~130 MB (76% reduction). Changes: - Enhanced .gitignore to prevent large files (test data, build artifacts) - Created test data download script framework - Documented cleanup plan and migration process The actual history cleanup will be performed separately during a scheduled maintenance window, requiring all contributors to re-clone. See REPO_CLEANUP_PLAN.md for full details. Impact: No immediate changes to functionality. Protective measures only. * Address Copilot code review feedback - Fix migration steps to preserve local changes using patch files instead of git stash - Update size reduction numbers to match actual test results (105MB vs aspirational 50MB) - Clarify that pre-commit hook for size checks is not included (to avoid conflict with existing hook) - Update hook installation instructions to align with existing CONTRIBUTE.md workflow - Fix test data download script to exit with error code when unconfigured - Remove references to non-existent files (paths_to_remove.txt, aiter_cleanup_results.md) All changes address feedback from Copilot code review. * docs: add documentation website Add comprehensive Sphinx-based documentation website for AITER. Features: - Installation guide with 3 installation methods - Quick start tutorial with runnable examples - API reference for attention, GEMM, and operators - Basic usage tutorial with performance comparisons - Configuration for doc.aiter.amd.com hosting Structure: - docs/conf.py: Sphinx configuration with AMD branding - docs/index.rst: Main documentation landing page - docs/installation.rst: Detailed installation instructions - docs/quickstart.rst: 5-minute getting started guide - docs/api/: Complete API reference documentation - docs/tutorials/: Hands-on tutorials with code examples The documentation can be built locally with: cd docs && pip install -r requirements.txt && make html This brings AITER documentation quality on par with FlashInfer. Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com> * ci: add GitHub Actions workflow for documentation Add automated build and deployment workflow for Sphinx documentation. Features: - Automatic build on push to docs-website and main branches - Deploys to GitHub Pages via gh-pages branch - Build artifacts available for PR previews - Uses sphinx-build with all extensions - Caches pip dependencies for faster builds Workflow: 1. Checkout code 2. Install Python and dependencies 3. Build Sphinx HTML documentation 4. Upload build artifacts 5. Deploy to gh-pages branch (on push) Documentation will be available at: https://sunway513.github.io/aiter/ Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com> * ci: trigger documentation workflow * fix: trigger workflow on docs-website branch * fix: add missing _static and _templates directories for Sphinx * fix: simplify Sphinx build to avoid treating warnings as errors * docs: add comprehensive 'How to Add a New Operator' tutorial Add detailed step-by-step guide for adding custom operators to AITER. Features: - Complete workflow from Python interface to ROCm kernel - Real code examples for each step - PyBind11 bindings setup - Testing and benchmarking guidelines - Best practices and debugging tips - Complete RMSNorm example as reference This addresses team feedback: "搞个how to add new op之类的就完美了" Includes: - Step 1: Define operator interface (Python) - Step 2: Implement ROCm/HIP kernel - Step 3: Create PyBind11 bindings - Step 4: Update build configuration - Step 5: Add comprehensive tests - Step 6: Build and install - Step 7: Register in main module Advanced topics: - CK (Composable Kernel) integration - Triton kernel development - Fused operations pattern - In-place operations - Autograd support for training Also updated: - docs/index.rst: Added Quick Links section highlighting the tutorial - docs/tutorials/index.rst: Added to Advanced Topics section Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com> * docs: fix critical factual errors in documentation Fix high-priority documentation errors discovered in factual accuracy audit: ## Critical Fixes - Fix incorrect package name (aiter → amd-aiter) in installation instructions - Replace non-working verification code with functional examples - Fix MOE quickstart example to use actual fmoe() API instead of non-existent grouped_gemm() ## Changes - docs/installation.rst: Update pip install command and verification code - docs/quickstart.rst: Replace grouped_gemm with working fmoe example - docs/DOCUMENTATION_AUDIT_REPORT.md: Add comprehensive audit findings ## Audit Summary Discovered 22 factual errors across documentation. This commit addresses the 3 highest-priority issues that would immediately block users. See DOCUMENTATION_AUDIT_REPORT.md for complete findings and recommendations. Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com> * ci: add path filters to docs workflow Only trigger docs build/deploy when docs/** or the workflow file itself is modified, avoiding unnecessary CI runs on non-doc changes. Addresses review feedback from @gyohuangxin. * fix: apply black formatting to docs/conf.py Replace single quotes with double quotes and add trailing commas to pass CI code style check. --------- Co-authored-by: Claude Sonnet 4.5 <noreply@anthropic.com> Co-authored-by: Xin Huang <Xin.Huang@amd.com> Co-authored-by: valarLip <340077269@qq.com>

sunway513 and others added 10 commits February 14, 2026 02:37

ci: trigger documentation workflow

193b6a7

fix: trigger workflow on docs-website branch

da79808

fix: add missing _static and _templates directories for Sphinx

db5f1fa

fix: simplify Sphinx build to avoid treating warnings as errors

74922ef

sunway513 requested review from a team and Copilot March 4, 2026 05:44

Copilot started reviewing on behalf of sunway513 March 4, 2026 05:45 View session

Copilot AI reviewed Mar 4, 2026

View reviewed changes

sunway513 mentioned this pull request Mar 9, 2026

Documentation Websites for OSS Projects sunway513/aiter#53

Closed

Merge branch 'main' into docs-website

fac2e7e

gyohuangxin requested changes Mar 9, 2026

View reviewed changes

sunway513 and others added 2 commits March 9, 2026 14:44

ci: add path filters to docs workflow

25f110a

Only trigger docs build/deploy when docs/** or the workflow file itself is modified, avoiding unnecessary CI runs on non-doc changes. Addresses review feedback from @gyohuangxin.

fix: apply black formatting to docs/conf.py

0c2869a

Replace single quotes with double quotes and add trailing commas to pass CI code style check.

valarLip merged commit fa38683 into ROCm:main Mar 13, 2026
22 of 25 checks passed

-   # Router logits and expert selection
-   router_logits = torch.randn(num_tokens, num_experts, device='cuda', dtype=torch.float16)
-   # Fused MOE operation (gate + up projection + down projection)
-   output = aiter.fmoe(
-       x, w1, w2, router_logits,
-       topk=top_k,
-       renormalize=True
-   )
+   # Router logits
+   router_logits = torch.randn(num_tokens, num_experts, device='cuda', dtype=torch.float16)
+   # Compute routing probabilities and top-k experts per token
+   router_probs = torch.softmax(router_logits, dim=-1)
+   topk_weights, topk_ids = torch.topk(router_probs, k=top_k, dim=-1)
+   # Flatten and sort by expert id to form grouped GEMM batches
+   flat_expert_ids = topk_ids.reshape(-1)              # [num_tokens * top_k]
+   flat_token_ids = torch.arange(num_tokens, device=x.device).repeat_interleave(top_k)
+   sorted_expert_ids, sort_indices = torch.sort(flat_expert_ids, stable=True)
+   sorted_token_ids = flat_token_ids[sort_indices]
+   # Allocate output buffer for the low-level fused MoE kernel
+   output = torch.empty_like(x)
+   # Low-level fused MoE operation (gate + up projection + down projection).
+   # NOTE: aiter.fmoe is an in-place kernel that writes into `output` and expects
+   # precomputed routing metadata. Refer to the AITER documentation for the exact
+   # argument order. A typical call structure looks like:
+   #
+   # aiter.fmoe(
+   #     x,
+   #     w1,
+   #     w2,
+   #     topk_weights,
+   #     topk_ids,
+   #     sorted_token_ids,
+   #     sorted_expert_ids,
+   #     out=output,
+   # )

Conversation

sunway513 commented Mar 4, 2026

Summary

Files Added

Test plan

Uh oh!

Copilot AI left a comment

Choose a reason for hiding this comment

Pull request overview

Reviewed changes

Uh oh!

Copilot AI Mar 4, 2026

Choose a reason for hiding this comment

Uh oh!

Copilot AI Mar 4, 2026

Choose a reason for hiding this comment

Uh oh!

Copilot AI Mar 4, 2026

Choose a reason for hiding this comment

Uh oh!

Copilot AI Mar 4, 2026

Choose a reason for hiding this comment

Uh oh!

Copilot AI Mar 4, 2026

Choose a reason for hiding this comment

Uh oh!

Copilot AI Mar 4, 2026

Choose a reason for hiding this comment

Uh oh!

Copilot AI Mar 4, 2026

Choose a reason for hiding this comment

Uh oh!

Copilot AI Mar 4, 2026

Choose a reason for hiding this comment

Uh oh!

Copilot AI Mar 4, 2026

Choose a reason for hiding this comment

Uh oh!

Copilot AI Mar 4, 2026

Choose a reason for hiding this comment

Uh oh!

sunway513 commented Mar 9, 2026

Uh oh!

gyohuangxin left a comment

Choose a reason for hiding this comment

Uh oh!

gyohuangxin commented Mar 9, 2026

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

4 participants