Closed
25 commits
cdb1193 quick install script on top of MLIR docker image (learning-chip, May 13, 2026)
217dc2a add reference result for top->vop expansion (learning-chip, May 13, 2026)
c5b540d low-level python binding example to generate vpto IR (learning-chip, May 13, 2026)
8697f5e initial prototype of high-level dsl builder api (learning-chip, May 13, 2026)
cf4ece0 initial prototype of softmax IR builder (learning-chip, May 13, 2026)
de75cad script to check IR equal (learning-chip, May 13, 2026)
8d1a834 avoid raw MLIR `Type.parse` (learning-chip, May 13, 2026)
8dc1a4a more Pythonic builder style suggestions (learning-chip, May 13, 2026)
63e590a major refactor of dsl syntax and impl (learning-chip, May 13, 2026)
1ac8d0d [vpto] Add ptodsl tracing POC (Zhendong404, May 14, 2026)
2c2cf6d [vpto] Allow structured loops without vecscope (Zhendong404, May 14, 2026)
16303bd Add user guides (Zhendong404, May 14, 2026)
60f4c6a Add a flash attention demo (Zhendong404, May 15, 2026)
d094fa2 Completed the first version of PTODSL user guide (Zhendong404, May 15, 2026)
d8db04e Complete the mlir text emission of the FA demo (Zhendong404, May 18, 2026)
78c4cf8 pip install ptoas (castigli, May 19, 2026)
8de1968 use pip install in CI (#385) (castigli, May 23, 2026)
36bb9c5 feature(ptodsl): align ptodsl implementation with user guide (Zhendong404, May 19, 2026)
f17c5c7 chore(ptodsl): normalize docs test headers (Zhendong404, May 21, 2026)
ac4d5ff python builder to reproduce tilelang_st/tadd.pto (learning-chip, May 21, 2026)
84e6f48 Switch to new kernel surface (Zhendong404, May 22, 2026)
5ba043d Clean up the pending docs-as-test in the user guide (Zhendong404, May 23, 2026)
ade5bf7 Clarify the pto.jit kernel signature (Zhendong404, May 23, 2026)
c236b24 Refine the online softmax demo (Zhendong404, May 25, 2026)
d5eba97 Support ptr plus scalar JIT entries (jimmychou0, May 27, 2026)
.github/workflows/build_wheel.yml: 18 changes (2 additions, 16 deletions)
@@ -156,22 +156,8 @@ jobs:
 - name: Build PTOAS
   run: |
     export PATH="${PY_PATH}/bin:$PATH"
-    cd $PTO_SOURCE_DIR
-    cmake -C "$PTO_SOURCE_DIR/cmake/LinuxHardeningCache.cmake" -G Ninja \
-      -S . \
-      -B build \
-      -DLLVM_DIR=$LLVM_BUILD_DIR/lib/cmake/llvm \
-      -DMLIR_DIR=$LLVM_BUILD_DIR/lib/cmake/mlir \
-      -DPython3_ROOT_DIR=${PY_PATH} \
-      -DPython3_EXECUTABLE=${PY_PATH}/bin/python \
-      -DPython3_FIND_STRATEGY=LOCATION \
-      -Dpybind11_DIR=$(${PY_PATH}/bin/python -m pybind11 --cmakedir) \
-      -DMLIR_PYTHON_PACKAGE_DIR=${LLVM_BUILD_DIR}/tools/mlir/python_packages/mlir_core \
-      -DPTOAS_RELEASE_VERSION_OVERRIDE=${PTOAS_VERSION} \
-      -DCMAKE_INSTALL_PREFIX=${PTO_INSTALL_DIR} \
-      -DCMAKE_BUILD_TYPE=Release
-    ninja -C build
-    ninja -C build install
+    PTOAS_RELEASE_VERSION_OVERRIDE="${PTOAS_VERSION}" \
+      pip install . --no-build-isolation

 - name: Create Python wheel
   if: false
.github/workflows/build_wheel_mac.yml: 18 changes (2 additions, 16 deletions)
@@ -154,22 +154,8 @@ jobs:

 - name: Build PTOAS
   run: |
-    cd $PTO_SOURCE_DIR
-    cmake -G Ninja \
-      -S . \
-      -B build \
-      -DLLVM_DIR=$LLVM_BUILD_DIR/lib/cmake/llvm \
-      -DMLIR_DIR=$LLVM_BUILD_DIR/lib/cmake/mlir \
-      -DPython3_ROOT_DIR=${PY_PATH} \
-      -DPython3_EXECUTABLE=$(which python) \
-      -DPython3_FIND_STRATEGY=LOCATION \
-      -Dpybind11_DIR=$(python -m pybind11 --cmakedir) \
-      -DMLIR_PYTHON_PACKAGE_DIR=${LLVM_BUILD_DIR}/tools/mlir/python_packages/mlir_core \
-      -DPTOAS_RELEASE_VERSION_OVERRIDE=${PTOAS_VERSION} \
-      -DCMAKE_INSTALL_PREFIX=${PTO_INSTALL_DIR} \
-      -DCMAKE_BUILD_TYPE=Release
-    ninja -C build
-    ninja -C build install
+    PTOAS_RELEASE_VERSION_OVERRIDE="${PTOAS_VERSION}" \
+      pip install . --no-build-isolation

 - name: Create Python wheel
   if: false
.github/workflows/ci.yml: 30 changes (5 additions, 25 deletions)
@@ -200,20 +200,9 @@ jobs:

 - name: Build PTOAS
   run: |
-    export PYBIND11_CMAKE_DIR="$(python3 -m pybind11 --cmakedir)"
-    cmake -C "${GITHUB_WORKSPACE}/cmake/LinuxHardeningCache.cmake" -G Ninja -S . -B build \
-      -DLLVM_DIR="${LLVM_DIR}/lib/cmake/llvm" \
-      -DMLIR_DIR="${LLVM_DIR}/lib/cmake/mlir" \
-      -DPython3_EXECUTABLE=python3 \
-      -DPython3_FIND_STRATEGY=LOCATION \
-      -Dpybind11_DIR="${PYBIND11_CMAKE_DIR}" \
-      -DMLIR_ENABLE_BINDINGS_PYTHON=ON \
-      -DMLIR_PYTHON_PACKAGE_DIR="${LLVM_DIR}/tools/mlir/python_packages/mlir_core" \
-      -DCMAKE_INSTALL_PREFIX="${PTO_INSTALL_DIR}" \
-      -DCMAKE_BUILD_TYPE=Release
-    ninja -C build ptoas
-    ninja -C build ptobc
-    ninja -C build install
+    # LLVM_BUILD_DIR is the env var read by the build backend (_ptoas_build_backend.py).
+    # PTO_INSTALL_DIR is already set at the job level.
+    LLVM_BUILD_DIR="${LLVM_DIR}" pip install . --no-build-isolation

 - name: Run lit tests
   shell: bash
@@ -398,17 +387,8 @@ jobs:
   shell: bash
   run: |
     set -euo pipefail
-    export PYBIND11_CMAKE_DIR="$(python3 -m pybind11 --cmakedir)"
-    cmake -G Ninja -S . -B build \
-      -DLLVM_DIR="${LLVM_DIR}/lib/cmake/llvm" \
-      -DMLIR_DIR="${LLVM_DIR}/lib/cmake/mlir" \
-      -DPython3_EXECUTABLE=python3 \
-      -DPython3_FIND_STRATEGY=LOCATION \
-      -Dpybind11_DIR="${PYBIND11_CMAKE_DIR}" \
-      -DMLIR_ENABLE_BINDINGS_PYTHON=ON \
-      -DMLIR_PYTHON_PACKAGE_DIR="${LLVM_DIR}/tools/mlir/python_packages/mlir_core" \
-      -DCMAKE_BUILD_TYPE=Release
-    ninja -C build ptoas
+    # LLVM_BUILD_DIR is the env var read by the build backend (_ptoas_build_backend.py).
+    LLVM_BUILD_DIR="${LLVM_DIR}" pip install . --no-build-isolation

 - name: Resolve simulator environment
   shell: bash
.gitignore: 15 changes (15 additions, 0 deletions)
@@ -1,6 +1,13 @@
 # Build artifacts
 build/
+build_plain/
+build_plan/
 install/
+
+# TileLang ST standalone build outputs (see temp_docs/standalone_st.md)
+test/tilelang_st/npu/a5/src/st/build/
+test/tilelang_st/npu/a5/src/st/build_plain/
+test/tilelang_st/npu/a5/src/st/build_plan/
 cmake-build-*/
 CMakeFiles/
 CMakeCache.txt
@@ -49,13 +56,21 @@ venv/
 dist/

 # Logs/temp
+tmp/
 *.log
 *.tmp
 *.swp
 *.swo
 .cache/
 .pytest_cache/

+# PTODSL JIT / msprof simulator artifacts
+.ptodsl_jit/
+.ptodsl_cache/
+msprof_res/
+ptodsl/examples/jit/.cache/
+ptodsl/examples/jit/msprof_res/
+
 # Remote/NPU validation artifacts
 /payload/
 /payload.tgz
README_en.md: 236 changes (236 additions, 0 deletions)
@@ -0,0 +1,236 @@
# ptoas (PTO Assembler & Optimizer)

## 1. Introduction

**ptoas** is a specialized compiler toolchain built on top of **LLVM/MLIR (llvmorg-19.1.7)** *(Commit cd708029e0b2869e80abe31ddb175f7c35361f90)*, designed specifically for **PTO Bytecode** (Programming Tiling Operator Bytecode).

Acting as the bridge between upper-level AI frameworks and underlying NPU/GPGPU/CPU hardware, `ptoas` is built **out-of-tree** against LLVM/MLIR and provides complete C++ and Python interfaces. Its primary responsibilities include:

1. **IR Parsing & Verification**: Parses `.pto` input files and verifies the semantic correctness of PTO Dialect operations (Ops).
2. **Compilation & Optimization (Passes)**: Executes optimization passes targeting the Da Vinci Architecture, such as operator fusion and automatic synchronization insertion.
3. **Code Generation (Lowering)**: Supports lowering PTO IR to `EmitC` / `Linalg` dialects, ultimately generating code that calls the `pto-isa` C++ library.
4. **Python Bindings**: Provides Python modules that integrate with the MLIR Core Python bindings, so frameworks such as **PyPTO**, **TileLang**, and **CuTile** can build, manipulate, and compile PTO Bytecode directly from Python.

---

## 2. Directory Structure

```text
PTOAS/
├── include/
│   └── PTO/               # PTO Dialect headers and TableGen definitions (.td)
├── lib/
│   ├── PTO/               # Dialect core implementation (IR) and Pass logic (Transforms)
│   ├── CAPI/              # C language interface exposure
│   └── Bindings/Python/   # Python Binding C++ implementation (Pybind11)
├── python/                # Python module build scripts and helper code
├── test/
│   └── samples/           # Test cases
├── tools/
│   ├── ptoas/             # ptoas command-line tool entry point (Output: ptoas)
│   └── ptobc/             # ptobc command-line tool entry point (Output: ptobc)
└── CMakeLists.txt         # Top-level build configuration
```

---

## 3. Build Instructions

⚠️ **Important**: This project strictly requires **LLVM llvmorg-19.1.7**.

### 3.0 Environment Variable Configuration

To simplify the build process, **first modify and run the following commands according to your environment**. Subsequent steps reference these variables directly.

```bash
# ================= Configuration (edit here) =================
# Set your workspace root directory
# (recommended: a dedicated directory for LLVM and PTOAS)
export WORKSPACE_DIR=$HOME/llvm-workspace

# LLVM source and build paths
export LLVM_SOURCE_DIR=$WORKSPACE_DIR/llvm-project
export LLVM_BUILD_DIR=$LLVM_SOURCE_DIR/build-shared

# PTOAS source and install paths
export PTO_SOURCE_DIR=$WORKSPACE_DIR/PTOAS
export PTO_INSTALL_DIR=$PTO_SOURCE_DIR/install
# =============================================================

# Create the workspace directory
mkdir -p $WORKSPACE_DIR
```
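Helper scripts that need the same locations (for example, a local test driver) can derive them from `WORKSPACE_DIR` instead of hard-coding them. A minimal Python sketch, assuming the default layout configured above; `derive_paths` and `as_env` are hypothetical helpers, not part of ptoas:

```python
from pathlib import Path
from typing import Dict


def derive_paths(workspace_dir: str) -> Dict[str, Path]:
    """Return the directory layout exported in section 3.0 (hypothetical helper)."""
    ws = Path(workspace_dir).expanduser()
    llvm_source = ws / "llvm-project"
    pto_source = ws / "PTOAS"
    return {
        "WORKSPACE_DIR": ws,
        "LLVM_SOURCE_DIR": llvm_source,
        "LLVM_BUILD_DIR": llvm_source / "build-shared",
        "PTO_SOURCE_DIR": pto_source,
        "PTO_INSTALL_DIR": pto_source / "install",
    }


def as_env(paths: Dict[str, Path]) -> Dict[str, str]:
    """Convert the layout into string values usable as environment variables."""
    return {name: str(path) for name, path in paths.items()}
```

For example, `as_env(derive_paths("~/llvm-workspace"))` yields the same values as the `export` lines above, ready to merge into `os.environ` or a `subprocess` `env=` mapping.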

### 3.1 Prerequisites

* **OS**: Linux (Ubuntu 20.04+ recommended)
* **Compiler**: GCC >= 9 or Clang (C++17 support required)
* **Build System**: CMake >= 3.20, Ninja
* **Python**: 3.8+
* **Python Packages**: `pybind11`, `numpy`

```bash
python3 -m pip install pybind11==2.12.0 numpy
```

> **Note**: The current LLVM/MLIR Python bindings are not compatible with `pybind11` 3.x.
> If you encounter errors like `def_property family does not currently support keep_alive`
> when building LLVM, run the downgrade command above first.
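To catch the 3.x incompatibility before starting a long LLVM build, you can check the installed version first. A minimal sketch using the standard-library `importlib.metadata` (available from Python 3.8); the `< 3` rule simply encodes the note above, and the helper names are hypothetical:

```python
from importlib.metadata import PackageNotFoundError, version


def pybind11_version_ok(ver: str) -> bool:
    """Per the note above, pybind11 3.x is not compatible; accept earlier majors."""
    major = int(ver.split(".")[0])
    return major < 3


def check_pybind11() -> str:
    """Return a one-line status for the pybind11 installed in this interpreter."""
    try:
        installed = version("pybind11")
    except PackageNotFoundError:
        return "pybind11 is not installed: python3 -m pip install pybind11==2.12.0 numpy"
    if not pybind11_version_ok(installed):
        return f"pybind11 {installed} is too new: python3 -m pip install pybind11==2.12.0"
    return f"pybind11 {installed} OK"
```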

### 3.2 Step 1: Build LLVM/MLIR (Dependency)

Download the LLVM source, check out the `llvmorg-19.1.7` tag, and build with **shared libraries** to ensure correct linking for Python bindings.

```bash
# 1. Clone LLVM
cd $WORKSPACE_DIR
git clone https://github.com/llvm/llvm-project.git
cd $LLVM_SOURCE_DIR

# 2. [Critical] Check out llvmorg-19.1.7
git checkout llvmorg-19.1.7

# 3. Configure CMake (build shared libs with Python bindings enabled)
cmake -G Ninja -S llvm -B $LLVM_BUILD_DIR \
-DLLVM_ENABLE_PROJECTS="mlir;clang" \
-DBUILD_SHARED_LIBS=ON \
-DMLIR_ENABLE_BINDINGS_PYTHON=ON \
-DPython3_EXECUTABLE=$(which python3) \
-DCMAKE_BUILD_TYPE=Release \
-DLLVM_TARGETS_TO_BUILD="host"

# 4. Build LLVM (this step takes a long time)
ninja -C $LLVM_BUILD_DIR
```
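Before moving on, it can save time to confirm that the LLVM build produced the two directories PTOAS depends on: MLIR's CMake package (`lib/cmake/mlir`, the path passed as `MLIR_DIR` in a manual CMake build) and the MLIR Python package (`tools/mlir/python_packages/mlir_core`, where the `_pto` extension is later installed). A minimal heuristic sketch; the helper names are hypothetical, and a positive result does not prove the build is complete:

```python
from pathlib import Path


def mlir_python_package_dir(llvm_build_dir: str) -> Path:
    # Same path a manual CMake build passes as -DMLIR_PYTHON_PACKAGE_DIR.
    return Path(llvm_build_dir) / "tools" / "mlir" / "python_packages" / "mlir_core"


def llvm_build_looks_ready(llvm_build_dir: str) -> bool:
    """Heuristic: MLIR's CMake package and Python package directories both exist."""
    mlir_cmake_dir = Path(llvm_build_dir) / "lib" / "cmake" / "mlir"  # used as -DMLIR_DIR
    return mlir_cmake_dir.is_dir() and mlir_python_package_dir(llvm_build_dir).is_dir()
```

If `llvm_build_looks_ready(os.environ["LLVM_BUILD_DIR"])` is False, check that `-DMLIR_ENABLE_BINDINGS_PYTHON=ON` was set and that `ninja` finished without errors.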

### 3.3 Step 2: Build PTOAS (Out-of-Tree)

Clone the PTOAS source and build against the LLVM 19 you just compiled.

```bash
# 1. Clone PTOAS
cd $WORKSPACE_DIR
git clone https://gitcode.com/cann/pto-as.git PTOAS
cd $PTO_SOURCE_DIR

# 2. Build and install via pip
#    The build backend (pyproject.toml) drives CMake + Ninja automatically and
#    uses LLVM_BUILD_DIR and PTO_INSTALL_DIR from the environment set in 3.0.
pip install .
```
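If you script this step, the CI workflows in this change show the pattern: set `LLVM_BUILD_DIR` (read by the build backend, `_ptoas_build_backend.py`) and `PTO_INSTALL_DIR` in the environment, then run `pip install . --no-build-isolation` from the source tree (section 3.4 explains why the flag matters). A minimal Python sketch of that invocation; the function names are hypothetical:

```python
import os
import subprocess
import sys
from typing import Dict, List


def pip_install_command(editable: bool = False) -> List[str]:
    """pip invocation mirroring the CI workflows (editable variant as in section 3.4)."""
    cmd = [sys.executable, "-m", "pip", "install"]
    if editable:
        cmd.append("-e")
    cmd += [".", "--no-build-isolation"]
    return cmd


def build_env(llvm_build_dir: str, pto_install_dir: str) -> Dict[str, str]:
    """Copy the current environment and add the variables the CI sets for the build."""
    env = dict(os.environ)
    env["LLVM_BUILD_DIR"] = llvm_build_dir
    env["PTO_INSTALL_DIR"] = pto_install_dir
    return env


def install_ptoas(pto_source_dir: str, llvm_build_dir: str, pto_install_dir: str,
                  editable: bool = False) -> None:
    """Run pip from the PTOAS source tree; raises CalledProcessError on failure."""
    subprocess.run(
        pip_install_command(editable),
        cwd=pto_source_dir,
        env=build_env(llvm_build_dir, pto_install_dir),
        check=True,
    )
```

Using `sys.executable -m pip` ties the install to the interpreter running the script, which avoids picking up a different `pip` from `PATH`.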

This produces the same artifacts as a manual CMake build:

```text
# CLI tools
$PTO_SOURCE_DIR/build/tools/ptoas/ptoas
$PTO_SOURCE_DIR/build/tools/ptobc/ptobc

# Native extension installed into the MLIR Python package
$LLVM_BUILD_DIR/tools/mlir/python_packages/mlir_core/
└── mlir
    └── _mlir_libs
        └── _pto.cpython-*.so

# Python dialect files
$PTO_INSTALL_DIR/
└── mlir
    └── dialects
        ├── pto.py
        └── _pto_ops_gen.py
```
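A quick way to confirm that the install produced everything in the listing above is to check for each file. A minimal sketch; the paths come from the listing, the `_pto` extension is matched with a glob because its suffix depends on the Python version and platform, and `missing_artifacts` is a hypothetical helper:

```python
from pathlib import Path
from typing import List


def missing_artifacts(pto_source_dir: str, llvm_build_dir: str, pto_install_dir: str) -> List[str]:
    """Return the expected build outputs (per the listing above) that do not exist."""
    src, llvm, inst = Path(pto_source_dir), Path(llvm_build_dir), Path(pto_install_dir)
    missing: List[str] = []

    # CLI tools under the in-tree build directory
    for tool in ("ptoas", "ptobc"):
        path = src / "build" / "tools" / tool / tool
        if not path.is_file():
            missing.append(str(path))

    # Native extension; the file name embeds the Python ABI tag, hence the glob
    libs = llvm / "tools" / "mlir" / "python_packages" / "mlir_core" / "mlir" / "_mlir_libs"
    if not (libs.is_dir() and any(libs.glob("_pto.cpython-*.so"))):
        missing.append(str(libs / "_pto.cpython-*.so"))

    # Generated Python dialect files
    for name in ("pto.py", "_pto_ops_gen.py"):
        path = inst / "mlir" / "dialects" / name
        if not path.is_file():
            missing.append(str(path))
    return missing
```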

### 3.4 Step 3: Python Editable Install (Optional, for Python development)

If you want to develop and test Python code against the in-tree build without reinstalling after every C++ change, use an **editable install**.

```bash
pip install -e . --no-build-isolation
```

> **Why `--no-build-isolation`?** Without this flag, pip uses a temporary virtual environment for the build, records its pybind11 path in `CMakeCache.txt`, then deletes the venv — breaking any subsequent `ninja` reconfigure.

If you previously ran `pip install -e .` without the flag and your build is now broken, fix the existing `CMakeCache.txt` with:

```bash
cmake -B build -Dpybind11_DIR=$(python3 -m pybind11 --cmakedir)
```
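To tell whether an existing build directory is affected, you can check whether the `pybind11_DIR` entry in `build/CMakeCache.txt` still points to an existing directory. A minimal read-only sketch, assuming the usual `NAME:TYPE=VALUE` cache-entry format; the helper names are hypothetical:

```python
from pathlib import Path
from typing import Optional


def cached_pybind11_dir(cache_file: str) -> Optional[str]:
    """Return the pybind11_DIR value recorded in a CMakeCache.txt, if any."""
    for line in Path(cache_file).read_text().splitlines():
        # Cache entries have the form NAME:TYPE=VALUE, e.g. pybind11_DIR:PATH=/...
        if line.startswith("pybind11_DIR:"):
            return line.split("=", 1)[1]
    return None


def pybind11_cache_is_stale(cache_file: str) -> bool:
    """True if the cache points pybind11_DIR at a directory that no longer exists."""
    value = cached_pybind11_dir(cache_file)
    return value is not None and not Path(value).is_dir()
```

If `pybind11_cache_is_stale("build/CMakeCache.txt")` returns True, the `cmake -B build -Dpybind11_DIR=...` command above rewrites the entry.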

---

## 4. Usage

### 4.1 Command-Line Interface (CLI)

```bash
# Parse and print PTO IR
ptoas test/lit/pto/empty_func.pto

# Run the AutoSyncInsert pass
ptoas test/lit/pto/empty_func.pto --enable-insert-sync -o outputfile.cpp

# Specify target hardware architecture (A3 / A5)
ptoas test/lit/pto/empty_func.pto --pto-arch=a5 -o outputfile.cpp

# Specify build level (level3 disables PlanMemory/InsertSync)
ptoas test/lit/pto/empty_func.pto --pto-level=level3 -o outputfile.cpp

# Print the current ptoas release version
ptoas --version
```
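When ptoas is driven from a Python test harness, assembling the argument list in one place keeps the flags consistent. A minimal sketch that uses only the options shown above; `ptoas_argv` and `run_ptoas` are hypothetical helpers, and flag combinations beyond these examples are not covered:

```python
import subprocess
from typing import List, Optional


def ptoas_argv(input_pto: str, output: Optional[str] = None, *, arch: Optional[str] = None,
               level: Optional[str] = None, insert_sync: bool = False,
               ptoas: str = "ptoas") -> List[str]:
    """Build a ptoas command line from the flags documented above."""
    argv = [ptoas, input_pto]
    if insert_sync:
        argv.append("--enable-insert-sync")
    if arch is not None:
        argv.append(f"--pto-arch={arch}")    # e.g. "a5"
    if level is not None:
        argv.append(f"--pto-level={level}")  # e.g. "level3"
    if output is not None:
        argv += ["-o", output]
    return argv


def run_ptoas(*args, **kwargs) -> None:
    """Invoke ptoas; raises CalledProcessError if it exits with a non-zero status."""
    subprocess.run(ptoas_argv(*args, **kwargs), check=True)
```

For example, `run_ptoas("test/lit/pto/empty_func.pto", "outputfile.cpp", arch="a5")` runs the same command as the third CLI example.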

### 4.2 Python API

After configuring the environment variables, the PTO Dialect is loaded as part of `mlir.dialects`.

```python
from mlir.ir import Context, Module, Location
# [Key] Import pto from mlir.dialects — the standard pattern for out-of-tree bindings
from mlir.dialects import pto

with Context() as ctx, Location.unknown():
    pto.register_dialect(ctx, load=True)
    module = Module.create()
    print("PTO Dialect registered successfully!")
```

### 4.3 Running Tests

```bash
# Run Python binding tests
cd $PTO_SOURCE_DIR/test/samples/MatMul/
python3 ./tmatmulk.py > ./tmatmulk.pto

# Run ptoas tests
$PTO_SOURCE_DIR/build/tools/ptoas/ptoas ./tmatmulk.pto -o ./tmatmulk.cpp
```
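The two commands above form a small pipeline: the builder script prints PTO IR to stdout, and ptoas compiles the captured file. A minimal Python sketch of the same pipeline, for example to loop over several samples; `generate_pto` and `compile_pto` are hypothetical helpers, and the `ptoas` path is passed in explicitly:

```python
import subprocess
import sys


def generate_pto(builder_script: str, out_pto: str) -> None:
    """Equivalent of `python3 ./tmatmulk.py > ./tmatmulk.pto`: capture the printed IR."""
    with open(out_pto, "w") as f:
        subprocess.run([sys.executable, str(builder_script)], stdout=f, check=True)


def compile_pto(ptoas: str, pto_file: str, out_cpp: str) -> None:
    """Equivalent of `ptoas ./tmatmulk.pto -o ./tmatmulk.cpp`."""
    subprocess.run([ptoas, str(pto_file), "-o", str(out_cpp)], check=True)
```

With `check=True`, a failing builder script stops the pipeline before ptoas runs on a truncated `.pto` file.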

### 4.4 On-Board Validation

This flow generates NPU validation test cases from the `.cpp` files produced by ptoas (under `test/samples/`) and runs them on an NPU. The example below reuses `MatMul/tmatmulk.cpp` generated in section 4.3.

> For compile-only validation on a machine without an NPU card, see [docs/no_npu_compile_only_guide_zh.md](docs/no_npu_compile_only_guide_zh.md).

```bash
# 0) Run from the repository root
cd $PTO_SOURCE_DIR

# 1) Generate the npu_validation test directory
#    (creates npu_validation/ next to the input .cpp, e.g. test/samples/MatMul/npu_validation/)

# A2/A3 example:
python3 test/npu_validation/scripts/generate_testcase.py \
--input test/samples/MatMul/tmatmulk.cpp \
--run-mode npu \
--soc-version Ascend910B1

# A5 example:
python3 test/npu_validation/scripts/generate_testcase.py \
--input test/samples/MatMul/tmatmulk.cpp \
--run-mode npu \
--soc-version Ascend950

# 2) Run validation (run.sh requires no additional arguments)
test/samples/MatMul/npu_validation/tmatmulk/run.sh
```
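The two steps above can be assembled programmatically, for instance to validate several samples in a loop. A minimal sketch: the SoC values are only the two from the examples above, the `run.sh` location rule is inferred from the single MatMul example and is an assumption, and the helper names are hypothetical:

```python
import sys
from pathlib import Path
from typing import List

# SoC versions used in the examples above; other SoCs need their own value.
EXAMPLE_SOC_VERSIONS = {"A2/A3": "Ascend910B1", "A5": "Ascend950"}

GENERATE_SCRIPT = "test/npu_validation/scripts/generate_testcase.py"  # relative to repo root


def generate_testcase_argv(cpp_file: str, soc_version: str, run_mode: str = "npu") -> List[str]:
    """Command for step 1, using only the options shown above."""
    return [sys.executable, GENERATE_SCRIPT,
            "--input", cpp_file, "--run-mode", run_mode, "--soc-version", soc_version]


def run_script_path(cpp_file: str) -> Path:
    """Assumed layout, inferred from the MatMul example: <dir>/npu_validation/<stem>/run.sh."""
    cpp = Path(cpp_file)
    return cpp.parent / "npu_validation" / cpp.stem / "run.sh"
```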

Notes:
- `test/samples/MatMul/npu_validation/tmatmulk/` will contain `tmatmulk_kernel.cpp`, `main.cpp`, `golden.py`, `compare.py`, `run.sh`, and `CMakeLists.txt`.
- `golden.py` generates random inputs by default, and the outputs default to all zeros; only the number, shape, and data type of the inputs/outputs match the kernel parameters.
- `compare.py` compares `golden*.bin` against `output*.bin` and reports an error if they differ.
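
The comparison step can be pictured as follows. This is not the generated `compare.py`; it is a minimal NumPy sketch that assumes each `golden<X>.bin` pairs with `output<X>.bin` by suffix, that both hold raw `float32` data, and that a small tolerance is acceptable. The real script's dtype handling and tolerances may differ:

```python
from pathlib import Path
from typing import List, Tuple

import numpy as np


def paired_bins(workdir: str) -> List[Tuple[Path, Path]]:
    """Pair golden<X>.bin with output<X>.bin by their shared suffix (assumed naming)."""
    pairs = []
    for golden in sorted(Path(workdir).glob("golden*.bin")):
        suffix = golden.name[len("golden"):]
        pairs.append((golden, golden.with_name("output" + suffix)))
    return pairs


def bins_match(golden: Path, output: Path, dtype=np.float32,
               rtol: float = 1e-3, atol: float = 1e-3) -> bool:
    """Load both files as flat arrays of `dtype` and compare them element-wise."""
    if not output.is_file():
        return False
    expected = np.fromfile(golden, dtype=dtype)
    actual = np.fromfile(output, dtype=dtype)
    return expected.shape == actual.shape and bool(np.allclose(expected, actual, rtol=rtol, atol=atol))


def compare_dir(workdir: str) -> bool:
    """True only if every golden file has a matching, numerically close output file."""
    pairs = paired_bins(workdir)
    return bool(pairs) and all(bins_match(g, o) for g, o in pairs)
```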