PTO-DSL

Pythonic interface and JIT compiler for PTO-ISA

PTO-DSL provides a programming abstraction similar to cuTile, but native to the NPU.

Key features:

Automatic software pipelining without manual synchronization
Easy interoperation with torch-npu
Lightweight, open-source compiler stack using PTO Assembler
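Software pipelining here means overlapping data movement with computation across tiles. The plain-Python model below is not PTO-DSL code: the three stages, `pipelined_schedule`, and `has_buffer_conflict` are hypothetical names chosen for illustration. It shows the schedule such a pipeline follows with double buffering and checks the buffer hazard that hand-written kernels otherwise guard against with explicit synchronization.

```python
from typing import Dict, List

# Hypothetical three-stage tile pipeline (illustrative, not the PTO-DSL API):
#   load:    DMA a tile from global memory into in_buf[slot]
#   compute: read in_buf[slot], write out_buf[slot]
#   store:   DMA out_buf[slot] back to global memory
STAGES = ("load", "compute", "store")


def pipelined_schedule(num_tiles: int) -> List[Dict[str, int]]:
    """For each time step, map every active stage to the tile it processes.

    Stage s works on tile (t - s) at step t, so loading tile i+1 overlaps
    computing tile i and storing tile i-1.
    """
    if num_tiles <= 0:
        return []
    schedule = []
    for t in range(num_tiles + len(STAGES) - 1):
        step = {}
        for s, stage in enumerate(STAGES):
            tile = t - s
            if 0 <= tile < num_tiles:
                step[stage] = tile
        schedule.append(step)
    return schedule


def has_buffer_conflict(schedule: List[Dict[str, int]], num_buffers: int = 2) -> bool:
    """True if two concurrent stages use the same slot of the same buffer.

    A tile uses slot tile % num_buffers in both in_buf and out_buf. This is
    the hazard that manual synchronization between the DMA and compute
    pipelines would otherwise have to prevent.
    """
    for step in schedule:
        # load writes in_buf while compute reads in_buf
        if "load" in step and "compute" in step:
            if step["load"] % num_buffers == step["compute"] % num_buffers:
                return True
        # compute writes out_buf while store reads out_buf
        if "compute" in step and "store" in step:
            if step["compute"] % num_buffers == step["store"] % num_buffers:
                return True
    return False


if __name__ == "__main__":
    sched = pipelined_schedule(4)
    for t, step in enumerate(sched):
        print(t, step)
    # Assuming each stage takes one time step: 6 pipelined steps vs 12 serial
    print("pipelined steps:", len(sched), "serial steps:", 4 * len(STAGES))
    print("conflict with 2 buffers:", has_buffer_conflict(sched))
    print("conflict with 1 buffer:", has_buffer_conflict(sched, num_buffers=1))
```

With a single slot per buffer, every overlapped step conflicts, which is why hand-written kernels must either serialize the stages or insert synchronization. The feature list's claim is that PTO-DSL derives this overlap automatically.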

Installation

See docker/README.md for the full, reproducible set of dependencies on NPU.

Then, install this lightweight DSL package itself:

# install latest commit
pip install git+https://github.com/huawei-csl/pto-dsl.git

# or stable tag
pip install git+https://github.com/huawei-csl/pto-dsl.git@0.1.0

For in-place development:

git clone https://github.com/huawei-csl/pto-dsl.git
cd pto-dsl
pip install -e .
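After installing, a short check like the one below can confirm that the package and the NPU PyTorch stack are importable from the active environment. It assumes the import names `ptodsl` (the package directory in this repository) and `torch_npu` (the import name of torch-npu); `is_importable` is a hypothetical helper, not part of PTO-DSL.

```python
import importlib.util


def is_importable(module_name: str) -> bool:
    """Return True if the top-level module can be found on the current path."""
    return importlib.util.find_spec(module_name) is not None


if __name__ == "__main__":
    # Assumed import names; adjust if your environment differs.
    for name in ("ptodsl", "torch", "torch_npu"):
        print(f"{name}: {'found' if is_importable(name) else 'missing'}")
```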

Usage

See the examples/ and tests/ directories.

Contribute

See contribute_guide.md

Comparison with other frameworks

PTO-DSL aims for low-level, explicit, NPU-native primitives that can match the performance of programming directly in hardware intrinsics. Compared to other (also very good) kernel programming frameworks, it has a somewhat different scope by design:

vs tilelang-ascend: tilelang can also use PTO-ISA as a codegen backend. PTO-DSL intentionally exposes lower-level control. For example, L2 swizzling is a one-liner (T.use_swizzle) in tilelang but a user-defined custom function in PTO-DSL; see this matmul optimization example. Once PTO-DSL is more stable, it might serve as a backend component for tilelang, much like the CuteDSL backend.
vs triton-ascend: Both frameworks automate software pipelining on top of MLIR dialects for NPU. PTO-DSL exposes more of the NPU-native memory hierarchy, such as L0, L1, and UB (Unified Buffer). Also, pto.load/pto.store always map to native, efficient DMA instructions, while tl.load/tl.store try to perform GPU-style memory coalescing.
vs Catlass: Catlass provides a collection of expert-optimized templates, while PTO-DSL is closer to the CuteDSL layer of Cutlass, offering explicit low-level primitives.
vs PyPTO: PyPTO is a full MPMD dynamic runtime stack that also uses PTO-ISA as its lowest-level primitive. PyPTO's Tensor API sits closer to the PyTorch/JAX level, while a PTO-DSL kernel is still SPMD and sits closer to the cuTile/CuteDSL level.
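Two points above, that L2 swizzling is a user-defined function in PTO-DSL and that PTO-DSL kernels are SPMD, can be illustrated together in plain Python. This is not PTO-DSL code. It uses the grouped block ordering popularized by Triton's matmul tutorial, which is an assumed choice (the PTO-DSL matmul example may swizzle differently), and `swizzle_block`, `group_m`, and `tiled_matmul_spmd` are hypothetical names. The loop over `block_id` stands in for an SPMD launch in which every block runs the same body on a different output tile.

```python
from typing import List, Tuple

Matrix = List[List[int]]


def swizzle_block(block_id: int, num_m: int, num_n: int, group_m: int) -> Tuple[int, int]:
    """Map a linear block id to an output tile (m, n) in grouped order.

    Tiles are visited column by column inside bands of `group_m` tile rows,
    so blocks that run close together in time share rows of A and columns
    of B, which is what improves L2 reuse on real hardware.
    """
    blocks_per_group = group_m * num_n
    group = block_id // blocks_per_group
    first_m = group * group_m
    rows_in_group = min(num_m - first_m, group_m)  # the last band may be shorter
    local = block_id % blocks_per_group
    m = first_m + local % rows_in_group
    n = local // rows_in_group
    return m, n


def tiled_matmul_spmd(a: Matrix, b: Matrix, tile: int, group_m: int) -> Matrix:
    """C = A @ B, computed one output tile per simulated SPMD block."""
    M, K, N = len(a), len(b), len(b[0])
    num_m = (M + tile - 1) // tile
    num_n = (N + tile - 1) // tile
    c = [[0] * N for _ in range(M)]
    # Each iteration plays the role of one block in an SPMD launch: the same
    # body runs for every block_id, and only the swizzled tile index differs.
    for block_id in range(num_m * num_n):
        tm, tn = swizzle_block(block_id, num_m, num_n, group_m)
        for i in range(tm * tile, min((tm + 1) * tile, M)):
            for j in range(tn * tile, min((tn + 1) * tile, N)):
                c[i][j] = sum(a[i][k] * b[k][j] for k in range(K))
    return c


if __name__ == "__main__":
    # 3 x 2 tile grid with bands of 2 tile rows: visit order of the blocks
    print([swizzle_block(i, 3, 2, 2) for i in range(6)])
    # [(0, 0), (1, 0), (0, 1), (1, 1), (2, 0), (2, 1)]
```

Because swizzling only permutes which block computes which tile, the result matches an unswizzled matmul; on hardware, the benefit comes from neighboring blocks reusing the same A rows and B columns while they are still resident in L2.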
