# tt-iree: IREE Plugin for Tenstorrent AI Accelerators

tt-iree enables execution of ML models compiled with IREE on Tenstorrent AI accelerators (P100A/Blackhole architecture).
## Goal: Seamless Deployment to Tenstorrent

```python
import iree.compiler as compiler
import iree.runtime as runtime

# Compile for Tenstorrent
compiled = compiler.compile_str(mlir_code, target_backends=["tenstorrent"])

# Run on Tenstorrent hardware
module = runtime.load_vm_flatbuffer(compiled, driver="tenstorrent")
result = module.main(inputs)
```

## Status: Early Development (PoC Phase)
WIP & Planned:
- Compiler backend registration
- HAL driver implementation
- Mock mode execution
- Basic operation support (elementwise add, single-tile matmul smoke test)
- TTNN integration
- Hardware execution
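The two planned smoke-test operations have simple reference semantics; the pure-Python sketch below (illustrative only, not the backend's kernels) shows what a mock-mode run would be checked against:

```python
TILE = 32  # Tenstorrent tile dimension

def eltwise_add(a, b):
    """Reference elementwise add over two equally shaped nested lists."""
    return [[x + y for x, y in zip(ra, rb)] for ra, rb in zip(a, b)]

def single_tile_matmul(a, b):
    """Reference matmul for a single 32x32 tile (nested lists)."""
    return [
        [sum(a[i][k] * b[k][j] for k in range(TILE)) for j in range(TILE)]
        for i in range(TILE)
    ]
```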
## Dependencies

| Dependency | Version | Notes |
|---|---|---|
| IREE | v3.9.0 | Compiler & runtime infrastructure |
| tt-metal | v0.65.0 | Tenstorrent SDK (TTNN, TT-Metalium) |
## Project Structure

```
tt-iree/
├── compiler/        # IREE compiler plugin
│   └── plugins/target/tenstorrent/
├── runtime/         # IREE HAL driver
│   └── src/iree/hal/drivers/tenstorrent/
├── third_party/
│   ├── iree/        # IREE v3.9.0 (submodule)
│   └── tt-metal/    # tt-metal v0.65.0 (submodule)
├── docs/            # Documentation
├── test/            # Tests
└── examples/        # Example programs
```
## Building

### Prerequisites

- CMake 3.21+
- Ninja
- Clang/LLVM 15+
- Python 3.9+
```shell
# Ubuntu 22.04
sudo apt-get install cmake ninja-build clang lld python3-pip
```

### Checkout

```shell
git clone https://github.com/user/tt-iree.git
cd tt-iree

# Set up submodules with pinned versions
./scripts/setup_submodules.sh
```

Note: Submodule initialization takes ~10-20 minutes due to IREE's large dependency tree.
### Mock Build (no hardware required)

```shell
cmake -G Ninja -B build \
    -DCMAKE_BUILD_TYPE=RelWithDebInfo \
    -DCMAKE_C_COMPILER=clang \
    -DCMAKE_CXX_COMPILER=clang++ \
    -DTT_IREE_ENABLE_MOCK=ON
cmake --build build
```

### Hardware Build (TTNN)

```shell
# Set up the TT-Metal environment
source third_party/tt-metal/build/python_env/bin/activate
export TT_METAL_HOME=$(pwd)/third_party/tt-metal

cmake -G Ninja -B build \
    -DCMAKE_BUILD_TYPE=RelWithDebInfo \
    -DTT_IREE_ENABLE_MOCK=OFF \
    -DTT_IREE_ENABLE_TTNN=ON
cmake --build build
```

## Architecture

tt-iree is an out-of-tree IREE plugin for Tenstorrent hardware, consisting of two main components.

### Compiler Plugin

Extends IREE's compiler to generate code for Tenstorrent's tile-based architecture:
- Converts IREE's HAL dialect to Tenstorrent-specific representation
- Handles 32x32 tile layout transformation
- Maps workloads to 8x8 Tensix core grid
- Generates TT-Metal kernel code
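The 32x32 tiling and 8x8 grid mapping above can be sketched with some simple arithmetic. This is an illustrative model (round shapes up to whole tiles, distribute tiles evenly over cores), not the backend's actual scheduling policy:

```python
import math

TILE = 32                    # Tenstorrent tile dimension (32x32 elements)
GRID_ROWS, GRID_COLS = 8, 8  # Tensix core grid

def padded_shape(rows, cols):
    """Round a 2-D tensor shape up to whole 32x32 tiles."""
    return (math.ceil(rows / TILE) * TILE, math.ceil(cols / TILE) * TILE)

def tile_count(rows, cols):
    """Number of 32x32 tiles covering the padded tensor."""
    pr, pc = padded_shape(rows, cols)
    return (pr // TILE) * (pc // TILE)

def tiles_per_core(rows, cols):
    """Upper bound on tiles assigned to one core under even distribution."""
    return math.ceil(tile_count(rows, cols) / (GRID_ROWS * GRID_COLS))
```

For example, a 100x200 tensor pads to 128x224, i.e. 4x7 = 28 tiles, so at most one tile lands on each core of the 8x8 grid.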
### Runtime HAL Driver

Implements IREE's HAL (Hardware Abstraction Layer) interface:

| Component | Description |
|---|---|
| `tt_driver` | Driver registration and device enumeration |
| `tt_device` | Device lifecycle and capability management |
| `tt_allocator` | Buffer allocation (DRAM/L1) |
| `tt_buffer` | Memory management with tile layout |
| `tt_command_buffer` | Command recording and dispatch |
| `tt_executable` | Kernel loading and execution |
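A hypothetical Python sketch of how these components fit together in mock mode (the real driver is implemented in C against IREE's HAL interfaces; all class names and methods here are illustrative):

```python
class TTBuffer:
    """Mirrors tt_buffer: device memory with an associated placement."""
    def __init__(self, size, memory):
        self.size, self.memory = size, memory

class TTAllocator:
    """Mirrors tt_allocator: hands out DRAM or L1 buffers."""
    def allocate(self, size, memory="DRAM"):
        if memory not in ("DRAM", "L1"):
            raise ValueError("Tenstorrent devices expose DRAM and L1 memory")
        return TTBuffer(size, memory)

class TTCommandBuffer:
    """Mirrors tt_command_buffer: records dispatches for later submission."""
    def __init__(self):
        self.commands = []
    def dispatch(self, executable, grid):
        self.commands.append((executable, grid))

class TTDevice:
    """Mirrors tt_device: owns an allocator and executes command buffers."""
    def __init__(self, device_id):
        self.device_id = device_id
        self.allocator = TTAllocator()
    def submit(self, command_buffer):
        # Mock mode: simply replay the recorded dispatches.
        return list(command_buffer.commands)

class TTDriver:
    """Mirrors tt_driver: registration and device enumeration."""
    def enumerate_devices(self):
        return [TTDevice(device_id=0)]
```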
### Compilation Pipeline

```
PyTorch/JAX/TensorFlow
        ↓
StableHLO/TOSA
        ↓
IREE Compiler (Flow → Stream → HAL)
        ↓
Tenstorrent Backend (HAL → TT-Metal)
        ↓
.vmfb (VM FlatBuffer)
        ↓
IREE Runtime + HAL Driver
        ↓
TT-Metal Runtime → P100A Hardware
```
For detailed architecture documentation, see `docs/`.
## Roadmap

| Phase | Goal | Status |
|---|---|---|
| PoC | Single operation, mock execution | WIP |
| MVP | MNIST inference on P100A | Planned |
| Alpha | ResNet-18, multi-core dispatch | Planned |
| Beta | LLM inference (GPT-2 scale) | Planned |
| v1.0 | Production release | Planned |
## Related Projects

- IREE - ML compiler infrastructure
- tt-metal - Tenstorrent low-level SDK
- tt-mlir - Tenstorrent MLIR compiler
- iree-amd-aie - Reference for out-of-tree IREE backend
## License

Apache 2.0 with LLVM Exceptions. See LICENSE.
## Acknowledgments

Developed as part of the Tenstorrent Korea Open Source Developer Program.