[TIRx] Phase out flat device-intrinsic op aliases by tlopex · Pull Request #19838 · apache/tvm

tlopex · 2026-06-19T04:49:35Z

PR #19677 registered every CUDA / Trainium device intrinsic under two Op names: a flat tirx.<ns>_<name> alias plus the canonical tirx.<ns>.<name>. The flat aliases were a migration shim; passes and codegen that match an intrinsic had to check both spellings (the dual-name IsOp pattern). The Python builders and TVMScript parser already canonicalize, so every real Call already carries the canonical op and the flat aliases were dead weight.

This pr removes the flat device-intrinsic aliases, keeping only the canonical namespaced ops:

RegisterDeviceIntrinsic (backend/cuda) and RegisterNKIIntrinsic (backend/trn) register only the canonical name.
Drop the flat-only macro registrations for device intrinsics; the canonical op with all attrs is registered from the alias table. The WMMA tvm_sync / mma_store / mma_fill builtins and the profiling timer_cuda builtins keep their flat names (no namespace / canonical form, category "builtin").
Remove the redundant flat tirx.ptx_fetch_register registration.
C++ consumers that resolved a flat op by name string now use the canonical name; the ptx_elect_sync / cuda_func_call dual-name matchers collapse to the canonical check.
Python: the InjectPTXAsyncCopy round-trip Op.get and the matching test assertion use the canonical name. call_intrin keeps its flat->canonical rewrite for back-compat, so user-facing wrappers are unchanged.
test_op_namespace_cleanup asserts device_intrin op names are canonical so a flat alias cannot silently reappear.

Generated CUDA is byte-identical: helper names are literals and codegen dispatches by op name, with the registry resolving the canonical name to the same helper.

gemini-code-assist

Code Review

This pull request refactors TVM's backend intrinsics by replacing flat names (e.g., tirx.ptx_cp_async, tirx.cuda_func_call, tirx.nki_load) with their canonical namespace-qualified equivalents (e.g., tirx.ptx.cp_async, tirx.cuda.func_call, tirx.nki.load), removing legacy flat macro registrations. The review feedback highlights a regression where phasing out these flat registrations lost the TScriptDtypePrintLocation attribute for key PTX intrinsics, which is necessary for correct TVMScript printing and parsing. The reviewer suggests extending the registration struct and macros to restore this attribute, and also recommends simplifying the redundant dual-name matching logic for ptx_elect_sync in tile_primitive_dispatch.cc.

Important

The consumer version of Gemini Code Assist on GitHub is being sunset. Starting June 18, 2026, new organization installations will be blocked, and all code review activity will officially cease on July 17, 2026.
For more details on the timeline and next steps, please review the Help Documentation.

PR apache#19677 registered every CUDA / Trainium device intrinsic under two Op names: a flat `tirx.<ns>_<name>` alias plus the canonical `tirx.<ns>.<name>`. The flat aliases were a migration shim; passes and codegen that match an intrinsic had to check both spellings. The canonical namespaced form is the intended end state. Remove the flat device-intrinsic aliases, keeping only the canonical namespaced ops: - RegisterDeviceIntrinsic (backend/cuda) and RegisterNKIIntrinsic (backend/trn) register only the canonical name. - Drop the flat-only macro registrations for device intrinsics; the canonical op with all attrs is registered from the alias table. The WMMA tvm_*_sync / mma_store / mma_fill builtins and the profiling timer_*_cuda builtins keep their flat names (no namespace / canonical form, category "builtin"). - Remove the redundant flat tirx.ptx_fetch_register registration. - C++ consumers that resolved a flat op by name string now use the canonical name; the ptx_elect_sync / cuda_func_call dual-name matchers collapse to a single canonical same_as check. - Python: every device-intrinsic wrapper call site (backend/cuda/op.py, backend/trn/op.py, backend/cuda/script.py, gemm_async/tcgen05.py) now passes the canonical op name to call_intrin; no Python source names a removed flat op. _canonical_device_intrin_name is retained only as the codegen-registration bridge (basename stays underscore because it generates the C helper name) plus a fallback for user-passed flat names. The remaining underscore strings in operator/intrinsics (cp_async.py / wgmma.py CODEGEN_REGISTRY keys) are codegen variant names, not TIR ops. The flat tirx.ptx_cp_async op was not purely redundant: it carried TScriptDtypePrintLocation=kFirst so the raw InjectPTXAsyncCopy-emitted form (element dtype in Call.dtype) round-trips dtype-first, while the fork-native form (void dtype) did not. To preserve that without a flat alias, the raw form now uses a dedicated canonical op tirx.ptx.cp_async_raw (printer name ptx.cp_async, so the TVMScript surface is unchanged); the fork-native form keeps tirx.ptx.cp_async. Only this op carries dtype-first; ptx.mma / ptx.ldmatrix keep void Call.dtype and must not (printing a leading dtype would corrupt their script form). Generated CUDA is byte-identical: helper names are literals and codegen dispatches by op name, with the registry resolving the canonical name to the same helper.

gemini-code-assist Bot reviewed Jun 19, 2026

View reviewed changes

Comment thread src/backend/cuda/op/target_builtin.cc

Comment thread src/backend/cuda/op/target_builtin.cc

Comment thread src/backend/cuda/op/target_builtin.cc Outdated

Comment thread src/tirx/transform/tile_primitive_dispatch.cc

tlopex force-pushed the phase-out-flat-device-intrins branch 2 times, most recently from eabcbbb to 102b8ca Compare June 19, 2026 05:34

tlopex force-pushed the phase-out-flat-device-intrins branch from 102b8ca to 6fc3c15 Compare June 19, 2026 05:45

tlopex requested a review from spectrometerHBH June 19, 2026 06:31

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

[TIRx] Phase out flat device-intrinsic op aliases#19838

[TIRx] Phase out flat device-intrinsic op aliases#19838
tlopex wants to merge 1 commit into
apache:mainfrom
tlopex:phase-out-flat-device-intrins

tlopex commented Jun 19, 2026

Uh oh!

gemini-code-assist Bot left a comment

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

Conversation

tlopex commented Jun 19, 2026

Uh oh!

gemini-code-assist Bot left a comment

Choose a reason for hiding this comment

Code Review

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant