Skip to content

[TIRx] Phase out flat device-intrinsic op aliases#19838

Open
tlopex wants to merge 1 commit into
apache:mainfrom
tlopex:phase-out-flat-device-intrins
Open

[TIRx] Phase out flat device-intrinsic op aliases#19838
tlopex wants to merge 1 commit into
apache:mainfrom
tlopex:phase-out-flat-device-intrins

Conversation

@tlopex

@tlopex tlopex commented Jun 19, 2026

Copy link
Copy Markdown
Member

PR #19677 registered every CUDA / Trainium device intrinsic under two Op names: a flat tirx.<ns>_<name> alias plus the canonical tirx.<ns>.<name>. The flat aliases were a migration shim; passes and codegen that match an intrinsic had to check both spellings (the dual-name IsOp pattern). The Python builders and TVMScript parser already canonicalize, so every real Call already carries the canonical op and the flat aliases were dead weight.

This pr removes the flat device-intrinsic aliases, keeping only the canonical namespaced ops:

  • RegisterDeviceIntrinsic (backend/cuda) and RegisterNKIIntrinsic (backend/trn) register only the canonical name.
  • Drop the flat-only macro registrations for device intrinsics; the canonical op with all attrs is registered from the alias table. The WMMA tvm_sync / mma_store / mma_fill builtins and the profiling timer_cuda builtins keep their flat names (no namespace / canonical form, category "builtin").
  • Remove the redundant flat tirx.ptx_fetch_register registration.
  • C++ consumers that resolved a flat op by name string now use the canonical name; the ptx_elect_sync / cuda_func_call dual-name matchers collapse to the canonical check.
  • Python: the InjectPTXAsyncCopy round-trip Op.get and the matching test assertion use the canonical name. call_intrin keeps its flat->canonical rewrite for back-compat, so user-facing wrappers are unchanged.
  • test_op_namespace_cleanup asserts device_intrin op names are canonical so a flat alias cannot silently reappear.

Generated CUDA is byte-identical: helper names are literals and codegen dispatches by op name, with the registry resolving the canonical name to the same helper.

@gemini-code-assist gemini-code-assist Bot left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code Review

This pull request refactors TVM's backend intrinsics by replacing flat names (e.g., tirx.ptx_cp_async, tirx.cuda_func_call, tirx.nki_load) with their canonical namespace-qualified equivalents (e.g., tirx.ptx.cp_async, tirx.cuda.func_call, tirx.nki.load), removing legacy flat macro registrations. The review feedback highlights a regression where phasing out these flat registrations lost the TScriptDtypePrintLocation attribute for key PTX intrinsics, which is necessary for correct TVMScript printing and parsing. The reviewer suggests extending the registration struct and macros to restore this attribute, and also recommends simplifying the redundant dual-name matching logic for ptx_elect_sync in tile_primitive_dispatch.cc.

Important

The consumer version of Gemini Code Assist on GitHub is being sunset. Starting June 18, 2026, new organization installations will be blocked, and all code review activity will officially cease on July 17, 2026.
For more details on the timeline and next steps, please review the Help Documentation.

Comment thread src/backend/cuda/op/target_builtin.cc
Comment thread src/backend/cuda/op/target_builtin.cc
Comment thread src/backend/cuda/op/target_builtin.cc Outdated
Comment thread src/tirx/transform/tile_primitive_dispatch.cc
@tlopex tlopex force-pushed the phase-out-flat-device-intrins branch 2 times, most recently from eabcbbb to 102b8ca Compare June 19, 2026 05:34
PR apache#19677 registered every CUDA / Trainium device intrinsic under two Op
names: a flat `tirx.<ns>_<name>` alias plus the canonical
`tirx.<ns>.<name>`. The flat aliases were a migration shim; passes and
codegen that match an intrinsic had to check both spellings. The
canonical namespaced form is the intended end state.

Remove the flat device-intrinsic aliases, keeping only the canonical
namespaced ops:

- RegisterDeviceIntrinsic (backend/cuda) and RegisterNKIIntrinsic
  (backend/trn) register only the canonical name.
- Drop the flat-only macro registrations for device intrinsics; the
  canonical op with all attrs is registered from the alias table. The
  WMMA tvm_*_sync / mma_store / mma_fill builtins and the profiling
  timer_*_cuda builtins keep their flat names (no namespace / canonical
  form, category "builtin").
- Remove the redundant flat tirx.ptx_fetch_register registration.
- C++ consumers that resolved a flat op by name string now use the
  canonical name; the ptx_elect_sync / cuda_func_call dual-name matchers
  collapse to a single canonical same_as check.
- Python: every device-intrinsic wrapper call site (backend/cuda/op.py,
  backend/trn/op.py, backend/cuda/script.py, gemm_async/tcgen05.py) now
  passes the canonical op name to call_intrin; no Python source names a
  removed flat op. _canonical_device_intrin_name is retained only as the
  codegen-registration bridge (basename stays underscore because it
  generates the C helper name) plus a fallback for user-passed flat
  names. The remaining underscore strings in operator/intrinsics
  (cp_async.py / wgmma.py CODEGEN_REGISTRY keys) are codegen variant
  names, not TIR ops.

The flat tirx.ptx_cp_async op was not purely redundant: it carried
TScriptDtypePrintLocation=kFirst so the raw InjectPTXAsyncCopy-emitted
form (element dtype in Call.dtype) round-trips dtype-first, while the
fork-native form (void dtype) did not. To preserve that without a flat
alias, the raw form now uses a dedicated canonical op
tirx.ptx.cp_async_raw (printer name ptx.cp_async, so the TVMScript
surface is unchanged); the fork-native form keeps tirx.ptx.cp_async.
Only this op carries dtype-first; ptx.mma / ptx.ldmatrix keep void
Call.dtype and must not (printing a leading dtype would corrupt their
script form).

Generated CUDA is byte-identical: helper names are literals and codegen
dispatches by op name, with the registry resolving the canonical name to
the same helper.
@tlopex tlopex force-pushed the phase-out-flat-device-intrins branch from 102b8ca to 6fc3c15 Compare June 19, 2026 05:45
@tlopex tlopex requested a review from spectrometerHBH June 19, 2026 06:31
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant