Skip to content

[WIP] Feature low precision dtype#183

Draft
KurrinQu wants to merge 116 commits into
mouliangyu:feature-vpto-backendfrom
KurrinQu:feature-low-precision-dtype
Draft

[WIP] Feature low precision dtype#183
KurrinQu wants to merge 116 commits into
mouliangyu:feature-vpto-backendfrom
KurrinQu:feature-low-precision-dtype

Conversation

@KurrinQu
Copy link
Copy Markdown
Collaborator

No description provided.

mouliangyu and others added 30 commits April 19, 2026 23:15
Explain block/subblock runtime queries in workload-partitioning terms and remove redundant supported-forms wording from conversion ops docs.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
- Add detailed mode parameter documentation (mode=0 vs mode=1)
- Add 'Why get_buf/rls_buf is More Programmer-Friendly' section:
  - No manual priming/draining for ping/pong loops
  - No loop peeling for complex/nested loop dependencies
  - Simpler mental model (buffer ID + program order)
- Add quick example comparison showing set_flag overhead vs get_buf simplicity
- Update Example 2 and 3b with explicit mode=0 in code
- Update comparison table with 'Loop peeling' row
- set_flag/wait_flag: 2 IDs per buffer (1 forward + 1 reverse pipe-pair)
- get_buf/rls_buf: 1 ID per buffer (handles both directions automatically)
- 8 per pipe-pair is HW limit, not a formula
- set_flag/wait_flag: 8 IDs per pipe-pair direction (HW limit)
- get_buf/rls_buf: 1 buffer ID per shared resource (HW limit: 32 global), same ID used across all pipelines
- Event ID mgmt: each buffer occupies 1 ID per direction (removed misleading 4 IDs calc)
- Drain example: use concrete EVT_*_0/EVT_*_1 instead of {(N-1)%2} expressions
- 4 set_flag + 4 wait_flag (not 8)
- 4 IDs = 2 pipe-pair directions × 2 ping/pong buffers
- set_flag/wait_flag: 1 MTE2 load, 8 Vector slices — must peel set/wait outside loop
- get_buf/rls_buf: same pattern but acquire/release can stay inside or outside
- Acquire/release per slice inside loop
- Iteration 0 blocks until MTE2 done, iterations 1-7 proceed immediately
Add the merged v0.3 PTO micro-instruction release spec document for A5,
including ISA group references and updated synchronization notes.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
    Introduce a three-pass pipeline that lowers PTO tile ops to vector-level
    implementations via TileLang DSL templates:

    - ExpandTileOp: invokes TileLang Python DSL to instantiate template
      functions and replaces tile ops with func.call. SpecKey covers all
      operands; tile_buf operands are passed through without bridging.
    - PTOInlineLibCall: extended to recognize tilelang instance functions via
      the  attribute set by the DSL frontend.
    - FoldTileBufIntrinsics: resolves pto.tile_buf_addr / tile_valid_rows /
      tile_valid_cols, including dynamic valid-shape via pto.bind_tile chain
      tracing.
    - MemrefToTileBuf: recovers tile_buf types from memref + bind_tile
      metadata after PlanMemory/InsertSync.
    - PTOViewToMemref: insert pto.bind_tile anchors for tile_buf function
      args so MemrefToTileBuf can recover them.

    Adds new PTO ops (tile_buf_addr/tile_valid_rows/tile_valid_cols),
    ptoas pipeline wiring, design docs, and unit tests.
qukelin and others added 25 commits April 19, 2026 23:39
Implement missing predicate-op surfaces end-to-end in TileLang DSL v1, including semantic/lowering/test coverage and docs updates.

Use PredicateDist enum for predicate load/store dist operands instead of raw strings.

Closes hw-native-sys#124
@KurrinQu KurrinQu force-pushed the feature-low-precision-dtype branch from cdf2dcf to 64af063 Compare April 22, 2026 07:23
@KurrinQu KurrinQu marked this pull request as draft April 24, 2026 16:38
@KurrinQu KurrinQu changed the title Feature low precision dtype [WIP] Feature low precision dtype Apr 24, 2026
@mouliangyu mouliangyu force-pushed the feature-vpto-backend branch from a8c45f2 to 352427a Compare April 27, 2026 21:41
@mouliangyu mouliangyu force-pushed the feature-vpto-backend branch from 5e223fb to 42b74f9 Compare May 14, 2026 00:19
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

6 participants