Alias-aware token threading for better parallelism #1

All memory operations (loads, stores, atomics) are currently threaded through a single global token chain. This is correct but conservative: operations on independent arrays are serialized even though no ordering between them is required.

## Proposed improvement

Implement alias-aware token threading:
1. Alias analysis: Compute which pointers may refer to the same memory region (alias sets)
2. Per-set token chains: Thread tokens only between operations that may alias
3. Loop-parallel stores: Identify stores in loops whose indices do not overlap across iterations. These can skip token dependencies entirely
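
Steps 1 and 2 can be sketched roughly as follows. This is a minimal illustration, not the cuTile implementation: `MemOp`, `AliasSets`, and `thread_tokens` are hypothetical names, alias sets are modeled with a simple union-find rather than a dataflow analysis, and every operation is assumed to consume and produce exactly one token. Step 3 is not covered here.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class MemOp:
    kind: str      # "load", "store", or "atomic"
    pointer: str   # hypothetical name of the pointer being accessed


class AliasSets:
    """Union-find over pointer names; pointers in one set may alias."""

    def __init__(self):
        self._parent = {}

    def find(self, p):
        self._parent.setdefault(p, p)
        while self._parent[p] != p:
            self._parent[p] = self._parent[self._parent[p]]  # path halving
            p = self._parent[p]
        return p

    def may_alias(self, p, q):
        """Record that p and q may refer to the same memory region."""
        self._parent[self.find(p)] = self.find(q)


def thread_tokens(ops, aliases):
    """Return, for each op, the index of the op whose token it consumes.

    None means the op starts a new chain and depends on no earlier op.
    """
    last_in_set = {}   # alias-set representative -> index of latest op
    deps = []
    for i, op in enumerate(ops):
        key = aliases.find(op.pointer)
        deps.append(last_in_set.get(key))
        last_in_set[key] = i
    return deps


aliases = AliasSets()
aliases.may_alias("a", "a_view")   # assumption: a_view is derived from a

ops = [
    MemOp("load", "a"),
    MemOp("store", "b"),
    MemOp("store", "a_view"),
    MemOp("load", "b"),
]
deps = thread_tokens(ops, aliases)
print(deps)  # [None, None, 0, 1]
```

With a single global chain, each operation would depend on the one before it. Here the store through `a_view` waits only on the load from `a`, and the two operations on `b` form their own chain.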

## Why

The current sequential approach prevents parallelism between independent memory operations. For example, a load from array `a` and a store to array `b` need no ordering constraint between them if the two arrays are provably disjoint. Alias-aware threading preserves correctness while letting the hardware execute independent operations concurrently.
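
As a rough illustration of the difference, the snippet below compares the length of the longest token chain under the two schemes for a small operation sequence. It assumes `a` and `b` are provably disjoint and that each operation extends its chain by one link; `longest_chain` is a hypothetical helper, not part of any existing API.

```python
from collections import Counter

# Each entry is (operation kind, array name); the names are illustrative.
ops = [("load", "a"), ("store", "b"), ("load", "a"), ("store", "b")]


def longest_chain(ops, chain_key):
    """Length of the longest token chain when ops are grouped by chain_key."""
    counts = Counter(chain_key(op) for op in ops)
    return max(counts.values())


global_depth = longest_chain(ops, lambda op: "global")    # one shared chain
per_array_depth = longest_chain(ops, lambda op: op[1])    # one chain per array

print(global_depth, per_array_depth)  # 4 2
```

The global chain forces all four operations into one sequence, while per-array chains leave two independent sequences of length two that can overlap.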

## Reference implementation

cuTile Python implements this in:
- https://github.com/NVIDIA/cutile-python/blob/main/src/cuda/tile/_passes/alias_analysis.py (dataflow analysis that propagates alias sets until a fixed point is reached)
- https://github.com/NVIDIA/cutile-python/blob/main/src/cuda/tile/_passes/token_order.py (maps tokens to alias sets via `TokenKey`, with special handling for loop-parallel stores)

