After set_node_indices (pass 12), the attention kernel's FX graph contains permute operations that the water emitter serializes as wave.permute ops in MLIR. The FX importer's _convert_ops dispatcher in fx_emitter.py has no handler for wave.permute and raises ValueError("Unsupported op in MLIR-to-FX conversion: wave.permute").
Permute ops appear in attention because the online softmax reduces along a dimension that differs from the matmul accumulation dimension, requiring a dimension reordering between the two matmuls. The GEMM kernel contains no permute ops, which is why it is unaffected.
This blocks passes 12-13 for attention.
Probable Fix:
Add a _handle_permute_op function to fx_emitter.py that creates the corresponding Permute FX node, and register it in the _convert_ops match block. The handler needs to extract the source and target dimension orderings from the MLIR op's attributes and reconstruct the FX node accordingly.