aws-neuron · tonyz0x0 · May 19, 2026
@@ -0,0 +1,16 @@
+.. _error-code-evrf036:
+
+.. meta::
+   :description: AWS Neuron SDK Graph Compiler error code documentation for error EVRF036.
+
+NCC_EVRF036
+===========
+
+**Error message**: QuantizeMX custom call has invalid backend_config JSON.
+
+The ``QuantizeMX`` custom call requires a ``backend_config`` attribute that
+is a valid JSON string with the fields ``dtype``, ``dim``, ``block_size`` and
+``scale_method``. The compiler raises this error when the attribute string
+cannot be parsed as JSON.
+
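+To fix this error, ensure ``backend_config`` is valid JSON. The individual field values are checked by separate errors: ``dim`` by :ref:`NCC_EVRF038 <error-code-evrf038>`, ``block_size`` by :ref:`NCC_EVRF039 <error-code-evrf039>`, ``scale_method`` by :ref:`NCC_EVRF040 <error-code-evrf040>`, and ``dtype`` by :ref:`NCC_EVRF042 <error-code-evrf042>`.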
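One way to avoid this error is to build the string with a JSON serializer instead of writing it by hand. The minimal Python sketch below assumes ``dim`` and ``block_size`` are encoded as JSON integers and ``dtype`` and ``scale_method`` as JSON strings. The helper names ``make_quantize_mx_backend_config`` and ``parses_as_json`` are hypothetical and are not part of any Neuron API.

```python
import json


def make_quantize_mx_backend_config(dtype="float8_e4m3fn", dim=-1,
                                    block_size=32, scale_method="EMAX"):
    """Hypothetical helper: serialize the four QuantizeMX fields as JSON.

    json.dumps always emits well-formed JSON (double-quoted keys and
    strings, no trailing commas), which is what EVRF036 checks for.
    """
    return json.dumps({
        "dtype": dtype,
        "dim": dim,
        "block_size": block_size,
        "scale_method": scale_method,
    })


def parses_as_json(config: str) -> bool:
    """Return True if the string is well-formed JSON (the EVRF036 condition)."""
    try:
        json.loads(config)
    except json.JSONDecodeError:
        return False
    return True


good = make_quantize_mx_backend_config()
bad = "{dtype: 'float8_e4m3fn', dim: -1}"  # unquoted keys and single quotes
print(parses_as_json(good), parses_as_json(bad))  # prints: True False
```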
@@ -0,0 +1,16 @@
+.. _error-code-evrf037:
+
+.. meta::
+   :description: AWS Neuron SDK Graph Compiler error code documentation for error EVRF037.
+
+NCC_EVRF037
+===========
+
+**Error message**: QuantizeMX custom call operand count must be exactly 1 (input tensor).
+
+The ``QuantizeMX`` custom call takes a single input tensor and produces a
+tuple of two outputs (the quantized data and the per-block scale). The
+compiler raises this error when the call has any number of operands other
+than one.
+
+To fix this error, pass exactly one input tensor as the operand.
@@ -0,0 +1,13 @@
+.. _error-code-evrf038:
+
+.. meta::
+   :description: AWS Neuron SDK Graph Compiler error code documentation for error EVRF038.
+
+NCC_EVRF038
+===========
+
+**Error message**: QuantizeMX custom call dim is invalid for input tensor rank.
+
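+The ``QuantizeMX`` custom call implements MXFP8 microscaling quantization as defined in the `OCP Microscaling Formats (MX) v1.0 specification <https://www.opencompute.org/documents/ocp-microscaling-formats-mx-v1-0-spec-final-pdf>`__ and produces a tuple of ``(quantized_data, scale)``. The ``dim`` parameter selects the dimension to quantize along. Only the last dimension (``dim=-1``) and the second-to-last dimension (``dim=-2``) are supported, and ``dim=-2`` requires an input of rank 2 or higher. The compiler raises this error when ``dim`` does not select one of these dimensions of the input tensor (for example, ``dim=-2`` on a rank-1 input).
+
+To fix this error, use ``dim=-1`` (last dimension) or ``dim=-2`` (second-to-last dimension), and make sure the input has at least two dimensions when using ``dim=-2``.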
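The sketch below mirrors this check as a hypothetical pre-flight helper (``resolve_quantize_dim`` is not a Neuron API). It accepts only ``-1`` and ``-2``, as documented above; whether the compiler also accepts the equivalent non-negative index is not stated here, so the sketch rejects it.

```python
def resolve_quantize_dim(rank: int, dim: int) -> int:
    """Map QuantizeMX's dim (-1 or -2) to a non-negative axis index.

    Hypothetical pre-check mirroring EVRF038: only the last (-1) and
    second-to-last (-2) dimensions are accepted, and the input must
    actually have that many dimensions.
    """
    if dim not in (-1, -2):
        raise ValueError(f"dim={dim} unsupported; use -1 or -2")
    if rank < -dim:
        raise ValueError(f"dim={dim} is invalid for a rank-{rank} input")
    return rank + dim


print(resolve_quantize_dim(3, -1))  # 2
print(resolve_quantize_dim(3, -2))  # 1
```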
@@ -0,0 +1,13 @@
+.. _error-code-evrf039:
+
+.. meta::
+   :description: AWS Neuron SDK Graph Compiler error code documentation for error EVRF039.
+
+NCC_EVRF039
+===========
+
+**Error message**: QuantizeMX custom call block_size must be 32.
+
+The ``QuantizeMX`` custom call implements MXFP8 microscaling quantization as defined in the `OCP Microscaling Formats (MX) v1.0 specification <https://www.opencompute.org/documents/ocp-microscaling-formats-mx-v1-0-spec-final-pdf>`__. The OCP MX specification fixes the block size at 32 elements per shared scale factor. The compiler raises this error when ``backend_config`` specifies any other ``block_size`` value.
+
+To fix this error, use ``block_size=32``.
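To illustrate what a block is, the sketch below splits one row along the quantization dimension into 32-element blocks, each of which would share a single scale factor. The helper is hypothetical, and keeping a trailing partial block is an assumption for rows whose length is not a multiple of 32.

```python
MX_BLOCK_SIZE = 32  # fixed by the OCP MX v1.0 spec; EVRF039 rejects other values


def split_into_blocks(row, block_size=MX_BLOCK_SIZE):
    """Split one row along the quantization dimension into scale blocks.

    Each returned block shares one scale factor. A final partial block is
    kept as-is here (an assumption, not documented Neuron behavior).
    """
    if block_size != MX_BLOCK_SIZE:
        raise ValueError(f"block_size={block_size}; QuantizeMX requires 32")
    return [row[i:i + block_size] for i in range(0, len(row), block_size)]


row = list(range(96))
blocks = split_into_blocks(row)
print(len(blocks), [len(b) for b in blocks])  # 3 [32, 32, 32]
```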
@@ -0,0 +1,13 @@
+.. _error-code-evrf040:
+
+.. meta::
+   :description: AWS Neuron SDK Graph Compiler error code documentation for error EVRF040.
+
+NCC_EVRF040
+===========
+
+**Error message**: QuantizeMX custom call scale_method is unsupported.
+
+The ``QuantizeMX`` custom call implements MXFP8 microscaling quantization as defined in the `OCP Microscaling Formats (MX) v1.0 specification <https://www.opencompute.org/documents/ocp-microscaling-formats-mx-v1-0-spec-final-pdf>`__. Currently, only the ``EMAX`` scale computation method is supported. The compiler raises this error when ``backend_config`` specifies any other ``scale_method`` value.
+
+To fix this error, use ``scale_method='EMAX'``.
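This page does not define ``EMAX`` in detail. Assuming it corresponds to the shared-exponent rule of the OCP MX v1.0 specification, where a block's scale exponent is ``floor(log2(max|x|))`` minus the exponent of the largest normal number in the FP8 element format, the per-block scale can be sketched as below. The function name and the handling of all-zero blocks are illustrative assumptions.

```python
import math

# Exponent of the largest normal value in each FP8 element format:
# float8_e4m3fn max normal is 448 = 1.75 * 2**8; float8_e5m2 is 57344 = 1.75 * 2**15.
EMAX_ELEM = {"float8_e4m3fn": 8, "float8_e5m2": 15}

# E8M0 scales encode 2**e for e in [-127, 127].
E8M0_MIN, E8M0_MAX = -127, 127


def emax_shared_exponent(block, dtype="float8_e4m3fn"):
    """Shared scale exponent for one block, following the OCP MX rule (assumed to be EMAX)."""
    amax = max(abs(x) for x in block)
    if amax == 0.0:
        # All-zero block: any scale works; use the smallest representable one.
        return E8M0_MIN
    exp = math.floor(math.log2(amax)) - EMAX_ELEM[dtype]
    return max(E8M0_MIN, min(E8M0_MAX, exp))


block = [1.0] * 31 + [448.0]
print(emax_shared_exponent(block))                      # 0
print(emax_shared_exponent([1.0] * 32, "float8_e5m2"))  # -15
```

Each element in the block would then be divided by ``2**exp`` before being rounded to the FP8 element format.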
@@ -0,0 +1,13 @@
+.. _error-code-evrf041:
+
+.. meta::
+   :description: AWS Neuron SDK Graph Compiler error code documentation for error EVRF041.
+
+NCC_EVRF041
+===========
+
+**Error message**: QuantizeMX custom call input type is unsupported.
+
+The ``QuantizeMX`` custom call implements MXFP8 microscaling quantization as defined in the `OCP Microscaling Formats (MX) v1.0 specification <https://www.opencompute.org/documents/ocp-microscaling-formats-mx-v1-0-spec-final-pdf>`__. Only BF16 and F16 input tensors are supported. The compiler raises this error when the input tensor has any other element type.
+
+To fix this error, cast the input tensor to BF16 or F16 before quantization.
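In PyTorch, the cast is typically written ``x.to(torch.bfloat16)`` or ``x.to(torch.float16)``. To show what a BF16 cast does to each value, the stdlib-only sketch below rounds an F32 value to BF16 by keeping its upper 16 bits with round-to-nearest-even. It illustrates the BF16 format rather than the Neuron implementation, and it does not handle NaN inputs.

```python
import struct


def f32_to_bf16_bits(x: float) -> int:
    """Round a float32 value to bfloat16 bits (round-to-nearest-even; NaN not handled)."""
    bits = struct.unpack("<I", struct.pack("<f", x))[0]
    # BF16 keeps the sign, all 8 exponent bits, and the top 7 mantissa bits.
    rounding_bias = 0x7FFF + ((bits >> 16) & 1)
    return ((bits + rounding_bias) >> 16) & 0xFFFF


def bf16_bits_to_f32(b: int) -> float:
    """Widen bfloat16 bits back to float32 by zero-filling the low 16 bits."""
    return struct.unpack("<f", struct.pack("<I", b << 16))[0]


print(hex(f32_to_bf16_bits(1.0)))                   # 0x3f80
print(bf16_bits_to_f32(f32_to_bf16_bits(3.14159)))  # 3.140625
```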
@@ -0,0 +1,23 @@
+.. _error-code-evrf042:
+
+.. meta::
+   :description: AWS Neuron SDK Graph Compiler error code documentation for error EVRF042.
+
+NCC_EVRF042
+===========
+
+**Error message**: QuantizeMX custom call validation failed.
+
+The ``QuantizeMX`` custom call implements MXFP8 microscaling quantization as defined in the `OCP Microscaling Formats (MX) v1.0 specification <https://www.opencompute.org/documents/ocp-microscaling-formats-mx-v1-0-spec-final-pdf>`__ and produces a tuple of ``(quantized_data, scale)``. This error covers several output type and shape checks. The compiler raises it when any of the following holds:
+
+- the ``backend_config`` ``dtype`` is not ``float8_e5m2`` or ``float8_e4m3fn``
+- the ``quantized_data`` element type does not match the ``dtype`` declared in ``backend_config``
+- the ``quantized_data`` shape does not match the input tensor shape
+
+To fix this error, set the ``backend_config`` ``dtype`` to ``float8_e5m2`` or ``float8_e4m3fn``, and declare ``quantized_data`` (the first element of the result tuple) with the matching FP8 element type and the same shape as the input tensor.
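The three conditions can be expressed as a hypothetical pre-check, sketched below. The mapping from ``backend_config`` ``dtype`` strings to lower-cased XLA element type names (``float8_e4m3fn`` to ``f8e4m3fn``, ``float8_e5m2`` to ``f8e5m2``) is an assumption made for illustration.

```python
# Assumed correspondence between backend_config dtype strings and the
# (lower-cased) element type names of the quantized_data output.
FP8_ELEMENT_TYPE = {"float8_e5m2": "f8e5m2", "float8_e4m3fn": "f8e4m3fn"}


def quantize_mx_output_errors(dtype, data_element_type, data_shape, input_shape):
    """Return the EVRF042 conditions that a QuantizeMX result would trip."""
    errors = []
    expected = FP8_ELEMENT_TYPE.get(dtype)
    if expected is None:
        errors.append(f"backend_config dtype {dtype!r} is not a supported FP8 type")
    elif data_element_type.lower() != expected:
        errors.append(f"quantized_data is {data_element_type}, expected {expected}")
    if tuple(data_shape) != tuple(input_shape):
        errors.append(f"quantized_data shape {tuple(data_shape)} "
                      f"!= input shape {tuple(input_shape)}")
    return errors


print(quantize_mx_output_errors("float8_e4m3fn", "F8E4M3FN", (8, 64), (8, 64)))  # []
print(quantize_mx_output_errors("float8_e4m3fn", "F8E5M2", (8, 64), (8, 32)))
```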
@@ -0,0 +1,13 @@
+.. _error-code-evrf043:
+
+.. meta::
+   :description: AWS Neuron SDK Graph Compiler error code documentation for error EVRF043.
+
+NCC_EVRF043
+===========
+
+**Error message**: ScaledMatmul custom call must have exactly 4 operands.
+
+The ``__op$block_scaled_dot`` custom call (reported as ``ScaledMatmul`` in error messages) performs matrix multiplication on MXFP8-quantized tensors produced by ``QuantizeMX``. It requires exactly 4 operands, in this order: ``lhs`` (left-hand side matrix), ``rhs`` (right-hand side matrix), ``lhs_scale`` (per-block scales for ``lhs``), and ``rhs_scale`` (per-block scales for ``rhs``). The compiler raises this error when it receives any other number of operands.
+
+To fix this error, pass all 4 operands (lhs, rhs, lhs_scale, rhs_scale).
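To show the role of each operand, here is a plain-Python reference for the 2-D case. It assumes that ``lhs`` is M x K, ``rhs`` is K x N, the FP8 elements are already decoded to floats, and each scale is an integer power-of-two exponent shared by ``block_size`` consecutive elements along the contracting dimension K, as in the OCP MX format. These layouts are assumptions, not the documented Neuron operand layout.

```python
import math


def block_scaled_dot(lhs, rhs, lhs_scale, rhs_scale, block_size=32):
    """Reference 2-D block-scaled matmul (illustrative layout assumptions).

    lhs:       M x K element values
    rhs:       K x N element values
    lhs_scale: M x ceil(K / block_size) integer exponents
    rhs_scale: ceil(K / block_size) x N integer exponents
    """
    m, k, n = len(lhs), len(rhs), len(rhs[0])
    if any(len(row) != k for row in lhs):
        raise ValueError("contracting dimension mismatch")
    out = [[0.0] * n for _ in range(m)]
    for i in range(m):
        for j in range(n):
            acc = 0.0
            for p in range(k):
                blk = p // block_size  # index of the scale shared by element p
                a = math.ldexp(lhs[i][p], lhs_scale[i][blk])  # lhs * 2**scale
                b = math.ldexp(rhs[p][j], rhs_scale[blk][j])  # rhs * 2**scale
                acc += a * b
            out[i][j] = acc
    return out


# Tiny example with block_size=2 so the blocks are visible: K=4 gives 2 blocks.
lhs = [[1.0, 1.0, 1.0, 1.0]]
rhs = [[1.0], [1.0], [1.0], [1.0]]
print(block_scaled_dot(lhs, rhs, [[0, 1]], [[0], [0]], block_size=2))  # [[6.0]]
```

The demo uses ``block_size=2`` only so that the two blocks are visible; ``QuantizeMX`` itself uses 32.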
@@ -0,0 +1,13 @@
+.. _error-code-evrf044:
+
+.. meta::
+   :description: AWS Neuron SDK Graph Compiler error code documentation for error EVRF044.
+
+NCC_EVRF044
+===========
+
+**Error message**: ScaledMatmul custom call LHS input type is unsupported.
+
+The ``__op$block_scaled_dot`` custom call performs matrix multiplication on MXFP8-quantized tensors. The LHS (left-hand side) operand must be an FP8 tensor (``f8E5M2`` or ``f8E4M3FN``) produced by the ``QuantizeMX`` custom call. The compiler raises this error when the LHS operand has a different element type.
+
+To fix this error, ensure the LHS operand is the FP8 quantized data tensor returned by ``QuantizeMX`` (or any equivalent FP8 ``f8E5M2`` / ``f8E4M3FN`` tensor).
@@ -0,0 +1,13 @@
+.. _error-code-evrf045:
+
+.. meta::
+   :description: AWS Neuron SDK Graph Compiler error code documentation for error EVRF045.
+
+NCC_EVRF045
+===========
+
+**Error message**: ScaledMatmul custom call output type is unsupported.
+
+The ``__op$block_scaled_dot`` custom call performs matrix multiplication on MXFP8-quantized tensors and dequantizes the result. Only F32 and BF16 output types are supported. The output element type is taken from the declared result type of the custom call, not from a ``backend_config`` field. The compiler raises this error when the result tensor has any other element type.
+
+To fix this error, declare the result as F32 or BF16.
@@ -0,0 +1,13 @@
+.. _error-code-evrf046:
+
+.. meta::
+   :description: AWS Neuron SDK Graph Compiler error code documentation for error EVRF046.
+
+NCC_EVRF046
+===========
+
+**Error message**: ScaledMatmul custom call LHS tensor must have rank >= 2.
+
+The ``__op$block_scaled_dot`` custom call performs matrix multiplication on MXFP8-quantized tensors. Matrix operations require at least 2-dimensional tensors (matrices). The LHS (left-hand side) operand must have rank >= 2. The compiler raises this error when the LHS tensor is a scalar or 1-D vector.
+
+To fix this error, reshape the LHS to have at least 2 dimensions.
@@ -0,0 +1,13 @@
+.. _error-code-evrf047:
+
+.. meta::
+   :description: AWS Neuron SDK Graph Compiler error code documentation for error EVRF047.
+
+NCC_EVRF047
+===========
+
+**Error message**: ScaledMatmul custom call RHS tensor must have rank >= 2.
+
+The ``__op$block_scaled_dot`` custom call performs matrix multiplication on MXFP8-quantized tensors. Matrix operations require at least 2-dimensional tensors (matrices). The RHS (right-hand side) operand must have rank >= 2. The compiler raises this error when the RHS tensor is a scalar or 1-D vector.
+
+To fix this error, reshape the RHS to have at least 2 dimensions.
@@ -0,0 +1,13 @@
+.. _error-code-evrf048:
+
+.. meta::
+   :description: AWS Neuron SDK Graph Compiler error code documentation for error EVRF048.
+
+NCC_EVRF048
+===========
+
+**Error message**: ScaledMatmul custom call batch dimension mismatch.
+
+The ``__op$block_scaled_dot`` custom call performs batched matrix multiplication on MXFP8-quantized tensors. The product of LHS batch dimension sizes must equal the product of RHS batch dimension sizes. The compiler raises this error when the total batch sizes do not match.
+
+To fix this error, make the product of the LHS batch dimension sizes equal the product of the RHS batch dimension sizes.
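A hypothetical pre-check for this condition multiplies out the batch dimension sizes on each side, as in the sketch below. Because only the products are compared, an LHS with batch sizes ``[2, 3]`` passes this particular check against an RHS whose single batch dimension has size 6.

```python
import math


def check_batch_sizes(lhs_shape, rhs_shape, lhs_batch_dims, rhs_batch_dims):
    """Hypothetical pre-check mirroring EVRF048: compare total batch sizes."""
    lhs_batch = math.prod(lhs_shape[d] for d in lhs_batch_dims)  # 1 if no batch dims
    rhs_batch = math.prod(rhs_shape[d] for d in rhs_batch_dims)
    if lhs_batch != rhs_batch:
        raise ValueError(f"batch size mismatch: LHS {lhs_batch} vs RHS {rhs_batch}")
    return lhs_batch


# LHS [2, 3, M, K] with batch dims {0, 1}; RHS [6, K, N] with batch dim {0}.
print(check_batch_sizes((2, 3, 16, 64), (6, 64, 32), [0, 1], [0]))  # 6
```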
@@ -0,0 +1,23 @@
+.. _error-code-evrf049:
+
+.. meta::
+   :description: AWS Neuron SDK Graph Compiler error code documentation for error EVRF049.
+
+NCC_EVRF049
+===========
+
+**Error message**: ScaledMatmul custom call could not parse backend_config.
+
+The ``__op$block_scaled_dot`` custom call performs matrix multiplication on MXFP8-quantized tensors. It requires a ``backend_config`` attribute whose value is valid JSON containing a ``scaled_dot_backend_config`` object. The compiler raises this error only when that JSON is malformed; missing fields fall back to the defaults listed below instead of triggering this error.
+
+The output element type is determined by the declared result type, not by a JSON field.
+
+The optional fields in ``scaled_dot_backend_config`` with their defaults are:
+
+- ``lhs_batch_dimensions``: array of batch dimension indices for LHS (default ``[]``)
+- ``rhs_batch_dimensions``: array of batch dimension indices for RHS (default ``[]``)
+- ``lhs_contracting_dimensions``: array of contracting dimension indices for LHS (default ``[]``)
+- ``rhs_contracting_dimensions``: array of contracting dimension indices for RHS (default ``[]``)
+- ``element_dtype``: FP8 element type, either ``"float8_e5m2"`` or ``"float8_e4m3fn"`` when specified (default ``""``)
+
+To fix this error, provide properly-formatted JSON for the ``scaled_dot_backend_config`` object.
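For reference, the sketch below serializes a ``scaled_dot_backend_config`` object for a batched matmul of an LHS ``[B, M, K]`` with an RHS ``[B, K, N]`` and reads it back with the defaults listed above. Placing the object under a top-level ``"scaled_dot_backend_config"`` key is an assumption based on the description, and the dimension values are example inputs.

```python
import json

# Example values: batch dim 0 on both sides, contract LHS dim 2 with RHS dim 1.
backend_config = json.dumps({
    "scaled_dot_backend_config": {
        "lhs_batch_dimensions": [0],
        "rhs_batch_dimensions": [0],
        "lhs_contracting_dimensions": [2],
        "rhs_contracting_dimensions": [1],
        "element_dtype": "float8_e4m3fn",
    }
})

# Read it back, applying the documented defaults for any missing field.
DEFAULTS = {
    "lhs_batch_dimensions": [],
    "rhs_batch_dimensions": [],
    "lhs_contracting_dimensions": [],
    "rhs_contracting_dimensions": [],
    "element_dtype": "",
}
parsed = json.loads(backend_config)["scaled_dot_backend_config"]
config = {**DEFAULTS, **parsed}
print(config["rhs_contracting_dimensions"])  # [1]
```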
@@ -0,0 +1,13 @@
+.. _error-code-evrf050:
+
+.. meta::
+   :description: AWS Neuron SDK Graph Compiler error code documentation for error EVRF050.
+
+NCC_EVRF050
+===========
+
+**Error message**: ScaledMatmul custom call contracting dimension sizes mismatch.
+
+The ``__op$block_scaled_dot`` custom call performs matrix multiplication on MXFP8-quantized tensors. In matrix multiplication, the contracting dimensions (the dimensions that are summed over) must have equal sizes in both operands. For example, in ``C = A @ B`` where ``A`` is ``[M, K]`` and ``B`` is ``[K, N]``, the contracting dimension ``K`` must have the same size in ``A`` and ``B``. The compiler raises this error when the LHS and RHS contracting dimensions have different sizes.
+
+To fix this error, ensure the LHS and RHS contracting dimension sizes match.
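The sketch below checks contracting sizes as a hypothetical pre-flight helper. It assumes contracting dimensions are paired by position, as in XLA's ``DotGeneral`` (the i-th LHS contracting dimension against the i-th RHS contracting dimension).

```python
def check_contracting_sizes(lhs_shape, rhs_shape, lhs_contract, rhs_contract):
    """Hypothetical pre-check mirroring EVRF050: compare paired contracting sizes."""
    # Extra sanity check so zip() cannot silently drop unpaired dimensions.
    if len(lhs_contract) != len(rhs_contract):
        raise ValueError("LHS and RHS must list the same number of contracting dims")
    for lhs_d, rhs_d in zip(lhs_contract, rhs_contract):
        if lhs_shape[lhs_d] != rhs_shape[rhs_d]:
            raise ValueError(
                f"LHS dim {lhs_d} has size {lhs_shape[lhs_d]}, "
                f"RHS dim {rhs_d} has size {rhs_shape[rhs_d]}")


check_contracting_sizes((16, 64), (64, 32), [1], [0])  # OK: K = 64 on both sides
```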
@@ -0,0 +1,15 @@
+.. _error-code-evrf051:
+
+.. meta::
+   :description: AWS Neuron SDK Graph Compiler error code documentation for error EVRF051.
+
+NCC_EVRF051
+===========
+
+**Error message**: Data type F8E4M3FN is not supported on TRN1/TRN2.
+
+The F8E4M3FN data type (8-bit floating point with a 4-bit exponent and a 3-bit mantissa; the ``FN`` suffix denotes a finite format with no infinity encodings) is supported only on Trainium3 (Trn3) and later hardware generations. The compiler raises this error when a model uses F8E4M3FN quantization but targets Trn1 or Trn2.
+
+To fix this error, either target Trn3 or use the experimental flag to cast F8E4M3FN to F8E4M3 (``--experimental-unsafe-fp8e4m3fn-as-fp8e4m3``).
+
+* More information on supported data types: https://awsdocs-neuron.readthedocs-hosted.com/en/latest/about-neuron/arch/neuron-features/data-types.html
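The two formats share a bit layout (1 sign, 4 exponent, and 3 mantissa bits, exponent bias 7) but reserve different encodings, so the same byte can mean different values. The decoders below assume F8E4M3 follows the IEEE-754-style convention of XLA's ``F8E4M3`` and ml_dtypes' ``float8_e4m3`` (exponent ``1111`` reserved for infinity and NaN); whether Neuron's F8E4M3 matches this exactly is an assumption. Under it, F8E4M3FN reaches 448 while F8E4M3 tops out at 240, so F8E4M3FN values above 240 have no finite F8E4M3 encoding.

```python
import math


def _sign(byte):
    return -1.0 if byte & 0x80 else 1.0


def decode_f8e4m3fn(byte):
    """F8E4M3FN: no infinities; only S.1111.111 is NaN, so exponent 15 holds normals."""
    exp, man = (byte >> 3) & 0xF, byte & 0x7
    if exp == 0xF and man == 0x7:
        return math.nan
    if exp == 0:  # subnormal
        return _sign(byte) * (man / 8) * 2.0 ** -6
    return _sign(byte) * (1 + man / 8) * 2.0 ** (exp - 7)


def decode_f8e4m3(byte):
    """IEEE-style F8E4M3 (assumed): exponent 15 is infinity (man == 0) or NaN."""
    exp, man = (byte >> 3) & 0xF, byte & 0x7
    if exp == 0xF:
        return _sign(byte) * math.inf if man == 0 else math.nan
    if exp == 0:  # subnormal
        return _sign(byte) * (man / 8) * 2.0 ** -6
    return _sign(byte) * (1 + man / 8) * 2.0 ** (exp - 7)


print(decode_f8e4m3fn(0x7E), decode_f8e4m3(0x77))  # 448.0 240.0
print(decode_f8e4m3fn(0x78), decode_f8e4m3(0x78))  # 256.0 inf
```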
@@ -0,0 +1,13 @@
+.. _error-code-evrf053:
+
+.. meta::
+   :description: AWS Neuron SDK Graph Compiler error code documentation for error EVRF053.
+
+NCC_EVRF053
+===========
+
+**Error message**: ScaledMatmul custom call contracting dimension overlaps with batch dimension.
+
+The ``__op$block_scaled_dot`` custom call performs batched matrix multiplication on MXFP8-quantized tensors. Batch dimensions and contracting dimensions must be disjoint sets. A dimension cannot be both a batch dimension and a contracting dimension. The compiler raises this error when a contracting dimension index also appears in the batch dimensions list.
+
+To fix this error, ensure batch dimensions and contracting dimensions are disjoint.
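For one operand, the disjointness requirement reduces to an empty set intersection. The helper below is a hypothetical sketch, not a Neuron API.

```python
def check_disjoint(batch_dims, contracting_dims, side="LHS"):
    """Hypothetical pre-check mirroring EVRF053 for one operand."""
    overlap = sorted(set(batch_dims) & set(contracting_dims))
    if overlap:
        raise ValueError(f"{side} dims {overlap} are both batch and contracting")


check_disjoint([0], [2])  # OK: [B, M, K] with batch dim 0 and contracting dim 2
```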
@@ -0,0 +1,13 @@
+.. _error-code-evrf054:
+
+.. meta::
+   :description: AWS Neuron SDK Graph Compiler error code documentation for error EVRF054.
+
+NCC_EVRF054
+===========
+
+**Error message**: ScaledMatmul custom call batch dimension index out of bounds.
+
+The ``__op$block_scaled_dot`` custom call performs batched matrix multiplication on MXFP8-quantized tensors. Batch dimension indices specified in ``backend_config`` must satisfy ``0 <= dim < rank``. The compiler raises this error when a batch dimension index is negative or greater than or equal to the tensor rank.
+
+To fix this error, use dimension indices within the valid range (``0 <= dim < rank``).
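A hypothetical bounds check is sketched below; the same logic applies to contracting dimension indices (NCC_EVRF055). Unlike the ``dim`` parameter of ``QuantizeMX``, negative indices are rejected rather than counted from the end.

```python
def check_dim_indices(indices, rank, kind="batch"):
    """Hypothetical pre-check for EVRF054 (batch) and EVRF055 (contracting) bounds."""
    bad = [d for d in indices if not 0 <= d < rank]
    if bad:
        raise ValueError(f"{kind} dimension indices {bad} out of range for rank {rank}")


check_dim_indices([0], rank=3)  # OK
# check_dim_indices([3], rank=3) raises: valid indices are 0, 1, 2
```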
@@ -0,0 +1,13 @@
+.. _error-code-evrf055:
+
+.. meta::
+   :description: AWS Neuron SDK Graph Compiler error code documentation for error EVRF055.
+
+NCC_EVRF055
+===========
+
+**Error message**: ScaledMatmul custom call contracting dimension index out of bounds.
+
+The ``__op$block_scaled_dot`` custom call performs matrix multiplication on MXFP8-quantized tensors. Contracting dimension indices specified in ``backend_config`` must satisfy ``0 <= dim < rank``. The compiler raises this error when a contracting dimension index is negative or greater than or equal to the tensor rank.
+
+To fix this error, use dimension indices within the valid range (``0 <= dim < rank``).
@@ -0,0 +1,13 @@
+.. _error-code-evrf057:
+
+.. meta::
+   :description: AWS Neuron SDK Graph Compiler error code documentation for error EVRF057.
+
+NCC_EVRF057
+===========
+
+**Error message**: QuantizeMX custom call must return a tuple with exactly 2 outputs.
+
+The ``QuantizeMX`` custom call implements MXFP8 microscaling quantization as defined in the `OCP Microscaling Formats (MX) v1.0 specification <https://www.opencompute.org/documents/ocp-microscaling-formats-mx-v1-0-spec-final-pdf>`__ and must produce a 2-element tuple: the quantized data tensor and the per-block scale tensor. The compiler raises this error when the result type is not a 2-element tuple.
+
+To fix this error, declare a 2-element tuple result type ``(quantized_data, scale)``.
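The sketch below computes the two shapes a ``QuantizeMX`` result tuple could declare. It assumes ``quantized_data`` has the input shape (as NCC_EVRF042 requires) and that the scale tensor holds one entry per 32-element block along ``dim``, with a trailing partial block rounded up. The scale layout is an assumption; this page does not specify it.

```python
def quantize_mx_result_shapes(input_shape, dim=-1, block_size=32):
    """Return (quantized_data_shape, scale_shape) for a QuantizeMX call.

    Assumption: one scale per block along `dim` (which is -1 or -2),
    with a trailing partial block rounded up.
    """
    shape = list(input_shape)
    axis = len(shape) + dim
    scale_shape = list(shape)
    scale_shape[axis] = (shape[axis] + block_size - 1) // block_size  # ceil division
    return tuple(shape), tuple(scale_shape)


print(quantize_mx_result_shapes((8, 128)))          # ((8, 128), (8, 4))
print(quantize_mx_result_shapes((8, 128), dim=-2))  # ((8, 128), (1, 128))
```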
@@ -0,0 +1,13 @@
+.. _error-code-evrf058:
+
+.. meta::
+   :description: AWS Neuron SDK Graph Compiler error code documentation for error EVRF058.
+
+NCC_EVRF058
+===========
+
+**Error message**: QuantizeMX custom call input dimension must be divisible by 4.
+
+The ``QuantizeMX`` custom call implements MXFP8 microscaling quantization as defined in the `OCP Microscaling Formats (MX) v1.0 specification <https://www.opencompute.org/documents/ocp-microscaling-formats-mx-v1-0-spec-final-pdf>`__. The size of the quantization dimension must be divisible by 4. The compiler raises this error when the input tensor's quantization dimension size is not a multiple of 4.
+
+To fix this error, pad or reshape the input tensor so the quantization dimension size is a multiple of 4.
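A minimal sketch for a 2-D input quantized along the last dimension, using nested lists in place of a real tensor (with PyTorch, ``torch.nn.functional.pad(x, (0, n))`` pads the last dimension by ``n``). Zero padding is a choice made here: zeros never increase a block's maximum absolute value, so under an amax-based scale rule they do not change the scale of the block they land in. Drop the padded positions from any downstream results.

```python
def pad_last_dim_to_multiple(rows, multiple=4, pad_value=0.0):
    """Right-pad every row (the last dimension) to a multiple of `multiple`."""
    out = []
    for row in rows:
        remainder = len(row) % multiple
        padding = (multiple - remainder) % multiple  # 0 when already aligned
        out.append(list(row) + [pad_value] * padding)
    return out


x = [[1.0] * 30, [2.0] * 30]  # quantization dim of size 30 would trigger EVRF058
padded = pad_last_dim_to_multiple(x)
print(len(padded[0]))  # 32
```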