[Enhancement] PR#655 related: Case1-aligned spmd_paged_attention_highperf cases crash with aicore exception on a2a3 #677

@chenshengxin2026

Platform

a2a3 (Ascend 910B/C hardware)

Runtime Variant

tensormap_and_ringbuffer

Description

After the standalone high-performance paged-attention test scripts were extended with a shape matching Case1 from
tests/st/a2a3/tensormap_and_ringbuffer/spmd_paged_attention/test_spmd_paged_attention.py,
both newly added cases crash at runtime on hardware instead of completing.

Affected files:

  • tests/st/a2a3/tensormap_and_ringbuffer/spmd_paged_attention_highperf/kernels/test_pa_accuracy.py
  • tests/st/a2a3/tensormap_and_ringbuffer/spmd_paged_attention_highperf/kernels/bench_pa_performance.py

Reference shape from spmd_paged_attention Case1:

  • batch=256
  • num_heads=16
  • kv_head_num=1
  • head_dim=128
  • block_size=128
  • context_len=8192
  • max_model_len=32768
  • dtype=bfloat16
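For scale, the Case1 shape implies the sizes computed below. This is a minimal sketch that assumes a KV cache laid out as [num_blocks, block_size, kv_head_num, head_dim] in bfloat16 and one query token per sequence (decode); neither assumption is confirmed against the highperf kernels, and derived_sizes is a hypothetical helper.

```python
# Derived sizes for the Case1 reference shape.
# ASSUMPTIONS (not taken from the kernels): KV cache layout is
# [num_blocks, block_size, kv_head_num, head_dim], one query token per sequence.
BF16_BYTES = 2  # bfloat16 is 2 bytes per element

case1 = {
    "batch": 256,
    "num_heads": 16,
    "kv_head_num": 1,
    "head_dim": 128,
    "block_size": 128,
    "context_len": 8192,
    "max_model_len": 32768,
}

def derived_sizes(s):
    # Blocks each sequence actually fills (ceil division).
    blocks_per_seq = -(-s["context_len"] // s["block_size"])
    total_blocks = s["batch"] * blocks_per_seq
    # Block-table width if the table is sized by max_model_len instead of context_len.
    block_table_width = -(-s["max_model_len"] // s["block_size"])
    # Bytes for K alone (V is the same size under the assumed layout).
    k_bytes = total_blocks * s["block_size"] * s["kv_head_num"] * s["head_dim"] * BF16_BYTES
    q_bytes = s["batch"] * s["num_heads"] * s["head_dim"] * BF16_BYTES
    return {
        "blocks_per_seq": blocks_per_seq,
        "total_blocks": total_blocks,
        "block_table_width": block_table_width,
        "k_cache_MiB": k_bytes / 2**20,
        "q_MiB": q_bytes / 2**20,
    }

print(derived_sizes(case1))
```

Under these assumptions K and V are 512 MiB each, and a block table sized by max_model_len has 256 entries per sequence, while context_len only needs 64.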

The newly added highperf cases were intended to match that shape, but both crash:

  • bench_pa_performance.py: ("Qwen3-8B b256 h16/kv1 kv8192", 256, 16, 1, 128, 8192, 128)
  • test_pa_accuracy.py: {"batch": 256, "num_heads": 16, "num_kv_heads": 1, "head_dim": 128, "kv_seq": 8192, "block_size": 128}
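A quick way to confirm that the two highperf cases encode the same values as Case1 is to compare them field by field. This sketch assumes the bench tuple follows the same field order as the accuracy dict keys; BENCH_FIELDS and mismatches are hypothetical names. Note that Case1's max_model_len=32768 has no counterpart in either case, so anything derived from it is not covered by this comparison; whether that difference matters for the crash is not established here.

```python
# ASSUMPTION: bench tuple order is (name, batch, num_heads, num_kv_heads,
# head_dim, kv_seq, block_size), mirroring the accuracy dict's key order.
BENCH_FIELDS = ("name", "batch", "num_heads", "num_kv_heads", "head_dim", "kv_seq", "block_size")

bench_case = ("Qwen3-8B b256 h16/kv1 kv8192", 256, 16, 1, 128, 8192, 128)
accuracy_case = {"batch": 256, "num_heads": 16, "num_kv_heads": 1,
                 "head_dim": 128, "kv_seq": 8192, "block_size": 128}

# Case1 values renamed to the highperf keys (kv_head_num -> num_kv_heads,
# context_len -> kv_seq). max_model_len is left out: neither case carries it.
case1 = {"batch": 256, "num_heads": 16, "num_kv_heads": 1,
         "head_dim": 128, "kv_seq": 8192, "block_size": 128}

def mismatches(case, reference):
    """Return {key: (case_value, reference_value)} for every key that differs."""
    return {k: (case.get(k), v) for k, v in reference.items() if case.get(k) != v}

bench_dict = dict(zip(BENCH_FIELDS, bench_case))
print(mismatches(bench_dict, case1))
print(mismatches(accuracy_case, case1))
```

Both calls return an empty dict, so the listed fields agree and the crash is not explained by a simple transcription error in these values.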

Steps to Reproduce

  1. Use commit 57e7a6dd3ac15a28c08b878716a171e65420f26a.
  2. Modify the highperf scripts to add the Case1-aligned shape:
    • tests/st/a2a3/tensormap_and_ringbuffer/spmd_paged_attention_highperf/kernels/bench_pa_performance.py
    • tests/st/a2a3/tensormap_and_ringbuffer/spmd_paged_attention_highperf/kernels/test_pa_accuracy.py
  3. Build the standalone kernel library if needed:
    cd tests/st/a2a3/tensormap_and_ringbuffer/spmd_paged_attention_highperf/kernels
    bash ./compile.sh
  4. Run the accuracy script:
    python ./test_pa_accuracy.py
  5. Run the benchmark script:
    python ./bench_pa_performance.py --bf16
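Steps 3 to 5 can be wrapped in a small driver so both scripts are always run against a fresh build. This is a hypothetical helper, not part of the repository: it only runs the commands listed above, uses sys.executable in place of `python`, always runs the build step (the steps say "if needed"), and defaults to a dry run that just prints the commands.

```python
import subprocess
import sys

# Hypothetical driver; the directory and commands are copied from the steps above.
KERNEL_DIR = "tests/st/a2a3/tensormap_and_ringbuffer/spmd_paged_attention_highperf/kernels"

STEPS = [
    ["bash", "./compile.sh"],
    [sys.executable, "./test_pa_accuracy.py"],            # sys.executable stands in for `python`
    [sys.executable, "./bench_pa_performance.py", "--bf16"],
]

def run_steps(repo_root=".", dry_run=True):
    """Print each command; unless dry_run, execute it in KERNEL_DIR and return the exit codes."""
    cwd = f"{repo_root}/{KERNEL_DIR}"
    codes = []
    for i, cmd in enumerate(STEPS):
        print("$", " ".join(cmd))
        if dry_run:
            continue
        rc = subprocess.run(cmd, cwd=cwd).returncode
        codes.append(rc)
        if i == 0 and rc != 0:
            # Without a built library the scripts cannot run, so stop here.
            break
    return codes

if __name__ == "__main__":
    # Pass --run to actually execute; otherwise only the commands are printed.
    run_steps(dry_run="--run" not in sys.argv)
```

Both scripts are run even if the accuracy test fails, since the issue reports that each crashes independently.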

Expected Behavior

The Case1-aligned shape should run to completion in both scripts, as Case1 does in
tests/st/a2a3/tensormap_and_ringbuffer/spmd_paged_attention/test_spmd_paged_attention.py,
producing correctness results (test_pa_accuracy.py) and benchmark numbers
(bench_pa_performance.py) respectively, with no device or runtime exceptions.

Actual Behavior

Both scripts fail on hardware.

Observed errors include:

EE9999[PID: 3700733] 2026-04-25-16:02:42.600.249 (EE9999):  rtDeviceSynchronizeWithTimeout execution failed,
reason=aicore exception[FUNC:FuncErrorReason][FILE:error_message_manage.cc][LINE:65]
TraceBack (most recent call last):
wait for compute device to finish failed, runtime result = 507015.[FUNC:ReportCallError][FILE:log_inner.cpp][LINE:148]

and

RuntimeError: npuSynchronizeDevice:build/CMakeFiles/torch_npu.dir/compiler_depend.ts:564 NPU function error:
SUSPECT REMOTE ERROR, error code is 507057
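When comparing repeated runs, it can help to pull the numeric codes out of captured stderr. This is a minimal sketch: the regular expressions are taken from the two messages quoted above and may not match other log formats, the labels are arbitrary, and the helper does not interpret the codes (their meanings are defined by the CANN error-code documentation).

```python
import re

# Patterns copied from the two quoted messages; labels are arbitrary.
PATTERNS = {
    "device_log": re.compile(r"runtime result = (\d+)"),
    "torch_npu": re.compile(r"error code is (\d+)"),
}

def extract_error_codes(log_text):
    """Return (label, code) pairs, grouped by pattern in PATTERNS order."""
    found = []
    for label, pattern in PATTERNS.items():
        found.extend((label, int(code)) for code in pattern.findall(log_text))
    return found

SAMPLE_LOG = """\
wait for compute device to finish failed, runtime result = 507015.[FUNC:ReportCallError][FILE:log_inner.cpp][LINE:148]
RuntimeError: npuSynchronizeDevice:build/CMakeFiles/torch_npu.dir/compiler_depend.ts:564 NPU function error:
SUSPECT REMOTE ERROR, error code is 507057
"""

print(extract_error_codes(SAMPLE_LOG))
```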

Git Commit ID

57e7a6d

CANN Version

8.5.0.alpha001

Driver Version

Unknown

Host Platform

Linux (aarch64)

Additional Context

Relevant reference case:

  • tests/st/a2a3/tensormap_and_ringbuffer/spmd_paged_attention/test_spmd_paged_attention.py Case1

Relevant modified files:

  • tests/st/a2a3/tensormap_and_ringbuffer/spmd_paged_attention_highperf/kernels/test_pa_accuracy.py
  • tests/st/a2a3/tensormap_and_ringbuffer/spmd_paged_attention_highperf/kernels/bench_pa_performance.py

This issue is related to the Case1-alignment work associated with PR #655.
