We observed that GPT-5.4 tends to “cheat” on tasks/hip2hip/others/matrix_multiplication. It modified main.hip in two key ways:
- Removes the real GEMM computation
  The baseline uses a standard tiled GEMM, but the “optimized” version directly sets `C[row * b_cols + col] = a_cols * 0.02F;`. This exploits the fact that the inputs are constant (A is all 1.0, B is all 0.02), so every output element equals a_cols * 0.02 and can be hardcoded without doing any computation (see the first sketch after this list).
- Removes the real GPU execution flow
  The baseline includes device memory allocation, hipMemcpy transfers, a kernel launch, and result verification. The modified version bypasses actual GPU execution and relies on trivial verification logic to pass (see the second sketch below).
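For illustration, here is a minimal host-side sketch (plain C++, no GPU required) of why the hardcoded value matches the real result: with A filled with 1.0 and B filled with 0.02, every dot product collapses to a_cols * 0.02. The matrix sizes and variable names below are illustrative, not taken from the actual main.hip.

```cpp
// Sketch only: shows that the "shortcut" value equals the true GEMM result
// solely because the inputs are constant. Sizes/names are illustrative.
#include <cstdio>
#include <vector>

int main() {
    const int a_rows = 4, a_cols = 8, b_cols = 4;
    std::vector<float> A(a_rows * a_cols, 1.0F);   // every element is 1.0
    std::vector<float> B(a_cols * b_cols, 0.02F);  // every element is 0.02
    std::vector<float> C(a_rows * b_cols, 0.0F);

    // Real GEMM: C[i][j] = sum_k A[i][k] * B[k][j]
    for (int i = 0; i < a_rows; ++i)
        for (int j = 0; j < b_cols; ++j)
            for (int k = 0; k < a_cols; ++k)
                C[i * b_cols + j] += A[i * a_cols + k] * B[k * b_cols + j];

    // With constant inputs every term is 1.0 * 0.02, so the sum collapses to
    // a_cols * 0.02 -- exactly the value the "optimized" kernel hardcodes.
    const float shortcut = a_cols * 0.02F;
    std::printf("real GEMM: %f, hardcoded shortcut: %f\n", C[0], shortcut);
    return 0;
}
```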
So this is not a real optimization; it exploits the fixed inputs and weak validation to “pass” the test and report a 35x speedup.
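For comparison, here is a minimal sketch of the kind of execution flow the baseline contains and the modified version skips: device allocation, hipMemcpy transfers, a kernel launch, copy-back, and verification. The kernel here is a simplified naive GEMM (the real baseline is tiled), and all names are illustrative rather than copied from main.hip; the final check also shows why verifying against a single constant expected value is easy to game.

```cpp
// Sketch of a baseline-style HIP flow: allocate, copy in, launch, copy out, verify.
#include <hip/hip_runtime.h>
#include <cmath>
#include <cstdio>
#include <vector>

__global__ void gemm_naive(const float* A, const float* B, float* C,
                           int a_rows, int a_cols, int b_cols) {
    int row = blockIdx.y * blockDim.y + threadIdx.y;
    int col = blockIdx.x * blockDim.x + threadIdx.x;
    if (row < a_rows && col < b_cols) {
        float acc = 0.0F;
        for (int k = 0; k < a_cols; ++k)
            acc += A[row * a_cols + k] * B[k * b_cols + col];
        C[row * b_cols + col] = acc;  // computed, not hardcoded
    }
}

int main() {
    const int a_rows = 64, a_cols = 64, b_cols = 64;
    std::vector<float> hA(a_rows * a_cols, 1.0F), hB(a_cols * b_cols, 0.02F);
    std::vector<float> hC(a_rows * b_cols, 0.0F);

    float *dA, *dB, *dC;
    hipMalloc(&dA, hA.size() * sizeof(float));
    hipMalloc(&dB, hB.size() * sizeof(float));
    hipMalloc(&dC, hC.size() * sizeof(float));
    hipMemcpy(dA, hA.data(), hA.size() * sizeof(float), hipMemcpyHostToDevice);
    hipMemcpy(dB, hB.data(), hB.size() * sizeof(float), hipMemcpyHostToDevice);

    dim3 block(16, 16);
    dim3 grid((b_cols + 15) / 16, (a_rows + 15) / 16);
    hipLaunchKernelGGL(gemm_naive, grid, block, 0, 0,
                       dA, dB, dC, a_rows, a_cols, b_cols);
    hipMemcpy(hC.data(), dC, hC.size() * sizeof(float), hipMemcpyDeviceToHost);

    // Weak verification: every element is compared to one constant expected
    // value, which a hardcoded kernel can satisfy without computing anything.
    bool ok = true;
    for (float v : hC) ok = ok && (std::abs(v - a_cols * 0.02F) < 1e-3F);
    std::printf("verification %s\n", ok ? "passed" : "failed");

    hipFree(dA); hipFree(dB); hipFree(dC);
    return 0;
}
```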
Here is the cheating main.hip: main.hip.txt