
GEMM example fails verification on NPU (output is all zeros) #112

@hungryDodo

Description

When I run the example_gemm/gemm_bf16.py script on a Ryzen AI 9 HX 370 processor, the project compiles successfully, but the final verification step fails with 916200 errors. In every mismatch reported in the execution log, the NPU output buffer (bufC) holds 0.000000 where the expected output (srcVec2) is non-zero. This suggests that the NPU kernel either did not execute correctly or never wrote its results back to bufC.

I found a potentially related issue, #70, which also reports verification errors. However, there is a key difference. In issue #70, the errors are minor floating-point discrepancies (e.g., 472.356537!=472.356232). In my case, every reported output value is zero (e.g., 8.000000!=0.000000), which points to a more fundamental problem than floating-point precision.
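To make the distinction between the two failure modes concrete, here is a minimal sketch of a hypothetical Python helper, classify_mismatch (not part of Aries or the mlir-aie test utilities), that sorts a failed comparison into "pass", "all_zero_output", "precision", or "wrong_values". It assumes exact inequality as the mismatch criterion and an arbitrary 1e-3 relative tolerance for what counts as a precision error; the real check in host.cpp may use different criteria.

```python
import math


def classify_mismatch(expected, actual, rel_tol=1e-3):
    """Roughly classify a verification result (hypothetical helper).

    A mismatch is any element where expected != actual (assumed criterion).
    Returns:
      "pass"            - no mismatches at all
      "all_zero_output" - actual is entirely zero while expected is not
      "precision"       - every mismatch is within rel_tol (like issue #70)
      "wrong_values"    - at least one mismatch is large
    """
    if len(expected) != len(actual):
        raise ValueError("expected and actual must have the same length")

    mismatches = [(e, a) for e, a in zip(expected, actual) if e != a]
    if not mismatches:
        return "pass"

    # An untouched (zero-initialized) output buffer shows up as all zeros.
    if all(a == 0.0 for a in actual):
        return "all_zero_output"

    # Small relative errors point to accumulation/rounding differences.
    if all(math.isclose(e, a, rel_tol=rel_tol) for e, a in mismatches):
        return "precision"

    return "wrong_values"


print(classify_mismatch([472.356537], [472.356232]))      # issue #70 style
print(classify_mismatch([8.0, 11.0, 0.0], [0.0, 0.0, 0.0]))  # this issue
```

With these assumptions, the #70 example falls into "precision", while the values from this report fall into "all_zero_output".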

Any help in diagnosing this issue would be greatly appreciated. Thank you!

My Environment

| Item | Details |
|------|---------|
| CPU | AMD Ryzen AI 9 HX 370 |
| Operating System | Ubuntu 24.04.2 LTS |
| Aries Project Version | c54706b |
| Toolchain Info | mlir_aie and llvm-aie were installed using the Aries/utils/quick_setup.sh script. |
| Linux Kernel | 6.14.0-28-generic |
| Vitis Version | 2024.2 |
| XRT Version | 2.20.0 |
| NPU Firmware Version | 255.0.2.7 |

Full xbutil examine output:

System Configuration
 OS Name : Linux
 Release : 6.14.0-28-generic
 Machine : x86_64
 CPU Cores : 24
 Memory : 23640 MB
 Distribution : Ubuntu 24.04.2 LTS
 GLIBC : 2.39
 Model : AI Series
 BIOS Vendor : American Megatrends International, LLC.
 BIOS Version : 1.04

XRT
 Version : 2.20.0
 Branch : HEAD
 Hash : a62adc1020c901af79529457c46f210aa05f15a3
 Hash Date : 2025-08-22 19:59:38
 amdxdna : 2.20.0_20250822, e9d2788a884784e3531e95d65b923c2252a1132e
 virtio-pci : unknown, unknown
 NPU Firmware Version : 255.0.2.7

Device(s) Present
|BDF |Name |
|----------------|-----------|
|[0000:c5:00.1] |NPU Strix |

Full Log

Here is the full output of running the script with make run (the per-element error lines are truncated in the middle).

mkdir -p build
/home/ai/Aries/my_install/llvm-aie/bin/clang++ -O2 -v -std=c++20 --target=aie2-none-unknown-elf -Wno-parentheses -Wno-attributes -Wno-macro-redefined -DNDEBUG -I /home/ai/Aries/example_new/example_NPU/example_gemm/../../../templates/aie2/origin/common -I /home/ai/Aries/my_install/mlir_aie/include -c aie/kernel_gemm.cc -o build/kernel_gemm.o
mkdir -p .
cd build && /home/ai/Aries/my_install/mlir_aie/bin/aiecc.py \
		--alloc-scheme=basic-sequential \
		--aie-generate-cdo \
		--no-compile-host \
		--xclbin-name=gemm.xclbin \
		--no-xchesscc \
		--no-xbridge \
		--peano /home/ai/Aries/my_install/llvm-aie \
		--aie-generate-npu --npu-insts-name=insts.txt ../gemm.adf.mlir


****** Bootgen v2024.2
  **** Build date : Nov  8 2024-16:21:57
    ** Copyright 1986-2022 Xilinx, Inc. All Rights Reserved.
    ** Copyright 2022-2024 Advanced Micro Devices, Inc. All Rights Reserved.


[INFO]   : Bootimage generated successfully

Info: Embedded Metadata section is missing project.platform.device.core element, adding it.
Found xchesscc at /tools/Xilinx/Vitis/2024.2/aietools
 AIE Compilation: ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 100% 0:00:00 0:00:00 17/17 4 Workers
Generating: /home/ai/Aries/example_new/example_NPU/example_gemm/my_project_bf16/project/build/gemm.adf.mlir.prj/aie_cdo_elfs.bin
Generating: /home/ai/Aries/example_new/example_NPU/example_gemm/my_project_bf16/project/build/gemm.adf.mlir.prj/aie_cdo_init.bin
Generating: /home/ai/Aries/example_new/example_NPU/example_gemm/my_project_bf16/project/build/gemm.adf.mlir.prj/aie_cdo_enable.bin
rm -rf _build
mkdir -p _build
cd _build &&  cmake ../ -DCMAKE_POLICY_VERSION_MINIMUM=3.5 -DMLIR_AIE_DIR=/home/ai/Aries/my_install/mlir_aie/../../externals/mlir-aie/ -D CMAKE_C_COMPILER=gcc-13 -D CMAKE_CXX_COMPILER=g++-13 -DTARGET_NAME=hostexe -Dsubdir=host \
					&&  cmake --build . --config Release
-- The C compiler identification is GNU 13.3.0
-- The CXX compiler identification is GNU 13.3.0
-- Detecting C compiler ABI info
-- Detecting C compiler ABI info - done
-- Check for working C compiler: /usr/bin/gcc-13 - skipped
-- Detecting C compile features
-- Detecting C compile features - done
-- Detecting CXX compiler ABI info
-- Detecting CXX compiler ABI info - done
-- Check for working CXX compiler: /usr/bin/g++-13 - skipped
-- Detecting CXX compile features
-- Detecting CXX compile features - done
-- Found Boost: /usr/lib/x86_64-linux-gnu/cmake/Boost-1.83.0/BoostConfig.cmake (found version "1.83.0")
-- Configuring done (0.3s)
-- Generating done (0.0s)
-- Build files have been written to: /home/ai/Aries/example_new/example_NPU/example_gemm/my_project_bf16/project/_build
gmake[1]: Entering directory '/home/ai/Aries/example_new/example_NPU/example_gemm/my_project_bf16/project/_build'
gmake[2]: Entering directory '/home/ai/Aries/example_new/example_NPU/example_gemm/my_project_bf16/project/_build'
gmake[3]: Entering directory '/home/ai/Aries/example_new/example_NPU/example_gemm/my_project_bf16/project/_build'
gmake[3]: Leaving directory '/home/ai/Aries/example_new/example_NPU/example_gemm/my_project_bf16/project/_build'
gmake[3]: Entering directory '/home/ai/Aries/example_new/example_NPU/example_gemm/my_project_bf16/project/_build'
[ 33%] Building CXX object CMakeFiles/hostexe.dir/home/ai/Aries/externals/mlir-aie/runtime_lib/test_lib/test_utils.cpp.o
[ 66%] Building CXX object CMakeFiles/hostexe.dir/host/host.cpp.o
[100%] Linking CXX executable hostexe
gmake[3]: Leaving directory '/home/ai/Aries/example_new/example_NPU/example_gemm/my_project_bf16/project/_build'
[100%] Built target hostexe
gmake[2]: Leaving directory '/home/ai/Aries/example_new/example_NPU/example_gemm/my_project_bf16/project/_build'
gmake[1]: Leaving directory '/home/ai/Aries/example_new/example_NPU/example_gemm/my_project_bf16/project/_build'
cp -r _build/hostexe ./
./hostexe -x build/gemm.xclbin -i build/insts.txt -k MLIR_AIE -v 2 --verify 1 --warmup 10 --iters 20
Sequence instr count: 1092
Loading xclbin: build/gemm.xclbin
Kernel opcode: MLIR_AIE
Name: MLIR_AIE
Registering xclbin: build/gemm.xclbin
Getting hardware context.
Getting handle to kernel:MLIR_AIE
Warmup Kernel.
Running Kernel.
NPU execution time: 4.096s
Error found srcVec2[0]!=bufC[0], 8.000000!=0.000000 
Error found srcVec2[1]!=bufC[1], 11.000000!=0.000000 
Error found srcVec2[3]!=bufC[3], -5.000000!=0.000000 
Error found srcVec2[4]!=bufC[4], 1.000000!=0.000000 
Error found srcVec2[5]!=bufC[5], -2.000000!=0.000000 
Error found srcVec2[6]!=bufC[6], 3.000000!=0.000000 
Error found srcVec2[7]!=bufC[7], 2.000000!=0.000000 
Error found srcVec2[8]!=bufC[8], -6.000000!=0.000000 
Error found srcVec2[9]!=bufC[9], 1.000000!=0.000000 
Error found srcVec2[10]!=bufC[10], 2.000000!=0.000000 
...
...
Error found srcVec2[1048565]!=bufC[1048565], 4.000000!=0.000000 
Error found srcVec2[1048566]!=bufC[1048566], -1.000000!=0.000000 
Error found srcVec2[1048567]!=bufC[1048567], 5.000000!=0.000000 
Error found srcVec2[1048568]!=bufC[1048568], -4.000000!=0.000000 
Error found srcVec2[1048570]!=bufC[1048570], -3.000000!=0.000000 
Error found srcVec2[1048571]!=bufC[1048571], -2.000000!=0.000000 
Error found srcVec2[1048572]!=bufC[1048572], -2.000000!=0.000000 
Error found srcVec2[1048573]!=bufC[1048573], -1.000000!=0.000000 
Error found srcVec2[1048574]!=bufC[1048574], -1.000000!=0.000000 
Error found srcVec2[1048575]!=bufC[1048575], 4.000000!=0.000000 
TEST failed with 916200 errors
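Because the error listing above is truncated, a small script can summarize a saved copy of the full log instead. The sketch below assumes the log was captured to a text file and that the line formats are exactly the "Error found srcVec2[i]!=bufC[i], expected!=actual" and "TEST failed with N errors" lines shown above; summarize_errors is a hypothetical helper, and the total of 1048576 elements is inferred from the last reported index (1048575).

```python
import re

# Matches lines such as: "Error found srcVec2[0]!=bufC[0], 8.000000!=0.000000"
ERROR_RE = re.compile(
    r"Error found srcVec2\[(\d+)\]!=bufC\[\d+\],\s*(-?\d+\.\d+)!=(-?\d+\.\d+)"
)
# Matches the final summary line: "TEST failed with 916200 errors"
SUMMARY_RE = re.compile(r"TEST failed with (\d+) errors")


def summarize_errors(log_text, total_elements):
    """Summarize the verification section of a hostexe log (hypothetical)."""
    errors = []
    failed_count = None
    for line in log_text.splitlines():
        m = ERROR_RE.search(line)
        if m:
            errors.append((int(m.group(1)), float(m.group(2)), float(m.group(3))))
            continue
        s = SUMMARY_RE.search(line)
        if s:
            failed_count = int(s.group(1))

    return {
        # Number of "Error found" lines actually present in the text.
        "reported": len(errors),
        # True only if at least one error line exists and every bufC value is 0.
        "all_actual_zero": bool(errors) and all(a == 0.0 for _, _, a in errors),
        "failed_count": failed_count,
        "error_fraction": (
            failed_count / total_elements if failed_count is not None else None
        ),
    }


if __name__ == "__main__":
    sample = (
        "Error found srcVec2[0]!=bufC[0], 8.000000!=0.000000 \n"
        "Error found srcVec2[3]!=bufC[3], -5.000000!=0.000000 \n"
        "TEST failed with 916200 errors\n"
    )
    print(summarize_errors(sample, 1048576))
```

For this log, 1048576 - 916200 = 132376 elements (about 12.6%) were not flagged. Indices such as 2 and 1048569 are missing from the listing, which is consistent with (though it does not prove) those positions having an expected value of zero that happens to match an untouched, zero-filled bufC.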
