Description
When running the example_gemm/gemm_bf16.py script on a Ryzen AI 9 HX 370 processor, the project compiles successfully, but the final verification step fails with 916200 errors. The execution log shows that the NPU output buffer (bufC) is filled with zeros, while the expected output (srcVec2) contains non-zero values. This suggests the NPU kernel either did not execute correctly or did not produce any output.
I found a potentially related issue, #70, which also reports verification errors, but there is a key difference. In issue #70, the errors are minor floating-point discrepancies (e.g., 472.356537!=472.356232). In my case, the output is entirely zero (e.g., 8.000000!=0.000000), which points to a more fundamental problem than a precision issue.
Any help in diagnosing this issue would be greatly appreciated. Thank you!
My Environment
| Item | Details |
|------|---------|
| CPU | AMD Ryzen AI 9 HX 370 |
| Operating System | Ubuntu 24.04.2 LTS |
| Aries Project Version | c54706b |
| Toolchain Info | mlir_aie and llvm-aie were installed using the Aries/utils/quick_setup.sh script. |
| Linux Kernel | 6.14.0-28-generic |
| Vitis Version | 2024.2 |
| XRT Version | 2.20.0 |
| NPU Firmware Version | 255.0.2.7 |
Full xbutil examine output:
System Configuration
OS Name : Linux
Release : 6.14.0-28-generic
Machine : x86_64
CPU Cores : 24
Memory : 23640 MB
Distribution : Ubuntu 24.04.2 LTS
GLIBC : 2.39
Model : AI Series
BIOS Vendor : American Megatrends International, LLC.
BIOS Version : 1.04
XRT
Version : 2.20.0
Branch : HEAD
Hash : a62adc1020c901af79529457c46f210aa05f15a3
Hash Date : 2025-08-22 19:59:38
amdxdna : 2.20.0_20250822, e9d2788a884784e3531e95d65b923c2252a1132e
virtio-pci : unknown, unknown
NPU Firmware Version : 255.0.2.7
Device(s) Present
|BDF |Name |
|----------------|-----------|
|[0000:c5:00.1] |NPU Strix |
Full Log
Here is the complete log from the execution of the script with make run.
mkdir -p build
/home/ai/Aries/my_install/llvm-aie/bin/clang++ -O2 -v -std=c++20 --target=aie2-none-unknown-elf -Wno-parentheses -Wno-attributes -Wno-macro-redefined -DNDEBUG -I /home/ai/Aries/example_new/example_NPU/example_gemm/../../../templates/aie2/origin/common -I /home/ai/Aries/my_install/mlir_aie/include -c aie/kernel_gemm.cc -o build/kernel_gemm.o
mkdir -p .
cd build && /home/ai/Aries/my_install/mlir_aie/bin/aiecc.py \
--alloc-scheme=basic-sequential \
--aie-generate-cdo \
--no-compile-host \
--xclbin-name=gemm.xclbin \
--no-xchesscc \
--no-xbridge \
--peano /home/ai/Aries/my_install/llvm-aie \
--aie-generate-npu --npu-insts-name=insts.txt ../gemm.adf.mlir
****** Bootgen v2024.2
**** Build date : Nov 8 2024-16:21:57
** Copyright 1986-2022 Xilinx, Inc. All Rights Reserved.
** Copyright 2022-2024 Advanced Micro Devices, Inc. All Rights Reserved.
[INFO] : Bootimage generated successfully
Info: Embedded Metadata section is missing project.platform.device.core element, adding it.
Found xchesscc at /tools/Xilinx/Vitis/2024.2/aietools
AIE Compilation: ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 100% 0:00:00 0:00:00 17/17 4 Workers
Generating: /home/ai/Aries/example_new/example_NPU/example_gemm/my_project_bf16/project/build/gemm.adf.mlir.prj/aie_cdo_elfs.bin
Generating: /home/ai/Aries/example_new/example_NPU/example_gemm/my_project_bf16/project/build/gemm.adf.mlir.prj/aie_cdo_init.bin
Generating: /home/ai/Aries/example_new/example_NPU/example_gemm/my_project_bf16/project/build/gemm.adf.mlir.prj/aie_cdo_enable.bin
rm -rf _build
mkdir -p _build
cd _build && cmake ../ -DCMAKE_POLICY_VERSION_MINIMUM=3.5 -DMLIR_AIE_DIR=/home/ai/Aries/my_install/mlir_aie/../../externals/mlir-aie/ -D CMAKE_C_COMPILER=gcc-13 -D CMAKE_CXX_COMPILER=g++-13 -DTARGET_NAME=hostexe -Dsubdir=host \
&& cmake --build . --config Release
-- The C compiler identification is GNU 13.3.0
-- The CXX compiler identification is GNU 13.3.0
-- Detecting C compiler ABI info
-- Detecting C compiler ABI info - done
-- Check for working C compiler: /usr/bin/gcc-13 - skipped
-- Detecting C compile features
-- Detecting C compile features - done
-- Detecting CXX compiler ABI info
-- Detecting CXX compiler ABI info - done
-- Check for working CXX compiler: /usr/bin/g++-13 - skipped
-- Detecting CXX compile features
-- Detecting CXX compile features - done
-- Found Boost: /usr/lib/x86_64-linux-gnu/cmake/Boost-1.83.0/BoostConfig.cmake (found version "1.83.0")
-- Configuring done (0.3s)
-- Generating done (0.0s)
-- Build files have been written to: /home/ai/Aries/example_new/example_NPU/example_gemm/my_project_bf16/project/_build
gmake[1]: Entering directory '/home/ai/Aries/example_new/example_NPU/example_gemm/my_project_bf16/project/_build'
gmake[2]: Entering directory '/home/ai/Aries/example_new/example_NPU/example_gemm/my_project_bf16/project/_build'
gmake[3]: Entering directory '/home/ai/Aries/example_new/example_NPU/example_gemm/my_project_bf16/project/_build'
gmake[3]: Leaving directory '/home/ai/Aries/example_new/example_NPU/example_gemm/my_project_bf16/project/_build'
gmake[3]: Entering directory '/home/ai/Aries/example_new/example_NPU/example_gemm/my_project_bf16/project/_build'
[ 33%] Building CXX object CMakeFiles/hostexe.dir/home/ai/Aries/externals/mlir-aie/runtime_lib/test_lib/test_utils.cpp.o
[ 66%] Building CXX object CMakeFiles/hostexe.dir/host/host.cpp.o
[100%] Linking CXX executable hostexe
gmake[3]: Leaving directory '/home/ai/Aries/example_new/example_NPU/example_gemm/my_project_bf16/project/_build'
[100%] Built target hostexe
gmake[2]: Leaving directory '/home/ai/Aries/example_new/example_NPU/example_gemm/my_project_bf16/project/_build'
gmake[1]: Leaving directory '/home/ai/Aries/example_new/example_NPU/example_gemm/my_project_bf16/project/_build'
cp -r _build/hostexe ./
./hostexe -x build/gemm.xclbin -i build/insts.txt -k MLIR_AIE -v 2 --verify 1 --warmup 10 --iters 20
Sequence instr count: 1092
Loading xclbin: build/gemm.xclbin
Kernel opcode: MLIR_AIE
Name: MLIR_AIE
Registering xclbin: build/gemm.xclbin
Getting hardware context.
Getting handle to kernel:MLIR_AIE
Warmup Kernel.
Running Kernel.
NPU execution time: 4.096s
Error found srcVec2[0]!=bufC[0], 8.000000!=0.000000
Error found srcVec2[1]!=bufC[1], 11.000000!=0.000000
Error found srcVec2[3]!=bufC[3], -5.000000!=0.000000
Error found srcVec2[4]!=bufC[4], 1.000000!=0.000000
Error found srcVec2[5]!=bufC[5], -2.000000!=0.000000
Error found srcVec2[6]!=bufC[6], 3.000000!=0.000000
Error found srcVec2[7]!=bufC[7], 2.000000!=0.000000
Error found srcVec2[8]!=bufC[8], -6.000000!=0.000000
Error found srcVec2[9]!=bufC[9], 1.000000!=0.000000
Error found srcVec2[10]!=bufC[10], 2.000000!=0.000000
...
...
Error found srcVec2[1048565]!=bufC[1048565], 4.000000!=0.000000
Error found srcVec2[1048566]!=bufC[1048566], -1.000000!=0.000000
Error found srcVec2[1048567]!=bufC[1048567], 5.000000!=0.000000
Error found srcVec2[1048568]!=bufC[1048568], -4.000000!=0.000000
Error found srcVec2[1048570]!=bufC[1048570], -3.000000!=0.000000
Error found srcVec2[1048571]!=bufC[1048571], -2.000000!=0.000000
Error found srcVec2[1048572]!=bufC[1048572], -2.000000!=0.000000
Error found srcVec2[1048573]!=bufC[1048573], -1.000000!=0.000000
Error found srcVec2[1048574]!=bufC[1048574], -1.000000!=0.000000
Error found srcVec2[1048575]!=bufC[1048575], 4.000000!=0.000000
TEST failed with 916200 errors
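To make the contrast with #70 concrete, the mismatch lines in a log like the one above can be summarized mechanically. The following is a minimal sketch, not part of Aries: the regular expression assumes the exact `Error found srcVec2[i]!=bufC[i], a!=b` line format shown in the log, and `summarize_mismatches` is a hypothetical helper name introduced only for illustration.

```python
import re

# Assumed line format, taken from the log above, e.g.:
#   Error found srcVec2[0]!=bufC[0], 8.000000!=0.000000
LINE_RE = re.compile(
    r"Error found srcVec2\[(\d+)\]!=bufC\[\d+\],\s*"
    r"(-?\d+(?:\.\d+)?)!=(-?\d+(?:\.\d+)?)"
)


def summarize_mismatches(log_text):
    """Count reported mismatches and how many have a zero NPU value."""
    total = 0          # number of "Error found" lines parsed
    zero_output = 0    # mismatches where the bufC value is exactly 0
    max_abs_diff = 0.0 # largest |expected - actual| seen
    for match in LINE_RE.finditer(log_text):
        expected = float(match.group(2))  # srcVec2 value (reference)
        actual = float(match.group(3))    # bufC value (NPU output)
        total += 1
        if actual == 0.0:
            zero_output += 1
        max_abs_diff = max(max_abs_diff, abs(expected - actual))
    return {
        "total": total,
        "zero_output": zero_output,
        "max_abs_diff": max_abs_diff,
    }


if __name__ == "__main__":
    # Three lines copied from the log above, used as a small sample.
    sample = (
        "Error found srcVec2[0]!=bufC[0], 8.000000!=0.000000\n"
        "Error found srcVec2[1]!=bufC[1], 11.000000!=0.000000\n"
        "Error found srcVec2[3]!=bufC[3], -5.000000!=0.000000\n"
    )
    print(summarize_mismatches(sample))
```

If `zero_output` equals `total`, every reported mismatch is against a zero NPU value, which matches the all-zero symptom described here. A small `max_abs_diff` with `zero_output` near zero would instead look like the precision drift reported in #70.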