
Add a new GEMV kernel to BRGEMM and enable it in MatMul #5077

Merged
densamoilov merged 12 commits into main from dsamoylo/main/gemv
Apr 30, 2026

Conversation

@densamoilov
Contributor

@densamoilov densamoilov commented Apr 24, 2026

This PR adds a new GEMV kernel to BRGEMM to support the remaining cases and complete GEMV coverage.

Together, the existing and new GEMV kernels cover all four GEMV cases required for full support across layout and parameter combinations.

GEMV coverage in MatMul

| Vector dimension | A layout | B layout | Corresponding BRGEMV operation | BRGEMV `transA` parameter | BRGEMV `treat_y_as_row` parameter |
|---|---|---|---|---|---|
| N = 1 | ab | ab, ba | y = A * x | false | n/a |
| M = 1 | ab, ba | ba | yᵀ = xᵀ * Aᵀ | false | if true, output is yᵀ |
| N = 1 | ba | ab, ba | y = Aᵀ * x | true | n/a |
| M = 1 | ab, ba | ab | yᵀ = xᵀ * A | true | if true, output is yᵀ |

Note

  • transA: selects whether the BRGEMV uses A or Aᵀ
  • treat_y_as_row: for M=1, interprets y as a row vector
  • Batch dimensions are supported
  • Bias, post-ops and scales are supported

At the matmul level, these GEMV configurations are represented via gemv_strategy_t.
At the BRGEMM level, they are implemented using transA and treat_y_as_row.

Performance
Performance was evaluated on ADL and SRF, showing parity with the auto-generated GEMM kernels.
As a result, GEMM implementations are no longer used in performance validation and have been fully replaced by BRGEMM matmul.
[Performance comparison charts for ADL and SRF]

@densamoilov densamoilov requested a review from a team as a code owner April 24, 2026 20:30
@github-actions github-actions Bot added the platform:cpu-x64 Intel64/AMD64 processors. Codeowner: @oneapi-src/onednn-cpu-x64 label Apr 24, 2026
@densamoilov
Contributor Author

make test

Comment thread src/cpu/x64/brgemm/jit_brgemm_kernel.cpp Outdated
Comment thread src/cpu/x64/brgemm/brgemm_types.hpp
Comment thread src/cpu/x64/brgemm/brgemm_types.hpp
Comment thread src/cpu/x64/brgemm/jit_brgemm_kernel.cpp Outdated
Comment thread src/cpu/x64/matmul/brgemm_matmul_utils.cpp
Comment thread src/cpu/x64/brgemm/brgemm_utils.cpp
@densamoilov densamoilov force-pushed the dsamoylo/main/gemv branch 2 times, most recently from 8d09d25 to 4b4f4ff Compare April 27, 2026 20:11
@densamoilov
Contributor Author

make test

@densamoilov
Contributor Author

make test

Comment thread src/cpu/x64/brgemm/jit_brgemm_kernel.cpp Outdated
@densamoilov
Contributor Author

make test

This kernel will enable matmul for the following cases:
- A is a matrix, B is a vector, and A is transposed
- A is a vector, B is a matrix, and B is not transposed
Redirect GEMV cases to GEMV code path when fpmath is not default
because it's expected to be faster than the GEMM path.
brgemm_matmul now has broad support for GEMV cases; the only exception is
cases with unusual input/output layouts. However, the GEMV code path in
auto-generated GEMM is not expected to support those layouts either.
Therefore, the decision is to always use brgemm_matmul for these
exceptions, whether via the GEMV path or the regular GEMM path, and to
avoid falling back to auto-generated GEMM.
@densamoilov densamoilov merged commit 4f7899a into main Apr 30, 2026
14 of 17 checks passed
@densamoilov densamoilov deleted the dsamoylo/main/gemv branch April 30, 2026 16:03

Labels

platform:cpu-x64 Intel64/AMD64 processors. Codeowner: @oneapi-src/onednn-cpu-x64

4 participants