
Add a new GEMV kernel to BRGEMM and enable it in MatMul #5077

Merged
densamoilov merged 12 commits into main from dsamoylo/main/gemv
Apr 30, 2026

Conversation

@densamoilov
Contributor

@densamoilov densamoilov commented Apr 24, 2026

This PR adds a new GEMV kernel to BRGEMM to support the remaining cases and complete GEMV coverage.

Together, the existing and new GEMV kernels cover all four GEMV cases required for full support across layout and parameter combinations.

GEMV coverage in MatMul

| Vector dimension | A layout | B layout | Corresponding BRGEMV operation | BRGEMV `transA` parameter | BRGEMV `treat_y_as_row` parameter |
|---|---|---|---|---|---|
| N = 1 | ab | ab, ba | y = A * x | false | n/a |
| M = 1 | ab, ba | ba | yᵀ = xᵀ * Aᵀ | false | if true, output is yᵀ |
| N = 1 | ba | ab, ba | y = Aᵀ * x | true | n/a |
| M = 1 | ab, ba | ab | yᵀ = xᵀ * A | true | if true, output is yᵀ |

Note

  • transA: selects whether the BRGEMV uses A or Aᵀ
  • treat_y_as_row: for M=1, interprets y as a row vector
  • Batch dimensions are supported
  • Bias, post-ops and scales are supported

At the matmul level, these GEMV configurations are represented via gemv_strategy_t.
At the BRGEMM level, they are implemented using transA and treat_y_as_row.

Performance
Performance was evaluated on ADL and SRF, showing parity with the auto-generated GEMM kernels.
As a result, GEMM implementations are no longer used in performance validation and have been fully replaced by BRGEMM matmul.
[Performance comparison charts for ADL and SRF]

@densamoilov densamoilov requested a review from a team as a code owner April 24, 2026 20:30
@github-actions github-actions Bot added the platform:cpu-x64 Intel64/AMD64 processors. Codeowner: @oneapi-src/onednn-cpu-x64 label Apr 24, 2026
@densamoilov
Contributor Author

make test

Comment thread src/cpu/x64/brgemm/jit_brgemm_kernel.cpp Outdated
Comment thread src/cpu/x64/brgemm/brgemm_types.hpp
Comment thread src/cpu/x64/brgemm/brgemm_types.hpp
Comment thread src/cpu/x64/brgemm/jit_brgemm_kernel.cpp Outdated
Comment thread src/cpu/x64/matmul/brgemm_matmul_utils.cpp
Comment thread src/cpu/x64/brgemm/brgemm_utils.cpp
@densamoilov densamoilov force-pushed the dsamoylo/main/gemv branch 2 times, most recently from 8d09d25 to 4b4f4ff Compare April 27, 2026 20:11
@densamoilov
Contributor Author

make test

@densamoilov
Contributor Author

make test

Comment thread src/cpu/x64/brgemm/jit_brgemm_kernel.cpp Outdated
@densamoilov
Contributor Author

make test

This kernel will enable matmul for the following cases:
- A is a matrix, B is a vector, and A is transposed
- A is a vector, B is a matrix, and B is not transposed
Redirect GEMV cases to GEMV code path when fpmath is not default
because it's expected to be faster than the GEMM path.
brgemm_matmul now has broad support for GEMV cases; the only exception is
cases with unusual input/output layouts. However, the GEMV code path in
auto-generated GEMM is not expected to support those layouts either.
Therefore, the decision is to always use brgemm_matmul for these
exceptions, whether via the GEMV path or the regular GEMM path, and to
avoid falling back to auto-generated GEMM.
@densamoilov densamoilov merged commit 4f7899a into main Apr 30, 2026
14 of 17 checks passed
@densamoilov densamoilov deleted the dsamoylo/main/gemv branch April 30, 2026 16:03

Labels

platform:cpu-x64 Intel64/AMD64 processors. Codeowner: @oneapi-src/onednn-cpu-x64

4 participants