aarch64: support for per_dim_0 scales and bf16 dst_dt in jit int8 matmul#4987
michalowski-arm wants to merge 3 commits into uxlfoundation:main
Conversation
dzarukin left a comment:
Please share the motivation for extending functional capabilities and confirm readiness to enable it uniformly across the stack. Further guidance will follow based on those answers. Thank you.
            res->reason = skip_reason::case_not_supported;
            return;
        }
    }
The preferred way of doing things is to enable the reference implementation (shared between backends), which avoids multiple parties having to handle a case that would otherwise be supported only in a single point of a single backend.
That makes sense. I can rework it to use ref_matmul_int8 as the generic fallback instead. If I'm not mistaken, the support is already there and the case is only rejected during pd creation. Would that work?
Yes, it will.
Besides the parts already touched, the shared code also includes the matmul_pd::attr_scales_ok() function and the ref_matmul.{c,h}pp files.
Let me know if the approach I took with the latest commit works.
7056882 to 3cc0836
      const std::vector<int> &supported_qmodes
-             = {quantization_mode::static_sazp}) const {
+             = {quantization_mode::static_sazp},
+             bool allow_src_per_m = false) const {
It seems to me that the signature of this function is not well suited to extending mask support in only one or two specific matmul implementations, and it should be extended further.
If you need to wrap up this activity sooner, I suggest expanding the implementation-local conditions to let the new mask in.
If you are up for going the extra mile, I think the idea below should suit the future:
// This function covers common ground across ALL implementations. Masks that are
// supported only by specific implementations can be passed through `extra_masks`,
// which takes elements as pairs of `{arg, {1, 5, ...}}` that are additionally checked.
virtual bool attr_scales_ok(const std::vector<int> &supported_args
        = {DNNL_ARG_SRC, DNNL_ARG_WEIGHTS, DNNL_ARG_DST},
        const std::vector<int> &supported_qmodes
        = {quantization_mode::static_sazp},
        const std::map<int, std::vector<int>> &extra_masks = {}) {
    ...
    // Masks supported in all implementations.
    bool mask_ok = utils::one_of(mask, 0, src_qmask_K(),
            src_qmask_M() + src_qmask_K(),
            full_tensor_mask());
    // If the passed mask wasn't found, check the extra masks coming from the impl.
    if (!mask_ok) {
        if (extra_masks.find(arg) != extra_masks.end()) {
            for (const auto &em : extra_masks.at(arg))
                if (mask == em) mask_ok = true;
        }
    }
    ...
Having it this way should avoid having to update other backends, leaving their testing results and support capabilities unaffected.
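For illustration, a hypothetical call site in an implementation-specific pd::init() could look like the sketch below. The helper names (src_qmask_M(), quantization_mode::static_sazp) are taken from the snippet above and assumed to be in scope; this is a sketch rather than actual PR code.

// Hypothetical call site in an implementation-specific pd::init(): on top of the
// masks common to all implementations, this impl additionally allows a per-M
// source scales mask via `extra_masks`.
const std::map<int, std::vector<int>> extra_masks
        = {{DNNL_ARG_SRC, {src_qmask_M()}}};
if (!attr_scales_ok({DNNL_ARG_SRC, DNNL_ARG_WEIGHTS, DNNL_ARG_DST},
            {quantization_mode::static_sazp}, extra_masks))
    return status::unimplemented;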
3cc0836 to 0915e90
Description
This change adds AArch64 jit:int8 matmul support for row-wise source scales (src:per_dim_0) and bf16 destination output.
The original motivation for this change is to improve W8A8 serving performance for workloads such as Llama and Whisper in vLLM. In these models, activations are symmetrically quantized with per-token scales, which map to matmul src:per_dim_0 scales (the matmul M dimension).
Before this change, the AArch64 jit:int8 matmul path did not fully support this case, so vLLM had to run an additional epilogue to apply the activation scales and convert the result to bf16. With this PR, the scale application and bf16 destination handling stay inside the oneDNN matmul path, removing that extra epilogue and the associated memory traffic and output-side work.
In vLLM testing, this improved output-token throughput by roughly 5–10%, depending on the model.
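For context, here is a minimal sketch (not from the PR) of the case this description targets, expressed through the oneDNN C++ API: an s8 x s8 -> bf16 matmul with row-wise (per-token) source scales. The shapes and the per_oc weight scales are illustrative assumptions.

// Minimal sketch: s8 x s8 -> bf16 matmul with per-row (per-token) source scales.
// Shapes and the per_oc weight scales are illustrative assumptions, not PR code.
#include "oneapi/dnnl/dnnl.hpp"
using namespace dnnl;

int main() {
    engine eng(engine::kind::cpu, 0);

    const memory::dim M = 64, K = 512, N = 256;
    memory::desc src_md({M, K}, memory::data_type::s8, memory::format_tag::ab);
    memory::desc wei_md({K, N}, memory::data_type::s8, memory::format_tag::ab);
    memory::desc dst_md({M, N}, memory::data_type::bf16, memory::format_tag::ab);

    primitive_attr attr;
    // Row-wise (per-token) source scales: mask bit 0 selects the M dimension,
    // i.e. src:per_dim_0 in benchdnn terms.
    attr.set_scales_mask(DNNL_ARG_SRC, 1 << 0);
    // Per-channel weight scales over N (wei:per_oc).
    attr.set_scales_mask(DNNL_ARG_WEIGHTS, 1 << 1);

    matmul::primitive_desc pd(eng, src_md, wei_md, dst_md, attr);
    matmul prim(pd);
    // At execution time the scales are passed as DNNL_ARG_ATTR_SCALES | DNNL_ARG_SRC
    // (an M-element f32 memory) and DNNL_ARG_ATTR_SCALES | DNNL_ARG_WEIGHTS.
    return 0;
}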
Checklist
Do all unit and benchdnn tests (make test and make test_benchdnn_*) pass locally for each commit?