cpu: aarch64: enable ACL's inner-product for BF16 by fadara01 · Pull Request #5024 · uxlfoundation/oneDNN

fadara01 · 2026-04-15T14:21:07Z

Description

cpu: aarch64: enable ACL's inner-product for BF16

With ACL built from ARM-software/ComputeLibrary#1279, this benchdnn repro now dispatches to ACL, which is more than 100x faster:

ONEDNN_VERBOSE=all ./tests/benchdnn/benchdnn --ip --mode=C --dir=FWD_I --bia-dt=bf16 --dt=bf16 mb1024ic1024oc1024

This PR + ARM-software/ComputeLibrary#1279 fix: pytorch/pytorch#180447
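For readers unfamiliar with the problem being benchmarked, here is a minimal reference sketch of a BF16 forward inner product with bias (dst = src x wei^T + bias, with weights laid out as OC x IC). The helper names, the round-to-nearest-even conversion, and the choice to accumulate in f32 are assumptions of this sketch; it is not the code path ACL or oneDNN uses.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <vector>

// Hypothetical helper: f32 -> bf16 bits, round-to-nearest-even.
// NaN handling is omitted for brevity.
inline std::uint16_t f32_to_bf16(float x) {
    std::uint32_t bits;
    std::memcpy(&bits, &x, sizeof(bits));
    const std::uint32_t rounding = 0x7FFFu + ((bits >> 16) & 1u);
    return static_cast<std::uint16_t>((bits + rounding) >> 16);
}

// Hypothetical helper: bf16 bits -> f32 (exact, bf16 is the top half of f32).
inline float bf16_to_f32(std::uint16_t h) {
    const std::uint32_t bits = static_cast<std::uint32_t>(h) << 16;
    float x;
    std::memcpy(&x, &bits, sizeof(x));
    return x;
}

// Reference inner product: dst[mb][oc] = bias[oc] + sum_ic src[mb][ic] * wei[oc][ic].
// Inputs and outputs are bf16; accumulation in f32 is an assumption of this sketch.
std::vector<std::uint16_t> ref_inner_product_bf16(
        const std::vector<std::uint16_t> &src, // MB x IC
        const std::vector<std::uint16_t> &wei, // OC x IC
        const std::vector<std::uint16_t> &bias, // OC
        std::size_t MB, std::size_t IC, std::size_t OC) {
    std::vector<std::uint16_t> dst(MB * OC);
    for (std::size_t mb = 0; mb < MB; ++mb) {
        for (std::size_t oc = 0; oc < OC; ++oc) {
            float acc = bf16_to_f32(bias[oc]);
            for (std::size_t ic = 0; ic < IC; ++ic)
                acc += bf16_to_f32(src[mb * IC + ic])
                        * bf16_to_f32(wei[oc * IC + ic]);
            dst[mb * OC + oc] = f32_to_bf16(acc);
        }
    }
    return dst;
}
```

The repro above corresponds to MB = IC = OC = 1024, which is about 2 * 1024^3 floating-point operations per call.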

Fixes # (github issue)

Checklist

General

Do all unit and benchdnn tests (make test and make test_benchdnn_*) pass locally for each commit?
Have you formatted the code using clang-format?

Performance improvements

Have you submitted performance data that demonstrates performance improvements?

New features

Have you published an RFC for the new feature?
Was the RFC approved?
Have you added relevant tests?

Bug fixes

Have you included information on how to reproduce the issue (either in a github issue or in this PR)?
Have you added relevant regression tests?

RFC PR

Does RFC document follow the template?
Have you added a link to the rendered document?

fadara01 · 2026-04-15T14:22:33Z

cc: @jondea @Sqvid

aditew01 · 2026-04-15T14:50:16Z

do you need to upgrade ACL version once this is merged? ARM-software/ComputeLibrary#1279

fadara01 · 2026-04-15T14:52:49Z

do you need to upgrade ACL version once this is merged? ARM-software/ComputeLibrary#1279

Nope, I think we can just merge this without updating ACL version - in this case oneDNN will just bail out a bit later when doing ACL validate.
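The fallback behaviour described here can be sketched as a first-match walk over a list of candidate implementations. This is a simplified model, not oneDNN's actual implementation list: `impl_candidate_t`, `pick_impl`, and the names "acl" and "ref" are hypothetical, and the boolean `acl_validates_bf16` stands in for whether the linked ACL build accepts BF16 in its validate step.

```cpp
#include <cassert>
#include <functional>
#include <string>
#include <vector>

// Hypothetical stand-in for one entry of an implementation list.
struct impl_candidate_t {
    std::string name;
    std::function<bool()> init; // true when the candidate accepts the problem
};

// Returns the name of the first candidate whose init() succeeds,
// or an empty string when every candidate bails out.
std::string pick_impl(const std::vector<impl_candidate_t> &candidates) {
    for (const auto &c : candidates)
        if (c.init()) return c.name;
    return "";
}

// Models an ACL-backed candidate: the oneDNN-side checks pass for BF16,
// but the final answer depends on the ACL validate step.
impl_candidate_t make_acl_candidate(bool acl_validates_bf16) {
    return {"acl", [acl_validates_bf16] {
                const bool onednn_checks_ok = true; // data types, attributes
                return onednn_checks_ok && acl_validates_bf16;
            }};
}

// Models a fallback candidate that accepts every problem.
impl_candidate_t make_reference_candidate() {
    return {"ref", [] { return true; }};
}
```

With an ACL build that rejects BF16, `pick_impl({make_acl_candidate(false), make_reference_candidate()})` yields the fallback, which matches the "bail out a bit later" behaviour: nothing breaks, the faster path is simply not taken.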

aditew01 · 2026-04-15T14:57:36Z

Nope, I think we can just merge this without updating ACL version - in this case oneDNN will just bail out a bit later when doing ACL validate.

Yeah, I was referring to this:

This benchdnn repro now goes to ACL which is > 100x faster

fadara01 · 2026-04-15T15:03:24Z

@aditew01 - I see your point, I updated the description.

fadara01 · 2026-04-16T08:04:55Z

I assume we have CI here to test oneDNN main vs ACL main? In which case let's wait for the ACL PR ARM-software/ComputeLibrary#1279 to hit main before we merge this.

aditew01 · 2026-04-16T08:06:02Z

            && attr()->has_default_values(
                    smask_t::post_ops | smask_t::fpmath_mode, f32);
+    const bool is_bf16_ok = expect_data_types(bf16, bf16, bf16, bf16, undef)
+            && attr()->has_default_values(smask_t::post_ops, bf16);


I'm not sure if this extra check is needed here / is covered in ACL.

Suggested change

-            && attr()->has_default_values(smask_t::post_ops, bf16);
+            && attr()->has_default_values(smask_t::post_ops, bf16)
+            && platform::has_data_type_support(data_type::bf16);

I'm not sure tbh, I think most of these checks are redundant anyways because the same checks will happen by calls to validate / has_opt_impl across ACL interfaces from CpuFullyConnected all the way to arm_gemm.

We have c6g CI here, we can test this in CI once the ACL PR is merged
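To make the two variants under discussion concrete, here is a small self-contained model of the BF16 dispatch predicate with and without the platform check. All types here (`dt_t`, `ip_problem_t`, `platform_caps_t`) are hypothetical stand-ins, the accumulator argument of `expect_data_types` is dropped, and `attr_is_default` collapses the `attr()->has_default_values(...)` call into a flag; the real checks live in oneDNN's primitive descriptor.

```cpp
#include <cassert>

// Hypothetical data type tags, loosely mirroring oneDNN's data_type values.
enum class dt_t { undef, f32, f16, bf16 };

// Hypothetical description of an inner-product problem.
struct ip_problem_t {
    dt_t src, wei, bia, dst;
    bool attr_is_default; // stands in for attr()->has_default_values(...)
};

// Hypothetical stand-in for a platform capability query.
struct platform_caps_t {
    bool has_bf16;
};

inline bool expect_data_types(
        const ip_problem_t &p, dt_t src, dt_t wei, dt_t bia, dt_t dst) {
    return p.src == src && p.wei == wei && p.bia == bia && p.dst == dst;
}

// Variant as in the diff: data types and attributes only.
inline bool is_bf16_ok(const ip_problem_t &p) {
    return expect_data_types(p, dt_t::bf16, dt_t::bf16, dt_t::bf16, dt_t::bf16)
            && p.attr_is_default;
}

// Variant from the suggested change: additionally require platform support.
inline bool is_bf16_ok_with_platform_check(
        const ip_problem_t &p, const platform_caps_t &caps) {
    return is_bf16_ok(p) && caps.has_bf16;
}
```

The two variants differ only on hardware without BF16 support. If ACL's own validate rejects such a configuration, as the reply suggests, both end in the same fallback and the extra check only makes the bail-out happen earlier.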

jondea · 2026-04-16T09:01:04Z

I assume we have CI here to test oneDNN main vs ACL main

I don't think we have this currently

fadara01 · 2026-04-16T09:24:01Z

I don't think we have this currently

okay, I'll run the nightly test suite locally with both the oneDNN and ACL PRs.

fadara01 · 2026-04-16T11:00:33Z

OK, I built:

ACL from main + ARM-software/ComputeLibrary#1279 (feat: Enable BF16 I/O for CpuFullyConnected in the experimental Operator API), and
oneDNN from main + this PR, against the above ACL build, with -DDNNL_TEST_SET=NIGHTLY

Then I ran the full nightly suite (246 tests) with ctest on Neoverse-V2 and everything passed:

100% tests passed, 0 tests failed out of 246

Sqvid · 2026-04-20T13:48:52Z

@fadara01 If you want this change in the oneDNN 3.12 release then please raise a backport PR.

fadara01 · 2026-04-21T06:39:28Z

@Sqvid - I raised this backport PR against 3.12: #5061

cpu: aarch64: enable BF16 inner-product

16f5d54

This PR + ARM-software/ComputeLibrary#1279 fix: pytorch/pytorch#180447

fadara01 requested a review from a team as a code owner April 15, 2026 14:21

github-actions Bot added the platform:cpu-aarch64 Codeowner: @oneapi-src/onednn-cpu-aarch64 label Apr 15, 2026

fadara01 changed the title ~~cpu: aarch64: enable BF16 inner-product~~ cpu: aarch64: enable ACL's inner-product for BF16 Apr 15, 2026

fadara01 mentioned this pull request Apr 15, 2026

Poor BF16 torch.compile + Freezing perf on AArch64 CPUs pytorch/pytorch#180447

Open

aditew01 reviewed Apr 16, 2026

aditew01 approved these changes Apr 16, 2026

jondea approved these changes Apr 16, 2026

jondea merged commit 8318721 into uxlfoundation:main Apr 20, 2026
25 checks passed

fadara01 mentioned this pull request Apr 21, 2026

backport: cpu: aarch64: enable BF16 inner-product #5061

Merged

10 tasks
