cpu: aarch64: run ACL post-op eltwise on current buffer#5101

Open
leoken01 wants to merge 3 commits into uxlfoundation:main from leoken01:fix/art-eltwise-po

Conversation

@leoken01 leoken01 commented Apr 30, 2026

Description

This PR fixes ACL post-op eltwise execution for AArch64 primitives so nested
eltwise post-ops run on the current intermediate buffer instead of always using
the original destination argument.

The issue affects ACL post-op chains where an eltwise post-op is executed after
another post-op and needs to consume the current post-op buffer. The fix executes
nested eltwise post-ops through the primitive interface with runtime memory bound
to the active source buffer.
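
The difference between reading the original destination and reading the current buffer can be illustrated with a toy post-op chain. This is a minimal sketch only: `buffer_t`, `post_op_fn`, `scale_by`, `relu`, `run_chain_from_original` and `run_chain_on_current` are hypothetical names invented for illustration and are not oneDNN or ACL APIs.

```cpp
#include <algorithm>
#include <cstddef>
#include <functional>
#include <utility>
#include <vector>

// Hypothetical stand-ins, not oneDNN types.
using buffer_t = std::vector<float>;
// A post-op reads `src` and writes the same number of elements to `dst`.
using post_op_fn = std::function<void(const buffer_t &src, buffer_t &dst)>;

post_op_fn scale_by(float factor) {
    return [factor](const buffer_t &src, buffer_t &dst) {
        for (std::size_t i = 0; i < src.size(); ++i)
            dst[i] = src[i] * factor;
    };
}

post_op_fn relu() {
    return [](const buffer_t &src, buffer_t &dst) {
        for (std::size_t i = 0; i < src.size(); ++i)
            dst[i] = std::max(src[i], 0.0f);
    };
}

// Problem pattern: every post-op reads the original destination,
// so the result of any earlier post-op is overwritten and lost.
buffer_t run_chain_from_original(
        const buffer_t &original_dst, const std::vector<post_op_fn> &chain) {
    buffer_t out = original_dst;
    for (const auto &op : chain)
        op(original_dst, out);
    return out;
}

// Intended pattern: each post-op consumes the current intermediate buffer.
buffer_t run_chain_on_current(
        const buffer_t &original_dst, const std::vector<post_op_fn> &chain) {
    buffer_t current = original_dst;
    for (const auto &op : chain) {
        buffer_t next(current.size());
        op(current, next);
        current = std::move(next);
    }
    return current;
}
```

With the input `{-1, 3}` and the chain `scale_by(2)` followed by `relu()`, the second function applies ReLU to the scaled values, while the first applies it to the unscaled input and so drops the scaling.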

This PR also registers nested scratchpads for ACL post-op primitives in the
affected AArch64 primitives.

Fixes # N/A

Validation

ONEDNN_VERBOSE=profile_exec ./build/tests/benchdnn/benchdnn --graph --mode=R \
    --in-shapes=0:16x256+1:256x1+2:1x1 \
    --case=pattern/int8/int8_matmul_logistic_fusion.json

Result:
tests:1 passed:1 skipped:0 mistrusted:0 unimplemented:0 invalid_arguments:0 failed:0 listed:0

Checklist

General

  • Do all unit and benchdnn tests (make test and make test_benchdnn_*) pass locally for each commit?

Bug fixes

@leoken01 leoken01 requested a review from a team as a code owner April 30, 2026 12:28
@github-actions github-actions Bot added the platform:cpu-aarch64 Codeowner: @oneapi-src/onednn-cpu-aarch64 label Apr 30, 2026
@leoken01 leoken01 force-pushed the fix/art-eltwise-po branch from 29873cd to 2e5edc0 Compare April 30, 2026 13:32
= dynamic_cast<acl_eltwise_fwd_t *>(post_op.get());
if (eltwise_post_op == nullptr) return status::runtime_error;

if (dst_data_type == data_type::f16) {
Contributor

Looks like we dropped our special handling for f16 eltwise in this patch?

Author

Do f16 post-ops need to be handled specifically? This was changed to allow the oneDNN-selected primitives to define the data types. @puneetmatharu may know more about this.

Contributor

This was previously numerically incorrect. The way to do f16 post-ops is to do an f16*f16->f32 matmul, apply the post-ops in f32, and then cast down to f16 afterwards. It is worth checking that we haven't dropped support for anything though, @leoken01. I would build up a benchdnn input set that previously ran with ACL and check that every case still goes to ACL afterwards.
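
The ordering described above (accumulate in f32, apply the post-op in f32, cast down once) can be sketched for a single output element. This is a rough illustration under stated assumptions, not the ACL or oneDNN implementation: `round_to_f16_precision` is a hypothetical helper that only emulates the 11-bit significand of f16 and ignores f16 overflow and subnormals, and `logistic` and `matmul_elem_f16_with_post_op` are likewise invented names.

```cpp
#include <cmath>

// Hypothetical helper: rounds a float to an 11-bit significand, which is the
// precision of f16. Assumption: overflow and subnormal handling are ignored.
float round_to_f16_precision(float x) {
    if (x == 0.0f || !std::isfinite(x)) return x;
    int exp = 0;
    // x == mant * 2^exp with |mant| in [0.5, 1)
    float mant = std::frexp(x, &exp);
    // Keep 11 significant bits, rounding to nearest in the default mode.
    float kept = std::nearbyint(std::ldexp(mant, 11));
    return std::ldexp(kept, exp - 11);
}

// Example eltwise post-op, evaluated in f32.
float logistic(float x) {
    return 1.0f / (1.0f + std::exp(-x));
}

// One output element: inputs are assumed to already hold f16-representable
// values; the products are accumulated in f32, the post-op runs in f32, and
// the result is rounded to f16 precision exactly once at the end.
float matmul_elem_f16_with_post_op(const float *a, const float *b, int k) {
    float acc = 0.0f;
    for (int i = 0; i < k; ++i)
        acc += a[i] * b[i];
    return round_to_f16_precision(logistic(acc));
}
```

Rounding the accumulator to f16 before applying the post-op would introduce a second rounding step; the sketch avoids it by keeping everything in f32 until the final cast.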

Author

ACK, checking and will report back

Author

What criteria or checks should I apply to the benchdnn input set when verifying that the cases still go to ACL? I'm not sure what to base the input set on to make sure it's correct and we're not missing anything.

Contributor

I would go with tests/benchdnn/inputs/matmul/test_matmul_ci as a good starting point, and then maybe stress test different eltwise post-op combinations from there. Compare the number of problems dispatched to ACL matmul before and after the patch, and look for any correctness issues that show up.

Comment thread src/cpu/aarch64/matmul/acl_lowp_matmul.cpp Outdated
= dynamic_cast<acl_eltwise_fwd_t *>(post_op.get());
if (eltwise_post_op == nullptr) return status::runtime_error;

if (dst_data_type == data_type::f16) {
Contributor

This was previously numerically incorrect. The way to do f16 post-ops is to do an f16*f16->f32 matmul, apply the post-ops in f32, and then cast down to f16 afterwards. It is worth checking that we haven't dropped support for anything though, @leoken01. I would build up a benchdnn input set that previously ran with ACL and check that every case still goes to ACL afterwards.

Comment thread src/cpu/aarch64/acl_post_ops.hpp Outdated
Contributor

Now that you have generalised these post-ops to use more than just ACL, I think it would make sense to rename this to something more descriptive, e.g.

post_ops_fallback_t

Author

Do we want this just for acl_post_ops.hpp, or (I'm assuming) for all files within this workspace (acl_lowp_matmul.cpp etc.)?

Contributor

Just this post ops class. The class is no longer specific to ACL, but all the other classes still are

@leoken01 leoken01 force-pushed the fix/art-eltwise-po branch from 5173c56 to 85a23ad Compare May 7, 2026 14:41
@Sqvid Sqvid requested a review from jondea May 8, 2026 08:30
Run nested eltwise post-op primitives on the current post-op buffer by passing runtime memory objects through the execution context, and register nested scratchpads for ACL post-op users.

Co-authored-by: Puneet Matharu <[email protected]>
@leoken01 leoken01 force-pushed the fix/art-eltwise-po branch from 85a23ad to d01ba7c Compare May 8, 2026 09:04