cpu: aarch64: run ACL post-op eltwise on current buffer#5101
cpu: aarch64: run ACL post-op eltwise on current buffer#5101leoken01 wants to merge 3 commits intouxlfoundation:mainfrom
Conversation
(cherry picked from commit 0abc32b)
29873cd to
2e5edc0
Compare
| = dynamic_cast<acl_eltwise_fwd_t *>(post_op.get()); | ||
| if (eltwise_post_op == nullptr) return status::runtime_error; | ||
|
|
||
| if (dst_data_type == data_type::f16) { |
There was a problem hiding this comment.
Looks like we dropped our special handling for f16 eltwise in this patch?
There was a problem hiding this comment.
Do f16 post ops need to be included specifically? This was changed to allow oneDNN selected primitives to define the data types, @puneetmatharu may know more on this?
There was a problem hiding this comment.
This was previouslty numerically incorrect, the way to do f16 post ops is to do a f16*f16->f32 matmul, do post ops in f32 and then cast down to f16 after. It is worth checking that we haven't dropped support for anything though @leoken01. I would build up a benchdnn input set which previously ran with ACL, and just check they all still go to ACL after
There was a problem hiding this comment.
ACK, checking and will report back
There was a problem hiding this comment.
What kind of criteria or checks should the benchdnn input set look for when checking if they still go after acl? Not sure what to 'base' the input set on, to make sure it's correct and we're not missing anything.
There was a problem hiding this comment.
I would go with tests/benchdnn/inputs/matmul/test_matmul_ci as a good starting point, and then maybe try to stress test different eltwise post-ops combinations from there. Look for the number of problems dispatched to ACL matmul before and after the patch and any correctness issues that might show.
| = dynamic_cast<acl_eltwise_fwd_t *>(post_op.get()); | ||
| if (eltwise_post_op == nullptr) return status::runtime_error; | ||
|
|
||
| if (dst_data_type == data_type::f16) { |
There was a problem hiding this comment.
This was previouslty numerically incorrect, the way to do f16 post ops is to do a f16*f16->f32 matmul, do post ops in f32 and then cast down to f16 after. It is worth checking that we haven't dropped support for anything though @leoken01. I would build up a benchdnn input set which previously ran with ACL, and just check they all still go to ACL after
There was a problem hiding this comment.
Now you have generalised these post ops to use not just acl, I think it would make sense to rename this as something more descriptive. E.g.
post_ops_fallback_t
There was a problem hiding this comment.
do we want this just for acl_post_ops.hpp, or (I;m assuming) all files within this workspace? (acl_lowp_matmul.cpp etc...)
There was a problem hiding this comment.
Just this post ops class. The class is no longer specific to ACL, but all the other classes still are
5173c56 to
85a23ad
Compare
Run nested eltwise post-op primitives on the current post-op buffer by passing runtime memory objects through the execution context, and register nested scratchpads for ACL post-op users. Co-authored-by: Puneet Matharu <[email protected]>
85a23ad to
d01ba7c
Compare
Description
This PR fixes ACL post-op eltwise execution for AArch64 primitives so nested
eltwise post-ops run on the current intermediate buffer instead of always using
the original destination argument.
The issue affects ACL post-op chains where an eltwise post-op is executed after
another post-op and needs to consume the current post-op buffer. The fix executes
nested eltwise post-ops through the primitive interface with runtime memory bound
to the active source buffer.
This PR also registers nested scratchpads for ACL post-op primitives in the
affected AArch64 primitives.
Fixes # N/A
Validation
Checklist
General
Bug fixes
Fixes segfault here: revert: cpu: aarch64: "support arbitrary eltwise post-ops..." #3950
Enables feature here: cpu: aarch64: support arbitrary eltwise post-ops in acl_post_ops_t #3767