cpu: aarch64: enable ACL's inner-product for BF16 #5024
jondea merged 1 commit into uxlfoundation:main
Conversation
Do you need to upgrade the ACL version once this is merged? ARM-software/ComputeLibrary#1279
Nope, I think we can just merge this without updating the ACL version. In that case oneDNN will just bail out a bit later, when it calls ACL's validate.
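The "bail out a bit later" behaviour can be pictured as a candidate list in which the first implementation whose init succeeds wins. The following is only a minimal sketch under stated assumptions: `status`, `candidate`, `pick_implementation`, `dispatch_bf16_inner_product` and the `"fallback"` entry are hypothetical names for illustration, not oneDNN or ACL APIs, and the ACL validate result is modelled as a plain boolean.

```cpp
#include <functional>
#include <string>
#include <vector>

// Hypothetical stand-in for a library status code.
enum class status { success, unimplemented };

// Hypothetical candidate implementation: a name plus an init check.
struct candidate {
    std::string name;
    std::function<status()> init;
};

// Returns the name of the first candidate whose init succeeds,
// or an empty string if none does.
std::string pick_implementation(const std::vector<candidate> &list) {
    for (const auto &c : list) {
        if (c.init() == status::success) return c.name;
    }
    return "";
}

// Models the situation discussed above: the ACL-backed candidate passes its
// own front-end checks, but its init still fails if the backend's validate
// step (assumed here to be a boolean) rejects the BF16 configuration.
std::string dispatch_bf16_inner_product(bool acl_validate_ok) {
    const std::vector<candidate> list = {
            {"acl",
                    [acl_validate_ok] {
                        return acl_validate_ok ? status::success
                                               : status::unimplemented;
                    }},
            {"fallback", [] { return status::success; }},
    };
    return pick_implementation(list);
}
```

Under this model, an ACL build without BF16 support costs only a failed validate call before dispatch moves on to the next entry in the list.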
Yeah, I was referring to this:
@aditew01 - I see your point, I updated the description.
I assume we have CI here to test oneDNN main vs ACL main? In which case let's wait for the ACL PR ARM-software/ComputeLibrary#1279 to hit main before we merge this.
            && attr()->has_default_values(
                    smask_t::post_ops | smask_t::fpmath_mode, f32);
    const bool is_bf16_ok = expect_data_types(bf16, bf16, bf16, bf16, undef)
            && attr()->has_default_values(smask_t::post_ops, bf16);
I'm not sure whether this extra check is needed here or is already covered in ACL.
-            && attr()->has_default_values(smask_t::post_ops, bf16);
+            && attr()->has_default_values(smask_t::post_ops, bf16) && platform::has_data_type_support(data_type::bf16);
I'm not sure tbh, I think most of these checks are redundant anyways because the same checks will happen by calls to validate / has_opt_impl across ACL interfaces from CpuFullyConnected all the way to arm_gemm.
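To make the shape of the checks in this thread concrete, here is a simplified, self-contained model of how the two branches combine. Everything in it is an assumption for illustration: `dt`, `attr_mask`, `ip_desc`, `frontend_ok` and the two helpers only mimic the quoted oneDNN calls with reduced signatures, and the suggested platform check is represented by a plain boolean rather than `platform::has_data_type_support`.

```cpp
#include <cstdint>

// Hypothetical, simplified stand-ins; these are not oneDNN's real types.
enum class dt { undef, f32, bf16 };

enum attr_mask : std::uint32_t {
    none = 0,
    post_ops = 1u << 0,
    fpmath_mode = 1u << 1,
    scales = 1u << 2,
};

struct ip_desc {
    dt src, wei, bia, dst;
    std::uint32_t non_default_attrs; // a set bit means "attribute was customised"
};

// True if every customised attribute is covered by the allowed mask.
bool has_default_values(const ip_desc &d, std::uint32_t allowed) {
    return (d.non_default_attrs & ~allowed) == 0;
}

// Assumption of this sketch: a missing bias (undef) is accepted.
bool expect_data_types(const ip_desc &d, dt src, dt wei, dt bia, dt dst) {
    return d.src == src && d.wei == wei
            && (d.bia == bia || d.bia == dt::undef) && d.dst == dst;
}

// Combines the f32 and bf16 branches; the bf16 branch additionally requires
// the (hypothetical) platform flag, mirroring the suggestion in the review.
bool frontend_ok(const ip_desc &d, bool platform_has_bf16) {
    const bool is_f32_ok
            = expect_data_types(d, dt::f32, dt::f32, dt::f32, dt::f32)
            && has_default_values(d, post_ops | fpmath_mode);
    const bool is_bf16_ok
            = expect_data_types(d, dt::bf16, dt::bf16, dt::bf16, dt::bf16)
            && has_default_values(d, post_ops) && platform_has_bf16;
    return is_f32_ok || is_bf16_ok;
}
```

If the backend repeats the same checks in its own validate path, as the comment above suspects, the front-end checks only let unsupported cases fail earlier; they would not change which configurations are ultimately accepted.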
We have c6g CI here, we can test this in CI once the ACL PR is merged
I don't think we have this currently.
Okay, I'll run the nightly test suite locally with both the oneDNN and ACL PRs.
OK, I built:
then I ran the full nightly suite (246 tests) with
@fadara01 If you want this change in the oneDNN 3.12 release then please raise a backport PR.
Description
cpu: aarch64: enable ACL's inner-product for BF16
This benchdnn repro, with ACL built with ARM-software/ComputeLibrary#1279, now dispatches to ACL, which is > 100x faster:
This PR + ARM-software/ComputeLibrary#1279 fix: pytorch/pytorch#180447
Checklist
General
(make test and make test_benchdnn_*) pass locally for each commit?
Performance improvements
New features
Bug fixes
RFC PR