Skip to content

MFDNN-14690: Replace XE3P_35_10/11/UNKNOWN Core enum values with Xe3p#4981

Open
dyoussif wants to merge 1 commit intomainfrom
dyoussif/hw_rebase
Open

MFDNN-14690: Replace XE3P_35_10/11/UNKNOWN Core enum values with Xe3p#4981
dyoussif wants to merge 1 commit intomainfrom
dyoussif/hw_rebase

Conversation

@dyoussif
Copy link
Copy Markdown
Contributor

@dyoussif dyoussif commented Apr 8, 2026

Collapse ngen::Core::XE3P_35_10, XE3P_35_11, and XE3P_UNKNOWN into a single ngen::Core::Xe3p value. Use ngen::ProductFamily to distinguish hardware-specific features where needed (512 GRFs, F4 DPAS support, SLM capacity).

addresses MFDNN-14960

@dyoussif dyoussif requested a review from a team as a code owner April 8, 2026 23:32
@github-actions github-actions bot added platform:gpu-intel Codeowner: @oneapi-src/onednn-gpu-intel third_party labels Apr 8, 2026
@@ -1300,7 +1293,7 @@ class GRF : public Register
static constexpr int maxRegs() { return 512; }
static constexpr int maxRegs(HW hw) {
return (hw < HW::XeHP) ? 128
: (hw == HW::XE3P_35_11) ? 512
: (hw >= HW::Xe3p) ? 512
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Potential issue here.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Per my understanding of the spec, all xe3p variants should support 512 grf.

static int max_slm_size(gpu_arch_t gpu_arch, gpu_product_t product);
static int max_slm_size_per_tg(gpu_arch_t gpu_arch, gpu_product_t product);
static int max_slm_size_per_tg(gpu_arch_t gpu_arch, int tg_size,
bool large_grf_mode, gpu_product_t product);
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We should be able to remove the gpu_arch as inputs, as gpu_product() alone contains all the necessary information.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done.

Comment thread src/gpu/intel/gemm/jit/generator/pieces/hw_utils.hpp Outdated
Comment thread src/gpu/intel/jit/utils/type_bridge.hpp Outdated
Comment thread src/gpu/intel/jit/utils/type_bridge.hpp Outdated
Comment thread src/gpu/intel/gemm/jit/include/gemmstone/problem.hpp Outdated
@dyoussif dyoussif force-pushed the dyoussif/hw_rebase branch from da01903 to c2f22a0 Compare April 13, 2026 20:25
Comment thread src/gpu/intel/conv/jit/config.cpp Outdated
Comment thread src/gpu/intel/conv/jit/plan.cpp Outdated
Comment thread src/gpu/intel/gemm/jit/generator/pieces/hw_utils.hpp Outdated
Comment thread src/gpu/intel/gemm/jit/include/gemmstone/generator.hpp Outdated
Comment thread src/gpu/intel/gemm/jit/include/gemmstone/problem.hpp Outdated
Comment on lines 288 to 289
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please change these to NVLP and CRI as well.

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

These are the values we were instructed to use for unembargo.

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Confirmed with @karturov, we can name these however we like. We should use NVLP and CRI to match upstream naming and other names in the enum.

static int max_slm_size_per_tg(gpu_arch_t gpu_arch);
static int max_slm_size_per_tg(
gpu_arch_t gpu_arch, int tg_size, bool large_grf_mode = false);
static int max_slm_size(gpu_product_t product);
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Since this is only called using a product coming from a device_info object, can we convert this to a non-static function with no arguments?

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

max_slm_size is used in a static way here, doesn't look like product is coming from device_info:

size_t max_slm_size = compute::device_info_t::max_slm_size_per_tg(

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That's max_slm_size_per_tg(), not max_slm_size(). I'd be ok with a shortcut device info accessor in addition to the static version if you want to keep the static one around.

Comment thread src/gpu/intel/compute/device_info.cpp Outdated
Comment thread src/gpu/intel/compute/device_info.cpp Outdated
Comment thread src/gpu/intel/conv/jit/plan.cpp Outdated
Comment thread src/gpu/intel/gemm/jit/selector/kernel_selector.cpp Outdated
Comment thread src/gpu/intel/jit/utils/type_bridge.hpp Outdated
Comment thread src/gpu/intel/gemm/jit/generator/strategy.cpp Outdated
Comment thread src/gpu/intel/gemm/jit/generator/pieces/copy_plan.cpp Outdated
Comment thread src/gpu/intel/gemm/jit/generator/pieces/hw_utils.hpp Outdated
@dyoussif
Copy link
Copy Markdown
Contributor Author

make test
set test_scope=NIGHTLY
disable test_device_cpu
disable benchdnn_all
enable benchdnn_conv
enable benchdnn_deconv
enable arch_gpu_xe-hpc
enable arch_gpu_xe-hpg-atsm
enable arch_gpu_xe-hpg-dg2
enable arch_gpu_xe-lp
enable arch_gpu_xe-lpg
enable arch_gpu_xe-lpg+
enable arch_gpu_xe2-hpg-bmg
enable arch_gpu_xe2-lpg
enable arch_gpu_xe3-lpg
enable arch_gpu_xe3p-lpg

@dyoussif dyoussif force-pushed the dyoussif/hw_rebase branch from c2f22a0 to f62297a Compare April 14, 2026 23:41
@dyoussif
Copy link
Copy Markdown
Contributor Author

make test
set test_scope=NIGHTLY
disable test_device_cpu
disable benchdnn_all
enable benchdnn_conv
enable benchdnn_deconv
enable arch_gpu_xe-hpc
enable arch_gpu_xe-hpg-atsm
enable arch_gpu_xe-hpg-dg2
enable arch_gpu_xe-lp
enable arch_gpu_xe-lpg
enable arch_gpu_xe-lpg+
enable arch_gpu_xe2-hpg-bmg
enable arch_gpu_xe2-lpg
enable arch_gpu_xe3-lpg
enable arch_gpu_xe3p-lpg

Collapse ngen::Core::XE3P_35_10, XE3P_35_11, and XE3P_UNKNOWN into a
single ngen::Core::Xe3p value. Use ngen::ProductFamily to distinguish
hardware-specific features where needed (F4 DPAS support,
SLM capacity).

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
@dyoussif dyoussif force-pushed the dyoussif/hw_rebase branch from f62297a to 60fc6b4 Compare April 15, 2026 15:37
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

platform:gpu-intel Codeowner: @oneapi-src/onednn-gpu-intel third_party

Projects

None yet

Development

Successfully merging this pull request may close these issues.

5 participants