MFDNN-14690: Replace XE3P_35_10/11/UNKNOWN Core enum values with Xe3p by dyoussif · Pull Request #4981 · uxlfoundation/oneDNN

dyoussif · 2026-04-08T23:32:25Z

Collapse ngen::Core::XE3P_35_10, XE3P_35_11, and XE3P_UNKNOWN into a single ngen::Core::Xe3p value. Use ngen::ProductFamily to distinguish hardware-specific features where needed (512 GRFs, F4 DPAS support, SLM capacity).

addresses MFDNN-14960
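
To illustrate the intent of this change, here is a minimal, self-contained sketch. The enums are simplified stand-ins rather than the real nGEN definitions. The product names `ProductA`/`ProductB` and the helper `supports_f4_dpas` are hypothetical, and which product has the feature is an assumption made only for the example.

```cpp
// Simplified stand-ins for ngen::Core and ngen::ProductFamily (not the real
// definitions). A single core value now covers every Xe3p variant.
enum class Core { Xe3, Xe3p };

// Hypothetical product families that share Core::Xe3p.
enum class ProductFamily { Unknown, ProductA, ProductB };

// Hypothetical feature query: the core generation is checked first, then the
// product family separates the variants. Which product supports the feature
// is an assumption for illustration only.
constexpr bool supports_f4_dpas(Core core, ProductFamily family) {
    return core == Core::Xe3p && family == ProductFamily::ProductB;
}

// Two products with the same core can still report different capabilities.
static_assert(supports_f4_dpas(Core::Xe3p, ProductFamily::ProductB),
        "product-specific feature is enabled for ProductB");
static_assert(!supports_f4_dpas(Core::Xe3p, ProductFamily::ProductA),
        "same core, different product, feature is off");
```

The point of the split is that code asking "which generation is this?" compares against one core value, while only the few feature checks that really differ per product need the product family.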

umar456 · 2026-04-08T23:44:21Z

@@ -1300,7 +1293,7 @@ class GRF : public Register
    static constexpr int maxRegs()                         { return 512; }
    static constexpr int maxRegs(HW hw) {
        return (hw < HW::XeHP) ? 128
-            : (hw == HW::XE3P_35_11) ? 512
+            : (hw >= HW::Xe3p) ? 512


Potential issue here.

Per my understanding of the spec, all Xe3p variants should support 512 GRFs.
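
The behaviour after the change in the diff above could be sketched like this, assuming the `HW` enumerators are ordered by generation so that `<` and `>=` comparisons are meaningful. The enum is a reduced stand-in, `max_regs` is a hypothetical free-function version of `GRF::maxRegs(HW)`, and the 256-register fallback is an assumption because the diff is cut off before that branch.

```cpp
// Reduced stand-in for the HW enum, listed oldest to newest so that
// relational comparisons follow generation order.
enum class HW { XeLP, XeHP, Xe2, Xe3, Xe3p };

// Hypothetical free-function version of GRF::maxRegs(HW).
constexpr int max_regs(HW hw) {
    return (hw < HW::XeHP) ? 128
         : (hw >= HW::Xe3p) ? 512   // every Xe3p variant, per the comment above
         : 256;                     // assumed fallback, not shown in the diff
}

static_assert(max_regs(HW::XeLP) == 128, "pre-XeHP hardware");
static_assert(max_regs(HW::Xe3p) == 512, "all Xe3p variants");
```

With a single `Xe3p` core value, the 512-register case no longer depends on which Xe3p variant is detected.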

rjoursler · 2026-04-09T14:34:02Z

+    static int max_slm_size(gpu_arch_t gpu_arch, gpu_product_t product);
+    static int max_slm_size_per_tg(gpu_arch_t gpu_arch, gpu_product_t product);
+    static int max_slm_size_per_tg(gpu_arch_t gpu_arch, int tg_size,
+            bool large_grf_mode, gpu_product_t product);


We should be able to remove the gpu_arch inputs, since gpu_product() alone contains all the necessary information.
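
A rough sketch of that idea, under the assumption that the architecture can always be derived from the product. The enums, the `arch_of` helper, and the product names are hypothetical, and the sizes are placeholders rather than hardware data.

```cpp
// Stand-in enums; the product names are hypothetical.
enum class gpu_arch_t { xe3, xe3p };
enum class gpu_product_t { product_a, product_b, product_c };

// Hypothetical mapping: each product belongs to exactly one architecture,
// so callers never need to pass the architecture separately.
constexpr gpu_arch_t arch_of(gpu_product_t product) {
    return product == gpu_product_t::product_a ? gpu_arch_t::xe3
                                               : gpu_arch_t::xe3p;
}

// Product-only signature. The returned sizes are placeholders chosen for the
// example, not real SLM capacities.
constexpr int max_slm_size(gpu_product_t product) {
    return arch_of(product) == gpu_arch_t::xe3 ? 64 * 1024
         : product == gpu_product_t::product_b ? 128 * 1024
                                               : 192 * 1024;
}

static_assert(arch_of(gpu_product_t::product_b) == gpu_arch_t::xe3p,
        "architecture is recoverable from the product");
```

Dropping the redundant parameter also removes the possibility of a caller passing an architecture that does not match the product.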

Simonsays095 · 2026-04-14T18:54:39Z

Please change these to NVLP and CRI as well.

These are the values we were instructed to use for unembargo.

Confirmed with @karturov, we can name these however we like. We should use NVLP and CRI to match upstream naming and other names in the enum.

Simonsays095 · 2026-04-14T19:00:39Z

-    static int max_slm_size_per_tg(gpu_arch_t gpu_arch);
-    static int max_slm_size_per_tg(
-            gpu_arch_t gpu_arch, int tg_size, bool large_grf_mode = false);
+    static int max_slm_size(gpu_product_t product);


Since this is only called using a product coming from a device_info object, can we convert this to a non-static function with no arguments?

max_slm_size is used in a static way here, doesn't look like product is coming from device_info:

oneDNN/src/gpu/intel/jit/codegen/kernel_ext.cpp

Line 108 in f62297a

size_t max_slm_size = compute::device_info_t::max_slm_size_per_tg(

That's max_slm_size_per_tg(), not max_slm_size(). I'd be ok with a shortcut device info accessor in addition to the static version if you want to keep the static one around.
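
The compromise described in the last reply (keep the static function, add a no-argument accessor) might look roughly like the sketch below. This is not the actual `device_info_t`: the class is reduced to one field, the product names are hypothetical, and the sizes are placeholders.

```cpp
// Hypothetical product list for the example.
enum class gpu_product_t { product_a, product_b };

// Reduced stand-in for compute::device_info_t.
class device_info_t {
public:
    constexpr explicit device_info_t(gpu_product_t product)
        : product_(product) {}

    constexpr gpu_product_t gpu_product() const { return product_; }

    // Static version: usable where no device_info_t object is at hand.
    // The sizes are placeholders, not hardware data.
    static constexpr int max_slm_size(gpu_product_t product) {
        return product == gpu_product_t::product_a ? 64 * 1024 : 128 * 1024;
    }

    // Shortcut accessor: forwards the stored product to the static version.
    constexpr int max_slm_size() const { return max_slm_size(product_); }

private:
    gpu_product_t product_;
};

static_assert(device_info_t(gpu_product_t::product_b).max_slm_size()
                == device_info_t::max_slm_size(gpu_product_t::product_b),
        "accessor and static version agree");
```

Call sites that already hold a device info object can use the shorter form, while purely static call sites keep working unchanged.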

dyoussif · 2026-04-14T23:39:23Z

make test
set test_scope=NIGHTLY
disable test_device_cpu
disable benchdnn_all
enable benchdnn_conv
enable benchdnn_deconv
enable arch_gpu_xe-hpc
enable arch_gpu_xe-hpg-atsm
enable arch_gpu_xe-hpg-dg2
enable arch_gpu_xe-lp
enable arch_gpu_xe-lpg
enable arch_gpu_xe-lpg+
enable arch_gpu_xe2-hpg-bmg
enable arch_gpu_xe2-lpg
enable arch_gpu_xe3-lpg
enable arch_gpu_xe3p-lpg

dyoussif · 2026-04-14T23:41:57Z

make test
set test_scope=NIGHTLY
disable test_device_cpu
disable benchdnn_all
enable benchdnn_conv
enable benchdnn_deconv
enable arch_gpu_xe-hpc
enable arch_gpu_xe-hpg-atsm
enable arch_gpu_xe-hpg-dg2
enable arch_gpu_xe-lp
enable arch_gpu_xe-lpg
enable arch_gpu_xe-lpg+
enable arch_gpu_xe2-hpg-bmg
enable arch_gpu_xe2-lpg
enable arch_gpu_xe3-lpg
enable arch_gpu_xe3p-lpg

Collapse ngen::Core::XE3P_35_10, XE3P_35_11, and XE3P_UNKNOWN into a single ngen::Core::Xe3p value. Use ngen::ProductFamily to distinguish hardware-specific features where needed (F4 DPAS support, SLM capacity). Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

dyoussif requested a review from a team as a code owner April 8, 2026 23:32

github-actions bot added the platform:gpu-intel (Codeowner: @oneapi-src/onednn-gpu-intel) and third_party labels Apr 8, 2026

umar456 reviewed Apr 8, 2026

rjoursler reviewed Apr 9, 2026

dyoussif force-pushed the dyoussif/hw_rebase branch from da01903 to c2f22a0 April 13, 2026 20:25