Feature/lapack peel off #237
Closed
realbabilu wants to merge 2 commits into
Patch package captured from: `E:\mystran4\MYSTRANSolver-18.0.0.enhanced`

Target paths when applying manually:

- `Source\Modules\LAPACK\*`
- `BLAS\XERBLA.f`

This package represents the final `lapack_surgery` + `lapack_peeloff` state used to build:

- `mystran_lapack_surgery.exe`
- `mystran_lapack_peel_off.exe`

Purpose

The internal MYSTRAN LAPACK sources were reorganized so MYSTRAN can be built in an external optimized BLAS/LAPACK configuration, especially the OpenBLAS hijack build, without carrying a full internal BLAS implementation.

The intended shape is:

- External OpenBLAS supplies BLAS symbols such as `dgemm_`, `dtrsm_`, etc.
- Regular single-thread SuperLU is linked against the same OpenBLAS BLAS library.
- MYSTRAN keeps only the local `XERBLA.f` error handler from internal BLAS.
- No internal CBLAS or f2c BLAS layer is required for the OpenBLAS configuration.
- Internal MYSTRAN LAPACK entry points that are still needed are retained, but many routines are peeled into helper files so the build can coexist cleanly with optimized external libraries.
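The ownership split in the list above (BLAS from OpenBLAS, `xerbla_` kept local) can be checked mechanically from a symbol listing. The following is a minimal Python sketch, not the check used for this PR. It assumes an `nm`-style listing in which imported symbols either carry the `__imp_` prefix or are marked `U`, and locally defined code symbols are marked `T`. The sample listing and the function names are hypothetical.

```python
# Hypothetical nm-style listing; real output from the built executables will differ.
SAMPLE_NM_OUTPUT = """\
0000000000401500 T xerbla_
00000000004a91f8 I __imp_dgemm_
00000000004a9200 I __imp_dtrsm_
"""


def classify_symbols(nm_output):
    """Split an nm-style listing into locally defined and imported symbol names."""
    local, imported = set(), set()
    for line in nm_output.splitlines():
        parts = line.split()
        if len(parts) == 3:
            _address, sym_type, name = parts
        elif len(parts) == 2:
            # Undefined symbols are listed without an address.
            sym_type, name = parts
        else:
            continue
        if name.startswith("__imp_"):
            # Assumption: import-table entries carry the __imp_ prefix.
            imported.add(name[len("__imp_"):])
        elif sym_type == "U":
            imported.add(name)
        elif sym_type in ("T", "t"):
            local.add(name)
    return local, imported


def ownership_problems(nm_output, expect_imported, expect_local):
    """Return human-readable mismatches between expected and observed ownership."""
    local, imported = classify_symbols(nm_output)
    problems = []
    for name in sorted(expect_imported):
        if name not in imported:
            problems.append(f"{name}: expected to be imported")
        if name in local:
            problems.append(f"{name}: also defined locally")
    for name in sorted(expect_local):
        if name not in local:
            problems.append(f"{name}: expected a local definition")
    return problems


if __name__ == "__main__":
    print(ownership_problems(SAMPLE_NM_OUTPUT, {"dgemm_", "dtrsm_"}, {"xerbla_"}))
```

An empty result means every expected import is resolved externally and every expected local symbol is defined in the executable. A BLAS name that shows up as both imported and local would indicate that a bundled routine is still being linked.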
What Changed

Existing internal LAPACK files modified:

- `LAPACK_BLAS_AUX.f`
- `LAPACK_GIV_MGIV_EIG.f`
- `LAPACK_LANCZOS_EIG.f`
- `LAPACK_LIN_EQN_DGB.f`
- `LAPACK_LIN_EQN_DGE.f`
- `LAPACK_LIN_EQN_DPB.f`
- `LAPACK_MISCEL.f`
- `LAPACK_STD_EIG_1.f`
- `LAPACK_SYM_MAT_INV.f`

Additional helper/ext/kernel files added under `Source\Modules\LAPACK`:

- `LAPACK_DGETF2_HELPER.f`
- `LAPACK_DGETRF_HELPER.f`
- `LAPACK_DGETRI_HELPER.f`
- `LAPACK_DGETRS_HELPER.f`
- `LAPACK_DISNAN_HELPER.f`
- `LAPACK_DLABAD_HELPER.f`
- `LAPACK_DLACON_HELPER.f90`
- `LAPACK_DLACPY_HELPER.f`
- `LAPACK_DLAE2_HELPER.f`
- `LAPACK_DLAEV2_HELPER.f`
- `LAPACK_DLAGTS_HELPER.f90`
- `LAPACK_DLAN_HELPER.f90`
- `LAPACK_DLAPY2_HELPER.f`
- `LAPACK_DLAR_ROT_HELPER.f90`
- `LAPACK_DLARF_HELPER.f90`
- `LAPACK_DLARFB_HELPER.f90`
- `LAPACK_DLARFG_HELPER.f90`
- `LAPACK_DLARFT_HELPER.f90`
- `LAPACK_DLARTG_HELPER.f90`
- `LAPACK_DLAS_MISC_HELPER.f90`
- `LAPACK_DLASCL_HELPER.f90`
- `LAPACK_DLASRT_HELPER.f`
- `LAPACK_DLASSQ_HELPER.f`
- `LAPACK_DLAUUM_HELPER.f`
- `LAPACK_DPBCON_HELPER.f`
- `LAPACK_DPBEQU_HELPER.f`
- `LAPACK_DPBSTF_HELPER.f`
- `LAPACK_DPBTF2_HELPER.f`
- `LAPACK_DPBTRF_KERNEL.f`
- `LAPACK_DPBTRS_HELPER.f`
- `LAPACK_DPOTRF_HELPER.f`
- `LAPACK_DPOTRI_HELPER.f`
- `LAPACK_DSTEV_HELPER.f`
- `LAPACK_DSYTF2_HELPER.f`
- `LAPACK_DTRTI2_HELPER.f`
- `LAPACK_DTRTRS_HELPER.f`
- `LAPACK_GIV_MGIV_EIG_HELPER.f`
- `LAPACK_LANCZOS_EIG_HELPER.f`
- `LAPACK_LIN_EQN_DGB_KERNEL.f`
- `LAPACK_LIN_EQN_DGE_ext.f90`
- `LAPACK_MISCEL_ext.f90`
- `LAPACK_POTF2_HELPER.f`
- `LAPACK_STD_EIG_1_ext.f90`
- `LAPACK_STD_EIG_1_HELPER.f`

`BLAS\XERBLA.f` is included for completeness. It was not changed relative to the original tree, but it is the only internal BLAS file intentionally kept in the OpenBLAS hijack layout.
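When the package is applied manually, a quick presence check can catch files that were not copied. This is a minimal Python sketch under stated assumptions: the function name is hypothetical, the file list is only a subset of the names above, and the tree root is whatever directory holds `Source` and `BLAS`.

```python
from pathlib import Path

# Subset of the added files listed above, for illustration only.
EXPECTED_LAPACK_FILES = [
    "LAPACK_DGETRF_HELPER.f",
    "LAPACK_DPBTRF_KERNEL.f",
    "LAPACK_MISCEL_ext.f90",
]


def missing_patch_files(tree_root, lapack_files=EXPECTED_LAPACK_FILES):
    """Return the expected patch files (relative to tree_root) that are absent."""
    root = Path(tree_root)
    expected = [root / "Source" / "Modules" / "LAPACK" / name for name in lapack_files]
    # XERBLA.f is the one internal BLAS file kept in the OpenBLAS layout.
    expected.append(root / "BLAS" / "XERBLA.f")
    return [str(path.relative_to(root)) for path in expected if not path.is_file()]
```

Calling `missing_patch_files` on the MYSTRAN source root returns an empty list when everything is in place. On case-sensitive file systems the names must match exactly, including the lowercase `_ext` suffixes.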
Peel-Off Surfaces

The peel-off work targets these internal LAPACK areas:

- DGE linear equation routines: `DGETF2`, `DGETRF`, `DGETRI`, `DGETRS`
- Symmetric matrix inverse / Cholesky path: `DLAUU2`, `DLAUUM`, `DPOTRF`, `DPOTRI`, `DTRTI2`
- Miscellaneous routines: `DTRTRS`, `DSTEV`
- Standard eigen path: `DSYEV`, `DSYTRD`, `DORGTR`
- General band path: `DGBTRF`, `DGBTRS`, `DGBTF2`
- Positive-definite band path: `DPBEQU`, `DPBTRF`, `DPBTF2`, `DPOTF2`, `DPBCON`, `DPBTRS`, `DSYTF2`

The goal is to keep MYSTRAN's required internal numerical behavior available while reducing coupling to bundled BLAS and making symbol ownership clearer when optimized external libraries are linked.

OpenBLAS Build Context Used

The tested enhanced build used:

- OpenBLAS import library: `C:\gcc\openblas32\lib\libopenblas.dll.a`
- Runtime DLL directory: `C:\gcc\openblas32\bin`
- Regular SuperLU, not SuperLU-MT
- AVX2-style release flags: `-O3`, `-funroll-loops`, `-march=core-avx2`, `-mtune=core-avx2`
- Conservative floating-point behavior: `-fno-fast-math`, `-ffp-contract=off`

Symbol checks confirmed the produced executables imported `libopenblas.dll` and had OpenBLAS-resolved BLAS imports such as `__imp_dgemm_`, while retaining local `xerbla_`.

Validation Notes

Full validation for `mystran_lapack_surgery.exe` and `mystran_lapack_peel_off.exe` both produced:

`1/2605 failed`

The failure was the same near-zero eigen residue:

- Deck: `vic/12/V30 Beam MPC on constrained dof.bdf`
- Quantity: `SC/2/REALEIGENVALUES/MODE/1/CYCLES`
- Expected: `0`
- Tolerance: `1e-05`
- Patched/OpenBLAS result: `1.387039e-05`

This matched the earlier OpenBLAS hijack behavior and is best treated as zero dust rather than a new LAPACK peel-off regression.

Separate benchmark runners with sane zero-dust handling showed:

- OpenBLAS hijack and `lapack_peel_off` had the same failed deck lists.
- Baseline original and baseline AVX2 were cleaner on Benchmark suites than the OpenBLAS builds.
- The Benchmark suites are not clean even on baseline; most baseline Benchmark failures are real validation differences, not zero dust.
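To illustrate why the single failure reads as zero dust, the sketch below contrasts a strict comparison with one that applies a separate floor when the expected value is exactly zero. It is an assumption-laden illustration, not the benchmark runners' actual logic: the function names and the `dust_floor` value of `1e-04` are hypothetical, and the reported tolerance is treated as absolute.

```python
def strict_match(expected, actual, tol=1e-05):
    """Absolute-tolerance comparison, as assumed for the full validation run."""
    return abs(actual - expected) <= tol


def zero_dust_match(expected, actual, tol=1e-05, dust_floor=1e-04):
    """Like strict_match, but an expected value of exactly zero uses dust_floor.

    dust_floor is a hypothetical threshold chosen for this example only.
    """
    if expected == 0.0:
        return abs(actual) <= dust_floor
    return abs(actual - expected) <= tol


if __name__ == "__main__":
    expected, actual = 0.0, 1.387039e-05  # values reported for the V30 deck
    print("strict:", strict_match(expected, actual))
    print("zero-dust:", zero_dust_match(expected, actual))
```

With the values reported for the V30 deck, the strict comparison fails because `1.387039e-05` exceeds `1e-05`, while the zero-dust variant accepts it. Whether such a floor is appropriate depends on the quantity being compared, so the threshold should be chosen per result type rather than globally.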