Feature/lapack peel off #237

Closed
realbabilu wants to merge 2 commits into MystranSolver:main from realbabilu:feature/lapack-peel-off

@realbabilu

Patch package captured from:

`E:\mystran4\MYSTRANSolver-18.0.0.enhanced`

Target paths when applying manually:

- `Source\Modules\LAPACK\*`
- `BLAS\XERBLA.f`

This package represents the final `lapack_surgery` + `lapack_peeloff` state used to build:

- `mystran_lapack_surgery.exe`
- `mystran_lapack_peel_off.exe`

## Purpose

The internal MYSTRAN LAPACK sources were reorganized so MYSTRAN can be built in an external optimized BLAS/LAPACK configuration, especially the OpenBLAS hijack build, without carrying a full internal BLAS implementation.

The intended shape is:

- External OpenBLAS supplies BLAS symbols such as `dgemm_`, `dtrsm_`, etc.
- Regular single-thread SuperLU is linked against the same OpenBLAS BLAS library.
- MYSTRAN keeps only the local `XERBLA.f` error handler from internal BLAS.
- No internal CBLAS or f2c BLAS layer is required for the OpenBLAS configuration.
- Internal MYSTRAN LAPACK entry points that are still needed are retained, but many routines are peeled into helper files so the build can coexist cleanly with optimized external libraries.

## What Changed

Existing internal LAPACK files modified:

- `LAPACK_BLAS_AUX.f`
- `LAPACK_GIV_MGIV_EIG.f`
- `LAPACK_LANCZOS_EIG.f`
- `LAPACK_LIN_EQN_DGB.f`
- `LAPACK_LIN_EQN_DGE.f`
- `LAPACK_LIN_EQN_DPB.f`
- `LAPACK_MISCEL.f`
- `LAPACK_STD_EIG_1.f`
- `LAPACK_SYM_MAT_INV.f`

Additional helper/ext/kernel files added under `Source\Modules\LAPACK`:

- `LAPACK_DGETF2_HELPER.f`
- `LAPACK_DGETRF_HELPER.f`
- `LAPACK_DGETRI_HELPER.f`
- `LAPACK_DGETRS_HELPER.f`
- `LAPACK_DISNAN_HELPER.f`
- `LAPACK_DLABAD_HELPER.f`
- `LAPACK_DLACON_HELPER.f90`
- `LAPACK_DLACPY_HELPER.f`
- `LAPACK_DLAE2_HELPER.f`
- `LAPACK_DLAEV2_HELPER.f`
- `LAPACK_DLAGTS_HELPER.f90`
- `LAPACK_DLAN_HELPER.f90`
- `LAPACK_DLAPY2_HELPER.f`
- `LAPACK_DLAR_ROT_HELPER.f90`
- `LAPACK_DLARF_HELPER.f90`
- `LAPACK_DLARFB_HELPER.f90`
- `LAPACK_DLARFG_HELPER.f90`
- `LAPACK_DLARFT_HELPER.f90`
- `LAPACK_DLARTG_HELPER.f90`
- `LAPACK_DLAS_MISC_HELPER.f90`
- `LAPACK_DLASCL_HELPER.f90`
- `LAPACK_DLASRT_HELPER.f`
- `LAPACK_DLASSQ_HELPER.f`
- `LAPACK_DLAUUM_HELPER.f`
- `LAPACK_DPBCON_HELPER.f`
- `LAPACK_DPBEQU_HELPER.f`
- `LAPACK_DPBSTF_HELPER.f`
- `LAPACK_DPBTF2_HELPER.f`
- `LAPACK_DPBTRF_KERNEL.f`
- `LAPACK_DPBTRS_HELPER.f`
- `LAPACK_DPOTRF_HELPER.f`
- `LAPACK_DPOTRI_HELPER.f`
- `LAPACK_DSTEV_HELPER.f`
- `LAPACK_DSYTF2_HELPER.f`
- `LAPACK_DTRTI2_HELPER.f`
- `LAPACK_DTRTRS_HELPER.f`
- `LAPACK_GIV_MGIV_EIG_HELPER.f`
- `LAPACK_LANCZOS_EIG_HELPER.f`
- `LAPACK_LIN_EQN_DGB_KERNEL.f`
- `LAPACK_LIN_EQN_DGE_ext.f90`
- `LAPACK_MISCEL_ext.f90`
- `LAPACK_POTF2_HELPER.f`
- `LAPACK_STD_EIG_1_ext.f90`
- `LAPACK_STD_EIG_1_HELPER.f`

`BLAS\XERBLA.f` is included for completeness. It was not changed relative to the original tree, but it is the only internal BLAS file intentionally kept in the OpenBLAS hijack layout.

## Peel-Off Surfaces

The peel-off work targets these internal LAPACK areas:

- DGE linear equation routines: `DGETF2`, `DGETRF`, `DGETRI`, `DGETRS`
- Symmetric matrix inverse / Cholesky path: `DLAUU2`, `DLAUUM`, `DPOTRF`, `DPOTRI`, `DTRTI2`
- Miscellaneous routines: `DTRTRS`, `DSTEV`
- Standard eigen path: `DSYEV`, `DSYTRD`, `DORGTR`
- General band path: `DGBTRF`, `DGBTRS`, `DGBTF2`
- Positive-definite band path: `DPBEQU`, `DPBTRF`, `DPBTF2`, `DPOTF2`, `DPBCON`, `DPBTRS`, `DSYTF2`

The goal is to keep MYSTRAN's required internal numerical behavior available while reducing coupling to bundled BLAS and making symbol ownership clearer when optimized external libraries are linked.

## OpenBLAS Build Context Used

The tested enhanced build used:

- OpenBLAS import library: `C:\gcc\openblas32\lib\libopenblas.dll.a`
- Runtime DLL directory: `C:\gcc\openblas32\bin`
- Regular SuperLU, not SuperLU-MT
- AVX2-style release flags: `-O3`, `-funroll-loops`, `-march=core-avx2`, `-mtune=core-avx2`
- Conservative floating-point behavior: `-fno-fast-math`, `-ffp-contract=off`
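
For readers who want to reproduce a similar configuration, the flag set above can be sketched as a small command builder. This is only an illustration: the `gfortran` driver name, the helper functions, and the object file names are assumptions, not details from this patch, and SuperLU and any other libraries are left out.

```python
# Flags and the import-library path are copied from the list above.
# The compiler driver name and all file names passed in are assumptions.
OPT_FLAGS = ["-O3", "-funroll-loops", "-march=core-avx2", "-mtune=core-avx2"]
FP_FLAGS = ["-fno-fast-math", "-ffp-contract=off"]
OPENBLAS_IMPLIB = r"C:\gcc\openblas32\lib\libopenblas.dll.a"


def compile_command(source, obj, driver="gfortran"):
    """Return the argv list for compiling one Fortran source file."""
    return [driver, *OPT_FLAGS, *FP_FLAGS, "-c", source, "-o", obj]


def link_command(objects, exe, driver="gfortran"):
    """Return the argv list for linking objects against the OpenBLAS import library."""
    return [driver, *objects, OPENBLAS_IMPLIB, "-o", exe]


if __name__ == "__main__":
    # Hypothetical object names; the real build compiles many more files.
    print(" ".join(compile_command("LAPACK_MISCEL.f", "LAPACK_MISCEL.o")))
    print(" ".join(link_command(["LAPACK_MISCEL.o", "XERBLA.o"],
                                "mystran_lapack_peel_off.exe")))
```

Keeping the floating-point flags in a separate list makes it easy to confirm that every compile line carries `-fno-fast-math` and `-ffp-contract=off`.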

Symbol checks confirmed the produced executables imported `libopenblas.dll` and had OpenBLAS-resolved BLAS imports such as `__imp_dgemm_`, while retaining local `xerbla_`.
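
A check of this kind can be sketched as follows. The symbol list below is a hypothetical sample, not captured tool output, and the sketch assumes the names have already been extracted from the executable: names with the `__imp_` prefix are treated as DLL imports, everything else as locally defined.

```python
def classify_symbols(symbol_names, blas_routines):
    """Split BLAS-related symbols into imported and locally defined routine names."""
    imported, local = set(), set()
    for name in symbol_names:
        if name.startswith("__imp_"):
            base = name[len("__imp_"):]
            target = imported
        else:
            base = name
            target = local
        routine = base.rstrip("_")  # drop the Fortran trailing underscore
        if routine in blas_routines:
            target.add(routine)
    return imported, local


if __name__ == "__main__":
    # Hypothetical sample; a real list would come from a symbol-dump tool.
    sample = ["__imp_dgemm_", "__imp_dtrsm_", "xerbla_", "main"]
    imported, local = classify_symbols(sample, {"dgemm", "dtrsm", "xerbla"})
    print("imported:", sorted(imported))
    print("local:", sorted(local))
```

In the layout described here, the expectation is that `xerbla` is the only routine in the local set.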

## Validation Notes

Full validation for `mystran_lapack_surgery.exe` and `mystran_lapack_peel_off.exe` both produced:

`1/2605 failed`

The failure was the same near-zero eigen residue:

- Deck: `vic/12/V30 Beam MPC on constrained dof.bdf`
- Quantity: `SC/2/REALEIGENVALUES/MODE/1/CYCLES`
- Expected: `0`
- Tolerance: `1e-05`
- Patched/OpenBLAS result: `1.387039e-05`

This matched the earlier OpenBLAS hijack behavior and is best treated as zero dust rather than a new LAPACK peel-off regression.
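
To make the zero-dust argument concrete, here is a minimal sketch of the two comparison styles. It assumes the validation compares values with an absolute tolerance, which is not stated here, and the `dust_floor` value is a hypothetical threshold chosen only for illustration.

```python
def within_tolerance(expected, actual, tol):
    """Strict absolute-difference check."""
    return abs(actual - expected) <= tol


def within_tolerance_zero_dust(expected, actual, tol, dust_floor):
    """Same check, but accept tiny values when exactly zero is expected."""
    if expected == 0 and abs(actual) <= dust_floor:
        return True
    return within_tolerance(expected, actual, tol)


if __name__ == "__main__":
    expected, tol = 0.0, 1e-05   # values from the failing quantity above
    actual = 1.387039e-05        # patched/OpenBLAS result
    dust_floor = 1e-04           # hypothetical threshold, not from the test suite

    print(within_tolerance(expected, actual, tol))                        # False
    print(within_tolerance_zero_dust(expected, actual, tol, dust_floor))  # True
```

The strict check fails because the result exceeds the tolerance by a small margin, while a dust floor only relaxes the comparison for quantities whose expected value is exactly zero.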

Separate benchmark runners with sane zero-dust handling showed:

- OpenBLAS hijack and `lapack_peel_off` had the same failed deck lists.
- Baseline original and baseline AVX2 were cleaner on Benchmark suites than the OpenBLAS builds.
- The Benchmark suites are not clean even on baseline; most baseline Benchmark failures are real validation differences, not zero dust.
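
Comparing failed deck lists between two builds amounts to a set difference. The sketch below assumes each runner yields a plain list of failed deck paths; apart from the deck named above, the deck names are placeholders.

```python
def compare_failures(failed_a, failed_b):
    """Return decks failing only in build A, only in build B, and in both."""
    a, b = set(failed_a), set(failed_b)
    return {
        "only_a": sorted(a - b),
        "only_b": sorted(b - a),
        "common": sorted(a & b),
    }


if __name__ == "__main__":
    # Placeholder lists; real lists would come from the benchmark runners.
    hijack = ["vic/12/V30 Beam MPC on constrained dof.bdf", "bench/example_a.bdf"]
    peel_off = ["bench/example_a.bdf", "vic/12/V30 Beam MPC on constrained dof.bdf"]
    result = compare_failures(hijack, peel_off)
    print("identical failure lists:", not result["only_a"] and not result["only_b"])
```

Two builds have the same failed deck list exactly when both `only_a` and `only_b` are empty.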

@realbabilu closed this on May 31, 2026