BLAS Peel Off, Lapack Peel Off, Banded RCM #235
Closed
realbabilu wants to merge 3 commits into
Patch package captured from:

`E:\mystran4\MYSTRANSolver-18.0.0.enhanced`

Target paths when applying manually:

- `Source\Modules\LAPACK\*`
- `BLAS\XERBLA.f`

This package represents the final `lapack_surgery` + `lapack_peeloff` state used to build:

- `mystran_lapack_surgery.exe`
- `mystran_lapack_peel_off.exe`

## Purpose

The internal MYSTRAN LAPACK sources were reorganized so MYSTRAN can be built in an external optimized BLAS/LAPACK configuration, especially the OpenBLAS hijack build, without carrying a full internal BLAS implementation.

The intended shape is:

- External OpenBLAS supplies BLAS symbols such as `dgemm_`, `dtrsm_`, etc.
- Regular single-thread SuperLU is linked against the same OpenBLAS BLAS library.
- MYSTRAN keeps only the local `XERBLA.f` error handler from internal BLAS.
- No internal CBLAS or f2c BLAS layer is required for the OpenBLAS configuration.
- Internal MYSTRAN LAPACK entry points that are still needed are retained, but many routines are peeled into helper files so the build can coexist cleanly with optimized external libraries.
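The intended symbol ownership can be stated as a small check. The sketch below is illustrative only and is not part of the patch: it assumes the symbol names have already been extracted from the executable by some listing tool, and that DLL imports appear with the MinGW-style `__imp_` prefix. The helper name `ownership_problems` and the two tuples are hypothetical.

```python
# Hypothetical check of the "intended shape": BLAS comes from OpenBLAS,
# while xerbla_ stays local to MYSTRAN.
BLAS_FROM_OPENBLAS = ("dgemm_", "dtrsm_")  # example symbols named in this description
LOCAL_ONLY = ("xerbla_",)                  # the one internal BLAS routine kept


def ownership_problems(symbol_names):
    """Return a list of human-readable problems; an empty list means the layout looks right.

    Assumption: an import from a DLL shows up as '__imp_<name>' in the symbol list.
    """
    names = set(symbol_names)
    problems = []
    for sym in BLAS_FROM_OPENBLAS:
        if "__imp_" + sym not in names:
            problems.append(f"{sym}: no import entry, may be resolved internally")
    for sym in LOCAL_ONLY:
        if sym not in names:
            problems.append(f"{sym}: local definition missing")
        if "__imp_" + sym in names:
            problems.append(f"{sym}: imported from a DLL instead of kept local")
    return problems


if __name__ == "__main__":
    print(ownership_problems(["__imp_dgemm_", "__imp_dtrsm_", "xerbla_"]))
```

An empty result corresponds to the layout described above; any entry in the list points at a symbol whose owner is not the one intended.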
## What Changed

Existing internal LAPACK files modified:

- `LAPACK_BLAS_AUX.f`
- `LAPACK_GIV_MGIV_EIG.f`
- `LAPACK_LANCZOS_EIG.f`
- `LAPACK_LIN_EQN_DGB.f`
- `LAPACK_LIN_EQN_DGE.f`
- `LAPACK_LIN_EQN_DPB.f`
- `LAPACK_MISCEL.f`
- `LAPACK_STD_EIG_1.f`
- `LAPACK_SYM_MAT_INV.f`

Additional helper/ext/kernel files added under `Source\Modules\LAPACK`:

- `LAPACK_DGETF2_HELPER.f`
- `LAPACK_DGETRF_HELPER.f`
- `LAPACK_DGETRI_HELPER.f`
- `LAPACK_DGETRS_HELPER.f`
- `LAPACK_DISNAN_HELPER.f`
- `LAPACK_DLABAD_HELPER.f`
- `LAPACK_DLACON_HELPER.f90`
- `LAPACK_DLACPY_HELPER.f`
- `LAPACK_DLAE2_HELPER.f`
- `LAPACK_DLAEV2_HELPER.f`
- `LAPACK_DLAGTS_HELPER.f90`
- `LAPACK_DLAN_HELPER.f90`
- `LAPACK_DLAPY2_HELPER.f`
- `LAPACK_DLAR_ROT_HELPER.f90`
- `LAPACK_DLARF_HELPER.f90`
- `LAPACK_DLARFB_HELPER.f90`
- `LAPACK_DLARFG_HELPER.f90`
- `LAPACK_DLARFT_HELPER.f90`
- `LAPACK_DLARTG_HELPER.f90`
- `LAPACK_DLAS_MISC_HELPER.f90`
- `LAPACK_DLASCL_HELPER.f90`
- `LAPACK_DLASRT_HELPER.f`
- `LAPACK_DLASSQ_HELPER.f`
- `LAPACK_DLAUUM_HELPER.f`
- `LAPACK_DPBCON_HELPER.f`
- `LAPACK_DPBEQU_HELPER.f`
- `LAPACK_DPBSTF_HELPER.f`
- `LAPACK_DPBTF2_HELPER.f`
- `LAPACK_DPBTRF_KERNEL.f`
- `LAPACK_DPBTRS_HELPER.f`
- `LAPACK_DPOTRF_HELPER.f`
- `LAPACK_DPOTRI_HELPER.f`
- `LAPACK_DSTEV_HELPER.f`
- `LAPACK_DSYTF2_HELPER.f`
- `LAPACK_DTRTI2_HELPER.f`
- `LAPACK_DTRTRS_HELPER.f`
- `LAPACK_GIV_MGIV_EIG_HELPER.f`
- `LAPACK_LANCZOS_EIG_HELPER.f`
- `LAPACK_LIN_EQN_DGB_KERNEL.f`
- `LAPACK_LIN_EQN_DGE_ext.f90`
- `LAPACK_MISCEL_ext.f90`
- `LAPACK_POTF2_HELPER.f`
- `LAPACK_STD_EIG_1_ext.f90`
- `LAPACK_STD_EIG_1_HELPER.f`

`BLAS\XERBLA.f` is included for completeness. It was not changed relative to the original tree, but it is the only internal BLAS file intentionally kept in the OpenBLAS hijack layout.
## Peel-Off Surfaces

The peel-off work targets these internal LAPACK areas:

- DGE linear equation routines: `DGETF2`, `DGETRF`, `DGETRI`, `DGETRS`
- Symmetric matrix inverse / Cholesky path: `DLAUU2`, `DLAUUM`, `DPOTRF`, `DPOTRI`, `DTRTI2`
- Miscellaneous routines: `DTRTRS`, `DSTEV`
- Standard eigen path: `DSYEV`, `DSYTRD`, `DORGTR`
- General band path: `DGBTRF`, `DGBTRS`, `DGBTF2`
- Positive-definite band path: `DPBEQU`, `DPBTRF`, `DPBTF2`, `DPOTF2`, `DPBCON`, `DPBTRS`, `DSYTF2`

The goal is to keep MYSTRAN's required internal numerical behavior available while reducing coupling to bundled BLAS and making symbol ownership clearer when optimized external libraries are linked.

## OpenBLAS Build Context Used

The tested enhanced build used:

- OpenBLAS import library: `C:\gcc\openblas32\lib\libopenblas.dll.a`
- Runtime DLL directory: `C:\gcc\openblas32\bin`
- Regular SuperLU, not SuperLU-MT
- AVX2-style release flags: `-O3`, `-funroll-loops`, `-march=core-avx2`, `-mtune=core-avx2`
- Conservative floating-point behavior: `-fno-fast-math`, `-ffp-contract=off`

Symbol checks confirmed the produced executables imported `libopenblas.dll` and had OpenBLAS-resolved BLAS imports such as `__imp_dgemm_`, while retaining local `xerbla_`.

## Validation Notes

Full validation for `mystran_lapack_surgery.exe` and `mystran_lapack_peel_off.exe` both produced:

`1/2605 failed`

The failure was the same near-zero eigen residue:

- Deck: `vic/12/V30 Beam MPC on constrained dof.bdf`
- Quantity: `SC/2/REALEIGENVALUES/MODE/1/CYCLES`
- Expected: `0`
- Tolerance: `1e-05`
- Patched/OpenBLAS result: `1.387039e-05`

This matched the earlier OpenBLAS hijack behavior and is best treated as zero dust rather than a new LAPACK peel-off regression.

Separate benchmark runners with sane zero-dust handling showed:

- OpenBLAS hijack and `lapack_peel_off` had the same failed deck lists.
- Baseline original and baseline AVX2 were cleaner on Benchmark suites than the OpenBLAS builds.
- The Benchmark suites are not clean even on baseline; most baseline Benchmark failures are real validation differences, not zero dust.
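To make the "zero dust" idea concrete, here is a minimal sketch of a comparison that separates a near-zero residue from a real difference. It is not the logic of the validation or benchmark runners, which this description does not show; the function name and the `dust` threshold of `1e-4` are assumptions chosen only for illustration.

```python
def compare_with_zero_dust(expected, actual, tol=1e-5, dust=1e-4):
    """Classify a scalar comparison as 'pass', 'zero_dust', or 'fail'.

    tol  : absolute tolerance, as in the validation report above.
    dust : hypothetical ceiling below which a nonzero result for an
           expected value of exactly 0 is treated as numerical noise.
    """
    if abs(actual - expected) <= tol:
        return "pass"
    if expected == 0.0 and abs(actual) <= dust:
        return "zero_dust"
    return "fail"


if __name__ == "__main__":
    # The single failing quantity reported above: expected 0, got 1.387039e-05.
    print(compare_with_zero_dust(0.0, 1.387039e-05))
```

With these assumed thresholds, the reported value exceeds the `1e-05` tolerance but stays well under the dust ceiling, so it is classified separately from a genuine mismatch.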
This folder is the frozen `banded_optimizationV1` snapshot for the current stable banded-validation state in:

- `C:\temp\mystran4\MYSTRANSolver-18.0.0.enhanced`

It is meant to be replayed onto an older tree only after the LAPACK refactoring stack is already in place.

This snapshot should be applied after:

1. baseline BLAS / SuperLU selection work
2. the large `lapack_surgery` patch
3. `lapack_peel_off`
4. this `banded_optimizationV1`

The important bit is step 3. This V1 snapshot assumes the post-`lapack_peel_off` structure already exists, especially in the LAPACK-facing `LINK3`, `SOLVE_GMN`, matrix-export, and output paths. Applying this snapshot before `lapack_peel_off` is likely to produce mismatched call flow and confusing merge conflicts.

See:

- [patch_order.md](C:/temp/mystran4/codex_mod/banded_optimizationV1/dev_docs/patch_order.md)

This is not the original bare banded patch anymore. It is the stable validation-passing state after the follow-up debugging needed to keep `test_banded.py` green.
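The ordering constraint can be expressed as a simple prerequisite check. This is a hedged sketch, not a tool shipped with the snapshot: the label `blas_superlu_selection` is a hypothetical name for step 1, and how the list of already-applied patches is obtained is left open.

```python
# Patch order from the list above; the first label is a hypothetical name for step 1.
REQUIRED_ORDER = [
    "blas_superlu_selection",
    "lapack_surgery",
    "lapack_peel_off",
    "banded_optimizationV1",
]


def missing_prerequisites(applied, target):
    """Return the patches that must be applied before `target` but are not in `applied`."""
    position = REQUIRED_ORDER.index(target)
    return [patch for patch in REQUIRED_ORDER[:position] if patch not in applied]


if __name__ == "__main__":
    applied = ["blas_superlu_selection", "lapack_surgery"]
    print(missing_prerequisites(applied, "banded_optimizationV1"))
```

In the example, `lapack_peel_off` is reported as missing, which is exactly the case the description warns about.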
The frozen behavior is:

- keep the original RCM-enabled banded path
- keep banded storage and solver-dispatch diagnostics
- allow `KLL` to stay on true banded Cholesky when it is a good fit
- bypass banded for compact-band cases that are too expensive or nearly dense
- rescue selected static cases to `SuperLU` when banded/dense factorization is not appropriate
- preserve historical bailout semantics where the validation suite explicitly expects them
- rescue problematic `RMM` solves in `SOLVE_GMN`
- emit zero-valued `MPCFORCES`
- export `PL` and array-format `UL` Matrix Market files

Main regression point:

- `C:\temp\mystran4\MYSTRAN_Validation-main\test_banded.py`

Frozen result:

- `0/2605 failed -> PASS`

See:

- [validation_resume.md](C:/temp/mystran4/codex_mod/banded_optimizationV1/dev_docs/validation_resume.md)
- [issues_and_decisions.md](C:/temp/mystran4/codex_mod/banded_optimizationV1/dev_docs/issues_and_decisions.md)

In the frozen validation snapshot:

- true banded `KLL` path is still the default
- `SuperLU` is used as a rescue path when:
  - compact-band storage is too expensive for the matrix shape
  - `DPBTRF` fails and the deck family is allowed to rescue
  - constraint-heavy decks need sparse robustness
  - dense fallback is not a good path

Approximate split from the unique decks recorded under `MYSTRAN_Validation-main\passed_banded`:

- total unique passing decks counted: `272`
- true banded KLL path: `260` decks = `95.59%`
- `SuperLU` fallback on KLL: `12` decks = `4.41%`

So the stable V1 state is still overwhelmingly banded in practice, with sparse rescue used only where needed.

This snapshot keeps the older imported banded files and also carries the follow-up production files that actually define the stable V1 behavior:

- `Source\LK3\LINK3.f90`
- `Source\LK2\LINK2.f90`
- `Source\LK2\SOLVE_GMN.f90`
- `Source\LK9\L92\OFP2.f90`
- `Source\UTIL\WRITE_MATRIX_MARKET_VECTOR.f90`

The older marker style `! --- BANDED_optimizisation -begin-- !` / `! --- BANDED_optimizisation -end-- !` is preserved where it already existed historically. For newer follow-up logic that never had the old markers, the snippet notes in this snapshot use:

- `! --- banded_optimization_V1 begin --- !`
- `! --- banded_optimization_V1 end --- !`

See:

- [snippets_banded_optimizationV1.md](C:/temp/mystran4/codex_mod/banded_optimizationV1/dev_docs/snippets_banded_optimizationV1.md)

- Skyline fallback was explored after this work, but it was intentionally removed from the runtime path because it hurt validation robustness.
- The frozen V1 runtime policy is therefore simple: if banded is too costly or unsuitable, rescue to `SuperLU`.
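The runtime policy can be pictured as a small decision function. The sketch below is only an illustration of the described behavior, not the logic in `LINK3.f90`: the function name, the cost measures, and both thresholds (`max_fill_ratio`, `max_band_fraction`) are assumptions, and the real dispatch also considers deck family and constraint structure, which are reduced here to a single `rescue_allowed` flag.

```python
def choose_kll_solver(n, kd, nnz_upper, dpbtrf_ok=True, rescue_allowed=True,
                      max_fill_ratio=20.0, max_band_fraction=0.5):
    """Pick 'banded', 'superlu', or 'bailout' for a symmetric n x n matrix.

    n          : matrix order
    kd         : number of super-diagonals after reordering (half bandwidth)
    nnz_upper  : stored nonzeros in the upper triangle
    dpbtrf_ok  : whether the banded Cholesky factorization succeeded
    Thresholds are hypothetical and exist only to make the sketch concrete.
    """
    band_entries = (kd + 1) * n  # compact symmetric band storage size
    if band_entries > max_fill_ratio * nnz_upper:
        return "superlu"         # band storage too expensive for this sparsity
    if (kd + 1) > max_band_fraction * n:
        return "superlu"         # band is nearly dense
    if not dpbtrf_ok:
        # Rescue only where allowed; otherwise keep the historical bailout.
        return "superlu" if rescue_allowed else "bailout"
    return "banded"


if __name__ == "__main__":
    print(choose_kll_solver(n=1000, kd=10, nnz_upper=6000))
    print(choose_kll_solver(n=1000, kd=800, nnz_upper=400000))
```

The first call stays on the banded path because the band is narrow and cheap to store; the second is routed to the sparse rescue because the band covers most of the matrix.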
lapack_peel_offV2 75bde73
Banded Optimization ed130d0