Feature/banded opt v1 #238
Closed
realbabilu wants to merge 3 commits into
Patch package captured from: `E:\mystran4\MYSTRANSolver-18.0.0.enhanced`

Target paths when applying manually:

- `Source\Modules\LAPACK\*`
- `BLAS\XERBLA.f`

This package represents the final `lapack_surgery` + `lapack_peeloff` state used to build:

- `mystran_lapack_surgery.exe`
- `mystran_lapack_peel_off.exe`

The internal MYSTRAN LAPACK sources were reorganized so MYSTRAN can be built in an external optimized BLAS/LAPACK configuration, especially the OpenBLAS hijack build, without carrying a full internal BLAS implementation.

The intended shape is:

- External OpenBLAS supplies BLAS symbols such as `dgemm_`, `dtrsm_`, etc.
- Regular single-thread SuperLU is linked against the same OpenBLAS BLAS library.
- MYSTRAN keeps only the local `XERBLA.f` error handler from internal BLAS.
- No internal CBLAS or f2c BLAS layer is required for the OpenBLAS configuration.
- Internal MYSTRAN LAPACK entry points that are still needed are retained, but many routines are peeled into helper files so the build can coexist cleanly with optimized external libraries.
Existing internal LAPACK files modified:

- `LAPACK_BLAS_AUX.f`
- `LAPACK_GIV_MGIV_EIG.f`
- `LAPACK_LANCZOS_EIG.f`
- `LAPACK_LIN_EQN_DGB.f`
- `LAPACK_LIN_EQN_DGE.f`
- `LAPACK_LIN_EQN_DPB.f`
- `LAPACK_MISCEL.f`
- `LAPACK_STD_EIG_1.f`
- `LAPACK_SYM_MAT_INV.f`

Additional helper/ext/kernel files added under `Source\Modules\LAPACK`:

- `LAPACK_DGETF2_HELPER.f`
- `LAPACK_DGETRF_HELPER.f`
- `LAPACK_DGETRI_HELPER.f`
- `LAPACK_DGETRS_HELPER.f`
- `LAPACK_DISNAN_HELPER.f`
- `LAPACK_DLABAD_HELPER.f`
- `LAPACK_DLACON_HELPER.f90`
- `LAPACK_DLACPY_HELPER.f`
- `LAPACK_DLAE2_HELPER.f`
- `LAPACK_DLAEV2_HELPER.f`
- `LAPACK_DLAGTS_HELPER.f90`
- `LAPACK_DLAN_HELPER.f90`
- `LAPACK_DLAPY2_HELPER.f`
- `LAPACK_DLAR_ROT_HELPER.f90`
- `LAPACK_DLARF_HELPER.f90`
- `LAPACK_DLARFB_HELPER.f90`
- `LAPACK_DLARFG_HELPER.f90`
- `LAPACK_DLARFT_HELPER.f90`
- `LAPACK_DLARTG_HELPER.f90`
- `LAPACK_DLAS_MISC_HELPER.f90`
- `LAPACK_DLASCL_HELPER.f90`
- `LAPACK_DLASRT_HELPER.f`
- `LAPACK_DLASSQ_HELPER.f`
- `LAPACK_DLAUUM_HELPER.f`
- `LAPACK_DPBCON_HELPER.f`
- `LAPACK_DPBEQU_HELPER.f`
- `LAPACK_DPBSTF_HELPER.f`
- `LAPACK_DPBTF2_HELPER.f`
- `LAPACK_DPBTRF_KERNEL.f`
- `LAPACK_DPBTRS_HELPER.f`
- `LAPACK_DPOTRF_HELPER.f`
- `LAPACK_DPOTRI_HELPER.f`
- `LAPACK_DSTEV_HELPER.f`
- `LAPACK_DSYTF2_HELPER.f`
- `LAPACK_DTRTI2_HELPER.f`
- `LAPACK_DTRTRS_HELPER.f`
- `LAPACK_GIV_MGIV_EIG_HELPER.f`
- `LAPACK_LANCZOS_EIG_HELPER.f`
- `LAPACK_LIN_EQN_DGB_KERNEL.f`
- `LAPACK_LIN_EQN_DGE_ext.f90`
- `LAPACK_MISCEL_ext.f90`
- `LAPACK_POTF2_HELPER.f`
- `LAPACK_STD_EIG_1_ext.f90`
- `LAPACK_STD_EIG_1_HELPER.f`

`BLAS\XERBLA.f` is included for completeness. It was not changed relative to the original tree, but it is the only internal BLAS file intentionally kept in the OpenBLAS hijack layout.
The peel-off work targets these internal LAPACK areas:

- DGE linear equation routines: `DGETF2`, `DGETRF`, `DGETRI`, `DGETRS`
- Symmetric matrix inverse / Cholesky path: `DLAUU2`, `DLAUUM`, `DPOTRF`, `DPOTRI`, `DTRTI2`
- Miscellaneous routines: `DTRTRS`, `DSTEV`
- Standard eigen path: `DSYEV`, `DSYTRD`, `DORGTR`
- General band path: `DGBTRF`, `DGBTRS`, `DGBTF2`
- Positive-definite band path: `DPBEQU`, `DPBTRF`, `DPBTF2`, `DPOTF2`, `DPBCON`, `DPBTRS`, `DSYTF2`

The goal is to keep MYSTRAN's required internal numerical behavior available while reducing coupling to bundled BLAS and making symbol ownership clearer when optimized external libraries are linked.

The tested enhanced build used:

- OpenBLAS import library: `C:\gcc\openblas32\lib\libopenblas.dll.a`
- Runtime DLL directory: `C:\gcc\openblas32\bin`
- Regular SuperLU, not SuperLU-MT
- AVX2-style release flags: `-O3`, `-funroll-loops`, `-march=core-avx2`, `-mtune=core-avx2`
- Conservative floating-point behavior: `-fno-fast-math`, `-ffp-contract=off`

Symbol checks confirmed the produced executables imported `libopenblas.dll` and had OpenBLAS-resolved BLAS imports such as `__imp_dgemm_`, while retaining local `xerbla_`.

Full validation for `mystran_lapack_surgery.exe` and `mystran_lapack_peel_off.exe` both produced:

`1/2605 failed`

The failure was the same near-zero eigen residue:

- Deck: `vic/12/V30 Beam MPC on constrained dof.bdf`
- Quantity: `SC/2/REALEIGENVALUES/MODE/1/CYCLES`
- Expected: `0`
- Tolerance: `1e-05`
- Patched/OpenBLAS result: `1.387039e-05`

This matched the earlier OpenBLAS hijack behavior and is best treated as zero dust rather than a new LAPACK peel-off regression.

Separate benchmark runners with sane zero-dust handling showed:

- OpenBLAS hijack and `lapack_peel_off` had the same failed deck lists.
- Baseline original and baseline AVX2 were cleaner on Benchmark suites than the OpenBLAS builds.
- The Benchmark suites are not clean even on baseline; most baseline Benchmark failures are real validation differences, not zero dust.
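To make the "zero dust" reading of the single failure concrete, here is a minimal sketch of a comparison that treats tiny residues against an expected value of exactly zero as noise. It is not the validation suite's actual comparison logic: the function names, the absolute/relative tolerance scheme, and the `dust_floor` value of `1e-4` are all assumptions chosen for illustration. Only the expected value, the tolerance, and the reported result come from the failure listed above.

```python
def strict_match(expected: float, actual: float, tol: float = 1e-5) -> bool:
    """Plain absolute-tolerance check (assumed form of the failing comparison)."""
    return abs(actual - expected) <= tol


def zero_dust_match(expected: float, actual: float,
                    tol: float = 1e-5, dust_floor: float = 1e-4) -> bool:
    """Hypothetical zero-dust-aware check.

    When the reference value is exactly zero, accept any result whose
    magnitude is below dust_floor (an assumed threshold). Otherwise fall
    back to a tolerance scaled by the magnitude of the reference value.
    """
    if expected == 0.0:
        return abs(actual) <= dust_floor
    return abs(actual - expected) <= tol * max(1.0, abs(expected))


# Values reported for the V30 Beam MPC deck, mode 1 CYCLES.
expected_cycles = 0.0
openblas_cycles = 1.387039e-05

print(strict_match(expected_cycles, openblas_cycles))     # strict check fails
print(zero_dust_match(expected_cycles, openblas_cycles))  # dust-aware check passes
```

A floor like this should stay well below any physically meaningful frequency in the deck; a result of `1e-3` would still be rejected by the sketch.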
This folder is the frozen `banded_optimizationV1` snapshot for the current stable banded-validation state in:

- `C:\temp\mystran4\MYSTRANSolver-18.0.0.enhanced`

It is meant to be replayed onto an older tree only after the LAPACK refactoring stack is already in place.

This snapshot should be applied after:

1. baseline BLAS / SuperLU selection work
2. the large `lapack_surgery` patch
3. `lapack_peel_off`
4. this `banded_optimizationV1`

The important bit is step 3. This V1 snapshot assumes the post-`lapack_peel_off` structure already exists, especially in the LAPACK-facing `LINK3`, `SOLVE_GMN`, matrix-export, and output paths. Applying this snapshot before `lapack_peel_off` is likely to produce mismatched call flow and confusing merge conflicts.

See:

- [patch_order.md](C:/temp/mystran4/codex_mod/banded_optimizationV1/dev_docs/patch_order.md)

This is not the original bare banded patch anymore. It is the stable validation-passing state after the follow-up debugging needed to keep `test_banded.py` green.
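The ordering requirement can be expressed as a small pre-flight check. This is only a sketch: the snapshot does not ship such a script, the function name is hypothetical, and `blas_superlu_selection` is an invented label for step 1 (the other three names are taken from the list above).

```python
# Patch stack in the order the snapshot notes require.
# "blas_superlu_selection" is a hypothetical label for step 1.
REQUIRED_ORDER = [
    "blas_superlu_selection",
    "lapack_surgery",
    "lapack_peel_off",
    "banded_optimizationV1",
]


def missing_prerequisites(applied: list[str], target: str) -> list[str]:
    """Return the patches that must be applied before `target` but are not yet."""
    position = REQUIRED_ORDER.index(target)  # raises ValueError for unknown names
    return [name for name in REQUIRED_ORDER[:position] if name not in applied]


# A tree that has the surgery patch but not the peel-off is not ready for V1.
print(missing_prerequisites(
    ["blas_superlu_selection", "lapack_surgery"], "banded_optimizationV1"))
```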
The frozen behavior is:

- keep the original RCM-enabled banded path
- keep banded storage and solver-dispatch diagnostics
- allow `KLL` to stay on true banded Cholesky when it is a good fit
- bypass banded for compact-band cases that are too expensive or nearly dense
- rescue selected static cases to `SuperLU` when banded/dense factorization is not appropriate
- preserve historical bailout semantics where the validation suite explicitly expects them
- rescue problematic `RMM` solves in `SOLVE_GMN`
- emit zero-valued `MPCFORCES`
- export `PL` and array-format `UL` Matrix Market files

Main regression point:

- `C:\temp\mystran4\MYSTRAN_Validation-main\test_banded.py`

Frozen result:

- `0/2605 failed -> PASS`

See:

- [validation_resume.md](C:/temp/mystran4/codex_mod/banded_optimizationV1/dev_docs/validation_resume.md)
- [issues_and_decisions.md](C:/temp/mystran4/codex_mod/banded_optimizationV1/dev_docs/issues_and_decisions.md)

In the frozen validation snapshot:

- true banded `KLL` path is still the default
- `SuperLU` is used as a rescue path when:
  - compact-band storage is too expensive for the matrix shape
  - `DPBTRF` fails and the deck family is allowed to rescue
  - constraint-heavy decks need sparse robustness
  - dense fallback is not a good path

Approximate split from the unique decks recorded under `MYSTRAN_Validation-main\passed_banded`:

- total unique passing decks counted: `272`
- true banded KLL path: `260` decks = `95.59%`
- `SuperLU` fallback on KLL: `12` decks = `4.41%`

So the stable V1 state is still overwhelmingly banded in practice, with sparse rescue used only where needed.

This snapshot keeps the older imported banded files and also carries the follow-up production files that actually define the stable V1 behavior:

- `Source\LK3\LINK3.f90`
- `Source\LK2\LINK2.f90`
- `Source\LK2\SOLVE_GMN.f90`
- `Source\LK9\L92\OFP2.f90`
- `Source\UTIL\WRITE_MATRIX_MARKET_VECTOR.f90`

The older marker style `! --- BANDED_optimizisation -begin-- !` / `! --- BANDED_optimizisation -end-- !` is preserved where it already existed historically. For newer follow-up logic that never had the old markers, the snippet notes in this snapshot use:

- `! --- banded_optimization_V1 begin --- !`
- `! --- banded_optimization_V1 end --- !`

See:

- [snippets_banded_optimizationV1.md](C:/temp/mystran4/codex_mod/banded_optimizationV1/dev_docs/snippets_banded_optimizationV1.md)

- Skyline fallback was explored after this work, but it was intentionally removed from the runtime path because it hurt validation robustness.
- The frozen V1 runtime policy is therefore simple: if banded is too costly or unsuitable, rescue to `SuperLU`.
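The runtime policy described above (stay banded when it fits, otherwise rescue to `SuperLU`, otherwise keep the historical bailout) can be sketched as follows. This is an illustration only, not the logic in `LINK3.f90`: the function names, the `DispatchLimits` thresholds, and the returned labels are assumptions. The one grounded quantity is the storage estimate, which follows the LAPACK band layout for `DPBTRF`: a symmetric band matrix of order `n` with `kd` off-diagonals is held in an array with `kd + 1` rows and `n` columns.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DispatchLimits:
    """Hypothetical thresholds; the real V1 cutoffs are not stated in this PR."""
    max_band_fraction: float = 0.5        # (kd + 1) / n above this is "nearly dense"
    max_band_bytes: int = 2 * 1024 ** 3   # assumed cap on band storage


def band_storage_bytes(n: int, kd: int) -> int:
    """Bytes for DPBTRF-style band storage: (kd + 1) x n double-precision values."""
    return (kd + 1) * n * 8


def choose_kll_solver(n: int, kd: int,
                      limits: DispatchLimits = DispatchLimits()) -> str:
    """Pick the KLL path before factorization: 'banded' or 'superlu'."""
    if n <= 0 or kd < 0:
        raise ValueError("n must be positive and kd non-negative")
    if (kd + 1) / n > limits.max_band_fraction:
        return "superlu"   # band is nearly dense, so compact storage buys little
    if band_storage_bytes(n, kd) > limits.max_band_bytes:
        return "superlu"   # band storage too expensive for this matrix shape
    return "banded"


def solver_after_factorization(dpbtrf_info: int, rescue_allowed: bool) -> str:
    """React to the DPBTRF return code: INFO = 0 means the factorization succeeded."""
    if dpbtrf_info == 0:
        return "banded"
    # Failed factorization: rescue only for deck families allowed to,
    # otherwise keep the historical bailout behavior.
    return "superlu" if rescue_allowed else "bailout"


print(choose_kll_solver(n=10_000, kd=50))   # narrow band, small storage
print(choose_kll_solver(n=100, kd=80))      # nearly dense band
print(solver_after_factorization(dpbtrf_info=3, rescue_allowed=True))
```

As a consistency check on the split quoted above, 260 of 272 decks is 95.59% and 12 of 272 is 4.41%, so the two paths account for all counted decks.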