Feature/banded opt v1 #238

Closed. realbabilu wants to merge 3 commits into `MystranSolver:main` from `realbabilu:feature/banded-opt-v1`.

@realbabilu

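This PR uses the frozen `banded_optimizationV1` snapshot for the current stable banded-validation state in `C:\temp\mystran4\MYSTRANSolver-18.0.0.enhanced`.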

It is meant to be replayed onto an older tree only after the LAPACK
refactoring stack is already in place.

Apply the patch stack in this order:

1. baseline BLAS / SuperLU selection work
2. the large `lapack_surgery` patch
3. `lapack_peel_off`
4. this `banded_optimizationV1`

The important bit is step 3. This V1 snapshot assumes the post-`lapack_peel_off`
structure already exists, especially in the LAPACK-facing `LINK3`, `SOLVE_GMN`,
matrix-export, and output paths. Applying this snapshot before
`lapack_peel_off` is likely to produce mismatched call flow and confusing merge
conflicts.
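
The ordering constraint can be sketched as a small pre-flight check. This is only an illustration: the function, the `REQUIRED_ORDER` list, and the label `baseline_blas_superlu` for step 1 are hypothetical and are not part of the snapshot.

```python
# Hypothetical pre-flight check for the patch order described above.
# "baseline_blas_superlu" is an assumed label for step 1; the other
# names are taken from the patch list.
REQUIRED_ORDER = [
    "baseline_blas_superlu",
    "lapack_surgery",
    "lapack_peel_off",
    "banded_optimizationV1",
]


def missing_prerequisites(applied, target="banded_optimizationV1"):
    """Return the patches that must precede `target` but are not applied yet."""
    idx = REQUIRED_ORDER.index(target)
    return [patch for patch in REQUIRED_ORDER[:idx] if patch not in applied]
```

For example, a tree that has only the first two patches would report `lapack_peel_off` as missing before `banded_optimizationV1` can be replayed.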

See:

- `patch_order.md`

This is not the original bare banded patch anymore. It is the stable
validation-passing state after the follow-up debugging needed to keep
`test_banded.py` green.

The frozen behavior is:

- keep the original RCM-enabled banded path
- keep banded storage and solver-dispatch diagnostics
- allow `KLL` to stay on true banded Cholesky when it is a good fit
- bypass banded for compact-band cases that are too expensive or nearly dense
- rescue selected static cases to `SuperLU` when banded/dense factorization is
  not appropriate
- preserve historical bailout semantics where the validation suite explicitly
  expects them
- rescue problematic `RMM` solves in `SOLVE_GMN`
- emit zero-valued `MPCFORCES`
- export `PL` and array-format `UL` Matrix Market files
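
For readers unfamiliar with the array variant of Matrix Market mentioned in the last item, the following is a minimal sketch of that layout: a banner line, a `rows cols` size line, then one value per line in column-major order. It is not the code in `WRITE_MATRIX_MARKET_VECTOR.f90`; the function name and the number format are assumptions chosen for illustration.

```python
def format_matrix_market_array(values, ncols=1):
    """Render dense real data (column-major order) as Matrix Market 'array' text.

    Hypothetical helper for illustration; not the MYSTRAN export routine.
    """
    if ncols < 1 or len(values) % ncols != 0:
        raise ValueError("values must fill a whole number of columns")
    nrows = len(values) // ncols
    lines = [
        "%%MatrixMarket matrix array real general",  # banner for dense real data
        f"{nrows} {ncols}",                          # size line: rows, then columns
    ]
    lines.extend(f"{v:.16e}" for v in values)        # one entry per line
    return "\n".join(lines) + "\n"
```

A displacement vector such as `UL` would be written with `ncols=1`, so the file holds the size line followed by one value per degree of freedom.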

Main regression point:

- `MYSTRAN_Validation-main\test_banded.py`

Frozen result:

- `0/2605 failed -> PASS`

See:

- `dev_docs/validation_resume.md`
- `dev_docs/issues_and_decisions.md`

In the frozen validation snapshot:

- true banded `KLL` path is still the default
- `SuperLU` is used as a rescue path when:
  - compact-band storage is too expensive for the matrix shape
  - `DPBTRF` fails and the deck family is allowed to rescue
  - constraint-heavy decks need sparse robustness
  - dense fallback is not a good path
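
The dispatch rules above can be sketched roughly as follows. This is a simplified illustration, not the logic in `LINK3.f90`: the function names, the `max_fill_ratio` threshold, and the `"bailout"` return value are assumptions. The only fixed fact used is that LAPACK symmetric band storage holds `kd + 1` rows by `n` columns for a matrix of order `n` with half-bandwidth `kd`.

```python
def band_storage_entries(n, kd):
    """Entries in LAPACK symmetric band storage: (kd + 1) rows by n columns."""
    return (kd + 1) * n


def choose_kll_solver(n, kd, dpbtrf_ok=True, rescue_allowed=True,
                      max_fill_ratio=0.5):
    """Pick a solver path for KLL under assumed, simplified rules.

    max_fill_ratio is a hypothetical cutoff, not a value from the snapshot.
    """
    fill_ratio = band_storage_entries(n, kd) / float(n * n)
    if fill_ratio > max_fill_ratio:
        return "superlu"      # band is nearly dense or too expensive to store
    if not dpbtrf_ok:
        # Factorization failed: rescue only if this deck family allows it,
        # otherwise keep the historical bailout behavior.
        return "superlu" if rescue_allowed else "bailout"
    return "banded"           # default: true banded Cholesky
```

With these assumed rules, a matrix of order 1000 with half-bandwidth 10 stays on the banded path, while one of order 100 with half-bandwidth 80 is routed to `SuperLU`.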

Approximate split from the unique decks recorded under
`MYSTRAN_Validation-main\passed_banded`:

- total unique passing decks counted: `272`
- true banded KLL path: `260` decks = `95.59%`
- `SuperLU` fallback on KLL: `12` decks = `4.41%`

So the stable V1 state is still overwhelmingly banded in practice, with sparse
rescue used only where needed.
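
The percentages follow directly from the deck counts, as this short check shows (variable names are illustrative only):

```python
# Deck counts quoted above.
total, banded, superlu = 272, 260, 12

# The two paths should account for every counted deck.
assert banded + superlu == total

banded_pct = round(100.0 * banded / total, 2)    # share on the banded path
superlu_pct = round(100.0 * superlu / total, 2)  # share rescued to SuperLU
print(banded_pct, superlu_pct)
```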

This snapshot keeps the older imported banded files and also carries the
follow-up production files that actually define the stable V1 behavior:

- `Source\LK3\LINK3.f90`
- `Source\LK2\LINK2.f90`
- `Source\LK2\SOLVE_GMN.f90`
- `Source\LK9\L92\OFP2.f90`
- `Source\UTIL\WRITE_MATRIX_MARKET_VECTOR.f90`

The older marker style
`! --- BANDED_optimizisation -begin-- !` / `! --- BANDED_optimizisation -end-- !`
is preserved where it already existed historically. For newer follow-up logic
that never had the old markers, the snippet notes in this snapshot use:

- `! --- banded_optimization_V1 begin --- !`
- `! --- banded_optimization_V1 end --- !`

See:

- `snippets_banded_optimizationV1.md`
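
When replaying the patch by hand, it can help to list the marked regions in a source file. The sketch below is a hypothetical helper, not part of the snapshot; it assumes the markers appear exactly as quoted above (including the historical spelling of the older style) and that blocks of the same style are not nested.

```python
# Marker strings copied from the text above; the spelling of the older
# style is preserved deliberately.
MARKERS = {
    "old": ("! --- BANDED_optimizisation -begin-- !",
            "! --- BANDED_optimizisation -end-- !"),
    "v1": ("! --- banded_optimization_V1 begin --- !",
           "! --- banded_optimization_V1 end --- !"),
}


def find_marked_blocks(source_text):
    """Return (style, first_line, last_line) tuples, 1-based, in closing order."""
    blocks, open_blocks = [], {}
    for lineno, line in enumerate(source_text.splitlines(), start=1):
        for style, (begin, end) in MARKERS.items():
            if begin in line:
                if style in open_blocks:
                    raise ValueError(f"nested {style} begin at line {lineno}")
                open_blocks[style] = lineno
            elif end in line:
                if style not in open_blocks:
                    raise ValueError(f"{style} end without begin at line {lineno}")
                blocks.append((style, open_blocks.pop(style), lineno))
    if open_blocks:
        raise ValueError(f"unclosed markers: {sorted(open_blocks)}")
    return blocks
```

Running it over a file such as `LINK3.f90` would give the line ranges to compare against the snippet notes.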

Skyline fallback was explored after this work, but it was intentionally
removed from the runtime path because it hurt validation robustness.

The frozen V1 runtime policy is therefore simple: if banded is too costly or
unsuitable, rescue to SuperLU.

Patch package captured from:

`E:\mystran4\MYSTRANSolver-18.0.0.enhanced`

Target paths when applying manually:

- `Source\Modules\LAPACK\*`
- `BLAS\XERBLA.f`

This package represents the final `lapack_surgery` + `lapack_peel_off` state used to build:

- `mystran_lapack_surgery.exe`
- `mystran_lapack_peel_off.exe`

The internal MYSTRAN LAPACK sources were reorganized so MYSTRAN can be built in an external optimized BLAS/LAPACK configuration, especially the OpenBLAS hijack build, without carrying a full internal BLAS implementation.

The intended shape is:

- External OpenBLAS supplies BLAS symbols such as `dgemm_`, `dtrsm_`, etc.
- Regular single-thread SuperLU is linked against the same OpenBLAS BLAS library.
- MYSTRAN keeps only the local `XERBLA.f` error handler from internal BLAS.
- No internal CBLAS or f2c BLAS layer is required for the OpenBLAS configuration.
- Internal MYSTRAN LAPACK entry points that are still needed are retained, but many routines are peeled into helper files so the build can coexist cleanly with optimized external libraries.

Existing internal LAPACK files modified:

- `LAPACK_BLAS_AUX.f`
- `LAPACK_GIV_MGIV_EIG.f`
- `LAPACK_LANCZOS_EIG.f`
- `LAPACK_LIN_EQN_DGB.f`
- `LAPACK_LIN_EQN_DGE.f`
- `LAPACK_LIN_EQN_DPB.f`
- `LAPACK_MISCEL.f`
- `LAPACK_STD_EIG_1.f`
- `LAPACK_SYM_MAT_INV.f`

Additional helper/ext/kernel files added under `Source\Modules\LAPACK`:

- `LAPACK_DGETF2_HELPER.f`
- `LAPACK_DGETRF_HELPER.f`
- `LAPACK_DGETRI_HELPER.f`
- `LAPACK_DGETRS_HELPER.f`
- `LAPACK_DISNAN_HELPER.f`
- `LAPACK_DLABAD_HELPER.f`
- `LAPACK_DLACON_HELPER.f90`
- `LAPACK_DLACPY_HELPER.f`
- `LAPACK_DLAE2_HELPER.f`
- `LAPACK_DLAEV2_HELPER.f`
- `LAPACK_DLAGTS_HELPER.f90`
- `LAPACK_DLAN_HELPER.f90`
- `LAPACK_DLAPY2_HELPER.f`
- `LAPACK_DLAR_ROT_HELPER.f90`
- `LAPACK_DLARF_HELPER.f90`
- `LAPACK_DLARFB_HELPER.f90`
- `LAPACK_DLARFG_HELPER.f90`
- `LAPACK_DLARFT_HELPER.f90`
- `LAPACK_DLARTG_HELPER.f90`
- `LAPACK_DLAS_MISC_HELPER.f90`
- `LAPACK_DLASCL_HELPER.f90`
- `LAPACK_DLASRT_HELPER.f`
- `LAPACK_DLASSQ_HELPER.f`
- `LAPACK_DLAUUM_HELPER.f`
- `LAPACK_DPBCON_HELPER.f`
- `LAPACK_DPBEQU_HELPER.f`
- `LAPACK_DPBSTF_HELPER.f`
- `LAPACK_DPBTF2_HELPER.f`
- `LAPACK_DPBTRF_KERNEL.f`
- `LAPACK_DPBTRS_HELPER.f`
- `LAPACK_DPOTRF_HELPER.f`
- `LAPACK_DPOTRI_HELPER.f`
- `LAPACK_DSTEV_HELPER.f`
- `LAPACK_DSYTF2_HELPER.f`
- `LAPACK_DTRTI2_HELPER.f`
- `LAPACK_DTRTRS_HELPER.f`
- `LAPACK_GIV_MGIV_EIG_HELPER.f`
- `LAPACK_LANCZOS_EIG_HELPER.f`
- `LAPACK_LIN_EQN_DGB_KERNEL.f`
- `LAPACK_LIN_EQN_DGE_ext.f90`
- `LAPACK_MISCEL_ext.f90`
- `LAPACK_POTF2_HELPER.f`
- `LAPACK_STD_EIG_1_ext.f90`
- `LAPACK_STD_EIG_1_HELPER.f`

`BLAS\XERBLA.f` is included for completeness. It was not changed relative to the original tree, but it is the only internal BLAS file intentionally kept in the OpenBLAS hijack layout.

The peel-off work targets these internal LAPACK areas:

- DGE linear equation routines: `DGETF2`, `DGETRF`, `DGETRI`, `DGETRS`
- Symmetric matrix inverse / Cholesky path: `DLAUU2`, `DLAUUM`, `DPOTRF`, `DPOTRI`, `DTRTI2`
- Miscellaneous routines: `DTRTRS`, `DSTEV`
- Standard eigen path: `DSYEV`, `DSYTRD`, `DORGTR`
- General band path: `DGBTRF`, `DGBTRS`, `DGBTF2`
- Positive-definite band path: `DPBEQU`, `DPBTRF`, `DPBTF2`, `DPOTF2`, `DPBCON`, `DPBTRS`, `DSYTF2`

The goal is to keep MYSTRAN's required internal numerical behavior available while reducing coupling to bundled BLAS and making symbol ownership clearer when optimized external libraries are linked.

The tested enhanced build used:

- OpenBLAS import library: `C:\gcc\openblas32\lib\libopenblas.dll.a`
- Runtime DLL directory: `C:\gcc\openblas32\bin`
- Regular SuperLU, not SuperLU-MT
- AVX2-style release flags: `-O3`, `-funroll-loops`, `-march=core-avx2`, `-mtune=core-avx2`
- Conservative floating-point behavior: `-fno-fast-math`, `-ffp-contract=off`

Symbol checks confirmed the produced executables imported `libopenblas.dll` and had OpenBLAS-resolved BLAS imports such as `__imp_dgemm_`, while retaining local `xerbla_`.
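
A check of this kind can be sketched as below. How the symbol names are extracted from the executable is left open here; the function simply takes a list of names, and both the function name and the default routine list are assumptions made for illustration.

```python
def check_blas_ownership(symbol_names, blas_routines=("dgemm", "dtrsm")):
    """Summarize which BLAS routines are imported and whether xerbla_ is local.

    Hypothetical helper: expects plain symbol names such as "__imp_dgemm_".
    """
    names = set(symbol_names)
    imported = [r for r in blas_routines if f"__imp_{r}_" in names]
    return {
        "imported_blas": imported,
        "missing_imports": [r for r in blas_routines if r not in imported],
        # Local means the symbol is present and not resolved through an import.
        "local_xerbla": "xerbla_" in names and "__imp_xerbla_" not in names,
    }
```

A build matching the description above would show every listed BLAS routine under `imported_blas` and `local_xerbla` set to `True`.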

Full validation runs of `mystran_lapack_surgery.exe` and `mystran_lapack_peel_off.exe` both produced:

`1/2605 failed`

The failure was the same near-zero eigen residue:

- Deck: `vic/12/V30 Beam MPC on constrained dof.bdf`
- Quantity: `SC/2/REALEIGENVALUES/MODE/1/CYCLES`
- Expected: `0`
- Tolerance: `1e-05`
- Patched/OpenBLAS result: `1.387039e-05`

This matched the earlier OpenBLAS hijack behavior and is best treated as zero dust rather than a new LAPACK peel-off regression.
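
To make the "zero dust" reading concrete, here is a minimal sketch of the two comparison styles. It assumes the validation tolerance is applied as an absolute difference, and the `zero_floor` value is hypothetical; neither is confirmed by the validation suite itself.

```python
def within_abs_tol(actual, expected, tol):
    """Plain absolute-tolerance comparison (assumed behavior)."""
    return abs(actual - expected) <= tol


def within_tol_with_zero_floor(actual, expected, tol, zero_floor=1e-4):
    """Treat tiny results as zero when zero is expected.

    zero_floor is a hypothetical cutoff, not taken from the benchmark runners.
    """
    if expected == 0.0 and abs(actual) < zero_floor:
        return True
    return within_abs_tol(actual, expected, tol)


# Values from the failing deck reported above.
strict = within_abs_tol(1.387039e-05, 0.0, 1e-05)
relaxed = within_tol_with_zero_floor(1.387039e-05, 0.0, 1e-05)
```

Under these assumptions the strict check rejects the result because it exceeds the tolerance by a small margin, while the zero-floor variant accepts it as numerical noise around zero.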

Separate benchmark runners with sane zero-dust handling showed:

- OpenBLAS hijack and `lapack_peel_off` had the same failed deck lists.
- Baseline original and baseline AVX2 were cleaner on Benchmark suites than the OpenBLAS builds.
- The Benchmark suites are not clean even on baseline; most baseline Benchmark failures are real validation differences, not zero dust.

@realbabilu closed this on May 31, 2026