fix: gate loadOtherState validators/balances preload behind opt-in#9245
fix: gate loadOtherState validators/balances preload behind opt-in#9245
Conversation
There was a problem hiding this comment.
Code Review
This pull request introduces an optional preloadValidatorsAndBalances flag to the loadOtherState method, enabling eager materialization of validator and balance arrays only when necessary, such as during block replay. This change helps avoid significant memory overhead in API paths that only require specific fields. The update includes modifications to the BeaconStateView implementation, its interface, and the persistent checkpoint cache, along with a new unit test. A review comment suggests refactoring the inline options type into a named interface to improve code maintainability.
914f164 to
74b3a11
Compare
74b3a11 to
af04829
Compare
Performance Report✔️ no performance regression detected Full benchmark results
|
|
Cross-linking from #9246: this PR addresses the API-path memory spike, but there is a separate correctness bug in the same The cache-aliasing bug in The default Why the preload gating in this PR does not fix it The poisoning happens inside
So this PR correctly fixes the memory spike but leaves the correctness bug untouched. Also relevant: Fix #9246 (two-line |
lodekeeper
left a comment
There was a problem hiding this comment.
LGTM. Surgical fix: gates the eager getAllReadonlyValues() / getAll() preload behind an opt-in flag and only persistentCheckpointsCache (which always consumes the state immediately via block replay) sets it.
Correctness: Interface + impl + single call-site change are consistent. All six other loadOtherState() callers (api/impl/beacon/state, api/impl/validator, api/impl/proof, api/impl/lodestar × 2, plus the unit test) now revert to lazy materialization — restores pre-#8857 semantics on API paths. Optional arg keeps the signature backwards-compatible.
Risk: Low. This just narrows an existing optimization to where it's actually needed.
Independence from #9246: Confirmed — this PR doesn't touch loadState() where the cache-aliasing bug lives, so the two fixes address distinct v1.42.0 regressions (memory spike vs. withdrawal mismatch). Both are needed for the public-node redeploy.
Nice minimal fix 👍
Codecov Report✅ All modified and coverable lines are covered by tests. Additional details and impacted files@@ Coverage Diff @@
## unstable #9245 +/- ##
=========================================
Coverage 52.53% 52.53%
=========================================
Files 848 848
Lines 61405 61405
Branches 4525 4525
=========================================
Hits 32259 32259
Misses 29081 29081
Partials 65 65 🚀 New features to boost your workflow:
|
## Summary
Fixes the v1.42.0 "Withdrawal mismatch at index=0" regression by
changing the two `clone()` calls in `loadState.ts` to `clone(true)`
(i.e. `dontTransferCache=true`).
The default `clone()` in `@chainsafe/ssz` transfers the source
sub-view's cache to the new instance, which means the
`migratedState.validators` sub-view and the seed container's cached
child-view snapshot share the **same** internal `nodes[]` and `caches[]`
arrays. A subsequent `migratedState.commit()` writes modified validator
/ inactivity-score nodes into those shared arrays, silently corrupting
the seed state's cache snapshot at the modified indices.
The corruption stays latent until the seed state is later cloned with
transfer-cache enabled — the path `verifyBlock` takes via
`preState.clone({dontTransferCache: false})`. On that next-block clone,
reads at the modified index return the migrated state's validator
instead of the seed's, which surfaces as `Withdrawal mismatch at
index=0` divergences between Lodestar and EL.
### Timeline of the corruption
```
// Inside loadState() with seedState = head state:
migratedState.validators = seedState.validators.clone();
// ^ migrated.validators.nodes === seedState.caches[validatorsIndex].nodes (shared)
for (const i of modifiedValidators) {
migratedState.validators.set(i, loadValidator(...)); // staged in viewsChanged
}
migratedState.commit();
// ^ arrayComposite.js commit(): `this.nodes[index] = node`
// writes newValidator into the SHARED nodes[] array.
// seedState.caches[validatorsIndex].nodes[modifiedIndex] is now poisoned.
```
On the next block:
```
const preState = headState.clone(); // default dontTransferCache=false
// ^ transfers the poisoned caches[] snapshot to preState
preState.validators.getReadonly(i); // returns migrated validator, not seed's
// -> getExpectedWithdrawals reads wrong validator at index 0
// -> "Withdrawal mismatch at index=0"
```
### Root cause
Introduced by #8857 (`chore: consume BeaconStateView`) which added the
`loadOtherState` / shared-head seed path that exercises this clone in
production.
## Test plan
- [x] New regression test `loadState does not poison seed state's cache`
in `packages/state-transition/test/unit/util/loadState.test.ts`
- [x] Verified the test FAILS without the fix (reads `0xaa`-filled
validator on `postState.clone()`) and PASSES with the fix
- [x] All 176 existing `state-transition` util tests pass
- [x] `check-types` and `lint` clean
## Relation to #9245
PR #9245 (`fix: gate loadOtherState validators/balances preload behind
opt-in`) addresses a different regression from the same #8857-era
changes — the eager `getAllReadonlyValues()` preload causing memory
spikes on the API path. The two fixes are **independent and both
needed**:
- #9245 fixes the API-path memory spike.
- This PR fixes the correctness bug (withdrawal mismatch). Notably,
`persistentCheckpointsCache` also exercises `loadState()` with the
problematic `clone()`, so #9245's preload gating alone doesn't cover the
full surface.
🤖 Generated with AI assistance
---------
Co-authored-by: lodekeeper <[email protected]>
Co-authored-by: Nico Flaig <[email protected]>
Motivation
Main-thread memory spike on
serveHistoricalState: truenodes after #8857#8857 rerouted the API path through
getState()inpackages/beacon-node/src/api/impl/beacon/state/index.tsfrom the lightweightloadState(...)toBeaconStateView.loadOtherState(), which eagerly materializesall validators and balances. That is only needed for
persistentCheckpointsCache.Description
loadOtherState()opt-in via{preloadValidatorsAndBalances: true}. Default is lazy, matching pre-chore: consume BeaconStateView #8857 semantics.persistentCheckpointsCachepasses in this flag as truelodestar/packages/beacon-node/src/api/impl/beacon/state/index.ts
Line 34 in 9fa9f08
AI Assistance Disclosure
This PR was developed with AI assistance (Claude Code).