fix: invalidate fork-choice head on FCU INVALID with latestValidHash#9332
fix: invalidate fork-choice head on FCU INVALID with latestValidHash#9332lodekeeper wants to merge 2 commits intoChainSafe:unstablefrom
Conversation
Per Engine API spec, when `engine_forkchoiceUpdated` returns INVALID with a latestValidHash, the consensus client must mark the offending head and its ancestors back to (but not including) the LVH as invalid in fork choice and recompute the head. Lodestar was discarding `latestValidHash` and throwing a plain `Error`, leaving the bad branch in place. The result: a CL whose fork-choice picks an invalid EL fork stays wedged — every slot re-fires the same FCU, EL keeps responding INVALID, and the tree never abandons the bad branch. Observed on bal-devnet-6 with `lodestar-nethermind-1` (Lodestar WG topic 9934): block 5597 has two siblings off parent 0x05a771b (5596) — canonical 0x12efd8b (geth, valid) vs orphan 0x19dc0a (ethrex, invalid HeaderGasUsedMismatch). Snooper confirmed nethermind responds INVALID with latestValidHash=0x05a771b; lodestar's fork choice never abandoned the orphan. The pre-gloas newPayload INVALID path already invalidates via `segmentExecStatus.invalidSegmentLVH` in `chain/blocks/index.ts`. Gloas bypasses that path (`verifyBlockExecutionPayload` returns Syncing for gloas blocks; the actual newPayload happens in the new gloas-only `importExecutionPayload.ts`), so the FCU INVALID handler is the only line of defense. This change makes that handler do its job. - `engine/http.ts`: `notifyForkchoiceUpdate` now throws a typed `ForkchoiceUpdateError` carrying `headBlockHash`, `latestValidHash`, and `validationError` instead of dropping the LVH. - `chain/blocks/utils/forkchoiceUpdateInvalid.ts`: new helper that calls `forkChoice.validateLatestHash` with the invalid head + LVH, swallowing any internal failure so the import path keeps moving. - Both post-import FCU `.catch` handlers (`importBlock.ts`, `importExecutionPayload.ts`) detect the typed error and route to the helper. After invalidation, fork-choice score recompute drops the bad head; the next block's FCU naturally targets a different head. - Block production's FCU caller (`produceBlockBody.ts`) is unchanged in behavior — it still propagates the (now typed) error up. Tests: new unit tests for `engine/http` throwing the typed error on INVALID (with both null and non-null LVH) and for the helper plumbing the right `LVHInvalidResponse`. Lint, type-check, and the existing chain/blocks / executionEngine / fork-choice unit suites all pass. Follow-up: gloas-specific newPayload INVALID handling in `importExecutionPayload.ts` should also call `validateLatestHash` to match the pre-gloas safety net. That requires extending fork-choice's `PayloadExecutionStatus` to include Invalid (so a FULL variant can be added as Invalid before propagation) and is left as a separate change. 🤖 Generated with AI assistance
There was a problem hiding this comment.
Code Review
This pull request implements handling for INVALID responses from the Execution Layer during engine_forkchoiceUpdated calls, adhering to the Engine API specification. It introduces a ForkchoiceUpdateError to capture the latestValidHash and a utility function, invalidateForkchoiceHeadFromFcuInvalid, to mark invalid branches in the fork-choice store. This prevents the Beacon Node from remaining wedged on an invalid branch. Review feedback suggests explicitly triggering a head recompute after invalidation to ensure the node immediately transitions to the correct branch and avoids potential issues with validator duties.
| try { | ||
| chain.forkChoice.validateLatestHash({ | ||
| executionStatus: ExecutionStatus.Invalid, | ||
| latestValidExecHash: latestValidHash, | ||
| invalidateFromParentBlockRoot: headBlockRoot, | ||
| invalidateFromParentBlockHash: headBlockHash, | ||
| }); |
There was a problem hiding this comment.
After invalidating the head in the fork-choice store via validateLatestHash, it is highly recommended to trigger a head recompute in the BeaconChain instance. This ensures that the node immediately switches to the correct branch, updates its internal head state (e.g., this.regen.head), and emits the necessary head events. Without this, the node might remain on the invalid head until the next slot start or block import, which could lead to invalid attestations or proposals if the node is a validator.
There was a problem hiding this comment.
Good catch — confirmed. validateLatestHash calls protoArray.applyScoreChanges (updates bestChild/bestDescendant internally) but does NOT call findHead and does NOT update ForkChoice.head. The cached head would stay on the now-invalid block until prepareNextSlot (~slot+8s) or the next importBlock (~slot+12s), so a validator attesting at slot+4s could vote for the stale invalid head.
Fixed in ef2a182: after validateLatestHash succeeds, the helper now calls chain.recomputeForkChoiceHead(ForkchoiceCaller.forkchoiceUpdateInvalid). Added a new caller value for clean metrics, log an info when the head actually switches, and added two unit tests covering the recompute path + the failure-swallowing.
Note: I'm not also wiring the full reorg side-effects (regen updateHeadState, ChainEvent.reorg emission, etc.) that importBlock does after its own recomputeForkChoiceHead. That's a deeper change touching more state and I'd rather keep this PR focused on breaking the wedge — happy to do a follow-up if you'd prefer the full treatment in this PR.
There was a problem hiding this comment.
Good catch. This helper already does the proto-array recompute via chain.recomputeForkChoiceHead(ForkchoiceCaller.forkchoiceUpdateInvalid), so the missing piece is a bit narrower: after that recompute we still do not mirror the usual chain-level head switch path (regen.updateHeadState(...) + head event / related metrics). So fork choice moves immediately, but the chain-facing head can still stay stale until the next slot or import. I agree that follow-up is needed there.
Address Gemini review comment on PR ChainSafe#9332. After `validateLatestHash` marks the bad head + ancestors invalid, `protoArray.applyScoreChanges` updates internal scores but does not refresh `ForkChoice.head`. Without an explicit recompute, the cached head stays on the now-invalid block until the next `prepareNextSlot` (~slot+8s) or block import (~slot+12s), during which a validator could attest or propose against the stale head. Add `ForkchoiceCaller.forkchoiceUpdateInvalid` and call `chain.recomputeForkChoiceHead(...)` from the helper after invalidation succeeds. Log when the head actually switches; gracefully swallow recompute failures so the FCU catch handler does not crash the import path. 🤖 Generated with AI assistance
Motivation
Per Engine API spec, when
engine_forkchoiceUpdatedreturnsINVALIDwith alatestValidHash, the consensus client must mark the offending head and its ancestors back to (but not including) the LVH as invalid in fork choice and recompute the head.Lodestar was discarding
latestValidHash(http.ts:373—latestValidHash: _latestValidHash) and throwing a plainErroron INVALID. The post-import FCU.catchhandlers inimportBlock.tsandimportExecutionPayload.tsonly logged and moved on. The result: a CL whose fork-choice picks an invalid EL fork stays wedged — every slot re-fires the same FCU, EL keeps responding INVALID, and the tree never abandons the bad branch.Observed on bal-devnet-6 with
lodestar-nethermind-1(reported by Stefan, Lodestar WG topic #9934):0x05a771b…(5596): canonical0x12efd8b…(geth, valid) vs orphan0x19dc0a…(ethrex, invalidHeaderGasUsedMismatch)INVALID, latestValidHash=0x05a771b…This is most visible in Gloas because the pre-Gloas safety net is bypassed:
verifyBlockExecutionPayloadreturnsSyncingfor Gloas blocks (the actualnotifyNewPayloadhappens later in the new Gloas-onlyimportExecutionPayload.ts), so the segment-level INVALID handling inchain/blocks/index.ts:106(forkChoice.validateLatestHash(segmentExecStatus.invalidSegmentLVH)) never fires for Gloas. The FCU INVALID handler is the only line of defense, and it was broken.Description
execution/engine/interface.ts: new typed errorForkchoiceUpdateErrorwith{code: INVALID, headBlockHash, latestValidHash, validationError}+isForkchoiceUpdateInvalidErrorpredicate.execution/engine/http.ts:notifyForkchoiceUpdatethrowsForkchoiceUpdateErroron INVALID instead of dropping the LVH.chain/blocks/utils/forkchoiceUpdateInvalid.ts(new): helper that callsforkChoice.validateLatestHashwith the head as the invalidate-from block and the EL's LVH, swallowing any internal failure so the import path keeps moving.chain/blocks/importBlock.ts+chain/blocks/importExecutionPayload.ts: post-import FCU.catchhandlers detect the typed error and route to the helper. After invalidation, fork-choice score recompute drops the bad head; the next block's FCU naturally targets a different head.produceBlock/produceBlockBody.tsis unchanged — it still propagates the (now typed) error up to block production.Tests
executionEngine/http.test.ts: two new tests coveringINVALIDwith non-null and nulllatestValidHash, asserting the typed error fields.chain/blocks/utils/forkchoiceUpdateInvalid.test.ts(new): helper plumbs the rightLVHInvalidResponsetovalidateLatestHash, forwards null LVH unchanged, and swallows internal failures.Test plan
pnpm lint(beacon-node) — cleanpnpm check-types(beacon-node) — 0 errorspnpm build(fork-choice + beacon-node) — cleanpnpm vitest test/unit/executionEngine/— 8/8 pass (2 new + 6 existing)pnpm vitest test/unit/chain/blocks/utils/forkchoiceUpdateInvalid.test.ts— 3/3 passpnpm vitest test/unit/chain/blocks/— 34/34 pass (no regressions)pnpm vitest test/unit/chain/prepareNextSlot.test.ts— 7/7 pass (block-production FCU caller path unaffected)pnpm test:unit(fork-choice) — 111/111 passlodestar-nethermindagainst the original wedge scenario — to be performed separatelyFollow-up
Gloas-specific
notifyNewPayloadINVALID handling inimportExecutionPayload.ts:188-194currently throwsPayloadErrorwithout callingvalidateLatestHash. To match the pre-Gloas safety net, that branch should also invalidate fork choice. Doing it cleanly requires extending fork-choice'sPayloadExecutionStatusto includeInvalidso a FULL variant can be added withInvalidstatus before propagation, which is a more involved change. Tracking as a separate follow-up — this PR's FCU-side fix already breaks the wedge.🤖 Generated with AI assistance