Skip to content

fix: invalidate fork-choice head on FCU INVALID with latestValidHash#9332

Open
lodekeeper wants to merge 2 commits intoChainSafe:unstablefrom
lodekeeper:fix/fcu-invalid-lvh-handling
Open

fix: invalidate fork-choice head on FCU INVALID with latestValidHash#9332
lodekeeper wants to merge 2 commits intoChainSafe:unstablefrom
lodekeeper:fix/fcu-invalid-lvh-handling

Conversation

@lodekeeper
Copy link
Copy Markdown
Contributor

Motivation

Per Engine API spec, when engine_forkchoiceUpdated returns INVALID with a latestValidHash, the consensus client must mark the offending head and its ancestors back to (but not including) the LVH as invalid in fork choice and recompute the head.

Lodestar was discarding latestValidHash (http.ts:373latestValidHash: _latestValidHash) and throwing a plain Error on INVALID. The post-import FCU .catch handlers in importBlock.ts and importExecutionPayload.ts only logged and moved on. The result: a CL whose fork-choice picks an invalid EL fork stays wedged — every slot re-fires the same FCU, EL keeps responding INVALID, and the tree never abandons the bad branch.

Observed on bal-devnet-6 with lodestar-nethermind-1 (reported by Stefan, Lodestar WG topic #9934):

  • Block 5597 has two siblings off parent 0x05a771b… (5596): canonical 0x12efd8b… (geth, valid) vs orphan 0x19dc0a… (ethrex, invalid HeaderGasUsedMismatch)
  • Snooper req #404675 confirms nethermind responds INVALID, latestValidHash=0x05a771b…
  • Lodestar status throughout: head 8062 / exec-block valid 5846 while network slot ≈ 37500 — wedged

This is most visible in Gloas because the pre-Gloas safety net is bypassed: verifyBlockExecutionPayload returns Syncing for Gloas blocks (the actual notifyNewPayload happens later in the new Gloas-only importExecutionPayload.ts), so the segment-level INVALID handling in chain/blocks/index.ts:106 (forkChoice.validateLatestHash(segmentExecStatus.invalidSegmentLVH)) never fires for Gloas. The FCU INVALID handler is the only line of defense, and it was broken.

Description

  • execution/engine/interface.ts: new typed error ForkchoiceUpdateError with {code: INVALID, headBlockHash, latestValidHash, validationError} + isForkchoiceUpdateInvalidError predicate.
  • execution/engine/http.ts: notifyForkchoiceUpdate throws ForkchoiceUpdateError on INVALID instead of dropping the LVH.
  • chain/blocks/utils/forkchoiceUpdateInvalid.ts (new): helper that calls forkChoice.validateLatestHash with the head as the invalidate-from block and the EL's LVH, swallowing any internal failure so the import path keeps moving.
  • chain/blocks/importBlock.ts + chain/blocks/importExecutionPayload.ts: post-import FCU .catch handlers detect the typed error and route to the helper. After invalidation, fork-choice score recompute drops the bad head; the next block's FCU naturally targets a different head.
  • produceBlock/produceBlockBody.ts is unchanged — it still propagates the (now typed) error up to block production.

Tests

  • executionEngine/http.test.ts: two new tests covering INVALID with non-null and null latestValidHash, asserting the typed error fields.
  • chain/blocks/utils/forkchoiceUpdateInvalid.test.ts (new): helper plumbs the right LVHInvalidResponse to validateLatestHash, forwards null LVH unchanged, and swallows internal failures.

Test plan

  • pnpm lint (beacon-node) — clean
  • pnpm check-types (beacon-node) — 0 errors
  • pnpm build (fork-choice + beacon-node) — clean
  • pnpm vitest test/unit/executionEngine/ — 8/8 pass (2 new + 6 existing)
  • pnpm vitest test/unit/chain/blocks/utils/forkchoiceUpdateInvalid.test.ts — 3/3 pass
  • pnpm vitest test/unit/chain/blocks/ — 34/34 pass (no regressions)
  • pnpm vitest test/unit/chain/prepareNextSlot.test.ts — 7/7 pass (block-production FCU caller path unaffected)
  • pnpm test:unit (fork-choice) — 111/111 pass
  • Live verification on bal-devnet-6 with lodestar-nethermind against the original wedge scenario — to be performed separately

Follow-up

Gloas-specific notifyNewPayload INVALID handling in importExecutionPayload.ts:188-194 currently throws PayloadError without calling validateLatestHash. To match the pre-Gloas safety net, that branch should also invalidate fork choice. Doing it cleanly requires extending fork-choice's PayloadExecutionStatus to include Invalid so a FULL variant can be added with Invalid status before propagation, which is a more involved change. Tracking as a separate follow-up — this PR's FCU-side fix already breaks the wedge.

🤖 Generated with AI assistance

Per Engine API spec, when `engine_forkchoiceUpdated` returns INVALID with a
latestValidHash, the consensus client must mark the offending head and its
ancestors back to (but not including) the LVH as invalid in fork choice and
recompute the head. Lodestar was discarding `latestValidHash` and throwing a
plain `Error`, leaving the bad branch in place. The result: a CL whose
fork-choice picks an invalid EL fork stays wedged — every slot re-fires the
same FCU, EL keeps responding INVALID, and the tree never abandons the bad
branch.

Observed on bal-devnet-6 with `lodestar-nethermind-1` (Lodestar WG topic 9934):
block 5597 has two siblings off parent 0x05a771b (5596) — canonical 0x12efd8b
(geth, valid) vs orphan 0x19dc0a (ethrex, invalid HeaderGasUsedMismatch).
Snooper confirmed nethermind responds INVALID with latestValidHash=0x05a771b;
lodestar's fork choice never abandoned the orphan.

The pre-gloas newPayload INVALID path already invalidates via
`segmentExecStatus.invalidSegmentLVH` in `chain/blocks/index.ts`. Gloas
bypasses that path (`verifyBlockExecutionPayload` returns Syncing for gloas
blocks; the actual newPayload happens in the new gloas-only
`importExecutionPayload.ts`), so the FCU INVALID handler is the only line of
defense. This change makes that handler do its job.

- `engine/http.ts`: `notifyForkchoiceUpdate` now throws a typed
  `ForkchoiceUpdateError` carrying `headBlockHash`, `latestValidHash`, and
  `validationError` instead of dropping the LVH.
- `chain/blocks/utils/forkchoiceUpdateInvalid.ts`: new helper that calls
  `forkChoice.validateLatestHash` with the invalid head + LVH, swallowing any
  internal failure so the import path keeps moving.
- Both post-import FCU `.catch` handlers (`importBlock.ts`,
  `importExecutionPayload.ts`) detect the typed error and route to the helper.
  After invalidation, fork-choice score recompute drops the bad head; the
  next block's FCU naturally targets a different head.
- Block production's FCU caller (`produceBlockBody.ts`) is unchanged in
  behavior — it still propagates the (now typed) error up.

Tests: new unit tests for `engine/http` throwing the typed error on INVALID
(with both null and non-null LVH) and for the helper plumbing the right
`LVHInvalidResponse`. Lint, type-check, and the existing chain/blocks /
executionEngine / fork-choice unit suites all pass.

Follow-up: gloas-specific newPayload INVALID handling in
`importExecutionPayload.ts` should also call `validateLatestHash` to match
the pre-gloas safety net. That requires extending fork-choice's
`PayloadExecutionStatus` to include Invalid (so a FULL variant can be added
as Invalid before propagation) and is left as a separate change.

🤖 Generated with AI assistance
@lodekeeper lodekeeper requested a review from a team as a code owner May 6, 2026 13:12
Copy link
Copy Markdown
Contributor

@gemini-code-assist gemini-code-assist Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code Review

This pull request implements handling for INVALID responses from the Execution Layer during engine_forkchoiceUpdated calls, adhering to the Engine API specification. It introduces a ForkchoiceUpdateError to capture the latestValidHash and a utility function, invalidateForkchoiceHeadFromFcuInvalid, to mark invalid branches in the fork-choice store. This prevents the Beacon Node from remaining wedged on an invalid branch. Review feedback suggests explicitly triggering a head recompute after invalidation to ensure the node immediately transitions to the correct branch and avoids potential issues with validator duties.

Comment on lines +35 to +41
try {
chain.forkChoice.validateLatestHash({
executionStatus: ExecutionStatus.Invalid,
latestValidExecHash: latestValidHash,
invalidateFromParentBlockRoot: headBlockRoot,
invalidateFromParentBlockHash: headBlockHash,
});
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

medium

After invalidating the head in the fork-choice store via validateLatestHash, it is highly recommended to trigger a head recompute in the BeaconChain instance. This ensures that the node immediately switches to the correct branch, updates its internal head state (e.g., this.regen.head), and emits the necessary head events. Without this, the node might remain on the invalid head until the next slot start or block import, which could lead to invalid attestations or proposals if the node is a validator.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good catch — confirmed. validateLatestHash calls protoArray.applyScoreChanges (updates bestChild/bestDescendant internally) but does NOT call findHead and does NOT update ForkChoice.head. The cached head would stay on the now-invalid block until prepareNextSlot (~slot+8s) or the next importBlock (~slot+12s), so a validator attesting at slot+4s could vote for the stale invalid head.

Fixed in ef2a182: after validateLatestHash succeeds, the helper now calls chain.recomputeForkChoiceHead(ForkchoiceCaller.forkchoiceUpdateInvalid). Added a new caller value for clean metrics, log an info when the head actually switches, and added two unit tests covering the recompute path + the failure-swallowing.

Note: I'm not also wiring the full reorg side-effects (regen updateHeadState, ChainEvent.reorg emission, etc.) that importBlock does after its own recomputeForkChoiceHead. That's a deeper change touching more state and I'd rather keep this PR focused on breaking the wedge — happy to do a follow-up if you'd prefer the full treatment in this PR.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good catch. This helper already does the proto-array recompute via chain.recomputeForkChoiceHead(ForkchoiceCaller.forkchoiceUpdateInvalid), so the missing piece is a bit narrower: after that recompute we still do not mirror the usual chain-level head switch path (regen.updateHeadState(...) + head event / related metrics). So fork choice moves immediately, but the chain-facing head can still stay stale until the next slot or import. I agree that follow-up is needed there.

Address Gemini review comment on PR ChainSafe#9332. After `validateLatestHash`
marks the bad head + ancestors invalid, `protoArray.applyScoreChanges`
updates internal scores but does not refresh `ForkChoice.head`. Without
an explicit recompute, the cached head stays on the now-invalid block
until the next `prepareNextSlot` (~slot+8s) or block import (~slot+12s),
during which a validator could attest or propose against the stale head.

Add `ForkchoiceCaller.forkchoiceUpdateInvalid` and call
`chain.recomputeForkChoiceHead(...)` from the helper after invalidation
succeeds. Log when the head actually switches; gracefully swallow recompute
failures so the FCU catch handler does not crash the import path.

🤖 Generated with AI assistance
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant