fix: checkpoint sync for skipped slot #9329
Conversation
Code Review
This pull request introduces support for skipped-slot checkpoint synchronization, ensuring that the first batch of a sync chain can correctly fetch the parent payload envelope when necessary. Key changes include updates to the `Batch` and `SyncChain` logic, the addition of a `createFromBid` method for cache seeding, and improved validation for range sync responses. The reviewer suggested a minor cleanup in `packages/beacon-node/src/chain/blocks/index.ts` to remove a redundant variable, which is a valid improvement.
Force-pushed from 28465ae to b1ed8b8
Performance Report: ✔️ no performance regression detected. Full benchmark results
@lodekeeper please review
lodekeeper left a comment
Main logic looks sound overall. I left one sync-robustness note on the new parent-by-root column path. I also agree with the Gemini cleanup on the redundant `blockSlots` temporary in `processBlocks`.
```ts
private downloadByRange: SyncChainFns["downloadByRange"] = async (peer, batch) => {
  const batchBlocks = batch.getBlocks();
  const requests = batch.getRequestsForPeer(peer);
  // ...
```
🟡 Parent by-root column requests still bypass peer custody filtering
`getRequestsForPeer()` only filters `columnsRequest`, but this new skipped-checkpoint path can also carry sampled columns in `parentPayloadRequest.columns`. That means we can still select a peer that did not advertise some of those columns, then immediately ask it for them by-root in the first batch. In the best case that just churns retries; in the worst case it can stall the very path this PR is trying to unblock.
Could we run `parentPayloadRequest.columns` through the same `peer.custodyColumns` filter before spreading requests into `downloadByRange()`?
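A minimal sketch of the filter I have in mind, purely illustrative: the request and peer shapes below are assumptions inferred from the identifiers in this thread, not the actual Lodestar types.

```ts
// Illustrative sketch only: ParentPayloadRequest and SyncPeer are hypothetical
// stand-ins for the real shapes consumed by getRequestsForPeer().
type ColumnIndex = number;

interface ParentPayloadRequest {
  blockRoot: Uint8Array;
  columns?: ColumnIndex[];
}

interface SyncPeer {
  custodyColumns: ColumnIndex[];
}

/** Drop sampled columns the peer never advertised custody for. */
function filterParentPayloadColumns(request: ParentPayloadRequest, peer: SyncPeer): ParentPayloadRequest {
  if (!request.columns) return request;
  const custody = new Set(peer.custodyColumns);
  return {...request, columns: request.columns.filter((column) => custody.has(column))};
}
```

Applied before the requests are spread into `downloadByRange()`, this would keep the first-batch parent request within the peer's advertised custody set.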
Reviewed — left one note on the new parent-by-root column request path, and I agree with the Gemini cleanup on the redundant `blockSlots` temporary.
```ts
const blockState = await this.regen.getBlockSlotState(
  protoBlock,
  protoBlock.slot,
  {dontTransferCache: true},
  RegenCaller.processBlock
);
```
can do:

```ts
const blockState = await this.regen.getBlockSlotState(
  protoBlock,
  protoBlock.slot,
  {dontTransferCache: true},
  RegenCaller.processBlock
).catch(() =>
  // only happens at the 1st batch of skipped slot checkpoint sync
  this.regen.getClosestHeadState(protoBlock)
);
```
Behavior-wise this is equivalent to the current `wrapError(...)` branch. I slightly prefer the expanded form here because this is a pretty unusual path: we are intentionally swallowing a regen failure and falling back to `getClosestHeadState()` only for the skipped-checkpoint case, so keeping the `err` branch explicit makes that special case easier to spot (and easier to tighten later if we ever want to narrow which regen failures are allowed to fall through). So I think the current form is fine as-is.
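For reference, a sketch of the explicit form being kept, assuming `wrapError()` from `@lodestar/utils` resolves to `{err: Error} | {err: null, result: T}` (treat the exact signature as an assumption):

```ts
import {wrapError} from "@lodestar/utils";

// Explicit err branch: the fallback to getClosestHeadState() is visibly scoped
// to a regen failure, which only happens on the 1st batch of skipped slot
// checkpoint sync. wrapError()'s result shape here is an assumption.
const res = await wrapError(
  this.regen.getBlockSlotState(protoBlock, protoBlock.slot, {dontTransferCache: true}, RegenCaller.processBlock)
);
const blockState = res.err ? await this.regen.getClosestHeadState(protoBlock) : res.result;
```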
Motivation
Lodestar is not able to checkpoint sync from a skipped-slot checkpoint because it does not download the parent payload envelope needed to feed forkchoice.
Description
- `Batch.getRequest()` decides which requests to download (see the sketch after this list)
- `Batch.downloadSuccess()` handler: need to make sure the parent envelope is fully downloaded
- `downloadByRange` heuristic: it needs to download whatever is specified in `Batch.getRequest()`
- e2e test: `checkpointSync.test.ts`

Closes #9319
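As a rough illustration of the first-batch decision, a hedged sketch (the `BatchRequest` type and field names are hypothetical, not Lodestar's real `Batch` internals):

```ts
// Hypothetical sketch: on a skipped-slot checkpoint sync, the first batch also
// requests the anchor's parent payload envelope by root so forkchoice can be fed.
type RootHex = string;

interface BatchRequest {
  startSlot: number;
  count: number;
  /** Present only on the first batch of a skipped-slot checkpoint sync. */
  parentPayloadRequest?: {blockRoot: RootHex; columns?: number[]};
}

function getRequest(opts: {isFirstBatch: boolean; anchorParentRoot: RootHex; startSlot: number; count: number}): BatchRequest {
  const request: BatchRequest = {startSlot: opts.startSlot, count: opts.count};
  if (opts.isFirstBatch) {
    // The anchor block's slot was skipped, so its parent payload envelope is not
    // covered by the range request and must be fetched by root.
    request.parentPayloadRequest = {blockRoot: opts.anchorParentRoot};
  }
  return request;
}
```

Under this sketch, `downloadSuccess()` would only mark the batch complete once the parent envelope (and any sampled columns) has actually arrived.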
AI Assistance Disclosure
Created with the help of Claude