
handle reorgs for proposer preferences #16651

Open

james-prysm wants to merge 41 commits into develop from propose-preferences-reorgs

Conversation

Contributor

@james-prysm james-prysm commented Apr 9, 2026

What type of PR is this?

Bug fix

What does this PR do? Why is it needed?

The proposer preference cache was keyed by slot only (map[Slot]ProposerPreference) with first-write-wins semantics. During a reorg that changes the proposer shuffling (different RANDAO at epoch boundary), a new proposer's preferences for the same slot would be rejected because the slot already had an entry. This caused:

  • Gossip dedup blocking the new proposer's preferences
  • Bid validation using the wrong validator's fee recipient and gas limit
  • Payload attributes using the wrong fee recipient

Makes the proposer preferences cache reorg-safe by re-keying from slot to (slot, validatorIndex). After a reorg that changes the proposer shuffling, the correct proposer's preferences are resolved dynamically from the head state's proposer lookahead.

Reproducing reorg testing with Kurtosis

To validate this change under real reorgs, add a temporary broadcast delay flag that is NOT part of this PR. Apply the following patch locally, build new images, and run with the kurtosis config below.

1. Patch config/features/flags.go

Add a flag variable before DisableDutiesV2:

reorgTestBroadcastDelay = &cli.DurationFlag{
    Name:  "reorg-test-broadcast-delay",
    Usage: "(Testing): Delays P2P block broadcast by this duration while processing the block locally first.",
}

Add reorgTestBroadcastDelay to the BeaconChainFlags slice.

2. Patch config/features/config.go

Add a field to the Flags struct:

ReorgTestBroadcastDelay time.Duration

Add to ConfigureBeaconChain() before Init(cfg):

if ctx.IsSet(reorgTestBroadcastDelay.Name) {
    cfg.ReorgTestBroadcastDelay = ctx.Duration(reorgTestBroadcastDelay.Name)
}

3. Patch beacon-chain/rpc/prysm/v1alpha1/validator/proposer.go

Add "github.com/OffchainLabs/prysm/v7/config/features" to imports.

Replace broadcastReceiveBlock with:

func (vs *Server) broadcastReceiveBlock(ctx context.Context, wg *sync.WaitGroup, block interfaces.SignedBeaconBlock, root [fieldparams.RootLength]byte) error {
    delay := features.Get().ReorgTestBroadcastDelay
    if delay > 0 {
        // Receive locally first, then delay broadcast — creates a divergent fork.
        vs.BlockNotifier.BlockFeed().Send(&feed.Event{
            Type: blockfeed.ReceivedBlock,
            Data: &blockfeed.ReceivedBlockData{SignedBlock: block},
        })
        if err := vs.BlockReceiver.ReceiveBlock(ctx, block, root, nil); err != nil {
            return errors.Wrap(err, "receive block")
        }
        time.Sleep(delay)
        if err := vs.broadcastBlock(ctx, wg, block, root); err != nil {
            return errors.Wrap(err, "broadcast block")
        }
        return nil
    }

    if err := vs.broadcastBlock(ctx, wg, block, root); err != nil {
        return errors.Wrap(err, "broadcast block")
    }
    vs.BlockNotifier.BlockFeed().Send(&feed.Event{
        Type: blockfeed.ReceivedBlock,
        Data: &blockfeed.ReceivedBlockData{SignedBlock: block},
    })
    if err := vs.BlockReceiver.ReceiveBlock(ctx, block, root, nil); err != nil {
        return errors.Wrap(err, "receive block")
    }
    return nil
}

4. Kurtosis config (gloas-config-4node-reorg.yml)

participants:
  - el_type: geth
    el_image: ethpandaops/geth:glamsterdam-devnet-0
    cl_type: prysm
    cl_image: gcr.io/offchainlabs/prysm/beacon-chain:latest
    vc_image: gcr.io/offchainlabs/prysm/validator:latest
    supernode: true
    count: 2
    vc_extra_params:
      - "--verbosity=debug"
  - el_type: geth
    el_image: ethpandaops/geth:epbs-devnet-0
    cl_type: prysm
    cl_image: gcr.io/offchainlabs/prysm/beacon-chain:latest
    vc_image: gcr.io/offchainlabs/prysm/validator:latest
    supernode: true
    count: 2
    cl_extra_params:
      - "--reorg-test-broadcast-delay=5s"
    vc_extra_params:
      - "--verbosity=debug"

network_params:
  fulu_fork_epoch: 0
  gloas_fork_epoch: 2
  seconds_per_slot: 4
  genesis_delay: 40

additional_services:
  - dora

global_log_level: debug

dora_params:
  image: ethpandaops/dora:gloas-support

5. Run and monitor

kurtosis run --enclave gloas-reorg github.com/ethpandaops/ethereum-package \
  --args-file gloas-config-4node-reorg.yml

# Check for reorgs (delayed nodes experience them)
kurtosis service logs gloas-reorg cl-3-prysm-geth --all 2>&1 | grep "Chain reorg occurred"

# Check proposer preferences are flowing
kurtosis service logs gloas-reorg cl-1-prysm-geth --all 2>&1 | grep "Processed signed proposer"

# Key log: cache accepted a different validator's preference for a slot that already had one.
# This is the reorg-safety signal — proves the (slot, validatorIndex) keying works.
kurtosis service logs gloas-reorg cl-3-prysm-geth --all 2>&1 | grep "possible reorg"

With 4-second slots and a 5-second delay, the delayed nodes' blocks arrive after the next slot begins, reliably triggering reorgs (depth 2-3).

Logs to look for

| Log message | Source | Meaning |
| --- | --- | --- |
| Chain reorg occurred | blockchain | A reorg happened on this node |
| New proposer preference for slot that already has a different validator (possible reorg) | cache | The cache accepted a second validator's preference for the same slot, proving the reorg-safe keying works |
| Processed signed proposer preferences | rpc/validator | VC submitted preferences via RPC (shows broadcast/duplicate/total counts) |

Which issue(s) does this PR fix?

Fixes #16616

Related specs:
ethereum/consensus-specs#5196
ethereum/consensus-specs#5190
ethereum/consensus-specs#5191

Other notes for review

Acknowledgements

  • I have read CONTRIBUTING.md.
  • I have included a uniquely named changelog fragment file.
  • I have added a description with sufficient context for reviewers to understand this PR.
  • I have tested that my changes work as expected and I added a testing plan to the PR description (if applicable).

@james-prysm james-prysm marked this pull request as ready for review April 10, 2026 21:55
@james-prysm james-prysm marked this pull request as draft April 29, 2026 07:52
james-prysm added a commit that referenced this pull request Apr 30, 2026
Squashed cherry-pick of PR #16651 onto glamsterdam-devnet-2.
@james-prysm james-prysm marked this pull request as ready for review May 7, 2026 19:04
// NextSlotState may return a state at slot < boundarySlot when the
// dependent block was at an earlier slot (empty boundary slot); only use it
// if it lands exactly on the boundary, otherwise fall through to load+advance.
if cached := transition.NextSlotState(dependentRoot[:], boundarySlot); cached != nil && cached.Slot() == boundarySlot {
Contributor Author

@james-prysm james-prysm May 7, 2026

For the reviewer: please take a look at this and see if it makes sense. This is for getting proposer preference messages at the epoch boundary, when the head state could still be behind, so we need to use the next slot state cache.

Comment thread beacon-chain/sync/validate_execution_payload_bid.go Outdated
Comment thread beacon-chain/core/helpers/block.go Outdated
// state.block_roots[start_slot(epoch(slot)-1) - 1]. Returns
// ErrProposerDependentRootUnderflow when the proposal epoch is < 2; the spec's
// fallback to the genesis block root is the caller's responsibility.
func ProposerDependentRoot(st state.ReadOnlyBeaconState, slot primitives.Slot) ([32]byte, error) {
Collaborator

why not make this a state getter like st.ProposerDependentRoot? you are accessing state.BlockRootAtIndex at the end anyway

Contributor Author

thanks for the suggestion, made it a state getter in 9e3126f

return cache.TrackedValidator{}, false
}
var feeRecipient primitives.ExecutionAddress
copy(feeRecipient[:], pref.FeeRecipient)
Collaborator

do we really need to copy here?

Contributor Author

fixed in 9e3126f

Comment on lines +15 to +19
// trackedProposer now anchors preferences on dependent_root derived from the
// passed state (state.block_roots lookup). At slot 0 the lookup underflows so
// proposerPreference falls back to the no-cache path; cached-preference
// behavior is exercised end-to-end by the gossip and bid validation tests
// under beacon-chain/sync.
Collaborator

are those comments helpful? they seem like mostly used by AI

Contributor Author

no it's not, sorry, that was a miss in my own review; fixed in 9e3126f

Comment on lines +36 to +40
dependentRoot [32]byte,
slot primitives.Slot,
validatorIndex primitives.ValidatorIndex,
feeRecipient []byte,
gasLimit uint64,
Collaborator

that's enough arguments; should we just pass in the preference?

)
}

valIdx := msg.Message.ValidatorIndex
Collaborator

move this next to L94?

Contributor Author

done in 9e3126f

Comment on lines +80 to +81
var dependentRoot [fieldparams.RootLength]byte
copy(dependentRoot[:], msg.Message.DependentRoot)
Collaborator

do we need to copy?

Contributor Author

fixed in 9e3126f

Comment on lines +158 to +165
epoch := slots.ToEpoch(slot)
if epoch < 2 {
root, err := s.cfg.beaconDB.GenesisBlockRoot(ctx)
if err != nil {
return [32]byte{}, errors.Wrap(err, "genesis block root")
}
return root, nil
}
Collaborator

can we add this to DependentRootForEpoch rather than doing it for all the callers?

Contributor Author

changed in 9e3126f


dependentRoot := bytesutil.ToBytes32(signedPreferences.Message.DependentRoot)
// [IGNORE] block with root preferences.dependent_root has been seen.
if !s.cfg.chain.InForkchoice(dependentRoot) && !s.cfg.beaconDB.HasBlock(ctx, dependentRoot) {
Collaborator

why put this here rather than as part of the verifier package? we can reuse a similar method there

// NextSlotState may return a state at slot < boundarySlot when the
// dependent block was at an earlier slot (empty boundary slot); only use it
// if it lands exactly on the boundary, otherwise fall through to load+advance.
if cached := transition.NextSlotState(dependentRoot[:], boundarySlot); cached != nil && cached.Slot() == boundarySlot {
Collaborator

why use the next slot state cache here? it will surely be a miss, right, because we are getting an old boundary slot?

Contributor Author

@james-prysm james-prysm May 11, 2026

I tried to isolate it more. This is for the case of getting a proposer preference for the next epoch when our head is not up to date for slot 0; also part of 9e3126f.

