fix: alter topic --replication-factor infinite loop when broker is unavailable #318

Open
d-rk wants to merge 1 commit into main from feature/256-fix-replication-factor-infinite-loop

d-rk (Collaborator) commented Mar 16, 2026

Summary

  • Fixes alter topic --replication-factor entering an infinite loop when a broker is unavailable (e.g., down or unreachable)
  • Filters brokers by connectivity (broker.Connected()) before building replica assignments, so replicas are only assigned to available brokers
  • Prioritizes removing replicas from unavailable brokers when decreasing replication factor
  • Adds a 10-minute timeout to the reassignment polling loop as a safety net
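The connectivity filter described above could look roughly like the following. This is a minimal sketch, not the code from this PR: the `broker` interface, `fakeBroker`, `splitByConnectivity`, and `validateReplicationFactor` are hypothetical names, and `Connected()` is assumed to return `(bool, error)`. A broker whose check returns an error is treated as unavailable here, which is also an assumption.

```go
package main

import "fmt"

// broker is a hypothetical stand-in for the client's broker type. The only
// assumption is that it exposes ID() and Connected() (bool, error).
type broker interface {
	ID() int32
	Connected() (bool, error)
}

// fakeBroker is a test double used only for this illustration.
type fakeBroker struct {
	id int32
	up bool
}

func (b fakeBroker) ID() int32                { return b.id }
func (b fakeBroker) Connected() (bool, error) { return b.up, nil }

// splitByConnectivity partitions broker IDs into available and unavailable.
// A broker whose Connected() call fails is counted as unavailable.
func splitByConnectivity(brokers []broker) (available, unavailable []int32) {
	for _, b := range brokers {
		ok, err := b.Connected()
		if err != nil || !ok {
			unavailable = append(unavailable, b.ID())
			continue
		}
		available = append(available, b.ID())
	}
	return available, unavailable
}

// validateReplicationFactor rejects a replication factor that the available
// brokers cannot satisfy.
func validateReplicationFactor(rf int, available []int32) error {
	if rf < 1 || rf > len(available) {
		return fmt.Errorf("replication factor %d is not satisfiable with %d available brokers", rf, len(available))
	}
	return nil
}

func main() {
	brokers := []broker{
		fakeBroker{id: 1, up: true},
		fakeBroker{id: 2, up: false},
		fakeBroker{id: 3, up: true},
	}
	available, unavailable := splitByConnectivity(brokers)
	fmt.Println("available:", available, "unavailable:", unavailable)
	fmt.Println("rf=3:", validateReplicationFactor(3, available))
}
```

Validating against the available set rather than all known brokers means a request that can never complete is rejected up front instead of being submitted and polled.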

Changes

internal/topic/topic-operation.go

  • AlterTopic: Filters brokers with broker.Connected() and includes only available brokers in brokerReplicaCount. Unavailable brokers are reported through output.Warnf. The replication factor validation now checks against the number of available brokers, and the reassignment polling loop now has a 10-minute deadline.
  • getTargetReplicas: When decreasing replication factor, brokers not present in brokerReplicaCount (i.e., unavailable) are sorted to the end of the replica list so they are removed first.
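The ordering rule for getTargetReplicas can be illustrated with a stable sort. This is a hedged sketch of the idea only: `targetReplicasForDecrease` is a hypothetical helper, it ignores any load balancing the real function may do with the counts in `brokerReplicaCount`, and it only assumes that a broker missing from that map is unavailable.

```go
package main

import (
	"fmt"
	"sort"
)

// targetReplicasForDecrease returns the first rf replicas after moving
// brokers that are absent from brokerReplicaCount (treated as unavailable)
// to the end of the list, so they are the first to be dropped.
func targetReplicasForDecrease(current []int32, brokerReplicaCount map[int32]int, rf int) ([]int32, error) {
	if rf < 1 || rf > len(current) {
		return nil, fmt.Errorf("cannot reduce %d replicas to %d", len(current), rf)
	}
	// Copy so the caller's slice is not reordered.
	replicas := append([]int32(nil), current...)
	// Stable sort keeps the existing order within each group.
	sort.SliceStable(replicas, func(i, j int) bool {
		_, iAvailable := brokerReplicaCount[replicas[i]]
		_, jAvailable := brokerReplicaCount[replicas[j]]
		return iAvailable && !jAvailable
	})
	return replicas[:rf], nil
}

func main() {
	// Broker 2 is missing from the map, so it is treated as unavailable.
	counts := map[int32]int{1: 4, 3: 2}
	target, err := targetReplicasForDecrease([]int32{1, 2, 3}, counts, 2)
	fmt.Println(target, err)
}
```

Using a stable sort preserves the relative order of the available replicas, so the head of the list stays where it was unless it sits on an unavailable broker.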

internal/topic/topic-operation_test.go (new)

  • Unit tests for getTargetReplicas covering normal increase and decrease, decrease with one unavailable broker, decrease with multiple unavailable brokers, all brokers unavailable, and the not-enough-brokers error.

CHANGELOG.md

  • Added entry under [Unreleased]

Closes #256

…available

When a broker is down, the reassignment loop could run infinitely because:
1. Replicas were assigned to unavailable brokers that could never accept them
2. The polling loop had no timeout

This fix:
- Filters brokers by connectivity before building replica assignments
- Prioritizes removing replicas from unavailable brokers when decreasing RF
- Adds a 10-minute timeout to the reassignment polling loop

Closes #256

Development

Successfully merging this pull request may close these issues.

alter topic --replication-factor infinite loop when ISR missing