fix(ios): add jitter to reconnection backoff to prevent thundering herd #593

Open
bianbiandashen wants to merge 1 commit into amantus-ai:main from bianbiandashen:fix/ios-reconnection-jitter

@bianbiandashen

Summary

Add jitter (randomization) to exponential backoff in iOS reconnection logic to prevent the "thundering herd" problem.

Problem

When many iOS clients disconnect simultaneously (e.g., server restart, network outage, or app background/foreground transition), they all use identical exponential backoff intervals:

Client A: reconnect at 1s, 2s, 4s, 8s, 16s...
Client B: reconnect at 1s, 2s, 4s, 8s, 16s...
Client C: reconnect at 1s, 2s, 4s, 8s, 16s...

This causes all clients to hit the server at exactly the same moments, creating load spikes that can:

  • Overwhelm the server during recovery
  • Cause cascading failures
  • Extend the overall recovery time

Solution

Add jitter (randomization) to spread reconnection attempts:

static func calculateBackoff(
    attempt: Int,
    baseDelay: TimeInterval = 1.0,
    maxDelay: TimeInterval = 60.0,
    jitterFactor: Double = 0.3  // NEW: adds up to 30% on top of the capped delay
) -> TimeInterval {
    // Exponential growth: 1s, 2s, 4s, ... for attempt 1, 2, 3, ...
    let exponentialDelay = baseDelay * pow(2.0, Double(attempt - 1))
    let cappedDelay = min(exponentialDelay, maxDelay)
    // Jitter is additive and applied after the cap, so the result can exceed maxDelay.
    let jitter = cappedDelay * jitterFactor * Double.random(in: 0...1)
    return cappedDelay + jitter
}

With 30% jitter on a 16s backoff delay (attempt 5 with the default 1s base delay):

  • Without jitter: all clients retry at exactly 16s
  • With jitter: clients retry between 16s and 20.8s

This spreads the load and allows the server to recover gracefully.
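
The arithmetic above can be checked with a small helper. This is an illustrative sketch, not code from the PR: `jitteredDelayRange` is a hypothetical name, and it only re-derives the lower and upper bounds of the `calculateBackoff` formula shown in the Solution, assuming the same default parameters.

```swift
import Foundation

// Hypothetical helper (not part of the PR): the range of delays that the
// jittered backoff formula can produce for a given attempt.
// Assumes jitterFactor >= 0, otherwise the range would be invalid.
func jitteredDelayRange(
    attempt: Int,
    baseDelay: TimeInterval = 1.0,
    maxDelay: TimeInterval = 60.0,
    jitterFactor: Double = 0.3
) -> ClosedRange<TimeInterval> {
    // Same exponential growth and cap as calculateBackoff().
    let cappedDelay = min(baseDelay * pow(2.0, Double(attempt - 1)), maxDelay)
    // Jitter adds between 0 and jitterFactor * cappedDelay.
    return cappedDelay...(cappedDelay * (1.0 + jitterFactor))
}

// Attempt 5 with the defaults: 1s * 2^4 = 16s, plus up to 30% of 16s.
let attemptFiveRange = jitteredDelayRange(attempt: 5)
print(attemptFiveRange.lowerBound, attemptFiveRange.upperBound)
```

Because the jitter is added after the cap, the upper bound once the 60s cap is reached is roughly 78s with these defaults, not 60s.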

Additional Fix: Max Reconnect Attempts

Also added a maximum reconnect limit (10 attempts) to BufferWebSocketClient to prevent infinite retry loops when the server is permanently unavailable.
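
A minimal sketch of how such a limit might look is shown below. The PR's actual `BufferWebSocketClient` code is not reproduced in this description, so `ReconnectPolicy`, `nextDelay()`, and `reset()` are hypothetical names used only for illustration; the limit of 10 and the backoff formula are taken from the text above.

```swift
import Foundation

// Hypothetical sketch: the real BufferWebSocketClient implementation may differ.
struct ReconnectPolicy {
    let maxReconnectAttempts = 10   // limit stated in the PR description
    private(set) var attempts = 0

    /// Returns the delay before the next reconnect attempt,
    /// or nil once the attempt limit has been reached.
    mutating func nextDelay(
        baseDelay: TimeInterval = 1.0,
        maxDelay: TimeInterval = 60.0,
        jitterFactor: Double = 0.3
    ) -> TimeInterval? {
        guard attempts < maxReconnectAttempts else { return nil }
        attempts += 1
        let cappedDelay = min(baseDelay * pow(2.0, Double(attempts - 1)), maxDelay)
        return cappedDelay + cappedDelay * jitterFactor * Double.random(in: 0...1)
    }

    /// Call after a successful connection so later drops start from attempt 1.
    mutating func reset() { attempts = 0 }
}

var policy = ReconnectPolicy()
var scheduled = 0
while policy.nextDelay() != nil { scheduled += 1 }
print(scheduled) // prints 10
```

Returning `nil` rather than a sentinel delay lets the caller stop scheduling and surface a "connection failed" state explicitly.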

Files Changed

  • ios/VibeTunnel/Services/ReconnectionManager.swift

    • Added jitterFactor parameter to calculateBackoff()
    • Updated performReconnection() to use jitter
  • ios/VibeTunnel/Services/BufferWebSocketClient.swift

    • Added jitter to scheduleReconnect()
    • Added maxReconnectAttempts constant (10)
    • Added max attempts check to prevent infinite retries

Test Plan

  • Verify jitter produces varied delays (not deterministic)
  • Verify max attempts stops reconnection after 10 failures
  • Verify reconnection still works correctly with jitter
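
The first check in the plan could be sketched roughly as follows. This is not the PR's test code: it uses a standalone copy of the formula from the Solution section so that it runs on its own, and the sample size of 1,000 and the small floating-point tolerance are arbitrary choices.

```swift
import Foundation

// Standalone copy of the backoff formula from the Solution section.
func calculateBackoff(
    attempt: Int,
    baseDelay: TimeInterval = 1.0,
    maxDelay: TimeInterval = 60.0,
    jitterFactor: Double = 0.3
) -> TimeInterval {
    let cappedDelay = min(baseDelay * pow(2.0, Double(attempt - 1)), maxDelay)
    return cappedDelay + cappedDelay * jitterFactor * Double.random(in: 0...1)
}

// Sample many delays for attempt 5 (16s before jitter).
let samples = (0..<1_000).map { _ in calculateBackoff(attempt: 5) }

// Every sample should fall between 16s and 20.8s (small tolerance for rounding).
let allInRange = samples.allSatisfy { $0 >= 16.0 && $0 <= 20.8 + 1e-9 }

// Jitter should make the delays vary rather than repeat one value.
let distinctCount = Set(samples).count

// With jitter disabled, the delay should be exactly the capped exponential value.
let withoutJitter = calculateBackoff(attempt: 5, jitterFactor: 0.0)

print(allInRange, distinctCount > 1, withoutJitter)
```

In a real test target these checks would more likely be `XCTAssert` calls against `ReconnectionManager.calculateBackoff()` itself.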

When many clients disconnect simultaneously (e.g., server restart or
network outage), they all attempt to reconnect at the same exponential
intervals. This causes a 'thundering herd' effect where all clients
hit the server at the same moment, potentially overwhelming it.

Changes:
- Add jitter parameter (default 30%) to ReconnectionManager.calculateBackoff()
- Update performReconnection() to use the new jitter-enabled backoff
- Add jitter to BufferWebSocketClient reconnection delay
- Add max reconnect attempts (10) to BufferWebSocketClient to prevent
  infinite retry loops

The jitter spreads reconnection attempts over time by adding a random
delay of 0-30% of the capped backoff delay. For example, with a 16s
backoff delay:
- Without jitter: all clients retry at exactly 16s
- With 30% jitter: clients retry between 16s and 20.8s

niklassaers added a commit to niklassaers/vibetunnel that referenced this pull request on Feb 26, 2026:
Add jitter to exponential backoff in both BufferWebSocketClient and
ReconnectionManager to prevent thundering herd on server restart.
Cap BufferWebSocketClient reconnection at 10 attempts.

Integrates PR amantus-ai#593.