Skip to content

test(client): fix three flaky tests around socket/connect lifecycle#3299

Open
nkaradzhov wants to merge 3 commits into
redis:masterfrom
nkaradzhov:fix-flaky-socket-timeout-test
Open

test(client): fix three flaky tests around socket/connect lifecycle#3299
nkaradzhov wants to merge 3 commits into
redis:masterfrom
nkaradzhov:fix-flaky-socket-timeout-test

Conversation

@nkaradzhov
Copy link
Copy Markdown
Collaborator

@nkaradzhov nkaradzhov commented May 29, 2026

Summary

Three flaky tests in packages/client consolidated into one branch, one commit each:

  • Socket > socketTimeout > should timeout with positive socketTimeout values — listener attachment race let the socket-timeout error surface as uncaught on slow CI.
  • Client > ConnectionTimeoutErrorconnectTimeout: 1 raced a 1ms event-loop block; on fast hardware the connect event landed before the timeout callback and the connection succeeded.
  • Client > PubSub > should resubscribeonce(subscriber, 'error') removed itself after the first emit, leaving the second (transient) error during the forced reconnect uncaught.

All three are wired up via the same root-cause shape: relying on a one-shot once listener for an event source that can fire more than once. The fix in each case is a permanent listener plus a one-shot promise gate to keep the test deterministic.

The ConnectionTimeoutError case is purely a timing margin bump.

Test plan

  • should timeout with positive socketTimeout values × 10 runs locally — all pass
  • ConnectionTimeoutError × 5 runs locally — all pass
  • should resubscribe × 10 runs locally — all pass
  • Full Socket spec — 11/11 pass
  • Full PubSub spec — 5/5 pass
  • CI on Node 18, 20, 22, 24

🤖 Generated with Claude Code


Note

Low Risk
Only test files and timing/listener wiring change; no runtime client behavior is modified.

Overview
Stabilizes three flaky client tests around socket and reconnect timing; no production code changes.

PubSub should resubscribe replaces once(subscriber, 'error') with a permanent error listener plus a one-shot promise so extra errors during forced reconnect do not become uncaught, while still waiting on the first error.

ConnectionTimeoutError lengthens the deliberate event-loop block from ~1ms to ~50ms so connectTimeout: 1 reliably wins over a fast TCP connect on quick hosts.

socketTimeout positive timeout raises the timeout to 200ms, uses disableClientSetup, connects explicitly with reconnectStrategy: false, and attaches a permanent error listener before connect so timeout/handshake errors stay handled and assertions run deterministically.

Reviewed by Cursor Bugbot for commit 09c5416. Bugbot is set up for automated code reviews on this repo. Configure here.

Attach a permanent error listener before connecting so the timeout
event cannot surface as an uncaught error during the handshake or in
the idle gap before the test body attached its listener. Drive the
test off the first emitted error via a one-shot promise gate instead
of a fixed-duration `setTimeout`, and use `reconnectStrategy: false`
+ `disableClientSetup` to make the run deterministic.

Bump the timeout from 50ms to 200ms to give the localhost handshake
slack on slow CI.
The test pairs `connectTimeout: 1` with a 1ms event-loop block to force
the socket's idle timer to fire. On a fast host the kernel finishes the
TCP handshake almost instantly and the two 1ms durations race: the block
can exit just as (or before) the timer expires, so the `connect` event
is processed before the timeout callback and the connection succeeds.
The test then fails with `Missing expected rejection`.

Block for 50ms instead so the timer is guaranteed to have expired —
and its callback to be queued ahead of the `connect` event — by the
time the event loop runs again.
The test forced the subscriber connection down with `CLIENT KILL` and
used `once(subscriber, 'error')` to wait for the resulting error. A
single forced reconnect can emit `error` more than once — the initial
SocketClosedUnexpectedlyError, plus any transient connect error while
the server tears the old connection down — and `once` removes its
listener after the first emit, so the next one surfaced as
"Uncaught Error: Socket closed unexpectedly".

Swap in a permanent error listener and use a one-shot promise gate to
await the first error deterministically. Subsequent errors during the
reconnect cycle are caught (and ignored) by the permanent listener.
@nkaradzhov nkaradzhov requested a review from PavelPashov May 29, 2026 11:57
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant