Pong responses discarded under high load, causing broker to kill connections #408

@darinspivey

Problem

The outbound channel (bounded, default capacity 100) is shared by all connection traffic: producer sends, acks, flow permits, lookup requests, and pong responses to broker keepalive pings. Under high throughput, this channel fills due to TCP backpressure. When it is full, try_send for pong fails, so the response is discarded and only an error is logged:

ERROR pulsar::connection: failed to send pong: sending into a full channel
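
The failure mode can be reproduced in isolation. The sketch below is not the pulsar-rs code: it uses the standard library's `sync_channel` as a stand-in for the client's bounded async channel, and the `Outbound` enum and `pong_dropped_when_full` function are hypothetical names chosen for illustration. It only shows that a non-blocking send into a full bounded channel fails immediately rather than waiting for room.

```rust
use std::sync::mpsc::{sync_channel, TrySendError};

/// Hypothetical stand-in for the messages that share the outbound channel.
#[derive(Debug)]
enum Outbound {
    Data(u32),
    Pong,
}

/// Fills a bounded channel of `capacity` with data messages while nothing
/// drains it (as under TCP backpressure), then attempts a non-blocking send
/// of a pong. Returns true if the pong was rejected because the channel is full.
fn pong_dropped_when_full(capacity: usize) -> bool {
    // `_rx` is kept alive so the channel stays connected but is never read.
    let (tx, _rx) = sync_channel::<Outbound>(capacity);
    for i in 0..capacity as u32 {
        tx.try_send(Outbound::Data(i)).expect("channel still has room");
    }
    // The channel is now full, so the non-blocking send fails immediately.
    matches!(tx.try_send(Outbound::Pong), Err(TrySendError::Full(_)))
}
```

With a capacity of 100, matching the default mentioned above, `pong_dropped_when_full(100)` returns `true`.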

Since the broker sends keepalive pings every 30 seconds and expects a pong response, a discarded pong can cause the broker to consider the connection dead and close it. The client then reconnects and the cycle repeats, causing repeated connection churn under sustained load.

The bug was introduced in #312 (feat: Replace unbounded channel with bounded), where pong was converted from unbounded_send() (which cannot fail due to capacity) to try_send() on the shared bounded channel. The subsequent fix in #319 (fix: Block the sending of control messages) converted all other control messages to blocking .send().await but missed pong.

Suggested Fix

Give pong its own dedicated bounded(1) channel, separate from the main outbound channel. The sink writer drains the pong channel ahead of the main channel via select_biased!, ensuring pong responses are prioritized to the socket regardless of outbound traffic volume.
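
The priority order described above can be sketched as follows. This is a synchronous, standard-library analogue and not the proposed patch: the real writer would poll async receivers with `select_biased!`, whereas here `next_frame` simply checks the pong channel before the main channel. The names `Frame`, `next_frame`, and `demo_write_order` are hypothetical, and the main channel capacity of 3 is arbitrary.

```rust
use std::sync::mpsc::{sync_channel, Receiver};

/// Hypothetical frame type written to the socket.
#[derive(Debug, PartialEq, Clone, Copy)]
enum Frame {
    Data(u32),
    Pong,
}

/// Picks the next frame to write, always checking the dedicated pong
/// channel before the main outbound channel (the same bias that
/// `select_biased!` would give when the pong receiver is listed first).
fn next_frame(pong_rx: &Receiver<()>, main_rx: &Receiver<u32>) -> Option<Frame> {
    if pong_rx.try_recv().is_ok() {
        return Some(Frame::Pong);
    }
    main_rx.try_recv().ok().map(Frame::Data)
}

/// Fills the main channel completely, enqueues a pong on its own
/// capacity-1 channel, and returns the order in which frames are written.
fn demo_write_order() -> Vec<Frame> {
    let (main_tx, main_rx) = sync_channel::<u32>(3);
    let (pong_tx, pong_rx) = sync_channel::<()>(1);

    for i in 0..3 {
        main_tx.try_send(i).expect("main channel still has room");
    }
    // The main channel is full, but the pong has its own slot and is accepted.
    pong_tx.try_send(()).expect("dedicated pong slot is free");

    let mut written = Vec::new();
    while let Some(frame) = next_frame(&pong_rx, &main_rx) {
        written.push(frame);
    }
    written
}
```

In this sketch the pong is accepted even though the main channel is full, and it is written before the three queued data frames. A capacity of 1 is enough for the pong channel as long as the writer drains it faster than the broker sends pings.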
