Skip to content

[fix][client] Prevent duplicate ServiceUrlProvider initialization#25899

Open
oneby-wang wants to merge 1 commit into
apache:masterfrom
oneby-wang:service-url-provider-init-idempotency
Open

[fix][client] Prevent duplicate ServiceUrlProvider initialization#25899
oneby-wang wants to merge 1 commit into
apache:masterfrom
oneby-wang:service-url-provider-init-idempotency

Conversation

@oneby-wang
Copy link
Copy Markdown
Contributor

Motivation

#25892 fixed a flaky SameAuthParamsLookupAutoClusterFailoverTest by removing an extra manual failover.initialize(client) call from the test.

The root cause of that flakiness was duplicate initialization. PulsarClientBuilder.build() already initializes the configured ServiceUrlProvider through PulsarClientImpl, so calling initialize(client) again starts duplicate background checks for the same provider instance.

This is especially problematic for SameAuthParamsLookupAutoClusterFailover, because each initialize call creates a new broker-service-url-check EventLoopGroup. Multiple checker threads can then mutate the same failover state and produce subtle race conditions that are difficult to diagnose. AutoClusterFailover and ControlledClusterFailover have the same lifecycle risk: duplicate initialization can register duplicate scheduled tasks, and ControlledClusterFailover can also recreate its HTTP client without closing the previous one.

Modifications

  • Make AutoClusterFailover, ControlledClusterFailover, and SameAuthParamsLookupAutoClusterFailover fail fast when initialize(PulsarClient) is called more than once.
  • Remove the duplicate manual initialize call from SameAuthParamsLookupAutoClusterFailoverTest.testAutoClusterFailover because the provider is already initialized by PulsarClientBuilder.build().
  • Add dedicated tests that build a PulsarClient with each provider and then verify a second initialize(client) call throws IllegalStateException.

Verifying this change

  • Make sure that the change passes the CI checks.

Does this pull request potentially affect one of the following parts:

  • Dependencies (add or upgrade a dependency)
  • The public API
  • The schema
  • The default values of configurations
  • The threading model
  • The binary protocol
  • The REST endpoints
  • The admin CLI options
  • The metrics
  • Anything that affects deployment

The threading model is affected only by preventing duplicate background failover check tasks from being registered for the same ServiceUrlProvider instance.

Copy link
Copy Markdown
Contributor

@void-ptr974 void-ptr974 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this needs a lifecycle fix. If the same ServiceUrlProvider instance is reused to build a second client, the second build now fails with IllegalStateException, but the constructor failure path calls shutdown(), which unconditionally closes conf.getServiceUrlProvider(). That can close the provider still used by the first live client.

Example:

ServiceUrlProvider provider = AutoClusterFailover.builder()
        .primary(primary)
        .secondary(List.of(secondary))
        .failoverDelay(1, TimeUnit.SECONDS)
        .switchBackDelay(1, TimeUnit.SECONDS)
        .build();

PulsarClient client1 = PulsarClient.builder()
        .serviceUrlProvider(provider)
        .build();

PulsarClient.builder()
        .serviceUrlProvider(provider)
        .build(); // fails, then closes provider used by client1

Could we only close the provider on constructor failure if this PulsarClientImpl successfully initialized it?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants