
Fix parallel segment reload race on IndexLoadingConfig tier; reuse configs per tier and IndexLoadingConfig.copy()#18174

Open
rsrkpatwari1234 wants to merge 16 commits into apache:master from rsrkpatwari1234:rsrkpatwari1234-issue-18164

Conversation

@rsrkpatwari1234
Contributor

@rsrkpatwari1234 rsrkpatwari1234 commented Apr 12, 2026

Problem

When multiple segments were reloaded in parallel (reloadAllSegments / batched reloadSegments), all tasks shared a single IndexLoadingConfig from one fetchIndexLoadingConfig() call. Each reload path calls setSegmentTier(...) (and related updates) on that shared instance, so concurrent tasks could overwrite each other’s tier. With tier overrides in table config, that could apply the wrong preprocessing / loading settings (#18164).

Fix

  • BaseTableDataManager.reloadSegmentDataManagers (private helper used by parallel reload): Calls fetchIndexLoadingConfig() once per batch. For immutable segments, builds a Map<tierKey, IndexLoadingConfig> with one IndexLoadingConfig per distinct current segment tier (from getSegmentCurrentTier), using template.copy() plus the tier key. Offline reloads for the same tier use the same config under synchronized (tierConfig) so mutations stay correct without one config per segment. Reloads for different tiers can still run in parallel.
  • Realtime consuming paths only read TableConfig from the template and do not mutate tier, so they reuse the single fetched template without that lock.
  • IndexLoadingConfig.copy(): Returns a new instance with the same instance / table / schema references and a full snapshot of all mutable fields (including _readMode, _segmentTier, index-config caches, _dirty, etc.), so “copy” matches real semantics and supports the tier-keyed path without N ZK reads.
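The copy() semantics described above can be sketched as follows. This is a simplified, hypothetical class (not the actual Pinot `IndexLoadingConfig` API): immutable references such as the table config are shared between the original and the copy, while mutable fields (tier, read mode, index-config caches) are snapshotted so the copy can diverge independently.

```java
import java.util.HashMap;
import java.util.Map;

// Illustrative sketch only; field and method names are hypothetical.
final class TierConfigSketch {
  // Shared, effectively immutable reference (table config / schema stand-in)
  final Object tableConfig;
  // Mutable per-reload state
  String segmentTier;
  String readMode;
  final Map<String, String> indexConfigCache;

  TierConfigSketch(Object tableConfig) {
    this.tableConfig = tableConfig;
    this.indexConfigCache = new HashMap<>();
  }

  private TierConfigSketch(TierConfigSketch other) {
    this.tableConfig = other.tableConfig;            // share the immutable reference
    this.segmentTier = other.segmentTier;            // snapshot mutable fields
    this.readMode = other.readMode;
    this.indexConfigCache = new HashMap<>(other.indexConfigCache); // copy the cache
  }

  TierConfigSketch copy() {
    return new TierConfigSketch(this);
  }

  public static void main(String[] args) {
    TierConfigSketch template = new TierConfigSketch(new Object());
    template.segmentTier = "hotTier";
    TierConfigSketch copy = template.copy();
    copy.segmentTier = "coldTier";                   // mutating the copy...
    System.out.println(template.segmentTier);        // ...leaves the original intact
    System.out.println(copy.segmentTier);
    System.out.println(template.tableConfig == copy.tableConfig); // shared reference
  }
}
```

Because the table config and schema references are shared rather than re-fetched, creating a copy avoids any additional ZK reads.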

Tests

IndexLoadingConfigTest.testCopyPreservesMutableStateAndIndependentTier: Asserts shared TableConfig / Schema, matching tableDataDir / readMode / initial tier, and that tier changes on the copy do not affect the original.

Fixes #18164

@codecov-commenter

codecov-commenter commented Apr 12, 2026

Codecov Report

❌ Patch coverage is 80.85106% with 9 lines in your changes missing coverage. Please review.
✅ Project coverage is 63.51%. Comparing base (7e10a36) to head (3f6b895).
⚠️ Report is 49 commits behind head on master.

Files with missing lines Patch % Lines
.../pinot/core/data/manager/BaseTableDataManager.java 78.26% 1 Missing and 4 partials ⚠️
...local/segment/index/loader/IndexLoadingConfig.java 83.33% 0 Missing and 4 partials ⚠️
Additional details and impacted files
@@             Coverage Diff              @@
##             master   #18174      +/-   ##
============================================
+ Coverage     63.18%   63.51%   +0.33%     
- Complexity     1616     1627      +11     
============================================
  Files          3214     3244      +30     
  Lines        195838   197409    +1571     
  Branches      30251    30549     +298     
============================================
+ Hits         123734   125379    +1645     
+ Misses        62236    61976     -260     
- Partials       9868    10054     +186     
Flag Coverage Δ
custom-integration1 100.00% <ø> (ø)
integration 100.00% <ø> (ø)
integration1 100.00% <ø> (ø)
integration2 0.00% <ø> (ø)
java-11 63.47% <80.85%> (+0.34%) ⬆️
java-21 63.48% <80.85%> (+0.32%) ⬆️
temurin 63.51% <80.85%> (+0.33%) ⬆️
unittests 63.50% <80.85%> (+0.33%) ⬆️
unittests1 55.53% <80.85%> (+0.14%) ⬆️
unittests2 34.96% <42.55%> (+0.18%) ⬆️

Flags with carried forward coverage won't be shown. Click here to find out more.

☔ View full report in Codecov by Sentry.

@rsrkpatwari1234
Contributor Author

Requesting review on this. The integration test failure appears unrelated to this PR:

java.lang.AssertionError: [ExactlyOnce] Transaction markers were not propagated within 120s; committed records are not visible to read_committed consumers. read_committed=0, read_uncommitted=153636
	at org.apache.pinot.integration.tests.ExactlyOnceKafkaRealtimeClusterIntegrationTest.waitForCommittedRecordsVisible(ExactlyOnceKafkaRealtimeClusterIntegrationTest.java:181)

  _segmentReloadSemaphore.acquire(segmentName, _logger);
  try {
-   reloadSegment(segmentDataManager, indexLoadingConfig, forceDownload);
+   reloadSegment(segmentDataManager, indexLoadingConfigTemplate.copy(), forceDownload);
Collaborator

If we have 100k segments, then we'll have as many copies of IndexLoadingConfig objects here, differing only in segment tier.

Segment tiers are expected to be only a few per server. Can we have, for example, a map from tier to index loading config so we only need to create as many copies as the amount of tiers?

Contributor Author

@rsrkpatwari1234 rsrkpatwari1234 Apr 19, 2026


Addressed this by fetching table config once and building a Map<tierKey, IndexLoadingConfig> with one entry per distinct current segment tier (plus a shared template for realtime paths that only read TableConfig). So we create O(number of tiers) IndexLoadingConfig instances, not O(number of segments).

Offline reload mutates IndexLoadingConfig (tier, setTableDataDir, etc.), so the same instance can't be used concurrently without coordination. We synchronize on the per-tier config for immutable-segment reloads so only one reload runs at a time for a given tier on that shared object; different tiers can still reload in parallel. Realtime consuming segments don't mutate the template, so they can safely share the single fetched template.
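The tier-keyed reuse and per-tier locking described above can be sketched as follows. This is a self-contained, illustrative example with hypothetical names (not the actual `BaseTableDataManager` code): one config copy per distinct tier via `computeIfAbsent`, with same-tier reloads serialized on the shared per-tier config while different tiers proceed in parallel.

```java
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Illustrative sketch only; class and method names are hypothetical.
final class PerTierReloadSketch {
  static final class Config {
    String tier;
    Config copy() { Config c = new Config(); c.tier = tier; return c; }
  }

  static int reloadCount = 0;

  public static void main(String[] args) {
    Config template = new Config();                  // fetched once per batch
    Map<String, Config> configPerTier = new ConcurrentHashMap<>();
    List<String> segmentTiers = List.of("hot", "cold", "hot", "cold", "hot");

    segmentTiers.parallelStream().forEach(tier -> {
      // One config per distinct tier: O(#tiers) copies, not O(#segments)
      Config tierConfig = configPerTier.computeIfAbsent(tier, t -> {
        Config c = template.copy();
        c.tier = t;
        return c;
      });
      // Same-tier reloads serialize on the shared config;
      // different tiers still run in parallel.
      synchronized (tierConfig) {
        reloadSegment(tierConfig);
      }
    });

    System.out.println(configPerTier.size());        // 2 configs for 5 segments
    System.out.println(reloadCount);                 // 5
  }

  static void reloadSegment(Config config) {
    // Placeholder for the actual reload; mutating `config` here is safe
    // because the caller holds the per-tier lock.
    synchronized (PerTierReloadSketch.class) { reloadCount++; }
  }
}
```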

@rsrkpatwari1234 rsrkpatwari1234 changed the title Fix parallel segment reload race on IndexLoadingConfig tier; add IndexLoadingConfig.copy() to avoid per-segment ZK fetches Fix parallel segment reload race on IndexLoadingConfig tier; reuse configs per tier and IndexLoadingConfig.copy() Apr 19, 2026


Development

Successfully merging this pull request may close these issues.

Potential Race Condition against Segment Tier during Reload All Segments
