Fix parallel segment reload race on IndexLoadingConfig tier; reuse configs per tier and IndexLoadingConfig.copy() #18174
Conversation
Codecov Report

```
@@             Coverage Diff              @@
##             master    #18174      +/-  ##
============================================
+ Coverage      63.18%   63.51%   +0.33%
- Complexity      1616     1627      +11
============================================
  Files           3214     3244      +30
  Lines         195838   197409    +1571
  Branches       30251    30549     +298
============================================
+ Hits          123734   125379    +1645
+ Misses         62236    61976     -260
- Partials        9868    10054     +186
```
Requesting review on this. The integration test failure seems unrelated to this PR.
```diff
   _segmentReloadSemaphore.acquire(segmentName, _logger);
   try {
-    reloadSegment(segmentDataManager, indexLoadingConfig, forceDownload);
+    reloadSegment(segmentDataManager, indexLoadingConfigTemplate.copy(), forceDownload);
```
If we have 100k segments, we'll end up with as many copies of `IndexLoadingConfig` objects here, even though only the segment tier differs between them.
Segment tiers are expected to number only a few per server. Can we instead keep, for example, a map from tier to index loading config, so we only create as many copies as there are tiers?
Addressed this by fetching the table config once and building a `Map<tierKey, IndexLoadingConfig>` with one entry per distinct current segment tier (plus a shared template for realtime paths that only read `TableConfig`). So we create O(number of tiers) `IndexLoadingConfig` instances, not O(number of segments).
Because offline reload mutates `IndexLoadingConfig` (tier, `setTableDataDir`, etc.), the same instance can't be used concurrently without coordination. We synchronize on the per-tier config for immutable-segment reloads, so only one reload runs at a time for a given tier on that shared object; different tiers can still reload in parallel. Realtime consuming segments don't mutate the template, so they can safely share the single fetched template.
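A rough sketch of the tier-keyed reuse described above, assuming simplified stand-in types: `Config`, `buildTierConfigs`, and the `"__default__"` tier key are illustrative inventions, not Pinot's actual `IndexLoadingConfig` API.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Objects;

public class TierConfigSketch {
  // Stand-in for IndexLoadingConfig: one mutable tier field, copied via copy().
  static class Config {
    String segmentTier;

    Config copy() {
      Config c = new Config();
      c.segmentTier = segmentTier; // snapshot mutable state
      return c;
    }
  }

  // Build one config per distinct tier: O(#tiers) copies, not O(#segments).
  // A null tier maps to a shared default key.
  static Map<String, Config> buildTierConfigs(Config template, Iterable<String> segmentTiers) {
    Map<String, Config> tierConfigs = new HashMap<>();
    for (String tier : segmentTiers) {
      String key = Objects.toString(tier, "__default__");
      tierConfigs.computeIfAbsent(key, k -> {
        Config c = template.copy();
        c.segmentTier = tier;
        return c;
      });
    }
    return tierConfigs;
  }

  // Reloads for the same tier serialize on the shared per-tier config;
  // reloads for different tiers can still run in parallel.
  static void reloadWithTierConfig(Config tierConfig, Runnable reload) {
    synchronized (tierConfig) {
      reload.run();
    }
  }
}
```

Synchronizing on the per-tier config object itself keeps the map small while making the shared mutable instance safe, which is the trade-off the comment thread settles on.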
Problem

When multiple segments were reloaded in parallel (`reloadAllSegments` / batched `reloadSegments`), all tasks shared a single `IndexLoadingConfig` from one `fetchIndexLoadingConfig()` call. Each reload path calls `setSegmentTier(...)` (and related updates) on that shared instance, so concurrent tasks could overwrite each other's tier. With tier overrides in table config, that could apply the wrong preprocessing / loading settings (#18164).

Fix
- `BaseTableDataManager.reloadSegmentDataManagers` (private helper used by parallel reload): calls `fetchIndexLoadingConfig()` once per batch. For immutable segments, builds a `Map<tierKey, IndexLoadingConfig>` with one `IndexLoadingConfig` per distinct current segment tier (from `getSegmentCurrentTier`), using `template.copy()` plus the tier key. Offline reloads for the same tier use the same config under `synchronized (tierConfig)`, so mutations stay correct without one config per segment. Reloads for different tiers can still run in parallel.
- Realtime paths only read `TableConfig` from the template and do not mutate the tier, so they reuse the single fetched template without that lock.
- `IndexLoadingConfig.copy()`: returns a new instance with the same instance / table / schema references and a full snapshot of all mutable fields (including `_readMode`, `_segmentTier`, index-config caches, `_dirty`, etc.), so "copy" matches real semantics and supports the tier-keyed path without N ZK reads.

Tests
- `IndexLoadingConfigTest.testCopyPreservesMutableStateAndIndependentTier`: asserts shared `TableConfig` / `Schema`, matching `tableDataDir` / `readMode` / initial tier, and that tier changes on the copy do not affect the original.

Fixes #18164
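As a rough illustration of the copy semantics the test exercises (shared immutable references, snapshotted mutable fields, independent tier), here is a minimal stand-in; `LoadingConfig` and its fields are hypothetical simplifications, not the real `IndexLoadingConfig`.

```java
public class CopySemanticsSketch {
  // Stand-in for TableConfig: shared by reference, never cloned on copy().
  static class TableConfig {
    final String tableName;
    TableConfig(String tableName) { this.tableName = tableName; }
  }

  // Stand-in for IndexLoadingConfig with a couple of mutable fields.
  static class LoadingConfig {
    final TableConfig tableConfig; // shared reference
    String readMode;               // mutable, snapshotted by copy()
    String segmentTier;            // mutable, snapshotted by copy()

    LoadingConfig(TableConfig tableConfig) { this.tableConfig = tableConfig; }

    // copy(): same immutable references, snapshot of current mutable state.
    LoadingConfig copy() {
      LoadingConfig c = new LoadingConfig(tableConfig);
      c.readMode = readMode;
      c.segmentTier = segmentTier;
      return c;
    }
  }
}
```

The key property is that mutating the copy's tier leaves the original untouched, while both still point at the same `TableConfig`, so no extra ZK fetch is needed per copy.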