Honor deletedSegmentsRetentionPeriod for REFRESH tables in segment lineage cleanup#18257
Closed
swaminathanmanish wants to merge 2 commits into apache:master
…neage cleanup

manageSegmentLineageCleanupForTable called deleteSegments() without passing the table config, so REFRESH tables (which are skipped by manageRetentionForTable) always fell back to the cluster-level default staging retention (7 days), ignoring any per-table deletedSegmentsRetentionPeriod set in segmentsConfig.

Pass the table's deletedSegmentsRetentionPeriod to deleteSegments so operators can control the staging window per REFRESH table, including setting "0d" for immediate deletion.

Co-Authored-By: Claude Sonnet 4.6 <[email protected]>
90dccea to 1306e99
Co-Authored-By: Claude Sonnet 4.6 <[email protected]>
Codecov Report
✅ All modified and coverable lines are covered by tests.
@@ Coverage Diff @@
## master #18257 +/- ##
============================================
- Coverage 63.48% 63.47% -0.01%
Complexity 1627 1627
============================================
Files 3244 3244
Lines 197365 197366 +1
Branches 30540 30540
============================================
- Hits 125306 125287 -19
- Misses 62019 62037 +18
- Partials 10040 10042 +2
Problem
REFRESH (non-APPEND) offline tables have their superseded segments deleted via manageSegmentLineageCleanupForTable. This method called deleteSegments(tableNameWithType, segmentsToDelete), the overload that takes no retention argument. That overload ignores deletedSegmentsRetentionPeriod from the table config and always falls back to the cluster-level default (7 days).

manageRetentionForTable (which does read per-table retention config) exits early for REFRESH tables. So even if an operator sets deletedSegmentsRetentionPeriod: "0d" on a REFRESH table, the staging retention is silently ignored: segments always sit in Deleted_Segments/ for the full cluster default before permanent deletion.

Fix
Pass the table's deletedSegmentsRetentionPeriod to deleteSegments in manageSegmentLineageCleanupForTable. When deletedSegmentsRetentionPeriod is null (not configured), the existing deleteSegments(String, List, String) overload falls through to re-read the table config and then the cluster default, which is identical to the previous behavior.

Usage
To enable immediate deletion of superseded segments for a REFRESH table, set deletedSegmentsRetentionPeriod to "0d" in the table's segmentsConfig.
Backward Compatibility
No behavior change for tables that do not set deletedSegmentsRetentionPeriod. APPEND and realtime tables are unaffected.
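
For reviewers who want to reason about the fallback order described above, here is a minimal, self-contained sketch. It is not Pinot code: the class RetentionSketch, the method resolveRetentionMs, and the simplified period parser (only "<n>d" and "<n>h") are hypothetical stand-ins, and the 7-day default is taken from this description. It only models the intended rule: a per-table period wins when set, otherwise the cluster default applies.

```java
import java.util.concurrent.TimeUnit;

/** Hypothetical model of the retention fallback; not part of Pinot. */
final class RetentionSketch {
  private RetentionSketch() {
  }

  /** Cluster-level default staging retention in days, per the PR description. */
  static final long CLUSTER_DEFAULT_DAYS = 7;

  /**
   * Resolves the staging retention in milliseconds.
   *
   * @param tablePeriod per-table deletedSegmentsRetentionPeriod, or null if not configured.
   *                    Assumption: only "<n>d" and "<n>h" are accepted in this sketch.
   * @param clusterDefaultDays cluster-level default used when tablePeriod is absent
   */
  static long resolveRetentionMs(String tablePeriod, long clusterDefaultDays) {
    // Not configured: fall back to the cluster default (the backward-compatible path).
    if (tablePeriod == null || tablePeriod.isEmpty()) {
      return TimeUnit.DAYS.toMillis(clusterDefaultDays);
    }
    char unit = tablePeriod.charAt(tablePeriod.length() - 1);
    long amount = Long.parseLong(tablePeriod.substring(0, tablePeriod.length() - 1));
    switch (unit) {
      case 'd':
        return TimeUnit.DAYS.toMillis(amount);
      case 'h':
        return TimeUnit.HOURS.toMillis(amount);
      default:
        throw new IllegalArgumentException("Unsupported period in this sketch: " + tablePeriod);
    }
  }

  public static void main(String[] args) {
    // "0d" means no staging window, i.e. immediate deletion.
    System.out.println(resolveRetentionMs("0d", CLUSTER_DEFAULT_DAYS));   // 0
    // Unset per-table value falls back to 7 days.
    System.out.println(resolveRetentionMs(null, CLUSTER_DEFAULT_DAYS));   // 604800000
  }
}
```

Under this model, a REFRESH table with "0d" resolves to a zero-length staging window, while a table with no value resolves to the cluster default, which matches the backward-compatibility claim above.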