Skip to content

fix(sort-pushdown): restore SortExec elimination after stats-based file reorder#22493

Merged
zhuqi-lucas merged 9 commits into
apache:mainfrom
zhuqi-lucas:fix/sort-elimination-after-stats-reorder
May 24, 2026
Merged

fix(sort-pushdown): restore SortExec elimination after stats-based file reorder#22493
zhuqi-lucas merged 9 commits into
apache:mainfrom
zhuqi-lucas:fix/sort-elimination-after-stats-reorder

Conversation

@zhuqi-lucas
Copy link
Copy Markdown
Contributor

@zhuqi-lucas zhuqi-lucas commented May 24, 2026

Which issue does this PR close?

Regression introduced by #21956 — single-partition multi-file scans with
WITH ORDER (or inferred parquet sorting_columns metadata) plus files
listed out of order on disk no longer have their SortExec eliminated,
even when the stats-based file reorder produces a non-overlapping layout.
This was the central case PR #21182 was designed to fix.

Rationale for this change

In the Phase 2 sort pushdown design (#21182), when a file source returned
Unsupported because validated_output_ordering() stripped its declared
ordering, the outer FileScanConfig::try_pushdown_sort invoked a fallback
(try_sort_file_groups_by_statistics) that reordered files by min/max
stats, re-validated the ordering against the new file groups, and could
upgrade the result back to Exact — dropping the outer SortExec.

PR #21956 added the column_in_file_schema signal so that plain-column
sort requests would return Inexact (with sort_order_for_reorder set)
instead of Unsupported. That enabled runtime per-RG reorder, but it
also pulled the typical wrong-file-order case out of the
Unsupported-fallback re-validation path. The Inexact branch of
rebuild_with_source always strips output_ordering, so even when the
post-sort file groups became non-overlapping and the declared ordering
would re-validate, the outer wrapper returned Inexact and PushdownSort
kept the SortExec.

The SLT comment on test 6.1 (# … → SortExec eliminated) still describes
the pre-#21956 behaviour, but the recorded expectation was updated to
match the post-#21956 plan where SortExec stayed — a silent regression.

What changes are included in this PR?

rebuild_with_source

When is_exact=false but the stats-based file sort produced
all_non_overlapping=true, re-validate the declared output_ordering
against the new file groups. If ordering_satisfy passes, preserve
output_ordering instead of stripping it.

Outer FileScanConfig::try_pushdown_sort

On the Inexact branch, inspect config.output_ordering after rebuild
and return Exact if it survived. This restores the
Unsupported → upgrade Exact semantics, but on the Inexact branch the
plain-column path now follows.

Safety

  • When the source has no declared ordering (no WITH ORDER and no
    parquet sorting_columns metadata), self.output_ordering is empty,
    the re-validate produces no orderings, ordering_satisfy returns
    false, and SortExec stays. min/max stats alone never trigger an
    upgrade.
  • When all_non_overlapping=false (overlapping files post-sort),
    output_ordering is stripped exactly as before.

Tests

sort_pushdown.slt:

  • Tests 4.1 (parquet metadata), 6.1 (WITH ORDER ASC),
    8.1 (WITH ORDER DESC with reverse), G.1, G.2 (multi-
    partition SPM + BufferExec) — expectations updated to match the
    restored SortExec-eliminated plans (matches the PR feat: sort file groups by statistics during sort pushdown (Sort pushdown phase 2) #21182 era plans
    and the SLT comments that were already in the file).
  • New Test 5b — files written without ORDER BY (no
    sorting_columns metadata) + external table without WITH ORDER:
    asserts that even when min/max stats happen to be non-overlapping,
    SortExec is kept.

Are these changes tested?

Yes — sort_pushdown.slt covers the affected paths comprehensively.
Full cargo test -p datafusion-sqllogictest --test sqllogictests and
cargo test -p datafusion-datasource --lib / -p datafusion-datasource-parquet --lib / -p datafusion-physical-optimizer
pass. cargo clippy -D warnings clean.

Are there any user-facing changes?

Yes — for the in-scope scenarios above, EXPLAIN plans now show the
SortExec removed and output_ordering set on DataSourceExec. Query
results are unaffected.

…le reorder

PR apache#21956 added a `column_in_file_schema` signal that routes plain-column
sort requests with stripped output_ordering into the Inexact branch of
`FileScanConfig::try_pushdown_sort`. The Inexact branch unconditionally
strips `output_ordering` even when the post-sort file groups have become
non-overlapping and the ordering would re-validate — so `SortExec` stayed
above the source where it had previously been eliminated.

Symptom: with `WITH ORDER (key ASC)` (or inferred parquet
`sorting_columns` metadata) plus files listed out of order on disk,
PushdownSort reordered the files but kept the SortExec, regressing the
behaviour PR apache#21182 introduced.

Fix: in `rebuild_with_source`, when `is_exact=false` but
`all_non_overlapping=true` after the stats-based sort, re-validate the
declared ordering against the new file groups. If `ordering_satisfy`
passes, keep `output_ordering` so the outer wrapper can upgrade Inexact
back to Exact. The outer `FileScanConfig::try_pushdown_sort` now also
inspects `output_ordering` on the Inexact branch and returns Exact when
it survived.

Safety:
  - When `self.output_ordering` is empty (no `WITH ORDER`, no
    `sorting_columns` metadata) the re-validate falls through, ordering
    is stripped, and SortExec is kept.
  - When `all_non_overlapping=false`, ordering is stripped as before.

SLT updates: `sort_pushdown.slt` tests 4.1, 6.1, 8.1, G.1, G.2 now
expect the SortExec to be eliminated (the behaviour `apache#21182` originally
delivered and SLT comments still describe). New `Test 5b` covers the
safety case: files written without `ORDER BY` (no sorting_columns) and
no `WITH ORDER` must keep SortExec even when stats happen to be
non-overlapping.
Copilot AI review requested due to automatic review settings May 24, 2026 11:01
@github-actions github-actions Bot added sqllogictest SQL Logic Tests (.slt) datasource Changes to the datasource crate labels May 24, 2026
Copy link
Copy Markdown
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

This PR fixes a regression in sort pushdown where SortExec was no longer eliminated for single-partition, multi-file scans after stats-based file reordering (especially for WITH ORDER / inferred parquet sorting_columns cases) by re-validating ordering after reorder and allowing Inexact to upgrade back to Exact when warranted.

Changes:

  • Re-validate declared output_ordering after stats-based file reordering when all_non_overlapping=true, preserving ordering when it now satisfies the requested sort.
  • In FileScanConfig::try_pushdown_sort, upgrade the Inexact branch to Exact when output_ordering survives rebuild (enabling SortExec elimination again).
  • Update sort_pushdown.slt expectations to reflect restored SortExec elimination and add a new “no declared ordering” safety test.

Reviewed changes

Copilot reviewed 3 out of 3 changed files in this pull request and generated 2 comments.

File Description
datafusion/datasource/src/file_scan_config/sort_pushdown.rs Adds post-reorder re-validation to preserve output_ordering (enabling SortExec elimination) on previously-Inexact paths.
datafusion/datasource/src/file_scan_config/mod.rs Allows Inexact pushdown results to upgrade to Exact when rebuild preserves output_ordering.
datafusion/sqllogictest/test_files/sort_pushdown.slt Updates explain-plan expectations for restored elimination and adds a safety regression test for “no declared ordering”.

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Comment thread datafusion/datasource/src/file_scan_config/sort_pushdown.rs Outdated
Comment thread datafusion/sqllogictest/test_files/sort_pushdown.slt
1. NULL safety guard
   In `rebuild_with_source`'s Inexact-branch re-validate path, mirror
   the existing `try_sort_file_groups_by_statistics` fallback by
   refusing to upgrade to Exact when any file in the (post-sort) file
   groups has NULLs in the sort columns. Otherwise NULLs sitting at the
   end of a non-last file (NULLS LAST) would land mid-stream after
   concatenation, breaking the ordering.

2. SLT test cleanup
   - Test 5b: add DROP TABLE statements for no_decl_low / no_decl_mid /
     no_decl_high / no_decl_parquet so the test is self-contained.
   - Test 5c (new): NULL-safety regression test — files with reversed
     filesystem naming so the Inexact branch fires, and the lower-id
     file contains NULLs in the sort column. Asserts SortExec stays
     and query results are correct.
@zhuqi-lucas zhuqi-lucas requested review from adriangb and alamb May 24, 2026 11:18
The doc-comment for FileScanConfig::try_pushdown_sort still described
the pre-fix Inexact branch ("SortExec kept, scan optimized"). Update
the diagram to reflect the new behaviour: rebuild_with_source can
upgrade back to Exact when post-sort file groups are non-overlapping,
the request re-validates, and no NULLs sit in the sort columns of
non-last files. Also note that the Unsupported fallback now mirrors
the same NULL check.
@zhuqi-lucas
Copy link
Copy Markdown
Contributor Author

cc @adriangb @alamb

This restores SortExec elimination that #21956 accidentally regressed — plain-column wrong-order-file cases now go through Inexact and unconditionally lose output_ordering. Re-validating after rebuild_with_source's stats-sort (with the same NULL guard as the existing try_sort_file_groups_by_statistics fallback) restores the pre-#21956 behaviour. SLT tests added.

Thanks!

adriangb and others added 3 commits May 24, 2026 09:26
Replace the nested `if all_non_overlapping { if is_exact {..} else {..} }
else { false }` in `rebuild_with_source` with a `match (all_non_overlapping,
is_exact)`. No behaviour change — the three arms map one-to-one to the prior
branches and make the "overlap → drop", "exact → keep", "inexact →
re-validate" cases explicit.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…xact upgrade

When `try_pushdown_sort` upgrades the `Inexact` result back to `Exact`
(post-sort file groups non-overlapping + declared ordering re-validates),
the `SortExec` is removed. But the source rebuilt from the `Inexact` result
still carried `sort_order_for_reorder` / `reverse_row_groups`, so the parquet
opener kept reordering row groups at runtime.

With the `SortExec` gone, that runtime reorder is no longer a harmless
optimisation — it is load-bearing for correctness, and it is wrong for DESC
requests. The opener sorts row groups ASC-by-min and then `reverse()`s them;
two row groups within a single file that share the same `min` but differ in
`max` get mis-ordered by the reverse. Example: a file declared `WITH ORDER
(id DESC)` containing `[10,8,8,8]` whose row groups are `[10,8]` and `[8,8]`
(both min 8) streams as `8,8,10,8`. Previously the `SortExec` re-sorted that;
once eliminated, the wrong order is the final result.

Because the upgrade only fires when each file's declared ordering re-validates
against the non-overlapping post-sort file groups, reading the files in their
natural (declared-sorted) order already yields the requested ordering — no
runtime reorder is needed. Restore the original, hint-free source on upgrade,
matching the `Unsupported → Exact` path which also reads files naturally.

Existing tests 4.1 / 6.1 / 8.1 / G.1 / G.2 lose the now-absent
`sort_order_for_reorder` / `reverse_row_groups` from their `DataSourceExec`
plans; data results are unchanged.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Add Test 8b: a DESC scan over a file whose row groups share the same min
value (`[10,8,8,8]` → row groups `[10,8]` and `[8,8]`), listed out of order
on disk so the Inexact→Exact upgrade fires and SortExec is eliminated.

Without the fix this returns `8,8,10,8,...` (id=10 in third position): the
opener reorders row groups ASC-by-min then reverses, mis-ordering the two
min=8 row groups, and the eliminated SortExec no longer masks it. The test
asserts both the clean plan (no runtime reorder hints) and the correct
DESC result.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
fix(sort-pushdown): drop runtime row-group reorder hints on Inexact→Exact upgrade (correctness)
Copy link
Copy Markdown
Contributor

@adriangb adriangb left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This looks great thank you!

Comment thread datafusion/datasource/src/file_scan_config/sort_pushdown.rs Outdated
Comment on lines +990 to +994
if config.output_ordering.is_empty() {
Ok(SortOrderPushdownResult::Inexact {
inner: Arc::new(config),
})
} else {
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I wonder if we can use some sort of early return / early break to avoid the indentation for the else branch.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good point @adriangb , addressed in latest PR.

- adriangb r3294820727: flatten the if/else in try_pushdown_sort
  Inexact arm into an early-return guard clause so the Exact-upgrade
  block sits at one less level of indentation
- typos CI: 'mis-orders' tripped the spell checker on leading 'mis';
  reword the comment as 'reorders ... incorrectly' (no behaviour change)
@zhuqi-lucas zhuqi-lucas added this pull request to the merge queue May 24, 2026
@zhuqi-lucas
Copy link
Copy Markdown
Contributor Author

Thanks @adriangb for review, merged to main now, will submit a PR target to branch 54.

Merged via the queue into apache:main with commit 94c58d0 May 24, 2026
57 of 58 checks passed
@zhuqi-lucas zhuqi-lucas deleted the fix/sort-elimination-after-stats-reorder branch May 24, 2026 15:40
zhuqi-lucas added a commit that referenced this pull request May 24, 2026
…ats-based file reorder (#22501)

## Which issue does this PR close?

Cherry-pick of #22493 onto `branch-54`.

## Rationale for this change

`branch-54` includes #21956 (`feat: globally reorder files and row
groups by statistics for TopK queries`), which introduced a regression:
for plain-column, multi-file scans where the on-disk file order does not
match the declared sort order, `SortExec` was no longer eliminated even
when stats-based reorder produced non-overlapping file groups whose
declared ordering re-validated.

#22493 restores the pre-#21956 sort-elimination behaviour by
re-validating `output_ordering` after `rebuild_with_source` reorders
files, and (per @adriangb's correctness follow-up) restoring the
original hint-free `file_source` on the Inexact→Exact upgrade so
leftover `reverse_row_groups` / `sort_order_for_reorder` hints don't
mis-order row groups within a single file once the `SortExec` safety net
is gone.

## What changes are included in this PR?

Straight cherry-pick of merge commit `94c58d086`. Includes:

- `FileScanConfig::try_pushdown_sort` Inexact arm: re-validate, upgrade
to Exact (with file_source restore), guard with NULL safety +
early-return
- `rebuild_with_source`: `match (all_non_overlapping, is_exact)`
decision table for keep_ordering
- SLT updates restoring `SortExec` elimination expectations + Tests
5b/5c/8b for the NULL-safety and same-min row-group edge cases

## Are these changes tested?

Cherry-picked cleanly (auto-merge in `sort_pushdown.rs`).
`cargo build -p datafusion-datasource` — passes.
`cargo test -p datafusion-sqllogictest --test sqllogictests --
sort_pushdown` — passes.

## Are there any user-facing changes?

Same as #22493: plain-column wrong-order-files cases regain SortExec
elimination when files happen to be non-overlapping by statistics. No
new API.

Co-authored-by: Adrian Garcia Badaracco <1755071+adriangb@users.noreply.github.com>
Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
zhuqi-lucas added a commit to zhuqi-lucas/arrow-datafusion that referenced this pull request May 25, 2026
…ange detection

apache#22460 added a consumer-side DynamicFilterTracker that subscribes once
to every still-incomplete DynamicFilterPhysicalExpr inside a predicate
and answers "has it changed?" with a single atomic load per filter —
no tree walk on every check.

Three call sites on this branch were polling the predicate's
snapshot_generation on every batch / row-group boundary; switch them
over:

- RowGroupPruner (push_decoder.rs): owns a DynamicFilterTracking and
  rebuilds its PruningPredicate only when tracker.changed() reports an
  update. Falls back to a single up-front build for Static / AllComplete
  predicates.
- Parquet opener force_per_rg_runs gate: "dynamic AND not yet complete"
  is exactly the Watching variant of DynamicFilterTracking — match on
  that instead of is_dynamic_physical_expr + is_filter_complete.
- ParquetSource fmt_extra marker (dynamic_rg_pruning=eligible): use
  DynamicFilterTracking::classify(...).contains_dynamic_filter() so the
  deprecated is_dynamic_physical_expr can go.

is_filter_complete and its unit tests in opener/mod.rs are removed —
the semantics they asserted are now covered by DynamicFilterTracking
tests in datafusion_physical_expr::expressions::dynamic_filters.

SLT expectations for sort_pushdown.slt are also brought in line with
upstream main (the SortExec-elimination changes from apache#22493 landed
between this branch's last push and now).
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

datasource Changes to the datasource crate sqllogictest SQL Logic Tests (.slt)

Projects

None yet

Development

Successfully merging this pull request may close these issues.

SortExec elimination regression for wrong-order multi-file scans with WITH ORDER / sorting_columns metadata (regression from #21956)

3 participants