Describe the bug
PR #21956 added the column_in_file_schema signal in ParquetSource::try_pushdown_sort, which routes plain-column sort requests through the Inexact branch when the source's output_ordering was stripped by validated_output_ordering() (files listed in wrong order on disk).
The Inexact branch of FileScanConfig::try_pushdown_sort calls rebuild_with_source with is_exact=false, which unconditionally strips output_ordering even when the post-sort file groups are non-overlapping and the declared ordering would re-validate. As a result, SortExec stays above the source for the canonical Phase 2 scenario (#21182) — even when stats-based file reorder restores a perfectly valid ordering.
Before #21956, these cases returned Unsupported and were upgraded to Exact via try_sort_file_groups_by_statistics (the fallback) — SortExec was eliminated.
Repro
See sort_pushdown.slt Test 6.1 — reversed_with_order_parquet with three out-of-order files: the comment says "SortExec eliminated" but the recorded plan keeps SortExec.
CREATE EXTERNAL TABLE reversed_with_order_parquet(id INT, value INT)
STORED AS PARQUET
LOCATION 'test_files/scratch/sort_pushdown/reversed/' -- [a_high(7-9), b_mid(4-6), c_low(1-3)] alphabetical
WITH ORDER (id ASC);
EXPLAIN SELECT * FROM reversed_with_order_parquet ORDER BY id ASC;
-- physical_plan
-- 01)SortExec: expr=[id@0 ASC NULLS LAST], preserve_partitioning=[false] ← should be gone
-- 02)--DataSourceExec: file_groups={1 group: [[c_low, b_mid, a_high]]}, ..., sort_order_for_reorder=[id@0 ASC NULLS LAST]
Expected
SortExec eliminated after Phase 2 reorder when:
- ordering is declared (
WITH ORDER or parquet sorting_columns metadata),
- post-sort file groups are non-overlapping,
- no NULLs in the sort columns of non-last files.
Related
Describe the bug
PR #21956 added the
column_in_file_schemasignal inParquetSource::try_pushdown_sort, which routes plain-column sort requests through theInexactbranch when the source'soutput_orderingwas stripped byvalidated_output_ordering()(files listed in wrong order on disk).The
Inexactbranch ofFileScanConfig::try_pushdown_sortcallsrebuild_with_sourcewithis_exact=false, which unconditionally stripsoutput_orderingeven when the post-sort file groups are non-overlapping and the declared ordering would re-validate. As a result,SortExecstays above the source for the canonical Phase 2 scenario (#21182) — even when stats-based file reorder restores a perfectly valid ordering.Before #21956, these cases returned
Unsupportedand were upgraded toExactviatry_sort_file_groups_by_statistics(the fallback) —SortExecwas eliminated.Repro
See
sort_pushdown.sltTest 6.1 —reversed_with_order_parquetwith three out-of-order files: the comment says "SortExec eliminated" but the recorded plan keepsSortExec.Expected
SortExeceliminated after Phase 2 reorder when:WITH ORDERor parquetsorting_columnsmetadata),Related
column_in_file_schema)