
feat(optimizer): support index selection for selective backfill #25207

Open
chenzl25 wants to merge 3 commits into dylan/support_pk_prefix_for_snapshot_backfill from dylan/support_index_selection_for_selective_backfill

chenzl25 commented Mar 31, 2026

I hereby agree to the terms of the RisingWave Labs, Inc. Contributor License Agreement.

What's changed and what's your intention?

This PR implements streaming index selection optimizations for snapshot backfill operations. The changes introduce two key optimizations:

1. Covering Index Selection for Backfill

  • Adds a new StreamingIndexSelectionRule that selects the lowest-cost covering index during snapshot backfill
  • Unlike batch index selection, this only considers covering indexes, since streaming backfill cannot perform lookup joins
  • The rule estimates costs and picks the most efficient index that covers all required columns
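The covering-index choice described above can be sketched as follows. This is a minimal, self-contained illustration under stated assumptions, not the code in streaming_index_selection_rule.rs: IndexCandidate, estimate_cost and select_covering_index are hypothetical names, and the cost model (each equality-bound leading key column is assumed to cut the scanned rows by 10x) is an invented placeholder for the optimizer's real cost estimation.

```rust
use std::collections::HashSet;

/// Hypothetical, simplified stand-in for a scan source (the primary table or
/// one of its indexes). Not RisingWave's actual catalog type.
#[derive(Debug, Clone)]
struct IndexCandidate {
    name: String,
    /// All columns physically stored in this index (key + included columns).
    columns: HashSet<String>,
    /// Key columns in order; a bound prefix narrows the scan range.
    key_prefix: Vec<String>,
}

/// Hypothetical cost model: count how many leading key columns are fixed by
/// equality predicates and assume each one cuts the scanned rows by 10x.
fn estimate_cost(idx: &IndexCandidate, eq_columns: &HashSet<String>, table_rows: f64) -> f64 {
    let matched = idx
        .key_prefix
        .iter()
        .take_while(|c| eq_columns.contains(*c))
        .count();
    table_rows / 10f64.powi(matched as i32)
}

/// Pick the cheapest candidate that covers every required column.
/// Non-covering candidates are filtered out first, because a streaming
/// backfill cannot do a lookup join back to the primary table.
/// On equal cost, `min_by` keeps the earliest candidate in the slice.
fn select_covering_index<'a>(
    candidates: &'a [IndexCandidate],
    required: &HashSet<String>,
    eq_columns: &HashSet<String>,
    table_rows: f64,
) -> Option<&'a IndexCandidate> {
    candidates
        .iter()
        .filter(|idx| required.is_subset(&idx.columns))
        .min_by(|a, b| {
            estimate_cost(a, eq_columns, table_rows)
                .partial_cmp(&estimate_cost(b, eq_columns, table_rows))
                .expect("costs are finite")
        })
}
```

Filtering on is_subset before costing encodes the constraint stated above: a non-covering index is never a candidate, however cheap its estimated scan would be.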

2. IN Predicate Expansion

  • Expands IN predicates (e.g., WHERE a IN (1, 2, 3)) into a LogicalUnion of separate LogicalScan nodes
  • Each branch gets a single equality predicate (e.g., a = 1, a = 2, a = 3) for better scan range optimization
  • This enables more efficient primary key range scans and proper post-backfill data routing
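A minimal sketch of the IN expansion, using hypothetical Predicate and Plan enums in place of the optimizer's real expression and plan-node types. It assumes the IN list holds only constant values on a single column. The deduplication step is this sketch's own choice, made so that a repeated value in the list cannot make the union of scans emit the same row twice.

```rust
/// Hypothetical, simplified predicate type (not RisingWave's expression type).
#[derive(Debug, Clone, PartialEq)]
enum Predicate {
    Eq { column: String, value: i64 },
    In { column: String, values: Vec<i64> },
}

/// Hypothetical, simplified logical plan: a scan or a union of sub-plans.
#[derive(Debug, Clone, PartialEq)]
enum Plan {
    Scan { table: String, predicate: Option<Predicate> },
    Union(Vec<Plan>),
}

/// Rewrite `Scan(t, a IN (v1, .., vn))` into
/// `Union(Scan(t, a = v1), .., Scan(t, a = vn))`.
/// Any other plan is returned unchanged.
fn expand_in_predicate(plan: Plan) -> Plan {
    match plan {
        Plan::Scan {
            table,
            predicate: Some(Predicate::In { column, mut values }),
        } => {
            // Deduplicate so no value produces two identical branches.
            values.sort_unstable();
            values.dedup();
            let mut branches: Vec<Plan> = values
                .into_iter()
                .map(|value| Plan::Scan {
                    table: table.clone(),
                    predicate: Some(Predicate::Eq { column: column.clone(), value }),
                })
                .collect();
            // A single-value IN needs no union at all.
            if branches.len() == 1 {
                return branches.pop().unwrap();
            }
            Plan::Union(branches)
        }
        other => other,
    }
}
```

After the split, each branch carries exactly one equality predicate, which is the form the scan-range optimization mentioned above can turn into a narrow key range.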

Implementation Details:

  • The optimization is triggered during logical_rewrite_for_stream for snapshot backfill operations
  • Added RewriteStreamContext to pass backfill type information through the rewrite process
  • Extended test coverage with new end-to-end tests for IN predicates and covering index scenarios
  • Added planner tests to verify the generated stream plans produce the expected StreamUnion structures
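The gating described above, where the rewrite only runs for snapshot backfill, can be sketched as below. RewriteStreamContext, BackfillType and logical_rewrite_for_stream are names taken from this PR, but the enum variants, the struct's single field, and the logical_rewrite_for_stream_scan and RewriteOutcome helpers are assumptions made for illustration; the real definitions in convert.rs and logical_scan.rs will differ.

```rust
/// Hypothetical stand-in for the backfill type carried in the context.
/// Variant names are assumed for illustration only.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum BackfillType {
    ArrangementBackfill,
    SnapshotBackfill,
}

/// Hypothetical, simplified RewriteStreamContext. The backfill type is
/// optional: `None` means no backfill type was passed down.
#[derive(Debug, Default)]
struct RewriteStreamContext {
    backfill_type: Option<BackfillType>,
}

/// Hypothetical summary of what the scan rewrite decided to do.
#[derive(Debug, PartialEq, Eq)]
enum RewriteOutcome {
    /// Only the ordinary streaming rewrite applies.
    Unchanged,
    /// Covering index selection and IN expansion are attempted.
    StreamingIndexSelection,
}

/// Simplified scan-level logical_rewrite_for_stream: only snapshot backfill
/// triggers the streaming index selection rewrite.
fn logical_rewrite_for_stream_scan(ctx: &RewriteStreamContext) -> RewriteOutcome {
    match ctx.backfill_type {
        Some(BackfillType::SnapshotBackfill) => RewriteOutcome::StreamingIndexSelection,
        Some(_) | None => RewriteOutcome::Unchanged,
    }
}
```

Making the field an Option keeps existing call sites of logical_rewrite_for_stream valid: callers that do not know or care about the backfill type pass the default context and get the previous behavior.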

The changes allow materialized view backfill operations to leverage indexes more effectively and to handle IN predicates with narrower, per-value scan ranges.

Checklist

  • I have written necessary rustdoc comments.
  • I have added necessary unit tests and integration tests.
  • I have added test labels as necessary.
  • I have added fuzzing tests or opened an issue to track them.
  • My PR contains breaking changes.
  • My PR changes performance-critical code, so I will run (micro) benchmarks and present the results.
  • I have checked the Release Timeline and Currently Supported Versions to determine which release branches I need to cherry-pick this PR into.

Documentation

  • My PR needs documentation updates.
Release note

Improved performance of materialized view backfill operations through automatic index selection and IN predicate optimization. When creating materialized views with filtered queries, RisingWave now automatically selects the most efficient covering index and optimizes IN predicates by splitting them into parallel scan operations. This results in faster backfill completion times, especially for queries with selective predicates on indexed columns.

chenzl25 commented Mar 31, 2026

Warning

This pull request is not mergeable via GitHub because a downstack PR is open. Once all requirements are satisfied, merge this PR as a stack on Graphite.

This stack of pull requests is managed by Graphite.

chenzl25 changed the title from "support index selection" to "feat(optimizer): support index selection" Mar 31, 2026
github-actions bot added the type/feature (Type: New feature.) label and removed the Invalid PR Title label Mar 31, 2026
chenzl25 changed the title from "feat(optimizer): support index selection" to "feat(optimizer): support index selection for selective backfill" Mar 31, 2026
chenzl25 marked this pull request as ready for review March 31, 2026 03:46
chenzl25 requested review from Copilot, wenym1 and yuhao-su March 31, 2026 03:47

Copilot AI left a comment

Pull request overview

Adds a streaming-only optimizer rewrite to improve snapshot backfill performance and routing correctness by (1) selecting a lowest-cost covering index for backfill scans and (2) expanding eligible IN (...) predicates into a union of per-value scans.

Changes:

  • Introduces StreamingIndexSelectionRule (covering-index choice + IN → LogicalUnion of scans).
  • Threads snapshot backfill type into logical_rewrite_for_stream via RewriteStreamContext.
  • Extends planner tests and e2e backfill tests to validate union-shaped plans and covering-index usage.

Reviewed changes

Copilot reviewed 9 out of 9 changed files in this pull request and generated 4 comments.

Summary per file:

  • src/frontend/src/optimizer/rule/streaming_index_selection_rule.rs: New streaming backfill rewrite rule (covering index selection + IN expansion).
  • src/frontend/src/optimizer/rule/mod.rs: Wires the new rule into the optimizer rule module exports.
  • src/frontend/src/optimizer/rule/index_selection_rule.rs: Exposes cost-estimation helpers/types for reuse by the streaming rule.
  • src/frontend/src/optimizer/plan_node/logical_scan.rs: Applies the streaming rewrite during logical_rewrite_for_stream for snapshot backfill; exposes clone_with_predicate.
  • src/frontend/src/optimizer/plan_node/convert.rs: Extends RewriteStreamContext with an optional BackfillType.
  • src/frontend/src/optimizer/mod.rs: Passes the backfill type into logical_rewrite_for_stream.
  • src/frontend/planner_test/tests/testdata/input/backfill.yaml: Adds planner test inputs for IN expansion and covering index backfill selection.
  • src/frontend/planner_test/tests/testdata/output/backfill.yaml: Adds expected stream plans (including StreamUnion) for the new cases.
  • e2e_test/backfill/snapshot_backfill/pk_predicate_pushdown.slt: Adds e2e coverage for IN predicate behavior and covering index usage during snapshot backfill.

Labels

type/feature Type: New feature.
