feat(iceberg): support order key #25202

Open

xxhZs wants to merge 2 commits into main from xxh/add-sort-id-for-iceberg

Conversation

@xxhZs
Contributor

@xxhZs xxhZs commented Mar 30, 2026

Summary

This PR adds order_key support for Iceberg tables and ensures the configured sort order is carried through both file writing and Iceberg compaction metadata.

Main changes:

  • parse and validate order_key for Iceberg table creation
  • reject invalid expressions and unsupported/system columns early in frontend/connector validation
  • build Iceberg SortOrder from the configured key when creating the table
  • write sort_order_id into generated Iceberg data files so the sort order is visible in metadata
  • preserve the same sort-order metadata on compaction outputs
  • add e2e coverage for Iceberg engine, append-only Iceberg engine, and Iceberg sink

In practice, this means a table like:

WITH (order_key = 'v1 desc nulls last, id asc') ENGINE = ICEBERG

will materialize an Iceberg sort order in table metadata, and the produced data files can be observed from rw_iceberg_files.sort_order_id.
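
Concretely, a full statement of this shape might look like the following sketch (the table name and column types are assumptions; only the order_key syntax, the v1/id columns, and rw_iceberg_files come from this PR):

```sql
-- Sketch: an Iceberg-engine table with a two-column order key.
CREATE TABLE t_ordered (
    id INT PRIMARY KEY,
    v1 INT
)
WITH (order_key = 'v1 desc nulls last, id asc')
ENGINE = ICEBERG;

-- The sort order id written into each data file can then be inspected:
SELECT * FROM rw_iceberg_files;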

Validation

Validated locally with Iceberg engine + storage catalog.

Test case:

  • order_key = 'v1 desc nulls last, v2 asc nulls first, id desc'
  • inserted rows in 3 batches with FLUSH after each batch:
    • (1, 100, 2, 'a'), (2, 100, 1, 'b')
    • (3, 100, 1, 'c'), (4, 90, 2, 'd')
    • (5, 90, 2, 'e'), (6, 90, 1, 'f')

Before compaction:

  • rw_iceberg_files showed 6 data files
  • all data files had non-null sort_order_id = 1

After starting a dedicated iceberg compactor and running:

VACUUM FULL t_order_multi;

After compaction:

  • rw_iceberg_files showed 1 data file
  • the compacted file still had non-null sort_order_id = 1

The compacted parquet file was read directly and its physical row order was:

(3, 100, 1, 'c')
(2, 100, 1, 'b')
(1, 100, 2, 'a')
(6, 90, 1, 'f')
(5, 90, 2, 'e')
(4, 90, 2, 'd')

This matches the configured multi-column order key:

  • v1 desc
  • v2 asc
  • id desc
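
The observed physical order can be reproduced with the equivalent multi-column comparator. This is a stdlib-only sketch, not code from the PR; NULL handling is omitted because the test rows contain no NULLs:

```rust
// Sketch: reproducing the observed row order with the comparator implied by
// order_key = 'v1 desc nulls last, v2 asc nulls first, id desc'.

// Rows are (id, v1, v2, label), as inserted in the test case.
fn sorted_labels() -> Vec<&'static str> {
    let mut rows = vec![
        (1, 100, 2, "a"),
        (2, 100, 1, "b"),
        (3, 100, 1, "c"),
        (4, 90, 2, "d"),
        (5, 90, 2, "e"),
        (6, 90, 1, "f"),
    ];
    // v1 desc, then v2 asc, then id desc.
    rows.sort_by(|a, b| b.1.cmp(&a.1).then(a.2.cmp(&b.2)).then(b.0.cmp(&a.0)));
    rows.into_iter().map(|r| r.3).collect()
}

fn main() {
    // Matches the physical order read back from the compacted parquet file.
    assert_eq!(sorted_labels(), ["c", "b", "a", "f", "e", "d"]);
}
```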

Dependency

The compaction-side ordering behavior depends on an upstream change in iceberg-compaction-core. This PR should wait for that upstream PR to be merged.

@xxhZs xxhZs requested a review from a team as a code owner March 30, 2026 09:48
@xxhZs xxhZs requested review from MrCroxx and removed request for a team March 30, 2026 09:48
@xxhZs xxhZs requested review from Li0k and chenzl25 March 30, 2026 09:49
    let data_files = result
        .into_iter()
        .map(|f| {
            let f = apply_sort_order_id_to_data_file(f, table_sort_order_id)
Contributor

I think we can't apply sort order key to the parquet file written by iceberg sink because there is no order guarantee in the streaming sink.

Copilot AI left a comment


Pull request overview

Adds order_key support for Iceberg table creation and sink file metadata, ensuring the configured sort order is propagated through table metadata and written data files (and preserved through compaction via updated compaction-core integration).

Changes:

  • Add order_key option parsing/validation (frontend + connector) and build Iceberg SortOrder during table auto-creation.
  • Write sort_order_id into produced Iceberg data files (visible via rw_iceberg_files) and update compaction integration configuration.
  • Add e2e coverage for Iceberg engine, append-only engine, and Iceberg sink.

Reviewed changes

Copilot reviewed 8 out of 9 changed files in this pull request and generated 3 comments.

Summary per file:

  • src/storage/src/hummock/compactor/iceberg_compaction/iceberg_compactor_runner.rs — update compaction planning config API usage (max_input_parallelism).
  • src/storage/Cargo.toml — bump the iceberg-compaction-core git rev.
  • src/frontend/src/handler/create_table.rs — parse/validate order_key during Iceberg engine table creation; forward the option to the sink and strip it from source props.
  • src/connector/src/sink/iceberg/mod.rs — add order_key to IcebergConfig, parse/validate it, build the Iceberg SortOrder, and set sort_order_id on written data files; add unit tests.
  • e2e_test/iceberg/test_case/pure_slt/iceberg_sink.slt — add sink coverage verifying sort_order_id is populated.
  • e2e_test/iceberg/test_case/pure_slt/iceberg_engine.slt — add engine table coverage verifying sort_order_id is populated.
  • e2e_test/iceberg/test_case/pure_slt/iceberg_engine_append_only.slt — add append-only engine coverage verifying sort_order_id is populated.
  • Cargo.toml — bump iceberg / catalog crates git rev (aligned with compaction-core).
  • Cargo.lock — lockfile updates for the bumped git dependencies.


    #[serde(default)]
    pub partition_by: Option<String>,

    #[serde(default)]

Copilot AI Mar 31, 2026

IcebergConfig derives WithOptions and a new order_key field is added, but the checked-in auto-generated src/connector/with_options_sink.yaml does not include order_key (no matches for order_key in that file). Please regenerate and commit the updated YAML (via ./risedev generate-with-options) so CI/docs stay in sync with the Rust option definitions.

Suggested change:

    #[serde(default)]
    #[with_option(skip)]
Comment on lines +3088 to +3094
    let column = tokens[0];
    let valid_column = Regex::new(r"^[A-Za-z_][A-Za-z0-9_]*$").unwrap();
    if !valid_column.is_match(column) {
        bail!(
            "Invalid order key column `{column}`\nHINT: Only plain column names are supported in order_key"
        );
    }

Copilot AI Mar 31, 2026

parse_order_key_exprs recompiles Regex::new(r"^[A-Za-z_][A-Za-z0-9_]*$") for every item and uses unwrap(). Please precompile this regex once (e.g., static Lazy<Regex>) to avoid repeated allocations/CPU and remove the per-item unwrap() in the hot path.
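
The fix direction is to compile the pattern once into a static instead of per item. As a stdlib-only sketch of the same validation, the character class ^[A-Za-z_][A-Za-z0-9_]*$ can even be checked without a regex at all (the function name here is hypothetical, not from the PR):

```rust
// Sketch: validate an order_key column name against the same rule as
// ^[A-Za-z_][A-Za-z0-9_]*$, with no regex compilation in the hot path.
fn is_plain_column_name(name: &str) -> bool {
    let mut chars = name.chars();
    // First char: ASCII letter or underscore.
    match chars.next() {
        Some(c) if c.is_ascii_alphabetic() || c == '_' => {}
        _ => return false,
    }
    // Remaining chars: ASCII letters, digits, or underscore.
    chars.all(|c| c.is_ascii_alphanumeric() || c == '_')
}

fn main() {
    assert!(is_plain_column_name("v1"));
    assert!(is_plain_column_name("_id"));
    assert!(!is_plain_column_name("1v"));
    assert!(!is_plain_column_name("t.col"));
    assert!(!is_plain_column_name(""));
}
```

If the regex itself must stay, the equivalent change is to move it into a `static` initialized once (e.g. via `std::sync::OnceLock<Regex>`), so the per-item `unwrap()` disappears.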

Comment on lines +975 to +979
    validate_order_key_columns(
        order_key,
        param.columns.iter().map(|column| column.name.as_str()),
    )
    .context("invalid order_key")?;

Copilot AI Mar 31, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In IcebergSink::new, validate_order_key_columns(...).context("invalid order_key")? returns an anyhow::Error and will be converted via From<anyhow::Error> into SinkError::Internal. For a user-provided WITH option this should be reported as SinkError::Config (with context) so invalid configuration is surfaced correctly instead of looking like an internal failure.

Suggested change:

    - validate_order_key_columns(
    -     order_key,
    -     param.columns.iter().map(|column| column.name.as_str()),
    - )
    - .context("invalid order_key")?;
    + if let Err(e) = validate_order_key_columns(
    +     order_key,
    +     param.columns.iter().map(|column| column.name.as_str()),
    + )
    + .context("invalid order_key")
    + {
    +     return Err(SinkError::Config(e));
    + }

@chenzl25
Contributor

chenzl25 commented Apr 2, 2026

@xxhZs any updates?

@xxhZs
Contributor Author

xxhZs commented Apr 7, 2026

Can review again, @chenzl25.

Contributor

@chenzl25 chenzl25 left a comment


Generally LGTM


3 participants