
Add ORC and Avro file format support to Druid's Iceberg input source #19472

@Shekharrajak

Description

Component: extensions-contrib/druid-iceberg-extensions

Druid's Iceberg input source (druid-iceberg-extensions) currently only supports reading Iceberg tables stored in Parquet format.

IcebergNativeRecordReader hardcodes Parquet.read() with GenericParquetReaders for every data file it reads, so data files written in ORC or Avro are not supported.
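A minimal sketch of per-file reader dispatch, written against the JDK only so it compiles standalone. FileFormatKind, ReaderKind, formatFromPath, and readerFor are hypothetical names, not Druid or Iceberg APIs. In the real reader the format would more naturally come from the scan task's data file (Iceberg's ContentFile exposes format()), and each branch would build the Parquet, ORC, or Avro generic reader instead of returning an enum. The extension check here only stands in for that.

```java
import java.util.Locale;

// Hypothetical sketch: choose a reader per data file instead of always using Parquet.
// The enums below are stand-ins so the example needs only the JDK.
public class FormatDispatch {

  // Hypothetical mirror of Iceberg's FileFormat enum (only the formats in scope here).
  enum FileFormatKind { PARQUET, ORC, AVRO }

  // Hypothetical stand-in for the format-specific reader each branch would build.
  enum ReaderKind { PARQUET_GENERIC, ORC_GENERIC, AVRO_GENERIC }

  // Infers the format from the file extension (case-insensitive); rejects anything else.
  static FileFormatKind formatFromPath(String path) {
    String lower = path.toLowerCase(Locale.ROOT);
    if (lower.endsWith(".parquet")) {
      return FileFormatKind.PARQUET;
    }
    if (lower.endsWith(".orc")) {
      return FileFormatKind.ORC;
    }
    if (lower.endsWith(".avro")) {
      return FileFormatKind.AVRO;
    }
    throw new IllegalArgumentException("Unsupported data file format: " + path);
  }

  // Maps a format to its reader; an unknown format fails instead of falling through to Parquet.
  static ReaderKind readerFor(FileFormatKind format) {
    switch (format) {
      case PARQUET:
        return ReaderKind.PARQUET_GENERIC;
      case ORC:
        return ReaderKind.ORC_GENERIC;
      case AVRO:
        return ReaderKind.AVRO_GENERIC;
      default:
        throw new IllegalArgumentException("Unsupported format: " + format);
    }
  }

  public static void main(String[] args) {
    String[] paths = {
        "s3://bucket/db/t/data/00000-0.parquet",
        "s3://bucket/db/t/data/00001-0.orc",
        "s3://bucket/db/t/data/00002-0.avro"
    };
    for (String p : paths) {
      System.out.println(p + " -> " + readerFor(formatFromPath(p)));
    }
  }
}
```

Keeping the choice in one switch means adding a format touches a single place, and an unsupported format raises an error rather than being passed to the Parquet reader.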

Motivation

This was flagged while working on V2 spec support in #19266 (comment).

Dependencies: iceberg-orc, orc-core, iceberg-avro, and avro are currently absent from the extension.

References:

• IcebergNativeRecordReader.java — current Parquet-only implementation
• IcebergFileTaskInputSource.java — serialisation boundary between coordinator and worker
• Iceberg API: org.apache.iceberg.data.GenericDeleteFilter (public, already on classpath)
• Iceberg API: org.apache.iceberg.data.orc.GenericOrcReader, org.apache.iceberg.data.avro.GenericAvroReader
• PR #19266 — added Iceberg V2 delete support (Parquet only); this is the follow-up
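Iceberg position deletes identify a deleted row by its data file path and 0-based row position, so they can be applied after decoding regardless of the data file's format. This is why a record-level filter such as GenericDeleteFilter can sit above the format-specific reader. The sketch below illustrates the idea with the JDK only. Row, applyPositionDeletes, and the in-memory delete map are hypothetical stand-ins for Iceberg's Record and delete-file loading, and it assumes the reader for each format can report every row's position.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical sketch of format-agnostic position-delete filtering.
public class PositionDeleteSketch {

  // Hypothetical decoded row, as produced by any format-specific reader (Parquet, ORC, or Avro).
  static final class Row {
    final String filePath;
    final long pos;
    final String value;

    Row(String filePath, long pos, String value) {
      this.filePath = filePath;
      this.pos = pos;
      this.value = value;
    }
  }

  // Keeps only rows whose (filePath, pos) pair is not listed in the position deletes.
  static List<Row> applyPositionDeletes(List<Row> rows, Map<String, Set<Long>> deletedPositions) {
    List<Row> kept = new ArrayList<>();
    for (Row row : rows) {
      Set<Long> deleted = deletedPositions.getOrDefault(row.filePath, Collections.emptySet());
      if (!deleted.contains(row.pos)) {
        kept.add(row);
      }
    }
    return kept;
  }

  public static void main(String[] args) {
    String orcFile = "s3://bucket/db/t/data/00001-0.orc";
    List<Row> rows = new ArrayList<>();
    rows.add(new Row(orcFile, 0L, "a"));
    rows.add(new Row(orcFile, 1L, "b"));
    rows.add(new Row(orcFile, 2L, "c"));

    // Position deletes are keyed by data file path; here row 1 of the ORC file is deleted.
    Map<String, Set<Long>> deletes = new HashMap<>();
    deletes.put(orcFile, new HashSet<>(Arrays.asList(1L)));

    for (Row r : applyPositionDeletes(rows, deletes)) {
      System.out.println(r.pos + " " + r.value);
    }
  }
}
```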
