Releases: apache/datafusion-comet

0.16.0

20 May 15:42

0.16.0 Pre-release

DataFusion Comet 0.16.0 Changelog

This release consists of 127 commits from 17 contributors. See credits at the end of this changelog for more information.

Fixed bugs:

  • fix: report task output metrics in Spark UI #3999 (0lai0)
  • fix: cast to and from timestamp_ntz #4008 (parthchandra)
  • fix: support to_json on Spark 4.0 #4036 (andygrove)
  • fix: enable arrays_overlap #3901 (kazuyukitanimura)
  • fix: Iceberg reflection for current() on TableOperations hierarchy #3895 (karuppayya)
  • fix: fall back to Spark for shuffle/sort/aggregate on non-default collated strings [Spark 4] #4035 (andygrove)
  • fix: scalar subquery pushdown and reuse for CometNativeScanExec (SPARK-43402) #4053 (mbutrovich)
  • fix: fall back for shredded Variant scans on Spark 4.0 #4084 (andygrove)
  • fix: enable Spark 4 SQL tests previously ignored for issues #3313 and #3314 #4092 (andygrove)
  • fix: fall back to Spark for hash join and sort-merge join on non-default collated string keys [Spark 4] #4095 (0lai0)
  • fix: reject string/binary read as numeric in native_datafusion scan #4091 (andygrove)
  • fix: reject incompatible decimal precision/scale in native_datafusion scan #4090 (andygrove)
  • fix: throw SchemaColumnConvertNotSupportedException from native_datafusion schema mismatch #4117 (andygrove)
  • fix: substring with negative start index #4017 (kazuyukitanimura)
  • fix: honor strictFloatingPoint in RangePartitioning #4167 (0lai0)
  • fix: [Spark 4.1.1] preserve stored allowDecimalPrecisionLoss in DecimalPrecision rule #4179 (andygrove)
  • fix: [Spark 4.1.1] preserve parent struct nullness when all requested fields missing in Parquet #4190 (andygrove)
  • fix: support Spark 4.1 BloomFilter V2 format and bit-scattering #4196 (andygrove)
  • fix: JNI local reference cleanup in JVMClasses::with_env #4225 (0lai0)
  • fix: broadcast exchange bypasses AQE partition coalescing #4163 (andygrove)
  • fix: resolve Scala compiler warnings for auto-tupling and bare try #4227 (andygrove)
  • fix: [Spark 4.1] preserve union output partitioning in CometUnionExec #4207 (andygrove)
  • fix: re-enable tests skipped for Spark 4.1 (issue #4098) #4253 (andygrove)
  • fix: cargo clean before release build to avoid stale native libs #4257 (andygrove)

Performance related:

  • perf: avoid redundant columnar shuffle when both parent and child are non-Comet #4010 (andygrove)
  • perf: reduce per-node allocations in to_native_metric_node #4075 (andygrove)

Implemented enhancements:

  • feat: enable native Iceberg reader by default #3819 (andygrove)
  • feat: support collect_set #3954 (comphead)
  • feat: non-AQE DPP for native Parquet scans, broadcast exchange reuse for DPP subqueries #4011 (mbutrovich)
  • feat: add support for array_position expression #3172 (andygrove)
  • feat: Cast string to timestamp_ntz #4034 (parthchandra)
  • feat: Add TimestampNTZType support for unix_timestamp #4039 (parthchandra)
  • feat: fix array_compact for Spark 4.0 and correct return type metadata #3796 (andygrove)
  • feat: task-level input metrics (bytesRead) for Iceberg native scan #4128 (mbutrovich)
  • feat: add MapSort expression support for Spark 4.0 #4076 (andygrove)
  • feat: Support Spark expression str_to_map #3654 (unknowntpo)
  • feat: add support for timestamp_seconds expression #3146 (andygrove)
  • feat: add config to gate converting Spark shuffle to Comet shuffle when child is non-Comet plan #4166 (andygrove)
  • feat: AQE DPP for native Parquet scans with broadcast reuse #4112 (mbutrovich)
  • feat: support regular BuildRight+LeftAnti hash join #4073 (viirya)
  • feat: add bug-triage Claude skill #4109 (andygrove)
  • feat: support PartialMerge aggregation mode #4003 (comphead)
  • feat: add encode time tracking for shuffle operations #4068 (0lai0)
  • feat: Add support for Spark ToDegrees and ToRadians math expressions #3786 (rafafrdz)
  • feat: Add support for Spark Acosh, Asinh, Atanh math expressions #3787 (rafafrdz)
  • feat: Add support for Spark Cbrt math expression #3788 (rafafrdz)
  • feat: Add support for Spark Pi math expression #3789 (rafafrdz)
  • feat: support Parquet field ID matching in native_datafusion scan #4216 (mbutrovich)
  • feat: support AQE DPP broadcast reuse for Iceberg native scans #4215 (mbutrovich)
  • feat: add support for url_encode, url_decode, and try_url_decode #4231 (parthchandra)
  • feat: support TimestampType join keys in SortMergeJoin #3986 (andygrove)

Documentation updates:

  • docs: Add changelog for 0.15.0 #4000 (andygrove)
  • docs: Update README and benchmark results for 0.15.0 release #3995 (andygrove)
  • docs: fix errors in benchmark pages #4001 (andygrove)
  • docs: split compatibility guide into multiple pages #4055 (andygrove)
  • docs: Generate expression compatibility docs from code #4057 (andygrove)
  • doc: update documentation for cast and datetime functions #4058 (parthchandra)
  • docs: add compatibility documentation to all expressions #4067 (andygrove)
  • docs: rename SQL File Tests to Comet SQL Tests #4108 (andygrove)
  • docs: add Understanding Comet Plans user guide page #4086 (andygrove)
  • docs: support conditional content for snapshot vs release builds #4030 (andygrove)
  • docs: update Spark version support and add version compatibility page #4138 (andygrove)
  • docs: improve review skill and contributor guide for serde patterns #4132 (andygrove)
  • docs: Fix errors in list of supported Spark versions #4141 (andygrove)
  • docs: Update roadmap in contributor guide #4144 (andygrove)
  • docs: add implement-comet-expression Claude skill #4158 (andygrove)
  • docs: add roadmap items for spillable hash join, UDF support, memory management, and 1.0.0 #4171 (andygrove)
  • docs: start Spark 4.1 known-limitations section, seeded with #4199 #4202 (andygrove)
  • docs: document Spark 4 IntelliJ setup #4198 (yuboxx)
  • docs: refresh Gluten comparison with ANSI, Spark 4, and Iceberg coverage #4169 (andygrove)
  • docs: check off 53 implemented expressions in support doc #4147 (andygrove)
  • docs: replace p...

0.15.0

20 May 15:41
63316d8

0.15.0 Pre-release

DataFusion Comet 0.15.0 Changelog

This release consists of 142 commits from 19 contributors. See credits at the end of this changelog for more information.

Fixed bugs:

  • fix: enable native_datafusion Spark SQL tests previously ignored in #3315 #3696 (andygrove)
  • fix: route file-not-found errors through SparkError JSON path #3699 (andygrove)
  • fix: fall back from native_datafusion for duplicate fields in case-insensitive mode #3687 (andygrove)
  • fix: enable more Spark SQL tests for native_datafusion (DynamicPartitionPruningSuite / ExplainSuite) #3694 (andygrove)
  • fix: Correct GetArrayItem null handling for dynamic indices and re-enable native execution #3709 (0lai0)
  • fix: enable native_datafusion Spark SQL tests for #3320, #3401, #3719 #3718 (andygrove)
  • fix: Native engine crashes on literal DateTrunc and TimestampTrunc #3668 (0lai0)
  • fix: Use the loaded Comet extension too (Spark 3.5.8) #3707 (martin-g)
  • fix: Use thread context classloader for Iceberg class loading #3738 (karuppayya)
  • fix: disable ANSI mode in benchmarks to avoid exceptions on invalid input #3750 (parthchandra)
  • fix: fix string to timestamp cast for UTC timestamps #3656 (parthchandra)
  • fix: native error message not propagated to SparkException on empty errorClass #3727 (manuzhang)
  • fix: add timezone and special formats support for cast string to timestamp #3730 (parthchandra)
  • fix: handle inf/-inf/nan in ShimSparkErrorConverter cast overflow #3768 (manuzhang)
  • fix: handle scalar decimal value overflow correctly in ANSI mode #3803 (parthchandra)
  • fix: correct array_append return type and mark as Compatible #3795 (andygrove)
  • fix: remove broken directBuffer feature for parquet reads #3814 (andygrove)
  • fix: remove unnecessary IgnoreCometNativeDataFusion tags from 3.5.8 diff #3831 (andygrove)
  • fix: query tolerance= in SQL file tests now also asserts Comet native execution #3797 (andygrove)
  • fix: include scan impl in PR Linux artifact names #3853 (manuzhang)
  • fix: correct invalid Option.contains assertion in cast test #3851 (manuzhang)
  • fix: native_datafusion: case-insensitive mode doesn't detect duplicate/ambiguous Parquet fields #3808 (vaibhawvipul)
  • fix: cache object stores and bucket regions to reduce DNS query volume #3802 (andygrove)
  • fix: skip Comet columnar shuffle for stages with DPP scans #3879 (andygrove)
  • fix: Native_datafusion reports correct files and bytes scanned #3798 (0lai0)
  • fix: address clippy collapsible_match warnings #3863 (manuzhang)
  • fix: parameterize file count in Native_datafusion metrics test #3896 (0lai0)
  • fix: Make cast string to timestamp compatible with Spark #3884 (parthchandra)
  • fix: add EmptySchemaShufflePartitioner and test from #3858 #3893 (mbutrovich)
  • fix: use min instead of max when capping write buffer size to Int range #3914 (andygrove)
  • fix: Update TPC-DS q36a golden file for Spark 4.0 decimal UNION widening change #3915 (parthchandra)
  • fix: audit array_insert expression for correctness and test coverage #3890 (andygrove)
  • fix: handle ambiguous and non-existent local times #3865 (matthewalex4)
  • fix: improve tracing feature #3688 (andygrove)
  • fix: make tan and atan2 compatible #3849 (kazuyukitanimura)
  • fix: checkSparkAnswer displays incorrect labels #3927 (parthchandra)
  • fix: support full-width and null characters, and negative scale in string to decimal #3922 (parthchandra)
  • fix: enable Corr #3892 (kazuyukitanimura)
  • fix: array to array cast #2897 (manuzhang)
  • fix: exclude tpcds-plan-stability extended.txt files from rat license check #3964 (andygrove)
  • fix: use UTC for Arrow schema timezone in SparkToColumnar conversions #3878 (andygrove)
  • fix: remove spurious .flatten call that garbled SortMergeJoin fallback messages #3968 (andygrove)
  • fix: Add legacy mode handling to cast Decimal to String #3939 (parthchandra)
  • fix: improve test coverage for decimal to primitive type casts #3948 (parthchandra)
  • fix: fix decimal div and add tests #3952 (parthchandra)
  • fix: make shuffle fallback decisions sticky across planning passes #3982 (andygrove)

Performance related:

  • perf: Coalesce broadcast exchange batches before broadcasting #3703 (mbutrovich)
  • perf: stop using FFI in native shuffle read path #3731 (andygrove)
  • perf: Enable native c2r for more queries #3764 (andygrove)
  • perf: Mark more operators as FFI safe to avoid deep copies #3765 (andygrove)
  • perf: remove BufReader wrapper when copying spill files to shuffle output #3861 (andygrove)
  • fix: share unified memory pools across native execution contexts within a task #3924 (andygrove)

Implemented enhancements:

  • feat: Add PR review skill for Comet expression reviews #3711 (andygrove)
  • feat: add sort_array benchmark #3758 (grorge123)
  • feat: Support Spark expression days #3746 (0lai0)
  • feat: expose comet metrics through Spark's external monitoring system #3708 (coderfender)
  • feat: support SQL aggregate FILTER (WHERE ...) clause in native execution #3835 (viirya)
  • feat: Implement CRC32C algorithm #3822 (snmvaughan)
  • feat: add audit-comet-expression Claude Code skill #3793 (andygrove)
  • feat: enable native_datafusion scan in auto mode #3781 (andygrove)
  • feat: support LEAD and LAG window functions with IGNORE NULLS #3876 (viirya)
  • feat: add standalone shuffle benchmark tool #3752 (andygrove)
  • feat: Mark array_compact as Compatible and improve test coverage #3889 (andygrove)
  • feat: add native support for get_json_object expression #3747 (andygrove)
  • feat: Support Spark expression hours #3804 (0lai0)
  • feat: add support for date_from_unix_date expression #3144 (andygrove)
  • feat: support spark bin function #3928 (kazantsev-maksim)
  • feat: support sort_array expression #3706 (grorge123)

Documentation updates:

  • docs: Add some .lldbinit configurations for debugging document #3686 (wForget)
  • docs: document Iceberg Spark tests in contributor guide #3777 (mbutrovich)
  • docs: document negative zero cast-to-string incompatibility [#3811](https://github.com/...

0.14.1

20 May 15:40
081617b

0.14.1 Pre-release

DataFusion Comet 0.14.1 Changelog

This release consists of 5 commits from 1 contributor. See credits at the end of this changelog for more information.

Fixed bugs:

  • fix: [branch-0.14] backport #3802 - cache object stores and bucket regions to reduce DNS query volume #3935 (andygrove)
  • fix: [branch-0.14] backport #3924 - share unified memory pools across native execution contexts #3938 (andygrove)
  • fix: [branch-0.14] backport #3879 - skip Comet columnar shuffle for stages with DPP scans #3934 (andygrove)
  • fix: [branch-0.14] backport #3914 - use min instead of max when capping write buffer size to Int range #3936 (andygrove)
  • fix: [branch-0.14] backport #3865 - handle ambiguous and non-existent local times #3937 (andygrove)
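
The backport of #3914 above concerns capping a write buffer size to the Int range with min rather than max. The changelog does not show the code involved, so the following is only a minimal illustrative sketch of why the operator matters; the names `INT_MAX`, `cap_buffer_size`, and `cap_buffer_size_buggy` are hypothetical and are not taken from Comet.

```python
# Hypothetical illustration only; none of these names come from the Comet code base.
INT_MAX = 2**31 - 1  # largest value of a signed 32-bit integer (JVM Int)


def cap_buffer_size(requested: int) -> int:
    """Clamp a requested buffer size so it never exceeds INT_MAX."""
    # min() enforces an upper bound: values above INT_MAX are reduced to it,
    # and smaller values pass through unchanged.
    return min(requested, INT_MAX)


def cap_buffer_size_buggy(requested: int) -> int:
    """Same intent, but with max() used by mistake."""
    # max() enforces a lower bound instead: every small request is inflated
    # to INT_MAX, and oversized requests are not reduced at all.
    return max(requested, INT_MAX)


if __name__ == "__main__":
    small = 8 * 1024 * 1024   # 8 MiB
    huge = 5 * 1024**3        # 5 GiB, larger than INT_MAX
    print(cap_buffer_size(small), cap_buffer_size(huge))
    print(cap_buffer_size_buggy(small), cap_buffer_size_buggy(huge))
```

With min, an 8 MiB request stays at 8 MiB and a 5 GiB request is reduced to the Int maximum. With max, the 8 MiB request is raised to the Int maximum and the 5 GiB request is left above it.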

Credits

Thank you to everyone who contributed to this release. Here is a breakdown of commits (PRs merged) per contributor.

     5	Andy Grove

Thank you also to everyone who contributed in other ways such as filing issues, reviewing PRs, and providing feedback on this release.

0.14.0

19 Mar 22:24
6211315

0.14.0 Pre-release

DataFusion Comet 0.14.0 Changelog

This release consists of 189 commits from 21 contributors. See credits at the end of this changelog for more information.

Fixed bugs:

  • fix: [iceberg] Fall back on dynamicpruning expressions for CometIcebergNativeScan #3335 (mbutrovich)
  • fix: [iceberg] Disable native c2r by default #3348 (andygrove)
  • fix: Fix space() with negative input #3347 (hsiang-c)
  • fix: respect scan impl config for v2 scan #3357 (andygrove)
  • fix: fix memory safety issue in native c2r #3367 (andygrove)
  • fix: preserve partitioning in CometNativeScanExec for bucketed scans #3392 (andygrove)
  • fix: unignore row index Spark SQL tests for native_datafusion #3414 (andygrove)
  • fix: fall back to Spark when Parquet field ID matching is enabled in native_datafusion #3415 (andygrove)
  • fix: Expose bucketing information from CometNativeScanExec #3437 (andygrove)
  • fix: support scalar processing for space function #3408 (kazantsev-maksim)
  • fix: Revert "perf: Remove mutable buffers from scan partition/missing columns (#3411)" [iceberg] #3486 (mbutrovich)
  • fix: unignore input_file_name Spark SQL tests for native_datafusion #3458 (andygrove)
  • fix: add scalar support for bit_count expression #3361 (hsiang-c)
  • fix: Support concat_ws with literal NULL separator #3542 (0lai0)
  • fix: handle type mismatches in native c2r conversion #3583 (andygrove)
  • fix: disable native C2R for legacy Iceberg scans [iceberg] #3663 (mbutrovich)
  • fix: resolve Miri UB in null struct field test, re-enable Miri on PRs #3669 (andygrove)
  • fix: Support on all-literal RLIKE expression #3647 (0lai0)
  • fix: Fix scan metrics test to run with both native_datafusion and native_iceberg_compat #3690 (andygrove)

Performance related:

  • perf: refactor sum int with specialized implementations for each eval_mode #3054 (andygrove)
  • perf: Optimize contains expression with SIMD-based scalar pattern sea… #2991 (Shekharrajak)
  • perf: Add batch coalescing in BufBatchWriter to reduce IPC schema overhead #3441 (andygrove)
  • perf: Use native_datafusion scan in benchmark scripts (6% faster for TPC-H) #3460 (andygrove)
  • perf: Remove mutable buffers from scan partition/missing columns #3411 (andygrove)
  • perf: [iceberg] Single-pass FileScanTask validation #3443 (mbutrovich)
  • perf: Improve benchmarks for native row-to-columnar used by JVM shuffle #3290 (andygrove)
  • perf: executePlan uses a channel to park executor task thread instead of yield_now() [iceberg] #3553 (mbutrovich)
  • perf: Initialize tokio runtime worker threads from spark.executor.cores #3555 (andygrove)
  • perf: Add Comet config for native Iceberg reader's data file concurrency [iceberg] #3584 (mbutrovich)
  • perf: reuse CometConf.COMET_TRACING_ENABLED, Native, NativeUtil in NativeBatchDecoderIterator #3627 (mbutrovich)
  • perf: Improve performance of native row-to-columnar transition used by JVM shuffle #3289 (andygrove)
  • perf: use aligned pointer reads for SparkUnsafeRow field accessors #3670 (andygrove)
  • perf: Optimize some decimal expressions #3619 (andygrove)

Implemented enhancements:

  • feat: Native columnar to row conversion (Phase 2) #3266 (andygrove)
  • feat: Enable native columnar-to-row by default #3299 (andygrove)
  • feat: add support for width_bucket expression #3273 (davidlghellin)
  • feat: Drop native_comet as a valid option for COMET_NATIVE_SCAN_IMPL config #3358 (andygrove)
  • feat: Support date to timestamp cast #3383 (coderfender)
  • feat: CometExecRDD supports per-partition plan data, reduce Iceberg native scan serialization, add DPP [iceberg] #3349 (mbutrovich)
  • feat: Support right expression #3207 (Shekharrajak)
  • feat: support map_contains_key expression #3369 (peterxcli)
  • feat: add support for make_date expression #3147 (andygrove)
  • feat: add support for next_day expression #3148 (andygrove)
  • feat: implement cast from whole numbers to binary format and bool to decimal #3083 (coderfender)
  • feat: Support for StringSplit #2772 (Shekharrajak)
  • feat: CometNativeScan per-partition plan serde #3511 (mbutrovich)
  • feat: Remove mutable buffers from scan partition/missing columns [iceberg] #3514 (andygrove)
  • feat: pass spark.comet.datafusion.* configs through to DataFusion session #3455 (andygrove)
  • feat: pass vended credentials to Iceberg native scan #3523 (tokoko)
  • feat: Cast date to Numeric (No Op) #3544 (coderfender)
  • feat: add support crc32 expression #3498 (rafafrdz)
  • feat: Support int to timestamp casts #3541 (coderfender)
  • feat(benchmarks): add async-profiler support to TPC benchmark scripts #3613 (andygrove)
  • feat: Cast numeric (non int) to timestamp #3559 (coderfender)
  • feat: [ANSI] Ansi sql error messages #3580 (parthchandra)
  • feat: enable debug assertions in CI profile, fix unaligned memory access bug #3652 (andygrove)
  • feat: Enable native c2r by default, add debug asserts #3649 (andygrove)
  • feat: support Spark luhn_check expression #3573 (n0r0shi)

Documentation updates:

  • docs: Add changelog for 0.13.0 #3260 (andygrove)
  • docs: fix bug in placement of prettier-ignore-end in generated docs #3287 (andygrove)
  • docs: Add contributor guide page for SQL file tests #3333 (andygrove)
  • docs: fix inaccurate claim about mutable buffers in parquet scan docs #3378 (andygrove)
  • docs: Improve documentation on maven usage for running tests #3370 (andygrove)
  • docs: move release process docs to contributor guide #3492 (andygrove)
  • docs: improve release process documentation #3508 (andygrove)
  • docs: update roadm...

0.13.0

30 Jan 00:30
fefdc26

0.13.0 Pre-release

DataFusion Comet 0.13.0 Changelog

This release consists of 169 commits from 15 contributors. See credits at the end of this changelog for more information.

Fixed bugs:

  • fix: NativeScan count assert firing for no reason #2850 (EmilyMatt)
  • fix: Correct link to tracing guide in CometConf #2866 (manuzhang)
  • fix: Fall back to Spark for MakeDecimal with unsupported input type #2815 (andygrove)
  • fix: Normalize s3 paths for PME key retriever #2874 (mbutrovich)
  • fix: modify CometNativeScan to generate the file partitions without instantiating RDD #2891 (mbutrovich)
  • fix: Modulus on decimal data type mismatch #2922 (andygrove)
  • fix: [iceberg] Mark nativeIcebergScanMetadata @transient #2930 (mbutrovich)
  • fix: enable cast tests for Spark 4.0 #2919 (manuzhang)
  • fix: Remove fallback for maps containing complex types #2943 (andygrove)
  • fix: CometShuffleManager hang by deferring SparkEnv access #3002 (Shekharrajak)
  • fix: format decimal to string when casting to short #2916 (manuzhang)
  • fix: [iceberg] reduce granularity of metrics updates in IcebergFileStream #3050 (mbutrovich)
  • fix: native shuffle now reports spill metrics correctly #3197 (andygrove)
  • fix: Prevent native write when input is not Arrow format #3227 (andygrove)
  • fix: Add JDK to Docker image for release build #3262 (hsiang-c)

Performance related:

  • perf: [iceberg] Deduplicate serialized metadata for Iceberg native scan #2933 (mbutrovich)
  • perf: Use await instead of block_on in native shuffle writer #2937 (mbutrovich)
  • perf: refactor executePlan to try to avoid constantly entering Tokio runtime #2938 (mbutrovich)
  • perf: Optimize lpad/rpad to remove unnecessary memory allocations per element #2963 (andygrove)
  • perf: Improve performance of normalize_nan #2999 (andygrove)
  • perf: Improve string expression microbenchmarks #3012 (andygrove)
  • perf: Improve date/time microbenchmarks to avoid redundant/duplicate benchmarks #3020 (andygrove)
  • perf: Improve aggregate expression microbenchmarks #3021 (andygrove)
  • perf: Improve conditional expression microbenchmarks #3024 (andygrove)
  • perf: Improve performance of date truncate #2997 (andygrove)
  • perf: Add microbenchmark for comparison expressions #3026 (andygrove)
  • perf: Implement more microbenchmarks for cast expressions #3031 (andygrove)
  • perf: Add microbenchmark for hash expressions #3028 (andygrove)
  • perf: Improve performance of CAST from string to int #3017 (coderfender)
  • perf: Improve criterion benchmarks for cast string to int #3049 (andygrove)
  • perf: Additional optimizations for cast from string to int #3048 (andygrove)
  • perf: set DataFusion session context's target_partitions to match Spark's spark.task.cpus #3062 (mbutrovich)
  • perf: don't busy-poll Tokio stream for plans without CometScan #3063 (mbutrovich)
  • perf: minor optimizations in process_sorted_row_partition #3059 (andygrove)
  • perf: optimize complex-type hash implementations #3140 (mbutrovich)
  • perf: [iceberg] Remove IcebergFileStream, use iceberg-rust's parallelization, bump iceberg-rust to latest, cache SchemaAdapter #3051 (mbutrovich)
  • perf: [iceberg] reduce nativeIcebergScanMetadata serialization points #3243 (mbutrovich)
  • perf: reduce GC pressure in protobuf serialization #3242 (andygrove)
  • perf: cache serialized query plans to avoid per-partition serialization #3246 (andygrove)
  • perf: [iceberg] Use protobuf instead of JSON to serialize Iceberg partition values #3247 (parthchandra)

Implemented enhancements:

  • feat: Add experimental support for native Parquet writes #2812 (andygrove)
  • feat: Partially implement file commit protocol for native Parquet writes #2828 (andygrove)
  • feat: CometNativeWriteExec support with native scan as a child #2839 (mbutrovich)
  • feat: Add support for explode and explode_outer for array inputs #2836 (andygrove)
  • feat: Support ANSI mode SUM (Decimal types) #2826 (coderfender)
  • feat: Add expression registry to native planner #2851 (andygrove)
  • feat: Implement native operator registry #2875 (andygrove)
  • feat: Improve fallback reporting for native_datafusion scan #2879 (andygrove)
  • feat: Enable bucket pruning with native_datafusion scans #2888 (mbutrovich)
  • feat: support_ansi-mode_aggregated_benchmarking #2901 (coderfender)
  • feat: [iceberg] REST catalog support for CometNativeIcebergScan #2895 (mbutrovich)
  • feat: [iceberg] Support session token in Iceberg Native scan #2913 (hsiang-c)
  • feat: Make shuffle writer buffer size configurable #2899 (andygrove)
  • feat: Add partial support for from_json #2934 (andygrove)
  • feat: Create benchmarks comet cast #2932 (coderfender)
  • feat: Support string decimal cast #2925 (coderfender)
  • feat: Remove unnecessary transition for native writes #2960 (comphead)
  • feat: Initial implementation of size for array inputs #2862 (andygrove)
  • feat: Support ANSI mode sum expr (int inputs) #2600 (coderfender)
  • feat: Support casting string float types #2835 (coderfender)
  • feat: Support ANSI mode avg expr (int inputs) #2817 (coderfender)
  • feat: Add support for remote Parquet HDFS writer with openDAL #2929 (comphead)
  • feat: Expand murmur3 hash support to complex types #3077 (andygrove)
  • feat: Comet Writer should respect object store settings #3042 (comphead)
  • feat: add support for unix_date expression #3141 (andygrove)
  • feat: add partial support for date_format expression #3201 (andygrove)
  • feat: add complex type support to native Parquet writer [#32...

0.12.0

01 Dec 16:23
6086438

0.12.0 Pre-release

DataFusion Comet 0.12.0 Changelog

This release consists of 105 commits from 13 contributors. See credits at the end of this changelog for more information.

Fixed bugs:

  • fix: Fix None.get in stringDecode when bin child cannot be converted #2606 (cfmcgrady)
  • fix: Update FuzzDataGenerator to produce dictionary-encoded string arrays & fix bugs that this exposes #2635 (andygrove)
  • fix: Fallback to Spark for lpad/rpad for unsupported arguments & fix negative length handling #2630 (andygrove)
  • fix: Mark SortOrder with floating-point as incompatible #2650 (andygrove)
  • fix: Fall back to Spark for trunc / date_trunc functions when format string is unsupported, or is not a literal value #2634 (andygrove)
  • fix: [native_datafusion] only pass single partition of PartitionedFiles into DataSourceExec #2675 (mbutrovich)
  • fix: Fix subcommands options in fuzz-testing #2684 (manuzhang)
  • fix: Do not replace SMJ with HJ for LeftSemi #2687 (comphead)
  • fix: Apply spotless on Iceberg 1.8.1 diff [iceberg] #2700 (hsiang-c)
  • fix: Fix generate-user-guide-reference-docs failure when mvn command is not executed at root #2691 (manuzhang)
  • fix: Fix missing SortOrder fallback reason in range partitioning #2716 (andygrove)
  • fix: CometLiteral class cast exception with arrays #2718 (andygrove)
  • fix: NormalizeNaNAndZero::children() returns child's child #2732 (mbutrovich)
  • fix: checkSparkMaybeThrows should compare Spark and Comet results in success case #2728 (andygrove)
  • fix: Mark WindowsExec as incompatible #2748 (andygrove)
  • fix: Add strict floating point mode and fallback to Spark for min/max/sort on floating point inputs when enabled #2747 (andygrove)
  • fix: Implement producedAttributes for CometWindowExec #2789 (rahulbabarwal89)
  • fix: Pass all Comet configs to native plan #2801 (andygrove)

Implemented enhancements:

  • feat: Add option to write benchmark results to file #2640 (andygrove)
  • feat: Implement metrics for iceberg compat #2615 (EmilyMatt)
  • feat: Define function signatures in CometFuzz #2614 (andygrove)
  • feat: cherry-pick UUID conversion logic from #2528 #2648 (mbutrovich)
  • feat: support concat for strings #2604 (comphead)
  • feat: Add support for abs #2689 (andygrove)
  • feat: Support variadic function in CometFuzz #2682 (manuzhang)
  • feat: CometExecRule refactor: Unify CometNativeExec creation with Serde in CometOperatorSerde trait #2768 (andygrove)
  • feat: support cot #2755 (psvri)
  • feat: Add bash script to build and run fuzz testing #2686 (manuzhang)
  • feat: Add getSupportLevel to CometAggregateExpressionSerde trait #2777 (andygrove)
  • feat: Add CI check to ensure generated docs are in sync with code #2779 (andygrove)
  • feat: Add prettier enforcement #2783 (andygrove)
  • feat: hyperbolic trig functions #2784 (psvri)
  • feat: [iceberg] Native scan by serializing FileScanTasks to iceberg-rust #2528 (mbutrovich)

Documentation updates:

  • docs: Add changelog for 0.11.0 release #2585 (mbutrovich)
  • docs: Improve documentation layout #2587 (andygrove)
  • docs: Publish 0.11.0 user guide #2589 (andygrove)
  • docs: Put Comet logo in top nav bar, respect light/dark mode #2591 (andygrove)
  • docs: Improve main landing page #2593 (andygrove)
  • docs: Improve site navigation #2597 (andygrove)
  • docs: Update benchmark results #2596 (andygrove)
  • docs: Upgrade pydata-sphinx-theme to 0.16.1 #2602 (andygrove)
  • docs: Fix redirect #2603 (andygrove)
  • docs: Fix broken image link #2613 (andygrove)
  • docs: Add FFI docs to contributor guide #2668 (andygrove)
  • docs: Various documentation updates #2674 (andygrove)
  • docs: Add supported SortOrder expressions and fix a typo #2694 (andygrove)
  • docs: Minor docs update for running Spark SQL tests #2712 (andygrove)
  • docs: Update contributor guide for adding a new expression #2704 (andygrove)
  • docs: Documentation updates for LocalTableScan and WindowExec #2742 (andygrove)
  • docs: Typo fix #2752 (wForget)
  • docs: Categorize some configs as testing and add notes about known time zone issues #2740 (andygrove)
  • docs: Run prettier on all markdown files #2782 (andygrove)
  • docs: Ignore prettier formatting for generated tables #2790 (andygrove)
  • docs: Add new section to contributor guide, explaining how to add a new operator #2758 (andygrove)

Other:

  • chore: Start 0.12.0 development #2584 (mbutrovich)
  • chore: Bump Spark from 3.5.6 to 3.5.7 #2574 (cfmcgrady)
  • chore(deps): bump parquet from 56.0.0 to 56.2.0 in /native #2608 (dependabot[bot])
  • chore(deps): bump tikv-jemallocator from 0.6.0 to 0.6.1 in /native #2609 (dependabot[bot])
  • chore(deps): bump tikv-jemalloc-ctl from 0.6.0 to 0.6.1 in /native #2610 (dependabot[bot])
  • tests: FuzzDataGenerator instead of Parquet-specific generator #2616 (mbutrovich)
  • chore: Simplify on-heap memory configuration #2599 (andygrove)
  • Feat: Add sha1 function impl #2471 (kazantsev-maksim)
  • chore: Refactor Parquet/DataFrame fuzz data generators #2629 (andygrove)
  • chore: Remove needless from_raw calls #2638 (EmilyMatt)
  • chore: support DataFusion 50.3.0 #2605 (comphead)
  • chore(deps): bump actions/upload-artifact from 4 to 5 #2654 (dependabot[bot])
  • chore(deps): bump cc from 1.2.42 to 1.2.43 in /native #2653 (dependabot[bot])
  • chore(deps): bump actions/download-artifact from 5 to 6 #2652 (dependabot[bot])
  • chore: extract c...

0.11.0

19 Oct 18:00

0.11.0 Pre-release

DataFusion Comet 0.11.0 Changelog

This release consists of 131 commits from 15 contributors. See credits at the end of this changelog for more information.

Fixed bugs:

  • fix: temporarily ignore test for hdfs file systems #2359 (parthchandra)
  • fix: Check reused broadcast plan in non-AQE and make setNumPartitions thread safe #2398 (wForget)
  • fix: correct missingInput for CometHashAggregateExec #2409 (comphead)
  • fix: clippy errors rust 1.9.0 update #2419 (coderfender)
  • fix: Avoid spark plan execution cache preventing CometBatchRDD numPartitions change #2420 (wForget)
  • fix: regressions in CometToPrettyStringSuite #2384 (hsiang-c)
  • fix: Byte array Literals failed on cast #2432 (comphead)
  • fix: Do not push down subquery filters on native_datafusion scan #2438 (wForget)
  • fix: Improve error handling when resolving S3 bucket region #2440 (andygrove)
  • fix: [iceberg] additional parquet independent api for iceberg integration #2442 (parthchandra)
  • fix: Specify reqwest crate features #2446 (andygrove)
  • fix: distributed RangePartitioning bounds calculation with native shuffle #2258 (mbutrovich)
  • fix: fix regression in tpcbench.py #2512 (andygrove)
  • fix: [iceberg] Close reader instance in ReadConf #2510 (hsiang-c)
  • fix: Enable plan stability tests for auto scan #2516 (andygrove)
  • fix: Capture unexpected output when retrieving JVM 17 args in Makefile #2566 (zuston)

Performance related:

  • perf: New Configuration from shared conf to avoid high costs #2402 (wForget)
  • perf: Use DataFusion's count_udaf instead of SUM(IF(expr IS NOT NULL, 1, 0)) #2407 (andygrove)
  • perf: Improve BroadcastExchangeExec conversion #2417 (wForget)

Implemented enhancements:

  • feat: Add dynamic enabled and allowIncompat configs for all supported expressions #2329 (andygrove)
  • feat: feature specific tests #2372 (parthchandra)
  • feat: Support more date part expressions #2316 (wForget)
  • feat: rpad support column for second arg instead of just literal #2099 (coderfender)
  • feat: Support comet native log level conf #2379 (wForget)
  • feat: Enable WeekDay function #2411 (wForget)
  • feat: Add nested Array literal support #2181 (comphead)
  • feat:add_additional_char_support_rpad #2436 (coderfender)
  • feat: do not fallback to Spark for COUNT(distinct) #2429 (comphead)
  • feat: implement_ansi_eval_mode_arithmetic #2136 (coderfender)
  • feat: Add plan conversion statistics to extended explain info #2412 (andygrove)
  • feat: implement_comet_native_lpad_expr #2102 (coderfender)
  • feat: Add backtrace feature to simplify enabling native backtraces in CometNativeException #2515 (andygrove)
  • feat: Support reverse function with ArrayType input #2481 (cfmcgrady)
  • feat: Change default off-heap memory pool from greedy_unified to fair_unified #2526 (andygrove)
  • feat: Make DiskManager max_temp_directory_size configurable #2479 (manuzhang)
  • feat: Parquet Modular Encryption with Spark KMS for native readers #2447 (mbutrovich)
  • feat: Add support for Spark-compatible cast from integral to decimal #2472 (coderfender)
  • feat:Support ANSI mode integral divide #2421 (coderfender)
  • feat: Add config to enable running Comet in onheap mode #2554 (andygrove)
  • feat:support ansi mode rounding function #2542 (coderfender)
  • feat:support ansi mode remainder function #2556 (coderfender)
  • feat: Implement array-to-string cast support #2425 (cfmcgrady)
  • feat: Various improvements to memory pool configuration, logging, and documentation #2538 (andygrove)
  • feat: Enable complex types for columnar shuffle #2573 (mbutrovich)
  • feat: support_decimal_types_bool_cast_native_impl #2490 (coderfender)
  • feat: Use buf write to reduce system call on index write #2579 (zuston)

Documentation updates:

  • doc: Document usage IcebergCometBatchReader.java #2347 (comphead)
  • docs: Add changelog for 0.10.0 release #2361 (andygrove)
  • docs: Fix error in docs #2373 (andygrove)
  • docs: Fix more comet versions in docs #2374 (andygrove)
  • docs: Publish 0.10.0 user guide #2394 (andygrove)
  • doc: macos benches doc clarifications #2418 (comphead)
  • docs: update configs.md after #2422 #2428 (mbutrovich)
  • docs: update docs and tuning guide related to native shuffle #2487 (mbutrovich)
  • docs: Improve EC2 benchmarking guide #2474 (andygrove)
  • docs: docs_update_ansi_support #2496 (coderfender)
  • docs:support lpad expression documentation update #2517 (coderfender)
  • docs: doc changes to support ANSI mode integral divide #2570 (coderfender)
  • docs: Split configuration guide into different sections (scan, exec, shuffle, etc) #2568 (andygrove)
  • docs: doc update to support ANSI mode remainder function #2576 (coderfender)
  • docs: Documentation updates #2581 (andygrove)

Other:

  • chore(deps): bump uuid from 1.18.0 to 1.18.1 in /native #2336 (dependabot[bot])
  • build: Check that all Scala test suites run in PR builds #2304 (andygrove)
  • chore: Start 0.11.0 development #2365 (andygrove)
  • chore: Split expression serde hash map into separate categories #2322 (andygrove)
  • chore: exclude Iceberg diffs from rat checks #2376 (hsiang-c)
  • chore: Refactor UnaryMinus serde #2378 (andygrove)
  • chore: Revert "chore: [1941-Part1]: Introduce map_sort scalar function (#2#2381 (comphead)
  • chore: Refactor Literal serde #2377...

0.10.1

06 Oct 18:44

0.10.1 Pre-release

DataFusion Comet 0.10.1 Changelog

This release consists of 7 commits from 1 contributor. See credits at the end of this changelog for more information.

Documentation updates:

  • docs: [branch-0.10] Update version number in branch-0.10 user guide #2395 (andygrove)

Other:

  • chore: [branch-0.10] Support Spark 4.0.1 instead of 4.0.0 (#2414) #2497 (andygrove)
  • build: [branch-0.10] Stop caching libcomet in CI (#2498) #2502 (andygrove)
  • chore: [branch-0.10] perf: Improve BroadcastExchangeExec conversion #2501 (andygrove)
  • chore: [branch-0.10] [iceberg] additional parquet independent api for iceberg integration (#2442) #2499 (andygrove)
  • fix: [branch-0.10] Avoid spark plan execution cache preventing CometBatchRDD numPartitions change (#2420) #2503 (andygrove)
  • build: [branch-0.10] Bump version to 0.10.1 #2508 (andygrove)

Credits

Thank you to everyone who contributed to this release. Here is a breakdown of commits (PRs merged) per contributor.

     7	Andy Grove

Thank you also to everyone who contributed in other ways such as filing issues, reviewing PRs, and providing feedback on this release.

0.10.0

16 Sep 17:21
9cb0cc4

0.10.0 Pre-release

DataFusion Comet 0.10.0 Changelog

This release consists of 183 commits from 26 contributors. See credits at the end of this changelog for more information.

Fixed bugs:

  • fix: [Iceberg] Fix decimal corruption #1985 (andygrove)
  • fix: broken link in development.md #2024 (petern48)
  • fix: [iceberg] Add LogicalTypeAnnotation in ParquetColumnSpec #2000 (huaxingao)
  • fix: hdfs read into buffer fully #2031 (parthchandra)
  • fix: Refactor arithmetic serde and fix correctness issues with EvalMode::TRY #2018 (andygrove)
  • fix: clean up [iceberg] integration APIs #2032 (huaxingao)
  • fix: zero Arrow Array offset before sending across FFI #2052 (mbutrovich)
  • fix: [iceberg] more fixes for Iceberg integration APIs. #2078 (parthchandra)
  • fix: Add support for StringDecode in Spark 4.0.0 #2075 (peter-toth)
  • fix: Avoid double free in CometUnifiedShuffleMemoryAllocator #2122 (andygrove)
  • fix: Remove duplicate serde code #2098 (andygrove)
  • fix: Improve logic for determining when an UnpackOrDeepCopy is needed #2142 (andygrove)
  • fix: Add CopyExec to inputs to SortMergeJoinExec #2155 (andygrove)
  • fix: Fix repeatedly url-decode path when reading parquet from s3 using native parquet reader #2138 (Kontinuation)
  • fix: [iceberg] Switch to OSS Spark and run Iceberg Spark tests in parallel #1987 (hsiang-c)
  • fix: [iceberg] Fall back to spark for schemas with empty structs #2204 (andygrove)
  • fix: Fix failing TPC-DS workflow in PR CI runs #2207 (andygrove)
  • fix: [iceberg] order query result deterministically #2208 (hsiang-c)
  • fix: use spark.comet.batchSize instead of conf.arrowMaxRecordsPerBatch for data that is coming from Java #2196 (rluvaton)
  • fix: if expr nullable #2217 (Asura7969)
  • fix: Support auto scan mode with Spark 4.0.0 #1975 (andygrove)
  • fix: Make Sha2 fallback message more user-friendly #2213 (rishvin)
  • fix: separate type checking for CometExchange and CometColumnarExchange #2241 (mbutrovich)
  • fix: Fix potential resource leak in native shuffle block reader #2247 (andygrove)
  • fix: Remove unreachable code in CometScanRule #2252 (andygrove)
  • fix: Fall back to native_comet for encrypted Parquet scans #2250 (andygrove)
  • fix: Fall back to native_comet when object store not supported by native_iceberg_compat #2251 (andygrove)
  • fix: split expr.proto file (new) #2267 (kination)
  • fix: handle cast to dictionary vector introduced by case when #2044 (parthchandra)
  • fix: Remove check for custom S3 endpoints #2288 (andygrove)
  • fix: implement lazy evaluation in Coalesce function #2270 (coderfender)
  • fix: Update benchmarking scripts #2293 (andygrove)
  • fix: Fix regression in NativeConfigSuite #2299 (andygrove)
  • fix: Validating object store configs should not throw exception #2308 (andygrove)
  • fix: TakeOrderedAndProjectExec is not reporting all fallback reasons #2323 (kazuyukitanimura)
  • fix: Fallback length function with binary input #2349 (wForget)

Performance related:

  • perf: Optimize AvgDecimalGroupsAccumulator #1893 (leung-ming)
  • perf: Optimize SumDecimalGroupsAccumulator::update_single #2069 (leung-ming)
  • perf: Avoid FFI copy in ScanExec when reading data from exchanges #2268 (andygrove)

Implemented enhancements:

  • feat: Add from_unixtime support #1943 (kazuyukitanimura)
  • feat: randn expression support #2010 (akupchinskiy)
  • feat: monotonically_increasing_id and spark_partition_id implementation #2037 (akupchinskiy)
  • feat: support map_entries #2059 (comphead)
  • feat: Support Array Literal #2057 (comphead)
  • feat: Add new trait for operator serde #2115 (andygrove)
  • feat: limit with offset support #2070 (akupchinskiy)
  • feat: Include scan implementation name in CometScan nodeName #2141 (andygrove)
  • feat: Add config option to log fallback reasons #2154 (andygrove)
  • feat: [iceberg] Enable Comet shuffle in Iceberg diff #2205 (andygrove)
  • feat: Improve shuffle fallback reporting #2194 (andygrove)
  • feat: Reset data buf of NativeBatchDecoderIterator on close #2235 (wForget)
  • feat: Improve fallback mechanism for ANSI mode #2211 (andygrove)
  • feat: Support hdfs with OpenDAL #2244 (wForget)
  • feat: Ignore fallback info for command execs #2297 (wForget)
  • feat: Improve some confusing fallback reasons #2301 (wForget)
  • feat: Make supported hadoop filesystem schemes configurable #2272 (wForget)
  • feat: [1941-Part1]: Introduce map-sort scalar function #2262 (rishvin)
  • feat: [iceberg] delete rows support using selection vectors #2346 (parthchandra)

Documentation updates:

  • docs: Update benchmark results for 0.9.0 #1959 (andygrove)
  • doc: Add comment about local clippy run before submitting a pull request #1961 (akupchinskiy)
  • docs: Minor improvements to Spark SQL test docs #1980 (andygrove)
  • docs: Update Maven links for 0.9.0 release #1988 (andygrove)
  • docs: Documentation updates for 0.9.0 release #1981 (andygrove)
  • docs: Add guide showing comparison between Comet and Gluten #2012 (andygrove)
  • docs: Remove legacy comment in docs #2022 (andygrove)
  • docs: Update Gluten comparison to clarify that Velox is open-source #2043 (andygrove)
  • docs: Improve Gluten comparison based on feedback from the community #2048 (andygrove)
  • docs: added a missing export into the plan stability section #2071 (akupchinskiy)
  • doc: Added documentation for supported map functions #2074 (codetyri0n)
  • doc: Alternative way to start Spark Master to run benchmarks #2072 (comphead)
  • docs: Update to support try arithmetic functions #2143 (coderfender)
  • doc: update macos standalone spark start instructions #2103 (comphead)
  • docs: Update confs to bypass Iceberg Spark issues #2166 (hsiang-c)
  • docs: Add Roadmap #2191 (andygrove)
  • docs: Update installation guide for 0.9.1 #2230 (andygrov...

0.9.1

25 Aug 16:49
a168c9a

0.9.1 Pre-release

DataFusion Comet 0.9.1 Changelog

This release consists of 2 commits from 1 contributor. See credits at the end of this changelog for more information.

Fixed bugs:

  • fix: [branch-0.9] Backport FFI fix #2164 (andygrove)
  • fix: [branch-0.9] Avoid double free in CometUnifiedShuffleMemoryAllocator #2201 (andygrove)

Credits

Thank you to everyone who contributed to this release. Here is a breakdown of commits (PRs merged) per contributor.

     2	Andy Grove

Thank you also to everyone who contributed in other ways such as filing issues, reviewing PRs, and providing feedback on this release.