Releases: apache/incubator-gluten

v1.5.0-rc0

03 Oct 17:40
42e3f92
v1.5.0-rc0 Pre-release

What's Changed

  • [GLUTEN-8846][CH] [Part 3] Add benchmark for Iceberg Delete by @baibaichen in #9192
  • [GLUTEN-9020][CH] Support delta DV BitmapAggregator by @loneylee in #9138
  • [GLUTEN-9197][CH] Simplify sum aggregate expression by @taiyang-li in #9198
  • [VL] Enable more ut in VeloxTestSettings by @WangGuangxin in #9080
  • [GLUTEN-9199][VL] Fix error when creating shuffle file: open with O_CREAT or O_TMPFILE in second argument needs 3 arguments by @zhztheplayer in #9200
  • [CORE] Fix duplicate setting for config LEGACY_TIME_PARSER_POLICY by @jinchengchenghh in #9201
  • [GLUTEN-9176][CH] Rewrite aggregate if to aggregate with filter clause by @taiyang-li in #9185
  • [GLUTEN-8557][CH] Flatten nested And/Or for performance optimization by @KevinyhZou in #8558
  • Revert "[GLUTEN-9164][CH]Enable row group level bloom filter push down" by @taiyang-li in #9214
  • [GLUTEN-9182][VL] Support new s3 configuration in Gluten by @dcoliversun in #9183
  • [VL] Celeborn shuffle reader OOM with many empty input stream by @marin-ma in #9221
  • [GLUTEN-8821][VL] Update aggregate/generator/window support doc and script by @marin-ma in #8971
  • [VL] Change to use Velox's wget_and_untar in setup-centos7.sh by @yaooqinn in #9207
  • [GLUTEN-9196][CH] Use wide-table aggregation to eliminate multi-table joins by @lgbo-ustc in #9155
  • [GLUTEN-9149][CORE] Remove Spark-specific code from JniLibLoader & JniWorkspace by @shuai-xu in #9150
  • [VL][CI] Change to use JDK-17 for Spark 3.3/3.4/3.5 tests by @PHILO-HE in #9209
  • [CORE][VL] Hide child nodes from implementations of OffloadSingleNode by @zhztheplayer in #9220
  • [GLUTEN-9008][VL] Support json_object_keys function by @dcoliversun in #9009
  • [GLUTEN-9239][CH] Support JDK17 for the CH backend by @zzcclp in #9242
  • [GLUTEN-9152][CORE] Avoid unnecessary serialization of hadoop conf by @zml1206 in #9153
  • [GLUTEN-9240][VL] Write NULL value into relation in gluten unit tests by @dcoliversun in #9241
  • [VL][CI] bump to use ubuntu-22.04 runner by @zhouyuan in #9262
  • [GLUTEN-9177][CH]Fix diff on parse host of url and refactor SparkParseURL by @KevinyhZou in #9179
  • [CORE] Decrease offheap memory size in resource profile for whole stage fallback case by @PHILO-HE in #8911
  • [GLUTEN-9205][CH] Support deletion vector native write by @loneylee in #9248
  • [VL] Delete global reference to a class object in JNI unload by @PHILO-HE in #9268
  • [GLUTEN-9245][VL] Fix partial project expression contains subquery by @jinchengchenghh in #9259
  • [GLUTEN-9244][CORE] Change the way of passing default timezone to native config by @zml1206 in #9249
  • [GLUTEN-8497][VL] Fix columnar batch type mismatch in table cache by @zhztheplayer in #9230
  • [VL] Support Spark legacy statistical aggregation function behavior by @NEUpanning in #9181
  • [CORE] Remove library unloading API from JniLibLoader as unused by @zhztheplayer in #9277
  • [GLUTEN-9237][CH] Fix the nullability mismatch issue for the Nothing type by @lgbo-ustc in #9238
  • [VL] Disable FlushableHashAggregate when aggregates contain sum/avg for floating type by @kecookier in #8986
  • [CORE] Refine the test with specified spark version by @yikf in #9274
  • [CH] Add a comment to explain why the endpoint uses a single thread by @dcoliversun in #9257
  • [GLUTEN-8891][VL] Refine local ssd cache feature by @zhouyuan in #9228
  • [GLUTEN-9267][CH] Fix a bug in EliminateDeduplicateAggregateWithAnyJoin by @lgbo-ustc in #9293
  • [VL] Remove param original of ColumnarPartialProjectExec by @zml1206 in #9290
  • [GLUTEN-9178][CH] Fix cse in aggregate operator not working by @loneylee in #9301
  • [CORE] Post events until both spark ui and gluten ui are enabled by @yikf in #9272
  • [CORE] Correctly handle driver configurations when spark.sql.extensions is explicitly set for GlutenSessionExtensions by @zhztheplayer in #9312
  • [GLUTEN-8851][VL] feat: Support cudf by @jinchengchenghh in #9229
  • [GLUTEN-9288][VL] Enable array_prepend function for spark 3.5+ by @dcoliversun in #9305
  • [GLUTEN-9317][CH]Fix: duplicated column names in shuffle read by @lgbo-ustc in #9318
  • [Gluten-9254][CH] Support RDDScanExec by @loneylee in #9270
  • [VL] Count total JVM memory as the on-heap portion for the off-heap sizing feature by @zhztheplayer in #9321
  • [GLUTEN-9300][DOC] Support replacement expression in gen-function-support-docs by @dcoliversun in #9331
  • [GLUTEN-9239][CH] [PART-1] Support Java-17 Remove JNI_OnUnload by @baibaichen in #9275
  • [GLUTEN-7652][VL] Support binary as string by @wForget in #9325
  • [Gluten-9334][CH] Support delta metadata column file_path and row_index for mergetree by @loneylee in #9340
  • [GLUTEN-6867][CH] Fix bug that can't read file on minio by @baibaichen in #9332
  • [VL] Provide a configuration option to completely turn off off-heap memory tracking with Spark memory manager by @zhztheplayer in #9341
  • [GLUTEN-9313][VL] ColumnarPartialProject supports built-in but blacklisted function by @WangGuangxin in #9315
  • [GLUTEN-8772][CORE] refactor: Refactoring the usage of SubstraitContext#functionMap by @wypb in #8775
  • [VL] Move pre-configuration code of dynamic off-heap sizing to its own place by @zhztheplayer in #9336
  • [GLUTEN-9163][VL] Use stream de/compressor in sort-based shuffle by @marin-ma in #9278
  • [GLUTEN-9287][VL] Enable array_compact function for Spark 3.4+ by @dcoliversun in #9349
  • [GLUTEN-9095][UT] Remove Vanilla Spark InternalRow based checkEvaluation by @ArnavBalyan in #9096
  • [CORE] Make max broadcast table size configurable by @yaooqinn in #9359
  • [CH] Fix build error by @exmy in #9363
  • [GLUTEN-9243][VL] Fix cuda docker image by @zhouyuan in #9333
  • [GLUTEN-8912][VL] Add Offset support for CollectLimitExec by @ArnavBalyan in #8914
  • [GLUTEN-7589][VL] Support date_trunc function by @zml1206 in #7611
  • [GLUTEN-9279] Not pulling out expression from PartialMerge aggregate function to avoid invalid reference binding in ProjectExecTransformer by @Z1Wu in #9280
  • [Gluten-8792][CH] Support delta project incrementMetric expr by @loneylee in #9353
  • [GLUTEN-9034][VL] Add VeloxResizeBatchesExec for Shuffle by @WangGuangxin in #9035
  • Fix ColumnarToRowRemovalGuard not able to be copied by @yaooqinn in #9384
  • [GLUTEN-8846][CH] [Part 4] Add full-chain UT by @jlfsdtc in #9256
  • [VL] Follow up on #9384 to avoid swallowing exceptions in UT by @zhztheplayer in #9393
  • [GLUTEN-9163][VL] Separate compression buffer and disk write buffer configuration by @marin-ma in #9356
  • [VL][INFRA] Improve build bundle package workflow by @wForget in #9404
  • [VL] Refactor WholeStageTransformer to remove some duplicate code by @wyp...

v1.4.0

19 Jun 09:43
50dd117

Release Notes - Gluten version 1.4.0

Highlights

  • Spark 3.2.2/3.3.1/3.4.4(upgraded)/3.5.2
  • Add support for more Spark functions, including date_format, make_date, map_filter, map_concat, from_json, btrim, array_append, and more
  • Add support for more Spark operators, including Range, CollectLimit, and more
  • Update OAP's Velox codebase to 2025/05/12
  • Join optimizations: BNLJ full outer join
  • Shuffle optimizations: RSS ShuffleReader optimization and bug fixing
  • RSS: Celeborn 0.5.4(upgraded)/Uniffle 0.9.2(upgraded)
  • Query Plan: RAS cost model optimizations and refactor
  • Datalake: Add Iceberg/Hudi in test
  • CI: Docker image and JDK version update
  • Support dynamically adjusting the Stage Resource Profile
  • Support Query Trace
  • Add Qualification Tool
  • Fix OOM issues for some untracked memory

What's Changed

  • [GLUTEN-8327][CORE][Part-3] Introduce the ConfigEntry to gluten config by @yikf in #8431
  • [VL] Fix wrong warning of "Memory overhead is set to ..." under default Spark config settings by @zhztheplayer in #8448
  • [GLUTEN-8385][VL] Support write compatible-hive bucket table for Spark3.4 and Spark3.5. by @yikf in #8386
  • Revert "[CH] Disable gluten arm ci" by @lwz9103 in #8460
  • [GLUTEN-8453] [VL] Allow Heavy Batch to be Processed by ColumnarCachedBatchSerializer by @ArnavBalyan in #8454
  • [CH] Add tools to dump ActionsDAG into tree graph by @taiyang-li in #8461
  • [GLUTEN-6887][VL] Daily Update Velox Version (2025_01_08) by @GlutenPerfBot in #8457
  • [VL] Update document of build gluten in Docker by @FelixYBW in #8459
  • [GLUTEN-8462][CORE] Raise a meaningful error when no component is found from classpath by @zhztheplayer in #8468
  • [GLUTEN-8453][VL] Follow-up to #8454 to add a ensureVeloxBatch API for limited use cases by @zhztheplayer in #8463
  • [VL] Refactor Velox.md by @FelixYBW in #8478
  • [GLUTEN-8465] [VL] Bump Celeborn to 0.5.3 by @SteNicholas in #8467
  • [GLUTEN-8455][VL] Fallback Scan for Encrypted Parquet Files by @ArnavBalyan in #8456
  • [GLUTEN-6887][VL] Daily Update Velox Version (2025_01_09) by @GlutenPerfBot in #8472
  • [CORE] Refactor columnar noop write rule by @jackylee-ch in #8422
  • [GLUTEN-8462][CH] Fixed the loading of Components and Backend by @gleonSun in #8464
  • [GLUTEN-8414][VL] Override doCanonicalize in ColumnarPartialProjectEx… by @lifulong in #8415
  • [GLUTEN-8397][CH][Part-2] Fix static_cast failed on macos by @yxheartipp in #8485
  • [GLUTEN-8343][CH]Fix cast number to decimal and improve performance of it by @KevinyhZou in #8351
  • [GLUTEN-8481][VL] Clean up shuffle reader cpp code by @marin-ma in #8482
  • [Core] Bump version to 1.4.0-SNAPSHOT by @weiting-chen in #8452
  • [GLUTEN-8483][CORE] A stable and universal way to find component files by @zhztheplayer in #8486
  • [DOC][VL] Fix typo in microbenchmark.md by @marin-ma in #8495
  • [GLUTEN-1632][CH]Daily Update Clickhouse Version (20250110) by @kyligence-git in #8490
  • [GLUTEN-6887][VL] Daily Update Velox Version (2025_01_10) by @GlutenPerfBot in #8489
  • [GLUTEN-8476][VL] Fix allocate and free memory by @jkhaliqi in #8477
  • [GLUTEN-8503][VL] Fix macro parenthesis CVE by @jkhaliqi in #8504
  • [GLUTEN-8471][VL] Fix usage of uninitialized variables by @jkhaliqi in #8470
  • [GLUTEN-6887][VL] Daily Update Velox Version (2025_01_11) by @GlutenPerfBot in #8507
  • [GLUTEN-6887][VL] Daily Update Velox Version (2025_01_12) by @GlutenPerfBot in #8508
  • [GLUTEN-8497][VL] A bad test case that fails columnar table cache query by @zhztheplayer in #8498
  • [DOC] Update README.md by @PHILO-HE in #8444
  • [GLUTEN-8319][VL] Support date_format Spark function by @PHILO-HE in #8323
  • [GLUTEN-8487][VL] adding JDK11 based Centos8 image by @zhouyuan in #8513
  • [GLUTEN-6887][VL] Daily Update Velox Version (2025_01_14) by @GlutenPerfBot in #8522
  • [GLUTEN-8020][VL] Remove the libhdfs3 installation script required for static linking by @JkSelf in #8013
  • [GLUTEN-8532][VL] Fix parenthesis within macro by @jkhaliqi in #8533
  • [GLUTEN-6887][VL] Daily Update Velox Version (2025_01_15) by @GlutenPerfBot in #8536
  • [CORE] Use RAS's cost model for legacy transition planner to evaluate cost of transitions by @zhztheplayer in #8527
  • [GLUTEN-8487][VL] adding JDK17 based Centos8 image (#8513) by @zhouyuan in #8539
  • [GLUTEN-1632][CH]Daily Update Clickhouse Version (20250115) by @kyligence-git in #8537
  • [GLUTEN-8479][CORE][Part-1] Remove unnecessary config by @yikf in #8480
  • [GLUTEN-8520][VL] Fix bitwise operators by @jkhaliqi in #8521
  • [GLUTEN-8524][VL] Fix input output errors by @jkhaliqi in #8525
  • [GLUTEN-6876][VL] update spark 3.5.2 in doc by @FelixYBW in #8543
  • [GLUTEN-8455][VL] Port encrypted file checks to shim layer by @ArnavBalyan in #8501
  • [CORE][VL] Cost model code refactors by @zhztheplayer in #8541
  • [GLUTEN-6887][VL] Daily Update Velox Version (2025_01_16) by @GlutenPerfBot in #8546
  • [GLUTEN-1632][CH]Daily Update Clickhouse Version (20250116) by @kyligence-git in #8544
  • [GLUTEN-8432][CH]Remove duplicate output attributes of aggregate's child by @lgbo-ustc in #8450
  • [GLUTEN-6887][VL] Daily Update Velox Version (2025_01_17) by @GlutenPerfBot in #8553
  • [GLUTEN-8497][CORE] A unified CallInfo API to replace AdaptiveContext by @zhztheplayer in #8551
  • [GLUTEN-8529][CH]Fix get_json_object when path has asterisk by @KevinyhZou in #8540
  • [MINOR] Fix comment of function VeloxAggregateFunctionsBuilder.create by @zml1206 in #8549
  • [CORE] Optimize duplicated code for create rel node by @zml1206 in #8548
  • [GLUTEN-7706][CORE] Support Spark-344 + JDK17 by @zhouyuan in #7789
  • [GLUTEN-8475][VL] Fix C-style casts to C++-style by @jkhaliqi in #8474
  • [GLUTEN-8534][VL] Fix allowing loops to iterate beyond end of array by @jkhaliqi in #8535
  • [GLUTEN-8538][VL] Fix incorrect calculation of buffer size by @jkhaliqi in #8542
  • [CORE][CH] Support MicroBatchScanExec with KafkaScan in batch mode by @loneylee in #8321
  • [CORE][MIRROR] Change config.defaultValue.get.toString to config.defaultValueString by @jackylee-ch in #8572
  • [GLUTEN-6887][VL] Daily Update Velox Version (2025_01_18) by @GlutenPerfBot in #8561
  • [GLUTEN-6887][VL] Daily Update Velox Version (2025_01_19) by @GlutenPerfBot in #8563
  • [GLUTEN-8406][CH] Replace from_json(s, 'Map<String, String>')[k] with get_json_object(s, '$.k') by @lgbo-ustc in #8409
  • [GLUTEN-8479][CORE][Part-2] All configurations should be defined through ConfigEntry by @yikf in #8559
  • [VL] CMake configuration cleanup to remove variable VELOX_COMPONENTS_PATH by @zhztheplayer in #8579
  • [GLUTEN-1632][CH]Daily Update Clickhouse Version (20250121) by @kyligence-git in #8577
  • [DOC] Fix outdated operators in documentation by @ArnavBalyan in #8582
  • [GLUTEN-8379][VL] Support query trace by @jinchengchenghh in https://gi...

v1.4.0-rc2

06 Jun 09:52
50dd117
v1.4.0-rc2 Pre-release

What's Changed

  • [GLUTEN-8327][CORE][Part-3] Introduce the ConfigEntry to gluten config by @yikf in #8431
  • [VL] Fix wrong warning of "Memory overhead is set to ..." under default Spark config settings by @zhztheplayer in #8448
  • [GLUTEN-8385][VL] Support write compatible-hive bucket table for Spark3.4 and Spark3.5. by @yikf in #8386
  • Revert "[CH] Disable gluten arm ci" by @lwz9103 in #8460
  • [GLUTEN-8453] [VL] Allow Heavy Batch to be Processed by ColumnarCachedBatchSerializer by @ArnavBalyan in #8454
  • [CH] Add tools to dump ActionsDAG into tree graph by @taiyang-li in #8461
  • [GLUTEN-6887][VL] Daily Update Velox Version (2025_01_08) by @GlutenPerfBot in #8457
  • [VL] Update document of build gluten in Docker by @FelixYBW in #8459
  • [GLUTEN-8462][CORE] Raise a meaningful error when no component is found from classpath by @zhztheplayer in #8468
  • [GLUTEN-8453][VL] Follow-up to #8454 to add a ensureVeloxBatch API for limited use cases by @zhztheplayer in #8463
  • [VL] Refactor Velox.md by @FelixYBW in #8478
  • [GLUTEN-8465] [VL] Bump Celeborn to 0.5.3 by @SteNicholas in #8467
  • [GLUTEN-8455][VL] Fallback Scan for Encrypted Parquet Files by @ArnavBalyan in #8456
  • [GLUTEN-6887][VL] Daily Update Velox Version (2025_01_09) by @GlutenPerfBot in #8472
  • [CORE] Refactor columnar noop write rule by @jackylee-ch in #8422
  • [GLUTEN-8462][CH] Fixed the loading of Components and Backend by @gleonSun in #8464
  • [GLUTEN-8414][VL] Override doCanonicalize in ColumnarPartialProjectEx… by @lifulong in #8415
  • [GLUTEN-8397][CH][Part-2] Fix static_cast failed on macos by @yxheartipp in #8485
  • [GLUTEN-8343][CH]Fix cast number to decimal and improve performance of it by @KevinyhZou in #8351
  • [GLUTEN-8481][VL] Clean up shuffle reader cpp code by @marin-ma in #8482
  • [Core] Bump version to 1.4.0-SNAPSHOT by @weiting-chen in #8452
  • [GLUTEN-8483][CORE] A stable and universal way to find component files by @zhztheplayer in #8486
  • [DOC][VL] Fix typo in microbenchmark.md by @marin-ma in #8495
  • [GLUTEN-1632][CH]Daily Update Clickhouse Version (20250110) by @kyligence-git in #8490
  • [GLUTEN-6887][VL] Daily Update Velox Version (2025_01_10) by @GlutenPerfBot in #8489
  • [GLUTEN-8476][VL] Fix allocate and free memory by @jkhaliqi in #8477
  • [GLUTEN-8503][VL] Fix macro parenthesis CVE by @jkhaliqi in #8504
  • [GLUTEN-8471][VL] Fix usage of uninitialized variables by @jkhaliqi in #8470
  • [GLUTEN-6887][VL] Daily Update Velox Version (2025_01_11) by @GlutenPerfBot in #8507
  • [GLUTEN-6887][VL] Daily Update Velox Version (2025_01_12) by @GlutenPerfBot in #8508
  • [GLUTEN-8497][VL] A bad test case that fails columnar table cache query by @zhztheplayer in #8498
  • [DOC] Update README.md by @PHILO-HE in #8444
  • [GLUTEN-8319][VL] Support date_format Spark function by @PHILO-HE in #8323
  • [GLUTEN-8487][VL] adding JDK11 based Centos8 image by @zhouyuan in #8513
  • [GLUTEN-6887][VL] Daily Update Velox Version (2025_01_14) by @GlutenPerfBot in #8522
  • [GLUTEN-8020][VL] Remove the libhdfs3 installation script required for static linking by @JkSelf in #8013
  • [GLUTEN-8532][VL] Fix parenthesis within macro by @jkhaliqi in #8533
  • [GLUTEN-6887][VL] Daily Update Velox Version (2025_01_15) by @GlutenPerfBot in #8536
  • [CORE] Use RAS's cost model for legacy transition planner to evaluate cost of transitions by @zhztheplayer in #8527
  • [GLUTEN-8487][VL] adding JDK17 based Centos8 image (#8513) by @zhouyuan in #8539
  • [GLUTEN-1632][CH]Daily Update Clickhouse Version (20250115) by @kyligence-git in #8537
  • [GLUTEN-8479][CORE][Part-1] Remove unnecessary config by @yikf in #8480
  • [GLUTEN-8520][VL] Fix bitwise operators by @jkhaliqi in #8521
  • [GLUTEN-8524][VL] Fix input output errors by @jkhaliqi in #8525
  • [GLUTEN-6876][VL] update spark 3.5.2 in doc by @FelixYBW in #8543
  • [GLUTEN-8455][VL] Port encrypted file checks to shim layer by @ArnavBalyan in #8501
  • [CORE][VL] Cost model code refactors by @zhztheplayer in #8541
  • [GLUTEN-6887][VL] Daily Update Velox Version (2025_01_16) by @GlutenPerfBot in #8546
  • [GLUTEN-1632][CH]Daily Update Clickhouse Version (20250116) by @kyligence-git in #8544
  • [GLUTEN-8432][CH]Remove duplicate output attributes of aggregate's child by @lgbo-ustc in #8450
  • [GLUTEN-6887][VL] Daily Update Velox Version (2025_01_17) by @GlutenPerfBot in #8553
  • [GLUTEN-8497][CORE] A unified CallInfo API to replace AdaptiveContext by @zhztheplayer in #8551
  • [GLUTEN-8529][CH]Fix get_json_object when path has asterisk by @KevinyhZou in #8540
  • [MINOR] Fix comment of function VeloxAggregateFunctionsBuilder.create by @zml1206 in #8549
  • [CORE] Optimize duplicated code for create rel node by @zml1206 in #8548
  • [GLUTEN-7706][CORE] Support Spark-344 + JDK17 by @zhouyuan in #7789
  • [GLUTEN-8475][VL] Fix C-style casts to C++-style by @jkhaliqi in #8474
  • [GLUTEN-8534][VL] Fix allowing loops to iterate beyond end of array by @jkhaliqi in #8535
  • [GLUTEN-8538][VL] Fix incorrect calculation of buffer size by @jkhaliqi in #8542
  • [CORE][CH] Support MicroBatchScanExec with KafkaScan in batch mode by @loneylee in #8321
  • [CORE][MIRROR] Change config.defaultValue.get.toString to config.defaultValueString by @jackylee-ch in #8572
  • [GLUTEN-6887][VL] Daily Update Velox Version (2025_01_18) by @GlutenPerfBot in #8561
  • [GLUTEN-6887][VL] Daily Update Velox Version (2025_01_19) by @GlutenPerfBot in #8563
  • [GLUTEN-8406][CH] Replace from_json(s, 'Map<String, String>')[k] with get_json_object(s, '$.k') by @lgbo-ustc in #8409
  • [GLUTEN-8479][CORE][Part-2] All configurations should be defined through ConfigEntry by @yikf in #8559
  • [VL] CMake configuration cleanup to remove variable VELOX_COMPONENTS_PATH by @zhztheplayer in #8579
  • [GLUTEN-1632][CH]Daily Update Clickhouse Version (20250121) by @kyligence-git in #8577
  • [DOC] Fix outdated operators in documentation by @ArnavBalyan in #8582
  • [GLUTEN-8379][VL] Support query trace by @jinchengchenghh in #8380
  • [GLUTEN-8266][VL][CI] Pre-install uniffle in docker image by @zhouyuan in #8578
  • [VL] Update the Scaladoc of Component API by @zhztheplayer in #8589
  • [GLUTEN-8455][VL] Support encrypted parquet fallback for 3.5 by @ArnavBalyan in #8560
  • [GLUTEN-6887][VL] Daily Update Velox Version (2025_01_22) by @GlutenPerfBot in #8587
  • [GLUTEN-8580][CORE][Part-1] Clean up unnecessary code related to input file expression by @zml1206 in #8584
  • [GLUTEN-8379][VL] Fix typo in query trace document by @jinchengchenghh in https://github.com...

v1.4.0-rc1

21 May 12:42
bb28bb7
v1.4.0-rc1 Pre-release

What's Changed

  • [GLUTEN-8327][CORE][Part-3] Introduce the ConfigEntry to gluten config by @yikf in #8431
  • [VL] Fix wrong warning of "Memory overhead is set to ..." under default Spark config settings by @zhztheplayer in #8448
  • [GLUTEN-8385][VL] Support write compatible-hive bucket table for Spark3.4 and Spark3.5. by @yikf in #8386
  • Revert "[CH] Disable gluten arm ci" by @lwz9103 in #8460
  • [GLUTEN-8453] [VL] Allow Heavy Batch to be Processed by ColumnarCachedBatchSerializer by @ArnavBalyan in #8454
  • [CH] Add tools to dump ActionsDAG into tree graph by @taiyang-li in #8461
  • [GLUTEN-6887][VL] Daily Update Velox Version (2025_01_08) by @GlutenPerfBot in #8457
  • [VL] Update document of build gluten in Docker by @FelixYBW in #8459
  • [GLUTEN-8462][CORE] Raise a meaningful error when no component is found from classpath by @zhztheplayer in #8468
  • [GLUTEN-8453][VL] Follow-up to #8454 to add a ensureVeloxBatch API for limited use cases by @zhztheplayer in #8463
  • [VL] Refactor Velox.md by @FelixYBW in #8478
  • [GLUTEN-8465] [VL] Bump Celeborn to 0.5.3 by @SteNicholas in #8467
  • [GLUTEN-8455][VL] Fallback Scan for Encrypted Parquet Files by @ArnavBalyan in #8456
  • [GLUTEN-6887][VL] Daily Update Velox Version (2025_01_09) by @GlutenPerfBot in #8472
  • [CORE] Refactor columnar noop write rule by @jackylee-ch in #8422
  • [GLUTEN-8462][CH] Fixed the loading of Components and Backend by @gleonSun in #8464
  • [GLUTEN-8414][VL] Override doCanonicalize in ColumnarPartialProjectEx… by @lifulong in #8415
  • [GLUTEN-8397][CH][Part-2] Fix static_cast failed on macos by @yxheartipp in #8485
  • [GLUTEN-8343][CH]Fix cast number to decimal and improve performance of it by @KevinyhZou in #8351
  • [GLUTEN-8481][VL] Clean up shuffle reader cpp code by @marin-ma in #8482
  • [Core] Bump version to 1.4.0-SNAPSHOT by @weiting-chen in #8452
  • [GLUTEN-8483][CORE] A stable and universal way to find component files by @zhztheplayer in #8486
  • [DOC][VL] Fix typo in microbenchmark.md by @marin-ma in #8495
  • [GLUTEN-1632][CH]Daily Update Clickhouse Version (20250110) by @kyligence-git in #8490
  • [GLUTEN-6887][VL] Daily Update Velox Version (2025_01_10) by @GlutenPerfBot in #8489
  • [GLUTEN-8476][VL] Fix allocate and free memory by @jkhaliqi in #8477
  • [GLUTEN-8503][VL] Fix macro parenthesis CVE by @jkhaliqi in #8504
  • [GLUTEN-8471][VL] Fix usage of uninitialized variables by @jkhaliqi in #8470
  • [GLUTEN-6887][VL] Daily Update Velox Version (2025_01_11) by @GlutenPerfBot in #8507
  • [GLUTEN-6887][VL] Daily Update Velox Version (2025_01_12) by @GlutenPerfBot in #8508
  • [GLUTEN-8497][VL] A bad test case that fails columnar table cache query by @zhztheplayer in #8498
  • [DOC] Update README.md by @PHILO-HE in #8444
  • [GLUTEN-8319][VL] Support date_format Spark function by @PHILO-HE in #8323
  • [GLUTEN-8487][VL] adding JDK11 based Centos8 image by @zhouyuan in #8513
  • [GLUTEN-6887][VL] Daily Update Velox Version (2025_01_14) by @GlutenPerfBot in #8522
  • [GLUTEN-8020][VL] Remove the libhdfs3 installation script required for static linking by @JkSelf in #8013
  • [GLUTEN-8532][VL] Fix parenthesis within macro by @jkhaliqi in #8533
  • [GLUTEN-6887][VL] Daily Update Velox Version (2025_01_15) by @GlutenPerfBot in #8536
  • [CORE] Use RAS's cost model for legacy transition planner to evaluate cost of transitions by @zhztheplayer in #8527
  • [GLUTEN-8487][VL] adding JDK17 based Centos8 image (#8513) by @zhouyuan in #8539
  • [GLUTEN-1632][CH]Daily Update Clickhouse Version (20250115) by @kyligence-git in #8537
  • [GLUTEN-8479][CORE][Part-1] Remove unnecessary config by @yikf in #8480
  • [GLUTEN-8520][VL] Fix bitwise operators by @jkhaliqi in #8521
  • [GLUTEN-8524][VL] Fix input output errors by @jkhaliqi in #8525
  • [GLUTEN-6876][VL] update spark 3.5.2 in doc by @FelixYBW in #8543
  • [GLUTEN-8455][VL] Port encrypted file checks to shim layer by @ArnavBalyan in #8501
  • [CORE][VL] Cost model code refactors by @zhztheplayer in #8541
  • [GLUTEN-6887][VL] Daily Update Velox Version (2025_01_16) by @GlutenPerfBot in #8546
  • [GLUTEN-1632][CH]Daily Update Clickhouse Version (20250116) by @kyligence-git in #8544
  • [GLUTEN-8432][CH]Remove duplicate output attributes of aggregate's child by @lgbo-ustc in #8450
  • [GLUTEN-6887][VL] Daily Update Velox Version (2025_01_17) by @GlutenPerfBot in #8553
  • [GLUTEN-8497][CORE] A unified CallInfo API to replace AdaptiveContext by @zhztheplayer in #8551
  • [GLUTEN-8529][CH]Fix get_json_object when path has asterisk by @KevinyhZou in #8540
  • [MINOR] Fix comment of function VeloxAggregateFunctionsBuilder.create by @zml1206 in #8549
  • [CORE] Optimize duplicated code for create rel node by @zml1206 in #8548
  • [GLUTEN-7706][CORE] Support Spark-344 + JDK17 by @zhouyuan in #7789
  • [GLUTEN-8475][VL] Fix C-style casts to C++-style by @jkhaliqi in #8474
  • [GLUTEN-8534][VL] Fix allowing loops to iterate beyond end of array by @jkhaliqi in #8535
  • [GLUTEN-8538][VL] Fix incorrect calculation of buffer size by @jkhaliqi in #8542
  • [CORE][CH] Support MicroBatchScanExec with KafkaScan in batch mode by @loneylee in #8321
  • [CORE][MIRROR] Change config.defaultValue.get.toString to config.defaultValueString by @jackylee-ch in #8572
  • [GLUTEN-6887][VL] Daily Update Velox Version (2025_01_18) by @GlutenPerfBot in #8561
  • [GLUTEN-6887][VL] Daily Update Velox Version (2025_01_19) by @GlutenPerfBot in #8563
  • [GLUTEN-8406][CH] Replace from_json(s, 'Map<String, String>')[k] with get_json_object(s, '$.k') by @lgbo-ustc in #8409
  • [GLUTEN-8479][CORE][Part-2] All configurations should be defined through ConfigEntry by @yikf in #8559
  • [VL] CMake configuration cleanup to remove variable VELOX_COMPONENTS_PATH by @zhztheplayer in #8579
  • [GLUTEN-1632][CH]Daily Update Clickhouse Version (20250121) by @kyligence-git in #8577
  • [DOC] Fix outdated operators in documentation by @ArnavBalyan in #8582
  • [GLUTEN-8379][VL] Support query trace by @jinchengchenghh in #8380
  • [GLUTEN-8266][VL][CI] Pre-install uniffle in docker image by @zhouyuan in #8578
  • [VL] Update the Scaladoc of Component API by @zhztheplayer in #8589
  • [GLUTEN-8455][VL] Support encrypted parquet fallback for 3.5 by @ArnavBalyan in #8560
  • [GLUTEN-6887][VL] Daily Update Velox Version (2025_01_22) by @GlutenPerfBot in #8587
  • [GLUTEN-8580][CORE][Part-1] Clean up unnecessary code related to input file expression by @zml1206 in #8584
  • [GLUTEN-8379][VL] Fix typo in query trace document by @jinchengchenghh in https://github.com...

v1.4.0-rc0

08 Apr 14:12
88899db
v1.4.0-rc0 Pre-release

What's Changed

  • [GLUTEN-8327][CORE][Part-3] Introduce the ConfigEntry to gluten config by @yikf in #8431
  • [VL] Fix wrong warning of "Memory overhead is set to ..." under default Spark config settings by @zhztheplayer in #8448
  • [GLUTEN-8385][VL] Support write compatible-hive bucket table for Spark3.4 and Spark3.5. by @yikf in #8386
  • Revert "[CH] Disable gluten arm ci" by @lwz9103 in #8460
  • [GLUTEN-8453] [VL] Allow Heavy Batch to be Processed by ColumnarCachedBatchSerializer by @ArnavBalyan in #8454
  • [CH] Add tools to dump ActionsDAG into tree graph by @taiyang-li in #8461
  • [GLUTEN-6887][VL] Daily Update Velox Version (2025_01_08) by @GlutenPerfBot in #8457
  • [VL] Update document of build gluten in Docker by @FelixYBW in #8459
  • [GLUTEN-8462][CORE] Raise a meaningful error when no component is found from classpath by @zhztheplayer in #8468
  • [GLUTEN-8453][VL] Follow-up to #8454 to add a ensureVeloxBatch API for limited use cases by @zhztheplayer in #8463
  • [VL] Refactor Velox.md by @FelixYBW in #8478
  • [GLUTEN-8465] [VL] Bump Celeborn to 0.5.3 by @SteNicholas in #8467
  • [GLUTEN-8455][VL] Fallback Scan for Encrypted Parquet Files by @ArnavBalyan in #8456
  • [GLUTEN-6887][VL] Daily Update Velox Version (2025_01_09) by @GlutenPerfBot in #8472
  • [CORE] Refactor columnar noop write rule by @jackylee-ch in #8422
  • [GLUTEN-8462][CH] Fixed the loading of Components and Backend by @gleonSun in #8464
  • [GLUTEN-8414][VL] Override doCanonicalize in ColumnarPartialProjectEx… by @lifulong in #8415
  • [GLUTEN-8397][CH][Part-2] Fix static_cast failed on macos by @yxheartipp in #8485
  • [GLUTEN-8343][CH]Fix cast number to decimal and improve performance of it by @KevinyhZou in #8351
  • [GLUTEN-8481][VL] Clean up shuffle reader cpp code by @marin-ma in #8482
  • [Core] Bump version to 1.4.0-SNAPSHOT by @weiting-chen in #8452
  • [GLUTEN-8483][CORE] A stable and universal way to find component files by @zhztheplayer in #8486
  • [DOC][VL] Fix typo in microbenchmark.md by @marin-ma in #8495
  • [GLUTEN-1632][CH]Daily Update Clickhouse Version (20250110) by @kyligence-git in #8490
  • [GLUTEN-6887][VL] Daily Update Velox Version (2025_01_10) by @GlutenPerfBot in #8489
  • [GLUTEN-8476][VL] Fix allocate and free memory by @jkhaliqi in #8477
  • [GLUTEN-8503][VL] Fix macro parenthesis CVE by @jkhaliqi in #8504
  • [GLUTEN-8471][VL] Fix usage of uninitialized variables by @jkhaliqi in #8470
  • [GLUTEN-6887][VL] Daily Update Velox Version (2025_01_11) by @GlutenPerfBot in #8507
  • [GLUTEN-6887][VL] Daily Update Velox Version (2025_01_12) by @GlutenPerfBot in #8508
  • [GLUTEN-8497][VL] A bad test case that fails columnar table cache query by @zhztheplayer in #8498
  • [DOC] Update README.md by @PHILO-HE in #8444
  • [GLUTEN-8319][VL] Support date_format Spark function by @PHILO-HE in #8323
  • [GLUTEN-8487][VL] adding JDK11 based Centos8 image by @zhouyuan in #8513
  • [GLUTEN-6887][VL] Daily Update Velox Version (2025_01_14) by @GlutenPerfBot in #8522
  • [GLUTEN-8020][VL] Remove the libhdfs3 installation script required for static linking by @JkSelf in #8013
  • [GLUTEN-8532][VL] Fix parenthesis within macro by @jkhaliqi in #8533
  • [GLUTEN-6887][VL] Daily Update Velox Version (2025_01_15) by @GlutenPerfBot in #8536
  • [CORE] Use RAS's cost model for legacy transition planner to evaluate cost of transitions by @zhztheplayer in #8527
  • [GLUTEN-8487][VL] adding JDK17 based Centos8 image (#8513) by @zhouyuan in #8539
  • [GLUTEN-1632][CH]Daily Update Clickhouse Version (20250115) by @kyligence-git in #8537
  • [GLUTEN-8479][CORE][Part-1] Remove unnecessary config by @yikf in #8480
  • [GLUTEN-8520][VL] Fix bitwise operators by @jkhaliqi in #8521
  • [GLUTEN-8524][VL] Fix input output errors by @jkhaliqi in #8525
  • [GLUTEN-6876][VL] update spark 3.5.2 in doc by @FelixYBW in #8543
  • [GLUTEN-8455][VL] Port encrypted file checks to shim layer by @ArnavBalyan in #8501
  • [CORE][VL] Cost model code refactors by @zhztheplayer in #8541
  • [GLUTEN-6887][VL] Daily Update Velox Version (2025_01_16) by @GlutenPerfBot in #8546
  • [GLUTEN-1632][CH]Daily Update Clickhouse Version (20250116) by @kyligence-git in #8544
  • [GLUTEN-8432][CH]Remove duplicate output attributes of aggregate's child by @lgbo-ustc in #8450
  • [GLUTEN-6887][VL] Daily Update Velox Version (2025_01_17) by @GlutenPerfBot in #8553
  • [GLUTEN-8497][CORE] A unified CallInfo API to replace AdaptiveContext by @zhztheplayer in #8551
  • [GLUTEN-8529][CH]Fix get_json_object when path has asterisk by @KevinyhZou in #8540
  • [MINOR] Fix comment of function VeloxAggregateFunctionsBuilder.create by @zml1206 in #8549
  • [CORE] Optimize duplicated code for create rel node by @zml1206 in #8548
  • [GLUTEN-7706][CORE] Support Spark-344 + JDK17 by @zhouyuan in #7789
  • [GLUTEN-8475][VL] Fix C-style casts to C++-style by @jkhaliqi in #8474
  • [GLUTEN-8534][VL] Fix allowing loops to iterate beyond end of array by @jkhaliqi in #8535
  • [GLUTEN-8538][VL] Fix incorrect calculation of buffer size by @jkhaliqi in #8542
  • [CORE][CH] Support MicroBatchScanExec with KafkaScan in batch mode by @loneylee in #8321
  • [CORE][MIRROR] Change config.defaultValue.get.toString to config.defaultValueString by @jackylee-ch in #8572
  • [GLUTEN-6887][VL] Daily Update Velox Version (2025_01_18) by @GlutenPerfBot in #8561
  • [GLUTEN-6887][VL] Daily Update Velox Version (2025_01_19) by @GlutenPerfBot in #8563
  • [GLUTEN-8406][CH] Replace from_json(s, 'Map<String, String>')[k] with get_json_object(s, '$.k') by @lgbo-ustc in #8409
  • [GLUTEN-8479][CORE][Part-2] All configurations should be defined through ConfigEntry by @yikf in #8559
  • [VL] CMake configuration cleanup to remove variable VELOX_COMPONENTS_PATH by @zhztheplayer in #8579
  • [GLUTEN-1632][CH]Daily Update Clickhouse Version (20250121) by @kyligence-git in #8577
  • [DOC] Fix outdated operators in documentation by @ArnavBalyan in #8582
  • [GLUTEN-8379][VL] Support query trace by @jinchengchenghh in #8380
  • [GLUTEN-8266][VL][CI] Pre-install uniffle in docker image by @zhouyuan in #8578
  • [VL] Update the Scaladoc of Component API by @zhztheplayer in #8589
  • [GLUTEN-8455][VL] Support encrypted parquet fallback for 3.5 by @ArnavBalyan in #8560
  • [GLUTEN-6887][VL] Daily Update Velox Version (2025_01_22) by @GlutenPerfBot in #8587
  • [GLUTEN-8580][CORE][Part-1] Clean up unnecessary code related to input file expression by @zml1206 in #8584
  • [GLUTEN-8379][VL] Fix typo in query trace document by @jinchengchenghh in https://github.com...

v1.3.0

24 Jan 02:26
646329d

Release Notes - Gluten version 1.3.0

Highlights

  • Spark 3.2.2/3.3.1/3.4.3 (upgraded)/3.5.2 (upgraded)
  • 268+ Spark functions, including JSON functions
  • Update OAP's Velox codebase to 2025/01/07
  • Join: Sort Merge Join support
  • Shuffle: Sort-based shuffle (row)
  • Query Plan: RAS optimization
  • Data lake: Hudi 0.15.0/Iceberg 1.5.0/Delta 3.2.0 support
  • RSS: Celeborn 0.5.2/Uniffle 0.9.1
  • File Format: CSV support via Arrow
  • JVM libhdfs with viewfs/Kerberos support
  • Partial Project (UDF) support
  • Mix backend refactor
  • Bucket write in partitioned Hive table
  • CI/Nightly Package Tools Update
  • Build & Compile Tools Update (vcpkg with static build is recommended)
  • Fix several result mismatch issues
  • Fix OOM/YARN kill stability issues

What's Changed


v1.3.0-rc0

16 Jan 12:22
646329d
Pre-release

Release Notes - Gluten version 1.3.0-rc0

Highlights

  • Spark 3.2.2/3.3.1/3.4.3 (upgraded)/3.5.2 (upgraded)
  • 268+ Spark functions, including JSON functions
  • Update OAP's Velox codebase to 2025/01/07
  • Join: Sort Merge Join support
  • Shuffle: Sort-based shuffle (row)
  • Query Plan: RAS optimization
  • Data lake: Hudi 0.15.0/Iceberg 1.5.0/Delta 3.2.0 support
  • RSS: Celeborn 0.5.2/Uniffle 0.9.1
  • File Format: CSV support via Arrow
  • JVM libhdfs with viewfs/Kerberos support
  • Partial Project (UDF) support
  • Mix backend refactor
  • Bucket write in partitioned Hive table
  • CI/Nightly Package Tools Update
  • Build & Compile Tools Update (vcpkg with static build is recommended)
  • Fix several result mismatch issues
  • Fix OOM/YARN kill stability issues

What's Changed


v1.3.0-preview

07 Jan 12:30
fe02fee
Pre-release

What's Changed


v1.2.1

12 Dec 12:28
1a50a68

Highlights

  • 3 shuffle- and spill-related bug fixes
  • 5 RSS (Celeborn, Uniffle) related bug fixes
  • 4 compile- and package-related bug fixes
  • 10 CI/CD-related bug fixes
  • Move to OAP's Velox v1.2.2
  • 4 major issues fixed in OAP's Velox
  • More minor bug fixes; see the full list below

What's Changed

Full Changelog: v1.2.0...v1.2.1

v1.2.1-rc0

29 Nov 16:42
1a50a68
Pre-release

What's Changed

Full Changelog: v1.2.0...v1.2.1-rc0