Release Notes - Gluten version 1.4.0
Highlights
- Spark 3.2.2/3.3.1/3.4.4(upgraded)/3.5.2
- Add support for more Spark functions, including date_format, make_date, map_filter, map_concat, from_json, btrim, array_append, and more
- Add support for more Spark operators, including Range, CollectLimit, and more
- Update OAP's Velox codebase to 2025/05/12
- Join optimizations: BNLJ full outer join
- Shuffle optimizations: RSS ShuffleReader optimization and bug fixes
- RSS: Celeborn 0.5.4(upgraded)/Uniffle 0.9.2(upgraded)
- Query Plan: RAS cost model optimizations and refactoring
- Datalake: Add Iceberg/Hudi in test
- CI: Docker image and JDK version update
- Support dynamically adjusting the Stage Resource Profile
- Support Query Trace
- Add Qualification Tool
- Fix OOM issues for some untracked memory
What's Changed
- [GLUTEN-8327][CORE][Part-3] Introduce the ConfigEntry to gluten config by @yikf in #8431
- [VL] Fix wrong warning of "Memory overhead is set to ..." under default Spark config settings by @zhztheplayer in #8448
- [GLUTEN-8385][VL] Support write compatible-hive bucket table for Spark3.4 and Spark3.5. by @yikf in #8386
- Revert "[CH] Disable gluten arm ci" by @lwz9103 in #8460
- [GLUTEN-8453] [VL] Allow Heavy Batch to be Processed by ColumnarCachedBatchSerializer by @ArnavBalyan in #8454
- [CH] Add tools to dump ActionsDAG into tree graph by @taiyang-li in #8461
- [GLUTEN-6887][VL] Daily Update Velox Version (2025_01_08) by @GlutenPerfBot in #8457
- [VL] Update document of build gluten in Docker by @FelixYBW in #8459
- [GLUTEN-8462][CORE] Raise a meaningful error when no component is found from classpath by @zhztheplayer in #8468
- [GLUTEN-8453][VL] Follow-up to #8454 to add a ensureVeloxBatch API for limited use cases by @zhztheplayer in #8463
- [VL] Refactor Velox.md by @FelixYBW in #8478
- [GLUTEN-8465] [VL] Bump Celeborn to 0.5.3 by @SteNicholas in #8467
- [GLUTEN-8455][VL] Fallback Scan for Encrypted Parquet Files by @ArnavBalyan in #8456
- [GLUTEN-6887][VL] Daily Update Velox Version (2025_01_09) by @GlutenPerfBot in #8472
- [CORE] Refactor columnar noop write rule by @jackylee-ch in #8422
- [GLUTEN-8462][CH] Fixed the loading of Components and Backend by @gleonSun in #8464
- [GLUTEN-8414][VL] Override doCanonicalize in ColumnarPartialProjectEx… by @lifulong in #8415
- [GLUTEN-8397][CH][Part-2] Fix static_cast failed on macos by @yxheartipp in #8485
- [GLUTEN-8343][CH]Fix cast number to decimal and improve performance of it by @KevinyhZou in #8351
- [GLUTEN-8481][VL] Clean up shuffle reader cpp code by @marin-ma in #8482
- [Core] Bump version to 1.4.0-SNAPSHOT by @weiting-chen in #8452
- [GLUTEN-8483][CORE] A stable and universal way to find component files by @zhztheplayer in #8486
- [DOC][VL] Fix typo in microbenchmark.md by @marin-ma in #8495
- [GLUTEN-1632][CH]Daily Update Clickhouse Version (20250110) by @kyligence-git in #8490
- [GLUTEN-6887][VL] Daily Update Velox Version (2025_01_10) by @GlutenPerfBot in #8489
- [GLUTEN-8476][VL] Fix allocate and free memory by @jkhaliqi in #8477
- [GLUTEN-8503][VL] Fix macro parenthesis CVE by @jkhaliqi in #8504
- [GLUTEN-8471][VL] Fix usage of uninitialized variables by @jkhaliqi in #8470
- [GLUTEN-6887][VL] Daily Update Velox Version (2025_01_11) by @GlutenPerfBot in #8507
- [GLUTEN-6887][VL] Daily Update Velox Version (2025_01_12) by @GlutenPerfBot in #8508
- [GLUTEN-8497][VL] A bad test case that fails columnar table cache query by @zhztheplayer in #8498
- [DOC] Update README.md by @PHILO-HE in #8444
- [GLUTEN-8319][VL] Support date_format Spark function by @PHILO-HE in #8323
- [GLUTEN-8487][VL] adding JDK11 based Centos8 image by @zhouyuan in #8513
- [GLUTEN-6887][VL] Daily Update Velox Version (2025_01_14) by @GlutenPerfBot in #8522
- [GLUTEN-8020][VL] Remove the libhdfs3 installation script required for static linking by @JkSelf in #8013
- [GLUTEN-8532][VL] Fix parenthesis within macro by @jkhaliqi in #8533
- [GLUTEN-6887][VL] Daily Update Velox Version (2025_01_15) by @GlutenPerfBot in #8536
- [CORE] Use RAS's cost model for legacy transition planner to evaluate cost of transitions by @zhztheplayer in #8527
- [GLUTEN-8487][VL] adding JDK17 based Centos8 image (#8513) by @zhouyuan in #8539
- [GLUTEN-1632][CH]Daily Update Clickhouse Version (20250115) by @kyligence-git in #8537
- [GLUTEN-8479][CORE][Part-1] Remove unnecessary config by @yikf in #8480
- [GLUTEN-8520][VL] Fix bitwise operators by @jkhaliqi in #8521
- [GLUTEN-8524][VL] Fix input output errors by @jkhaliqi in #8525
- [GLUTEN-6876][VL] update spark 3.5.2 in doc by @FelixYBW in #8543
- [GLUTEN-8455][VL] Port encrypted file checks to shim layer by @ArnavBalyan in #8501
- [CORE][VL] Cost model code refactors by @zhztheplayer in #8541
- [GLUTEN-6887][VL] Daily Update Velox Version (2025_01_16) by @GlutenPerfBot in #8546
- [GLUTEN-1632][CH]Daily Update Clickhouse Version (20250116) by @kyligence-git in #8544
- [GLUTEN-8432][CH]Remove duplicate output attributes of aggregate's child by @lgbo-ustc in #8450
- [GLUTEN-6887][VL] Daily Update Velox Version (2025_01_17) by @GlutenPerfBot in #8553
- [GLUTEN-8497][CORE] A unified CallInfo API to replace AdaptiveContext by @zhztheplayer in #8551
- [GLUTEN-8529][CH]Fix get_json_object when path has asterisk by @KevinyhZou in #8540
- [MINOR] Fix comment of function VeloxAggregateFunctionsBuilder.create by @zml1206 in #8549
- [CORE] Optimize duplicated code for create rel node by @zml1206 in #8548
- [GLUTEN-7706][CORE] Support Spark-344 + JDK17 by @zhouyuan in #7789
- [GLUTEN-8475][VL] Fix C-style casts to C++-style by @jkhaliqi in #8474
- [GLUTEN-8534][VL] Fix allowing loops to iterate beyond end of array by @jkhaliqi in #8535
- [GLUTEN-8538][VL] Fix incorrect calculation of buffer size by @jkhaliqi in #8542
- [CORE][CH] Support MicroBatchScanExec with KafkaScan in batch mode by @loneylee in #8321
- [CORE][MIRROR] Change config.defaultValue.get.toString to config.defaultValueString by @jackylee-ch in #8572
- [GLUTEN-6887][VL] Daily Update Velox Version (2025_01_18) by @GlutenPerfBot in #8561
- [GLUTEN-6887][VL] Daily Update Velox Version (2025_01_19) by @GlutenPerfBot in #8563
- [GLUTEN-8406][CH] Replace from_json(s, 'Map<String, String>')[k] with get_json_object(s, '$.k') by @lgbo-ustc in #8409
- [GLUTEN-8479][CORE][Part-2] All configurations should be defined through ConfigEntry by @yikf in #8559
- [VL] CMake configuration cleanup to remove variable VELOX_COMPONENTS_PATH by @zhztheplayer in #8579
- [GLUTEN-1632][CH]Daily Update Clickhouse Version (20250121) by @kyligence-git in #8577
- [DOC] Fix outdated operators in documentation by @ArnavBalyan in #8582
- [GLUTEN-8379][VL] Support query trace by @jinchengchenghh in #8380
- [GLUTEN-8266][VL][CI] Pre-install uniffle in docker image by @zhouyuan in #8578
- [VL] Update the Scaladoc of Component API by @zhztheplayer in #8589
- [GLUTEN-8455][VL] Support encrypted parquet fallback for 3.5 by @ArnavBalyan in #8560
- [GLUTEN-6887][VL] Daily Update Velox Version (2025_01_22) by @GlutenPerfBot in #8587
- [GLUTEN-8580][CORE][Part-1] Clean up unnecessary code related to input file expression by @zml1206 in #8584
- [GLUTEN-8379][VL] Fix typo in query trace document by @jinchengchenghh in #8590
- [GLUTEN-8580][CORE][Part-2] Don't validate project generated by PushDownInputFileExpression by @zml1206 in #8585
- [GLUTEN-3620][VL] Support Range operator for Velox Backend by @ArnavBalyan in #8161
- [CORE] Bump iceberg version of spark 3.3 to 1.5.0 by @j7nhai in #8418
- [GLUTEN-7544][CORE] Add Qualification Tool by @srinivasst in #8484
- [GLUTEN-6887][VL] Daily Update Velox Version (2025_01_23) by @GlutenPerfBot in #8594
- [GLUTEN-8565][VL] Remove unused code in velox batches by @ArnavBalyan in #8602
- [VL] Remove override of test in GlutenDynamicPartitionPruningSuite by @acvictor in #8575
- [GLUTEN-6887][VL] Daily Update Velox Version (2025_01_24) by @GlutenPerfBot in #8603
- [GLUTEN-1632][CH]Daily Update Clickhouse Version (20250124) by @kyligence-git in #8604
- [GLUTEN-8018][VL] Support adjusting stage resource profile dynamically by @zjuwangg in #8209
- [GLUTEN-8410][VL] Support null type in HashAggregate by @WangGuangxin in #8411
- [GLUTEN-8581][VL] Fix Spark legacy date formatter under case insensitive configuration by @weixiuli in #8583
- [GLUTEN-8609][VL] Remove force cleanup in build_velox.sh by @marin-ma in #8610
- [GLUTEN-6887][VL] Daily Update Velox Version (2025_01_25) by @GlutenPerfBot in #8613
- [CH][DOC] Fix Maven Build Gluten ClickHouse Command by @jlfsdtc in #8622
- [GLUTEN-6887][VL] Daily Update Velox Version (2025_01_26) by @GlutenPerfBot in #8618
- [GLUTEN-8611][VL] Set VELOX_GFLAGS_TYPE by checking GLUTEN_VCPKG_ENABLED in build_velox.sh by @marin-ma in #8612
- [GLUTEN-6887][VL] Daily Update Velox Version (2025_01_27) by @GlutenPerfBot in #8625
- [GLUTEN-8627][VL] Fix cpp build and build script on MacOS by @marin-ma in #8628
- [GLUTEN-6887][VL] Daily Update Velox Version (2025_01_28) by @GlutenPerfBot in #8629
- [VL] Update document, remove the experimental word for spark.gluten.enabled by @FelixYBW in #8615
- [GLUTEN-6887][VL] Daily Update Velox Version (2025_01_29) by @marin-ma in #8636
- [GLUTEN-8644][CI] Bump version of upload-artifact/download-artifact to v4 by @marin-ma in #8645
- [GLUTEN-6887][VL] Daily Update Velox Version (2025_01_31) by @GlutenPerfBot in #8642
- [GLUTEN-1632][CH]Daily Update Clickhouse Version (20250201) by @kyligence-git in #8647
- [VL][MIRROR] Fix build failed on Macos with INSTALL_PREFIX not set by @jackylee-ch in #8654
- [GLUTEN-6887][VL] Daily Update Velox Version (2025_02_01) by @GlutenPerfBot in #8646
- [GLUTEN-1632][CH]Daily Update Clickhouse Version (20250204) by @kyligence-git in #8658
- [GLUTEN-6887][VL] Daily Update Velox Version (2025_02_04) by @GlutenPerfBot in #8657
- [GLUTEN-8623][CH] Support File meta and row index for parquet by @baibaichen in #8624
- [GLUTEN-6887][VL] Daily Update Velox Version (2025_02_05) by @GlutenPerfBot in #8661
- [GLUTEN-8631][UNIFFLE] Bump Uniffle to 0.9.2 by @SteNicholas in #8632
- [GLUTEN-8655][CH] Refactor: remove clickhouse.lib.path by @baibaichen in #8656
- [GLUTEN-1632][CH]Daily Update Clickhouse Version (20250205) by @kyligence-git in #8660
- [GLUTEN-8574][VL]CI: adding Spark-344 unit tests on JDK8 and adding Spark-352 unit tests on JDK17 by @zhouyuan in #8591
- [VL] Bump GHA upload/restore action by @zhouyuan in #8672
- [VL] nit: Remove shadowed variables in SubstraitToVeloxPlan.cc by @zhztheplayer in #8677
- Bump junit:junit from 4.12 to 4.13.1 in /tools/qualification-tool by @dependabot in #8667
- [GLUTEN-1632][CH]Daily Update Clickhouse Version (20250207) by @kyligence-git in #8681
- [VL] Enable make_date function by @zhli1142015 in #8683
- [GLUTEN-8616] [VL] Make filescan limit for encrypted fallback as configurable by @ArnavBalyan in #8621
- [GLUTEN-6887][VL] Daily Update Velox Version (2025_02_06) by @GlutenPerfBot in #8664
- [GLUTEN-6887][VL] Daily Update Velox Version (2025_02_07) by @GlutenPerfBot in #8680
- [GLUTEN-8678] Fix jar name on macos by @marin-ma in #8679
- [GLUTEN-1632][CH]Daily Update Clickhouse Version (20250208) by @kyligence-git in #8688
- [GLUTEN-6887][VL] Daily Update Velox Version (2025_02_08) by @GlutenPerfBot in #8687
- [GLUTEN-8689][VL] Enable some test cases in GlutenSQLQueryTestSuite by @marin-ma in #8690
- [GLUTEN-8685][VL] Add null check to avoid core dump when rss push partition data size is large by @zjuwangg in #8686
- [GLUTEN-8479][CORE][Part-3] Split backend configs to its corresponding modules by @yikf in #8586
- [GLUTEN-6887][VL] Daily Update Velox Version (2025_02_09) by @GlutenPerfBot in #8691
- [GLUTEN-1632][CH]Daily Update Clickhouse Version (20250210) by @kyligence-git in #8695
- [GLUTEN-8598][CH] Fix diff for cast string to long by @exmy in #8701
- [GLUTEN-6887][VL] Daily Update Velox Version (2025_02_11) by @GlutenPerfBot in #8697
- [CORE-8569][CH] Support DeltaOptimizedWriterTransformer by @loneylee in #8570
- [GLUTEN-1632][CH]Daily Update Clickhouse Version (20250211) by @kyligence-git in #8698
- [VL] Fix timezone for Parquet timestamp write by @rui-mo in #8317
- [GLUTEN-8675][CH] Rewrite union of multiple aggregates into one by @lgbo-ustc in #8676
- [VL] Skip the velox download if velox_branch not exists by @FelixYBW in #8682
- [GLUTEN-6067] Open spark 35 ut by @baibaichen in #8555
- [GLUTEN-8696][CH] Fix arm building of benchmark by @taiyang-li in #8703
- [GLUTEN-6887][VL] Daily Update Velox Version (2025_02_12) by @GlutenPerfBot in #8711
- [GLUTEN-8528][CH]Support approx_count_distinct by @taiyang-li in #8550
- [GLUTEN-1632][CH]Daily Update Clickhouse Version (20250213) by @kyligence-git in #8717
- [GLUTEN-6887][VL] Daily Update Velox Version (2025_02_13) by @GlutenPerfBot in #8719
- [GLUTEN-1632][CH]Daily Update Clickhouse Version (20250214) by @kyligence-git in #8726
- [GLUTEN-6887][VL] Daily Update Velox Version (2025_02_14) by @GlutenPerfBot in #8725
- [GLUTEN-1632][CH]Daily Update Clickhouse Version (20250215) by @kyligence-git in #8735
- [GLUTEN-6887][VL] Daily Update Velox Version (2025_02_15) by @GlutenPerfBot in #8734
- [GLUTEN-8705][CH] Enable MemorySpillScheduler by @lgbo-ustc in #8706
- [GLUTEN-8492][CH] Offload RangeExec by @taiyang-li in #8518
- [GLUTEN-8749][CH] Explicitly cast input data type for std::min by @yxheartipp in #8750
- [GLUTEN-6887][VL] Daily Update Velox Version (2025_02_18) by @GlutenPerfBot in #8755
- [GLUTEN-8704][CH] try accelerate some spark* function by optimizing tight loops by @taiyang-li in #8708
- [GLUTEN-8723][CH] Fix slice unexpected exception by @taiyang-li in #8759
- [GLUTEN-8151][CORE] Remove supportRangeExec api by @taiyang-li in #8760
- [VL] Add config for whether to enable spill on Window by @liujiayi771 in #8766
- [GLUTEN-8769][CH] Fix failed uts introduced by approx_count_distinct by @taiyang-li in #8765
- [VL] nit: gluten-it: Fix non-POSIX shell warnings in centos-7-deps.sh by @zhztheplayer in #8753
- [GLUTEN-6887][VL] Daily Update Velox Version (2025_02_19) by @GlutenPerfBot in #8768
- [GLUTEN-8748][CH] Support function monotonically_increasing_id by @loneylee in #8771
- [VL] Should convert kSpillReadBufferSize and kShuffleSpillDiskWriteBufferSize to number by @boneanxs in #8684
- [VL] Skip PartialProjectRule if spark.gluten.sql.columnar.partial.project is false by @Yohahaha in #8773
- [Gluten-8715][CH] Fix NaN diff by @zhanglistar in #8718
- [GLUTEN-8779][VL][Minor] Code cleanup for native validation by @marin-ma in #8780
- [VL] Add window spill metrics by @liujiayi771 in #8777
- [GLUTEN-8699][CH]Metric for Shuffle Read Deserializer by @loudongfeng in #8700
- [GLUTEN-8761][VL] Fix mode in UnsafeColumnarBuildSideRelation not get properly serialize by @zjuwangg in #8762
- [GLUTEN-8434][CH] Function bloomFilterContains process improvement by @zhanglistar in #8435
- [VL][DOC] Move irrelevant content from VeloxGlutenUI.md by @PHILO-HE in #8786
- [GLUTEN-8668][VL] Support complex type in ColumnarPartialProject by @WangGuangxin in #8669
- [GLUTEN-1632][CH]Daily Update Clickhouse Version (20250220) by @kyligence-git in #8783
- [WIP][VL] Fix inconsistency issue of PartitionFile path unescaping & GPL issue by @yaooqinn in #8793
- [GLUTEN-8788][CH] Avoid unnecessary const column materialization by @taiyang-li in #8789
- [CH][CI] Parallel download clickhouse submodule by @lwz9103 in #8790
- [GLUTEN-8709][VL] Support build on openEuler 24.03 LTS with Velox backend by @kevinw66 in #8710
- [GLUTEN-1632][CH]Daily Update Clickhouse Version (20250221) by @kyligence-git in #8801
- [GLUTEN-8795][CH] Support to use oss with gluten by @yxheartipp in #8796
- [GLUTEN-8738][VL] Update GlutenSQLQueryTestSuite to exclude or overwrite failed queries for Spark3.5 by @marin-ma in #8739
- [GLUTEN-6887][VL] Daily Update Velox Version (2025_02_22) by @GlutenPerfBot in #8810
- [CH] add test for new native parquet reader by @liuneng1994 in #8797
- [CH][DOC]Fix CMake Debug Configuration by @jlfsdtc in #8815
- [VL] Add support for some Parquet write options to 3.4 / 3.5 to align with 3.2 / 3.3 by @zhztheplayer in #8816
- [CORE] Add iceberg equality delete file proto definition by @liujiayi771 in #8778
- [VL] Minor cleanups by @zhztheplayer in #8824
- [GLUTEN-8794] Support logging as many fallback reasons as possible by @marin-ma in #8798
- [GLUTEN-8721][VL] Native writer should keep the same compression with vanilla if hive.exec.compress.output is true by @yikf in #8722
- [GLUTEN-8784][CH] Coalesce union of multiple scan-projects by @lgbo-ustc in #8785
- [VL] Remove resolving ViewFs file path from scan validation by @PHILO-HE in #8829
- [GLUTEN-1632][CH]Daily Update Clickhouse Version (20250226) by @kyligence-git in #8831
- [DOC] Document how to debug Java/Scala and how to run a Java/Scala unit test by @PHILO-HE in #8841
- [GLUTEN-8846][CH] [Part 0] Support reading Iceberg equality delete files by @baibaichen in #8847
- [GLUTEN-8811][VL]Fix bucket scan when some partitionValue is empty by @jinchengchenghh in #8834
- [CH][Arm] Disable -Wcast-qual warning to avoid errors with const qualifier loss in arm by @yxheartipp in #8850
- Revert "[CORE] Change DISCLAIMER to DISCLAIMER-WIP" by @yaooqinn in #8845
- [GLUTEN-6887][VL] Daily Update Velox Version (2025_02_25) by @GlutenPerfBot in #8826
- [INFRA] Add missing license header for better ASF Policy compliance by @yaooqinn in #8860
- [GLUTEN-1632][CH]Daily Update Clickhouse Version (20250301) by @kyligence-git in #8863
- [GLUTEN-8738] Update GlutenSQLQueryTestSuite to match with the original file by @marin-ma in #8837
- [VL] Enable spark function map_filter by @j7nhai in #8842
- Revert "[VL] Enable spark function map_filter" by @baibaichen in #8869
- [CH] optimize performance of sparkArraySort by @taiyang-li in #8844
- [VL] Fix broken links for velox-backend-build-in-docker.md by @yaooqinn in #8875
- [GLUTEN-8821][VL][DOC] Update scalar functions support and add automation script by @marin-ma in #8822
- [GLUTEN-8565][VL] Support CollectLimit Operator by @ArnavBalyan in #8566
- [DOC] Fix Gluten UI page title not correct by @zjuwangg in #8879
- [GLUTEN-8872][CH][Part-1] Support Delta Deletion Vectors read for CH backend by @zzcclp in #8873
- [GLUTEN-6887][VL] Daily Update Velox Version (2025_03_02) by @GlutenPerfBot in #8867
- [CORE] Make scalatest.testFailureIgnore configurable for convenience by @ccat3z in #8878
- [GLUTEN-8859][CH] Take advantage of compareSubstrings to compare substrings by @lgbo-ustc in #8874
- [DOC] Add doc about experimental feature using off-heap to store broadcast build relation by @zjuwangg in #8882
- [GLUTEN-1632][CH]Daily Update Clickhouse Version (20250304) by @kyligence-git in #8887
- [VL] Enable spark function map_filter by @j7nhai in #8883
- [GLUTEN-8894][VL] Fix buffer overflow in jStringToCString on arm by @kevinw66 in #8895
- [GLUTEN-6887][VL] Daily Update Velox Version (2025_03_04) by @GlutenPerfBot in #8886
- [GLUTEN-5884][VL] change default load quantum to 8M for local SSD cache by @zhouyuan in #8880
- [INFRA] Switch archive.a.o to closer.lua to avoid abuse while fetch spark resources by @yaooqinn in #8881
- [GLUTEN-8836][CH] Fix partition values with escape char by @lwz9103 in #8840
- Revert "[GLUTEN-5884][VL] change default load quantum to 8M for local… by @FelixYBW in #8900
- [GLUTEN-8742][VL] Improve the cast validation logic on native side by @ArnavBalyan in #8743
- [HOTFIX] Fix binary release name for Spark 3.5.2 with Scala 2.13 by @yaooqinn in #8901
- [GLUTEN-1632][CH]Daily Update Clickhouse Version (20250305) by @kyligence-git in #8899
- [GLUTEN-8909][VL] Allow dynamic configuration for spark.gluten.auto.adjustStageResource.enabled by @zjuwangg in #8910
- [GLUTEN-8905][VL]Ignore some CSV flaky tests by @jinchengchenghh in #8906
- [GLUTEN-8340][VL] Enable from_json function by @zhli1142015 in #8320
- [CORE] Enlarge defaultRecursionLimit by pre-loading the protobuf class by @PHILO-HE in #8904
- [CORE] Post messages to Gluten web UI only when it is enabled by @PHILO-HE in #8907
- [GLUTEN-1632][CH]Daily Update Clickhouse Version (20250306) by @kyligence-git in #8917
- [VL] Adding nightly release by @zhouyuan in #8915
- [GLUTEN-8903][Function] Support btrim function by @xinghuayu007 in #8903
- [GLUTEN-6887][VL] Daily Update Velox Version (2025_03_06) by @GlutenPerfBot in #8916
- [VL] Use isType func to check type by @liujiayi771 in #8902
- [GLUTEN-8799][VL]Support Iceberg with Gluten test framework by @jinchengchenghh in #8800
- Add scala suffix to tar command by @yaooqinn in #8918
- [GLUTEN-1632][CH]Daily Update Clickhouse Version (20250307) by @kyligence-git in #8925
- [GLUTEN-8926][CH] MergeTree Parameter Configuration Optimization to Prevent Multithreading Competition for activeSession Being None by @gleonSun in #8927
- [GLUTEN-8802][VL] Support build static/dynamic docker images for arm by @kevinw66 in #8803
- [GLUTEN-8921][GLUTEN-8922][CH] Fix checkDecimalOverflowSparkOrNull and lead function by @lwz9103 in #8929
- [DOC] Document Stage Level Resource Profile Adjustment feature by @zjuwangg in #8908
- [GLUTEN-1632][CH]Daily Update Clickhouse Version (20250308) by @kyligence-git in #8936
- [GLUTEN-6887][VL] Daily Update Velox Version (2025_03_07) by @GlutenPerfBot in #8924
- [CI] fix spark352-scala2.13 test home by @zhouyuan in #8938
- [GLUTEN-8313][VL] Enable json_array_length by @WangGuangxin in #8314
- [GLUTEN-8802][VL] Add specific jdk version for CentOS 8 docker image by @kevinw66 in #8814
- [GLUTEN-8633] [VL] Rewrite tests for Gluten ColumnarRange by @ArnavBalyan in #8634
- [GLUTEN-6887][VL] Daily Update Velox Version (2025_03_10) by @GlutenPerfBot in #8942
- [GLUTEN-8939] Fix IllegalAccessError when converting viewfs to hdfs by @wangyum in #8940
- [GLUTEN-6887][VL] Daily Update Velox Version (2025_03_11) by @GlutenPerfBot in #8955
- [INFRA] Derive from Apache Software Foundation Parent POM by @yaooqinn in #8930
- [GLUTEN-8846][CH] [Part 1] Support Positional Deletes by @baibaichen in #8937
- [GLUTEN-8872][CH][Part-2] Support Delta Deletion Vectors read for CH backend by @zzcclp in #8947
- [VL] Enable map_concat function by @rui-mo in #8781
- [GLUTEN-6887][VL] Daily Update Velox Version (2025_03_12) by @GlutenPerfBot in #8972
- [CH] Simplify parsing substrait struct fields. by @lgbo-ustc in #8976
- [GLUTEN-1632][CH]Daily Update Clickhouse Version (20250313) by @kyligence-git in #8979
- [VL] Support casting varchar type to timestamp type by @PHILO-HE in #8357
- [GLUTEN-6887][VL] Daily Update Velox Version (2025_03_13) by @GlutenPerfBot in #8978
- [GLUTEN-8932][VL] Support all the Iceberg test in folder source by @jinchengchenghh in #8952
- [GLUTEN-8958][VL] Add offload rules for DeltaProjectExecTransformer and DeltaFilterExecTransformer by @dcoliversun in #8975
- [VL] Update centos setup scripts to install new tzdata by @zhouyuan in #8988
- [GLUTEN-8993][CELEBORN] Bump Celeborn version to 0.5.4 by @jackylee-ch in #8994
- [VL] Support casting integral type to timestamp type by @PHILO-HE in #8593
- [GLUTEN-6887][VL] Daily Update Velox Version (2025_03_14) by @GlutenPerfBot in #8989
- [GLUTEN-3289][CH]Fix cast float to string by @KevinyhZou in #8092
- [GLUTEN-8846][CH][Part 2] Add the test case for the iceberg MOR table with the equality deletion and the position deletion by @zzcclp in #8992
- [CORE] Avoid ClassNotFoundException when loading a shaded protobuf class by @PHILO-HE in #8996
- [GLUTEN-1632][CH]Daily Update Clickhouse Version (20250315) by @kyligence-git in #9007
- [VL] Remove unnecessary build options by @PHILO-HE in #9021
- [GLUTEN-8997][CH] Support regular expression delimiters for str_to_map by @lgbo-ustc in #8998
- [GLUTEN-9019][VL] Add a check to fall back "cast decimal to timestamp" by @wForget in #9022
- [GLUTEN-3620][VL] RangeExec support for fallback by user options by @ArnavBalyan in #8913
- [GLUTEN-9015][VL] Support array_append function by @dcoliversun in #9016
- [GLUTEN-8949][Core] Simplify synchronization from JniLibLoader by @ArnavBalyan in #8950
- [GLUTEN-9027][VL] Make CI fail fast if native build job fails by @wForget in #9028
- [GLUTEN-8995][CH] Fix column not found in row_number query by @KevinyhZou in #8999
- [GLUTEN-9038][CH] Fix array_sort exception by @taiyang-li in #9040
- [GLUTEN-1632][CH]Daily Update Clickhouse Version (20250318) by @kyligence-git in #9043
- [VL][CI] Use a unified docker image for Spark tests by @PHILO-HE in #8605
- [GLUTEN-8639][VL] Support casting from double/float to timestamp by @ArnavBalyan in #8640
- [VL] Move hudi test package to org.apache.gluten.execution by @liujiayi771 in #9045
- [GLUTEN-9032][CH] cast for values built from nothing type by @lgbo-ustc in #9042
- [GLUTEN-8964][VL] Support BNLJ full outer join without condition by @WangGuangxin in #8965
- [GLUTEN-9050][CH] Remove duplicated uri decode which causes runtime exceptions by @taiyang-li in #9051
- [GLUTEN-8872][CH][Part-3] Fix reading bug for the update operation with the deletion vectors by @zzcclp in #9029
- [GLUTEN-6887][VL] Daily Update Velox Version (2025_03_17) by @GlutenPerfBot in #9017
- [GLUTEN-8945][VL] Pull out duplicate projections for HashProbe and FilterProject by @zml1206 in #8946
- [GLUTEN-8437][VL] Fix the exception when verifying the PrestoPage header during the Presto deserialization process by @kerwin-zk in #9056
- [GLUTEN-6887][VL] Daily Update Velox Version (2025_03_18) by @GlutenPerfBot in #9033
- [GLUTEN-1632][CH]Daily Update Clickhouse Version (20250319) by @kyligence-git in #9053
- [TEST] Remove useless param for runAndCompare by @zml1206 in #9048
- [GLUTEN-8306][VL] Support GetStructField with scalar function as input by @WangGuangxin in #8606
- [GLUTEN-6887][VL] Daily Update Velox Version (2025_03_19) by @GlutenPerfBot in #9063
- [GLUTEN-1632][CH]Daily Update Clickhouse Version (20250320) by @kyligence-git in #9067
- [VL] Add API for reserving global off-heap memory from Spark by @zhztheplayer in #9066
- [GLUTEN-9010][VL] Fix GlutenCastSuite for Spark 34 and 35 by @ArnavBalyan in #9011
- [GLUTEN-8948][VL] Fallback iceberg delete from scan by @jinchengchenghh in #8987
- [GLUTEN-6887][VL] Daily Update Velox Version (2025_03_20) by @GlutenPerfBot in #9068
- [GLUTEN-8956][VL] Add support for casting binary to string by @ArnavBalyan in #8957
- [GLUTEN-9044][CH] Fix virtual columns in mergetree table by @lwz9103 in #9047
- [GLUTEN-6887][VL] Daily Update Velox Version (2025_03_21) by @GlutenPerfBot in #9082
- Revert "[GLUTEN-9032][CH] cast for values built from nothing type" by @lgbo-ustc in #9086
- [VL] Minor refactor for cast expression validation by @PHILO-HE in #9084
- [VL] Acquire off-heap global memory via the new API for off-heap broadcast exchange by @zhztheplayer in #9075
- [GLUTEN-6887][VL] Daily Update Velox Version (2025_03_22) by @GlutenPerfBot in #9100
- [VL][MINOR] Move HLL rewrite check to the beginning of the rule by @Yohahaha in #9071
- [GLUTEN-9078][CORE] Simplify code of SoftAffinity by @WangGuangxin in #9079
- [GLUTEN-8966][VL] Propagate HashAggregate's ignoreNullKeys when possible by @WangGuangxin in #8967
- [CH][DOC] Update Clang to Clang-19 in CH backend by @jlfsdtc in #9110
- [GLUTEN-6887][VL] Daily Update Velox Version (2025_03_24) by @GlutenPerfBot in #9105
- [GLUTEN-8565][VL] Minor refactor for Columnar CollectLimit by @ArnavBalyan in #9097
- [GLUTEN-9076][VL] Prioritize offloading supported hive udf in ColumnarPartialProject by @WangGuangxin in #9077
- [TESTS] Disable Spark UI in some tests by @yaooqinn in #9109
- [GLUTEN-9049][CH] Fix diff for cast complex type to string by @exmy in #9072
- [GLUTEN-9093][CH] Add TryCastSuite for CH backend Spark 3.4 by @ArnavBalyan in #9094
- [GLUTEN-9123][VL][CI] pin setuptools version in CI by @zhouyuan in #9124
- [GLUTEN-9117][VL] Fix -DBUILD_BENCHMARKS=ON on macos by @marin-ma in #9118
- [GLUTEN-8974][CH] Replace special join + aggregate case with any join by @lgbo-ustc in #9059
- [GLUTEN-9120][VL][Minor] Remove s3 check for macOS in build script by @marin-ma in #9122
- [GLUTEN-9076][VL][FOLLOWUP] Simplify code of HiveUDF by @WangGuangxin in #9127
- [GLUTEN-9085][CH] Add UT for mergetree write stats by @lwz9103 in #9089
- [VL] Optimize memory allocation for VeloxRssShuffleReader by @kerwin-zk in #9069
- [GLUTEN-6887][VL] Daily Update Velox Version (2025_03_25) by @GlutenPerfBot in #9130
- [CH] Fix kafka unstable ut by @loneylee in #9131
- [VL][CI] Dump & upload logs for unit test to GitHub artifact by @yaooqinn in #9024
- [VL][MIRROR] Migrate Velox runtime config flags to dynamic VeloxRuntime settings by @jackylee-ch in #9103
- [VL] Enable filter push-down on nested field by @rui-mo in #7946
- [VL] Account some C++ untracked memory allocations into Spark global off-heap memory by @zhztheplayer in #9115
- [GLUTEN-9113][VL] Remove unused not_equal function mapping by @kevincmchen in #9114
- [GLUTEN-9083][CH]Fix the nullability mismatch of nothing type by @lgbo-ustc in #9091
- [VL] Rename some src-*/ source folders to src/ by @zhztheplayer in #9134
- [VL] Improve native plan validation code by @PHILO-HE in #9092
- [GLUTEN-1632][CH]Daily Update Clickhouse Version (20250326) by @kyligence-git in #9136
- [GLUTEN-9057][VL] Avoid flatten unnecessary vector in ColumnarBatch.select by @WangGuangxin in #9058
- [VL] Remove inlined children plan access during native validation of Exchange / CollectLimit by @zhztheplayer in #9145
- [GLUTEN-9039][CH] Improve array_sort performance when only single argument is input by @taiyang-li in #9157
- [VL] Fix reclaim size for shuffle by @yikf in #9143
- [VL][CI] Add test for rss sort shuffle by @kerwin-zk in #9140
- [GLUTEN-6887][VL] Daily Update Velox Version (2025_03_26) by @GlutenPerfBot in #9135
- [VL] Use SPARK_COMPILE_VERSION instead of hardcoded for SparkShimDescriptor at compile time by @yaooqinn in #9132
- [GLUTEN-9164][CH]Enable row group level bloom filter push down by @taiyang-li in #9165
- [GLUTEN-9148] Fix shuffle file permission issue when using ColumnarShuffleManager by @wangyum in #9156
- [VL] LocalPartitionWriter should discard the evict since it use hash/sort evict by @yikf in #9167
- [GLUTEN-6887][VL] Daily Update Velox Version (2025_03_27) by @GlutenPerfBot in #9147
- [GLUTEN-6887][VL] Daily Update Velox Version (2025_03_28) by @GlutenPerfBot in #9161
- [GLUTEN-6887][VL] Daily Update Velox Version (2025_03_29) by @GlutenPerfBot in #9172
- [VL] Fix weekly build job by @PHILO-HE in #9168
- [GLUTEN-9055][CH] Fix input_file_name diff from hive text table by @taiyang-li in #9142
- [1.4][VL] update oap velox to gluten-1.4.0 by @weiting-chen in #9217
- [1.4] preparing v1.4.0-rc0 release by @weiting-chen in #9260
- [branch-1.4] Port PR #9200 #9320 #9368 #9209 #9262 by @weiting-chen in #9431
- [branch-1.4][VL] Fix docker image name for 1.4 branch by @zhouyuan in #9479
- [Branch-1.4][VL] Update Velox branch to make Velox compatible with old tzdata by @PHILO-HE in #9565
- [Branch-1.4][VL] update docker image name for branch-1.4 by @zhouyuan in #9599
- [branch-1.4][VL] Fix docker build by @PHILO-HE in #9607
- [Branch-1.4][VL] Fix code ref in docker build by @PHILO-HE in #9614
- [Branch-1.4][VL] Port fix: Fix build failure due to libelf vcpkg unavailable files (#9550) by @PHILO-HE in #9601
- [Branch-1.4][GLUTEN-9383][VL] Backport: fix leak when growing capacity by @wForget in #9663
- [Branch-1.4][VL] Port patch #9685: add default config.guess and config.sub by @PHILO-HE in #9707
- [Branch-1.4] Port #9851 #9879 to fix release issue by @weiting-chen in #9895
New Contributors
- @gleonSun made their first contribution in #8464
- @lifulong made their first contribution in #8415
- @jkhaliqi made their first contribution in #8477
Full Changelog: v1.3.0...v1.4.0