feat: add distributed zonemap index build#473
Open
beinan wants to merge 4 commits intolance-format:mainfrom
Open
feat: add distributed zonemap index build#473beinan wants to merge 4 commits intolance-format:mainfrom
beinan wants to merge 4 commits intolance-format:mainfrom
Conversation
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Summary
This PR adds a distributed zonemap index build path for in .
It keeps the SQL surface unchanged, but changes the execution strategy from the driver-only/single-process flow to Spark-side fragment processing and logical segment commit.
This is intended as the distributed alternative to #466, which stays open as the simpler single-process solution.
What Changed
Validation
Tested with Spark 4.0 / Scala 2.13:
[INFO] Scanning for projects...
[WARNING]
[WARNING] Some problems were encountered while building the effective model for org.lance:lance-spark-bundle-3.5_2.12:jar:0.4.0-beta.3
[WARNING] 'build.plugins.plugin.version' for org.apache.maven.plugins:maven-shade-plugin is missing. @ line 73, column 21
[WARNING]
[WARNING] Some problems were encountered while building the effective model for org.lance:lance-spark-bundle-3.4_2.12:jar:0.4.0-beta.3
[WARNING] 'build.plugins.plugin.version' for org.apache.maven.plugins:maven-shade-plugin is missing. @ line 76, column 21
[WARNING]
[WARNING] Some problems were encountered while building the effective model for org.lance:lance-spark-bundle-3.5_2.13:jar:0.4.0-beta.3
[WARNING] 'build.plugins.plugin.version' for org.apache.maven.plugins:maven-shade-plugin is missing. @ line 77, column 21
[WARNING]
[WARNING] Some problems were encountered while building the effective model for org.lance:lance-spark-bundle-3.4_2.13:jar:0.4.0-beta.3
[WARNING] 'build.plugins.plugin.version' for org.apache.maven.plugins:maven-shade-plugin is missing. @ line 78, column 21
[WARNING]
[WARNING] Some problems were encountered while building the effective model for org.lance:lance-spark-bundle-4.0_2.13:jar:0.4.0-beta.3
[WARNING] 'build.plugins.plugin.version' for org.apache.maven.plugins:maven-shade-plugin is missing. @ line 79, column 21
[WARNING]
[WARNING] Some problems were encountered while building the effective model for org.lance:lance-spark-bundle-4.1_2.13:jar:0.4.0-beta.3
[WARNING] 'build.plugins.plugin.version' for org.apache.maven.plugins:maven-shade-plugin is missing. @ line 79, column 21
[WARNING]
[WARNING] It is highly recommended to fix these problems because they threaten the stability of your build.
[WARNING]
[WARNING] For this reason, future Maven versions might no longer support building such malformed projects.
[WARNING]
[INFO] ------------------------------------------------------------------------
[INFO] Reactor Build Order:
[INFO]
[INFO] lance-spark-root [pom]
[INFO] lance-spark-base_2.13 [jar]
[INFO] lance-spark-4.0_2.13 [jar]
[INFO]
[INFO] ---------------------< org.lance:lance-spark-root >---------------------
[INFO] Building lance-spark-root 0.4.0-beta.3 [1/3]
[INFO] from pom.xml
[INFO] --------------------------------[ pom ]---------------------------------
[INFO]
[INFO] --- checkstyle:3.3.1:check (validate) @ lance-spark-root ---
[WARNING] File encoding has not been set, using platform encoding UTF-8, i.e. build is platform dependent!
[INFO] Starting audit...
Audit done.
[INFO] You have 0 Checkstyle violations.
[INFO]
[INFO] --- spotless:2.43.0:apply (spotless-check) @ lance-spark-root ---
[INFO]
[INFO] ------------------< org.lance:lance-spark-base_2.13 >-------------------
[INFO] Building lance-spark-base_2.13 0.4.0-beta.3 [2/3]
[INFO] from lance-spark-base_2.13/pom.xml
[INFO] --------------------------------[ jar ]---------------------------------
[WARNING] 2 problems were encountered while building the effective model for org.apache.yetus:audience-annotations:jar:0.5.0 during dependency collection step for project (use -X to see details)
[INFO]
[INFO] --- checkstyle:3.3.1:check (validate) @ lance-spark-base_2.13 ---
[WARNING] File encoding has not been set, using platform encoding UTF-8, i.e. build is platform dependent!
[INFO] Starting audit...
Audit done.
[INFO] You have 0 Checkstyle violations.
[INFO]
[INFO] --- spotless:2.43.0:apply (spotless-check) @ lance-spark-base_2.13 ---
[INFO]
[INFO] --- build-helper:3.2.0:add-source (add-java-source) @ lance-spark-base_2.13 ---
[INFO] Source directory: /home/beinanwang/o/lance-spark/lance-spark-base_2.12/src/main/java added.
[INFO]
[INFO] --- resources:3.0.2:resources (default-resources) @ lance-spark-base_2.13 ---
[WARNING] Using platform encoding (UTF-8 actually) to copy filtered resources, i.e. build is platform dependent!
[INFO] skip non existing resourceDirectory /home/beinanwang/o/lance-spark/lance-spark-base_2.13/src/main/resources
[INFO]
[INFO] --- scala:4.9.10:add-source (scala-compile-first) @ lance-spark-base_2.13 ---
[INFO]
[INFO] --- scala:4.9.10:compile (scala-compile-first) @ lance-spark-base_2.13 ---
[INFO] Compiler bridge file: /home/beinanwang/.sbt/1.0/zinc/org.scala-sbt/org.scala-sbt-compiler-bridge_2.13-1.12.0-bin_2.13.16__65.0-1.12.0_20260104T231500.jar
[INFO] compile in 2.3 s
[INFO]
[INFO] --- compiler:3.8.1:compile (default-compile) @ lance-spark-base_2.13 ---
[INFO] Nothing to compile - all classes are up to date
[INFO]
[INFO] --- compiler:3.8.1:compile (default) @ lance-spark-base_2.13 ---
[INFO] Nothing to compile - all classes are up to date
[INFO]
[INFO] --- resources:3.0.2:testResources (default-testResources) @ lance-spark-base_2.13 ---
[WARNING] Using platform encoding (UTF-8 actually) to copy filtered resources, i.e. build is platform dependent!
[INFO] Copying 70 resources
[INFO]
[INFO] --- scala:4.9.10:testCompile (scala-test-compile) @ lance-spark-base_2.13 ---
[INFO] Compiler bridge file: /home/beinanwang/.sbt/1.0/zinc/org.scala-sbt/org.scala-sbt-compiler-bridge_2.13-1.12.0-bin_2.13.16__65.0-1.12.0_20260104T231500.jar
[INFO] compile in 0.1 s
[INFO]
[INFO] --- compiler:3.8.1:testCompile (default-testCompile) @ lance-spark-base_2.13 ---
[INFO] Nothing to compile - all classes are up to date
[INFO]
[INFO] --- surefire:3.2.5:test (default-test) @ lance-spark-base_2.13 ---
[INFO]
[INFO] -------------------< org.lance:lance-spark-4.0_2.13 >-------------------
[INFO] Building lance-spark-4.0_2.13 0.4.0-beta.3 [3/3]
[INFO] from lance-spark-4.0_2.13/pom.xml
[INFO] --------------------------------[ jar ]---------------------------------
[INFO]
[INFO] --- checkstyle:3.3.1:check (validate) @ lance-spark-4.0_2.13 ---
[WARNING] File encoding has not been set, using platform encoding UTF-8, i.e. build is platform dependent!
[INFO] Starting audit...
Audit done.
[INFO] You have 0 Checkstyle violations.
[INFO]
[INFO] --- spotless:2.43.0:apply (spotless-check) @ lance-spark-4.0_2.13 ---
[INFO] Spotless.Java is keeping 1 files clean - 0 were changed to be clean, 0 were already clean, 1 were skipped because caching determined they were already clean
[INFO] Spotless.Scala is keeping 3 files clean - 0 were changed to be clean, 0 were already clean, 3 were skipped because caching determined they were already clean
[INFO]
[INFO] --- antlr4:4.13.1:antlr4 (default) @ lance-spark-4.0_2.13 ---
[INFO] No grammars to process
[INFO] ANTLR 4: Processing source directory /home/beinanwang/o/lance-spark/lance-spark-base_2.12/src/main/antlr4
[INFO]
[INFO] --- build-helper:3.2.0:add-source (add-java-source) @ lance-spark-4.0_2.13 ---
[INFO] Source directory: /home/beinanwang/o/lance-spark/lance-spark-base_2.12/src/main/java added.
[INFO] Source directory: /home/beinanwang/o/lance-spark/lance-spark-3.5_2.12/src/main/java added.
[INFO] Source directory: /home/beinanwang/o/lance-spark/lance-spark-4.0_2.13/src/main/java added.
[INFO] Source directory: /home/beinanwang/o/lance-spark/lance-spark-4.0_2.13/src/main/scala added.
[INFO] Source directory: /home/beinanwang/o/lance-spark/lance-spark-4.0_2.13/target/generated-sources/antlr4 added.
[INFO]
[INFO] --- resources:3.0.2:resources (default-resources) @ lance-spark-4.0_2.13 ---
[WARNING] Using platform encoding (UTF-8 actually) to copy filtered resources, i.e. build is platform dependent!
[INFO] Copying 1 resource
[INFO]
[INFO] --- scala:4.9.10:compile (scala-compile-first) @ lance-spark-4.0_2.13 ---
[INFO] Compiler bridge file: /home/beinanwang/.sbt/1.0/zinc/org.scala-sbt/org.scala-sbt-compiler-bridge_2.13-1.12.0-bin_2.13.16__65.0-1.12.0_20260104T231500.jar
[INFO] compile in 0.4 s
[INFO]
[INFO] --- compiler:3.8.1:compile (default-compile) @ lance-spark-4.0_2.13 ---
[INFO] Nothing to compile - all classes are up to date
[INFO]
[INFO] --- build-helper:3.2.0:add-test-source (add-java-test-source) @ lance-spark-4.0_2.13 ---
[INFO] Test Source directory: /home/beinanwang/o/lance-spark/lance-spark-base_2.12/src/test/java added.
[INFO] Test Source directory: /home/beinanwang/o/lance-spark/lance-spark-3.5_2.12/src/test/java added.
[INFO] Test Source directory: /home/beinanwang/o/lance-spark/lance-spark-4.0_2.13/src/test/java added.
[INFO]
[INFO] --- resources:3.0.2:testResources (default-testResources) @ lance-spark-4.0_2.13 ---
[WARNING] Using platform encoding (UTF-8 actually) to copy filtered resources, i.e. build is platform dependent!
[INFO] Copying 70 resources
[INFO]
[INFO] --- scala:4.9.10:testCompile (scala-test-compile) @ lance-spark-4.0_2.13 ---
[INFO] Compiler bridge file: /home/beinanwang/.sbt/1.0/zinc/org.scala-sbt/org.scala-sbt-compiler-bridge_2.13-1.12.0-bin_2.13.16__65.0-1.12.0_20260104T231500.jar
[INFO] compile in 0.1 s
[INFO]
[INFO] --- compiler:3.8.1:testCompile (default-testCompile) @ lance-spark-4.0_2.13 ---
[INFO] Nothing to compile - all classes are up to date
[INFO]
[INFO] --- surefire:3.2.5:test (default-test) @ lance-spark-4.0_2.13 ---
[INFO] Using auto detected provider org.apache.maven.surefire.junitplatform.JUnitPlatformProvider
[INFO]
[INFO] -------------------------------------------------------
[INFO] T E S T S
[INFO] -------------------------------------------------------
[INFO] Running org.lance.spark.update.CreateIndexStandardSyntaxTest
[INFO] Tests run: 2, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 9.235 s -- in org.lance.spark.update.CreateIndexStandardSyntaxTest
[INFO]
[INFO] Results:
[INFO]
[INFO] Tests run: 2, Failures: 0, Errors: 0, Skipped: 0
[INFO]
[INFO] ------------------------------------------------------------------------
[INFO] Reactor Summary for lance-spark-root 0.4.0-beta.3:
[INFO]
[INFO] lance-spark-root ................................... SUCCESS [ 1.645 s]
[INFO] lance-spark-base_2.13 .............................. SUCCESS [ 4.527 s]
[INFO] lance-spark-4.0_2.13 ............................... SUCCESS [ 12.492 s]
[INFO] ------------------------------------------------------------------------
[INFO] BUILD SUCCESS
[INFO] ------------------------------------------------------------------------
[INFO] Total time: 18.830 s
[INFO] Finished at: 2026-04-23T01:00:54Z
[INFO] ------------------------------------------------------------------------
Notes