Skip to content

feat: add distributed zonemap index build#473

Open
beinan wants to merge 4 commits intolance-format:mainfrom
beinan:user/beinan/zonemap-create-index-distributed
Open

feat: add distributed zonemap index build#473
beinan wants to merge 4 commits intolance-format:mainfrom
beinan:user/beinan/zonemap-create-index-distributed

Conversation

@beinan
Copy link
Copy Markdown
Contributor

@beinan beinan commented Apr 23, 2026

Summary

This PR adds a distributed zonemap index build path for in .

It keeps the SQL surface unchanged, but changes the execution strategy from the driver-only/single-process flow to Spark-side fragment processing and logical segment commit.

This is intended as the distributed alternative to #466, which stays open as the simpler single-process solution.

What Changed

  • build zonemap index segments on Spark tasks and commit them through
  • support Spark-side zonemap pruning for segmented zonemap indices
  • batch multiple fragments into a single zonemap segment so task and segment counts stay bounded
  • keep zonemap restricted to single-column indexing
  • update docs and tests for distributed zonemap behavior

Validation

Tested with Spark 4.0 / Scala 2.13:

[INFO] Scanning for projects...
[WARNING]
[WARNING] Some problems were encountered while building the effective model for org.lance:lance-spark-bundle-3.5_2.12:jar:0.4.0-beta.3
[WARNING] 'build.plugins.plugin.version' for org.apache.maven.plugins:maven-shade-plugin is missing. @ line 73, column 21
[WARNING]
[WARNING] Some problems were encountered while building the effective model for org.lance:lance-spark-bundle-3.4_2.12:jar:0.4.0-beta.3
[WARNING] 'build.plugins.plugin.version' for org.apache.maven.plugins:maven-shade-plugin is missing. @ line 76, column 21
[WARNING]
[WARNING] Some problems were encountered while building the effective model for org.lance:lance-spark-bundle-3.5_2.13:jar:0.4.0-beta.3
[WARNING] 'build.plugins.plugin.version' for org.apache.maven.plugins:maven-shade-plugin is missing. @ line 77, column 21
[WARNING]
[WARNING] Some problems were encountered while building the effective model for org.lance:lance-spark-bundle-3.4_2.13:jar:0.4.0-beta.3
[WARNING] 'build.plugins.plugin.version' for org.apache.maven.plugins:maven-shade-plugin is missing. @ line 78, column 21
[WARNING]
[WARNING] Some problems were encountered while building the effective model for org.lance:lance-spark-bundle-4.0_2.13:jar:0.4.0-beta.3
[WARNING] 'build.plugins.plugin.version' for org.apache.maven.plugins:maven-shade-plugin is missing. @ line 79, column 21
[WARNING]
[WARNING] Some problems were encountered while building the effective model for org.lance:lance-spark-bundle-4.1_2.13:jar:0.4.0-beta.3
[WARNING] 'build.plugins.plugin.version' for org.apache.maven.plugins:maven-shade-plugin is missing. @ line 79, column 21
[WARNING]
[WARNING] It is highly recommended to fix these problems because they threaten the stability of your build.
[WARNING]
[WARNING] For this reason, future Maven versions might no longer support building such malformed projects.
[WARNING]
[INFO] ------------------------------------------------------------------------
[INFO] Reactor Build Order:
[INFO]
[INFO] lance-spark-root [pom]
[INFO] lance-spark-base_2.13 [jar]
[INFO] lance-spark-4.0_2.13 [jar]
[INFO]
[INFO] ---------------------< org.lance:lance-spark-root >---------------------
[INFO] Building lance-spark-root 0.4.0-beta.3 [1/3]
[INFO] from pom.xml
[INFO] --------------------------------[ pom ]---------------------------------
[INFO]
[INFO] --- checkstyle:3.3.1:check (validate) @ lance-spark-root ---
[WARNING] File encoding has not been set, using platform encoding UTF-8, i.e. build is platform dependent!
[INFO] Starting audit...
Audit done.
[INFO] You have 0 Checkstyle violations.
[INFO]
[INFO] --- spotless:2.43.0:apply (spotless-check) @ lance-spark-root ---
[INFO]
[INFO] ------------------< org.lance:lance-spark-base_2.13 >-------------------
[INFO] Building lance-spark-base_2.13 0.4.0-beta.3 [2/3]
[INFO] from lance-spark-base_2.13/pom.xml
[INFO] --------------------------------[ jar ]---------------------------------
[WARNING] 2 problems were encountered while building the effective model for org.apache.yetus:audience-annotations:jar:0.5.0 during dependency collection step for project (use -X to see details)
[INFO]
[INFO] --- checkstyle:3.3.1:check (validate) @ lance-spark-base_2.13 ---
[WARNING] File encoding has not been set, using platform encoding UTF-8, i.e. build is platform dependent!
[INFO] Starting audit...
Audit done.
[INFO] You have 0 Checkstyle violations.
[INFO]
[INFO] --- spotless:2.43.0:apply (spotless-check) @ lance-spark-base_2.13 ---
[INFO]
[INFO] --- build-helper:3.2.0:add-source (add-java-source) @ lance-spark-base_2.13 ---
[INFO] Source directory: /home/beinanwang/o/lance-spark/lance-spark-base_2.12/src/main/java added.
[INFO]
[INFO] --- resources:3.0.2:resources (default-resources) @ lance-spark-base_2.13 ---
[WARNING] Using platform encoding (UTF-8 actually) to copy filtered resources, i.e. build is platform dependent!
[INFO] skip non existing resourceDirectory /home/beinanwang/o/lance-spark/lance-spark-base_2.13/src/main/resources
[INFO]
[INFO] --- scala:4.9.10:add-source (scala-compile-first) @ lance-spark-base_2.13 ---
[INFO]
[INFO] --- scala:4.9.10:compile (scala-compile-first) @ lance-spark-base_2.13 ---
[INFO] Compiler bridge file: /home/beinanwang/.sbt/1.0/zinc/org.scala-sbt/org.scala-sbt-compiler-bridge_2.13-1.12.0-bin_2.13.16__65.0-1.12.0_20260104T231500.jar
[INFO] compile in 2.3 s
[INFO]
[INFO] --- compiler:3.8.1:compile (default-compile) @ lance-spark-base_2.13 ---
[INFO] Nothing to compile - all classes are up to date
[INFO]
[INFO] --- compiler:3.8.1:compile (default) @ lance-spark-base_2.13 ---
[INFO] Nothing to compile - all classes are up to date
[INFO]
[INFO] --- resources:3.0.2:testResources (default-testResources) @ lance-spark-base_2.13 ---
[WARNING] Using platform encoding (UTF-8 actually) to copy filtered resources, i.e. build is platform dependent!
[INFO] Copying 70 resources
[INFO]
[INFO] --- scala:4.9.10:testCompile (scala-test-compile) @ lance-spark-base_2.13 ---
[INFO] Compiler bridge file: /home/beinanwang/.sbt/1.0/zinc/org.scala-sbt/org.scala-sbt-compiler-bridge_2.13-1.12.0-bin_2.13.16__65.0-1.12.0_20260104T231500.jar
[INFO] compile in 0.1 s
[INFO]
[INFO] --- compiler:3.8.1:testCompile (default-testCompile) @ lance-spark-base_2.13 ---
[INFO] Nothing to compile - all classes are up to date
[INFO]
[INFO] --- surefire:3.2.5:test (default-test) @ lance-spark-base_2.13 ---
[INFO]
[INFO] -------------------< org.lance:lance-spark-4.0_2.13 >-------------------
[INFO] Building lance-spark-4.0_2.13 0.4.0-beta.3 [3/3]
[INFO] from lance-spark-4.0_2.13/pom.xml
[INFO] --------------------------------[ jar ]---------------------------------
[INFO]
[INFO] --- checkstyle:3.3.1:check (validate) @ lance-spark-4.0_2.13 ---
[WARNING] File encoding has not been set, using platform encoding UTF-8, i.e. build is platform dependent!
[INFO] Starting audit...
Audit done.
[INFO] You have 0 Checkstyle violations.
[INFO]
[INFO] --- spotless:2.43.0:apply (spotless-check) @ lance-spark-4.0_2.13 ---
[INFO] Spotless.Java is keeping 1 files clean - 0 were changed to be clean, 0 were already clean, 1 were skipped because caching determined they were already clean
[INFO] Spotless.Scala is keeping 3 files clean - 0 were changed to be clean, 0 were already clean, 3 were skipped because caching determined they were already clean
[INFO]
[INFO] --- antlr4:4.13.1:antlr4 (default) @ lance-spark-4.0_2.13 ---
[INFO] No grammars to process
[INFO] ANTLR 4: Processing source directory /home/beinanwang/o/lance-spark/lance-spark-base_2.12/src/main/antlr4
[INFO]
[INFO] --- build-helper:3.2.0:add-source (add-java-source) @ lance-spark-4.0_2.13 ---
[INFO] Source directory: /home/beinanwang/o/lance-spark/lance-spark-base_2.12/src/main/java added.
[INFO] Source directory: /home/beinanwang/o/lance-spark/lance-spark-3.5_2.12/src/main/java added.
[INFO] Source directory: /home/beinanwang/o/lance-spark/lance-spark-4.0_2.13/src/main/java added.
[INFO] Source directory: /home/beinanwang/o/lance-spark/lance-spark-4.0_2.13/src/main/scala added.
[INFO] Source directory: /home/beinanwang/o/lance-spark/lance-spark-4.0_2.13/target/generated-sources/antlr4 added.
[INFO]
[INFO] --- resources:3.0.2:resources (default-resources) @ lance-spark-4.0_2.13 ---
[WARNING] Using platform encoding (UTF-8 actually) to copy filtered resources, i.e. build is platform dependent!
[INFO] Copying 1 resource
[INFO]
[INFO] --- scala:4.9.10:compile (scala-compile-first) @ lance-spark-4.0_2.13 ---
[INFO] Compiler bridge file: /home/beinanwang/.sbt/1.0/zinc/org.scala-sbt/org.scala-sbt-compiler-bridge_2.13-1.12.0-bin_2.13.16__65.0-1.12.0_20260104T231500.jar
[INFO] compile in 0.4 s
[INFO]
[INFO] --- compiler:3.8.1:compile (default-compile) @ lance-spark-4.0_2.13 ---
[INFO] Nothing to compile - all classes are up to date
[INFO]
[INFO] --- build-helper:3.2.0:add-test-source (add-java-test-source) @ lance-spark-4.0_2.13 ---
[INFO] Test Source directory: /home/beinanwang/o/lance-spark/lance-spark-base_2.12/src/test/java added.
[INFO] Test Source directory: /home/beinanwang/o/lance-spark/lance-spark-3.5_2.12/src/test/java added.
[INFO] Test Source directory: /home/beinanwang/o/lance-spark/lance-spark-4.0_2.13/src/test/java added.
[INFO]
[INFO] --- resources:3.0.2:testResources (default-testResources) @ lance-spark-4.0_2.13 ---
[WARNING] Using platform encoding (UTF-8 actually) to copy filtered resources, i.e. build is platform dependent!
[INFO] Copying 70 resources
[INFO]
[INFO] --- scala:4.9.10:testCompile (scala-test-compile) @ lance-spark-4.0_2.13 ---
[INFO] Compiler bridge file: /home/beinanwang/.sbt/1.0/zinc/org.scala-sbt/org.scala-sbt-compiler-bridge_2.13-1.12.0-bin_2.13.16__65.0-1.12.0_20260104T231500.jar
[INFO] compile in 0.1 s
[INFO]
[INFO] --- compiler:3.8.1:testCompile (default-testCompile) @ lance-spark-4.0_2.13 ---
[INFO] Nothing to compile - all classes are up to date
[INFO]
[INFO] --- surefire:3.2.5:test (default-test) @ lance-spark-4.0_2.13 ---
[INFO] Using auto detected provider org.apache.maven.surefire.junitplatform.JUnitPlatformProvider
[INFO]
[INFO] -------------------------------------------------------
[INFO] T E S T S
[INFO] -------------------------------------------------------
[INFO] Running org.lance.spark.update.CreateIndexStandardSyntaxTest
[INFO] Tests run: 2, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 9.235 s -- in org.lance.spark.update.CreateIndexStandardSyntaxTest
[INFO]
[INFO] Results:
[INFO]
[INFO] Tests run: 2, Failures: 0, Errors: 0, Skipped: 0
[INFO]
[INFO] ------------------------------------------------------------------------
[INFO] Reactor Summary for lance-spark-root 0.4.0-beta.3:
[INFO]
[INFO] lance-spark-root ................................... SUCCESS [ 1.645 s]
[INFO] lance-spark-base_2.13 .............................. SUCCESS [ 4.527 s]
[INFO] lance-spark-4.0_2.13 ............................... SUCCESS [ 12.492 s]
[INFO] ------------------------------------------------------------------------
[INFO] BUILD SUCCESS
[INFO] ------------------------------------------------------------------------
[INFO] Total time: 18.830 s
[INFO] Finished at: 2026-04-23T01:00:54Z
[INFO] ------------------------------------------------------------------------

Notes

@github-actions github-actions Bot added the enhancement New feature or request label Apr 23, 2026
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

enhancement New feature or request

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant