[SPARK-56178][SQL] MSCK REPAIR TABLE for V2 file tables #55129

Draft
LuciferYang wants to merge 6 commits into apache:master from LuciferYang:SPARK-56178

Conversation

@LuciferYang
Contributor

What changes were proposed in this pull request?

Implement RepairTableExec for V2 file tables to sync filesystem partition directories with the catalog metastore.

Changes:

  • New RepairTableExec: scans filesystem partitions via FileTable.listPartitionIdentifiers(), compares with catalog, registers missing partitions and drops orphaned entries
  • DataSourceV2Strategy: route RepairTable and RecoverPartitions for FileTable to the new V2 exec node (non-FileTable V2 tables still throw)
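The compare-and-sync step above amounts to set arithmetic over partition identifiers. A minimal standalone sketch (the `RepairSketch` object and plain `Map` identifiers are hypothetical stand-ins; the real exec node operates on `InternalRow` identifiers via the `SupportsPartitionManagement` API):

```scala
// Hypothetical sketch of the repair reconciliation. Plain Maps stand in
// for Spark's InternalRow partition identifiers.
object RepairSketch {
  // `onDisk`: partitions discovered by scanning the filesystem.
  // `inCatalog`: partitions the catalog metastore currently knows about.
  def reconcile(
      onDisk: Set[Map[String, String]],
      inCatalog: Set[Map[String, String]])
      : (Set[Map[String, String]], Set[Map[String, String]]) = {
    val toAdd  = onDisk.diff(inCatalog)   // present on disk, missing in catalog
    val toDrop = inCatalog.diff(onDisk)   // registered, but directory is gone
    (toAdd, toDrop)
  }
}
```

Everything else in the exec node is plumbing: listing the filesystem, then calling `createPartition` for `toAdd` and `dropPartition` for `toDrop`.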

Why are the changes needed?

After SPARK-56175 changed V2SessionCatalog.loadTable to return FileTable instead of V1Table, MSCK REPAIR TABLE and ALTER TABLE RECOVER PARTITIONS on file tables hit the V2 error path (repairTableNotSupportedForV2TablesError). A V2-native implementation is needed.

Does this PR introduce any user-facing change?

No. MSCK REPAIR TABLE and ALTER TABLE RECOVER PARTITIONS continue to work on file tables, now via the V2 path.

How was this patch tested?

Added test in FileDataSourceV2WriteSuite:

  • Create a partitioned table, write data directly into filesystem partition directories, and verify the catalog has no partitions registered before repair
  • Run MSCK REPAIR TABLE, then verify the 3 partitions are registered in the catalog and the data is readable

Was this patch authored or co-authored using generative AI tooling?

Generated-by: Claude Code

@LuciferYang LuciferYang marked this pull request as draft April 1, 2026 03:57
@LuciferYang
Contributor Author

The actual change in this PR is a1742f5, the 6th patch in the SPARK-56170 series.

…Frame API writes and delete FallBackFileSourceV2

Key changes:
- FileWrite: added partitionSchema, customPartitionLocations,
  dynamicPartitionOverwrite, isTruncate; path creation and truncate
  logic; dynamic partition overwrite via FileCommitProtocol
- FileTable: createFileWriteBuilder with SupportsDynamicOverwrite
  and SupportsTruncate; capabilities now include TRUNCATE and
  OVERWRITE_DYNAMIC; fileIndex skips file existence checks when
  userSpecifiedSchema is provided (write path)
- All file format writes (Parquet, ORC, CSV, JSON, Text, Avro) use
  createFileWriteBuilder with partition/truncate/overwrite support
- DataFrameWriter.lookupV2Provider: enabled FileDataSourceV2 for
  non-partitioned Append and Overwrite via df.write.save(path)
- DataFrameWriter.insertInto: V1 fallback for file sources
  (TODO: SPARK-56175)
- DataFrameWriter.saveAsTable: V1 fallback for file sources
  (TODO: SPARK-56230, needs StagingTableCatalog)
- DataSourceV2Utils.getTableProvider: V1 fallback for file sources
  (TODO: SPARK-56175)
- Removed FallBackFileSourceV2 rule
- V2SessionCatalog.createTable: V1 FileFormat data type validation
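The dynamicPartitionOverwrite behavior added to FileWrite can be illustrated with a toy model (all names here are hypothetical; the real path stages output through FileCommitProtocol and then replaces only the written partition directories):

```scala
// Illustrative sketch of dynamic vs. static partition overwrite semantics,
// using partition directory names as plain strings.
object DynamicOverwriteSketch {
  // Dynamic overwrite deletes only the partitions this write touches;
  // a static (full) overwrite would delete all of `existing`.
  def partitionsToDelete(
      existing: Set[String],  // partition dirs already under the table path
      written: Set[String])   // partition dirs produced by this write
      : Set[String] =
    existing.intersect(written)
}
```

So a write that produces only `p=2` leaves `p=1` and `p=3` untouched, which is exactly what makes OVERWRITE_DYNAMIC safe for incremental rewrites.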

…catalog table loading, and gate removal

Key changes:
- FileTable extends SupportsPartitionManagement with createPartition,
  dropPartition, listPartitionIdentifiers, partitionSchema
- Partition operations sync to catalog metastore (best-effort)
- V2SessionCatalog.loadTable returns FileTable instead of V1Table,
  sets catalogTable and useCatalogFileIndex on FileTable
- V2SessionCatalog.getDataSourceOptions includes storage.properties
  for proper option propagation (header, ORC bloom filter, etc.)
- V2SessionCatalog.createTable validates data types via FileTable
- FileTable.columns() restores NOT NULL constraints from catalogTable
- FileTable.partitioning() falls back to userSpecifiedPartitioning
  or catalog partition columns
- FileTable.fileIndex uses CatalogFileIndex when catalog has
  registered partitions (custom partition locations)
- FileTable.schema checks column name duplication for non-catalog
  tables only
- DataSourceV2Utils.getTableProvider: removed FileDataSourceV2 gate
- DataFrameWriter.insertInto: enabled V2 for file sources
- DataFrameWriter.saveAsTable: V1 fallback (TODO: SPARK-56230)
- ResolveSessionCatalog: V1 fallback for FileTable-backed commands
  (AnalyzeTable, AnalyzeColumn, TruncateTable, TruncatePartition,
  ShowPartitions, RecoverPartitions, AddPartitions, RenamePartitions,
  DropPartitions, SetTableLocation, CREATE TABLE validation,
  REPLACE TABLE blocking)
- FindDataSourceTable: streaming V1 fallback for FileTable
  (TODO: SPARK-56233)
- DataSource.planForWritingFileFormat: graceful V2 handling
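The partition identifiers that listPartitionIdentifiers surfaces are ultimately derived from Hive-style partition directory names. A simplified standalone sketch of that parsing (hypothetical object name; it ignores the path unescaping and type casting the real code performs):

```scala
// Parse a Hive-style partition directory suffix such as "year=2026/month=4"
// into an ordered list of (column, value) pairs. Real Spark additionally
// unescapes path-encoded characters and casts values to the partition schema.
object PartitionPathSketch {
  def parse(relPath: String): Seq[(String, String)] =
    relPath.split("/").toSeq.filter(_.nonEmpty).map { seg =>
      val Array(k, v) = seg.split("=", 2)  // MatchError on malformed segments
      (k, v)
    }
}
```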

Enable bucketed writes for V2 file tables via catalog BucketSpec.

Key changes:
- FileWrite: add bucketSpec field, use V1WritesUtils.getWriterBucketSpec()
  instead of hardcoded None
- FileTable: createFileWriteBuilder passes catalogTable.bucketSpec
  to the write pipeline
- FileDataSourceV2: getTable uses collect to skip BucketTransform
  (handled via catalogTable.bucketSpec instead)
- FileWriterFactory: use DynamicPartitionDataConcurrentWriter for
  bucketed writes since V2's RequiresDistributionAndOrdering cannot
  express hash-based ordering
- All 6 format Write/Table classes updated with BucketSpec parameter

Note: bucket pruning and bucket join (read-path optimization) are
not included in this patch (tracked under SPARK-56231).
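At its core, the writer's bucket routing is a stable hash of the bucket column values modulo the bucket count. An illustrative sketch (Scala's `hashCode` stands in here for Spark's actual hash expression, which V1WritesUtils.getWriterBucketSpec selects):

```scala
// Illustrative only: how rows map to bucket files given a BucketSpec.
// Spark's real implementation uses its own hash expression; Scala's
// hashCode is used purely to show the (hash mod numBuckets) shape.
object BucketSketch {
  def bucketId(bucketColValue: Any, numBuckets: Int): Int =
    (bucketColValue.hashCode & Int.MaxValue) % numBuckets
}
```

The masking with Int.MaxValue keeps the id non-negative; determinism of the hash is what lets every writer task agree on which file a given key belongs to.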

Add RepairTableExec to sync filesystem partition directories with
catalog metastore for V2 file tables.

Key changes:
- New RepairTableExec: scans filesystem partitions via
  FileTable.listPartitionIdentifiers(), compares with catalog,
  registers missing partitions and drops orphaned entries
- DataSourceV2Strategy: route RepairTable and RecoverPartitions
  for FileTable to new V2 exec node