ice: Added support for deleting partitions #26

Merged: 34 commits merged on Jul 9, 2025

@subkanthi subkanthi commented May 12, 2025

Delete partitions.
closes: #7

@subkanthi subkanthi linked an issue May 12, 2025 that may be closed by this pull request
@subkanthi subkanthi changed the title ice: Added support for positional delete file. ice: Added support for deleting partitions May 15, 2025
@subkanthi subkanthi marked this pull request as ready for review May 15, 2025 18:45
@subkanthi subkanthi requested a review from shyiko May 15, 2025 18:45
subkanthi commented May 15, 2025

spark-sql (nyc)> select count(*) from taxismay15 where tpep_pickup_datetime='2025-01-01';
2
ice delete-file --namespace=nyc --table=taxismay15 --partition='[{"partition_name": "tpep_pickup_datetime", "value": "2025-01-01T00:00:00"}]'
spark-sql (nyc)> select count(*) from taxismay15 where tpep_pickup_datetime='2025-01-01';
0

@@ -50,6 +50,9 @@ ice create-table flowers.iris_no_copy --schema-from-parquet=file://iris.parquet
local-mc cp iris.parquet local/bucket1/flowers/iris_no_copy/
ice insert flowers.iris_no_copy --no-copy s3://bucket1/flowers/iris_no_copy/iris.parquet

# delete partition
ice delete-file --namespace=nyc --table=taxismay15 --partition='[{"partition_name": "tpep_pickup_datetime", "value": "2025-01-01T00:00:00"}]'
Collaborator

s/delete-file/delete to be consistent with ice insert + we're not deleting "file" here

Collaborator

s/partition_name/name

for (com.altinity.ice.cli.Main.PartitionFilter pf : partitions) {
org.apache.iceberg.expressions.Expression e =
org.apache.iceberg.expressions.Expressions.equal(pf.partitionName(), pf.value());
expr = (expr == null) ? e : org.apache.iceberg.expressions.Expressions.and(expr, e);
Collaborator

shouldn't this be "or" instead of "and"? I think it's time to add tests
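The concern can be illustrated without Iceberg. The sketch below is an assumption-laden stand-in: rows are plain maps, `equal`, `allOf`, and `anyOf` are hypothetical helpers (not the Iceberg `Expressions` API), and the sample data is invented. It only shows why AND-ing two equality filters on the same partition column matches nothing, while OR-ing them matches both values.

```java
import java.util.List;
import java.util.Map;
import java.util.function.Predicate;

// Hypothetical stand-in for expression building; a "row" is a column -> value map.
class AndVsOrSketch {
  // Equality filter on one column.
  static Predicate<Map<String, Object>> equal(String column, Object value) {
    return row -> value.equals(row.get(column));
  }

  // Chain all filters with AND, like the loop under review.
  static Predicate<Map<String, Object>> allOf(List<Predicate<Map<String, Object>>> filters) {
    return filters.stream().reduce(Predicate::and).orElse(row -> true);
  }

  // Chain all filters with OR, as the reviewer suggests.
  static Predicate<Map<String, Object>> anyOf(List<Predicate<Map<String, Object>>> filters) {
    return filters.stream().reduce(Predicate::or).orElse(row -> false);
  }

  static long countMatches(List<Map<String, Object>> rows, Predicate<Map<String, Object>> p) {
    return rows.stream().filter(p).count();
  }

  public static void main(String[] args) {
    // Invented sample rows, one per partition value.
    List<Map<String, Object>> rows =
        List.of(Map.of("day", "2025-01-01"), Map.of("day", "2025-01-02"));
    List<Predicate<Map<String, Object>>> filters =
        List.of(equal("day", "2025-01-01"), equal("day", "2025-01-02"));

    // A single value cannot equal two different constants at once.
    System.out.println("AND matches: " + countMatches(rows, allOf(filters))); // 0
    System.out.println("OR matches: " + countMatches(rows, anyOf(filters)));  // 2
  }
}
```

Whether filters on different columns should be AND-ed while values of the same column are OR-ed is a design decision the thread leaves open.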

@CommandLine.Option(
names = {"--partition"},
description =
"JSON array of partition filters: [{\"partition_name\": \"vendorId\", \"value\": 5}]. "
Collaborator

it would be nice if there was a way to provide a list of values without having to repeat partition name

Collaborator Author

partition: values
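One way to read the "partition: values" shape is a map from partition name to a list of accepted values. The sketch below is only an illustration of that reading: `group` and `matches` are hypothetical helpers, and the matching rule (any listed value per name, every name must match) is an assumption that the thread does not confirm.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

class PartitionValuesSketch {
  // Collapse repeated (name, value) pairs into name -> [values], keeping input order.
  static Map<String, List<Object>> group(List<? extends Map.Entry<String, ?>> pairs) {
    Map<String, List<Object>> grouped = new LinkedHashMap<>();
    for (Map.Entry<String, ?> p : pairs) {
      grouped.computeIfAbsent(p.getKey(), k -> new ArrayList<>()).add(p.getValue());
    }
    return grouped;
  }

  // Assumed semantics: a row matches if, for every name, its value is one of the listed values.
  static boolean matches(Map<String, ?> row, Map<String, List<Object>> filter) {
    for (Map.Entry<String, List<Object>> e : filter.entrySet()) {
      if (!e.getValue().contains(row.get(e.getKey()))) {
        return false;
      }
    }
    return true;
  }

  public static void main(String[] args) {
    Map<String, List<Object>> filter =
        group(List.of(Map.entry("vendorId", 5), Map.entry("vendorId", 6)));
    System.out.println(filter);                                 // {vendorId=[5, 6]}
    System.out.println(matches(Map.of("vendorId", 5), filter)); // true
    System.out.println(matches(Map.of("vendorId", 7), filter)); // false
  }
}
```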

RESTCatalog catalog,
String namespace,
String tableName,
List<com.altinity.ice.cli.Main.PartitionFilter> partitions)
Collaborator

supporting dryRun (that would print all matched files) would also be nice

Collaborator Author

added

void deletePartition(
@CommandLine.Option(names = "--namespace", description = "Namespace name", required = true)
String namespace,
@CommandLine.Option(names = "--table", description = "Table name", required = true)
Collaborator

specifying ns & table via --namespace & --table is inconsistent with the rest of the commands: ice insert, describe, etc. all expect ns.table as an arg.
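For illustration, a single `ns.table` argument can be split at the last dot. This is a minimal stdlib sketch with a hypothetical `split` helper; the later revision in this thread uses `TableIdentifier.parse(name)` for this, and nothing here is taken from that implementation.

```java
class NsTableSketch {
  // Split "ns.table" at the last dot; everything before it is treated as the namespace.
  static String[] split(String name) {
    int dot = name.lastIndexOf('.');
    if (dot <= 0 || dot == name.length() - 1) {
      throw new IllegalArgumentException("expected <namespace>.<table>, got: " + name);
    }
    return new String[] {name.substring(0, dot), name.substring(dot + 1)};
  }

  public static void main(String[] args) {
    String[] parts = split("nyc.taxismay15");
    System.out.println("namespace=" + parts[0] + " table=" + parts[1]);
  }
}
```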

private Server adminServer;

@SuppressWarnings("rawtypes")
private final GenericContainer etcd =
Collaborator

Let's move RESTCatalogIT to a separate PR. There are still a few things to iron out here:

}
TableIdentifier tableId = TableIdentifier.parse(name);

DeletePartition.run(catalog, tableId, partitions, dryRun);
Collaborator

s/DeletePartition/Delete

@@ -426,6 +428,42 @@ void deleteNamespace(
}
}

@CommandLine.Command(name = "delete", description = "Delete Partition(s).")
Collaborator

Suggested change
- @CommandLine.Command(name = "delete", description = "Delete Partition(s).")
+ @CommandLine.Command(name = "delete", description = "Delete data from catalog.")

@@ -426,6 +428,42 @@ void deleteNamespace(
}
}

@CommandLine.Command(name = "delete", description = "Delete Partition(s).")
void deletePartition(
Collaborator

s/deletePartition/delete

@CommandLine.Option(
names = "--dry-run",
description = "Log files that would be deleted without actually deleting them")
boolean dryRun)
Collaborator

might be worth making it true by default, considering the nature of the operation
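A rough sketch of the dry-run-by-default idea, using only the standard library. The `delete` helper, the file names, and the `deleter` callback are all hypothetical; how a caller would opt out of the default is deliberately left out, since the thread does not settle it.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;

class DryRunSketch {
  // Report every matched file; only invoke the deleter when dryRun is false.
  // Returns the number of files actually deleted.
  static int delete(List<String> matched, boolean dryRun, Consumer<String> deleter) {
    int deleted = 0;
    for (String file : matched) {
      if (dryRun) {
        System.out.println("would delete: " + file);
      } else {
        deleter.accept(file);
        deleted++;
      }
    }
    return deleted;
  }

  public static void main(String[] args) {
    boolean dryRun = true; // safe default for a destructive operation
    List<String> removed = new ArrayList<>();
    int n = delete(List.of("data-001.parquet", "data-002.parquet"), dryRun, removed::add);
    System.out.println("deleted " + n + " file(s)");
  }
}
```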

@subkanthi
Collaborator Author

Testing:
delete creates a new avro file in metadata.

@subkanthi subkanthi requested a review from shyiko June 29, 2025 21:25
}
scan = scan.filter(expr);
}
Iterable<FileScanTask> tasks = scan.planFiles();
Collaborator

scan.planFiles returns CloseableIterable which is supposed to be closed

@subkanthi subkanthi requested a review from shyiko June 30, 2025 17:47
for (FileScanTask task : tasks) {
filesToDelete.add(task.file());
}
tasks.close();
Collaborator

what if tasks.next() throws?

@subkanthi subkanthi requested a review from shyiko July 1, 2025 21:06
filesToDelete.add(task.file());
}
} catch (Exception e) {
logger.error("Error getting files to delete: {}", e.getMessage());
Collaborator

filesToDelete.add(task.file());
}
} catch (Exception e) {
throw e;
Collaborator

what's the point of

catch (Exception e) {
      throw e;
}

?

Collaborator Author

I thought try-with-resources required a catch block; I was wrong.
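The point of the last few comments can be shown in isolation: a try-with-resources statement closes its resource even when iteration throws, and it needs no catch block as long as `close()` declares no checked exception. In this sketch `Tasks` is a hypothetical stand-in for the closeable scan result, and `failAt` is an invented knob for simulating a failure mid-iteration.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;

class TryWithResourcesSketch {
  // Hypothetical closeable iterable; fails on the element at index failAt (-1 = never).
  static class Tasks implements Iterable<String>, AutoCloseable {
    final List<String> files;
    final int failAt;
    boolean closed;

    Tasks(List<String> files, int failAt) {
      this.files = files;
      this.failAt = failAt;
    }

    @Override
    public Iterator<String> iterator() {
      final Iterator<String> it = files.iterator();
      return new Iterator<String>() {
        int i = 0;

        @Override
        public boolean hasNext() {
          return it.hasNext();
        }

        @Override
        public String next() {
          if (i++ == failAt) {
            throw new IllegalStateException("scan failed");
          }
          return it.next();
        }
      };
    }

    @Override
    public void close() { // no checked exception, so callers need no catch block
      closed = true;
    }
  }

  // Collect all files; the resource is closed on both the normal and the failing path.
  static List<String> collect(Tasks tasks) {
    List<String> out = new ArrayList<>();
    try (Tasks t = tasks) {
      for (String f : t) {
        out.add(f);
      }
    }
    return out;
  }

  public static void main(String[] args) {
    Tasks bad = new Tasks(List.of("a.parquet", "b.parquet"), 1);
    try {
      collect(bad);
    } catch (IllegalStateException e) {
      System.out.println("failed: " + e.getMessage() + ", closed=" + bad.closed);
    }
  }
}
```

The exception still propagates to the caller unchanged, which is why a `catch` that only rethrows adds nothing.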

@subkanthi subkanthi requested a review from shyiko July 8, 2025 21:36
@subkanthi subkanthi merged commit 50ee773 into master Jul 9, 2025
1 check passed
Development

Successfully merging this pull request may close these issues.

ice: Support deletes
2 participants