
Bulk job renaming #2972

@jtherrmann

Description


We want to support bulk job renaming. Specifically, Vertex should not have to make one API request per job, since that could mean thousands of API requests for a single bulk renaming operation. In discussions with @williamh890, we decided that the new API endpoint should accept a list of job IDs and a new name value. I recommended that we use one TransactWriteItems call per HyP3 API request so that our API endpoint is guaranteed to be transactional. I believe I incorrectly stated that this would limit us to 25 items per request, but the limit is actually 100.
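For reference, here's a rough sketch of what the one-transaction-per-request approach could look like with a boto3 low-level DynamoDB client. The table name, the `job_id` key, the `name` attribute, and the function name are hypothetical placeholders, not the actual HyP3 schema. It rejects requests with more than 100 distinct job IDs (the TransactWriteItems limit) and deduplicates IDs, because a single transaction can't include multiple operations on the same item.

```python
from typing import Any, Sequence

# DynamoDB allows at most 100 actions in a single TransactWriteItems call.
MAX_TRANSACT_ITEMS = 100


def rename_jobs_transactionally(client: Any, table_name: str, job_ids: Sequence[str], name: str) -> None:
    """Rename every job in one TransactWriteItems call: all jobs are renamed or none are.

    Assumes `client` is a boto3 DynamoDB low-level client. The `job_id` key and
    `name` attribute are hypothetical placeholders for the real job schema.
    """
    # A transaction may not touch the same item twice, so drop duplicates (keeping order).
    unique_ids = list(dict.fromkeys(job_ids))
    if not unique_ids:
        raise ValueError('at least one job ID is required')
    if len(unique_ids) > MAX_TRANSACT_ITEMS:
        raise ValueError(f'at most {MAX_TRANSACT_ITEMS} job IDs per request, got {len(unique_ids)}')

    transact_items = [
        {
            'Update': {
                'TableName': table_name,
                'Key': {'job_id': {'S': job_id}},
                # "name" is a DynamoDB reserved word, so it needs a placeholder.
                'UpdateExpression': 'SET #name = :name',
                # Without this condition, an update to a missing key would create a new item.
                'ConditionExpression': 'attribute_exists(job_id)',
                'ExpressionAttributeNames': {'#name': 'name'},
                'ExpressionAttributeValues': {':name': {'S': name}},
            }
        }
        for job_id in unique_ids
    ]
    # If any condition fails, DynamoDB cancels the whole transaction
    # (TransactionCanceledException) and no job is renamed.
    client.transact_write_items(TransactItems=transact_items)
```

With the `attribute_exists` condition, a single unknown job ID cancels the whole transaction instead of silently creating a new item.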

Unfortunately, I'm not sure TransactWriteItems is appropriate for bulk item updates. The "Efficient bulk operations" page says this about TransactWriteItems:

Trade-off – Additional throughput is consumed, 2 WCUs per 1 KB write.

where WCU is a write capacity unit (see here). I'm not sure whether this means that TransactWriteItems updates are twice as slow or just twice as expensive (billed at twice the cost in AWS). Either way, I think it would at least make us more likely to get throttled, which would slow the operation down.
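To put a number on the 2x, here's a back-of-the-envelope calculator, assuming DynamoDB's documented rounding: item size rounded up to the next 1 KB, 1 WCU per KB for a standard write and 2 WCUs per KB for a transactional one. For updates, the size that counts is the larger of the item's before and after images, so the whole job record matters, not just the `name` attribute. The function name is made up for illustration.

```python
import math


def write_capacity_units(item_size_bytes: int, transactional: bool) -> int:
    """Approximate WCUs consumed by writing one item of the given size."""
    # Sizes are rounded up to the next whole KB (minimum 1 KB).
    kilobytes = max(1, math.ceil(item_size_bytes / 1024))
    # Transactional writes cost 2 WCUs per KB instead of 1.
    return kilobytes * (2 if transactional else 1)


# Renaming 1,000 jobs whose records are about 600 bytes each:
standard_total = 1000 * write_capacity_units(600, transactional=False)
transactional_total = 1000 * write_capacity_units(600, transactional=True)
print(standard_total, transactional_total)
```

For ~600-byte job records that works out to about 1,000 WCUs with individual UpdateItem calls versus 2,000 WCUs with TransactWriteItems.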

That page also lists BatchExecuteStatement with PartiQL for updating up to 25 items at a time, though I'm not sure how much efficiency we'd gain or whether it would be worth the additional code complexity.
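Here's a sketch of the PartiQL route, assuming a boto3 low-level client and the same hypothetical `job_id`/`name` schema (the function name is also hypothetical). It sends the updates in chunks of 25 statements and records which job IDs failed. As far as I can tell from the API reference, BatchExecuteStatement reports errors per statement in its `Responses` list (ordered like the request statements) rather than failing the whole batch, which is why the sketch tracks results per job.

```python
from typing import Any, Dict, List, Sequence

# BatchExecuteStatement accepts at most 25 statements per call.
MAX_BATCH_STATEMENTS = 25


def rename_jobs_with_partiql(client: Any, table_name: str, job_ids: Sequence[str], name: str) -> Dict[str, List[str]]:
    """Rename jobs via PartiQL UPDATE statements, 25 per BatchExecuteStatement call.

    Assumes `client` is a boto3 DynamoDB low-level client; `job_id` and `name`
    are hypothetical attribute names.
    """
    unique_ids = list(dict.fromkeys(job_ids))
    succeeded: List[str] = []
    failed: List[str] = []
    for start in range(0, len(unique_ids), MAX_BATCH_STATEMENTS):
        chunk = unique_ids[start:start + MAX_BATCH_STATEMENTS]
        statements = [
            {
                # Double quotes delimit identifiers in PartiQL; values are bound via "?".
                'Statement': f'UPDATE "{table_name}" SET "name" = ? WHERE "job_id" = ?',
                'Parameters': [{'S': name}, {'S': job_id}],
            }
            for job_id in chunk
        ]
        response = client.batch_execute_statement(Statements=statements)
        # Each response entry lines up with its statement; failures carry an 'Error' key.
        for job_id, result in zip(chunk, response['Responses']):
            (failed if 'Error' in result else succeeded).append(job_id)
    return {'succeeded': succeeded, 'failed': failed}
```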

There's also BatchWriteItem for writing up to 25 items, but it does not support updates (it only overwrites existing items), and that page recommends UpdateItem instead.

I'm leaning toward just looping through the jobs and calling UpdateItem one item at a time. It seems like this should work for renaming thousands of jobs, though I'm not sure beyond that. I'm also not sure whether we have to handle throttling manually (see the Errors section), but as far as I know we don't worry about this when submitting thousands of jobs, so it's probably fine?
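And here's roughly what the UpdateItem loop could look like, again with hypothetical table, key, attribute, and function names. It collects the IDs that were renamed and the error code for each one that wasn't, which is the information we'd need to report a partial update back to the client. On throttling: boto3's default retry configuration already retries throttling errors such as ProvisionedThroughputExceededException a limited number of times, so this sketch doesn't add its own retry loop; anything that still fails after those retries ends up in `failed`.

```python
from typing import Any, Dict, List, Sequence, Union


def rename_jobs_one_by_one(client: Any, table_name: str, job_ids: Sequence[str], name: str) -> Dict[str, Union[List[str], Dict[str, str]]]:
    """Rename jobs with one UpdateItem call each, reporting per-job success or failure.

    Assumes `client` is a boto3 DynamoDB low-level client; `job_id` and `name`
    are hypothetical attribute names.
    """
    succeeded: List[str] = []
    failed: Dict[str, str] = {}
    for job_id in dict.fromkeys(job_ids):  # skip duplicate IDs, keep order
        try:
            client.update_item(
                TableName=table_name,
                Key={'job_id': {'S': job_id}},
                UpdateExpression='SET #name = :name',
                # Don't let a bad ID create a brand-new item.
                ConditionExpression='attribute_exists(job_id)',
                ExpressionAttributeNames={'#name': 'name'},
                ExpressionAttributeValues={':name': {'S': name}},
            )
        except Exception as exc:  # real code would catch botocore.exceptions.ClientError
            # ClientError exposes the DynamoDB error code in exc.response;
            # fall back to the exception type name for anything else.
            response = getattr(exc, 'response', None) or {}
            failed[job_id] = response.get('Error', {}).get('Code', type(exc).__name__)
        else:
            succeeded.append(job_id)
    return {'succeeded': succeeded, 'failed': failed}
```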

I think the main issue with an update loop is that the HyP3 API request won't be transactional, so we'll have to report partial updates somehow and the client will have to handle that. But I don't think we can get around that if we're not using TransactWriteItems? (Is PartiQL's BatchExecuteStatement transactional?)
