
Handle VLM pay-per-token rate limit errors #133

@dmoore247

Description


Is your feature request related to a problem? Please describe.
While running PHI Detection against pay-per-token endpoints, it's easy to hit rate-limit errors (HTTP 429).
Please make the operations more robust when these errors occur.

Describe the solution you'd like

  1. Make the job re-runnable, so it fills in the gaps by running inference only on the images that previously failed.
  2. Perform an API retry (up to a reasonable, tunable limit). Retries should use an exponential backoff in the time delay, e.g. 1 sec, 2 sec, 4 sec, 8 sec, ...
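The retry behavior described above could be sketched as follows. This is a minimal illustration, not a proposed implementation; `RateLimitError` is a hypothetical stand-in for however the serving client surfaces a 429, and the parameter names are placeholders:

```python
import random
import time


class RateLimitError(Exception):
    """Hypothetical placeholder for the 429 error raised by the endpoint."""


def call_with_backoff(fn, *args, max_retries=5, base_delay=1.0, max_delay=60.0, **kwargs):
    """Call fn, retrying on RateLimitError with exponential backoff (1s, 2s, 4s, ...)."""
    for attempt in range(max_retries + 1):
        try:
            return fn(*args, **kwargs)
        except RateLimitError:
            if attempt == max_retries:
                raise  # retry budget exhausted; surface the error to the caller
            # Exponential backoff, capped at max_delay, with a little jitter
            # so that parallel workers don't retry in lockstep.
            delay = min(base_delay * (2 ** attempt), max_delay)
            time.sleep(delay + random.uniform(0, 0.1 * delay))
```

Making `max_retries` and `base_delay` tunable keeps the limit "reasonable" per the request, and the jitter helps when many images are inferenced concurrently against the same endpoint.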

Describe alternatives you've considered
Provisioned throughput endpoints sustain higher rates, but modeling in advance when that capacity is needed isn't easy. We want to make it easy on the user.


Metadata

Assignees

Labels

enhancement (New feature or request)

Projects

No projects

Milestone

No milestone

Relationships

None yet

Development

No branches or pull requests
