
Handle VLM pay-per-token rate limit errors #133

@dmoore247

Description


Is your feature request related to a problem? Please describe.
While running PHI Detection against pay-per-token endpoints, it's easy to hit rate-limit errors (HTTP 429).
Please make the operations more robust when these errors occur.

Describe the solution you'd like

  1. Make the job re-runnable, so it fills in the gaps by running inference only on the images that previously failed.
  2. Perform an API retry (up to a reasonable, tunable limit). Retries should use an exponential backoff in the time delay, e.g. 1 sec, 2 sec, 4 sec, 8 sec, ...
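The retry behavior described above could be sketched as follows. This is a minimal illustration, not a proposed implementation; `RateLimitError` is a hypothetical stand-in for however the serving client surfaces a 429, and the parameter names are placeholders:

```python
import random
import time


class RateLimitError(Exception):
    """Hypothetical placeholder for the 429 error raised by the endpoint."""


def call_with_backoff(fn, *args, max_retries=5, base_delay=1.0, max_delay=60.0, **kwargs):
    """Call fn, retrying on RateLimitError with exponential backoff (1s, 2s, 4s, ...)."""
    for attempt in range(max_retries + 1):
        try:
            return fn(*args, **kwargs)
        except RateLimitError:
            if attempt == max_retries:
                raise  # retry budget exhausted; surface the error to the caller
            # Exponential backoff, capped at max_delay, with a little jitter
            # so that parallel workers don't retry in lockstep.
            delay = min(base_delay * (2 ** attempt), max_delay)
            time.sleep(delay + random.uniform(0, 0.1 * delay))
```

Making `max_retries` and `base_delay` tunable keeps the limit "reasonable" per the request, and the jitter helps when many images are inferenced concurrently against the same endpoint.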

Describe alternatives you've considered
Provisioned throughput endpoints sustain higher rates, but modeling in advance when that capacity is needed isn't easy. We want to make it easy on the user.


Metadata

Assignees

Labels

enhancement (New feature or request)

Projects

No projects

Milestone

No milestone

Relationships

None yet

Development

No branches or pull requests
