Batched inference API and support for float16 inference #279

salvaba94 · 2024-01-27T13:13:29Z

This branch adds support for half-precision (float16) inference and a batched inference API (`BatchedModel`). It also includes a short demo showing how to use this API.
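The description does not show the `BatchedModel` interface itself, so the following is only a library-free sketch of the two ideas it names. It assumes plain Python lists stand in for image tensors. It groups inputs into fixed-size batches and stores a value in IEEE 754 half precision (float16), which takes 2 bytes per value instead of the 4 bytes of float32. The helpers `make_batches` and `to_float16` are hypothetical and are not part of the PR.

```python
import struct
from typing import Iterator, List, Sequence, TypeVar

T = TypeVar("T")


def make_batches(items: Sequence[T], batch_size: int) -> Iterator[List[T]]:
    """Hypothetical helper: yield consecutive chunks of at most `batch_size` items."""
    if batch_size < 1:
        raise ValueError("batch_size must be >= 1")
    for start in range(0, len(items), batch_size):
        yield list(items[start:start + batch_size])


def to_float16(value: float) -> float:
    """Hypothetical helper: round-trip a float through half precision ('e' format)."""
    return struct.unpack("<e", struct.pack("<e", value))[0]


if __name__ == "__main__":
    images = ["img0", "img1", "img2", "img3", "img4"]
    print(list(make_batches(images, 2)))
    # [['img0', 'img1'], ['img2', 'img3'], ['img4']]

    # float16 stores 2 bytes per value, float32 stores 4.
    print(struct.calcsize("<e"), struct.calcsize("<f"))  # 2 4

    # Half precision keeps only 10 mantissa bits, so 0.1 is rounded.
    print(to_float16(0.1))  # 0.0999755859375
```

The last batch may be smaller than `batch_size`. A real batched model has to either accept a ragged final batch or pad it.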

ZachOBrien · 2024-02-07T17:47:11Z

@salvaba94 I ran this batched inference demo but did not see any performance benefit from batching. Batch size 1 took 223 ms on average, batch size 2 took 447 ms on average, and so on. Latency scaled linearly with batch size.

Did you observe the same behavior? Or were you able to get better throughput via batching?
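Whether batching helps is easiest to judge from per-image latency rather than per-batch latency: 223 ms for one image and 447 ms for two both work out to about 223 ms per image, so throughput did not improve. Below is a minimal stdlib timing harness that reports both figures. It assumes `run_batch` is any callable that processes a list of inputs. `dummy_model` is a hypothetical CPU stand-in, not the PR's `BatchedModel`.

```python
import time
from statistics import mean
from typing import Callable, Dict, List, Sequence


def benchmark(
    run_batch: Callable[[List[int]], object],
    batch_sizes: Sequence[int],
    repeats: int = 5,
    warmup: int = 1,
) -> Dict[int, Dict[str, float]]:
    """Time `run_batch` at each batch size; report ms per batch and ms per item."""
    results: Dict[int, Dict[str, float]] = {}
    for bs in batch_sizes:
        batch = list(range(bs))  # placeholder inputs standing in for images
        for _ in range(warmup):
            run_batch(batch)  # warm-up runs exclude one-off setup costs
        timings = []
        for _ in range(repeats):
            start = time.perf_counter()
            run_batch(batch)
            # With frameworks that launch GPU work asynchronously (e.g. PyTorch on
            # CUDA), synchronize the device here before stopping the timer.
            timings.append((time.perf_counter() - start) * 1000.0)
        per_batch = mean(timings)
        results[bs] = {"ms_per_batch": per_batch, "ms_per_item": per_batch / bs}
    return results


def dummy_model(batch: List[int]) -> int:
    """Hypothetical stand-in whose cost grows linearly with batch size."""
    return sum(i * i for i in range(20_000 * len(batch)))


if __name__ == "__main__":
    for bs, r in benchmark(dummy_model, [1, 2, 4]).items():
        print(f"batch {bs}: {r['ms_per_batch']:.1f} ms/batch, {r['ms_per_item']:.1f} ms/item")
```

If ms/item stays flat as the batch grows, batching is not buying throughput. A falling ms/item indicates a real gain. Because `dummy_model` does strictly sequential work, it should show the flat, linear-scaling pattern described above.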

salvaba94 · 2024-02-10T09:57:17Z

Hi @ZachOBrien, I've just checked, and I do see marginal improvements from batching.

Here are the results:

Batch 1 and float32: 363 ms
Batch 2 and float32: 668 ms
Batch 1 and float16: 217 ms
Batch 2 and float16: 351 ms

I guess the improvement depends on the GPU (these were measured on an RTX 2060).
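Normalizing these timings by batch size makes them directly comparable with the 223 ms / 447 ms measurement above. The snippet only reuses the four numbers reported in this comment.

```python
# Latencies reported above on an RTX 2060, in ms per batch.
timings_ms = {
    ("float32", 1): 363,
    ("float32", 2): 668,
    ("float16", 1): 217,
    ("float16", 2): 351,
}

for (dtype, batch_size), ms in timings_ms.items():
    print(f"{dtype}, batch {batch_size}: {ms / batch_size:.1f} ms per image")
# float32, batch 1: 363.0 ms per image
# float32, batch 2: 334.0 ms per image
# float16, batch 1: 217.0 ms per image
# float16, batch 2: 175.5 ms per image

# Throughput of batch 2 relative to batch 1, for each dtype.
for dtype in ("float32", "float16"):
    gain = (2 * timings_ms[(dtype, 1)]) / timings_ms[(dtype, 2)]
    print(f"{dtype}: batch 2 gives {gain:.2f}x the throughput of batch 1")
# float32: batch 2 gives 1.09x the throughput of batch 1
# float16: batch 2 gives 1.24x the throughput of batch 1
```

On this GPU, batching adds roughly 9% throughput in float32 and 24% in float16. Switching to float16 alone cuts batch-1 latency by about 40% (363 ms to 217 ms).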

KevinfromTJ · 2024-07-14T02:59:55Z

Did you encounter problems like #294?

ace383e · Batched inference API and support for float16 inference

rentainhe requested a review from SlongLiu January 28, 2024 03:02

bffa375 · Merge branch 'IDEA-Research:main' into batched_float16_inference
