### Prerequisites
- I am running the latest code. Mention the version if possible as well.
- I carefully followed the README.md.
- I searched using keywords relevant to my issue to make sure that I am creating a new issue that is not already open (or closed).
- I reviewed the Discussions, and have a new and useful enhancement to share.
### Feature Description
I am running a VLM such as Qwen2.5-VL with an inference request that contains multiple images and a single prompt, so the model can do video-like understanding.
However, the log and the code show that llama.cpp does not do real batch processing for multiple images. Multi-image, single-prompt (video-like) inference is a real use case in automotive and robotics, where good TTFT and end-to-end latency matter, so it would be valuable to implement batch processing even if it means a bigger cgraph. The relevant check in clip.cpp:
```cpp
bool clip_image_batch_encode(clip_ctx * ctx, const int n_threads, const clip_image_f32_batch * imgs_c_ptr, float * vec) {
    const clip_image_f32_batch & imgs = *imgs_c_ptr;
    int batch_size = imgs.entries.size();
    // TODO @ngxson : implement batch size > 1 as a loop
    // we don't need true batching support because the cgraph will gonna be big anyway
    if (batch_size != 1) {
        return false; // only support batch size of 1
    }
```
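For reference, here is a minimal sketch of what the TODO's "loop" fallback might look like. It is only an illustration, not the actual implementation: it assumes it lives in clip.cpp next to the function above (where `entries` is visible) and that the clip.h helpers `clip_image_encode`, `clip_n_mmproj_embd`, and a per-image token-count helper like `clip_n_output_tokens` exist under these names; signatures and the batch struct layout should be verified against the current tree.

```cpp
// Hypothetical helper, not the actual implementation: encode a batch by
// looping over its entries and writing each image's embeddings at the
// correct offset in the caller-provided output buffer.
static bool clip_image_batch_encode_looped(clip_ctx * ctx, int n_threads,
                                           const clip_image_f32_batch * imgs_c_ptr, float * vec) {
    const clip_image_f32_batch & imgs = *imgs_c_ptr;
    const int n_embd = clip_n_mmproj_embd(ctx); // embedding size per output token

    float * out = vec;
    for (const auto & entry : imgs.entries) {   // assumes entries holds owning image pointers
        clip_image_f32 * img = entry.get();
        if (!clip_image_encode(ctx, n_threads, img, out)) {
            return false;
        }
        // advance past the tokens produced for this image
        out += (size_t) clip_n_output_tokens(ctx, img) * n_embd;
    }
    return true;
}
```

Note that this only removes the `batch_size != 1` early return; each image is still encoded in its own cgraph, so by itself it would not give the TTFT/E2E improvement this request is about. That would require the encoder to accept several images in a single (bigger) cgraph, as the upstream comment anticipates.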
### Motivation
llama-server log of an inference request with 2 images and 1 prompt:
```shell
srv update_slots: all slots are idle
srv params_from_: Chat format: Content-only
slot launch_slot_: id 0 | task 0 | processing task
slot update_slots: id 0 | task 0 | new prompt, n_ctx_slot = 1365, n_keep = 0, n_prompt_tokens = 29
slot update_slots: id 0 | task 0 | kv cache rm [0, end)
slot update_slots: id 0 | task 0 | prompt processing progress, n_past = 18, n_tokens = 18, progress = 0.620690
slot update_slots: id 0 | task 0 | kv cache rm [18, end)
srv process_chun: processing image...
encoding image slice...
image slice encoded in 600 ms
decoding image batch 1/1, n_tokens_batch = 208
image decoded (batch 1/1) in 4 ms
srv process_chun: image processed in 604 ms
slot update_slots: id 0 | task 0 | prompt processing progress, n_past = 22, n_tokens = 3, progress = 0.758621
slot update_slots: id 0 | task 0 | kv cache rm [22, end)
srv process_chun: processing image...
encoding image slice...
image slice encoded in 470 ms
decoding image batch 1/1, n_tokens_batch = 208
image decoded (batch 1/1) in 3 ms
srv process_chun: image processed in 473 ms
slot update_slots: id 0 | task 0 | prompt processing progress, n_past = 29, n_tokens = 6, progress = 1.000000
slot update_slots: id 0 | task 0 | prompt done, n_past = 29, n_tokens = 6
slot release: id 0 | task 0 | stop processing: n_past = 50, truncated = 0
slot print_timing: id 0 | task 0 |
prompt eval time = 2124.96 ms / 29 tokens ( 73.27 ms per token, 13.65 tokens per second)
eval time = 610.92 ms / 22 tokens ( 27.77 ms per token, 36.01 tokens per second)
```
### Possible Implementation
No response