
92MING commented Jun 4, 2025

ref: #13872

Currently, passing media (image/audio) to mtmd is only supported through the /chat/completions endpoint of llama-server.
Supporting mtmd in the /completion endpoint is still needed, because /completion allows more freedom in modifying the prompt template (e.g. adding a prefix), or even using no template at all, for example when completing a long article.

This PR adds an extra field, medias (list[object]), to the /completion request body. Each media object should contain 2 fields:

  • type: audio or image (image_url and img are accepted as aliases for image); entries with any other, unknown type are ignored
  • data: a URL or the base64-encoded content of the image/audio.
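A minimal client-side sketch of building one entry in that shape, assuming the server accepts plain base64 (no data: URI prefix) in data, as the "url/base64" wording suggests. The helper name make_media is hypothetical and not part of llama-server.

```python
import base64
from typing import Union


def make_media(source: Union[bytes, str], media_type: str = "image") -> dict:
    """Build one entry for the 'medias' list (hypothetical client helper).

    source: raw file bytes, which are base64-encoded here, or an http(s) URL
    string, which is passed through unchanged for the server to download.
    media_type: "image" or "audio", per the PR description.
    """
    if media_type not in ("image", "audio"):
        raise ValueError(f"unsupported media type: {media_type!r}")
    if isinstance(source, bytes):
        data = base64.b64encode(source).decode("ascii")
    elif source.startswith("http"):
        data = source
    else:
        raise ValueError("expected raw bytes or an http(s) URL")
    return {"type": media_type, "data": data}
```

For a local file this would be called as make_media(open("photo.png", "rb").read()), where the file name is only an example.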

Then, make sure your prompt contains the same number of <__media__> tags as there are entries in medias; each tag marks the position where the corresponding media item is placed.
Here is an example:

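```python
import requests

# Base64-encoded image content or an http(s) URL pointing to the image
img_b64_or_url = "<your img_b64 or url>"

data = {
    'stream': False,
    'top_p': 0.95,
    'temperature': 0.8,
    'top_k': 40,
    # one <__media__> tag per entry in 'medias', placed where the media should appear
    'prompt': '<start_of_turn>user\nAnalyze the image and provide a short description\n<__media__><end_of_turn>\n<start_of_turn>model\n',
    'medias': [
        {
            'type': 'image',
            'data': img_b64_or_url,
        }
    ],
}
response = requests.post('http://localhost:8080/completion', json=data)
print(response.json()['content'])  # generated text of the non-streaming response
```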
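Because the prompt must contain one <__media__> tag per media entry, a client can catch a mismatch before sending the request. This is a minimal sketch; build_completion_request is a hypothetical helper, and it counts every entry, so it assumes all entries use a supported type (the server ignores unknown types, per the description above).

```python
MEDIA_MARKER = "<__media__>"


def build_completion_request(prompt: str, medias: list, **sampling) -> dict:
    """Assemble a /completion body after checking that the prompt has exactly
    one <__media__> tag per entry in 'medias' (hypothetical helper)."""
    n_tags = prompt.count(MEDIA_MARKER)
    if n_tags != len(medias):
        raise ValueError(
            f"prompt has {n_tags} {MEDIA_MARKER} tag(s) but {len(medias)} media item(s)"
        )
    # sampling parameters (temperature, top_k, ...) are passed through unchanged
    return {"prompt": prompt, "medias": medias, **sampling}
```

The returned dict can be sent as the json= body of the requests.post call shown in the example.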

Comment on lines +4341 to +4359
```cpp
if (medias.is_array()) {
    for (auto & m : medias) {
        std::string type = json_value(m, "type", std::string());
        std::string data = json_value(m, "data", std::string());
        if (type.empty() || data.empty()) {
            continue;
        }
        if (type == "image_url" || type == "image" || type == "img") {
            if (!opt.allow_image) {
                throw std::runtime_error("image input is not supported - hint: if this is unexpected, you may need to provide the mmproj");
            }
            if (string_starts_with(data, "http")) {
                // download remote image
                common_remote_params params;
                params.headers.push_back("User-Agent: llama.cpp/" + build_info);
                params.max_size = 1024 * 1024 * 10; // 10MB
                params.timeout = 10; // seconds
                SRV_INF("downloading image from '%s'\n", data.c_str());
                auto res = common_remote_get_content(data, params);
```
Collaborator

Instead of duplicating this whole code block, extract it into a general function and reuse it in both /chat/completions and /completion (DRY principle).
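To illustrate the suggestion, here is a sketch in Python (the server itself is C++) of moving the per-item rules from the excerpt above into one shared helper. parse_medias and MediaItem are hypothetical names; the audio check is assumed to mirror the image check, since the excerpt only shows the image branch; downloading and base64 decoding are left out, so the helper only filters and classifies entries.

```python
from dataclasses import dataclass
from typing import Any, List

# Image aliases taken from the quoted excerpt; "audio" from the PR description.
IMAGE_TYPES = {"image_url", "image", "img"}
AUDIO_TYPES = {"audio"}


@dataclass
class MediaItem:
    kind: str        # normalized kind: "image" or "audio"
    data: str        # URL or base64 payload, not yet downloaded or decoded
    is_remote: bool  # True when data looks like a URL (starts with "http")


def parse_medias(medias: Any, allow_image: bool, allow_audio: bool) -> List[MediaItem]:
    """Hypothetical shared helper that both endpoint handlers would call."""
    items: List[MediaItem] = []
    if not isinstance(medias, list):
        return items  # a missing or non-array field yields no media
    for m in medias:
        mtype = m.get("type", "")
        data = m.get("data", "")
        if not mtype or not data:
            continue  # skip incomplete entries, as the excerpt does
        if mtype in IMAGE_TYPES:
            if not allow_image:
                raise RuntimeError("image input is not supported")
            kind = "image"
        elif mtype in AUDIO_TYPES:
            if not allow_audio:  # assumed to mirror the image check
                raise RuntimeError("audio input is not supported")
            kind = "audio"
        else:
            continue  # unknown types are ignored
        items.append(MediaItem(kind, data, data.startswith("http")))
    return items
```

Both the /completion handler and the /chat/completions handler (after converting their content parts into {type, data} pairs) would then call this single function instead of each carrying its own copy of the loop.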

Collaborator

ngxson commented Aug 22, 2025

Closing this; it is replaced by #15108.

ngxson closed this Aug 22, 2025