
Feature Request: Improve Ergonomics of llama-server #7619

@abidlabs

Description

Prerequisites

  • I am running the latest code. Mention the version if possible as well.
  • I carefully followed the README.md.
  • I searched using keywords relevant to my issue to make sure that I am creating a new issue that is not already open (or closed).
  • I reviewed the Discussions, and have a new and useful enhancement to share.

Feature Description

To help users get started with llama-server more easily, I'd like to be able to do something like this:

llama-server --hf-repo microsoft/Phi-3-mini-4k-instruct-gguf
curl http://localhost:8080/completion -d '{
  "messages": [
    { "role": "user", "content": "why is the sky blue?" }
  ]
}'

Right now, the setup is more complicated, and I'm wondering whether it can be simplified:

llama-server --hf-repo microsoft/Phi-3-mini-4k-instruct-gguf --hf-file Phi-3-mini-4k-instruct-q4.gguf -c 2048
curl --request POST \
    --url http://localhost:8080/completion \
    --header "Content-Type: application/json" \
    --data '{"prompt": "Building a website can be done in 10 simple steps:","n_predict": 128}'

Motivation

It'd be great if we could make getting started with llama-server easier and more welcoming for new users!

Possible Implementation

(1) Make the -c parameter optional? Maybe I'm misunderstanding, but I thought the context size is a function of the model, so it shouldn't need to be set explicitly (see the metadata sketch after this list).
(2) Probably harder, but make --hf-file optional and use the largest file that fits in the machine's RAM?
(3) Allow the endpoint to take messages in the standard OpenAI format? For example:

"messages": [
    { "role": "user", "content": "why is the sky blue?" }
  ]
