
Feature Request: Improve Ergonomics of llama-server #7619

@abidlabs

Description

Prerequisites

  • I am running the latest code. Mention the version if possible as well.
  • I carefully followed the README.md.
  • I searched using keywords relevant to my issue to make sure that I am creating a new issue that is not already open (or closed).
  • I reviewed the Discussions, and have a new and useful enhancement to share.

Feature Description

To help users get started with llama-server more easily, I'd like to be able to do something like this:

llama-server --hf-repo microsoft/Phi-3-mini-4k-instruct-gguf
curl http://localhost:8080/completion -d '{
  "messages": [
    { "role": "user", "content": "why is the sky blue?" }
  ]
}'

Right now, the setup is more complicated, and I'm wondering whether it can be simplified:

llama-server --hf-repo microsoft/Phi-3-mini-4k-instruct-gguf --hf-file Phi-3-mini-4k-instruct-q4.gguf -c 2048
curl --request POST \
    --url http://localhost:8080/completion \
    --header "Content-Type: application/json" \
    --data '{"prompt": "Building a website can be done in 10 simple steps:","n_predict": 128}'

Motivation

It'd be great if we could make getting started with llama-server easier and more welcoming for new users!

Possible Implementation

(1) Make the -c parameter optional? Maybe I'm misunderstanding, but I thought the context size is a function of the model, so it shouldn't need to be set explicitly (see the metadata sketch after this list).
(2) Probably harder, but make --hf-file optional and use the largest file that fits in the machine's RAM?
(3) Allow the endpoint to take messages in the standard OpenAI format? For example:

"messages": [
    { "role": "user", "content": "why is the sky blue?" }
  ]
