
Ability to use --hf-repo without --hf-file or -m #7504


Draft: ngxson wants to merge 1 commit into master

Conversation

ngxson
Collaborator

ngxson commented May 23, 2024

This PR introduces the ability to run a GGUF file without --hf-file or -m

How it works:

  1. Make an HTTP GET request to the Hub API: "https://huggingface.co/api/models/" << repo << "/tree/main?recursive=true"
  2. Search the returned file list and take the first file that matches (a sketch of this heuristic follows below):
  • q4_k_m ==> default to the q4_k_m quantization
  • q4 ==> if q4_k_m is not there, take any q4 (no matter if it is q4_0 or q4_k_s, ...)
  • 00001 ==> if no q4 is found, take the first shard
  • gguf ==> if no file matches any of the conditions above, just grab any .gguf we can find
  3. Once the path to the file is determined, download it via "https://huggingface.co/" << repo << "/resolve/main/" << file_path

As a result:
  • --hf-repo works without --hf-file or -m
  • the file is cached to {CACHE_DIRECTORY}/{repo}/{file_path}
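As a rough illustration, here is a minimal, self-contained sketch of the selection heuristic from step 2. The helper names (`pick_gguf_file`, `to_lower`, `ends_with`), the hard-coded file list, and the placeholder repo id are mine for illustration only; the actual PR works on the JSON tree returned by the Hub API. Lowercasing before matching is also my assumption, since quant names in filenames are often uppercase (e.g. Q4_K_M):

```cpp
// Sketch only: picks a GGUF file from a repo file listing using the
// priority order described above (q4_k_m > q4 > first shard > any .gguf).
#include <algorithm>
#include <cctype>
#include <iostream>
#include <string>
#include <vector>

static std::string to_lower(std::string s) {
    std::transform(s.begin(), s.end(), s.begin(),
                   [](unsigned char c) { return std::tolower(c); });
    return s;
}

static bool ends_with(const std::string & s, const std::string & suffix) {
    return s.size() >= suffix.size() &&
           s.compare(s.size() - suffix.size(), suffix.size(), suffix) == 0;
}

// files: paths from the Hub tree API (assumed already extracted from JSON)
static std::string pick_gguf_file(const std::vector<std::string> & files) {
    const std::vector<std::string> patterns = { "q4_k_m", "q4", "00001", ".gguf" };
    for (const auto & pat : patterns) {
        for (const auto & f : files) {
            const std::string fl = to_lower(f);
            if (ends_with(fl, ".gguf") && fl.find(pat) != std::string::npos) {
                return f; // first file matching the current priority level
            }
        }
    }
    return ""; // repo contains no GGUF file
}

int main() {
    // hypothetical repo contents
    const std::vector<std::string> files = {
        "README.md",
        "Model-Q8_0.gguf",
        "Model-Q4_K_S.gguf",
    };
    const std::string file_path = pick_gguf_file(files);
    if (!file_path.empty()) {
        const std::string repo = "user/model"; // placeholder repo id
        // step 3: build the download URL via the resolve endpoint
        std::cout << "https://huggingface.co/" << repo
                  << "/resolve/main/" << file_path << "\n";
    }
    return 0;
}
```

With something like this in place, passing --hf-repo alone would be enough to download and run a reasonable default quantization, consistent with the description above.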

CC @julien-c

@mofosyne added the Review Complexity : Low and enhancement labels May 24, 2024
@mofosyne
Collaborator

Should we really be tightly coupling llama.cpp to huggingface to that point?

@teleprint-me
Contributor

teleprint-me commented May 24, 2024

No, I don't think this is a good idea. Also, I'm in the middle of automating getting files from the API. The convert script will eventually do everything in a single shot. Making progress on this.

@ngxson
Collaborator Author

ngxson commented May 24, 2024

Most people download GGUF files (or safetensors to be converted to GGUF) from HF anyway, so think of this PR more as a "nice to have" feature. (So it's not really "tightly coupling" as you said - just something that you can deactivate if you don't want it.)

Secondly, this feature mostly exists to take advantage of the newly added "Use this model" feature on HF. While other programs like Jan or LM Studio only require the repo name, llama.cpp currently requires both the repo + file path (or a model name).

And lastly, my implementation may still not be the best way to do this. That's why this PR is a draft. Feedback & ideas are welcome.

@teleprint-me
Contributor

This is not a valid application of SoC. This is in the common CLI API. I would say this is coupled. All I can do is voice my opinion. In the end, it is not up to me. There are better ways to fetch the model from the API. I'm more than happy to support a C++ implementation, but not like this. The HF API should be separate from the llama.cpp API.

@mofosyne added the need feedback label May 24, 2024
@teleprint-me
Contributor

#6757 seems related.
