
Ability to use --hf-repo without --hf-file or -m #7504


Draft: ngxson wants to merge 1 commit into master

Conversation

ngxson
Collaborator

ngxson commented May 23, 2024

This PR introduces the ability to run a GGUF file without --hf-file or -m

How it works:

  1. Make an HTTP GET request to the Hub API: "https://huggingface.co/api/models/" << repo << "/tree/main?recursive=true"
  2. Search the returned file list and take the first file that matches (a sketch of this heuristic follows below):
  • q4_k_m ==> default to the q4_k_m quantization
  • q4 ==> if q4_k_m is not there, take any q4 (no matter if it is q4_0 or q4_k_s, ...)
  • 00001 ==> if no q4 is found, take the first shard
  • gguf ==> if no file matches any of the conditions above, just grab any .gguf we can find
  3. Once the path to the file is determined, download it via "https://huggingface.co/" << repo << "/resolve/main/" << file_path

As a result:
  • --hf-repo works without --hf-file or -m
  • the file is cached to {CACHE_DIRECTORY}/{repo}/{file_path}
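As a rough illustration, here is a minimal, self-contained sketch of the selection heuristic from step 2. The helper names (`pick_gguf_file`, `to_lower`, `ends_with`), the hard-coded file list, and the placeholder repo id are mine for illustration only; the actual PR works on the JSON tree returned by the Hub API. Lowercasing before matching is also my assumption, since quant names in filenames are often uppercase (e.g. Q4_K_M):

```cpp
// Sketch only: picks a GGUF file from a repo file listing using the
// priority order described above (q4_k_m > q4 > first shard > any .gguf).
#include <algorithm>
#include <cctype>
#include <iostream>
#include <string>
#include <vector>

static std::string to_lower(std::string s) {
    std::transform(s.begin(), s.end(), s.begin(),
                   [](unsigned char c) { return std::tolower(c); });
    return s;
}

static bool ends_with(const std::string & s, const std::string & suffix) {
    return s.size() >= suffix.size() &&
           s.compare(s.size() - suffix.size(), suffix.size(), suffix) == 0;
}

// files: paths from the Hub tree API (assumed already extracted from JSON)
static std::string pick_gguf_file(const std::vector<std::string> & files) {
    const std::vector<std::string> patterns = { "q4_k_m", "q4", "00001", ".gguf" };
    for (const auto & pat : patterns) {
        for (const auto & f : files) {
            const std::string fl = to_lower(f);
            if (ends_with(fl, ".gguf") && fl.find(pat) != std::string::npos) {
                return f; // first file matching the current priority level
            }
        }
    }
    return ""; // repo contains no GGUF file
}

int main() {
    // hypothetical repo contents
    const std::vector<std::string> files = {
        "README.md",
        "Model-Q8_0.gguf",
        "Model-Q4_K_S.gguf",
    };
    const std::string file_path = pick_gguf_file(files);
    if (!file_path.empty()) {
        const std::string repo = "user/model"; // placeholder repo id
        // step 3: build the download URL via the resolve endpoint
        std::cout << "https://huggingface.co/" << repo
                  << "/resolve/main/" << file_path << "\n";
    }
    return 0;
}
```

With something like this in place, passing --hf-repo alone would be enough to download and run a reasonable default quantization, consistent with the description above.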

CC @julien-c

@mofosyne added the Review Complexity : Low and enhancement labels May 24, 2024
@mofosyne
Collaborator

Should we really be tightly coupling llama.cpp to huggingface to that point?

@teleprint-me
Contributor

teleprint-me commented May 24, 2024

No, I don't think this is a good idea. Also, I'm in the middle of automating getting files from the API. The convert script will eventually do everything in a single shot. Making progress on this.

@ngxson
Collaborator Author

ngxson commented May 24, 2024

Most people download GGUF files (or safetensors to be converted to GGUF) from HF anyway, so think of this PR more as a "nice to have" feature. (So it's not really "tightly coupling" as you said - just something that you can deactivate if you don't want it.)

Secondly, this feature mostly exists to take advantage of the newly added "Use this model" feature on HF. While other programs like Jan or LM Studio only require the repo name, llama.cpp currently requires both the repo + file path (or a model name).

And lastly, my implementation may still not be the best way to do this. That's why this PR is a draft. Feedback & ideas are welcome.

@teleprint-me
Contributor

This is not a valid application of SoC. This is in the common CLI API. I would say this is coupled. All I can do is voice my opinion. In the end, it is not up to me. There are better ways to fetch the model from the API. I'm more than happy to support a C++ implementation, but not like this. The HF API should be separate from the llama.cpp API.

@mofosyne added the need feedback label May 24, 2024
@teleprint-me
Contributor

#6757 seems related.
