Can we run Infinity with GGUF quantized models? #613

Greatz08 · 2025-07-01T14:58:32Z

I would love to try a quantized GGUF model through Infinity, but I didn't find anything related to GGUF in the issues, discussions, or docs, so I'm asking here.

michaelfeil (Maintainer) · 2025-07-01T17:45:13Z

Sorry, GGUF models do not work well and are not common for embeddings / rerankers. The reason is that these operations are mostly compute-bound, not memory-bound like LLM generation.
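To make the compute-bound vs. memory-bound point concrete, here is a rough back-of-the-envelope sketch. It is not from the thread and is not part of Infinity: the function names, the `RIDGE_POINT` value, and the batch sizes are illustrative assumptions, and the model ignores activations, attention, and the KV cache. It only counts the FLOPs per byte of weights read for a single dense layer.

```python
def arithmetic_intensity(tokens_per_pass: int, bytes_per_weight: float) -> float:
    """FLOPs per byte of weights read for one dense layer.

    A (d_in x d_out) matmul over `tokens_per_pass` tokens costs about
    2 * tokens * d_in * d_out FLOPs and reads d_in * d_out * bytes_per_weight
    bytes of weights, so the layer dimensions cancel out.
    """
    return 2.0 * tokens_per_pass / bytes_per_weight


def is_compute_bound(intensity: float, ridge_point: float) -> bool:
    """True when the workload sits at or above the hardware's FLOPs/byte ridge."""
    return intensity >= ridge_point


# Hypothetical accelerator: peak FLOP/s divided by memory bandwidth (assumed value).
RIDGE_POINT = 100.0

FP16_BYTES = 2.0  # bytes per weight in fp16
Q4_BYTES = 0.5    # roughly 4 bits per weight, ignoring quantization scales

# LLM generation: one new token per forward pass at batch size 1.
decode_fp16 = arithmetic_intensity(1, FP16_BYTES)
decode_q4 = arithmetic_intensity(1, Q4_BYTES)

# Embedding / reranking: whole sequences in one pass (assumed 32 x 512 tokens).
embed_fp16 = arithmetic_intensity(32 * 512, FP16_BYTES)

print("decode fp16:", decode_fp16, "compute-bound:", is_compute_bound(decode_fp16, RIDGE_POINT))
print("decode q4:  ", decode_q4, "compute-bound:", is_compute_bound(decode_q4, RIDGE_POINT))
print("embed fp16: ", embed_fp16, "compute-bound:", is_compute_bound(embed_fp16, RIDGE_POINT))
```

Under these assumptions, single-token decoding stays far below the ridge point, so shrinking the weights with 4-bit quantization directly cuts the bytes that dominate its runtime. A batched encoder pass is already far above the ridge point, so smaller weights save little time there.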

1 reply

Greatz08 Jul 2, 2025
Author

@michaelfeil Can we use GGUF embedding and reranker models with Infinity? If so, that's the only thing I want.
