int8 experimental support #188
puppetm4st3r started this conversation in General
Replies: 1 comment
This should do it:

pip install infinity_emb[all]
infinity_emb --no-bettertransformer --model-name-or-path BAAI/bge-small-en-v1.5 --dtype int8
# is only slightly faster than
infinity_emb --no-bettertransformer --model-name-or-path BAAI/bge-small-en-v1.5 --dtype float16

Performance is not significantly faster, and the memory savings are small because of the batch size. Also, all weights are loaded in fp32 in any case.
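To illustrate what int8 weight quantization does in general (this is a minimal sketch, not infinity_emb's actual implementation — the function names `quantize_int8` and `dequantize` are made up for this example), the idea is to store each weight as an 8-bit integer plus a shared scale. Weights shrink roughly 4x versus fp32, but activations and batch buffers are unaffected, which is one reason the savings above are modest:

```python
# Symmetric int8 quantization sketch (illustrative only, not infinity_emb code).

def quantize_int8(weights):
    """Map a list of floats to int8 values in [-127, 127] plus a scale factor."""
    scale = max(abs(w) for w in weights) / 127 or 1.0
    q = [round(w / scale) for w in weights]
    return q, scale

def dequantize(q, scale):
    """Recover approximate float weights from int8 values and the scale."""
    return [v * scale for v in q]

weights = [0.8, -1.27, 0.004, 0.5]
q, scale = quantize_int8(weights)
restored = dequantize(q, scale)
# Each restored value is within one quantization step (the scale) of the original.
assert all(abs(a - b) <= scale for a, b in zip(weights, restored))
```

The small residual error per weight is the precision cost that int8 trades for memory; whether that also buys speed depends on whether the kernels actually run in int8, which is why the fp16 comparison above is so close.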
Hi! How can I test the experimental int8 support? I understand that int8 should work on any CUDA device — or is there a restriction on the GPU model? fp8 is for H100 and newer, as I understand.