Update: These fixes have been included in the `install.bat` file here: https://github.com/jllllll/one-click-installers
You should be able to just clone that repo and run `install.bat` there, without needing to run the code here.
A simple batch file that makes the oobabooga one-click installer compatible with LLaMA 4-bit models and able to run on CUDA.
- download and extract the Windows zip file from here: https://github.com/oobabooga/text-generation-webui/releases
- place `bandaid.bat` in the same folder as `install.bat`
- double click `install.bat` and let it run all the way through
- double click `bandaid.bat` and let it run all the way through (it will run `install.bat` again to fix some stuff that this hacky jank messes up. Don't worry, everything that's running is running from within the oobabooga folder. Worst case scenario, you delete everything and start from scratch)
- place your models in the `text-generation-webui\models` folder. The folder structure should look like this:
```
models\
|
|- model-name-4bit.pt
|- model-name\
   |- config.json
   |- generation_config.json
   |- pytorch_model.bin.index.json
   |- special_tokens_map.json
   |- tokenizer.model
   |- tokenizer_config.json
```
- make sure `tokenizer_config.json` says `"tokenizer_class": "LlamaTokenizer"` and not `"tokenizer_class": "LLaMATokenizer"` (a quick way to check this is shown after the list)
- double click on `start-webui.bat`
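If you'd rather check the tokenizer class from a command prompt than open the file by hand, a `findstr` one-liner like the following works. It assumes you run it from the folder containing `text-generation-webui`, and `model-name` is the placeholder from the tree above:

```bat
rem Print the tokenizer_class line so you can confirm it says "LlamaTokenizer"
rem and not "LLaMATokenizer". Replace model-name with your actual model folder.
findstr /C:"tokenizer_class" "text-generation-webui\models\model-name\tokenizer_config.json"
```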
And that's it! You should have a working install now. Just double click on `start-webui.bat`.
Note: Apparently some people are having trouble with Windows 11. You may have to manually edit your `start-webui.bat` file and change the line `call python server.py --auto-devices --cai-chat` to `call python server.py --auto-devices --cai-chat --gptq-bits 4 --gptq-model-type LLaMa` (shown below), and then double click `install.bat` and let it run all the way through one more time. If you are still getting CUDA errors, you are on your own. This is what worked for me. Good luck!
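For clarity, here is that edit spelled out. Only the `call python` line in `start-webui.bat` changes; both flags come straight from the note above:

```bat
rem Before:
call python server.py --auto-devices --cai-chat

rem After (adds the 4-bit GPTQ flags for LLaMA models):
call python server.py --auto-devices --cai-chat --gptq-bits 4 --gptq-model-type LLaMa
```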
Credit: I just slapped this .bat file together. Most of the hard work was done by the users in this thread: qwopqwop200/GPTQ-for-LLaMa#11 (comment)