Uploaded the source code, tokenizer model, and requirements.txt. Subsequent releases will include updates to the uploaded source code as well.
What's new?
- Refactored the spaghetti code into more modular scripts and uploaded them to the repository.
- Updated the llama-cpp-python dependency to 0.3.9 for compatibility with the latest LLMs that use the gpt4o pre-tokenizer.
- Changed the context size selection in the Config window to an input field, supporting context sizes of up to 999999 tokens (a usage sketch follows this list). The usable size still depends on how much VRAM (GPU-bound version) and RAM your system has, and response quality depends on whether the LLM is tuned/trained to handle that context size.
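As a rough sketch of how the updated dependency is used (the model path and parameter values below are placeholders, not this project's actual configuration), loading a GGUF model and setting the context size with llama-cpp-python looks like this:

```python
from llama_cpp import Llama

# Placeholder model path; any GGUF model using the gpt4o pre-tokenizer
# (supported as of llama-cpp-python 0.3.9) should load the same way.
llm = Llama(
    model_path="models/example-model.gguf",
    n_ctx=32768,  # context size; the Config window now accepts values up to 999999
)

output = llm("Q: What is the capital of France? A:", max_tokens=32)
print(output["choices"][0]["text"])
```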
CPU-bound version (Windows and Linux):
- Only use this if your computer does not have an NVIDIA GPU released in 2006 or later. It performs slower than the GPU-bound version.
GPU-bound version (Windows only):
- Use this if your computer has an NVIDIA GPU released in 2006 or later. It performs 10-20x faster than the CPU-bound version.
- You need to install the latest NVIDIA CUDA Toolkit from https://developer.nvidia.com/cuda-downloads to make use of your GPU's CUDA cores (otherwise only the CPU will be used and performance will match the CPU-bound version). Only the CUDA-related components need to be installed; see the GPU-offloading sketch after this list.
- Although the code is the same as the CPU-bound version, the file size is much larger because it was built with the NVIDIA CUDA Toolkit to enable use of NVIDIA CUDA cores for LLM inference.
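For reference, a minimal sketch of what GPU offloading looks like in llama-cpp-python when it has been built with CUDA support (the model path is a placeholder; n_gpu_layers is llama-cpp-python's standard offloading parameter, not a setting specific to this project):

```python
from llama_cpp import Llama

# With a CUDA-enabled build, n_gpu_layers controls how many transformer
# layers run on the GPU; -1 offloads all of them. On a CPU-only build
# this parameter has no effect and inference runs entirely on the CPU.
llm = Llama(
    model_path="models/example-model.gguf",  # placeholder path
    n_ctx=8192,
    n_gpu_layers=-1,
)
```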