
v2.0.3 (for Windows and Linux) + Source Code Upload

@rrrusst rrrusst released this 21 Jun 16:34
1d4c0ab

Uploaded the source code, tokenizer model, and requirements.txt. Subsequent releases will include updates to the uploaded source code as well.

What's new?

  1. Refactored spaghetti code into more modular scripts and uploaded them to the repository.
  2. Updated dependency (llama-cpp-python) to 0.3.9 to be compatible with the latest LLMs that use the gpt4o pre-tokenizer.
  3. Changed the context size selection in the Config window to an input field, supporting context sizes up to 999999 tokens. The usable size still depends on how much VRAM (GPU-bound version) and RAM your system has, and response quality depends on whether the LLM is tuned/trained to handle that context size.
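To get a feel for why large context sizes are VRAM/RAM-bound, the KV cache alone grows linearly with context length. The sketch below estimates its size using illustrative model parameters (the layer/head counts are hypothetical defaults, roughly an 8B-class model; substitute your model's actual values):

```python
# Rough KV-cache memory estimate for a chosen context size.
# Model parameters below are illustrative assumptions, not tied to any
# specific model shipped with this release.
def kv_cache_bytes(n_ctx, n_layers=32, n_kv_heads=8, head_dim=128, bytes_per_elem=2):
    # Two tensors (K and V) per layer, each of shape n_ctx x n_kv_heads x head_dim,
    # stored at bytes_per_elem (2 bytes = fp16).
    return 2 * n_layers * n_ctx * n_kv_heads * head_dim * bytes_per_elem

for n_ctx in (4096, 32768, 131072):
    gib = kv_cache_bytes(n_ctx) / 2**30
    print(f"n_ctx={n_ctx:>6}: ~{gib:.1f} GiB KV cache")
```

Under these assumptions, a 131072-token context already needs ~16 GiB for the KV cache on top of the model weights, which is why setting 999999 tokens will only work on machines with very large memory.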

CPU-bound version (Windows and Linux):

  1. Only use this if your computer does not have an NVIDIA GPU released in 2006 or later. Performs slower than the GPU-bound version.

GPU-bound version (Windows only):

  1. Use this if your computer has an NVIDIA GPU released in 2006 or later. Performs 10-20x or more faster than the CPU-bound version.
  2. You need to install the latest NVIDIA CUDA Toolkit from https://developer.nvidia.com/cuda-downloads to make use of your GPU's CUDA cores (otherwise performance will be the same as the CPU-bound version, since only the CPU will be used). You only need to install the CUDA-related components.
  3. Even though the code is the same as the CPU-bound version, the file size is much bigger because it was built with the NVIDIA CUDA Toolkit to enable use of NVIDIA CUDA cores for LLM inference.
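A quick way to confirm the toolkit and driver are actually visible before launching the GPU-bound build (this is a generic sanity check, not part of the release itself): `nvcc` ships with the CUDA Toolkit, while `nvidia-smi` comes from the NVIDIA driver.

```shell
# Check for the CUDA Toolkit compiler (installed via the CUDA Toolkit).
if command -v nvcc >/dev/null 2>&1; then
    nvcc --version | grep -i release
else
    echo "nvcc not found: CUDA Toolkit is not installed (or not on PATH)"
fi

# Check for the NVIDIA driver tools.
if command -v nvidia-smi >/dev/null 2>&1; then
    nvidia-smi --query-gpu=name,memory.total --format=csv,noheader
else
    echo "nvidia-smi not found: NVIDIA driver may not be installed"
fi
```

If either check falls into the "not found" branch, the GPU-bound build will silently fall back to CPU-only performance.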