Uploaded the source code, tokenizer model, and requirements.txt. Subsequent releases will include updates to the uploaded source code as well.
What's new?
- Refactored the spaghetti code into more modular scripts and uploaded them to the repository.
- Updated the llama-cpp-python dependency to 0.3.9 for compatibility with the latest LLMs that use the gpt4o pre-tokenizer.
- Changed the context size selection in the Config window to an input field, supporting context sizes of up to 999999 tokens (a usage sketch follows this list). The usable size still depends on how much VRAM (GPU-bound version) and RAM your system has, and response quality depends on whether the LLM is tuned/trained to handle that context size.
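As a rough sketch of how the updated dependency is used (the model path and parameter values below are placeholders, not this project's actual configuration), loading a GGUF model and setting the context size with llama-cpp-python looks like this:

```python
from llama_cpp import Llama

# Placeholder model path; any GGUF model using the gpt4o pre-tokenizer
# (supported as of llama-cpp-python 0.3.9) should load the same way.
llm = Llama(
    model_path="models/example-model.gguf",
    n_ctx=32768,  # context size; the Config window now accepts values up to 999999
)

output = llm("Q: What is the capital of France? A:", max_tokens=32)
print(output["choices"][0]["text"])
```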
CPU-bound version (Windows and Linux):
- Only use this if your computer does not have an NVIDIA GPU released in 2006 or later. It performs slower than the GPU-bound version.
GPU-bound version (Windows only):
- Use this if your computer has an NVIDIA GPU released in 2006 or later. It performs 10-20x faster than the CPU-bound version.
- You need to install the latest NVIDIA CUDA Toolkit from https://developer.nvidia.com/cuda-downloads to make use of your GPU's CUDA cores (otherwise only the CPU will be used and performance will match the CPU-bound version). Only the CUDA-related components need to be installed; see the GPU-offloading sketch after this list.
- Although the code is the same as the CPU-bound version, the file size is much larger because it was built with the NVIDIA CUDA Toolkit to enable use of NVIDIA CUDA cores for LLM inference.
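For reference, a minimal sketch of what GPU offloading looks like in llama-cpp-python when it has been built with CUDA support (the model path is a placeholder; n_gpu_layers is llama-cpp-python's standard offloading parameter, not a setting specific to this project):

```python
from llama_cpp import Llama

# With a CUDA-enabled build, n_gpu_layers controls how many transformer
# layers run on the GPU; -1 offloads all of them. On a CPU-only build
# this parameter has no effect and inference runs entirely on the CPU.
llm = Llama(
    model_path="models/example-model.gguf",  # placeholder path
    n_ctx=8192,
    n_gpu_layers=-1,
)
```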