v6.7.0 - LONG CONTEXT no see!
General Updates
- CITATIONS! Responses now include hyperlinked citations when searching the Vector DB.
- Displays a chat model's maximum context length and how many tokens you've used.
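The token-usage display can be sketched as follows. This is a minimal illustration, not the app's actual code; the whitespace split is a placeholder for the chat model's real tokenizer:

```python
def token_usage(text: str, max_context: int = 8192) -> str:
    """Report how much of a model's context window a prompt consumes.

    Placeholder tokenizer: splits on whitespace. A real implementation
    would count tokens with the chat model's own tokenizer instead.
    """
    used = len(text.split())
    return f"{used} / {max_context} tokens used"

print(token_usage("How do I create a vector database?"))
# → 7 / 8192 tokens used
```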
2X Speed Increase
Choose "half" in the database creation settings and the program will automatically select bfloat16 or float16 based on your GPU.
This results in a 2x speed increase with extremely low loss in quality.
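The automatic dtype selection can be sketched roughly as below. `pick_half_dtype` is a hypothetical helper, not the app's actual code; in practice the capability flag would come from a check such as `torch.cuda.is_bf16_supported()`:

```python
def pick_half_dtype(bf16_supported: bool) -> str:
    """Pick the half-precision dtype for the "half" setting.

    bfloat16 is preferred on GPUs that support it (e.g. Ampere and
    newer); older GPUs fall back to float16.
    """
    return "bfloat16" if bf16_supported else "float16"

# With PyTorch installed, the flag would come from the GPU itself:
#   bf16 = torch.cuda.is_bf16_supported()
print(pick_half_dtype(True))   # → bfloat16
print(pick_half_dtype(False))  # → float16
```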
Chat Models
Removed Internlm2_5 - 1.8b and Qwen 1.5 - 1.6b as underperforming.
Removed Dolphin-Llama 3 - 8b and Internlm2 - 20b as superseded.
Added Danube 3 - 4b with 8k context.
Added Phi 3.5 Mini - 4b with 8k context.
Added Hermes-4-Llama 3.1 - 8b with 8k context.
Added Internlm2_5 - 20b with 8k context.
The following models now have an 8192-token context:
| Model Name | Parameters (billion) | Context Length |
|---|---|---|
| Danube 3 - 4b | 4 | 8192 |
| Dolphin-Qwen 2 - 1.5b | 1.5 | 8192 |
| Phi 3.5 Mini - 4b | 4 | 8192 |
| Internlm2_5 - 7b | 7 | 8192 |
| Dolphin-Llama 3.1 - 8b | 8 | 8192 |
| Hermes-3-Llama-3.1 - 8b | 8 | 8192 |
| Dolphin-Qwen 2 - 7b | 7 | 8192 |
| Dolphin-Mistral-Nemo - 12b | 12 | 8192 |
| Internlm2_5 - 20b | 20 | 8192 |
Text to Speech Models
- Excited to add additional models to choose from when using `whisperspeech` as the text-to-speech backend. See the chart below for the various `s2a` and `t2s` model combinations and "relative" compute times, along with real VRAM usage stats.
Current Chat and Vision Models