LocalMind is a chat interface for running a local LLM with persistent memory, fully offline for privacy.
This project expands on a previous project, LLM Memorization. It automatically saves and summarizes your conversations in a local database so that relevant context is available in every chat.
It is built on top of easy-llama, a lightweight Python backend that makes it seamless to interact with local models via llama.cpp.
The dual-memory architecture allows the assistant to deliver contextually rich and coherent responses:
- Short-term memory captures recent exchanges, ensuring the assistant maintains immediate context and continuity in conversations.
- Long-term memory stores older conversations in a summarized form, allowing the assistant to recall past information without overwhelming the prompt size.
- Summaries (llm_output_summary) are generated once at insertion for efficiency and stored directly for fast prompt retrieval.
- Supports advanced retrieval methods including vector search and keyword filtering to access relevant long-term memories.
- Conversations are automatically saved in the database.
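As a rough illustration of how the two memory tiers could be combined into a single prompt, here is a minimal sketch. The function name `build_prompt`, its parameters, and the prompt layout are assumptions made for this example; they are not taken from the project's code.

```python
from collections import deque


def build_prompt(user_input, recent_exchanges, long_term_summaries, max_recent=3):
    """Assemble a prompt from both memory tiers (hypothetical layout)."""
    parts = []
    # Long-term memory: older conversations, included only as short summaries.
    if long_term_summaries:
        parts.append("Relevant past summaries:")
        parts.extend(f"- {summary}" for summary in long_term_summaries)
    # Short-term memory: only the last `max_recent` exchanges, kept verbatim.
    recent = list(recent_exchanges)[-max_recent:]
    for user_msg, llm_msg in recent:
        parts.append(f"User: {user_msg}")
        parts.append(f"Assistant: {llm_msg}")
    parts.append(f"User: {user_input}")
    return "\n".join(parts)


# A bounded deque is one simple way to hold recent exchanges.
short_term = deque(maxlen=10)
short_term.append(("What is a ketone?", "A carbonyl compound with two carbon substituents."))
short_term.append(("And an aldehyde?", "A carbonyl compound with a hydrogen on the carbonyl carbon."))

prompt = build_prompt(
    "How do I tell them apart?",
    short_term,
    ["Earlier chat: user asked about carbonyl chemistry basics."],
    max_recent=1,
)
print(prompt)
```

With `max_recent=1`, only the most recent exchange is included verbatim, while the older context reaches the model through the summary line. This is the trade-off between context and prompt size described above.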
Context-specific memory management:
- Each workflow represents a distinct context/project within the assistant, with its own long-term memory, maintaining relevant context for each specific use case.
- Adding, editing, or deleting workflows directly affects the conversations stored in the database (specific memory can be shared or deleted).
- Supports ephemeral mode, where conversations are not saved, for temporary testing or sensitive queries.
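The workflow and ephemeral-mode behavior described above can be sketched with SQLite as follows. The `workflow` column, the helper names, and the `ephemeral` flag are hypothetical and only illustrate the idea; the project's real schema and functions may differ.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE conversations ("
    "id INTEGER PRIMARY KEY, workflow TEXT, user_input TEXT, llm_output TEXT)"
)


def save_exchange(conn, workflow, user_input, llm_output, ephemeral=False):
    """Persist one exchange unless ephemeral mode is on. Returns True if saved."""
    if ephemeral:
        return False  # Nothing is written in ephemeral mode.
    conn.execute(
        "INSERT INTO conversations (workflow, user_input, llm_output) VALUES (?, ?, ?)",
        (workflow, user_input, llm_output),
    )
    return True


def workflow_memory(conn, workflow):
    """Return only the exchanges that belong to one workflow."""
    rows = conn.execute(
        "SELECT user_input, llm_output FROM conversations WHERE workflow = ? ORDER BY id",
        (workflow,),
    )
    return rows.fetchall()


def delete_workflow(conn, workflow):
    """Deleting a workflow also removes its stored conversations."""
    cur = conn.execute("DELETE FROM conversations WHERE workflow = ?", (workflow,))
    return cur.rowcount


save_exchange(conn, "chemistry", "What is pKa?", "A measure of acid strength.")
save_exchange(conn, "cooking", "How long to boil an egg?", "About 9 minutes for hard-boiled.")
save_exchange(conn, "chemistry", "Sensitive question", "Answer", ephemeral=True)

print(workflow_memory(conn, "chemistry"))
```

Each workflow only sees its own rows, and the ephemeral exchange never reaches the database.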
This assistant is designed to be fully customizable by the user.
- Tune the number of recent exchanges included in the prompt to balance context and model input size.
- Adjust the number and minimum relevance of the long-term memories retrieved.
- Enable or disable reasoning mode if the model supports it, and choose whether to show the thinking block.
- Configure model parameters directly from the settings menu for flexibility and experimentation.
- Swap the local model by editing model_path in resources/config.json, making it easy to experiment with different .gguf models.
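Editing the file by hand is enough, but the same change can also be scripted. The sketch below assumes only that the config is a JSON object with a `model_path` key, as stated above; the helper name `set_model_path` is hypothetical, and the demo writes to a temporary file rather than the real resources/config.json.

```python
import json
import tempfile
from pathlib import Path


def set_model_path(config_file, new_model_path):
    """Update the model_path entry of a JSON config, leaving other keys untouched."""
    config_file = Path(config_file)
    config = json.loads(config_file.read_text(encoding="utf-8"))
    config["model_path"] = new_model_path
    config_file.write_text(json.dumps(config, indent=2), encoding="utf-8")
    return config


# Demo on a throwaway copy so the real config is not modified.
with tempfile.TemporaryDirectory() as tmp:
    demo_config = Path(tmp) / "config.json"
    demo_config.write_text(
        json.dumps({"model_path": "resources/models/old/old.gguf"}),
        encoding="utf-8",
    )
    updated = set_model_path(demo_config, "resources/models/model_name/model_name.gguf")
    reloaded = json.loads(demo_config.read_text(encoding="utf-8"))

print(updated["model_path"])
```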
Use the Info button to open a detailed window displaying:
- Recent exchanges and older conversations.
- Summaries (llm_output_summary) stored in the database.
- Keywords from your original input and from the transitory generated prompt.
- A heatmap correlation graph, showing semantic similarity in the transitory prompt.
- Exchanges (user_input and llm_output) used as contexts, sorted by their similarity_score.
- General information about your memory database.
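To make the similarity_score ordering concrete, here is a minimal sketch of ranking stored exchanges against a query embedding. Cosine similarity, the `top_k` and `min_score` parameters, and the toy 3-dimensional embeddings are assumptions for illustration; the project may compute and filter scores differently.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def rank_exchanges(query_vec, exchanges, top_k=2, min_score=0.5):
    """Keep exchanges above a relevance threshold, sorted by descending score."""
    scored = []
    for exchange in exchanges:
        score = cosine_similarity(query_vec, exchange["embedding"])
        if score >= min_score:
            scored.append({**exchange, "similarity_score": score})
    scored.sort(key=lambda e: e["similarity_score"], reverse=True)
    return scored[:top_k]


# Toy embeddings; real ones would come from an embedding model.
exchanges = [
    {"user_input": "Explain SN2 reactions", "embedding": [1.0, 0.0, 0.0]},
    {"user_input": "Best pasta recipe?", "embedding": [0.0, 1.0, 0.0]},
    {"user_input": "What is a nucleophile?", "embedding": [0.8, 0.6, 0.0]},
]

ranked = rank_exchanges([1.0, 0.0, 0.0], exchanges)
for exchange in ranked:
    print(exchange["user_input"], round(exchange["similarity_score"], 2))
```

Here `top_k` and `min_score` play the role of the "number" and "minimum relevance" settings mentioned in the customization options.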
Due to technical limitations, this project and its benchmark were developed using a small 0.6B parameter model (Qwen3-0.6B).
This project has been designed to work with any local model quantized in a .gguf format.
I would be very interested in results obtained with larger models.
- scripts/gui.py: Chat UI & settings
- scripts/llm_executor.py: LLM response generation
- scripts/main.py: Memory, summarization, DB logic
- datas/conversations_example.db: An example database containing exchanges about high-level chemistry (En/Fr) using multiple models.
  - conversations: inputs, outputs, summaries, timestamps
  - vectors: keywords, embeddings
  - conversation_vectors: full user input embeddings
  - hash_index: duplicate detection
- resources/config.json: Paths, models, memory parameters
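For orientation, the sketch below shows how a conversations table and a hash_index table could work together for duplicate detection. Only the table names and the user_input, llm_output, and llm_output_summary fields come from this README; the remaining column names, the SHA-256 hashing, and the `insert_if_new` helper are assumptions. The vectors and conversation_vectors tables are omitted for brevity.

```python
import hashlib
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE conversations (
        id INTEGER PRIMARY KEY,
        user_input TEXT NOT NULL,
        llm_output TEXT NOT NULL,
        llm_output_summary TEXT,
        timestamp TEXT DEFAULT CURRENT_TIMESTAMP
    );
    CREATE TABLE hash_index (
        hash TEXT PRIMARY KEY,
        conversation_id INTEGER REFERENCES conversations(id)
    );
    """
)


def insert_if_new(conn, user_input, llm_output, summary):
    """Insert an exchange unless an identical one is already stored.

    Returns the new row id, or None when the exchange is a duplicate.
    """
    digest = hashlib.sha256(f"{user_input}\x00{llm_output}".encode("utf-8")).hexdigest()
    if conn.execute("SELECT 1 FROM hash_index WHERE hash = ?", (digest,)).fetchone():
        return None
    cur = conn.execute(
        "INSERT INTO conversations (user_input, llm_output, llm_output_summary) "
        "VALUES (?, ?, ?)",
        (user_input, llm_output, summary),
    )
    conn.execute(
        "INSERT INTO hash_index (hash, conversation_id) VALUES (?, ?)",
        (digest, cur.lastrowid),
    )
    return cur.lastrowid


first = insert_if_new(conn, "What is pKa?", "A measure of acid strength.", "pKa definition")
second = insert_if_new(conn, "What is pKa?", "A measure of acid strength.", "pKa definition")
count = conn.execute("SELECT COUNT(*) FROM conversations").fetchone()[0]
print(first, second, count)
```

The summary is stored at insertion time, in line with the note that summaries are generated once and then read back directly.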
- Clone the repo
    git clone https://github.com/victorcarre6/llm-assistant
    cd llm-assistant
- Create a virtual environment
    python3 -m venv .venv
    source .venv/bin/activate # macOS/Linux
    .venv\Scripts\activate # Windows
- Install dependencies
    pip install -r requirements.txt
- Download models
  - Summarization (HuggingFace):
      python -m spacy download fr_core_news_lg
      python -m spacy download en_core_web_lg
  - Local LLM (.GGUF):
    Set path in resources/config.json:
      "model_path": "resources/models/model_name/model_name.gguf"
    For initial testing, I recommend Qwen3-0.6B, as it was the model used to develop this project.
- Run the assistant
    python scripts/gui.py

- Profile import and export, to selectively exchange memory between users
- Agentic mode:
  - Document integration
  - Web search
This project has been made (tremendously!) easier thanks to easy-llama.
Contributions are welcome! Feel free to reach out with issues or suggestions. You can support my work on ko-fi.
MIT — free and open source.




