A demo chatbot that uses Aquiles-RAG as a Retrieval-Augmented Generation (RAG) server to index documents and answer queries by combining semantic search with GPT-4.1.
```
AI-Server/
├── context.py       # RAG logic: query optimization, indexing, and response streaming
├── main.py          # FastAPI server: HTTP routes and WebSocket endpoint
├── utils.py         # libSQL/Turso client and database helpers
├── templates/       # Jinja2 HTML templates
│   ├── home.html    # Chat interface
│   └── upload.html  # Document upload form
├── static/          # Static assets (JS, CSS, images)
├── .env             # Environment variables
└── README.md        # This document
```
- Python 3.9+
- FastAPI — API routing and async orchestration.
- Aquiles-RAG — Async client for vector indexing and semantic search.
- OpenAI API — `text-embedding-3-small` for embeddings; `gpt-4.1` for answer generation.
- libSQL / Turso.tech — Cloud-hosted SQLite for persisting metadata.
- Jinja2 — Dynamic HTML templating.
- Platformdirs — Locates a user data directory for storing uploads.
- python-dotenv — Loads `.env` variables.
Create a `.env` file at the project root with:
```env
# OpenAI
OPENAI_API_KEY=your_openai_api_key

# Aquiles-RAG
URL_RAG=http://localhost:8001   # Your Aquiles-RAG server URL
API_KEY_RAG=your_rag_api_key    # Your Aquiles-RAG API key
INDEX_RAG=docs                  # RAG index name

# Turso / libSQL
URL=your_turso_or_sqlite_url
TOKEN=your_turso_auth_token
```
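These variables are loaded at startup via python-dotenv. A minimal sketch of that loading step (the fallback defaults shown here are ours; the variable names come from the block above):

```python
import os

try:
    from dotenv import load_dotenv  # provided by the python-dotenv package
    load_dotenv()  # reads .env from the working directory
except ImportError:
    pass  # fall back to variables already set in the environment

OPENAI_API_KEY = os.getenv("OPENAI_API_KEY")
URL_RAG = os.getenv("URL_RAG", "http://localhost:8001")
API_KEY_RAG = os.getenv("API_KEY_RAG")
INDEX_RAG = os.getenv("INDEX_RAG", "docs")
```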
1. Clone the repo

   ```shell
   git clone https://github.com/Aquiles-ai/aquiles-chat-demo.git
   cd aquiles-chat-demo
   ```

2. Install dependencies

   ```shell
   pip install -r requirements.txt
   ```

3. Initialize the database

   ```shell
   python utils.py  # Creates the `docs` table if it doesn't exist
   ```

4. Start the server

   ```shell
   uvicorn main:app --reload --host 0.0.0.0 --port 5600
   ```

5. Access the UI

   - Chat interface → http://localhost:5600/home
   - Upload page → http://localhost:5600/upload
Project UI in action:
Upload a document and index it in Aquiles-RAG.
- Form Data
  - `file` (UploadFile): `.pdf`, `.xlsx`, `.xls`, or `.docx`
  - `type_doc` (str): `"pdf"`, `"excel"`, or `"word"`
- Success Response

  ```json
  { "state": "success" }
  ```

- Error Response

  ```json
  { "error": "detailed error message" }
  ```
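The `type_doc` field must agree with the uploaded file's extension. One way to derive it client-side (this helper is ours, not part of the repo):

```python
from pathlib import Path

# type_doc values accepted by the upload endpoint, keyed by file extension.
EXT_TO_TYPE = {
    ".pdf": "pdf",
    ".xlsx": "excel",
    ".xls": "excel",
    ".docx": "word",
}

def doc_type_for(filename: str) -> str:
    """Return the matching `type_doc` form value, or raise ValueError."""
    ext = Path(filename).suffix.lower()
    if ext not in EXT_TO_TYPE:
        raise ValueError(f"unsupported file type: {ext!r}")
    return EXT_TO_TYPE[ext]
```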
Retrieve a list of all indexed documents.
- Response

  ```json
  {
    "docs": [
      {
        "id": 1,
        "path": "/.../20250806123456_doc.pdf",
        "doc_type": "pdf",
        "created_at": "2025-08-06T12:34:56"
      },
      …
    ]
  }
  ```
Real-time streaming for RAG + GPT-4.1 responses.
- Client sends JSON:

  ```json
  { "query": "What is RAG?", "top_k": 5 }
  ```

- Server streams back text chunks as they are generated.
- Connection closes when the answer is complete.
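The wire format is plain JSON in and raw text chunks out; a small sketch of the client side of that exchange (the `accumulate` helper is ours, for illustration only):

```python
import json

# Message the browser sends when the user submits a question.
request = json.dumps({"query": "What is RAG?", "top_k": 5})

def accumulate(chunks):
    """Join streamed text chunks into the final answer as they arrive."""
    return "".join(chunks)
```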
- GET `/home` → Chat UI
- GET `/upload` → Document upload form
1. Indexing (`RAGIndexer` in `context.py`):
   - Extract text from PDF/Excel/Word.
   - Generate embeddings.
   - Send chunks to Aquiles-RAG.
   - Store metadata in Turso’s `docs` table.
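The extract-and-chunk step can be pictured with a simple overlapping window splitter (the sizes here are illustrative; the real splitting logic lives in `RAGIndexer`):

```python
def chunk_text(text: str, size: int = 800, overlap: int = 100) -> list[str]:
    """Split extracted text into overlapping windows ready for embedding."""
    if size <= overlap:
        raise ValueError("size must be larger than overlap")
    chunks, start = [], 0
    while start < len(text):
        chunks.append(text[start:start + size])
        start += size - overlap  # step forward, keeping `overlap` chars shared
    return chunks
```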
2. Query & Response (`RAGPipeline` in `context.py`):
   - Optimize the original query into 3–5 concise variants.
   - Fetch `top_k` chunks per variant from Aquiles-RAG.
   - Merge and sort the highest-scoring chunks.
   - Send context + user query to GPT-4.1 with streaming.
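The merge-and-sort step amounts to a score-ordered de-duplication across variants; a sketch under assumed names (the function and the `id`/`score` fields are ours — Aquiles-RAG's actual response schema may differ):

```python
def merge_chunks(result_lists, top_k=5):
    """Merge retrieval results from several query variants.

    Each result is a dict with at least 'id' and 'score'; when the same
    chunk appears under multiple variants, the highest score wins.
    """
    best = {}
    for results in result_lists:
        for r in results:
            prev = best.get(r["id"])
            if prev is None or r["score"] > prev["score"]:
                best[r["id"]] = r
    # Rank the surviving chunks by score, best first, and keep top_k.
    ranked = sorted(best.values(), key=lambda r: r["score"], reverse=True)
    return ranked[:top_k]
```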
3. Database (`utils.py`):
   - Manage connections and CRUD operations on Turso (SQLite cloud).
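The `docs` table mirrors the fields returned by the listing endpoint; a local-SQLite sketch of the schema (Turso speaks the same SQL over libSQL, and the column names are taken from the JSON response shown above):

```python
import sqlite3

# Stand-in for the Turso/libSQL connection used by utils.py.
conn = sqlite3.connect(":memory:")
conn.execute(
    """CREATE TABLE IF NOT EXISTS docs (
           id INTEGER PRIMARY KEY AUTOINCREMENT,
           path TEXT NOT NULL,
           doc_type TEXT NOT NULL,
           created_at TEXT DEFAULT CURRENT_TIMESTAMP
       )"""
)
conn.execute(
    "INSERT INTO docs (path, doc_type) VALUES (?, ?)",
    ("/tmp/20250806123456_doc.pdf", "pdf"),
)
rows = conn.execute("SELECT id, path, doc_type FROM docs").fetchall()
```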
- Upload a PDF containing your project documentation.
- Ask in the chat: “How do I configure FastAPI?”
- Receive a step-by-step answer, grounded in your PDF content.
- Deployment: Docker, Vercel, PythonAnywhere.
- Authentication: JWT, OAuth.
- Monitoring Dashboard: Usage metrics & index health.
- Additional Formats: Support TXT, Markdown, HTML.
- User Interface: Allow browsing/upload history and credits.
