Nextwork RAG API

A Retrieval-Augmented Generation (RAG) API built with FastAPI, ChromaDB, and Ollama. This API allows you to add documents to a knowledge base and query them using AI-powered responses.

Features

Add Knowledge: Dynamically add text content to the knowledge base
Query Knowledge: Ask questions and get AI-generated answers based on the stored knowledge
Persistent Storage: Uses ChromaDB for persistent vector storage
AI Integration: Uses Ollama for generating contextual answers

Prerequisites

Python 3.11, 3.12, or 3.13
Ollama installed and running
The tinyllama model installed in Ollama (or set OLLAMA_MODEL to a different model)

Note: ChromaDB does not yet support Python 3.14. Use Python 3.13 or earlier.

Installing Ollama and the Model

Install Ollama from https://ollama.ai/
Pull the tinyllama model:
```
ollama pull tinyllama
```

Setup

Clone the repository (if applicable) or navigate to the project directory

Create a virtual environment (using Python 3.13):

python3.13 -m venv venv
source venv/bin/activate  # On Windows: venv\Scripts\activate

Install dependencies:
```
pip install -r requirements.txt
```
Embed initial documents (optional):
```
python embed.py k8s.txt
```
Or embed any text file:
```
python embed.py your_file.txt
```
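
How embed.py splits a file before storing it is not described in this README. If you prefer to pre-split a long text file yourself, the following is a minimal sketch under that assumption: `split_into_chunks` and `max_chars` are hypothetical names, and the paragraph-based strategy is only one possible choice, not the project's method.

```python
def split_into_chunks(text, max_chars=500):
    """Group blank-line-separated paragraphs into chunks of roughly max_chars.

    A single paragraph longer than max_chars is kept whole rather than cut.
    """
    chunks, current = [], ""
    for paragraph in (p.strip() for p in text.split("\n\n")):
        if not paragraph:
            continue  # skip empty paragraphs caused by extra blank lines
        candidate = f"{current}\n\n{paragraph}" if current else paragraph
        if len(candidate) <= max_chars or not current:
            current = candidate
        else:
            # The next paragraph would overflow, so close the current chunk.
            chunks.append(current)
            current = paragraph
    if current:
        chunks.append(current)
    return chunks
```

Each returned chunk could then be written to its own file for embed.py or sent as the "text" field of a POST /add request.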

Running the API

Option 1: Local Development

Start the FastAPI server:

uvicorn app:app --reload

The API will be available at http://localhost:8000

Option 2: Docker (Recommended)

Using Pre-built Image from Docker Hub

The image is available on Docker Hub as zag23/rag-app:latest:

# Pull the image
docker pull zag23/rag-app:latest

# Run the container
docker run -d -p 8000:8000 --name rag-app zag23/rag-app

Important Notes for Docker:

The container expects Ollama to be running on the host machine
On Mac/Windows: The container will automatically connect to host.docker.internal:11434

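On Linux: host.docker.internal is typically not resolvable on a plain Docker Engine install, so you may need to set OLLAMA_HOST to your host's IP address (replace <host-ip> with that address):

docker run -d -p 8000:8000 -e OLLAMA_HOST=<host-ip>:11434 --name rag-app zag23/rag-app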

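Or use --network host, which lets the container reach Ollama at localhost:11434 (no -p mapping is needed in this mode):

docker run -d --network host -e OLLAMA_HOST=localhost:11434 --name rag-app zag23/rag-app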

Building from Source

To build the Docker image locally:

docker build -t zag23/rag-app .
docker run -d -p 8000:8000 --name rag-app zag23/rag-app

API Documentation

Once the server is running, you can access:

Interactive API docs: http://localhost:8000/docs
Alternative docs: http://localhost:8000/redoc

API Endpoints

`GET /`

Health check endpoint.

Response:

{
  "status": "ok",
  "message": "Nextwork RAG API is running"
}

`POST /add`

Add new content to the knowledge base.

Request Body:

{
  "text": "Your content here..."
}

Response:

{
  "status": "success",
  "message": "Content added to knowledge base",
  "id": "uuid-here"
}

`POST /query`

Query the knowledge base and get an AI-generated answer.

Request Body:

{
  "q": "What is Kubernetes?",
  "n_results": 1,
  "include_scores": false,
  "use_best_only": true
}

Parameters:

q (required): The question to search for
n_results (optional, default: 1): Number of results to retrieve (1-10)
include_scores (optional, default: false): Include relevance scores in response
use_best_only (optional, default: true): If true, only use best result for AI answer; if false, combine all results
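
The parameter rules above can be checked on the client before a request is sent. This is a minimal sketch, not part of the project: `build_query_payload` is a hypothetical helper, and its 1-10 range check mirrors the documented limit rather than the server's actual validation code.

```python
def build_query_payload(q, n_results=1, include_scores=False, use_best_only=True):
    """Return a JSON-serializable body for POST /query using the documented defaults."""
    if not isinstance(q, str) or not q.strip():
        raise ValueError("q is required and must be a non-empty string")
    # bool is a subclass of int in Python, so reject it explicitly.
    if isinstance(n_results, bool) or not isinstance(n_results, int) or not 1 <= n_results <= 10:
        raise ValueError("n_results must be an integer between 1 and 10")
    return {
        "q": q,
        "n_results": n_results,
        "include_scores": bool(include_scores),
        "use_best_only": bool(use_best_only),
    }
```

The returned dictionary can be passed as the JSON body of the request, for example via the `json=` argument of `requests.post`.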

Response (basic):

{
  "answer": "Kubernetes is a container orchestration platform...",
  "results_count": 1
}

Response (with scores and multiple results):

{
  "answer": "Kubernetes is a container orchestration platform...",
  "results_count": 2,
  "results": [
    {
      "id": "doc-id-1",
      "text": "Kubernetes is a container orchestration...",
      "relevance_score": 0.9234,
      "distance": 0.0832
    },
    {
      "id": "doc-id-2",
      "text": "Kubernetes helps manage containers...",
      "relevance_score": 0.8567,
      "distance": 0.1673
    }
  ]
}
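
A response of this shape can be post-processed on the client. The sketch below orders the returned documents by distance; `rank_results` is a hypothetical helper, and it assumes that a smaller distance means a closer match, which is consistent with the sample values above but not stated explicitly.

```python
import json


def rank_results(response_body):
    """Return (id, relevance_score, distance) tuples, smallest distance first."""
    data = json.loads(response_body)
    # The basic response has no "results" key, so fall back to an empty list.
    results = data.get("results", [])
    ranked = sorted(results, key=lambda item: item["distance"])
    return [(item["id"], item["relevance_score"], item["distance"]) for item in ranked]


# Sample body built from the example values in this README.
sample = """
{
  "answer": "Kubernetes is a container orchestration platform...",
  "results": [
    {"id": "doc-id-2", "text": "Kubernetes helps manage containers...",
     "relevance_score": 0.8567, "distance": 0.1673},
    {"id": "doc-id-1", "text": "Kubernetes is a container orchestration...",
     "relevance_score": 0.9234, "distance": 0.0832}
  ]
}
"""

for doc_id, score, distance in rank_results(sample):
    print(f"{doc_id}: score={score}, distance={distance}")
```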

`DELETE /delete/{doc_id}`

Delete a document from the knowledge base by its ID.

Path Parameters:

doc_id: The unique ID of the document to delete (returned when adding a document)

Response:

{
  "status": "success",
  "message": "Document 'uuid-here' deleted successfully",
  "id": "uuid-here"
}

Error Responses:

404: Document not found
400: Invalid document ID
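
A client may want to treat the two documented error codes as ordinary outcomes rather than exceptions. The following is a sketch using only the standard library; `delete_document` and its `opener` parameter are hypothetical, with `opener` present only so the function can be exercised without a running server.

```python
import json
import urllib.error
import urllib.parse
import urllib.request


def delete_document(doc_id, base_url="http://localhost:8000", opener=urllib.request.urlopen):
    """Send DELETE /delete/{doc_id} and return (ok, detail)."""
    # Percent-encode the ID so unusual characters cannot change the path.
    url = f"{base_url}/delete/{urllib.parse.quote(doc_id, safe='')}"
    request = urllib.request.Request(url, method="DELETE")
    try:
        with opener(request) as response:
            return True, json.loads(response.read().decode("utf-8"))
    except urllib.error.HTTPError as err:
        if err.code == 404:
            return False, "document not found"
        if err.code == 400:
            return False, "invalid document ID"
        raise  # any other status is unexpected, so let the caller see it
```

With the server running, `delete_document("your-document-id-here")` would return either `(True, <response JSON>)` or `(False, <reason>)`.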

Usage Examples

Using cURL

Add content:

curl -X POST "http://localhost:8000/add" \
  -H "Content-Type: application/json" \
  -d '{"text": "FastAPI is a modern web framework for building APIs with Python."}'

Query (basic):

curl -X POST "http://localhost:8000/query" \
  -H "Content-Type: application/json" \
  -d '{"q": "What is FastAPI?"}'

Query (with multiple results and scores):

curl -X POST "http://localhost:8000/query" \
  -H "Content-Type: application/json" \
  -d '{
    "q": "What is FastAPI?",
    "n_results": 3,
    "include_scores": true,
    "use_best_only": false
  }'

Delete document:

curl -X DELETE "http://localhost:8000/delete/your-document-id-here"

Note: The API expects a JSON request body. Do not combine the -G flag (which moves the data into the URL query string) with --data-urlencode; the request will fail. Always send JSON payloads with -H "Content-Type: application/json" and -d.

Using Python

import requests

# Add content
response = requests.post(
    "http://localhost:8000/add",
    json={"text": "Your content here"}
)
print(response.json())

# Query (basic)
response = requests.post(
    "http://localhost:8000/query",
    json={"q": "Your question here"}
)
print(response.json()["answer"])

# Query (with multiple results and scores)
response = requests.post(
    "http://localhost:8000/query",
    json={
        "q": "Your question here",
        "n_results": 3,
        "include_scores": True,
        "use_best_only": False
    }
)
data = response.json()
print(f"Answer: {data['answer']}")
print(f"Found {data['results_count']} results")
for i, result in enumerate(data.get('results', []), 1):
    print(f"  Result {i} (score: {result.get('relevance_score', 'N/A')}): {result['text'][:100]}...")

# Delete document
doc_id = "your-document-id-here"
response = requests.delete(f"http://localhost:8000/delete/{doc_id}")
print(response.json())

Project Structure

nextwork-rag-api/
├── app.py              # FastAPI application with RAG endpoints
├── embed.py            # Script to embed documents into ChromaDB
├── Dockerfile          # Docker configuration for containerized deployment
├── requirements.txt    # Python dependencies
├── README.md          # This file
├── test_connection.py  # Test script to verify Ollama and ChromaDB connections
├── .gitignore         # Git ignore rules
├── db/                # ChromaDB database files (auto-generated)
└── k8s.txt           # Example text file for embedding

Configuration

All configuration is done via environment variables (see Environment Variables); no code changes are needed.

Database Path: The ChromaDB database is stored in ./db by default (override with CHROMA_DB_PATH)
Ollama Model: Default is tinyllama (override with OLLAMA_MODEL)
Collection Name: Default is "docs" (override with CHROMA_COLLECTION_NAME)
Ollama Host:
- Local development: Defaults to localhost:11434
- Docker: Set via OLLAMA_HOST environment variable (defaults to host.docker.internal:11434)
- Kubernetes/Minikube: Use host.docker.internal:11434 with hostNetwork: true for accessing host machine's Ollama
- The code automatically strips http:// or https:// prefixes if present
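
The prefix stripping mentioned in the last bullet could look roughly like the sketch below. This is an illustration, not the code from app.py: `normalize_ollama_host` is a hypothetical name, and removing a trailing slash is an extra assumption beyond what is described here.

```python
def normalize_ollama_host(value, default="localhost:11434"):
    """Return OLLAMA_HOST in hostname:port form, dropping any http(s):// prefix."""
    host = (value or "").strip()
    for prefix in ("http://", "https://"):
        if host.lower().startswith(prefix):
            host = host[len(prefix):]
            break
    # A trailing slash can be left over when a full URL was supplied (assumption).
    host = host.rstrip("/")
    return host or default


print(normalize_ollama_host("http://host.docker.internal:11434"))
```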

Environment Variables

OLLAMA_HOST: Ollama server address in hostname:port format (e.g., localhost:11434 or host.docker.internal:11434)
- Default: localhost:11434
- Note: The Ollama Python client expects hostname:port format, not a full URL. Protocol prefixes are automatically removed.
OLLAMA_MODEL: The Ollama model to use for generating answers
- Default: tinyllama
- Example: export OLLAMA_MODEL=llama2 (after running ollama pull llama2)
CHROMA_DB_PATH: Path where ChromaDB stores its database files
- Default: ./db
- Example: export CHROMA_DB_PATH=/data/rag-db
CHROMA_COLLECTION_NAME: Name of the ChromaDB collection to use
- Default: docs
- Example: export CHROMA_COLLECTION_NAME=knowledge_base
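
For reference, reading these four variables with their documented defaults might be sketched as follows. The `Settings` class and `load_settings` function are hypothetical names used for illustration; the actual loading code in app.py may differ, and protocol stripping for OLLAMA_HOST is left out here for brevity.

```python
import os
from dataclasses import dataclass


@dataclass(frozen=True)
class Settings:
    ollama_host: str
    ollama_model: str
    chroma_db_path: str
    chroma_collection_name: str


def load_settings(env=None):
    """Read the documented variables, falling back to the documented defaults."""
    env = os.environ if env is None else env
    return Settings(
        ollama_host=env.get("OLLAMA_HOST", "localhost:11434"),
        ollama_model=env.get("OLLAMA_MODEL", "tinyllama"),
        chroma_db_path=env.get("CHROMA_DB_PATH", "./db"),
        chroma_collection_name=env.get("CHROMA_COLLECTION_NAME", "docs"),
    )


print(load_settings())
```

Passing a plain dictionary instead of `os.environ` makes the defaults easy to check in a test.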

Example Configuration

# Set all environment variables
export OLLAMA_HOST=localhost:11434
export OLLAMA_MODEL=llama2
export CHROMA_DB_PATH=./db
export CHROMA_COLLECTION_NAME=docs

# Then start the server
uvicorn app:app --reload

Troubleshooting

Ollama connection error:
- Make sure Ollama is running (ollama serve)
- For Docker: Ensure Ollama is accessible from the container (use host.docker.internal:11434 on Mac/Windows)
- Check that OLLAMA_HOST is set correctly (should be hostname:port format, not a URL)
Model not found: Ensure the model is installed (ollama pull tinyllama)
Empty query results: Make sure you've added content to the knowledge base first using /add endpoint or embed.py
Port already in use: Change the port with uvicorn app:app --port 8001 or use a different port in Docker: docker run -p 8001:8000 ...
Docker container can't connect to Ollama:
- Verify Ollama is running on the host: curl http://localhost:11434/api/tags
- On Mac/Windows: Use host.docker.internal:11434 (default)
- On Linux, you may need to use --network host or set OLLAMA_HOST to your host's IP
- Check container logs: docker logs rag-app
Kubernetes/Minikube can't connect to Ollama:
- For minikube: Use host.docker.internal:11434 as OLLAMA_HOST and enable hostNetwork: true in deployment
- Verify Ollama is accessible: minikube ssh "curl http://host.docker.internal:11434/api/tags"
- Check pod logs: kubectl logs -l app=rag-api
Test connections: Use the provided test script:
```
python test_connection.py
```
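
If you want a quick reachability check without the test script, the sketch below opens a plain TCP connection to a hostname:port address. It is not the content of test_connection.py: `split_host_port` and `can_connect` are hypothetical helpers, IPv6 literals are not handled, and a successful connection only shows that something is listening on the port, not that it is Ollama.

```python
import socket


def split_host_port(address, default_port=11434):
    """Split 'hostname:port' into (hostname, port); the port is optional."""
    host, sep, port = address.rpartition(":")
    if not sep:
        return address, default_port
    return host, int(port)


def can_connect(address, timeout=2.0):
    """Return True if a TCP connection to address opens within the timeout."""
    host, port = split_host_port(address)
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


if __name__ == "__main__":
    print(can_connect("localhost:11434"))
```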

Recent Updates

Latest Changes

Query improvements:
- Support for multiple results (configurable n_results, max 10)
- Relevance scores and distance metrics
- Option to combine all results or use only the best match
- Enhanced response format with detailed result metadata
Environment variable configuration: All settings (model, DB path, collection name) now configurable via environment variables - no code changes needed!
Improved error messages: More actionable error messages that help users diagnose issues (connection problems, missing models, empty knowledge base, etc.)
DELETE endpoint: Added /delete/{doc_id} endpoint to remove documents from the knowledge base
Fixed API request format: Updated curl examples to use proper JSON format with Content-Type: application/json header (removed incorrect -G flag usage)
Fixed port configuration: Corrected deployment and service to use port 8000 (matching Dockerfile) instead of 5000
Fixed Kubernetes/Minikube Ollama connection: Updated to use host.docker.internal:11434 with hostNetwork: true for accessing host machine's Ollama service
Fixed Ollama client response handling: Changed from dictionary access (answer["response"]) to attribute access (answer.response) to match the Ollama Python client API
Improved Ollama host configuration: Added automatic protocol stripping for OLLAMA_HOST environment variable to handle both URL and hostname:port formats
Docker support: Added Dockerfile and published image to Docker Hub (zag23/rag-app:latest)
Connection testing: Added test_connection.py script to verify Ollama and ChromaDB connections

Docker Hub

The working image is available on Docker Hub:

Repository: zag23/rag-app
Tag: latest
Pull command: docker pull zag23/rag-app:latest

License

Apache License 2.0
