ZAG23/nextwork-rag-api

Nextwork RAG API

A Retrieval-Augmented Generation (RAG) API built with FastAPI, ChromaDB, and Ollama. This API allows you to add documents to a knowledge base and query them using AI-powered responses.

Features

  • Add Knowledge: Dynamically add text content to the knowledge base
  • Query Knowledge: Ask questions and get AI-generated answers based on the stored knowledge
  • Persistent Storage: Uses ChromaDB for persistent vector storage
  • AI Integration: Uses Ollama for generating contextual answers

Prerequisites

  • Python 3.11, 3.12, or 3.13 (Python 3.14 has compatibility issues with ChromaDB)
  • Ollama installed and running
  • The tinyllama model installed in Ollama (or set OLLAMA_MODEL to use a different model)

Note: ChromaDB does not yet support Python 3.14, so use a version from 3.11 through 3.13.

Installing Ollama and the Model

  1. Install Ollama from https://ollama.ai/
  2. Pull the tinyllama model:
    ollama pull tinyllama

Setup

  1. Clone the repository (if applicable) or navigate to the project directory

  2. Create a virtual environment (using Python 3.13):

    python3.13 -m venv venv
    source venv/bin/activate  # On Windows: venv\Scripts\activate
  3. Install dependencies:

    pip install -r requirements.txt
  4. Embed initial documents (optional):

    python embed.py k8s.txt

    Or embed any text file:

    python embed.py your_file.txt

Running the API

Option 1: Local Development

Start the FastAPI server:

uvicorn app:app --reload

The API will be available at http://localhost:8000

Option 2: Docker (Recommended)

Using Pre-built Image from Docker Hub

The image is available on Docker Hub as zag23/rag-app:latest:

# Pull the image
docker pull zag23/rag-app:latest

# Run the container
docker run -d -p 8000:8000 --name rag-app zag23/rag-app

Important Notes for Docker:

  • The container expects Ollama to be running on the host machine
  • On Mac/Windows: The container will automatically connect to host.docker.internal:11434
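  • On Linux: host.docker.internal is not defined by default, so you may need to set OLLAMA_HOST to your host's IP address (replace <host-ip> with that address):
    docker run -d -p 8000:8000 -e OLLAMA_HOST=<host-ip>:11434 --name rag-app zag23/rag-app
    Or use --network host, setting OLLAMA_HOST explicitly so the container targets Ollama on the host's own loopback address:
    docker run -d --network host -e OLLAMA_HOST=localhost:11434 --name rag-app zag23/rag-app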

Building from Source

To build the Docker image locally:

docker build -t zag23/rag-app .
docker run -d -p 8000:8000 --name rag-app zag23/rag-app

API Documentation

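Once the server is running, FastAPI's built-in interactive documentation should be available at the following URLs (assuming the default documentation routes have not been disabled in app.py):

  • Swagger UI: http://localhost:8000/docs
  • ReDoc: http://localhost:8000/redoc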

API Endpoints

GET /

Health check endpoint.

Response:

{
  "status": "ok",
  "message": "Nextwork RAG API is running"
}

POST /add

Add new content to the knowledge base.

Request Body:

{
  "text": "Your content here..."
}

Response:

{
  "status": "success",
  "message": "Content added to knowledge base",
  "id": "uuid-here"
}

POST /query

Query the knowledge base and get an AI-generated answer.

Request Body:

{
  "q": "What is Kubernetes?",
  "n_results": 1,
  "include_scores": false,
  "use_best_only": true
}

Parameters:

  • q (required): The question to search for
  • n_results (optional, default: 1): Number of results to retrieve (1-10)
  • include_scores (optional, default: false): Include relevance scores in response
  • use_best_only (optional, default: true): If true, only use best result for AI answer; if false, combine all results
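
The parameter rules above can be checked on the client before a request is sent. The following is a minimal sketch, not part of the project: build_query_payload is a hypothetical helper, and the validation it performs simply mirrors the documented defaults and the 1-10 range for n_results.

```python
from typing import Any


def build_query_payload(
    q: str,
    n_results: int = 1,
    include_scores: bool = False,
    use_best_only: bool = True,
) -> dict[str, Any]:
    """Build a JSON-serializable body for POST /query (hypothetical helper)."""
    # q is the only required field, so reject empty or whitespace-only input.
    if not q or not q.strip():
        raise ValueError("q is required and must not be empty")
    # The README documents a 1-10 range for n_results.
    if not 1 <= n_results <= 10:
        raise ValueError("n_results must be between 1 and 10")
    return {
        "q": q,
        "n_results": n_results,
        "include_scores": include_scores,
        "use_best_only": use_best_only,
    }


payload = build_query_payload("What is Kubernetes?", n_results=3, include_scores=True)
print(payload)
```

The resulting dictionary can be passed as the json argument of requests.post when calling /query.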

Response (basic):

{
  "answer": "Kubernetes is a container orchestration platform...",
  "results_count": 1
}

Response (with scores and multiple results):

{
  "answer": "Kubernetes is a container orchestration platform...",
  "results_count": 2,
  "results": [
    {
      "id": "doc-id-1",
      "text": "Kubernetes is a container orchestration...",
      "relevance_score": 0.9234,
      "distance": 0.0832
    },
    {
      "id": "doc-id-2",
      "text": "Kubernetes helps manage containers...",
      "relevance_score": 0.8567,
      "distance": 0.1673
    }
  ]
}
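
A response in this shape can be post-processed on the client. The sketch below is illustrative only: best_result and summarize are hypothetical helpers, and it assumes that a higher relevance_score means a better match and that the results list is simply absent when include_scores is false.

```python
from typing import Any, Optional


def best_result(response: dict[str, Any]) -> Optional[dict[str, Any]]:
    """Return the entry with the highest relevance_score, or None if absent."""
    results = response.get("results") or []
    if not results:
        return None
    return max(results, key=lambda r: r.get("relevance_score", 0.0))


def summarize(response: dict[str, Any], width: int = 40) -> list[str]:
    """Format one line per result: id, score, and a truncated text preview."""
    lines = []
    for r in response.get("results") or []:
        score = r.get("relevance_score")
        shown = f"{score:.4f}" if isinstance(score, (int, float)) else "N/A"
        lines.append(f"{r['id']} score={shown} {r['text'][:width]}")
    return lines


# Sample data copied from the documented response format.
sample = {
    "answer": "Kubernetes is a container orchestration platform...",
    "results_count": 2,
    "results": [
        {"id": "doc-id-1", "text": "Kubernetes is a container orchestration...",
         "relevance_score": 0.9234, "distance": 0.0832},
        {"id": "doc-id-2", "text": "Kubernetes helps manage containers...",
         "relevance_score": 0.8567, "distance": 0.1673},
    ],
}

top = best_result(sample)
print(top["id"] if top else "no results returned")
for line in summarize(sample):
    print(line)
```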

DELETE /delete/{doc_id}

Delete a document from the knowledge base by its ID.

Path Parameters:

  • doc_id: The unique ID of the document to delete (returned when adding a document)

Response:

{
  "status": "success",
  "message": "Document 'uuid-here' deleted successfully",
  "id": "uuid-here"
}

Error Responses:

  • 404: Document not found
  • 400: Invalid document ID

Usage Examples

Using cURL

Add content:

curl -X POST "http://localhost:8000/add" \
  -H "Content-Type: application/json" \
  -d '{"text": "FastAPI is a modern web framework for building APIs with Python."}'

Query (basic):

curl -X POST "http://localhost:8000/query" \
  -H "Content-Type: application/json" \
  -d '{"q": "What is FastAPI?"}'

Query (with multiple results and scores):

curl -X POST "http://localhost:8000/query" \
  -H "Content-Type: application/json" \
  -d '{
    "q": "What is FastAPI?",
    "n_results": 3,
    "include_scores": true,
    "use_best_only": false
  }'

Delete document:

curl -X DELETE "http://localhost:8000/delete/your-document-id-here"

Note: The API expects a JSON request body. Do not combine the -G flag with --data-urlencode: that makes curl send the data as URL query parameters in a GET request, which causes errors. Always send JSON payloads with -H "Content-Type: application/json" and -d.

Using Python

import requests

# Add content
response = requests.post(
    "http://localhost:8000/add",
    json={"text": "Your content here"}
)
print(response.json())

# Query (basic)
response = requests.post(
    "http://localhost:8000/query",
    json={"q": "Your question here"}
)
print(response.json()["answer"])

# Query (with multiple results and scores)
response = requests.post(
    "http://localhost:8000/query",
    json={
        "q": "Your question here",
        "n_results": 3,
        "include_scores": True,
        "use_best_only": False
    }
)
data = response.json()
print(f"Answer: {data['answer']}")
print(f"Found {data['results_count']} results")
for i, result in enumerate(data.get('results', []), 1):
    print(f"  Result {i} (score: {result.get('relevance_score', 'N/A')}): {result['text'][:100]}...")

# Delete document
doc_id = "your-document-id-here"
response = requests.delete(f"http://localhost:8000/delete/{doc_id}")
print(response.json())

Project Structure

nextwork-rag-api/
├── app.py              # FastAPI application with RAG endpoints
├── embed.py            # Script to embed documents into ChromaDB
├── Dockerfile          # Docker configuration for containerized deployment
├── requirements.txt    # Python dependencies
├── README.md           # This file
├── test_connection.py  # Test script to verify Ollama and ChromaDB connections
├── .gitignore          # Git ignore rules
├── db/                 # ChromaDB database files (auto-generated)
└── k8s.txt             # Example text file for embedding

Configuration

All configuration is done via environment variables (listed under Environment Variables below); no code changes are needed.

  • Database Path: The ChromaDB database is stored in ./db by default (set CHROMA_DB_PATH to change it)
  • Ollama Model: Default is tinyllama (set OLLAMA_MODEL to change it)
  • Collection Name: Default is "docs" (set CHROMA_COLLECTION_NAME to change it)
  • Ollama Host:
    • Local development: Defaults to localhost:11434
    • Docker: Set via OLLAMA_HOST environment variable (defaults to host.docker.internal:11434)
    • Kubernetes/Minikube: Use host.docker.internal:11434 with hostNetwork: true to reach the host machine's Ollama
    • The code automatically strips http:// or https:// prefixes if present
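
The prefix stripping described in the last bullet could look roughly like the sketch below. This is not the code from app.py: normalize_ollama_host is a hypothetical name, and trimming whitespace and a trailing slash is an extra assumption beyond what is described here.

```python
def normalize_ollama_host(value: str) -> str:
    """Reduce an OLLAMA_HOST value to hostname:port form."""
    host = value.strip()
    # Drop a leading http:// or https:// prefix, matched case-insensitively.
    for prefix in ("http://", "https://"):
        if host.lower().startswith(prefix):
            host = host[len(prefix):]
            break
    # Assumption: a trailing slash left over from a full URL is not wanted.
    return host.rstrip("/")


print(normalize_ollama_host("http://host.docker.internal:11434"))
print(normalize_ollama_host("localhost:11434"))
```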

Environment Variables

  • OLLAMA_HOST: Ollama server address in hostname:port format (e.g., localhost:11434 or host.docker.internal:11434)

    • Default: localhost:11434
    • Note: The Ollama Python client expects hostname:port format, not a full URL. Protocol prefixes are automatically removed.
  • OLLAMA_MODEL: The Ollama model to use for generating answers

    • Default: tinyllama
    • Example: export OLLAMA_MODEL=llama2 (after running ollama pull llama2)
  • CHROMA_DB_PATH: Path where ChromaDB stores its database files

    • Default: ./db
    • Example: export CHROMA_DB_PATH=/data/rag-db
  • CHROMA_COLLECTION_NAME: Name of the ChromaDB collection to use

    • Default: docs
    • Example: export CHROMA_COLLECTION_NAME=knowledge_base
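
One way to read these four variables with their documented defaults is sketched below. The Settings class and load_settings function are hypothetical names for illustration; the actual loading code in app.py may be organized differently.

```python
import os
from dataclasses import dataclass
from typing import Mapping


@dataclass(frozen=True)
class Settings:
    """Defaults copied from the Environment Variables list."""
    ollama_host: str = "localhost:11434"
    ollama_model: str = "tinyllama"
    chroma_db_path: str = "./db"
    chroma_collection_name: str = "docs"


def load_settings(env: Mapping[str, str] = os.environ) -> Settings:
    """Read each variable from env, falling back to the documented default."""
    defaults = Settings()
    return Settings(
        ollama_host=env.get("OLLAMA_HOST", defaults.ollama_host),
        ollama_model=env.get("OLLAMA_MODEL", defaults.ollama_model),
        chroma_db_path=env.get("CHROMA_DB_PATH", defaults.chroma_db_path),
        chroma_collection_name=env.get(
            "CHROMA_COLLECTION_NAME", defaults.chroma_collection_name
        ),
    )


# Passing an explicit mapping keeps the example independent of the real environment.
print(load_settings({"OLLAMA_MODEL": "llama2"}))
```

Accepting the environment as a parameter makes the function easy to exercise without exporting variables in the shell.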

Example Configuration

# Set all environment variables
export OLLAMA_HOST=localhost:11434
export OLLAMA_MODEL=llama2
export CHROMA_DB_PATH=./db
export CHROMA_COLLECTION_NAME=docs

# Then start the server
uvicorn app:app --reload

Troubleshooting

  1. Ollama connection error:

    • Make sure Ollama is running (ollama serve)
    • For Docker: Ensure Ollama is accessible from the container (use host.docker.internal:11434 on Mac/Windows)
    • Check that OLLAMA_HOST is set correctly (should be hostname:port format, not a URL)
  2. Model not found: Ensure the model is installed (ollama pull tinyllama)

  3. Empty query results: Make sure you've added content to the knowledge base first, using either the /add endpoint or embed.py

  4. Port already in use: Change the port with uvicorn app:app --port 8001 or use a different port in Docker: docker run -p 8001:8000 ...

  5. Docker container can't connect to Ollama:

    • Verify Ollama is running on the host: curl http://localhost:11434/api/tags
    • On Mac/Windows: Use host.docker.internal:11434 (default)
    • On Linux, you may need to use --network host or set OLLAMA_HOST to your host's IP
    • Check container logs: docker logs rag-app
  6. Kubernetes/Minikube can't connect to Ollama:

    • For minikube: Set OLLAMA_HOST to host.docker.internal:11434 and enable hostNetwork: true in the deployment
    • Verify Ollama is accessible: minikube ssh "curl http://host.docker.internal:11434/api/tags"
    • Check pod logs: kubectl logs -l app=rag-api
  7. Test connections: Use the provided test script:

    python test_connection.py
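
The Ollama check from the steps above (a request to /api/tags) can also be made from Python with only the standard library. This is a rough sketch and not the contents of test_connection.py: tags_url and ollama_reachable are hypothetical names, and plain HTTP is assumed.

```python
import urllib.request


def tags_url(ollama_host: str) -> str:
    """Build the /api/tags URL from a hostname:port value (plain HTTP assumed)."""
    return f"http://{ollama_host}/api/tags"


def ollama_reachable(ollama_host: str = "localhost:11434", timeout: float = 3.0) -> bool:
    """Return True if GET /api/tags answers with HTTP 200 within the timeout."""
    try:
        with urllib.request.urlopen(tags_url(ollama_host), timeout=timeout) as resp:
            return resp.status == 200
    except OSError:
        # URLError, connection refusals, and timeouts all derive from OSError.
        return False


if __name__ == "__main__":
    host = "localhost:11434"
    state = "reachable" if ollama_reachable(host) else "not reachable"
    print(f"Ollama at {host} is {state}")
```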

Recent Updates

Latest Changes

  • Query improvements:
    • Support for multiple results (configurable n_results, max 10)
    • Relevance scores and distance metrics
    • Option to combine all results or use only the best match
    • Enhanced response format with detailed result metadata
  • Environment variable configuration: All settings (model, DB path, collection name) now configurable via environment variables - no code changes needed!
  • Improved error messages: More actionable error messages that help users diagnose issues (connection problems, missing models, empty knowledge base, etc.)
  • DELETE endpoint: Added /delete/{doc_id} endpoint to remove documents from the knowledge base
  • Fixed API request format: Updated curl examples to use proper JSON format with Content-Type: application/json header (removed incorrect -G flag usage)
  • Fixed port configuration: Corrected deployment and service to use port 8000 (matching Dockerfile) instead of 5000
  • Fixed Kubernetes/Minikube Ollama connection: Updated to use host.docker.internal:11434 with hostNetwork: true for accessing host machine's Ollama service
  • Fixed Ollama client response handling: Changed from dictionary access (answer["response"]) to attribute access (answer.response) to match the Ollama Python client API
  • Improved Ollama host configuration: Added automatic protocol stripping for OLLAMA_HOST environment variable to handle both URL and hostname:port formats
  • Docker support: Added Dockerfile and published image to Docker Hub (zag23/rag-app:latest)
  • Connection testing: Added test_connection.py script to verify Ollama and ChromaDB connections

Docker Hub

The working image is available on Docker Hub:

  • Repository: zag23/rag-app
  • Tag: latest
  • Pull command: docker pull zag23/rag-app:latest

License

Apache 2.0
