Genkit RAG Sample

This sample demonstrates how to build RAG (Retrieval Augmented Generation) applications with Genkit Java using a local vector store for development.

Features Demonstrated

Local Vector Store Plugin: File-based vector storage for development and testing
Document Indexing: Index documents from various sources
Semantic Retrieval: Find relevant documents using embeddings
RAG Flows: Combine retrieval with LLM generation
Multiple Knowledge Bases: Separate vector stores for different domains

Architecture

┌─────────────────┐     ┌──────────────────┐     ┌─────────────────┐
│  Index Flow     │────▶│  Local Vec Store │◀────│  Retrieve Flow  │
│  (documents)    │     │  (embeddings)    │     │  (query)        │
└─────────────────┘     └──────────────────┘     └────────┬────────┘
                                                          │
                                                          ▼
                        ┌──────────────────┐     ┌─────────────────┐
                        │  OpenAI LLM      │◀────│  RAG Flow       │
                        │  (generation)    │     │  (answer)       │
                        └──────────────────┘     └─────────────────┘

Knowledge Bases

The sample includes three pre-configured knowledge bases:

world-capitals: Information about capital cities around the world
dog-breeds: Facts about popular dog breeds
coffee-facts: Information about coffee and brewing methods

Prerequisites

Java 21+
Maven 3.6+
OpenAI API key

Running the Sample

Option 1: Direct Run

# Set your OpenAI API key
export OPENAI_API_KEY=your-api-key-here

# Navigate to the sample directory
cd java/samples/rag

# Run the sample
./run.sh
# Or: mvn compile exec:java

Option 2: With Genkit Dev UI (Recommended)

# Set your OpenAI API key
export OPENAI_API_KEY=your-api-key-here

# Navigate to the sample directory
cd java/samples/rag

# Run with Genkit CLI
genkit start -- ./run.sh

The Dev UI will be available at http://localhost:4000

Usage

Step 1: Index the Data

Before querying, you need to index the documents:

# Index world capitals
curl -X POST http://localhost:8080/indexWorldCapitals

# Index dog breeds
curl -X POST http://localhost:8080/indexDogBreeds

# Index coffee facts
curl -X POST http://localhost:8080/indexCoffeeFacts

Step 2: Query the Knowledge Bases

# Ask about world capitals
curl -X POST http://localhost:8080/askAboutCapitals \
  -H 'Content-Type: application/json' \
  -d '"What is the capital of France and what is it known for?"'

# Ask about dogs
curl -X POST http://localhost:8080/askAboutDogs \
  -H 'Content-Type: application/json' \
  -d '"What are good dog breeds for families with children?"'

# Ask about coffee
curl -X POST http://localhost:8080/askAboutCoffee \
  -H 'Content-Type: application/json' \
  -d '"How do you make espresso and what is a cappuccino?"'

Step 3: Retrieve Documents Without Generation

# Just retrieve relevant documents
curl -X POST http://localhost:8080/retrieveDocuments \
  -H 'Content-Type: application/json' \
  -d '{
    "query": "France capital",
    "store": "world-capitals",
    "k": 2
  }'

Step 4: Index Custom Documents

curl -X POST http://localhost:8080/indexDocuments \
  -H 'Content-Type: application/json' \
  -d '[
    "The first fact about my topic.",
    "The second fact about my topic.",
    "The third fact about my topic."
  ]'

How It Works

Indexing

Documents are loaded from text files (one paragraph = one document)
Each document is converted to an embedding using OpenAI's embedding model
Documents and embeddings are stored in a JSON file on disk

Retrieval

The query is converted to an embedding
Cosine similarity is computed between the query and all stored documents
The top-k most similar documents are returned

Generation

Retrieved documents are formatted as context
The context and question are combined into a prompt
The LLM generates an answer based on the context

Local Vector Store

The local vector store is designed for development and testing only. For production, use a proper vector database like:

Pinecone
Chroma
Weaviate
pgvector (PostgreSQL)
Vertex AI Vector Search

Storage Location

Documents are stored in JSON files at:

{java.io.tmpdir}/genkit-rag-sample/__db_{index-name}.json

Adding Your Own Data

Create a text file with your content (paragraphs separated by blank lines)
Place it in src/main/resources/data/
Create a new LocalVecConfig for your data
Define indexing and query flows

Development UI

Access the Genkit Development UI at http://localhost:3100 to:

Browse available flows, indexers, and retrievers
Test flows interactively
View execution traces
Inspect indexed documents

Troubleshooting

Empty Results

Make sure you've indexed the documents first
Check that the embedding model is working correctly

Slow Indexing

The first indexing takes longer due to embedding computation
Subsequent runs use cached embeddings

Out of Memory

For large datasets, consider batch indexing
Use a proper vector database for production

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Genkit RAG Sample

Features Demonstrated

Architecture

Knowledge Bases

Prerequisites

Running the Sample

Option 1: Direct Run

Option 2: With Genkit Dev UI (Recommended)

Usage

Step 1: Index the Data

Step 2: Query the Knowledge Bases

Step 3: Retrieve Documents Without Generation

Step 4: Index Custom Documents

How It Works

Indexing

Retrieval

Generation

Local Vector Store

Storage Location

Adding Your Own Data

Development UI

Troubleshooting

Empty Results

Slow Indexing

Out of Memory

FilesExpand file tree

README.md

Latest commit

History

README.md

File metadata and controls

Genkit RAG Sample

Features Demonstrated

Architecture

Knowledge Bases

Prerequisites

Running the Sample

Option 1: Direct Run

Option 2: With Genkit Dev UI (Recommended)

Usage

Step 1: Index the Data

Step 2: Query the Knowledge Bases

Step 3: Retrieve Documents Without Generation

Step 4: Index Custom Documents

How It Works

Indexing

Retrieval

Generation

Local Vector Store

Storage Location

Adding Your Own Data

Development UI

Troubleshooting

Empty Results

Slow Indexing

Out of Memory