This sample demonstrates how to build RAG (Retrieval-Augmented Generation) applications with Genkit Java using a local vector store for development.
- Local Vector Store Plugin: File-based vector storage for development and testing
- Document Indexing: Index documents from various sources
- Semantic Retrieval: Find relevant documents using embeddings
- RAG Flows: Combine retrieval with LLM generation
- Multiple Knowledge Bases: Separate vector stores for different domains
┌─────────────────┐ ┌──────────────────┐ ┌─────────────────┐
│ Index Flow │────▶│ Local Vec Store │◀────│ Retrieve Flow │
│ (documents) │ │ (embeddings) │ │ (query) │
└─────────────────┘ └──────────────────┘ └────────┬────────┘
│
▼
┌──────────────────┐ ┌─────────────────┐
│ OpenAI LLM │◀────│ RAG Flow │
│ (generation) │ │ (answer) │
└──────────────────┘ └─────────────────┘
The sample includes three pre-configured knowledge bases:
- world-capitals: Information about capital cities around the world
- dog-breeds: Facts about popular dog breeds
- coffee-facts: Information about coffee and brewing methods
- Java 21+
- Maven 3.6+
- OpenAI API key
# Set your OpenAI API key
export OPENAI_API_KEY=your-api-key-here
# Navigate to the sample directory
cd java/samples/rag
# Run the sample
./run.sh
# Or: mvn compile exec:java

# Set your OpenAI API key
export OPENAI_API_KEY=your-api-key-here
# Navigate to the sample directory
cd java/samples/rag
# Run with Genkit CLI
genkit start -- ./run.sh

The Dev UI will be available at http://localhost:4000
Before querying, you need to index the documents:
# Index world capitals
curl -X POST http://localhost:8080/indexWorldCapitals
# Index dog breeds
curl -X POST http://localhost:8080/indexDogBreeds
# Index coffee facts
curl -X POST http://localhost:8080/indexCoffeeFacts

# Ask about world capitals
curl -X POST http://localhost:8080/askAboutCapitals \
-H 'Content-Type: application/json' \
-d '"What is the capital of France and what is it known for?"'
# Ask about dogs
curl -X POST http://localhost:8080/askAboutDogs \
-H 'Content-Type: application/json' \
-d '"What are good dog breeds for families with children?"'
# Ask about coffee
curl -X POST http://localhost:8080/askAboutCoffee \
-H 'Content-Type: application/json' \
-d '"How do you make espresso and what is a cappuccino?"'

# Just retrieve relevant documents
curl -X POST http://localhost:8080/retrieveDocuments \
-H 'Content-Type: application/json' \
-d '{
"query": "France capital",
"store": "world-capitals",
"k": 2
}'

curl -X POST http://localhost:8080/indexDocuments \
-H 'Content-Type: application/json' \
-d '[
"The first fact about my topic.",
"The second fact about my topic.",
"The third fact about my topic."
]'

- Documents are loaded from text files (one paragraph = one document)
- Each document is converted to an embedding using OpenAI's embedding model
- Documents and embeddings are stored in a JSON file on disk
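The indexing steps above can be sketched in plain Java. This is a minimal, self-contained illustration, not the Genkit plugin's actual implementation: `embed` is a toy stand-in for a call to OpenAI's embedding model, and `SimpleIndexer` keeps everything in memory instead of writing a JSON file.

```java
import java.util.ArrayList;
import java.util.List;

// One stored entry: the original text plus its embedding vector.
record IndexedDocument(String text, double[] embedding) {}

class SimpleIndexer {
    private final List<IndexedDocument> store = new ArrayList<>();

    // Placeholder embedder: a real implementation would call the OpenAI
    // embedding model; here we hash characters into a fixed-size vector
    // just so the sketch runs on its own.
    static double[] embed(String text) {
        double[] v = new double[8];
        for (int i = 0; i < text.length(); i++) {
            v[i % 8] += text.charAt(i);
        }
        return v;
    }

    // Split file contents into documents (one paragraph each, separated
    // by blank lines) and store each document with its embedding.
    void index(String fileContents) {
        for (String paragraph : fileContents.split("\\n\\n+")) {
            if (!paragraph.isBlank()) {
                store.add(new IndexedDocument(paragraph.strip(), embed(paragraph)));
            }
        }
    }

    int size() {
        return store.size();
    }
}
```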
- The query is converted to an embedding
- Cosine similarity is computed between the query and all stored documents
- The top-k most similar documents are returned
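The retrieval steps above boil down to a cosine-similarity scan over the stored embeddings. A minimal sketch (class and method names are illustrative, not Genkit APIs):

```java
import java.util.Comparator;
import java.util.List;

class Retriever {
    // Cosine similarity between two equal-length vectors.
    static double cosine(double[] a, double[] b) {
        double dot = 0, na = 0, nb = 0;
        for (int i = 0; i < a.length; i++) {
            dot += a[i] * b[i];
            na  += a[i] * a[i];
            nb  += b[i] * b[i];
        }
        return dot / (Math.sqrt(na) * Math.sqrt(nb));
    }

    // Return the indices of the k stored embeddings most similar
    // to the query embedding, best match first.
    static List<Integer> topK(double[] query, List<double[]> stored, int k) {
        return java.util.stream.IntStream.range(0, stored.size())
                .boxed()
                .sorted(Comparator.comparingDouble(
                        (Integer i) -> -cosine(query, stored.get(i))))
                .limit(k)
                .toList();
    }
}
```

A real store would also keep the document text alongside each vector so the matched documents, not just their indices, can be returned.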
- Retrieved documents are formatted as context
- The context and question are combined into a prompt
- The LLM generates an answer based on the context
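The prompt-assembly step can be sketched as a simple template. The exact wording below is illustrative, not the template the sample actually sends to the model:

```java
import java.util.List;

class RagPrompt {
    // Combine the retrieved documents and the user's question into a
    // single prompt string for the LLM.
    static String build(List<String> retrieved, String question) {
        StringBuilder sb = new StringBuilder(
                "Use the following context to answer the question.\n\nContext:\n");
        for (String doc : retrieved) {
            sb.append("- ").append(doc).append('\n');
        }
        sb.append("\nQuestion: ").append(question).append("\nAnswer:");
        return sb.toString();
    }
}
```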
The local vector store is designed for development and testing only. For production, use a proper vector database like:
- Pinecone
- Chroma
- Weaviate
- pgvector (PostgreSQL)
- Vertex AI Vector Search
Documents are stored in JSON files at:
{java.io.tmpdir}/genkit-rag-sample/__db_{index-name}.json
- Create a text file with your content (paragraphs separated by blank lines)
- Place it in src/main/resources/data/
- Create a new LocalVecConfig for your data
- Define indexing and query flows
Access the Genkit Development UI at http://localhost:4000 to:
- Browse available flows, indexers, and retrievers
- Test flows interactively
- View execution traces
- Inspect indexed documents
- Make sure you've indexed the documents first
- Check that the embedding model is working correctly
- The first indexing takes longer due to embedding computation
- Subsequent runs use cached embeddings
- For large datasets, consider batch indexing
- Use a proper vector database for production