This sample demonstrates integration with local Ollama models using Genkit Java, covering:
- Ollama Plugin Setup - Configure Genkit with local Ollama models
- Flow Definitions - Create observable, traceable AI workflows
- Text Generation - Generate text with Gemma 3n
- Streaming - Real-time response streaming
- Code Generation - Generate code
- Creative Writing - Generate creative content
- Translation & Summarization - Language tasks
This sample uses `gemma3n:e4b`, the Google Gemma 3n E4B model.
- Java 21+
- Maven 3.6+
- Ollama installed and running
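The Java 21+ requirement can also be checked programmatically. This small self-contained snippet (illustrative, not part of the sample) prints the runtime's feature version:

```java
// Checks that the local JDK meets the Java 21+ prerequisite.
public class VersionCheck {
    public static void main(String[] args) {
        int feature = Runtime.version().feature();
        System.out.println("Java feature version: " + feature);
        if (feature < 21) {
            System.err.println("Java 21+ is required for this sample.");
        }
    }
}
```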
- Download and install Ollama from https://ollama.ai
- Pull the models you want to use:
  ```bash
  # Pull the model
  ollama pull gemma3n:e4b
  ```

- Verify Ollama is running:

  ```bash
  ollama list
  ```

To run the sample:

```bash
# Navigate to the sample directory
cd java/samples/ollama

# Run the sample
./run.sh

# Or: mvn compile exec:java
```

To run with the Genkit Dev UI:

```bash
# Navigate to the sample directory
cd java/samples/ollama

# Run with Genkit CLI
genkit start -- ./run.sh
```

The Dev UI will be available at http://localhost:4000.
If Ollama is running on a different host:
```bash
export OLLAMA_HOST=http://your-ollama-server:11434
./run.sh
```

The sample defines the following flows:

| Flow | Input | Output | Description |
|---|---|---|---|
| `greeting` | String (name) | String | Simple greeting flow |
| `chat` | String (message) | String | Chat with Gemma |
| `tellJoke` | String (topic) | String | Generate a joke |
| `streamingChat` | String (message) | String | Streaming chat |
| `generateCode` | String (prompt) | String | Code generation (streaming) |
| `quickAnswer` | String (question) | String | Fast, brief answers |
| `creativeWriting` | String (prompt) | String | Creative writing (streaming) |
| `translate` | String (text) | String | Translate to Spanish |
| `summarize` | String (text) | String | Text summarization |
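Each of these flows is exposed as an HTTP endpoint by the sample server. As a sketch using only the JDK's `java.net.http` package, a request to a flow can be built like this (the `FlowRequests` class and `buildFlowRequest` helper are illustrative, not part of the sample; the port and `/api/flows/<name>` path assume the sample's defaults):

```java
import java.net.URI;
import java.net.http.HttpRequest;

// Illustrative helper: builds the POST request for a flow endpoint.
// Flow input is sent as a JSON string in the request body.
public class FlowRequests {
    static final String BASE = "http://localhost:8080/api/flows/";

    static HttpRequest buildFlowRequest(String flow, String jsonBody) {
        return HttpRequest.newBuilder()
                .uri(URI.create(BASE + flow))
                .header("Content-Type", "application/json")
                .POST(HttpRequest.BodyPublishers.ofString(jsonBody))
                .build();
    }

    public static void main(String[] args) {
        HttpRequest req = buildFlowRequest("greeting", "\"World\"");
        System.out.println(req.method() + " " + req.uri());
        // Send with HttpClient.newHttpClient().send(req, ...) once
        // the sample server is running on port 8080.
    }
}
```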
Once the server is running on port 8080:
```bash
# greeting
curl -X POST http://localhost:8080/api/flows/greeting \
  -H 'Content-Type: application/json' \
  -d '"World"'

# chat
curl -X POST http://localhost:8080/api/flows/chat \
  -H 'Content-Type: application/json' \
  -d '"What is the capital of France?"'

# tellJoke
curl -X POST http://localhost:8080/api/flows/tellJoke \
  -H 'Content-Type: application/json' \
  -d '"programming"'

# streamingChat
curl -X POST http://localhost:8080/api/flows/streamingChat \
  -H 'Content-Type: application/json' \
  -d '"Explain quantum computing"'

# generateCode
curl -X POST http://localhost:8080/api/flows/generateCode \
  -H 'Content-Type: application/json' \
  -d '"Write a Python function to find prime numbers up to n"'

# quickAnswer
curl -X POST http://localhost:8080/api/flows/quickAnswer \
  -H 'Content-Type: application/json' \
  -d '"What is 2+2?"'

# creativeWriting
curl -X POST http://localhost:8080/api/flows/creativeWriting \
  -H 'Content-Type: application/json' \
  -d '"Write a short story about a robot learning to paint"'

# translate
curl -X POST http://localhost:8080/api/flows/translate \
  -H 'Content-Type: application/json' \
  -d '"Hello, how are you?"'

# summarize
curl -X POST http://localhost:8080/api/flows/summarize \
  -H 'Content-Type: application/json' \
  -d '"The quick brown fox jumps over the lazy dog. This sentence contains every letter of the English alphabet."'
```

The Ollama plugin can be configured with the following options:
```java
OllamaPlugin plugin = new OllamaPlugin(
    OllamaPluginOptions.builder()
        .baseUrl("http://localhost:11434") // Or use OLLAMA_HOST env var
        .timeout(300)                      // Request timeout in seconds
        .models("gemma3n:e4b")             // Model to register
        .build()
);
```

Stream responses for real-time output:
```java
genkit.generateStream(
    GenerateOptions.builder()
        .model("ollama/gemma3n:e4b")
        .prompt("Tell me a story")
        .build(),
    (chunk) -> {
        System.out.print(chunk.getText());
    });
```

Request JSON-formatted responses:
```java
genkit.generate(
    GenerateOptions.builder()
        .model("ollama/gemma3n:e4b")
        .prompt("List 3 colors as JSON")
        .output(OutputConfig.builder()
            .format(OutputFormat.JSON)
            .build())
        .build());
```

Performance tips:

- Enable GPU acceleration - Ollama automatically uses GPU if available
- Adjust context window - Smaller context = faster responses
- Use streaming - Better UX for longer responses
Ensure Ollama is running:

```bash
ollama serve
```

Pull the required model:

```bash
ollama pull gemma3n:e4b
```

If responses are slow:

- Check if GPU is being used: `ollama ps`
- Reduce `maxOutputTokens`
- Use a smaller model
- Reduce context length
- Close other applications
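When Ollama runs on a non-default host, the `OLLAMA_HOST` environment variable selects it, as shown earlier. The fallback logic can be sketched as a small self-contained helper (the `OllamaHost` class and `resolveBaseUrl` method are hypothetical, not part of the Genkit plugin):

```java
// Hypothetical helper mirroring the OLLAMA_HOST fallback used in this sample:
// use the environment value when set, otherwise the default local endpoint.
public class OllamaHost {
    static final String DEFAULT_BASE_URL = "http://localhost:11434";

    static String resolveBaseUrl(String envValue) {
        return (envValue == null || envValue.isBlank()) ? DEFAULT_BASE_URL : envValue;
    }

    public static void main(String[] args) {
        String baseUrl = resolveBaseUrl(System.getenv("OLLAMA_HOST"));
        System.out.println("Using Ollama at: " + baseUrl);
    }
}
```

The same value could then be passed to `OllamaPluginOptions.builder().baseUrl(...)` shown in the configuration section above.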