Merged
106 changes: 106 additions & 0 deletions .claude/skills/release/SKILL.md
Original file line number Diff line number Diff line change
@@ -0,0 +1,106 @@
---
name: release
description: Release flutter_gemma — rebuild JAR, update all version numbers, checksums, CHANGELOG, upload to GitHub release
user_invocable: true
---

# Flutter Gemma Release

Complete release checklist for flutter_gemma plugin. Run as `/release <version>` (e.g. `/release 0.14.0`).

## Pre-flight

Before starting, verify you're on the correct branch and all changes are committed:
```bash
git status
git log --oneline -5
```

## Step 1: Update version numbers

All files that contain the version:

| File | Variable/Field | Example |
|------|---------------|---------|
| `pubspec.yaml` | `version:` | `version: <VERSION>` |
| `ios/flutter_gemma.podspec` | `s.version` | `s.version = '<VERSION>'` |
| `litertlm-server/build.gradle.kts` | `version =` | `version = "<VERSION>"` |
| `CLAUDE.md` | `Current Version:` | `- **Current Version**: <VERSION>` |
| `macos/scripts/setup_desktop.sh:61` | `JAR_VERSION=` | `JAR_VERSION="<VERSION>"` |
| `macos/scripts/prepare_resources.sh:42` | `JAR_VERSION=` | `JAR_VERSION="<VERSION>"` |
| `linux/scripts/setup_desktop.sh:62` | `JAR_VERSION=` | `JAR_VERSION="<VERSION>"` |
| `windows/scripts/setup_desktop.ps1:90` | `$JarVersion =` | `$JarVersion = "<VERSION>"` |

> JAR_URL is derived automatically from JAR_VERSION in all scripts; no separate update is needed.
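The table above can be scripted with `sed`. A minimal sketch of the replacement pattern, demonstrated on a scratch file (the exact expression for each file must match the syntax shown in the table):

```shell
#!/usr/bin/env bash
set -euo pipefail
VERSION="0.14.0"   # placeholder release version

# Demonstrate on a scratch file; in the real release, run one sed per file
tmp=$(mktemp)
printf 'version: 0.13.1\n' > "$tmp"
# -i.bak writes a backup and works with both GNU and BSD sed
sed -i.bak "s/^version: .*/version: ${VERSION}/" "$tmp"
cat "$tmp"          # version: 0.14.0
rm -f "$tmp" "$tmp.bak"
```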

## Step 2: Update CHANGELOG.md

Add a new section at the top covering all changes. Categories: features, fixes, breaking changes.
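Prepending the section can be sketched as follows (the demo uses a scratch directory and placeholder entries; run the same pipeline against the real CHANGELOG.md):

```shell
#!/usr/bin/env bash
set -euo pipefail
cd "$(mktemp -d)"
printf '## 0.13.1\n- old entry\n' > CHANGELOG.md   # stand-in for the real file

VERSION="0.14.0"
tmp=$(mktemp)
# New section goes above the existing history
{ printf '## %s\n- **New**: placeholder entry\n\n' "$VERSION"; cat CHANGELOG.md; } > "$tmp"
mv "$tmp" CHANGELOG.md
head -1 CHANGELOG.md   # ## 0.14.0
```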

## Step 3: Build JAR

```bash
cd litertlm-server && ./gradlew fatJar
```

Verify build success. JAR output: `litertlm-server/build/libs/litertlm-server-<VERSION>-all.jar`

## Step 4: Compute new SHA256

```bash
shasum -a 256 litertlm-server/build/libs/litertlm-server-*-all.jar
```

## Step 5: Update JAR checksums in all 4 scripts

| File | Variable |
|------|----------|
| `macos/scripts/setup_desktop.sh:63` | `JAR_CHECKSUM="<sha256>"` |
| `macos/scripts/prepare_resources.sh:44` | `JAR_CHECKSUM="<sha256>"` |
| `linux/scripts/setup_desktop.sh:64` | `JAR_CHECKSUM="<sha256>"` |
| `windows/scripts/setup_desktop.ps1:92` | `$JarChecksum = "<sha256>"` |

The JAR is cross-platform (JVM bytecode), so the checksum is identical on every platform.
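A quick way to confirm the four scripts agree after editing: extract every 64-hex-digit value and count the distinct ones (demonstrated with stand-in files; point the paths at the real scripts):

```shell
#!/usr/bin/env bash
set -euo pipefail
d=$(mktemp -d)
sha="0000000000000000000000000000000000000000000000000000000000000000"
printf 'JAR_CHECKSUM="%s"\n' "$sha" > "$d/setup_desktop.sh"
printf '$JarChecksum = "%s"\n' "$sha" > "$d/setup_desktop.ps1"

# Expect exactly one distinct checksum across all scripts
n=$(grep -hoE '[0-9a-f]{64}' "$d"/* | sort -u | wc -l | tr -d ' ')
echo "$n"   # 1
```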

## Step 6: Verify

```bash
flutter analyze # 0 errors
flutter test # all pass
dart pub publish --dry-run # 0 warnings
```

**NEVER publish without dry-run first.** Publishing is IRREVERSIBLE.

## Step 7: Create/update GitHub release

```bash
# Create new release
gh release create v<VERSION> \
litertlm-server/build/libs/litertlm-server-<VERSION>-all.jar \
--title "v<VERSION>" \
--notes-file CHANGELOG_EXCERPT.md

# OR update existing release (delete the old JAR first; the asset name must match the download URL verified below)
gh release delete-asset v<VERSION> litertlm-server.jar --yes 2>/dev/null
gh release upload v<VERSION> litertlm-server/build/libs/litertlm-server-<VERSION>-all.jar
```

Verify the download URL (GitHub serves release assets through a redirect, so follow it; the final status should be 200):
```bash
curl -s -o /dev/null -w "%{http_code}\n" -L "https://github.com/DenisovAV/flutter_gemma/releases/download/v<VERSION>/litertlm-server.jar"
```

## Step 8: Commit & PR

- Author: `--author="Sasha Denisov <denisov.shureg@gmail.com>"`
- No AI attribution in commit messages
- No "Co-Authored-By" or "Generated with Claude" footers
- Create PR via `gh pr create`
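The commit rules above can be sketched as follows (demonstrated in a throwaway repo; branch and message are placeholders, and `gh pr create` must be run from the real repo afterwards):

```shell
#!/usr/bin/env bash
set -euo pipefail
cd "$(mktemp -d)" && git init -q .

# Plain release message: no AI attribution, no Co-Authored-By footers
git -c user.name="CI" -c user.email="ci@example.com" commit -q --allow-empty \
  --author="Sasha Denisov <denisov.shureg@gmail.com>" \
  -m "Release 0.14.0"

git log -1 --format='%an <%ae>'   # Sasha Denisov <denisov.shureg@gmail.com>
# Then: gh pr create --title "Release 0.14.0" --fill
```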

## Step 9: After merge — publish

```bash
dart pub publish --dry-run # verify one more time
dart pub publish # only after user approval!
```
6 changes: 6 additions & 0 deletions CHANGELOG.md
@@ -1,3 +1,9 @@
## 0.13.1
- **LiteRT-LM 0.10.0**: Updated Android and JVM SDK from 0.9.0 to 0.10.0
- **Gemma 4 Thinking Mode**: `isThinking: true` now works with Gemma 4 E2B/E4B models (Android, iOS, Desktop; not Web)
- **Fix cancel download**: Cancel download now works correctly (#196)
- **Fix `large_file_handler` platform support**: Conditional imports for pub.dev platform analysis compatibility

## 0.13.0
- **Gemma 4 E2B/E4B**: Added support for next-gen multimodal models (text + image + audio)
- **systemInstruction**: New parameter in `createChat()` and `createSession()` for setting system-level context
16 changes: 10 additions & 6 deletions CLAUDE.md
@@ -69,7 +69,7 @@ final token = const String.fromEnvironment('HF_TOKEN');
- 🔥 **Local AI Inference** - Run Gemma models directly on device
- 🖼️ **Multimodal Support** - Text + Image input with Gemma 3 Nano
- 🛠️ **Function Calling** - Enable models to call external functions
- 🧠 **Thinking Mode** - View reasoning process of DeepSeek models
- 🧠 **Thinking Mode** - View reasoning process of DeepSeek and Gemma 4 models
- 📱 **Cross-Platform** - Android, iOS, Web, macOS, Windows, Linux
- ⚡ **GPU Acceleration** - Hardware-accelerated inference
- 🔧 **LoRA Support** - Efficient fine-tuning weights
@@ -401,6 +401,8 @@ Future<void> close() async {

| Model Family | Function Calling | Thinking Mode | Multimodal | Platform Support |
|--------------|------------------|---------------|------------|------------------|
| Gemma 4 E2B | ✅ | ✅ ¹ | ✅ | Android, iOS, Web, Desktop |
| Gemma 4 E4B | ✅ | ✅ ¹ | ✅ | Android, iOS, Web, Desktop |
| Gemma 3 Nano | ✅ | ❌ | ✅ | Android, iOS, Web |
| Gemma 3 270M | ❌ | ❌ | ❌ | Android, iOS, Web |
| Gemma-3 1B | ✅ | ❌ | ❌ | Android, iOS, Web |
@@ -411,6 +413,8 @@ Future<void> close() async {
| Qwen2.5 | ✅ | ❌ | ❌ | Android, iOS, Web |
| Phi-4 | ❌ | ❌ | ❌ | Android, iOS, Web |

> ¹ Thinking Mode for Gemma 4: Android, iOS, Desktop only. Web (MediaPipe) does not support `extraContext`.

### Platform Limitations

| Platform | Vision/Multimodal | Audio | Embeddings | Notes |
@@ -457,10 +461,10 @@ dev_dependencies:

### MediaPipe GenAI Integration

- **Current Version Web**: v0.10.26
- **Current Version Web**: v0.10.27
- **Current Version Android**: v0.10.33
- **Current Version iOS**: v0.10.33
- **Web CDN**: `https://cdn.jsdelivr.net/npm/@mediapipe/tasks-genai@0.10.26`
- **Web CDN**: `https://cdn.jsdelivr.net/npm/@mediapipe/tasks-genai@0.10.27`
- **iOS/Android**: Integrated via CocoaPods/Gradle

## Development Best Practices
@@ -623,7 +627,7 @@ Log.w(TAG, "sizeInTokens: LiteRT-LM does not support token counting. " +

**Dependency (build.gradle):**
```gradle
implementation 'com.google.ai.edge.litertlm:litertlm-android:0.9.0-beta'
implementation 'com.google.ai.edge.litertlm:litertlm-android:0.10.0'
```

**Usage (Dart - no changes required):**
@@ -642,7 +646,7 @@ await FlutterGemma.installModel(modelType: ModelType.gemmaIt)
```html
<!-- index.html -->
<script type="module">
import { FilesetResolver, LlmInference } from 'https://cdn.jsdelivr.net/npm/@mediapipe/tasks-genai@0.10.26';
import { FilesetResolver, LlmInference } from 'https://cdn.jsdelivr.net/npm/@mediapipe/tasks-genai@0.10.27';
window.FilesetResolver = FilesetResolver;
window.LlmInference = LlmInference;
</script>
@@ -1243,7 +1247,7 @@ flutter_gemma/

- **GitHub**: https://github.com/DenisovAV/flutter_gemma
- **Pub.dev**: https://pub.dev/packages/flutter_gemma
- **Current Version**: 0.13.0
- **Current Version**: 0.13.1
- **License**: Check repository for license details
- **Issues**: Report bugs via GitHub Issues
- **Changelog**: See `CHANGELOG.md` for version history
37 changes: 25 additions & 12 deletions README.md
@@ -8,7 +8,7 @@

**The plugin supports not only Gemma, but also other models. Here's the full list of supported models:** [Gemma 4 E2B/E4B](https://huggingface.co/google/gemma-4-E2B-it-litert-lm), [Gemma3n E2B/E4B](https://huggingface.co/google/gemma-3n-E2B-it-litert-preview), [FastVLM 0.5B](https://huggingface.co/litert-community/FastVLM-0.5B), [Gemma-3 1B](https://huggingface.co/litert-community/Gemma3-1B-IT), [Gemma 3 270M](https://huggingface.co/litert-community/gemma-3-270m-it), [FunctionGemma 270M](https://huggingface.co/sasha-denisov/function-gemma-270M-it), [Qwen3 0.6B](https://huggingface.co/litert-community/Qwen3-0.6B), [Qwen 2.5](https://huggingface.co/litert-community/Qwen2.5-1.5B-Instruct), [Phi-4 Mini](https://huggingface.co/litert-community/Phi-4-mini-instruct), [DeepSeek R1](https://huggingface.co/litert-community/DeepSeek-R1-Distill-Qwen-1.5B), [SmolLM 135M](https://huggingface.co/litert-community/SmolLM-135M-Instruct).

*Note: The flutter_gemma plugin supports Gemma3n (with **multimodal vision and audio support**), FastVLM (vision), Gemma-3, FunctionGemma, Qwen3, Qwen 2.5, Phi-4, DeepSeek R1 and SmolLM. Desktop platforms (macOS, Windows, Linux) require `.litertlm` model format.
*Note: The flutter_gemma plugin supports Gemma 4 and Gemma3n (with **multimodal vision and audio support**), FastVLM (vision), Gemma-3, FunctionGemma, Qwen3, Qwen 2.5, Phi-4, DeepSeek R1 and SmolLM. Desktop platforms (macOS, Windows, Linux) require `.litertlm` model format.

[Gemma](https://ai.google.dev/gemma) is a family of lightweight, state-of-the art open models built from the same research and technology used to create the Gemini models

@@ -32,7 +32,7 @@ There is an example of using:
- **🖼️ Multimodal Support:** Text + Image input with Gemma3n vision models
- **🎙️ Audio Input:** Record and send audio messages with Gemma3n E2B/E4B models (Android, Desktop - LiteRT-LM engine)
- **🛠️ Function Calling:** Enable your models to call external functions and integrate with other services (supported by select models)
- **🧠 Thinking Mode:** View the reasoning process of DeepSeek models with <think> blocks
- **🧠 Thinking Mode:** View the reasoning process of DeepSeek and Gemma 4 models with thinking blocks
- **🛑 Stop Generation:** Cancel text generation mid-process on Android, Web, and Desktop
- **⚙️ Backend Switching:** Choose between CPU and GPU backends for each model individually in the example app
- **🔍 Advanced Model Filtering:** Filter models by features (Multimodal, Function Calls, Thinking) with expandable UI
@@ -72,8 +72,8 @@ The example app offers a curated list of models, each suited for different tasks

| Model Family | Best For | Function Calling | Thinking Mode | Vision | Languages | Size |
|---|---|:---:|:---:|:---:|---|---|
| **Gemma 4 E2B** | Next-gen multimodal chat — text, image, audio | ✅ | ❌ | ✅ | Multilingual | 2.4GB |
| **Gemma 4 E4B** | Next-gen multimodal chat — text, image, audio | ✅ | ❌ | ✅ | Multilingual | 4.3GB |
| **Gemma 4 E2B** | Next-gen multimodal chat — text, image, audio | ✅ | ✅ | ✅ | Multilingual | 2.4GB |
| **Gemma 4 E4B** | Next-gen multimodal chat — text, image, audio | ✅ | ✅ | ✅ | Multilingual | 4.3GB |
| **Gemma3n** | On-device multimodal chat and image analysis | ✅ | ❌ | ✅ | Multilingual | 3-6GB |
| **FastVLM 0.5B** | Fast vision-language inference | ❌ | ❌ | ✅ | Multilingual | 0.5GB |
| **Phi-4 Mini** | Advanced reasoning and instruction following | ✅ | ❌ | ❌ | Multilingual | 3.9GB |
@@ -1544,11 +1544,11 @@ FunctionGemma uses a special format (different from JSON-based function calling)

The `flutter_gemma` plugin handles this format automatically via `FunctionCallParser`.

9. **🧠 Thinking Mode (DeepSeek Models)**
9. **🧠 Thinking Mode (DeepSeek & Gemma 4 Models)**

DeepSeek models support "thinking mode" where you can see the model's reasoning process before it generates the final response. This provides transparency into how the model approaches problems.
DeepSeek and Gemma 4 (E2B/E4B) models support "thinking mode" where you can see the model's reasoning process before it generates the final response. This provides transparency into how the model approaches problems.

**Enable Thinking Mode:**
**Enable Thinking Mode (DeepSeek):**

```dart
final chat = await inferenceModel.createChat(
@@ -1559,7 +1559,6 @@ final chat = await inferenceModel.createChat(
modelType: ModelType.deepSeek, // Required for DeepSeek models
supportsFunctionCalls: true, // DeepSeek also supports function calls
tools: _tools, // Optional: add tools for function calling
// tokenBuffer: 256, // Token buffer for context management
);
```

@@ -1586,12 +1585,25 @@ chat.generateChatResponseAsync().listen((response) {
});
```

**Enable Thinking Mode (Gemma 4):**

```dart
final chat = await inferenceModel.createChat(
temperature: 1.0,
topK: 64,
topP: 0.95,
isThinking: true, // Enable thinking mode
modelType: ModelType.gemmaIt, // Gemma 4 E2B/E4B
);
// <|think|> is auto-injected into systemInstruction — no manual prompt needed.
```

**Thinking Mode Features:**
- ✅ **Transparent Reasoning**: See how the model thinks through problems
- ✅ **Interactive UI**: Show/hide thinking bubbles with expandable content
- ✅ **Streaming Support**: Thinking content streams in real-time
- ✅ **Function Integration**: Models can think before calling functions
- ✅ **DeepSeek Optimized**: Designed specifically for DeepSeek model architecture
- ✅ **Supported Models**: DeepSeek R1 and Gemma 4 E2B/E4B

**Example Thinking Flow:**
1. User asks: "Change the background to blue and explain why blue is calming"
@@ -2096,7 +2108,7 @@ Function calling is currently supported by the following models:
| **Image Input (Multimodal)** | ✅ Full | ✅ Full | ✅ Full | ⚠️ Broken (#684) | macOS: model hallucinates |
| **Audio Input** | ✅ Full | ✅ Full | ❌ Not supported | ✅ Full | Gemma3n E2B/E4B |
| **Function Calling** | ✅ Full | ✅ Full | ✅ Full | ❌ Not supported | LiteRT-LM limitation |
| **Thinking Mode** | ✅ Full | ✅ Full | ✅ Full | ❌ Not supported | DeepSeek models |
| **Thinking Mode** | ✅ Full | ✅ Full | ✅ Full | ✅ Full | DeepSeek & Gemma 4 |
| **Stop Generation** | ✅ Full | ✅ Full | ✅ Full | ✅ Full | Cancel mid-process |
| **GPU Acceleration** | ✅ Full | ✅ Full | ✅ Full | ⚠️ Partial | macOS GPU broken |
| **NPU Acceleration** | ✅ Full | ❌ Not supported | ❌ Not supported | ❌ Not supported | Android only (.litertlm) |
@@ -2264,13 +2276,14 @@ import 'package:flutter_gemma/core/extensions.dart';

// Clean response based on model type
String cleanedResponse = ModelThinkingFilter.cleanResponse(
  rawResponse,
ModelType.deepSeek
);

// The filter automatically removes model-specific tokens like:
// - <end_of_turn> tags (Gemma models)
// - Special DeepSeek tokens
// - <think>...</think> blocks (DeepSeek)
// - <|channel>thought\n...<channel|> blocks (Gemma 4 E2B/E4B)
// - Extra whitespace and formatting
```

2 changes: 1 addition & 1 deletion android/build.gradle
@@ -76,7 +76,7 @@ dependencies {
implementation 'org.jetbrains.kotlinx:kotlinx-coroutines-guava:1.9.0'

// LiteRT-LM Engine for .litertlm model files
implementation 'com.google.ai.edge.litertlm:litertlm-android:0.9.0'
implementation 'com.google.ai.edge.litertlm:litertlm-android:0.10.0'

implementation 'androidx.core:core-ktx:1.12.0'
implementation 'androidx.lifecycle:lifecycle-runtime-ktx:2.7.0'
@@ -82,8 +82,8 @@ private class PlatformServiceImpl(
private val engineLock = Any() // Lock for thread-safe engine access

// NEW: Use InferenceEngine abstraction instead of InferenceModel
private var engine: InferenceEngine? = null
private var session: InferenceSession? = null
@Volatile private var engine: InferenceEngine? = null
@Volatile private var session: InferenceSession? = null

// RAG components
private var embeddingModel: EmbeddingModel? = null
@@ -130,6 +130,9 @@

// Only now clear old state and swap in new engine (thread-safe)
synchronized(engineLock) {
// Cancel stale stream collector before replacing engine
streamJob?.cancel()
streamJob = null
session?.cancelGeneration()
try {
session?.close()
@@ -176,6 +179,7 @@
enableVisionModality: Boolean?,
enableAudioModality: Boolean?,
systemInstruction: String?,
enableThinking: Boolean?,
callback: (Result<Unit>) -> Unit
) {
scope.launch {
Expand All @@ -193,6 +197,7 @@ private class PlatformServiceImpl(
enableVisionModality = enableVisionModality,
enableAudioModality = enableAudioModality,
systemInstruction = systemInstruction,
enableThinking = enableThinking ?: false,
)

session?.close()
@@ -203,7 +203,7 @@ private open class PigeonInterfacePigeonCodec : StandardMessageCodec() {
interface PlatformService {
fun createModel(maxTokens: Long, modelPath: String, loraRanks: List<Long>?, preferredBackend: PreferredBackend?, maxNumImages: Long?, supportAudio: Boolean?, callback: (Result<Unit>) -> Unit)
fun closeModel(callback: (Result<Unit>) -> Unit)
fun createSession(temperature: Double, randomSeed: Long, topK: Long, topP: Double?, loraPath: String?, enableVisionModality: Boolean?, enableAudioModality: Boolean?, systemInstruction: String?, callback: (Result<Unit>) -> Unit)
fun createSession(temperature: Double, randomSeed: Long, topK: Long, topP: Double?, loraPath: String?, enableVisionModality: Boolean?, enableAudioModality: Boolean?, systemInstruction: String?, enableThinking: Boolean?, callback: (Result<Unit>) -> Unit)
fun closeSession(callback: (Result<Unit>) -> Unit)
fun sizeInTokens(prompt: String, callback: (Result<Long>) -> Unit)
fun addQueryChunk(prompt: String, callback: (Result<Unit>) -> Unit)
@@ -315,7 +315,8 @@ interface PlatformService {
val enableVisionModalityArg = args[5] as Boolean?
val enableAudioModalityArg = args[6] as Boolean?
val systemInstructionArg = args[7] as String?
api.createSession(temperatureArg, randomSeedArg, topKArg, topPArg, loraPathArg, enableVisionModalityArg, enableAudioModalityArg, systemInstructionArg) { result: Result<Unit> ->
val enableThinkingArg = args[8] as Boolean?
api.createSession(temperatureArg, randomSeedArg, topKArg, topPArg, loraPathArg, enableVisionModalityArg, enableAudioModalityArg, systemInstructionArg, enableThinkingArg) { result: Result<Unit> ->
val error = result.exceptionOrNull()
if (error != null) {
reply.reply(wrapError(error))
@@ -28,6 +28,7 @@ data class SessionConfig(
val enableVisionModality: Boolean? = null,
val enableAudioModality: Boolean? = null,
val systemInstruction: String? = null,
val enableThinking: Boolean = false,
)

/**