
Feature Request: Add Gemma 4 (gemma4) architecture support #90

@Kyle17888

Description

Google DeepMind released the Gemma 4 model family in April 2026. Upstream llama.cpp has already merged full support for the gemma4 architecture (including the gemma4-iswa inference backend, the gemma4 tokenizer, and the multimodal GEMMA4V/GEMMA4A vision/audio projectors).

However, the llama.cpp version bundled in LocalLLMClient currently only supports up to gemma3n. Attempting to load a Gemma 4 GGUF model results in:

Failed to load model: Failed to load model from file
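
For context, the failure seems to originate in the bundled llama.cpp's architecture lookup: when the GGUF's general.architecture string is not in the name table, the lookup returns LLM_ARCH_UNKNOWN and loading aborts. A minimal sketch of that path, paraphrased from upstream llama-arch.cpp (the exact code may differ in the bundled revision):

// llama-arch.cpp (sketch of the existing lookup path)
llm_arch llm_arch_from_string(const std::string & name) {
    for (const auto & kv : LLM_ARCH_NAMES) {
        if (kv.second == name) {   // compares against the table shown below
            return kv.first;
        }
    }
    return LLM_ARCH_UNKNOWN;       // "gemma4" falls through to here today
}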

Environment

  • Device: iPhone 17 Pro (iOS 19)
  • LocalLLMClient: main branch @ d420bc8
  • Model: google/gemma-4-E4B-it → converted to GGUF Q4_K_M via upstream llama.cpp build 8818
  • Upstream llama.cpp status: Gemma 4 fully supported (text + vision + audio)

Currently supported Gemma architectures in the bundled llama.cpp

// llama-arch.cpp (current)
{ LLM_ARCH_GEMMA,            "gemma"            },
{ LLM_ARCH_GEMMA2,           "gemma2"           },
{ LLM_ARCH_GEMMA3,           "gemma3"           },
{ LLM_ARCH_GEMMA3N,          "gemma3n"          },
{ LLM_ARCH_GEMMA_EMBEDDING,  "gemma-embedding"  },
// ❌ LLM_ARCH_GEMMA4 is missing
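
For reference, the addition would presumably look like the following; the enum value and architecture string are assumptions extrapolated from the gemma3n naming pattern, pending verification against the upstream commit:

// llama-arch.h (hypothetical addition)
    LLM_ARCH_GEMMA3N,
    LLM_ARCH_GEMMA4,   // assumed name, following the existing pattern

// llama-arch.cpp (hypothetical, after sync)
{ LLM_ARCH_GEMMA3N,          "gemma3n"          },
{ LLM_ARCH_GEMMA4,           "gemma4"           },  // new entry
{ LLM_ARCH_GEMMA_EMBEDDING,  "gemma-embedding"  },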

Expected behavior

LocalLLMClient should be able to load and run Gemma 4 GGUF models (both text-only and multimodal with mmproj) on iOS/macOS, just like it currently supports Gemma 3 and Gemma 3n.

Key upstream commits/files to sync

The following components need to be synced from upstream llama.cpp to add gemma4 support:

  • src/llama-arch.h / src/llama-arch.cpp — LLM_ARCH_GEMMA4 registration
  • src/models/gemma4-iswa.cpp — Gemma 4 inference implementation (ISWA hybrid attention)
  • src/llama-model.cpp — model loading + graph building for gemma4 (see the sketch after this list)
  • src/llama-vocab.cpp — gemma4 tokenizer (BPE with SPM-style byte fallback)
  • gguf-py/gguf/constants.py — MODEL_ARCH.GEMMA4, PLE tensors, vision/audio projector types
  • tools/mtmd/clip.cpp / clip.h — GEMMA4V / GEMMA4A projector support
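
As a rough illustration of the src/llama-model.cpp piece, the graph-building dispatch would gain a gemma4 case next to the existing Gemma variants. Everything below (the case label, the llm_build_gemma4_iswa builder, the constructor signature) is a guess modeled on the gemma3/gemma3n code, not copied from the upstream commit:

// llama-model.cpp (hypothetical sketch; names and signature assumed from gemma3n)
switch (arch) {
    case LLM_ARCH_GEMMA3N:
        llm = std::make_unique<llm_build_gemma3n_iswa>(*this, params);
        break;
    case LLM_ARCH_GEMMA4:  // route to the newly synced gemma4-iswa implementation
        llm = std::make_unique<llm_build_gemma4_iswa>(*this, params);
        break;
    default:
        GGML_ABORT("unsupported architecture");
}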

Why this matters

Gemma 4 E4B is designed specifically for on-device deployment: the "E" stands for "Effective parameters", meaning only 4B parameters are active per token despite a larger total parameter count, thanks to Per-Layer Embeddings. That makes it an ideal candidate for mobile inference via LocalLLMClient.

Workaround

As a temporary workaround I'm running upstream llama.cpp via llama-server and calling it over HTTP (see the sketch below), but native on-device inference through LocalLLMClient would be strongly preferred.
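
For anyone needing the same stopgap, here is a minimal sketch of the HTTP workaround, assuming a llama-server instance is already running with the Gemma 4 GGUF on the default port. The host, port, and prompt are placeholders; the /completion endpoint and its prompt/n_predict fields are from the upstream server docs:

// workaround_client.cpp: call an upstream llama-server over HTTP (libcurl)
// Build: g++ workaround_client.cpp -lcurl
#include <curl/curl.h>
#include <iostream>
#include <string>

// Collect the response body into a std::string.
static size_t on_body(char * data, size_t size, size_t nmemb, void * userp) {
    static_cast<std::string *>(userp)->append(data, size * nmemb);
    return size * nmemb;
}

int main() {
    CURL * curl = curl_easy_init();
    if (!curl) return 1;

    // llama-server's completion endpoint; host/port are placeholders.
    const std::string url  = "http://127.0.0.1:8080/completion";
    const std::string body = R"({"prompt": "Hello, Gemma 4!", "n_predict": 64})";

    struct curl_slist * headers = nullptr;
    headers = curl_slist_append(headers, "Content-Type: application/json");

    std::string response;
    curl_easy_setopt(curl, CURLOPT_URL, url.c_str());
    curl_easy_setopt(curl, CURLOPT_HTTPHEADER, headers);
    curl_easy_setopt(curl, CURLOPT_POSTFIELDS, body.c_str());
    curl_easy_setopt(curl, CURLOPT_WRITEFUNCTION, on_body);
    curl_easy_setopt(curl, CURLOPT_WRITEDATA, &response);

    CURLcode rc = curl_easy_perform(curl);
    if (rc == CURLE_OK) {
        std::cout << response << std::endl;  // JSON with a "content" field
    } else {
        std::cerr << curl_easy_strerror(rc) << std::endl;
    }

    curl_slist_free_all(headers);
    curl_easy_cleanup(curl);
    return rc == CURLE_OK ? 0 : 1;
}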

Thank you for this excellent package! 🙏
