Maxhbr/diffusion models#56

Open: maxhbr wants to merge 7 commits into master from maxhbr/diffusion_models

maxhbr (Owner) commented Jun 19, 2026

No description provided.

Maximilian Huber added 7 commits June 19, 2026 14:17
Signed-off-by: Maximilian Huber <oss@maximilian-huber.de>
Add diffusiongemma-26B-A4B-it-GGUF from unsloth as an RTX model
with 262k context, Q6_K quantization, and alias diffusiongemma-26B.
- Create diffusionllama-cpp overlay on thing host, building llama.cpp
  from PR #24423 (diffusion-gemma GPU offload support)
- Add diffusionLlamaCpp option to myconfig.ai.llama-cpp module
- Add diffusionCUDA device prefix support to devices.nix:
  - mkGuardDevice maps diffusionCUDA to "nvidia" GPU variant
  - backendForDevice returns "diffusion" backend tag
  - llamaServerForDiffusion / llamaBenchForDiffusion helpers
- Wire diffusion package through scripts.nix, router.nix, llama-swap.nix
  via lib/default.nix parameter threading
- Deploy diffusiongemma-26B-A4B-it-Q6_K on diffusionCUDA0 device
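
The device-prefix routing described in the list above can be sketched roughly as follows. This is a minimal illustration, not the PR's actual `devices.nix`: the file layout, the `gpuVariantFor` helper name, and the `"cuda"`, `"cpu"`, and `"none"` fallback values are assumptions. Only the `diffusionCUDA` prefix, the `"nvidia"` variant, and the `"diffusion"` backend tag come from the commit message.

```nix
# Hypothetical sketch; the real devices.nix is not shown on this page.
{ lib }:
let
  # A device such as "diffusionCUDA0" selects the patched llama.cpp build.
  isDiffusion = device: lib.hasPrefix "diffusionCUDA" device;
  # A device such as "CUDA0" selects the regular CUDA build.
  isCuda = device: lib.hasPrefix "CUDA" device;
in
{
  # Both prefixes run on NVIDIA hardware, so they share one GPU variant.
  # The "none" fallback is an assumption for illustration.
  gpuVariantFor = device:
    if isDiffusion device || isCuda device then "nvidia" else "none";

  # The backend tag decides which package and helper functions get used.
  # Only the "diffusion" tag is taken from the commit message.
  backendForDevice = device:
    if isDiffusion device then "diffusion"
    else if isCuda device then "cuda"
    else "cpu";
}
```

With this sketch saved as a file and imported with `{ inherit lib; }`, `backendForDevice "diffusionCUDA0"` evaluates to `"diffusion"` and `gpuVariantFor "diffusionCUDA0"` evaluates to `"nvidia"`.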

The diffusionCUDA0 device sets LLAMA_ARG_DEVICE=diffusionCUDA0 so that
the patched llama.cpp binary routes the model through the
diffusion-gemma GPU offload path.
Use the real sha256 from the PR #24423 tarball instead of lib.fakeHash.
The diffusionCUDA prefix is only for Nix-side routing to pick the
patched binary. The actual llama.cpp env var expects standard CUDA0.
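
One way to keep the Nix-side prefix separate from the value handed to llama.cpp is to translate the device name just before building the environment. The sketch below assumes hypothetical helper names (`llamaDeviceFor`, `envFor`). Only the `diffusionCUDA` prefix, the `LLAMA_ARG_DEVICE` variable, and the `CUDA0` target name come from the commit messages.

```nix
# Hypothetical sketch; helper names are illustrative, not from the PR.
{ lib }:
let
  prefix = "diffusionCUDA";

  # "diffusionCUDA0" -> "CUDA0"; any other device name passes through unchanged.
  llamaDeviceFor = device:
    if lib.hasPrefix prefix device
    then "CUDA" + lib.removePrefix prefix device
    else device;
in
{
  inherit llamaDeviceFor;

  # Environment for the service: llama.cpp only ever sees the standard name.
  envFor = device: {
    LLAMA_ARG_DEVICE = llamaDeviceFor device;
  };
}
```

Here `envFor "diffusionCUDA0"` evaluates to `{ LLAMA_ARG_DEVICE = "CUDA0"; }`, while the original device string remains available for choosing the patched package.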
Override postBuild to compile the llama-diffusion-cli target
from PR #24423, and route diffusionCUDA devices through it
instead of the standard llama-server binary.
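
A rough sketch of what building the extra target in an overlay could look like is shown below. The overlay attribute name `diffusion-llama-cpp` is an assumption, the source override pointing at PR #24423 is omitted, and the sketch appends to `postBuild` rather than replacing it, which may differ from what the commit does.

```nix
# Hypothetical overlay sketch; not the overlay from this PR.
final: prev: {
  diffusion-llama-cpp = prev.llama-cpp.overrideAttrs (old: {
    # The src override for the PR #24423 sources is intentionally left out.
    postBuild = (old.postBuild or "") + ''
      # postBuild runs in the CMake build directory after the default targets.
      cmake --build . --target llama-diffusion-cli
    '';
  });
}
```

Whether the resulting `llama-diffusion-cli` binary ends up in the package output depends on the upstream CMake install rules, so an additional install step may be needed.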