
Commit 33fd6ea

pyannote/community-1 (#8)

* add agents doc
* clean up
* clean up more

1 parent f7519ee

File tree: 49 files changed (+8800 / -1 lines)

.gitignore

Lines changed: 3 additions & 1 deletion

```diff
@@ -1,3 +1,5 @@
 __pycache__
 .DS_Store
-.venv
+.venv
+
+*.wav
```

models/AGENTS.md

Lines changed: 37 additions & 0 deletions
# Repository Guidelines

## Project Structure & Module Organization

- Code lives under `models/{class}/{model}/{target}`; mirror existing patterns like `vad/silero-vad/coreml`.
- Each target directory is self-contained: `pyproject.toml`, `uv.lock`, conversion scripts, docs, and sample assets.
- Keep `README.md`/`CITATION.cff` next to the model. Push large binaries to Hugging Face and reference them here.

## Build, Test, and Development Commands

Run these from the target directory (Python 3.10.12):

- `uv sync` — create/refresh the env defined by `pyproject.toml`.
- `uv run python convert-coreml.py --output-dir ./build/<name>` — run conversion and emit Core ML bundles.
- `uv run python compare-models.py --audio-file <path> --coreml-dir <dir>` — benchmark converted models (if present).
- `uv run python test.py` — execute the model-specific smoke test.

## Deployment Targets & Runtime Tips

- Trace with `.CpuOnly`; target iOS 17+ and macOS 14+ (see the conversion sketch after this list).
- Use `uv` for reproducible installs; avoid system Python.
- Keep bundles small; prefer float16 where supported.
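A minimal sketch of how these targets map onto a conversion script, assuming coremltools 7.x; `TinyNet` and the input shape are placeholders, not a model from this repo:

```python
import coremltools as ct
import torch


class TinyNet(torch.nn.Module):
    """Stand-in for a real segmentation/embedding model."""

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return x.mean(dim=-1)


example = torch.randn(1, 1, 16000)  # hypothetical (batch, channel, samples) input
traced = torch.jit.trace(TinyNet().eval(), example)

mlmodel = ct.convert(
    traced,
    inputs=[ct.TensorType(name="audio", shape=example.shape)],
    compute_units=ct.ComputeUnit.CPU_ONLY,      # the ".CpuOnly" guidance above
    minimum_deployment_target=ct.target.iOS17,  # iOS 17+ / macOS 14+
    compute_precision=ct.precision.FLOAT16,     # prefer float16 where supported
)
mlmodel.save("tiny.mlpackage")
```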
## Coding Style & Naming Conventions

- 4-space indentation, type hints when practical, and double-quoted strings.
- Lowercase-kebab-case for files/dirs; mirror upstream model names and runtime targets (`coreml`, `onnx`, etc.).
- When packaging libraries, place importable code under `src/<package>` and expose CLIs via `if __name__ == "__main__": main()` (a minimal skeleton follows this list).
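A minimal skeleton of that convention; the package path and option names are illustrative, not from this repo:

```python
# src/example_tools/cli.py (hypothetical package path)
import argparse


def main() -> None:
    parser = argparse.ArgumentParser(description="Convert a checkpoint to Core ML.")
    parser.add_argument("--output-dir", default="./build", help="where bundles are written")
    args = parser.parse_args()
    print(f"writing bundles to {args.output_dir}")


if __name__ == "__main__":
    main()
```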
## Testing Guidelines

- Ship a runnable sanity check using bundled assets (e.g., `yc_first_minute.wav`) and verify end-to-end output; see the sketch after this list.
- Prefer deterministic assertions or concise summary prints; record expected metrics/speedups for benchmarking utilities.
- Document prerequisites such as `git lfs install` before fetching large checkpoints.
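A sketch of such a sanity check; the 16 kHz mono expectation is an assumption about the bundled asset, not a documented property:

```python
import torchaudio


def smoke_test() -> None:
    # Bundled sample asset; sample rate and channel count are assumptions.
    waveform, sample_rate = torchaudio.load("yc_first_minute.wav")
    assert sample_rate == 16000, f"unexpected sample rate: {sample_rate}"
    assert waveform.ndim == 2 and waveform.shape[0] == 1  # (channels, samples)
    print(f"loaded {waveform.shape[1] / sample_rate:.1f}s of mono audio")


if __name__ == "__main__":
    smoke_test()
```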
## Commit & Pull Request Guidelines

- Commits: concise, imperative subjects; append issue numbers when relevant (e.g., `Move parakeet to the right folder (#4)`).
- Pull requests: describe the model, destination runtime, conversion steps, and validation evidence (logs, plots, or HF links). Call out deviations, new dependencies, and follow-up work.

## Model Assets & Distribution

- Store heavy weights, notebooks, and rendered plots externally (Hugging Face Hub). Include download instructions or automation scripts.
- Verify upstream license compliance before redistribution.

models/speaker-diarization/pyannote-community-1/LICENSE

Lines changed: 426 additions & 0 deletions
Large diffs are not rendered by default.
Lines changed: 38 additions & 0 deletions

# pyannote/speaker-diarization-community-1

Made possible by: [speaker-diarization-community-1](https://huggingface.co/pyannote/speaker-diarization-community-1)

```text
@inproceedings{
  author={Fluid Inference},
  title={{Speaker diarization via Core ML}},
  year=2025,
}

Speaker segmentation model
@inproceedings{Plaquet23,
  author={Alexis Plaquet and Hervé Bredin},
  title={{Powerset multi-class cross entropy loss for neural speaker diarization}},
  year=2023,
  booktitle={Proc. INTERSPEECH 2023},
}

Speaker embedding model
@inproceedings{Wang2023,
  title={Wespeaker: A research and production oriented speaker embedding learning toolkit},
  author={Wang, Hongji and Liang, Chengdong and Wang, Shuai and Chen, Zhengyang and Zhang, Binbin and Xiang, Xu and Deng, Yanlei and Qian, Yanmin},
  booktitle={ICASSP 2023, IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)},
  pages={1--5},
  year={2023},
  organization={IEEE}
}

Speaker clustering
@article{Landini2022,
  author={Landini, Federico and Profant, J{\'a}n and Diez, Mireia and Burget, Luk{\'a}{\v{s}}},
  title={{Bayesian HMM clustering of x-vector sequences (VBx) in speaker diarization: theory, implementation and analysis on standard tasks}},
  year={2022},
  journal={Computer Speech \& Language},
}
```
Lines changed: 5 additions & 0 deletions

```text
pyannote-speaker-diarization-community-1/
coreml_models/
.matplotlib_cache/

build/
```
Lines changed: 92 additions & 0 deletions

# pyannote-coreml

This Core ML port of the Hugging Face `pyannote/speaker-diarization-community-1` pipeline was produced primarily by the Mobius coding agent. The directory is laid out so another agent can pick it up and run end-to-end, while still giving power users a clear manual path through the convert → compare → quantize toolchain.

## What Lives Here

- `convert-coreml.py`, `compare-models.py`, `quantize-models.py` — scripted pipeline for export, parity checks, and post-export optimizations.
- `coreml_models/` — default output folder for `.mlpackage` bundles plus resource JSON.
- `docs/` — background notes (`docs/plda-coreml.md`, conversion guides, optimization results).
- `coreml_wrappers.py`, `embedding_io.py`, `plda_module.py` — importable helpers for wrapping Core ML bundles inside PyTorch pipelines.
- `pyproject.toml`, `uv.lock` — reproducible Python 3.10.12 environment pinned to Torch 2.4, coremltools 7.2, pyannote-audio 4.0.0.
- Sample clips (`yc_first_10s.wav`, `yc_first_minute.wav`, `../../../../longconvo-30m*.wav`) for smoke tests and benchmarking.

## Agent-Oriented Workflow

Mobius (or any compatible coding agent) can operate this toolkit by chaining three scripts:

1. `convert-coreml.py` exports FBANK, segmentation, embedding, and PLDA components to Core ML (with optional selective FP16).
2. `compare-models.py` runs PyTorch vs Core ML parity tests, reports timing and DER/JER metrics, and refreshes plots under `plots/`.
3. `quantize-models.py` generates INT8/INT4/palettized variants, benchmarks latency and memory, and emits comparison charts.

All scripts write machine-readable summaries to disk so an agent can decide what to ship or flag regressions. Automation typically runs them in that order inside this directory with `uv run`.
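For orientation, a sketch of the coremltools 7.x machinery a step like `quantize-models.py` can build on; this illustrates the technique, not the script's actual internals, and the output paths are placeholders:

```python
import coremltools as ct
from coremltools.optimize.coreml import (
    OpLinearQuantizerConfig,
    OpPalettizerConfig,
    OptimizationConfig,
    linear_quantize_weights,
    palettize_weights,
)

model = ct.models.MLModel("coreml_models/embedding-community-1.mlpackage")

# INT8 weight quantization
int8_config = OptimizationConfig(global_config=OpLinearQuantizerConfig(mode="linear_symmetric"))
linear_quantize_weights(model, config=int8_config).save("coreml_models/embedding-int8.mlpackage")

# 4-bit palettization (k-means weight lookup tables)
pal_config = OptimizationConfig(global_config=OpPalettizerConfig(mode="kmeans", nbits=4))
palettize_weights(model, config=pal_config).save("coreml_models/embedding-palettized4.mlpackage")
```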
## Manual Pipeline

Prerequisites: macOS 14+, Xcode 15+, [uv](https://github.com/astral-sh/uv), and access to the gated Hugging Face repo. Accept the user agreement on [huggingface.co/pyannote/speaker-diarization-community-1](https://huggingface.co/pyannote/speaker-diarization-community-1) before attempting to download the checkpoints, then fetch the assets into `pyannote-speaker-diarization-community-1/` (run `git lfs pull` if necessary).

```bash
# 1. Create or refresh the local environment
uv sync

# 2. Convert PyTorch checkpoints to Core ML
uv run python convert-coreml.py --model-root ./pyannote-speaker-diarization-community-1 \
    --output-dir ./coreml_models
# Optional: add --selective-fp16 for mixed-precision exports

# 3. Compare PyTorch vs Core ML outputs, generate plots/metrics
uv run python compare-models.py --audio-path ../../../../longconvo-30m-last5m.wav \
    --model-root ./pyannote-speaker-diarization-community-1 \
    --coreml-dir ./coreml_models

# 4. Produce quantized variants and benchmark them (uses convert+compare outputs)
uv run python quantize-models.py --audio-path ../../../../longconvo-30m.wav \
    --coreml-dir ./coreml_models
# Add --skip-generation to benchmark existing variants only
```

Key artifacts land under `coreml_models/` (FP32/FP16 exports, the PLDA Core ML bundle, resource JSON files) and `plots/` (latency and accuracy reports). The scripts emit timing summaries and DER/JER results directly to stdout for quick inspection.

## Using the Wrappers from Python

`coreml_wrappers.py` exposes helpers to drop the converted models into an existing pyannote pipeline. The snippet below loads the FBANK and embedding bundles, mirrors the PyTorch interface, and emits embeddings for a local clip.

```python
from pathlib import Path

import coremltools as ct
import torch
import torchaudio
from pyannote.audio import Model

from coreml_wrappers import CoreMLEmbeddingModule
from embedding_io import SEGMENTATION_FRAMES

root = Path(__file__).resolve().parent
# MLModel expects a string path, so convert the Path objects explicitly.
embedding_ml = ct.models.MLModel(str(root / "coreml_models" / "embedding-community-1.mlpackage"))
fbank_ml = ct.models.MLModel(str(root / "coreml_models" / "fbank-community-1.mlpackage"))
prototype = Model.from_pretrained(str(root / "pyannote-speaker-diarization-community-1" / "embedding"))

wrapper = CoreMLEmbeddingModule(embedding_ml, fbank_ml, prototype, output_key="embedding")

waveform, _ = torchaudio.load(root / "yc_first_10s.wav")
# Ensure a (channels, samples) layout before adding the batch dimension.
waveform = waveform.unsqueeze(0) if waveform.ndim == 1 else waveform
weights = torch.ones(1, SEGMENTATION_FRAMES)
embedding = wrapper(waveform.unsqueeze(0), weights)
print(embedding.shape)
```

Call `wrap_pipeline_with_coreml` to swap the segmentation and embedding stages inside a full PyTorch diarization pipeline while keeping the VBx/PLDA logic on-device.
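A hypothetical invocation; the actual signature lives in `coreml_wrappers.py`, so the `coreml_dir` keyword shown here is an assumption:

```python
from pathlib import Path

from pyannote.audio import Pipeline

from coreml_wrappers import wrap_pipeline_with_coreml

root = Path(__file__).resolve().parent
pipeline = Pipeline.from_pretrained(str(root / "pyannote-speaker-diarization-community-1"))

# Hypothetical keyword argument; check coreml_wrappers.py for the real parameters.
pipeline = wrap_pipeline_with_coreml(pipeline, coreml_dir=root / "coreml_models")
print(pipeline(str(root / "yc_first_10s.wav")))
```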
## Status & Known Limitations

- ✅ Conversion, comparison, and quantization scripts are in place and agent-friendly.
- ✅ PLDA parameters now ship as a Core ML model (`plda-community-1.mlpackage`) with precise dtype handling (see `docs/plda-coreml.md`).
- ⚠️ Fixed 5 s embedding windows introduce mild oscillations around speaker transitions versus the variable-length PyTorch baseline (DER ~0.017–0.018). Plots under `plots/` illustrate the difference.
- 🔍 Further tuning ideas: adjust VBx thresholds, add post-processing to merge short segments, and investigate weighted-pooling exports once coremltools supports variable-length inputs.

## References

- Hugging Face pipeline: `pyannote/speaker-diarization-community-1`
- VBx clustering background: [VBx: Variational Bayes HMM Clustering](https://arxiv.org/abs/2012.14952)
- Additional notes and deep dives live in `docs/` (start with `docs/plda-coreml.md` and `ANE_OPTIMIZATION_RESULTS.md`).
