
@borgoat commented Oct 1, 2025

There are a couple of new pyannote models [1]: pyannote/speaker-diarization-community-1 (runs offline) and pyannote/speaker-diarization-precision-2 (hosted by pyannoteAI).

I did a minimal upgrade to pyannote-audio 4.0 here so that these models can be used. To make it work properly, though, I believe we'll need additional arguments: the token parameter has changed, since one may have to provide a pyannoteAI token to use their cloud model.

Footnotes

  1. https://www.pyannote.ai/blog/community-1

@borgoat force-pushed the feat/pyannote-audio-4 branch from 569a38a to f570307 on October 1, 2025 08:04
@borgoat changed the title from "chore: upgrade to pyannote-audio 4" to "Upgrade to pyannote-audio 4" on Oct 1, 2025
@hbredin commented Oct 10, 2025

That's great @borgoat.

FYI, I just released version 4.0.1 of pyannote.audio, which fixes support for pyannoteAI premium diarization models.

Therefore, once you update this PR to use 4.0.1, running the following command will perform transcription locally but run diarization on the pyannoteAI cloud with the state-of-the-art Precision-2 model.

whisperx --diarize \
         --diarize_model pyannote/speaker-diarization-precision-2 \
         --hf_token {pyannoteAI-api-key} \
         audio.wav

{pyannoteAI-api-key} can be obtained from dashboard.pyannote.ai (you'll automatically get a bunch of free credits).
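If you would rather drive this from a Python script, for example to avoid pasting the key into the command itself, a minimal sketch is to build the same argument list and hand it to subprocess. The helper names and the PYANNOTEAI_API_KEY environment variable are assumptions made for this sketch, not part of whisperX or pyannote; the CLI flags are exactly the ones in the command above.

```python
import os
import subprocess
from typing import List, Optional

DEFAULT_MODEL = "pyannote/speaker-diarization-precision-2"


def build_whisperx_cmd(
    audio_path: str,
    diarize_model: str = DEFAULT_MODEL,
    token: Optional[str] = None,
    token_env_var: str = "PYANNOTEAI_API_KEY",  # hypothetical variable name
) -> List[str]:
    """Return the whisperx CLI call above as an argument list.

    If no token is passed explicitly, it is read from `token_env_var`.
    """
    if token is None:
        token = os.environ.get(token_env_var)
    if not token:
        raise ValueError(f"no token given and ${token_env_var} is not set")
    return [
        "whisperx",
        "--diarize",
        "--diarize_model", diarize_model,
        # For Precision-2 this is the pyannoteAI API key, passed via --hf_token as shown above.
        "--hf_token", token,
        audio_path,
    ]


def run_whisperx(audio_path: str, **kwargs) -> int:
    """Run whisperx without a shell (no quoting issues) and return its exit code."""
    cmd = build_whisperx_cmd(audio_path, **kwargs)
    return subprocess.run(cmd, check=False).returncode
```

Note that the key still ends up in the whisperx process's argument list, so other local users can see it in the process list; this sketch only avoids hard-coding it in scripts or shell commands.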

Enjoy!

@stdweird

@borgoat any chance of rebasing your branch and resolving the conflicts?

@borgoat force-pushed the feat/pyannote-audio-4 branch from f570307 to 99a2ed4 on October 28, 2025 13:00
@borgoat (author) commented Oct 28, 2025

I've rebased it now; there are just a couple of things to note:

  • I had to set 3.10 as the minimum Python version (same as pyannote)
  • Right now we're also using the HF token for pyannote. @hbredin, I guess that's incorrect, assuming one may want to use pyannote.ai?

@ErikHeggeli

Just got it to work, but I had to upgrade torch and Python; otherwise it throws "std::bad_alloc".
Offline use is also much easier with this new version.
I have a working version in my forked repo.

@stdweird

@ErikHeggeli I am mainly switching to 4 to make it work offline. Looking at your test branch, the other main difference is the modified YAML file. What is that about? Is it required, and should it be included here? (Or perhaps you could open a PR for it?)

@ErikHeggeli

> @ErikHeggeli i am mainly switching to 4 to make it work offline. if i look in your test branch, the other main difference is the modified yaml file. what is that about? is that something required and should it be included here? (or perhaps you can open PR for it?)

I can't see the YAML file you're talking about. What is it called?

@ErikHeggeli

If it's from the test branch, it isn't needed. That change was only required to make earlier versions of pyannote work offline.

@stdweird

@ErikHeggeli never mind, I see that your branch ships everything needed to run offline, not only the changes for pyannote 4.

@ErikHeggeli

Yes, clone the main branch.
Get the models from Hugging Face:
git clone https://hf.co/pyannote/speaker-diarization-community-1 /path/to/directory/pyannote-speaker-diarization-community-1

Make sure the model files are actually downloaded and not just references/pointers: check that they aren't 1 KB in size. (This happened to me, and it produced the error "_pickle.UnpicklingError: invalid load key, 'v'.")
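Checking file contents is a bit more reliable than checking sizes: a Git LFS pointer is a short text file whose first line is "version https://git-lfs.github.com/spec/v1", which would also explain the 'v' in the unpickling error. The sketch below uses hypothetical helpers (not part of whisperX or pyannote) to list any pointer stubs left in a cloned model directory.

```python
from pathlib import Path
from typing import List

# Every Git LFS pointer file starts with this line (Git LFS pointer spec v1).
LFS_POINTER_PREFIX = b"version https://git-lfs.github.com/spec/v1"


def is_lfs_pointer(path: Path) -> bool:
    """True if `path` holds a Git LFS pointer stub instead of the real content."""
    with open(path, "rb") as f:
        head = f.read(len(LFS_POINTER_PREFIX))
    return head == LFS_POINTER_PREFIX


def find_lfs_pointers(model_dir: str) -> List[Path]:
    """List files under `model_dir` (skipping .git/) that are still LFS pointers."""
    root = Path(model_dir)
    return sorted(
        p
        for p in root.rglob("*")
        if p.is_file()
        and ".git" not in p.relative_to(root).parts
        and is_lfs_pointer(p)
    )
```

If find_lfs_pointers("/path/to/directory/pyannote-speaker-diarization-community-1") returns anything, installing git-lfs and running git lfs pull inside that directory should fetch the real files.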

Then, in diarize.py, provide the full path to "/path/to/directory/pyannote-speaker-diarization-community-1". If the path is wrong, you will get "huggingface_hub.errors.HFValidationError: Repo id must be in the form 'repo_name' or 'namespace/repo_name'".
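That misleading repo-id error can be caught earlier. As a sketch (resolve_diarize_model is a hypothetical helper, not part of whisperX or pyannote), the value passed to diarize.py can be checked first: an existing directory is returned as an absolute path, something that clearly looks like a filesystem path but does not exist raises FileNotFoundError, and anything else is passed through unchanged as a Hugging Face repo id.

```python
import os


def resolve_diarize_model(name_or_path: str) -> str:
    """Return a local model directory, or pass a Hugging Face repo id through.

    Missing local paths raise FileNotFoundError here instead of reaching the
    Hub client and failing its repo-id validation.
    """
    expanded = os.path.expanduser(name_or_path)
    if os.path.isdir(expanded):
        return os.path.abspath(expanded)
    # Repo ids look like "name" or "namespace/name"; anything else is treated as a path.
    looks_like_path = (
        os.path.isabs(expanded)
        or name_or_path.startswith((".", "~"))
        or name_or_path.count("/") > 1
        or (os.sep != "/" and os.sep in name_or_path)
    )
    if looks_like_path:
        raise FileNotFoundError(f"diarization model directory not found: {expanded}")
    return name_or_path
```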

It should then work offline. The only thing I haven't been able to test is GPU offloading, since I only have a CPU available at the moment.

@stdweird

Regarding the reference pointers: you need git-lfs installed.
Regarding offline use: it looks like regular Whisper usage pulls in at least one file from torch hub, plus one other model per language, so I'm still puzzling over it a bit, but I'll get there. It's much easier than the reverse engineering needed to make pyannote 3.x work offline ;)

@ErikHeggeli

Yes, that was most likely the alignment model (or the Whisper ASR model). You have to download those and pass their paths as well, but in my experience that has always worked offline as intended.
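To make any remaining download show up as an error rather than a silent network call, one option is to switch the Hugging Face libraries to offline mode before anything else is imported. This is a minimal sketch, assuming the leftover downloads go through huggingface_hub/transformers and torch.hub. HF_HUB_OFFLINE, TRANSFORMERS_OFFLINE and TORCH_HOME are real environment variables of those libraries; enable_offline_mode is a hypothetical helper.

```python
import os
from typing import Dict, Optional


def enable_offline_mode(torch_home: Optional[str] = None) -> Dict[str, str]:
    """Set offline-related environment variables and return what was set.

    Call this before importing whisperx (or anything that imports
    huggingface_hub), since huggingface_hub reads HF_HUB_OFFLINE at import time.
    """
    settings = {
        "HF_HUB_OFFLINE": "1",        # huggingface_hub: only use the local cache
        "TRANSFORMERS_OFFLINE": "1",  # transformers: same, e.g. for alignment models
    }
    if torch_home is not None:
        # torch.hub caches downloads under $TORCH_HOME/hub; this only moves the
        # cache (e.g. to a pre-populated directory), it does not block downloads.
        settings["TORCH_HOME"] = torch_home
    os.environ.update(settings)
    return settings
```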

@hbredin commented Nov 2, 2025

> Right now we're using the HF token for pyannote too - I guess that's incorrect @hbredin assuming one may want to use pyannote.ai?

Using --hf_token {pyannoteAI-api-key} should work just fine.
See https://huggingface.co/pyannote/speaker-diarization-precision-2#usage

@to-audiobook

> Just got it to work, but had to up torch and python. Otherwise throwing "std::bad_alloc" Also offline use is way easier with this new version. Got a working version in my forked repo.

Thanks to @hbredin, I just learned that the std::bad_alloc exception is caused by incompatible torch and torchcodec versions. Because of that, we'd better pin specific torchcodec versions, depending on which torch version you decide to use. torchcodec's GitHub page has a table showing which versions are compatible.
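Besides pinning, a startup check can turn the native crash into a readable error. A minimal sketch, with hypothetical helper names: the compatibility table is passed in by the caller and must be filled in from torchcodec's GitHub page, since no version pairs are hard-coded here.

```python
import re
from typing import Dict, List, Tuple


def major_minor(version: str) -> Tuple[int, int]:
    """Parse '2.8.0+cu128' into (2, 8), ignoring patch level and build tags."""
    m = re.match(r"(\d+)\.(\d+)", version)
    if m is None:
        raise ValueError(f"unrecognised version string: {version!r}")
    return int(m.group(1)), int(m.group(2))


def check_torchcodec_compat(
    torch_version: str,
    torchcodec_version: str,
    table: Dict[str, List[str]],
) -> None:
    """Raise RuntimeError unless the installed pair appears in `table`.

    `table` maps a torchcodec "major.minor" release to the torch "major.minor"
    versions it supports, copied from torchcodec's compatibility table.
    """
    codec_key = "%d.%d" % major_minor(torchcodec_version)
    supported = table.get(codec_key)
    if not supported:
        raise RuntimeError(f"torchcodec {codec_key} is not in the compatibility table")
    if major_minor(torch_version) not in {major_minor(v) for v in supported}:
        raise RuntimeError(
            f"torchcodec {torchcodec_version} supports torch {', '.join(supported)}, "
            f"but torch {torch_version} is installed"
        )
```

The installed versions can be read with importlib.metadata.version("torch") and importlib.metadata.version("torchcodec"), so the check should flag a mismatch before any audio decoding starts.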

@GUUser91 commented Nov 16, 2025

I had to edit /mnt/2tb/whisperX/whisperx/diarize.py
and replace

self.model = Pipeline.from_pretrained(
    model_config, use_token=use_auth_token
).to(device)

with

self.model = Pipeline.from_pretrained(
    model_config, token=use_auth_token
).to(device)

to get whisperx to work.
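The underlying change is that pyannote.audio 4 takes the credential as token=, while 3.x took use_auth_token=. If diarize.py ever needs to support both, a small sketch like the following (pipeline_token_kwargs is a hypothetical helper) picks the keyword from the installed version.

```python
from importlib.metadata import PackageNotFoundError, version
from typing import Dict, Optional


def pipeline_token_kwargs(
    token: Optional[str], pyannote_version: Optional[str] = None
) -> Dict[str, Optional[str]]:
    """Return the auth keyword argument for Pipeline.from_pretrained.

    Assumes pyannote.audio >= 4 expects `token=` (the fix above) and
    3.x expects `use_auth_token=`.
    """
    if pyannote_version is None:
        try:
            pyannote_version = version("pyannote.audio")
        except PackageNotFoundError:
            raise RuntimeError("pyannote.audio is not installed") from None
    major = int(pyannote_version.split(".")[0])
    return {"token": token} if major >= 4 else {"use_auth_token": token}
```

In diarize.py this would read Pipeline.from_pretrained(model_config, **pipeline_token_kwargs(use_auth_token)).to(device).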


6 participants