Finetuning Granite Speech #307

Open
avihu111 wants to merge 5 commits into main

Conversation

@avihu111 (Author)

What does this PR do?

This PR adds a notebook that shows how to finetune Granite Speech, an open-source model that leads the OpenASR leaderboard.
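For context, a minimal inference sketch is below, assuming the Granite Speech integration in Transformers; the checkpoint id, the `<|audio|>` prompt placeholder, and the processor call follow the model card example and may differ between releases.

```python
# Minimal inference sketch for Granite Speech (assumptions: checkpoint id,
# <|audio|> placeholder, and processor signature follow the model card example).
import torch
import torchaudio
from transformers import AutoProcessor, AutoModelForSpeechSeq2Seq

model_id = "ibm-granite/granite-speech-3.3-2b"
device = "cuda" if torch.cuda.is_available() else "cpu"

processor = AutoProcessor.from_pretrained(model_id)
model = AutoModelForSpeechSeq2Seq.from_pretrained(model_id).to(device)

# Granite Speech expects 16 kHz mono audio; resample if necessary.
wav, sr = torchaudio.load("sample.wav")
wav = torchaudio.functional.resample(wav, sr, 16000)

# The chat template injects the audio through an <|audio|> placeholder.
chat = [{"role": "user", "content": "<|audio|>can you transcribe the speech into a written format?"}]
prompt = processor.tokenizer.apply_chat_template(chat, tokenize=False, add_generation_prompt=True)

inputs = processor(prompt, wav, return_tensors="pt").to(device)
output_ids = model.generate(**inputs, max_new_tokens=200)
# Decode only the newly generated tokens, not the prompt.
transcript = processor.tokenizer.decode(
    output_ids[0, inputs["input_ids"].shape[1]:], skip_special_tokens=True
)
print(transcript)
```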

Who can review?

@merveenoyan @stevhliu can you give that a look? 🙏


@stevhliu (Member)

Hi, thanks for your contribution!

The cookbook recipes are more focused on applied use cases so it'd be awesome if you could tailor it more towards solving a specific problem or use case.

@jack-tol commented Jun 30, 2025

> The cookbook recipes are more focused on applied use cases so it'd be awesome if you could tailor it more towards solving a specific problem or use case.

Might not really be my place to say, but even though this script perhaps doesn't tackle a specific fine-tuning use case (e.g., domain-specific fine-tuning on medical audio), it is nevertheless very important to give the open-source community a script for fine-tuning a new open-source model on custom data. Maybe this is already in the works and I'm jumping the gun, but this contribution should surely exist somewhere in the cookbook, or in some other resource, until a better and more robust implementation is available. Just my thoughts.

@stevhliu (Member)

Absolutely, we're happy to have a link to it in the Granite Speech docs in Transformers if nothing else!

@avihu111 (Author) commented Jul 1, 2025

Hi @stevhliu, thanks for the feedback!
I expected (like @jack-tol) that the most common use case would be finetuning Granite Speech on custom data (e.g., a new language, unseen acoustic conditions, etc.).
My goal was to show the best way to run inference with and finetune the model, along with useful code snippets and a concrete (yet concise and easy-to-run) example.
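A hedged sketch of what that custom-data finetuning path could look like with PEFT LoRA; the target modules, rank, and other hyperparameters below are illustrative assumptions, not the notebook's actual settings.

```python
# Illustrative LoRA setup for parameter-efficient finetuning; target_modules
# and hyperparameters are assumptions, not values taken from the notebook.
from peft import LoraConfig, get_peft_model
from transformers import AutoModelForSpeechSeq2Seq

model = AutoModelForSpeechSeq2Seq.from_pretrained("ibm-granite/granite-speech-3.3-2b")

lora_config = LoraConfig(
    r=16,
    lora_alpha=32,
    lora_dropout=0.05,
    target_modules=["q_proj", "v_proj"],  # assumption: the LLM decoder's attention projections
    task_type="CAUSAL_LM",
)
model = get_peft_model(model, lora_config)
model.print_trainable_parameters()  # only the adapter weights remain trainable

# From here, train on (audio, transcript) pairs with the usual Trainer loop,
# building labels from the tokenized target transcripts.
```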

We can also finetune Granite Speech on an unseen task like spoken question answering, but I fear people won't find it as useful (a finetuning script was requested here and here).

I hope it will be suitable for the cookbook - I like that the Hugging Face webpage presents the notebook nicely. 🙏
If not, I assume the best approach is to add it to the Granite Speech docs.

@stevhliu (Member) commented Jul 1, 2025

I'm wondering if there is some way to apply your fine-tuning recipe to a more practical application. For example, you could fine-tune Granite Speech and build a Space that transcribes meeting notes, captions videos, etc. This would help you extend the notebook and demonstrate how to build an AI application with it.
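For illustration, a minimal sketch of such a Space with Gradio, reusing the inference pattern shown earlier; the checkpoint path is a placeholder for a finetuned model, and the prompt format follows the model card example.

```python
# Minimal Gradio Space sketch; "./granite-speech-finetuned" is a placeholder
# for a finetuned checkpoint, and the prompt format is an assumption taken
# from the model card example.
import gradio as gr
import torch
import torchaudio
from transformers import AutoProcessor, AutoModelForSpeechSeq2Seq

ckpt = "./granite-speech-finetuned"  # placeholder: your finetuned checkpoint
device = "cuda" if torch.cuda.is_available() else "cpu"
processor = AutoProcessor.from_pretrained(ckpt)
model = AutoModelForSpeechSeq2Seq.from_pretrained(ckpt).to(device)

def transcribe(audio_path: str) -> str:
    # Load the uploaded/recorded clip and resample to the expected 16 kHz.
    wav, sr = torchaudio.load(audio_path)
    wav = torchaudio.functional.resample(wav, sr, 16000)
    chat = [{"role": "user", "content": "<|audio|>can you transcribe the speech into a written format?"}]
    prompt = processor.tokenizer.apply_chat_template(chat, tokenize=False, add_generation_prompt=True)
    inputs = processor(prompt, wav, return_tensors="pt").to(device)
    output_ids = model.generate(**inputs, max_new_tokens=200)
    return processor.tokenizer.decode(
        output_ids[0, inputs["input_ids"].shape[1]:], skip_special_tokens=True
    )

demo = gr.Interface(
    fn=transcribe,
    inputs=gr.Audio(sources=["upload", "microphone"], type="filepath"),
    outputs="text",
    title="Granite Speech meeting transcriber",
)

if __name__ == "__main__":
    demo.launch()
```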

If you decide to keep it as fine-tuning only, then I think it's best to add it to the Granite Speech docs.

Thanks again and we really appreciate the time and effort you put into creating this notebook! 🤗

@avihu111 (Author) commented Jul 3, 2025

Thanks, @stevhliu.
Can you advise on the best way to add this to the Granite Speech docs?
Most of the examples I've seen are short code snippets. Do you have a docs page with an example notebook that you can share?
Any help would be greatly appreciated - thanks!

@stevhliu (Member) commented Jul 3, 2025

Yeah, you can open a PR on the Transformers repo and create a ## Resources section in the Granite Speech docs with a link to your notebook.
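For reference, a hypothetical entry in the Granite Speech docs page (the file path and link are placeholders, not confirmed by this thread) could look like this:

```markdown
## Resources

- [Finetuning Granite Speech](<link-to-your-notebook>): a notebook showing how to finetune the model on custom data.
```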
