Issue Description
When transcribing German audio, CrisperWhisper generates excessive repetitions of common
German words (particularly "ja"), resulting in severely degraded transcript quality compared
to English transcriptions.
Example Output
Ja, ja, ja, ja, ja, ja, ja, ja, ja, ja, ja, ja, ja, ja, ja, ja, ja, ja, ja, ja, ja, ja, ja,
ja, ja, ja, ja, ja, ja, ja, ja, ja, ja, ja, ja, ja, ja, ja, ja, ja, ja, ja, ja, ja, ja, ja,
ja, ja, ja, ja, ja, ja, ja, er wieder kam, also hat er nicht ja, hat in 2 Tage gar nichts
gegessen ja und gezittert.
Configuration Used
- Model: nyrahealth/CrisperWhisper
- Language: German (de)
- Generation parameters: Default (only language token specified)
- Chunk length: 30s
- Return timestamps: word-level
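For reference, the settings above could be assembled roughly as follows. This is a minimal sketch, not the exact script we ran: `build_asr_config` is a hypothetical helper, and it assumes the model is loaded through the Hugging Face `transformers` ASR pipeline, which accepts `chunk_length_s`, `return_timestamps` and `generate_kwargs`.

```python
def build_asr_config(language="de", extra_generate_kwargs=None):
    """Assemble keyword arguments for a transformers ASR pipeline.

    Hypothetical helper for illustration; the values mirror the
    configuration listed in this issue.
    """
    # Only the language is specified; everything else stays at defaults.
    generate_kwargs = {"language": language}
    if extra_generate_kwargs:
        generate_kwargs.update(extra_generate_kwargs)
    return {
        "model": "nyrahealth/CrisperWhisper",
        "chunk_length_s": 30,          # 30 s chunks
        "return_timestamps": "word",   # word-level timestamps
        "generate_kwargs": generate_kwargs,
    }


config = build_asr_config()

# Assumed usage (requires transformers and downloads the model):
# from transformers import pipeline
# asr = pipeline("automatic-speech-recognition", **config)
# result = asr("audio.wav")  # "audio.wav" is a placeholder path
```

Extra generation parameters can be passed through `extra_generate_kwargs` and are merged with the language setting.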
Attempted Solutions
We tried adding generation parameters to control repetition:
'repetition_penalty': 1.2,
'no_repeat_ngram_size': 3
While this reduced the "ja" repetitions (from 50+ to ~2), it significantly deteriorated the
overall transcription quality, making this approach unsuitable.
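To make the repetition counts comparable across runs, something like the following could be used to measure the longest run of a consecutively repeated word in a transcript. This is only a sketch: `longest_word_run` is a hypothetical helper, not part of CrisperWhisper or `transformers`, and it ignores punctuation and case.

```python
import re


def longest_word_run(text):
    """Return (word, length) for the longest run of one repeated word.

    Words are compared case-insensitively and punctuation is ignored.
    Returns (None, 0) for text without any words.
    """
    words = re.findall(r"\w+", text.lower())
    best_word, best_len = None, 0
    run_word, run_len = None, 0
    for word in words:
        if word == run_word:
            run_len += 1               # same word again, extend the run
        else:
            run_word, run_len = word, 1  # a new run starts here
        if run_len > best_len:
            best_word, best_len = run_word, run_len
    return best_word, best_len


sample = "Ja, ja, ja, er wieder kam"
word, length = longest_word_run(sample)
```

Running this on the transcripts produced with and without the repetition parameters gives a single number per transcript, which makes it easier to report how much each setting changes the repetition.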
Expected Behavior
German transcriptions should have similar quality to English transcriptions without excessive
word repetition.
Additional Context
This issue appears to be related to the concern raised in #10 (comment) about German
language support quality.
The repetition primarily affects common German filler words and responses ("ja", "ähm", etc.)
and significantly impacts transcript readability and usefulness.
English transcriptions produced with the same model configuration are far better and do not
exhibit this repetition, which suggests the problem is specific to German rather than a
general generation issue.