German transcription quality degradation - excessive word repetition #40

### Issue Description

  When transcribing German audio, CrisperWhisper emits long runs of repeated common German
  words (particularly "ja"), which makes the transcripts far worse than those it produces for
  English audio.

  ### Example Output

```
  Ja, ja, ja, ja, ja, ja, ja, ja, ja, ja, ja, ja, ja, ja, ja, ja, ja, ja, ja, ja, ja, ja, ja,
  ja, ja, ja, ja, ja, ja, ja, ja, ja, ja, ja, ja, ja, ja, ja, ja, ja, ja, ja, ja, ja, ja, ja,
  ja, ja, ja, ja, ja, ja, ja, er wieder kam, also hat er nicht ja, hat in 2 Tage gar nichts
  gegessen ja und gezittert.
```
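
  To put a number on the repetition in output like the above, the longest run of a single
  repeated word can be counted. This is a minimal sketch; `longest_run` is a hypothetical
  helper written for this report and is not part of CrisperWhisper.

```python
import re
from itertools import groupby


def longest_run(text: str) -> tuple[str, int]:
    """Return the word with the longest consecutive run and the run length.

    Hypothetical helper for illustration only. Words are lowercased and
    punctuation is ignored, so "Ja, ja, ja" counts as a run of three.
    """
    # \w+ matches Unicode letters in Python 3, so words like "ähm" stay whole.
    words = re.findall(r"\w+", text.lower())
    best_word, best_len = "", 0
    # groupby groups adjacent equal items, i.e. consecutive repeats.
    for word, group in groupby(words):
        run_len = sum(1 for _ in group)
        if run_len > best_len:
            best_word, best_len = word, run_len
    return best_word, best_len
```

  Running it on a transcript before and after changing generation parameters gives a simple
  way to compare repetition, although it says nothing about overall transcription accuracy.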

  ### Configuration Used

  - Model: nyrahealth/CrisperWhisper
  - Language: German (de)
  - Generation parameters: Default (only language token specified)
  - Chunk length: 30s
  - Return timestamps: word-level

  ### Attempted Solutions

  We tried adding generation parameters to control repetition:
```
  'repetition_penalty': 1.2,
  'no_repeat_ngram_size': 3
```

  While this reduced the "ja" repetitions (from 50+ to ~2), it noticeably degraded the rest of
  the transcription, so this approach is not a usable fix.
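
  For reproduction, the configuration and the two repetition parameters could be wired up
  roughly as follows. This sketch assumes the Hugging Face `transformers` ASR pipeline is
  used and that the model can be loaded by its repository name; the issue does not state how
  the model was actually invoked, and `build_pipeline_kwargs` and `transcribe` are
  hypothetical names.

```python
def build_pipeline_kwargs(penalize_repetition: bool = False) -> dict:
    """Collect the settings from "Configuration Used" as pipeline arguments.

    Assumption: these map onto the transformers ASR pipeline; the original
    report only lists the values, not the calling code.
    """
    generate_kwargs = {"language": "de"}
    if penalize_repetition:
        # The parameters tried under "Attempted Solutions".
        generate_kwargs.update(
            {"repetition_penalty": 1.2, "no_repeat_ngram_size": 3}
        )
    return {
        "task": "automatic-speech-recognition",
        "model": "nyrahealth/CrisperWhisper",
        "chunk_length_s": 30,
        "return_timestamps": "word",
        "generate_kwargs": generate_kwargs,
    }


def transcribe(audio_path: str, penalize_repetition: bool = False) -> dict:
    """Run the pipeline on one audio file (requires transformers and torch)."""
    # Imported lazily so the config helper above works without the library.
    from transformers import pipeline

    kwargs = build_pipeline_kwargs(penalize_repetition)
    # generate_kwargs is passed at call time, the rest at construction time.
    generate_kwargs = kwargs.pop("generate_kwargs")
    asr = pipeline(**kwargs)
    return asr(audio_path, generate_kwargs=generate_kwargs)
```

  Calling `transcribe` once with `penalize_repetition=False` and once with `True` on the same
  file would reproduce the two conditions compared above.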

  ### Expected Behavior

  German transcriptions should have similar quality to English transcriptions without excessive
  word repetition.

  ### Additional Context

  This issue appears to be related to the concern raised in
  https://github.com/nyrahealth/CrisperWhisper/issues/10#issuecomment-2457074673 about German
  language support quality.

  The repetition primarily affects common German filler words and responses ("ja", "ähm", etc.)
  and significantly impacts transcript readability and usefulness.

  English transcriptions with the same model configuration do not exhibit this repetition,
  which suggests the problem is specific to the model's German output rather than to the
  generation settings in general.
