Description
I want to express my appreciation for the outstanding quality of this model. The transcription accuracy and word-level timestamp precision are truly impressive and have greatly improved our workflow.
For some of the audio files I work with—typically two-speaker conversations with imbalanced channel volumes—I’ve found that WhisperX with an initial prompt produces more accurate transcriptions than CrisperWhisper. However, the word-level timestamps from WhisperX are notably less precise.
I was wondering if there is a way to generate timestamps using the CrisperWhisper model based on known text, similar to forced alignment methods. This would help us retain CrisperWhisper’s timestamp precision while benefiting from WhisperX’s transcription accuracy.
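To make the request concrete, here is a rough sketch of the kind of alignment I have in mind. It does not use the CrisperWhisper or WhisperX API. The cost matrix (one row per known token, one column per audio frame, lower meaning a better match), the function names, and the frame duration are all hypothetical placeholders for whatever the model could provide internally. The sketch only shows the monotonic dynamic-programming step that assigns each known token a contiguous span of frames.

```python
import math


def align_tokens_to_frames(cost):
    """Assign each token a contiguous frame span via monotonic dynamic programming.

    cost[i][j] is a hypothetical mismatch score between known token i and
    audio frame j (lower is better). Returns one (first_frame, last_frame)
    pair per token, in order, covering every frame exactly once.
    """
    n_tokens, n_frames = len(cost), len(cost[0])
    if n_tokens > n_frames:
        raise ValueError("need at least one frame per token")

    total = [[math.inf] * n_frames for _ in range(n_tokens)]
    advanced = [[False] * n_frames for _ in range(n_tokens)]

    # The first token must start at frame 0 and can only keep absorbing frames.
    total[0][0] = cost[0][0]
    for j in range(1, n_frames):
        total[0][j] = total[0][j - 1] + cost[0][j]

    for i in range(1, n_tokens):
        for j in range(i, n_frames):
            stay = total[i][j - 1]         # token i also covered frame j-1
            advance = total[i - 1][j - 1]  # token i starts at frame j
            if advance <= stay:
                total[i][j] = advance + cost[i][j]
                advanced[i][j] = True
            else:
                total[i][j] = stay + cost[i][j]

    # Backtrack from the last token at the last frame to recover the spans.
    spans = [None] * n_tokens
    i, j = n_tokens - 1, n_frames - 1
    end = j
    while i > 0:
        if advanced[i][j]:
            spans[i] = (j, end)
            i, j = i - 1, j - 1
            end = j
        else:
            j -= 1
    spans[0] = (0, end)
    return spans


def to_timestamps(words, spans, frame_sec):
    """Convert frame spans to (word, start_sec, end_sec); frame_sec is assumed."""
    return [(w, s * frame_sec, (e + 1) * frame_sec) for w, (s, e) in zip(words, spans)]


if __name__ == "__main__":
    # Toy cost matrix: token 0 matches frames 0-1, token 1 matches frames 2-3.
    toy_cost = [[0, 0, 1, 1], [1, 1, 0, 0]]
    toy_spans = align_tokens_to_frames(toy_cost)
    print(to_timestamps(["hello", "world"], toy_spans, frame_sec=1.0))
```

In practice the cost matrix would presumably come from the model itself (for example from attention weights or token probabilities when decoding is forced to follow the known text), which is the part I do not know how to obtain from CrisperWhisper.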
Thank you again for developing such an excellent tool. Any guidance or suggestions would be greatly appreciated.