Generate Word-Level Timestamps from Known Text #39

I want to express my appreciation for the outstanding quality of this model. The transcription accuracy and word-level timestamp precision are truly impressive and have greatly improved our workflow.

For some of the audio files I work with (typically two-speaker conversations with imbalanced channel volumes), I've found that WhisperX with an initial prompt produces more accurate transcriptions than CrisperWhisper. However, WhisperX's word-level timestamps are noticeably less precise.

I was wondering whether there is a way to use the CrisperWhisper model to generate word-level timestamps for an already known transcript, similar to forced alignment. That would let us keep CrisperWhisper's timestamp precision while benefiting from WhisperX's transcription accuracy.
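For context, one common way to force-align known text with an attention-based Whisper model is to feed the known transcript through the decoder (teacher forcing), collect cross-attention weights from the alignment heads as a tokens-by-frames matrix, and run dynamic time warping (DTW) over that matrix; openai-whisper derives its word timestamps in a similar way. The sketch below covers only the DTW step on a made-up matrix. It is not CrisperWhisper's API: `dtw_path`, `align_known_tokens`, and the attention values are hypothetical, each word is assumed to be a single token, and real pipelines also normalize and filter the attention before running DTW.

```python
import numpy as np

SECONDS_PER_FRAME = 0.02  # Whisper encoder frames are 20 ms apart (1500 per 30 s window)


def dtw_path(cost: np.ndarray) -> list[tuple[int, int]]:
    """Minimum-cost monotonic path through a (tokens x frames) cost matrix."""
    n_tok, n_frm = cost.shape
    acc = np.full((n_tok + 1, n_frm + 1), np.inf)
    acc[0, 0] = 0.0
    for i in range(1, n_tok + 1):
        for j in range(1, n_frm + 1):
            acc[i, j] = cost[i - 1, j - 1] + min(
                acc[i - 1, j - 1],  # next token, next frame
                acc[i - 1, j],      # next token, same frame
                acc[i, j - 1],      # same token, next frame
            )
    # Backtrack from the last token/frame to the first.
    path = []
    i, j = n_tok, n_frm
    while i > 0 and j > 0:
        path.append((i - 1, j - 1))
        steps = {
            (i - 1, j - 1): acc[i - 1, j - 1],
            (i - 1, j): acc[i - 1, j],
            (i, j - 1): acc[i, j - 1],
        }
        i, j = min(steps, key=steps.get)
    return path[::-1]


def align_known_tokens(attn: np.ndarray) -> list[tuple[float, float]]:
    """Return (start, end) seconds for each known token, given its attention rows."""
    path = dtw_path(-attn)  # strong attention -> low cost
    first_frame: dict[int, int] = {}
    for tok, frame in path:
        first_frame.setdefault(tok, frame)
    starts = [first_frame[t] for t in range(attn.shape[0])]
    ends = starts[1:] + [path[-1][1] + 1]  # a token ends where the next one starts
    return [(s * SECONDS_PER_FRAME, e * SECONDS_PER_FRAME) for s, e in zip(starts, ends)]


# Hypothetical attention for three known words over six frames. In a real
# pipeline this would come from teacher-forcing the known transcript.
attn = np.array([
    [0.9, 0.8, 0.1, 0.0, 0.0, 0.0],
    [0.0, 0.1, 0.9, 0.7, 0.1, 0.0],
    [0.0, 0.0, 0.0, 0.1, 0.8, 0.9],
])
for word, (start, end) in zip(["hello", "big", "world"], align_known_tokens(attn)):
    print(f"{word}: {start:.2f}-{end:.2f} s")
```

With this toy matrix the three words come out at 0.00-0.04 s, 0.04-0.08 s and 0.08-0.12 s. Because the transcript is fixed, the model never has to choose words, so transcription accuracy comes from the external text while timing comes from the model's attention.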

Thank you again for developing such an excellent tool. Any guidance or suggestions would be greatly appreciated.
