Generate Word-Level Timestamps from Known Text #39

@JH-Eric-Yang

I want to express my appreciation for the outstanding quality of this model. The transcription accuracy and word-level timestamp precision are truly impressive and have greatly improved our workflow.

For some of the audio files I work with (typically two-speaker conversations with imbalanced channel volumes), I’ve found that WhisperX with an initial prompt produces more accurate transcriptions than CrisperWhisper. However, the word-level timestamps from WhisperX are notably less precise.

Is there a way to generate word-level timestamps with the CrisperWhisper model from known text, similar to forced alignment methods? This would let us retain CrisperWhisper’s timestamp precision while benefiting from WhisperX’s transcription accuracy.

Thank you again for developing such an excellent tool. Any guidance or suggestions would be greatly appreciated.
