
@njourdane njourdane commented Nov 6, 2025

This PR adds the --text_file CLI option, which sets a plain-text transcription file as a reference. For instance, it can be used to generate karaoke subtitles, assuming you have the lyrics (usually available on the web).

This step is performed after the alignment. It's based on the new align_text function, which takes an aligned transcription result and a file path, and returns another aligned transcription result. It tries to match words between the synchronized transcription and the plain-text transcription using the Python difflib module, which works much like a git diff. The diff is done on a slugified version of each word (so Hëllo matches hellô!).
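To illustrate the matching idea, here is a minimal sketch of a word-level diff on slugified words. It is not the code from this PR: the slugify and diff_words helpers are hypothetical names, and the slug rule (strip accents, lowercase, drop non-alphanumeric characters) is an assumption about what "slugified" means here.

```python
import difflib
import unicodedata


def slugify(word: str) -> str:
    """Hypothetical slug rule: strip accents, lowercase, keep only alphanumerics."""
    decomposed = unicodedata.normalize("NFKD", word)
    no_accents = "".join(c for c in decomposed if not unicodedata.combining(c))
    return "".join(c for c in no_accents.lower() if c.isalnum())


def diff_words(transcribed: list[str], reference: list[str]) -> list[tuple]:
    """Diff two word lists on their slugs and return (tag, old_words, new_words) groups."""
    a = [slugify(w) for w in transcribed]
    b = [slugify(w) for w in reference]
    # autojunk=False keeps frequent words from being treated as junk on long texts.
    matcher = difflib.SequenceMatcher(a=a, b=b, autojunk=False)
    return [
        (tag, transcribed[i1:i2], reference[j1:j2])
        for tag, i1, i2, j1, j2 in matcher.get_opcodes()
    ]


if __name__ == "__main__":
    for op in diff_words(["Hëllo", "wold", "again"], ["hellô!", "world", "again"]):
        print(op)
```

Each group carries one of the four difflib tags (equal, replace, insert, delete), which are the same operations the DEBUG output distinguishes.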

Start and end times are transferred as-is when possible; otherwise they are estimated from the last/previous times and word lengths. Word scores are also transferred.
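As a rough sketch of how times could be estimated from word lengths when they cannot be copied directly, the snippet below splits a known time span across replacement words in proportion to their character counts. The spread_times helper and the word/start/end dictionary keys are assumptions made for illustration; the exact rule used in this PR may differ.

```python
def spread_times(words: list[str], start: float, end: float) -> list[dict]:
    """Split the interval [start, end] across words, proportionally to word length."""
    total = sum(len(w) for w in words)
    if total == 0:
        return []
    timed = []
    elapsed = 0  # characters consumed so far
    for word in words:
        word_start = start + (end - start) * elapsed / total
        elapsed += len(word)
        word_end = start + (end - start) * elapsed / total
        timed.append({"word": word, "start": word_start, "end": word_end})
    return timed


if __name__ == "__main__":
    # Two reference words replacing a transcribed span from 1.0 s to 4.0 s.
    for item in spread_times(["ab", "cdef"], 1.0, 4.0):
        print(item)
```

Longer words get a larger share of the span, which is a simple approximation when no per-word timing is available.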

If the logger is set to DEBUG, it prints the details of how each word is converted, with colors to distinguish the diff operations (equal / replace / insert / delete):

[image: DEBUG log output with color-coded diff operations]

Shown here with an extract of Les filles, les meufs by French singer Marguerite.

It was quite a journey to work on this, I hope it will be useful for some people. :)
