feat: Work on 'all' file format #553
base: master
Hello @LomikXP, I'm Gemini Code Assist[1]! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!
Summary of Changes
This pull request focuses on enhancing the application's file format handling, specifically for the 'all' format and improving speaker diarization in TXT output. The primary goal is to extend the functionality of the application to support outputting all available subtitle formats simultaneously and to better handle speaker-separated text in TXT files when diarization is enabled. This involves modifications to the UI, translation modules, subtitle management, and the base transcription pipeline.
Highlights
- File Format 'all': The pull request introduces the 'all' option for file format selection, allowing users to generate all supported subtitle formats (SRT, WebVTT, TXT, LRC, JSON, TSV) in a single operation.
- Speaker Diarization in TXT: The pull request enhances the TXT file format output to include speaker diarization, splitting the text by speaker if diarization is enabled. It uses regex to detect speaker labels in the text and formats the output accordingly.
- Translation Module Adaptations: The translation modules (both generic and DeepL) have been adapted to handle the 'all' file format option, ensuring that the correct output paths are returned when multiple files are generated.
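The regex-based speaker splitting described in the highlights can be illustrated with a minimal sketch. It assumes, hypothetically, that diarized segments carry a prefix such as `SPEAKER_00|`; the pattern and the `format_txt_by_speaker` / `_render_block` helpers are illustrative names, not the PR's actual regex or the `WriteTXT` implementation.

```python
import re
from typing import Iterable, List, Optional

# ASSUMPTION: diarized segment text starts with a label like "SPEAKER_00|".
# This pattern is an illustration, not the regex used in the PR.
SPEAKER_PATTERN = re.compile(r"^(SPEAKER_\d+)\|\s*(.*)$")


def _render_block(speaker: Optional[str], lines: List[str]) -> str:
    """Render one speaker turn; unlabeled text is written without a header."""
    body = "\n".join(lines)
    return f"{speaker}:\n{body}" if speaker else body


def format_txt_by_speaker(segment_texts: Iterable[str]) -> str:
    """Group consecutive segments by speaker and emit one block per turn."""
    blocks: List[str] = []
    current_speaker: Optional[str] = None
    current_lines: List[str] = []

    for raw in segment_texts:
        text = raw.strip()
        match = SPEAKER_PATTERN.match(text)
        if match:
            speaker, text = match.group(1), match.group(2)
        else:
            # No label found: treat the segment as a continuation of the current turn
            speaker = current_speaker
        if speaker != current_speaker and current_lines:
            # Speaker changed: close the previous turn before starting a new one
            blocks.append(_render_block(current_speaker, current_lines))
            current_lines = []
        current_speaker = speaker
        current_lines.append(text)

    if current_lines:
        blocks.append(_render_block(current_speaker, current_lines))
    return "\n\n".join(blocks)
```

Attaching unlabeled segments to the previous speaker is a design choice of this sketch; when diarization is disabled, no segment matches and the output degrades to one segment per line.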
Changelog
- app.py
  - Added 'all' as a choice in the file format dropdown menu.
- modules/translation/deepl_api.py
  - Modified the `translate_deepl` function to handle multiple subtitle files when the 'all' format is selected.
  - The `files_info` dictionary now stores a list of dictionaries, each containing subtitle and path information for a specific format.
  - The `output_file_paths` list comprehension was updated to iterate through all items in `files_info`.
- modules/translation/translation_base.py
  - Modified the `translate_file` function to handle multiple subtitle files when the 'all' format is selected.
  - The `files_info` dictionary now stores a list of dictionaries, each containing subtitle and path information for a specific format.
  - The `output_file_paths` list comprehension was updated to iterate through all items in `files_info`.
- modules/utils/subtitle_manager.py
  - Implemented speaker diarization in the `WriteTXT` class, formatting the output to separate text by speaker.
  - Modified the `get_writer` function to handle the 'all' output format, creating a writer for each available format.
  - Modified the `generate_file` function to generate all subtitle formats when 'all' is selected, returning a list of file paths.
- modules/whisper/base_transcription_pipeline.py
  - Modified the `transcribe_file` function to handle multiple subtitle files when the 'all' format is selected.
  - The `files_info` dictionary now stores a list of dictionaries, each containing subtitle, `time_for_task` and path information for a specific format.
  - The `result_file_path` list comprehension was updated to iterate through all items in `files_info`.
  - The return value of `generate_file` is now indexed at zero when calling it from `transcribe_mic` and `transcribe_youtube`.
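The `files_info` and `output_file_paths` changes listed above follow a common pattern, sketched here under stated assumptions. The `generate_file` stub, its signature, the `FORMATS` tuple and the `collect_outputs` helper are hypothetical stand-ins, not the project's code; the stub only mimics the described behavior of returning a list of `(subtitle, path)` tuples.

```python
from typing import Dict, List, Tuple

# HYPOTHETICAL format list and generate_file() stand-in, for illustration only.
FORMATS = ("srt", "vtt", "txt", "lrc", "json", "tsv")


def generate_file(output_format: str, output_dir: str, file_name: str) -> List[Tuple[str, str]]:
    """Return one (subtitle_text, file_path) tuple per generated format."""
    formats = FORMATS if output_format == "all" else (output_format,)
    return [(f"<{fmt} content>", f"{output_dir}/{file_name}.{fmt}") for fmt in formats]


def collect_outputs(file_names: List[str], output_format: str, output_dir: str):
    # files_info maps each input file to a list of per-format dictionaries
    files_info: Dict[str, List[Dict[str, str]]] = {}
    for name in file_names:
        results = generate_file(output_format, output_dir, name)
        files_info[name] = [{"subtitle": sub, "path": path} for sub, path in results]

    # Flatten the paths across every input file and every generated format
    output_file_paths = [item["path"] for infos in files_info.values() for item in infos]
    return files_info, output_file_paths


# Single-output callers (compare transcribe_mic / transcribe_youtube) take element [0]
subtitle, path = generate_file("srt", "outputs", "mic")[0]
```

Indexing `[0]` keeps single-output callers working, but it silently discards the other files if such a caller is ever passed 'all'.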
Using Gemini Code Assist
The full guide for Gemini Code Assist can be found on our documentation page; here are some quick tips.
Invoking Gemini
You can request assistance from Gemini at any point in your pull request via creating an issue comment (i.e. comment on the pull request page) using either /gemini <command> or @gemini-code-assist <command>. Below is a summary of the supported commands.
| Feature | Command | Description |
|---|---|---|
| Code Review | /gemini review | Performs a code review for the current pull request in its current state. |
| Pull Request Summary | /gemini summary | Provides a summary of the current pull request in its current state. |
| Comment | @gemini-code-assist | Responds in comments when explicitly tagged, both in issue comments and review comments. |
| Help | /gemini help | Displays a list of available commands. |
Customization
To customize the Gemini Code Assist for GitHub experience, repository maintainers can create a configuration file and/or provide a custom code review style guide (such as PEP-8 for Python) by adding files to a .gemini/ folder at the root of the repository. Detailed instructions can be found here.
Limitations & Feedback
Gemini Code Assist is currently in preview and may make mistakes. Please leave feedback on any instances where its feedback is incorrect or counterproductive. You can react with 👍 and 👎 on @gemini-code-assist comments to provide feedback.
You can also get AI-powered code generation, chat, as well as code reviews directly in the IDE at no cost with the Gemini Code Assist IDE Extension.
Formats align,
'All' files in their prime,
Diarization speaks.
Footnotes
1. Review the Privacy Notices, Generative AI Prohibited Use Policy, Terms of Service, and learn how to configure Gemini Code Assist in GitHub here. Gemini can make mistakes, so double check it and use code with caution.
Code Review
The pull request introduces the 'all' file format option, allowing users to generate all available subtitle formats simultaneously. Additionally, it extends the 'txt' file format to support splitting by speaker when diarization is enabled. The changes seem well-structured, but there are a few areas that could be improved for clarity and efficiency.
Summary of Findings
- Inconsistent return types: The `generate_file` function now returns a list of tuples, but in some cases it returns only a single-element list. This inconsistency could lead to unexpected behavior and should be addressed.
- Duplicated code: The `writers` dictionary is defined twice in `modules/utils/subtitle_manager.py`. It should be defined only once to avoid redundancy.
- Potential performance issue: When the output format is 'all', the code iterates through all writers and generates files. However, the `write_all` function is not actually used. This could lead to unnecessary file generation.
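One possible way to address the first two findings is sketched below: a single module-level writer registry shared by every lookup, and a `generate_file` that always returns a list, whether one format or 'all' is requested. The `WRITERS`, `_writer_for` and `get_writers` names and the path-only return value are hypothetical simplifications, not the structure of `subtitle_manager.py`.

```python
from typing import Callable, Dict, List

Writer = Callable[[str, str], str]


def _writer_for(ext: str) -> Writer:
    """HYPOTHETICAL writer factory: each writer returns the path it would write."""
    def write(output_dir: str, file_name: str) -> str:
        return f"{output_dir}/{file_name}.{ext}"
    return write


# Defined once at module level, so every lookup shares one source of truth
# (the duplicated `writers` dictionary finding).
WRITERS: Dict[str, Writer] = {
    ext: _writer_for(ext) for ext in ("srt", "vtt", "txt", "lrc", "json", "tsv")
}


def get_writers(output_format: str) -> List[Writer]:
    """Always return a list, so callers never need to special-case 'all'."""
    if output_format == "all":
        return list(WRITERS.values())
    try:
        return [WRITERS[output_format]]
    except KeyError:
        raise ValueError(f"Unsupported output format: {output_format!r}") from None


def generate_file(output_format: str, output_dir: str, file_name: str) -> List[str]:
    # Same return type (a list of paths) for a single format and for "all"
    return [write(output_dir, file_name) for write in get_writers(output_format)]
```

With a uniform list return, single-output callers can still take element `[0]`, while multi-output callers iterate without branching on the requested format.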
Merge Readiness
The pull request is not quite ready for merging. The inconsistent return types and duplicated code should be addressed before merging. Additionally, the potential performance issue with the 'all' output format should be investigated and resolved. I am unable to approve this pull request, and recommend that others review and approve this code before merging.
Related issues / PRs. Summarize issues.
Summarize Changes
Extend transcription (Whisper) to output all file formats at once

Extend the TXT file format to split text by speaker (if diarization is enabled)
