@LomikXP LomikXP commented May 5, 2025

Related issues / PRs. Summarize issues.

  • nothing

Summarize Changes

  1. Extends the file format option so that all formats can be generated at once when transcribing with Whisper.

  2. Extends the txt file format so that the text is split by speaker (if diarization is enabled).

@gemini-code-assist gemini-code-assist bot left a comment

Hello @LomikXP, I'm Gemini Code Assist[1]! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!

Summary of Changes

This pull request focuses on enhancing the application's file format handling, specifically for the 'all' format and improving speaker diarization in TXT output. The primary goal is to extend the functionality of the application to support outputting all available subtitle formats simultaneously and to better handle speaker-separated text in TXT files when diarization is enabled. This involves modifications to the UI, translation modules, subtitle management, and the base transcription pipeline.

Highlights

  • File Format 'all': The pull request introduces the 'all' option for file format selection, allowing users to generate all supported subtitle formats (SRT, WebVTT, TXT, LRC, JSON, TSV) in a single operation.
  • Speaker Diarization in TXT: The pull request enhances the TXT file format output to include speaker diarization, splitting the text by speaker if diarization is enabled. It uses regex to detect speaker labels in the text and formats the output accordingly.
  • Translation Module Adaptations: The translation modules (both generic and DeepL) have been adapted to handle the 'all' file format option, ensuring that the correct output paths are returned when multiple files are generated.
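As a rough illustration of the speaker-splitting idea described above, here is a minimal sketch rather than the code from this pull request. It assumes each segment's text carries a prefix of the form `SPEAKER_00|`; that label shape, the pattern, and the name `format_txt_by_speaker` are all hypothetical and may not match what the diff actually does.

```python
import re

# Assumed label shape: "SPEAKER_00|some text". The real prefix may differ.
SPEAKER_PATTERN = re.compile(r"^\s*(SPEAKER_\d+)\|\s*(.*)$", re.DOTALL)


def format_txt_by_speaker(segment_texts):
    """Merge consecutive segments from the same speaker into one TXT block."""
    blocks = []  # each entry is [speaker_or_None, [texts]]
    for text in segment_texts:
        match = SPEAKER_PATTERN.match(text)
        if match:
            speaker, content = match.group(1), match.group(2).strip()
        else:
            # No label found: keep the text, but without a speaker heading.
            speaker, content = None, text.strip()

        if blocks and blocks[-1][0] == speaker:
            blocks[-1][1].append(content)
        else:
            blocks.append([speaker, [content]])

    lines = []
    for speaker, texts in blocks:
        body = " ".join(t for t in texts if t)
        lines.append(f"{speaker}: {body}" if speaker else body)
    return "\n\n".join(lines)
```

With this sketch, two consecutive segments from `SPEAKER_00` collapse into a single paragraph, and text without a label passes through unchanged, which keeps the output sensible when diarization is disabled.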

Changelog

  • app.py
    • Added 'all' as a choice in the file format dropdown menu.
  • modules/translation/deepl_api.py
    • Modified the translate_deepl function to handle multiple subtitle files when the 'all' format is selected.
    • The files_info dictionary now stores a list of dictionaries, each containing subtitle and path information for a specific format.
    • The output_file_paths list comprehension was updated to iterate through all items in files_info.
  • modules/translation/translation_base.py
    • Modified the translate_file function to handle multiple subtitle files when the 'all' format is selected.
    • The files_info dictionary now stores a list of dictionaries, each containing subtitle and path information for a specific format.
    • The output_file_paths list comprehension was updated to iterate through all items in files_info.
  • modules/utils/subtitle_manager.py
    • Implemented speaker diarization in the WriteTXT class, formatting the output to separate text by speaker.
    • Modified the get_writer function to handle the 'all' output format, creating a writer for each available format.
    • Modified the generate_file function to generate all subtitle formats when 'all' is selected, returning a list of file paths.
  • modules/whisper/base_transcription_pipeline.py
    • Modified the transcribe_file function to handle multiple subtitle files when the 'all' format is selected.
    • The files_info dictionary now stores a list of dictionaries, each containing subtitle, time_for_task and path information for a specific format.
    • The result_file_path list comprehension was updated to iterate through all items in files_info.
    • The return value of generate_file is now indexed at zero when calling it from transcribe_mic and transcribe_youtube.
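The `files_info` change listed in the changelog can be pictured with a small sketch. It assumes the file generator hands back a list of `(subtitle, path)` tuples per input file; the helper names `build_files_info` and `collect_paths` are hypothetical and are not taken from the diff.

```python
from typing import Dict, List, Tuple

GeneratedFile = Tuple[str, str]  # (subtitle content, file path)


def build_files_info(
    results_by_input: Dict[str, List[GeneratedFile]],
    time_for_task: float,
) -> Dict[str, List[dict]]:
    """Store one dictionary per generated format under each input file name."""
    files_info: Dict[str, List[dict]] = {}
    for name, generated in results_by_input.items():
        files_info[name] = [
            {"subtitle": subtitle, "time_for_task": time_for_task, "path": path}
            for subtitle, path in generated
        ]
    return files_info


def collect_paths(files_info: Dict[str, List[dict]]) -> List[str]:
    """Flatten files_info into a single list of output paths."""
    return [item["path"] for items in files_info.values() for item in items]
```

A caller that only ever produces one file, as `transcribe_mic` and `transcribe_youtube` are described as doing, would take element `[0]` of the generated list instead of building the full mapping.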

Using Gemini Code Assist

The full guide for Gemini Code Assist can be found on our documentation page. Here are some quick tips.

Invoking Gemini

You can request assistance from Gemini at any point in your pull request by creating an issue comment (i.e. a comment on the pull request page) using either /gemini <command> or @gemini-code-assist <command>. Below is a summary of the supported commands.

| Feature | Command | Description |
| --- | --- | --- |
| Code Review | /gemini review | Performs a code review for the current pull request in its current state. |
| Pull Request Summary | /gemini summary | Provides a summary of the current pull request in its current state. |
| Comment | @gemini-code-assist | Responds in comments when explicitly tagged, both in issue comments and review comments. |
| Help | /gemini help | Displays a list of available commands. |

Customization

To customize the Gemini Code Assist for GitHub experience, repository maintainers can create a configuration file and/or provide a custom code review style guide (such as PEP-8 for Python) by creating and adding files to a .gemini/ folder in the base of the repository. Detailed instructions can be found here.

Limitations & Feedback

Gemini Code Assist is currently in preview and may make mistakes. Please leave feedback on any instances where its feedback is incorrect or counterproductive. You can react with 👍 and 👎 on @gemini-code-assist comments to provide feedback.

You can also get AI-powered code generation, chat, as well as code reviews directly in the IDE at no cost with the Gemini Code Assist IDE Extension.


Formats align,
'All' files in their prime,
Diarization speaks.

Footnotes

  1. Review the Privacy Notices, Generative AI Prohibited Use Policy, Terms of Service, and learn how to configure Gemini Code Assist in GitHub here. Gemini can make mistakes, so double check it and use code with caution.

@gemini-code-assist gemini-code-assist bot left a comment

Code Review

The pull request introduces the 'all' file format option, allowing users to generate all available subtitle formats simultaneously. Additionally, it extends the 'txt' file format to support splitting by speaker when diarization is enabled. The changes seem well-structured, but there are a few areas that could be improved for clarity and efficiency.

Summary of Findings

  • Inconsistent return types: The generate_file function now returns a list of tuples, but in some cases, it only returns a single-element list. This inconsistency could lead to unexpected behavior and should be addressed.
  • Duplicated code: The writers dictionary is defined twice in modules/utils/subtitle_manager.py. It should be defined only once to avoid redundancy.
  • Potential performance issue: When the output format is 'all', the code iterates through all writers and generates files. However, the write_all function is not actually used. This could lead to unnecessary file generation.
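One possible way to address the first two findings together is sketched below, under stated assumptions. The registry `WRITERS`, the two toy renderers, and the name `generate_file_sketch` are hypothetical stand-ins for the project's writer classes and `generate_file`; the point is only that a single shared registry and an always-a-list return value remove both the duplication and the type inconsistency.

```python
import os
from typing import Callable, Dict, List, Tuple

Segment = Dict[str, object]


def render_txt(segments: List[Segment]) -> str:
    """Toy TXT renderer: one line of text per segment."""
    return "\n".join(str(s["text"]) for s in segments)


def render_tsv(segments: List[Segment]) -> str:
    """Toy TSV renderer: header row plus start, end, and text columns."""
    rows = ["start\tend\ttext"]
    rows += [f"{s['start']}\t{s['end']}\t{s['text']}" for s in segments]
    return "\n".join(rows)


# Defined once at module level and shared by every caller (hypothetical).
WRITERS: Dict[str, Callable[[List[Segment]], str]] = {
    "txt": render_txt,
    "tsv": render_tsv,
}


def generate_file_sketch(
    output_dir: str,
    base_name: str,
    output_format: str,
    segments: List[Segment],
) -> List[Tuple[str, str]]:
    """Write the requested format(s) and always return a list of (content, path)."""
    fmt = output_format.strip().lower()
    formats = list(WRITERS) if fmt == "all" else [fmt]

    results = []
    for name in formats:
        if name not in WRITERS:
            raise ValueError(f"Unsupported output format: {name}")
        content = WRITERS[name](segments)
        path = os.path.join(output_dir, f"{base_name}.{name}")
        with open(path, "w", encoding="utf-8") as handle:
            handle.write(content)
        results.append((content, path))
    return results
```

Because a single format also comes back as a one-element list, callers can iterate over the result uniformly, and the few call sites that need exactly one file can take element `[0]` explicitly.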

Merge Readiness

The pull request is not quite ready for merging. The inconsistent return types and duplicated code should be addressed before merging. Additionally, the potential performance issue with the 'all' output format should be investigated and resolved. I am unable to approve this pull request, and recommend that others review and approve this code before merging.
