@Yeuoly (Collaborator) commented Nov 14, 2025

Summary

  • inline the multimodal content, config, and result models into the existing text embedding and rerank entity modules and drop the standalone multimodal module
  • refresh imports and error messaging for the reorganized multimodal hooks in the embedding and rerank model interfaces and executor plumbing
  • add a mock text embedding model unit test that exercises the multimodal invoke path

Testing

  • pytest python/tests/test_multimodal_requests.py

Codex Task

@gemini-code-assist (Contributor)

Summary of Changes

Hello @Yeuoly, I'm Gemini Code Assist! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!

This pull request significantly refactors the handling of multimodal data by integrating its core components directly into the existing text embedding and rerank functionalities. This change eliminates a separate multimodal module, streamlining the codebase and extending the capabilities of current models to process diverse input types like text and images for both embedding generation and document reranking. The update also includes the necessary plumbing for these new multimodal invocation paths and introduces dedicated unit tests to ensure proper functionality.

Highlights

  • Multimodal Integration: Multimodal content, configuration, and result models are now inlined directly into the existing text embedding and rerank entity modules.
  • Module Removal: The standalone multimodal module has been removed, simplifying the project structure.
  • New Invocation Types: Added 'MultimodalEmbedding' and 'MultimodalRerank' invocation types and corresponding request models.
  • Interface Extension: Text embedding and rerank model interfaces now include 'invoke_multimodal' methods to support multimodal inputs.
  • Unit Testing: A new unit test file ('test_multimodal_requests.py') has been added to verify the functionality of multimodal embedding requests.
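
As a rough illustration of the "Interface Extension" and "Unit Testing" highlights, the sketch below shows what an `invoke_multimodal` hook and a mock model exercising it could look like. Only the names `MultiModalContent`, `invoke_multimodal`, and `model_dump` come from this PR; the field names, the `MockTextEmbeddingModel` class, the default error message, and the result shape are assumptions, and a stdlib dataclass stands in for the real Pydantic model so the snippet runs on its own.

```python
from __future__ import annotations

from abc import ABC, abstractmethod
from dataclasses import asdict, dataclass


@dataclass
class MultiModalContent:
    """Stand-in for the PR's Pydantic model; the field names are assumed."""

    content: str
    content_type: str  # hypothetical values such as "text" or "image"

    def model_dump(self) -> dict:
        # Mirrors the Pydantic method name used in the PR.
        return asdict(self)


class TextEmbeddingModel(ABC):
    """Hypothetical interface: a multimodal hook next to the text one."""

    @abstractmethod
    def invoke(self, texts: list[str]) -> list[list[float]]:
        ...

    def invoke_multimodal(self, documents: list[MultiModalContent]) -> list[list[float]]:
        # Assumed default: models opt in by overriding this method.
        raise NotImplementedError("multimodal embedding is not supported by this model")


class MockTextEmbeddingModel(TextEmbeddingModel):
    """Deterministic fake used only to exercise the multimodal path."""

    def invoke(self, texts: list[str]) -> list[list[float]]:
        return [[float(len(text))] for text in texts]

    def invoke_multimodal(self, documents: list[MultiModalContent]) -> list[list[float]]:
        # Fake embedding: [content length, 1.0 for images else 0.0].
        return [
            [float(len(doc.content)), 1.0 if doc.content_type == "image" else 0.0]
            for doc in documents
        ]
```

A test along the lines of `test_multimodal_requests.py` would then build a few `MultiModalContent` items, call `invoke_multimodal` on the mock, and assert on the returned vectors.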

@gemini-code-assist bot (Contributor) left a comment

Code Review

This pull request effectively refactors the multimodal functionality by inlining it into the existing text embedding and rerank models. This simplifies the codebase by removing the standalone multimodal module. The changes are well-structured, with corresponding updates to request entities, executor logic, and model interfaces. The addition of a new test file with unit tests for the multimodal request path is a great inclusion that improves test coverage for the new functionality. My review includes a couple of suggestions to simplify the code by removing redundant type checks, which will improve readability and maintainability. Overall, this is a solid contribution.

Comment on lines +38 to +39
"query": query.model_dump() if isinstance(query, MultiModalContent) else query,
"docs": [doc.model_dump() if isinstance(doc, MultiModalContent) else doc for doc in docs],
medium

The type hints for query and docs are MultiModalContent and list[MultiModalContent] respectively. Pydantic ensures that these will be instances of MultiModalContent. Therefore, the isinstance checks are redundant and can be removed to simplify the code.

Suggested change
- "query": query.model_dump() if isinstance(query, MultiModalContent) else query,
- "docs": [doc.model_dump() if isinstance(doc, MultiModalContent) else doc for doc in docs],
+ "query": query.model_dump(),
+ "docs": [doc.model_dump() for doc in docs],
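
A minimal sketch of why the two forms are interchangeable when every input really is a `MultiModalContent` instance. The helper names `build_payload_guarded` and `build_payload_simplified` are hypothetical, the field names are assumed, and a stdlib dataclass stands in for the Pydantic model so the snippet runs without extra dependencies.

```python
from dataclasses import asdict, dataclass


@dataclass
class MultiModalContent:
    """Stand-in for the PR's Pydantic model; the field names are assumed."""

    content: str
    content_type: str

    def model_dump(self) -> dict:
        return asdict(self)


def build_payload_guarded(query, docs):
    # Original form: checks the type before dumping each value.
    return {
        "query": query.model_dump() if isinstance(query, MultiModalContent) else query,
        "docs": [doc.model_dump() if isinstance(doc, MultiModalContent) else doc for doc in docs],
    }


def build_payload_simplified(query, docs):
    # Suggested form: relies on the inputs already being model instances.
    return {
        "query": query.model_dump(),
        "docs": [doc.model_dump() for doc in docs],
    }


query = MultiModalContent("a red bicycle", "text")
docs = [MultiModalContent("bike.png", "image"), MultiModalContent("a blue car", "text")]

# With validated inputs, both builders produce the same payload.
assert build_payload_guarded(query, docs) == build_payload_simplified(query, docs)
```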

Comment on lines 44 to 47
"documents": [
document.model_dump() if isinstance(document, MultiModalContent) else document
for document in documents
],
medium

The type hint for documents is list[MultiModalContent]. Pydantic will ensure that elements of this list are instances of MultiModalContent. The isinstance check is therefore redundant and can be removed to improve code clarity.

            "documents": [document.model_dump() for document in documents],
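
One behavioural difference may be worth keeping in mind when dropping the check: the simplified form assumes validation has already happened upstream. The sketch below, using a hypothetical `dump_documents` helper, assumed field names, and a dataclass stand-in for the Pydantic model, shows that an unvalidated plain dict would raise instead of passing through silently.

```python
from dataclasses import asdict, dataclass


@dataclass
class MultiModalContent:
    """Stand-in for the PR's Pydantic model; the field names are assumed."""

    content: str
    content_type: str

    def model_dump(self) -> dict:
        return asdict(self)


def dump_documents(documents):
    # Simplified form from the suggestion: no isinstance guard.
    return [document.model_dump() for document in documents]


valid = [MultiModalContent("cat.png", "image")]
print(dump_documents(valid))

try:
    # A raw dict has no model_dump(), so it fails fast here.
    dump_documents([{"content": "raw dict", "content_type": "text"}])
except AttributeError as exc:
    print(f"unvalidated input is rejected: {exc}")
```

If the executor always constructs these values through a validated request model, this case should not arise and the simplification is safe.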

@Yeuoly changed the title from "Inline multimodal entities into existing models" to "feat: support multimodal embeddings" on Nov 14, 2025