feat: support multimodal embeddings #237

Yeuoly · 2025-11-14T05:28:15Z

Summary

Inline the multimodal content, config, and result models into the existing text embedding and rerank entity modules, and drop the standalone multimodal module.
Refresh imports and error messages for the reorganized multimodal hooks in the embedding and rerank model interfaces and the executor plumbing.
Add a unit test with a mock text embedding model that exercises the multimodal invoke path.

Testing

pytest python/tests/test_multimodal_requests.py

gemini-code-assist · 2025-11-14T05:28:37Z

Summary of Changes

Hello @Yeuoly, I'm Gemini Code Assist¹! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!

This pull request significantly refactors the handling of multimodal data by integrating its core components directly into the existing text embedding and rerank functionalities. This change eliminates a separate multimodal module, streamlining the codebase and extending the capabilities of current models to process diverse input types like text and images for both embedding generation and document reranking. The update also includes the necessary plumbing for these new multimodal invocation paths and introduces dedicated unit tests to ensure proper functionality.

Highlights

Multimodal Integration: Multimodal content, configuration, and result models are now inlined directly into the existing text embedding and rerank entity modules.
Module Removal: The standalone multimodal module has been removed, simplifying the project structure.
New Invocation Types: Added 'MultimodalEmbedding' and 'MultimodalRerank' invocation types and corresponding request models.
Interface Extension: Text embedding and rerank model interfaces now include 'invoke_multimodal' methods to support multimodal inputs.
Unit Testing: A new unit test file ('test_multimodal_requests.py') has been added to verify the functionality of multimodal embedding requests.
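
For readers who have not opened the diff, the following is a rough sketch of how an `invoke_multimodal` hook on a text embedding interface and a mock implementation might fit together. Only the names `MultiModalContent` and `invoke_multimodal` come from this thread; the field names, the `ContentType` enum, the `MultiModalEmbeddingResult` shape, the default `NotImplementedError`, and the use of stdlib dataclasses instead of the plugin's Pydantic models are all assumptions made for illustration.

```python
from abc import ABC, abstractmethod
from dataclasses import dataclass, field
from enum import Enum


class ContentType(str, Enum):
    # Hypothetical enum; the PR's real content types are not shown in this thread.
    TEXT = "text"
    IMAGE = "image"


@dataclass
class MultiModalContent:
    # Stand-in for the PR's Pydantic model; both field names are assumed.
    content: str
    content_type: ContentType


@dataclass
class MultiModalEmbeddingResult:
    # Hypothetical result shape: one vector per input document.
    model: str
    embeddings: list[list[float]] = field(default_factory=list)


class TextEmbeddingModel(ABC):
    @abstractmethod
    def invoke(self, model: str, texts: list[str]) -> list[list[float]]:
        """Existing text-only path."""

    def invoke_multimodal(
        self, model: str, documents: list[MultiModalContent]
    ) -> MultiModalEmbeddingResult:
        # Assumed default: models opt in by overriding this hook.
        raise NotImplementedError("multimodal embedding is not supported by this model")


class MockTextEmbeddingModel(TextEmbeddingModel):
    def invoke(self, model: str, texts: list[str]) -> list[list[float]]:
        return [[float(len(text))] for text in texts]

    def invoke_multimodal(
        self, model: str, documents: list[MultiModalContent]
    ) -> MultiModalEmbeddingResult:
        # Fake embedding: a single number derived from the payload length.
        vectors = [[float(len(document.content))] for document in documents]
        return MultiModalEmbeddingResult(model=model, embeddings=vectors)
```

A test in the spirit of `test_multimodal_requests.py` could then call `MockTextEmbeddingModel().invoke_multimodal(...)` with a mix of text and image documents and assert on the number and shape of the returned vectors.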

¹ Review the Privacy Notices, Generative AI Prohibited Use Policy, Terms of Service, and learn how to configure Gemini Code Assist in GitHub here. Gemini can make mistakes, so double check it and use code with caution.

gemini-code-assist

Code Review

This pull request effectively refactors the multimodal functionality by inlining it into the existing text embedding and rerank models. This simplifies the codebase by removing the standalone multimodal module. The changes are well-structured, with corresponding updates to request entities, executor logic, and model interfaces. The addition of a new test file with unit tests for the multimodal request path is a great inclusion that improves test coverage for the new functionality. My review includes a couple of suggestions to simplify the code by removing redundant type checks, which will improve readability and maintainability. Overall, this is a solid contribution.

gemini-code-assist · 2025-11-14T05:29:32Z

python/dify_plugin/invocations/model/rerank.py

+            "query": query.model_dump() if isinstance(query, MultiModalContent) else query,
+            "docs": [doc.model_dump() if isinstance(doc, MultiModalContent) else doc for doc in docs],


The type hints for query and docs are MultiModalContent and list[MultiModalContent] respectively. Pydantic ensures that these will be instances of MultiModalContent. Therefore, the isinstance checks are redundant and can be removed to simplify the code.

Suggested change

-            "query": query.model_dump() if isinstance(query, MultiModalContent) else query,
-            "docs": [doc.model_dump() if isinstance(doc, MultiModalContent) else doc for doc in docs],
+            "query": query.model_dump(),
+            "docs": [doc.model_dump() for doc in docs],
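
To make the suggestion concrete, here is a minimal sketch comparing the guarded and simplified payload builders. It assumes Pydantic v2 and a hypothetical `MultiModalContent` with two made-up fields (`content`, `content_type`); the real model's schema and the surrounding function in `rerank.py` are not shown in this thread, so the helper names below are illustrative only.

```python
from pydantic import BaseModel


class MultiModalContent(BaseModel):
    # Hypothetical fields; the real model's schema is not shown in this thread.
    content: str
    content_type: str


def build_payload_guarded(query, docs):
    """Form used in the PR: dump only values that are model instances."""
    return {
        "query": query.model_dump() if isinstance(query, MultiModalContent) else query,
        "docs": [doc.model_dump() if isinstance(doc, MultiModalContent) else doc for doc in docs],
    }


def build_payload_simplified(query: MultiModalContent, docs: list[MultiModalContent]):
    """Form suggested in the review: assume every input is a model instance."""
    return {
        "query": query.model_dump(),
        "docs": [doc.model_dump() for doc in docs],
    }


query = MultiModalContent(content="a red bicycle", content_type="text")
docs = [
    MultiModalContent(content="bicycle.png", content_type="image"),
    MultiModalContent(content="a blue car", content_type="text"),
]

guarded = build_payload_guarded(query, docs)
simplified = build_payload_simplified(query, docs)
```

When every argument really is a `MultiModalContent` instance, the two builders produce the same dictionary, which is the case the review comment has in mind.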

gemini-code-assist · 2025-11-14T05:29:33Z

python/dify_plugin/invocations/model/text_embedding.py

+            "documents": [
+                document.model_dump() if isinstance(document, MultiModalContent) else document
+                for document in documents
+            ],


The type hint for documents is list[MultiModalContent]. Pydantic will ensure that elements of this list are instances of MultiModalContent. The isinstance check is therefore redundant and can be removed to improve code clarity.

"documents": [document.model_dump() for document in documents],
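
Whether the check is truly redundant depends on where validation happens: Python does not enforce plain annotations at runtime, and Pydantic only validates at boundaries such as model construction or a function wrapped with `validate_call`. The sketch below illustrates the difference, assuming Pydantic v2 and a hypothetical `MultiModalContent` schema; the helper names `dump_unvalidated` and `dump_validated` are invented here and do not appear in the PR.

```python
from pydantic import BaseModel, validate_call


class MultiModalContent(BaseModel):
    # Hypothetical fields; the real model's schema is not shown in this thread.
    content: str
    content_type: str


def dump_unvalidated(documents: list[MultiModalContent]) -> list[dict]:
    # The annotation alone is not checked, so a plain dict reaches model_dump().
    return [document.model_dump() for document in documents]


@validate_call
def dump_validated(documents: list[MultiModalContent]) -> list[dict]:
    # validate_call coerces compatible dicts into MultiModalContent first.
    return [document.model_dump() for document in documents]


raw = [{"content": "hello", "content_type": "text"}]

validated = dump_validated(raw)  # dicts are converted to models, then dumped

try:
    dump_unvalidated(raw)
    unvalidated_error = None
except AttributeError as exc:  # dict has no model_dump attribute
    unvalidated_error = exc
```

If the call site is already behind a validated boundary, dropping the `isinstance` check is safe; if callers can pass raw dictionaries to an unvalidated function, the unconditional `model_dump()` call fails instead of passing them through.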

…terfaces

Inline multimodal entities into existing models

a933f42

Yeuoly added the codex label Nov 14, 2025 — with ChatGPT Codex Connector

gemini-code-assist bot reviewed Nov 14, 2025

Yeuoly changed the title ~~Inline multimodal entities into existing models~~ feat: support multimodal embeddings Nov 14, 2025

Yeuoly added 9 commits November 14, 2025 13:31

apply ruff

19ca6c5

Merge branch 'main' into codex/add-multimodal-rerank-and-embedding-in…

0b8e5f0

…terfaces

Merge branch 'main' into codex/add-multimodal-rerank-and-embedding-in…

5250460

…terfaces

bump to 0.7.0b1

0bd99d4

fix: remove tenant_id from invoke_multimodal_embedding

7d225d9

tests: add rerank

cd81d57

apply ruff

aea1032

fix

2d3a005

fix: typing

7118abe
