diff --git a/src/content/docs/docs/evaluation.mdx b/src/content/docs/docs/evaluation.mdx
index b52d79f9..25e60ade 100644
--- a/src/content/docs/docs/evaluation.mdx
+++ b/src/content/docs/docs/evaluation.mdx
@@ -1382,6 +1382,10 @@ Genkit includes a number of built-in evaluators, inspired by [RAGAS](https://doc
 - **Deep Equal** -- Checks if the generated output is deep-equal to the reference output
 - **JSONata** -- Checks if the generated output matches a JSONata expression provided in the reference field

+#### Python
+
+Use **`ai.define_evaluator()`** (see [Custom evaluators](#custom-evaluators)) for project-specific metrics, or install plugins that register evaluators with your `Genkit` instance—for example [Vertex AI evaluation metrics](/docs/integrations/vertex-ai#evaluation-metrics) via the Google GenAI plugin. Third-party packages may ship additional evaluators; follow each package's install and registration instructions.
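+
+A deterministic metric needs no judge model. The sketch below is illustrative rather than a drop-in reference: it mirrors the `food_evaluator` shape from [Custom evaluators](#custom-evaluators), and the import paths, registration call, and `EvalFnResponse` wrapper are assumptions to verify against your SDK version:
+
+```python
+from genkit import Genkit
+from genkit.types import EvalFnResponse, EvalStatusEnum, Score  # paths assumed; match your SDK version
+
+ai = Genkit(plugins=[...])
+
+async def exact_match(datapoint, options=None):
+    # Deterministic comparison of generated output against the reference.
+    passed = datapoint.output == datapoint.reference
+    return EvalFnResponse(
+        test_case_id=datapoint.test_case_id or '',
+        evaluation=Score(
+            score=1.0 if passed else 0.0,
+            status=EvalStatusEnum.PASS if passed else EvalStatusEnum.FAIL,
+            details={'reasoning': 'exact string comparison'},
+        ),
+    )
+
+ai.define_evaluator(name='exact_match', fn=exact_match)
+```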
+
 ### Evaluator plugins

 Genkit supports additional evaluators through plugins, like the Vertex Rapid Evaluators, which you can access via the [VertexAI Plugin](/docs/integrations/vertex-ai#evaluation-metrics).
@@ -1433,7 +1437,7 @@ async def food_evaluator(
         test_case_id=datapoint.test_case_id or '',
         evaluation=Score(
             score=response.text,
-            status=EvalStatusEnum.PASS_,
+            status=EvalStatusEnum.PASS,
             details={'reasoning': f'LLM judged: {response.text}'},
         ),
     )
diff --git a/src/content/docs/docs/frameworks/fastapi.mdx b/src/content/docs/docs/frameworks/fastapi.mdx
index 1700caf1..6a9b5666 100644
--- a/src/content/docs/docs/frameworks/fastapi.mdx
+++ b/src/content/docs/docs/frameworks/fastapi.mdx
@@ -25,13 +25,36 @@ cd my-genkit-app

 Install FastAPI and Genkit dependencies:

 ```bash
-uv add fastapi uvicorn genkit genkit-plugin-google-genai
+uv add fastapi uvicorn genkit genkit-plugin-google-genai genkit-plugin-fastapi
 ```

 ## Define Genkit flows and FastAPI routes

 To expose your Genkit flows as FastAPI endpoints, you should use standard FastAPI custom endpoints. For simplicity in this example, we'll create a single file (for example, `main.py`) to initialize Genkit, define your flows, and expose them.

+### FastAPI handler plugin
+
+To serve a flow with Genkit's HTTP protocol (streaming chunks, structured errors, and compatibility with Genkit clients), decorate the flow with `genkit_fastapi_handler` and set `response_model=None` on the route so FastAPI does not try to validate or coerce Genkit's JSON payloads:
+
+```python
+from fastapi import FastAPI
+from genkit import Genkit
+from genkit.plugins.fastapi import genkit_fastapi_handler
+
+app = FastAPI()
+ai = Genkit(...)
+
+@app.post('/chat', response_model=None)
+@genkit_fastapi_handler(ai)
+@ai.flow()
+async def chat(prompt: str) -> str:
+    return 'Hello world'
+```
+
+The decorators apply bottom-up: `@ai.flow()` registers the flow first, then `genkit_fastapi_handler` wraps it in Genkit's HTTP protocol handling, and the outer FastAPI route decorator exposes the result.
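+
+With the handler in place, Genkit clients post the flow input wrapped in a `data` envelope. A quick smoke test (a sketch; it assumes the server runs locally on port 8000, for example via `uvicorn main:app`):
+
+```python
+import httpx  # any HTTP client works; httpx here is an assumption, not a requirement
+
+resp = httpx.post('http://localhost:8000/chat', json={'data': 'Hello'})
+print(resp.json())  # payload follows Genkit's flow protocol, e.g. a 'result' field
+```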
+
+The examples below show other patterns when you need custom request or response shapes.
+
 ### 1. Genkit Client Compatible API (Recommended)

 If you want your FastAPI server to work with the official Genkit Client SDKs for Web or Dart (see [Accessing flows from the client](/docs/client)), your endpoint must consume and produce messages that match Genkit's network protocol.
@@ -102,7 +125,7 @@ async def menu_suggestion_stream(payload: GenkitEnvelope):

     async def sse_generator():
         flow_stream = streaming_menu_flow.stream(theme)
-        # In current PyPI Genkit (0.5.1), stream() returns a tuple (stream, future).
+        # In current PyPI Genkit (0.5.2), stream() returns a tuple (stream, future).
         # We use flow_stream[0] if it is a tuple, otherwise we assume it is the stream itself.
         fs = flow_stream[0] if isinstance(flow_stream, tuple) else flow_stream

diff --git a/src/content/docs/docs/frameworks/flask.mdx b/src/content/docs/docs/frameworks/flask.mdx
index 87aaf95b..4c1f419b 100644
--- a/src/content/docs/docs/frameworks/flask.mdx
+++ b/src/content/docs/docs/frameworks/flask.mdx
@@ -274,6 +274,8 @@ async def admin_flow(action: str, ctx: ActionRunContext):

 ## Error Handling

+When you use `@genkit_flask_handler`, Genkit serializes error details into a form Flask can turn into HTTP responses, rather than handing the framework raw Pydantic models where it expects JSON-serializable payloads.
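+
+For example, an exception raised inside a handled flow comes back to the client as a structured JSON error instead of a Flask stack trace. A minimal sketch, assuming the same decorator stacking as the FastAPI plugin and a `genkit.plugins.flask` import path (adjust both to your installed plugin):
+
+```python
+from flask import Flask
+from genkit import Genkit
+from genkit.plugins.flask import genkit_flask_handler  # import path assumed
+
+app = Flask(__name__)
+ai = Genkit(...)
+
+@app.post('/risky')
+@genkit_flask_handler(ai)
+@ai.flow()
+async def risky_flow(prompt: str) -> str:
+    # The raised error is serialized into the HTTP error payload for the client.
+    raise ValueError('upstream failure')
+```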
+
 ### Custom Error Responses

 ```python
diff --git a/src/content/docs/docs/integrations/anthropic.mdx b/src/content/docs/docs/integrations/anthropic.mdx
index e54858e7..5bfd3e58 100644
--- a/src/content/docs/docs/integrations/anthropic.mdx
+++ b/src/content/docs/docs/integrations/anthropic.mdx
@@ -1044,11 +1044,11 @@ print(response.text)

 ## Configuration Options

 ```python
-from genkit.types import GenerationCommonConfig
+from genkit import ModelConfig

 response = await ai.generate(
     prompt='Your prompt here',
-    config=GenerationCommonConfig(
+    config=ModelConfig(
         temperature=0.7,
         max_output_tokens=1000,
     ),
diff --git a/src/content/docs/docs/integrations/google-genai.mdx b/src/content/docs/docs/integrations/google-genai.mdx
index 7cd4bd21..e507d50c 100644
--- a/src/content/docs/docs/integrations/google-genai.mdx
+++ b/src/content/docs/docs/integrations/google-genai.mdx
@@ -616,7 +616,7 @@ const response = await ai.generate({

 ### Available Models

-- `gemini-embedding-001` - Latest Gemini embedding model (3072 dimensions, customizable)
+- `gemini-embedding-001` — Default **3072** dimensions; set **`outputDimensionality`** in the embed `options` (for example **768**, **1536**, or **3072**) when you want a shorter vector.

 ### Usage

@@ -627,6 +627,13 @@ const embeddings = await ai.embed({
   embedder: googleAI.embedder('gemini-embedding-001'),
   content: 'Machine learning models process data to make predictions.',
 });
 console.log(embeddings);
+
+// Optional: request a shorter embedding (size your indexes to match)
+const compact = await ai.embed({
+  embedder: googleAI.embedder('gemini-embedding-001'),
+  content: 'Machine learning models process data to make predictions.',
+  options: { outputDimensionality: 768 },
+});
 ```

 ## Image Models
@@ -1439,6 +1446,20 @@ ai = Genkit(
 )
 ```

+3. **Per-request**: Override the API key (or pass other provider-specific options) in the `config` passed to `generate()`:
+
+```python
+response = await ai.generate(
+    model='googleai/gemini-2.5-flash',
+    prompt='Your prompt here',
+    config={
+        'api_key': 'different-api-key',
+    },
+)
+```
+
+This is useful for multi-tenant apps or for routing requests to different keys. Model `config` also accepts additional provider-specific fields without raising strict schema errors.
+
 ## Language Models

 ### Available Models
@@ -1469,6 +1490,15 @@ response = await ai.generate(
     prompt='Explain how neural networks learn in simple terms.',
 )
 print(response.text)
+
+# Non-text parts (images, audio, etc.)
+if response.message:
+    for media in response.message.media:
+        print(f'Media type: {media.content_type}')
+
+# Usage may include thinking and context-cache token counts on supported models
+print(response.usage.thoughts_tokens)
+print(response.usage.cached_content_tokens)
 ```

 ### Structured Output
@@ -1579,8 +1609,8 @@ response = await ai.generate(

 ### Available Models

-- `gemini-embedding-001` - Latest Gemini embedding model (3072 dimensions)
-- `text-embedding-004` - Text embedding model (768 dimensions)
+- `gemini-embedding-001` — Default **3072** dimensions; pass **`output_dimensionality`** in **`options`** on **`embed`** / **`embed_many`** (for example **768**, **1536**, or **3072**) when you want a shorter vector.
+- `text-embedding-004` — **768** dimensions in typical use.

 ### Usage

@@ -1590,6 +1620,14 @@ embeddings = await ai.embed(
     content='Machine learning models process data to make predictions.',
 )
 print(embeddings)
+
+# gemini-embedding-001: default 3072, or request a shorter embedding (size your indexes to match)
+gemini_embeddings = await ai.embed(
+    embedder='googleai/gemini-embedding-001',
+    content='Machine learning models process data to make predictions.',
+    options={'output_dimensionality': 768},
+)
+print(gemini_embeddings)
 ```

 ## Image Models
@@ -1715,6 +1753,27 @@ if response.message and response.message.content:
 - `pitch`: Voice pitch (-20.0 to 20.0)
 - `volume_gain_db`: Volume (-96.0 to 16.0)

+## Context caching
+
+Gemini 2.5 and newer models automatically cache common content prefixes (minimum 1024 tokens for Flash, 2048 for Pro), providing a significant token discount on cached tokens.
+
+```python
+# Structure prompts with consistent content at the beginning
+base_context = 'You are a helpful cook... (large context) ...' * 50
+
+# First request — prefix may be cached by Gemini
+await ai.generate(
+    model='googleai/gemini-2.5-flash',
+    prompt=f'{base_context}\n\nTask 1...',
+)
+
+# Second request with the same prefix — eligible for a cache hit
+await ai.generate(
+    model='googleai/gemini-2.5-flash',
+    prompt=f'{base_context}\n\nTask 2...',
+)
+```
+
 ## Next Steps

 - Learn about [generating content](/docs/models) to understand how to use these models effectively
diff --git a/src/content/docs/docs/integrations/vertex-ai.mdx b/src/content/docs/docs/integrations/vertex-ai.mdx
index 11450589..945cc0ca 100644
--- a/src/content/docs/docs/integrations/vertex-ai.mdx
+++ b/src/content/docs/docs/integrations/vertex-ai.mdx
@@ -329,7 +329,8 @@ Use Vertex AI Vector Search for enterprise-grade vector operations:

 1. Create a Vector Search index in the [Google Cloud Console](https://console.cloud.google.com/vertex-ai/matching-engine/indexes)
 2. Configure dimensions based on your embedding model:
-   - `gemini-embedding-001`: 768 dimensions
+   - `gemini-embedding-001` / `gemini-embedding-2-preview`: **default 3072** dimensions; you can set **`output_dimensionality`** on embed calls (Google documents **768**, **1536**, and **3072**, for example). Size the index to the length you actually use.
+   - `text-embedding-005`: 768 dimensions
    - `text-multilingual-embedding-002`: 768 dimensions
    - `multimodalEmbedding001`: 128, 256, 512, or 1408 dimensions
 3. Deploy the index to a standard endpoint
@@ -566,13 +567,21 @@ ai = Genkit(

 - **Vertex AI Express Mode:** A streamlined way to try out many Vertex AI features using just an API key, without needing to set up billing or full project configurations. This is ideal for quick experimentation and has generous free tier quotas. [Learn More about Express Mode](https://cloud.google.com/vertex-ai/generative-ai/docs/start/express-mode/overview).

 ```python
-# Using Vertex AI Express Mode (Easy to start, some limitations)
+# Using Vertex AI Express Mode (easy to start; some limitations).
 # Get an API key from the Vertex AI Studio Express Mode setup.
 import os
-VertexAI(api_key=os.environ.get('VERTEX_EXPRESS_API_KEY'))
+
+from genkit import Genkit
+from genkit.plugins.google_genai import VertexAI
+
+ai = Genkit(
+    plugins=[
+        VertexAI(api_key=os.environ['VERTEX_EXPRESS_API_KEY']),
+    ],
+)
 ```

-_Note: When using Express Mode, you do not provide `project` and `location` in the plugin config._
+_Note: When using Express Mode, you typically omit `project` and `location` on `VertexAI` (see the [Express Mode docs](https://cloud.google.com/vertex-ai/generative-ai/docs/start/express-mode/overview))._

 ### Basic Usage
@@ -601,6 +610,20 @@ embeddings = await ai.embed(
 )
 ```

+:::note[Embedding vector sizes]
+Size Vector Search indexes (and any application-side buffers) to the **length of vectors your app actually produces**. **`gemini-embedding-001`** and **`gemini-embedding-2-preview`** default to **3072** dimensions; pass **`output_dimensionality`** in **`options`** on **`embed`** / **`embed_many`** to use a shorter vector (Google documents common choices such as **768**, **1536**, or **3072**). Example:
+
+```python
+embeddings = await ai.embed(
+    embedder='vertexai/gemini-embedding-001',
+    content='Your text here.',
+    options={'output_dimensionality': 768},
+)
+```
+
+**`vertexai/text-embedding-005`** and **`vertexai/text-multilingual-embedding-002`** typically use **768** dimensions. See [Embedding models](/docs/integrations/google-genai#embedding-models) and the [Gemini embedding documentation](https://ai.google.dev/gemini-api/docs/embeddings).
+:::
+
 ### Image Generation (Imagen)
@@ -799,7 +822,7 @@ llm_response = await ai.generate(

 ### Model Garden Integration

-Access third-party models through Vertex AI Model Garden using the separate `vertex_ai` plugin:
+Access third-party models through Vertex AI Model Garden using the `genkit-plugin-vertex-ai` package (`ModelGardenPlugin`). The plugin requires a Google Cloud project ID: pass `project_id`, or set `GCLOUD_PROJECT` / `GOOGLE_CLOUD_PROJECT`. Model IDs must use the publisher-qualified names shown in the Google Cloud console (for example `meta/...` for Llama, `anthropic/...` for Claude on Vertex). Pass them to `model_garden_name()` so Genkit resolves the action under the `modelgarden/` prefix (for example `modelgarden/meta/llama-3.1-405b-instruct-maas`).

 **Installation:**

@@ -807,63 +830,58 @@
 uv add genkit-plugin-vertex-ai
 ```

-#### Claude 3 Models
+#### Llama (Meta) models

 ```python
 from genkit import Genkit
 from genkit.plugins.vertex_ai import ModelGardenPlugin
+from genkit.plugins.vertex_ai.model_garden import model_garden_name

 ai = Genkit(
     plugins=[
         ModelGardenPlugin(
+            project_id='my-gcp-project',
             location='us-central1',
-            models=['claude-3-haiku', 'claude-3-sonnet', 'claude-3-opus'],
         ),
     ],
 )

 response = await ai.generate(
-    model='claude-3-sonnet',
-    prompt='What should I do when I visit Melbourne?',
+    model=model_garden_name('meta/llama-3.1-405b-instruct-maas'),
+    prompt='Write a function that adds two numbers together',
 )
 ```

-#### Llama 3.1 405b
+Another identifier shipped in the Python SDK registry is `meta/llama-3.2-90b-vision-instruct-maas`.
+Always confirm the exact model resource name for your project in the [Vertex AI Model Garden](https://cloud.google.com/vertex-ai/generative-ai/docs/model-garden/explore-models) console.

-```python
-ai = Genkit(
-    plugins=[
-        ModelGardenPlugin(
-            location='us-central1',
-            models=['llama3-405b-instruct-maas'],
-        ),
-    ],
-)
-
-response = await ai.generate(
-    model='llama3-405b-instruct-maas',
-    prompt='Write a function that adds two numbers together',
-)
-```
+#### Anthropic (Claude) models on Vertex

-#### Mistral Models
+Claude on Vertex uses `anthropic/...` model IDs. Version strings often include dates or `@` suffixes; use the exact ID from the console:

 ```python
+from genkit import Genkit
+from genkit.plugins.vertex_ai import ModelGardenPlugin
+from genkit.plugins.vertex_ai.model_garden import model_garden_name
+
 ai = Genkit(
     plugins=[
         ModelGardenPlugin(
+            project_id='my-gcp-project',
             location='us-central1',
-            models=['mistral-large', 'mistral-small'],
         ),
     ],
 )

 response = await ai.generate(
-    model='mistral-large',
-    prompt='Explain quantum computing',
+    model=model_garden_name('anthropic/claude-3-5-haiku-20241022'),
+    prompt='What should I do when I visit Melbourne?',
 )
 ```

+#### Other OpenAI-compatible Model Garden endpoints
+
+For additional publishers (for example Mistral), use the same `model_garden_name()` pattern with the full Model Garden model ID. Models not in the built-in registry still resolve via the generic OpenAI-compatible Model Garden path.
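+
+For example (a sketch; the model ID below is a placeholder, so copy the exact publisher-qualified ID from the console before using it):
+
+```python
+response = await ai.generate(
+    model=model_garden_name('publisher/model-id'),  # placeholder, not a real ID
+    prompt='Explain quantum computing',
+)
+```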
+
 Vertex AI provides access to various third-party models through Model Garden. Consult the [Vertex AI Model Garden documentation](https://cloud.google.com/vertex-ai/generative-ai/docs/model-garden/explore-models) for the full list of supported models and their capabilities.

 ### Evaluation Metrics
diff --git a/src/content/docs/docs/interrupts.mdx b/src/content/docs/docs/interrupts.mdx
index 03a0b420..132458be 100644
--- a/src/content/docs/docs/interrupts.mdx
+++ b/src/content/docs/docs/interrupts.mdx
@@ -925,7 +925,7 @@ the user, for example by asking a multiple-choice question.

 For this use case, use the Genkit instance's `tool()` decorator with `ctx.interrupt()`:

 ```python
-from genkit import Genkit, ToolRunContext, tool_response
+from genkit import FinishReason, Genkit, tool_response, ToolRunContext
 from genkit.plugins.google_genai import GoogleAI
 from pydantic import BaseModel, Field
@@ -941,7 +941,7 @@ class QuestionInput(BaseModel):
     allow_other: bool = Field(default=False, description='when true, allow write-ins')

 @ai.tool()
-def ask_question(input: QuestionInput, ctx: ToolRunContext) -> str:
+async def ask_question(input: QuestionInput, ctx: ToolRunContext) -> str:
     """Use this to ask the user a clarifying question."""
     # Interrupt with metadata that the caller can use.
     ctx.interrupt({
@@ -955,6 +955,9 @@ you will provide when resuming, as opposed to something that will be
 automatically populated by the tool function.

+To resume, build each entry in `tool_responses` with `tool_response` from `genkit`,
+wrapping the interrupted `ToolRequestPart` in `Part(root=...)` (see below).
+
 ### Use interrupts

 Interrupts are passed into the `tools` list when generating content, just like
@@ -976,8 +979,8 @@ If you've passed one or more interrupts to your generate call, you need to
 check the response for interrupts so that you can handle them:

 ```python
-# You can check the 'finish_reason' attribute of the response
-if response.finish_reason == 'interrupted':
+# You can check the finish_reason (use the enum for comparisons)
+if response.finish_reason == FinishReason.INTERRUPTED:
     print("Generation interrupted.")

 # Or you can check if any interrupt requests are on the response
@@ -991,8 +994,8 @@ if response.interrupts:
 ```

 Responding to an interrupt is done using the `tool_responses` option on a subsequent
-`generate` call, making sure to pass in the existing message history. Use the
-`tool_response` helper function to construct the response:
+`generate` call, making sure to pass in the existing message history. Use `tool_response`
+with each interrupted request and the user's answer:

 ```python
 from genkit import tool_response
@@ -1000,10 +1003,12 @@ from genkit import tool_response
 # Get the user's answer (e.g., from user input)
 user_answer = 'b'  # User selected option b

+tool_request = response.interrupts[0]
+
 # Resume generation with the tool response
 response = await ai.generate(
     messages=response.messages,
-    tool_responses=[tool_response(response.interrupts[0], user_answer)],
+    tool_responses=[tool_response(tool_request, user_answer)],
     tools=['ask_question'],
 )
 ```
@@ -1014,6 +1019,8 @@ For interactive applications, you'll often need to handle multiple interrupts
 in a loop until the model completes its task:

 ```python
+from genkit import tool_response
+
 async def interactive_session():
     response = await ai.generate(
         prompt='Help me plan a backyard BBQ.',
@@ -1055,7 +1062,7 @@ You can also use interrupts within flows for more structured applications:

 ```python
-from genkit import Genkit, ToolRunContext, tool_response
+from genkit import Genkit, ToolRunContext
 from genkit.plugins.google_genai import GoogleAI
 from pydantic import BaseModel, Field
@@ -1067,7 +1074,7 @@ class TriviaQuestion(BaseModel):
     answers: list[str] = Field(description='multiple choice answers')

 @ai.tool()
-def present_question(input: TriviaQuestion, ctx: ToolRunContext) -> None:
+async def present_question(input: TriviaQuestion, ctx: ToolRunContext) -> None:
     """Presents a trivia question to the user."""
     ctx.interrupt(input.model_dump())
diff --git a/src/content/docs/docs/models.mdx b/src/content/docs/docs/models.mdx
index b46351a2..8fc640b3 100644
--- a/src/content/docs/docs/models.mdx
+++ b/src/content/docs/docs/models.mdx
@@ -1662,6 +1662,21 @@ result = await ai.generate(
 )
 ```

+You can also pass a per-request API key, provider-specific options, and middleware on a single call. The `config` object allows extra fields for provider-specific parameters, and the `use` parameter accepts middleware to attach to that generation (for example logging or custom behavior):
+
+```python
+result = await ai.generate(
+    model='googleai/gemini-2.5-flash',
+    prompt='Hello',
+    config={
+        'api_key': 'YOUR_API_KEY',
+        'temperature': 0.5,
+        'provider_specific_option': 'value',
+    },
+    use=[{'name': 'my-middleware', 'config': {'option': True}}],
+)
+```
+
 The exact parameters that are supported depend on the individual model and model API.
 However, the parameters in the previous example are common to almost every model.
 The following is an explanation of these parameters:
@@ -1805,6 +1820,14 @@ object's `output` property:

 ```python
 output = result.output
 ```

+For multimodal responses, iterate `result.message.media` (each item is a `Media` value):
+
+```python
+if result.message:
+    for media in result.message.media:
+        print(media.content_type)
+```
+
 #### Handling errors

 Note in the prior example that the `output` property can be `None`. This can
""" # Transform Genkit request to your API format api_request = self._to_api_request(request) @@ -1268,12 +1261,12 @@ class MyModelPlugin(Plugin): 'systemRole': True, 'output': ['text'], }, - 'customOptions': to_json_schema(GenerationCommonConfig), + 'customOptions': to_json_schema(ModelConfig), }, }, ) - def _to_api_request(self, request: GenerateRequest) -> dict: + def _to_api_request(self, request: ModelRequest) -> dict: """Convert Genkit request to API format.""" # Implementation depends on your API pass @@ -1283,7 +1276,7 @@ class MyModelPlugin(Plugin): # Implementation depends on your API pass - def _to_genkit_response(self, response: dict) -> GenerateResponse: + def _to_genkit_response(self, response: dict) -> ModelResponse: """Convert API response to Genkit format.""" # Implementation depends on your API pass diff --git a/src/content/docs/docs/rag.mdx b/src/content/docs/docs/rag.mdx index 9f416a10..80858728 100644 --- a/src/content/docs/docs/rag.mdx +++ b/src/content/docs/docs/rag.mdx @@ -806,236 +806,90 @@ _Note_: The `rerank` function is a placeholder for your own logic and is not pro -RAG is a very broad area and there are many different techniques used to achieve -the best quality RAG. The core Genkit framework offers three main abstractions -to help you do RAG: - -- **Indexers**: add documents to an "index". -- **Embedders**: transforms documents into a vector representation -- **Retrievers**: retrieve documents from an "index", given a query. - -These definitions are broad on purpose because Genkit is un-opinionated about -what an "index" is or how exactly documents are retrieved from it. Genkit only -provides a `Document` format and everything else is defined by the retriever or -indexer implementation provider. +For **Python**, Genkit’s RAG support centers on the shared **[`Document`](/docs/models)** model and **embedders**: use `ai.embed` and `ai.embed_many` with plugin-registered embedder names to produce vectors. Combine those embeddings with your ingestion, storage, and search code, then pass retrieved documents into **`ai.generate(..., docs=...)`** to ground answers. ### Indexers -The index is responsible for keeping track of your documents in such a way that -you can quickly retrieve relevant documents given a specific query. This is most -often accomplished using a vector database, which indexes your documents using -multidimensional vectors called embeddings. A text embedding (opaquely) -represents the concepts expressed by a passage of text; these are generated -using special-purpose ML models. By indexing text using its embedding, a vector -database is able to cluster conceptually related text and retrieve documents -related to a novel string of text (the query). - -Before you can retrieve documents for the purpose of generation, you need to -ingest them into your document index. A typical ingestion flow does the -following: - -1. Split up large documents into smaller documents so that only relevant - portions are used to augment your prompts – "chunking". This is necessary - because many LLMs have a limited context window, making it impractical to - include entire documents with a prompt. - - Genkit doesn't provide built-in chunking libraries; however, there are open - source libraries available that are compatible with Genkit. - -2. Generate embeddings for each chunk. Depending on the database you're using, - you might explicitly do this with an embedding generation model, or you might - use the embedding generator provided by the database. -3. 
diff --git a/src/content/docs/docs/rag.mdx b/src/content/docs/docs/rag.mdx
index 9f416a10..80858728 100644
--- a/src/content/docs/docs/rag.mdx
+++ b/src/content/docs/docs/rag.mdx
@@ -806,236 +806,90 @@ _Note_: The `rerank` function is a placeholder for your own logic and is not pro

-RAG is a very broad area and there are many different techniques used to achieve
-the best quality RAG. The core Genkit framework offers three main abstractions
-to help you do RAG:
-
-- **Indexers**: add documents to an "index".
-- **Embedders**: transforms documents into a vector representation
-- **Retrievers**: retrieve documents from an "index", given a query.
-
-These definitions are broad on purpose because Genkit is un-opinionated about
-what an "index" is or how exactly documents are retrieved from it. Genkit only
-provides a `Document` format and everything else is defined by the retriever or
-indexer implementation provider.
+For **Python**, Genkit's RAG support centers on the shared **[`Document`](/docs/models)** model and **embedders**: use `ai.embed` and `ai.embed_many` with plugin-registered embedder names to produce vectors. Combine those embeddings with your ingestion, storage, and search code, then pass retrieved documents into **`ai.generate(..., docs=...)`** to ground answers.

-### Indexers
-
-The index is responsible for keeping track of your documents in such a way that
-you can quickly retrieve relevant documents given a specific query. This is most
-often accomplished using a vector database, which indexes your documents using
-multidimensional vectors called embeddings. A text embedding (opaquely)
-represents the concepts expressed by a passage of text; these are generated
-using special-purpose ML models. By indexing text using its embedding, a vector
-database is able to cluster conceptually related text and retrieve documents
-related to a novel string of text (the query).
-
-Before you can retrieve documents for the purpose of generation, you need to
-ingest them into your document index. A typical ingestion flow does the
-following:
-
-1. Split up large documents into smaller documents so that only relevant
-   portions are used to augment your prompts – "chunking". This is necessary
-   because many LLMs have a limited context window, making it impractical to
-   include entire documents with a prompt.
-
-   Genkit doesn't provide built-in chunking libraries; however, there are open
-   source libraries available that are compatible with Genkit.
-
-2. Generate embeddings for each chunk. Depending on the database you're using,
-   you might explicitly do this with an embedding generation model, or you might
-   use the embedding generator provided by the database.
-3. Add the text chunk and its index to the database.
-
-You might run your ingestion flow infrequently or only once if you are working
-with a stable source of data. On the other hand, if you are working with data
-that frequently changes, you might continuously run the ingestion flow (for
-example, in a Cloud Firestore trigger, whenever a document is updated).
-
-#### Defining an indexer
-
-To define a custom indexer in Python, use the `ai.define_indexer()` method.
-Your indexer function receives a list of `Document` objects plus an optional
-options object.
-
-```python
-from genkit import Genkit, Document
-
-ai = Genkit(plugins=[...])
-
-async def my_indexer(documents: list[Document], options: dict | None) -> None:
-    for doc in documents:
-        # Generate an embedding for the document (example only).
-        embedding = await ai.embed(
-            embedder='vertexai/text-embedding-004',
-            content=doc,
-        )
-
-        # Store the document + embedding in your vector store.
-        # (Pseudocode — use your actual DB client and schema.)
-        await my_vector_db.store(
-            text=doc.text(),
-            embedding=embedding[0].embedding,
-            metadata=doc.metadata,
-        )
-
-ai.define_indexer(name='my_indexer', fn=my_indexer)
-```
-
-#### Using an indexer
+### Retrievers

-Once defined, you can use your indexer to ingest documents:
+RAG is a very broad area and there are many different techniques used to achieve
+the best quality RAG. Conceptually, most pipelines involve:

-```python
-from genkit import Document, Part, TextPart
-
-# Prepare your documents
-documents = [
-    Document.from_text('The quick brown fox jumps over the lazy dog.'),
-    Document.from_text('Pack my box with five dozen liquor jugs.'),
-    Document(
-        content=[Part(root=TextPart(text='Custom document with metadata.'))],
-        metadata={'source': 'manual', 'category': 'example'},
-    ),
-]
+- **Indexing**: add documents to a store (often chunk → embed → upsert vectors).
+- **Embedders**: turn text (or other content) into vectors via a Genkit embedder action.
+- **Retrieval**: fetch relevant chunks for a query (your DB client or search layer).
+- **Generation**: call the model with the user question and retrieved context.

-# Index the documents
-await ai.index(
-    indexer='my_indexer',
-    documents=documents,
-)
-```
+Genkit provides the [`Document`](/docs/models) model and **embedder** actions for turning content into vectors you can use throughout that pipeline.

-#### Using an IndexerRef
+### Ingestion

-You can also use `IndexerRef` to reference an indexer by name:
+Typical steps: chunk source text, compute embeddings with `ai.embed` or `ai.embed_many`, then write text + vector + metadata to your database. The following is illustrative only; the storage calls are placeholders for your client library.

 ```python
-from genkit.blocks.retriever import IndexerRef
+from genkit import Document, Genkit

-# Reference an indexer
-my_indexer_ref = IndexerRef(name='my_indexer')
+ai = Genkit(plugins=[...])  # e.g. VertexAI / GoogleAI for embedders

-# Use the reference with ai.index()
-await ai.index(
-    indexer=my_indexer_ref,
-    documents=documents,
-)
+async def ingest_chunks(chunks: list[Document], embedder: str) -> None:
+    for doc in chunks:
+        embeddings = await ai.embed(embedder=embedder, content=doc)
+        vector = embeddings[0].embedding
+        # await my_vector_db.upsert(text=doc.text, embedding=vector, metadata=doc.metadata)
+        _ = vector  # replace with real persistence
 ```

 ### Embedders

-An embedder is a function that takes content (text, images, audio, etc.) and
-creates a numeric vector that encodes the semantic meaning of the original
-content. As mentioned above, embedders are leveraged as part of the process of
-indexing, however, they can also be used independently to create embeddings
-without an index.
+Use plugin-registered embedder names (for example `vertexai/text-embedding-005`) with `ai.embed` / `ai.embed_many`. Keep the same model for indexing and query embedding when using vector similarity search.
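+
+For batches, `embed_many` avoids one round trip per chunk. A sketch, assuming `embed_many` takes the same `embedder`/`content` arguments as `embed` with a list of documents:
+
+```python
+docs = [Document.from_text('chunk one'), Document.from_text('chunk two')]
+embeddings = await ai.embed_many(
+    embedder='vertexai/text-embedding-005',
+    content=docs,
+)
+vectors = [e.embedding for e in embeddings]
+```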

-### Retrievers
+### Generation with `docs`

-A retriever is a concept that encapsulates logic related to any kind of document
-retrieval. The most popular retrieval cases typically include retrieval from
-vector stores, however, in Genkit a retriever can be any function that returns
-data.
-
-To create a retriever, you can use one of the provided implementations or create
-your own.
-
-## Defining a RAG flow
-
-The following examples show how you could ingest a collection of restaurant menu
-PDF documents into a vector database and retrieve them for use in a flow that
-determines what food items are available. Note that indexing is outside the scope
-of Genkit and you should use the SDKs/APIs provided by the vector store you are using.
-
-The following example shows how you might use a retriever in a RAG flow. Like
-the retriever example, this example uses the Firestore vector store helper from
-the Firebase plugin.
+Implement search with your datastore (vector search, hybrid search, etc.), producing a `list[Document]`. Pass that list as **`docs`** to **`ai.generate`** so the model can use the context.

 ```python
-from google.cloud import firestore
-from google.cloud.firestore_v1.base_vector_query import DistanceMeasure
-
-from genkit import Genkit
-from genkit.plugins.firebase import define_firestore_vector_store
+from genkit import Document, Genkit
 from genkit.plugins.google_genai import VertexAI

 ai = Genkit(
-    plugins=[
-        VertexAI(location='us-central1'),
-    ],
+    plugins=[VertexAI(location='us-central1')],
+    model='vertexai/gemini-2.5-flash',
 )

-firestore_client = firestore.Client()
-
-# Define a Firestore vector store retriever (returns the retriever action name).
-# Important: use the same embedding model for indexing and retrieval.
-EMBEDDING_MODEL = 'vertexai/text-embedding-004'
-
-RETRIEVER_NAME = define_firestore_vector_store(
-    ai,
-    name='my_firestore_retriever',
-    embedder=EMBEDDING_MODEL,
-    collection='mycollection',
-    vector_field='embedding',
-    content_field='text',
-    firestore_client=firestore_client,
-    distance_measure=DistanceMeasure.EUCLIDEAN,
-)
+async def fetch_context(user_query: str, limit: int = 3) -> list[Document]:
+    """Replace with your vector store: embed the query, search, map hits to Documents."""
+    embeddings = await ai.embed(
+        embedder='vertexai/text-embedding-005',
+        content=user_query,
+    )
+    _query_vector = embeddings[0].embedding
+    # rows = await my_vector_db.search(_query_vector, limit=limit)
+    # return [Document.from_text(r.text, metadata=r.meta) for r in rows]
+    return [
+        Document.from_text('Example: seasonal dessert — fruit tart (contains dairy).'),
+        Document.from_text('Example: allergen note — sorbet is dairy-free.'),
+    ][:limit]

 @ai.flow()
 async def qa_flow(query: str) -> str:
-    retrieved = await ai.retrieve(
-        retriever=RETRIEVER_NAME,
-        query=query,
-        options={'limit': 3},
-    )
+    docs = await fetch_context(query)
     response = await ai.generate(
-        model='vertexai/gemini-2.5-flash',
         prompt=query,
-        docs=retrieved.documents,
+        docs=docs,
     )
     return response.text
 ```

-#### Run the retriever flow
+#### Run the flow

 ```python
-result = await qa_flow('Recommend a dessert from the menu while avoiding dairy and nuts')
+result = await qa_flow('Recommend a dessert while avoiding dairy and nuts')
 print(result)
 ```

-The output for this command should contain a response from the model, grounded
-in the indexed `menu.pdf` file.
-
-## Write your own retrievers
-
-It's also possible to create your own retriever. This is useful if your
-documents are managed in a document store that is not supported in Genkit (eg:
-MySQL, Google Drive, etc.). The Genkit SDK provides flexible methods that let
-you provide custom code for fetching documents. You can also define custom
-retrievers that build on top of existing retrievers in Genkit and apply advanced
-RAG techniques (such as reranking or prompt extensions) on top.
-
-```python
-from genkit import Document, Genkit
-from genkit.types import RetrieverResponse
-
-ai = Genkit(plugins=[...])
-
-async def my_retriever(query: Document, options: dict | None) -> RetrieverResponse:
-    # Use query.text() and options to do a lookup in your datastore.
-    return RetrieverResponse(documents=[Document.from_text('Hello'), Document.from_text('World')])
-
-ai.define_retriever(name='my_retriever', fn=my_retriever)
-```
-
-Then you'll be able to use your retriever with `ai.retrieve`:
+### Composing search and prompts

-```python
-retrieved = await ai.retrieve(retriever='my_retriever', query=query)
-docs = retrieved.documents
-```
+Any async function that returns `list[Document]` can supply context: call databases, APIs, or custom ranking, then pass the result to `generate(docs=...)`. Advanced patterns (reranking, prompt expansion) are ordinary Python code composed around `embed` and `generate`.
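+
+As one composed example, here is a hedged sketch of a rerank step: it embeds the query and each candidate with the same embedder used at ingestion and keeps the top matches by cosine similarity. The helper names are illustrative, not Genkit APIs:
+
+```python
+import math
+
+async def rerank(query: str, candidates: list[Document], top_k: int = 3) -> list[Document]:
+    # Query vector, using the same embedder as the ingestion sketch above.
+    q = (await ai.embed(embedder='vertexai/text-embedding-005', content=query))[0].embedding
+
+    async def score(doc: Document) -> float:
+        # Cosine similarity between the query vector and this candidate's vector.
+        d = (await ai.embed(embedder='vertexai/text-embedding-005', content=doc))[0].embedding
+        dot = sum(a * b for a, b in zip(q, d))
+        norm = math.sqrt(sum(a * a for a in q)) * math.sqrt(sum(b * b for b in d))
+        return dot / norm if norm else 0.0
+
+    scored = [(await score(doc), doc) for doc in candidates]
+    scored.sort(key=lambda pair: pair[0], reverse=True)
+    return [doc for _, doc in scored[:top_k]]
+
+# Usage inside a flow: docs = await rerank(query, await fetch_context(query, limit=20))
+```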