feat(agentic-ai): extract documents from tool call results into user messages#6999

Open
maff wants to merge 38 commits into main from agentic-ai-document-tool-call-results

Conversation

@maff
Member

@maff maff commented Apr 20, 2026

Description

Tool call results containing Camunda Documents were serialized as base64 strings embedded in JSON via a custom DocumentToContentSerializer. Most LLMs cannot properly interpret this format.

This PR extracts documents from tool call results into a synthetic UserMessage with DocumentContent blocks appended after the ToolCallResultMessage. The tool result text retains document references (serialized by the standard DocumentSerializer) so the model can correlate references with actual content.

Document message format

A single UserMessage (metadata: toolCallDocuments=true) is appended with:

  • A preamble: "Documents extracted from tool call results:"
  • Per document: an XML tag <document tool-name="…" tool-call-id="…" document-short-id="…" filename="…" /> followed by its DocumentContent block
  • document-short-id is the first UUID segment of the documentId, sufficient for in-conversation correlation

Event messages containing documents receive the same <document> XML labels for consistency (without tool-name/tool-call-id attributes).
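
A minimal sketch of how such a tag could be assembled, assuming a hypothetical DocumentTagSketch record (not the connector's actual DocumentXmlTag). It derives the short id from the first UUID segment, escapes attribute values, and skips null attributes, which yields the event-message form without tool-name/tool-call-id:

```java
// Hypothetical sketch; names and structure are assumptions, not the production code.
record DocumentTagSketch(String toolName, String toolCallId, String documentId, String filename) {

  // The short id is the first segment of the UUID-shaped document id.
  String shortId() {
    int dash = documentId.indexOf('-');
    return dash < 0 ? documentId : documentId.substring(0, dash);
  }

  String toXml() {
    StringBuilder sb = new StringBuilder("<document");
    appendAttribute(sb, "tool-name", toolName);
    appendAttribute(sb, "tool-call-id", toolCallId);
    appendAttribute(sb, "document-short-id", shortId());
    appendAttribute(sb, "filename", filename);
    return sb.append(" />").toString();
  }

  // Null attributes are skipped, so event documents get a tag without tool attributes.
  private static void appendAttribute(StringBuilder sb, String name, String value) {
    if (value != null) {
      sb.append(' ').append(name).append("=\"").append(escape(value)).append('"');
    }
  }

  // Minimal XML attribute escaping; the ampersand must be replaced first.
  private static String escape(String value) {
    return value
        .replace("&", "&amp;")
        .replace("<", "&lt;")
        .replace(">", "&gt;")
        .replace("\"", "&quot;");
  }
}
```

Calling toXml() on a tag built with null tool attributes produces only the document-short-id and filename attributes.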

Document extraction architecture

Document extraction is driven by ToolCallResultDocumentExtractor, called from AgentMessagesHandlerImpl after the ToolCallResultMessage is built. For each result it asks GatewayToolHandlerRegistry.handlerForToolDefinition(toolName) for the responsible handler:

  • if a handler manages the tool → delegate to GatewayToolHandler.extractDocuments(ToolCallResult) — handlers walk their own typed content (sealed-type switch over McpContent / A2aSendMessageResult);
  • otherwise → fall back to ContentTreeDocumentWalker.extractDocumentsFromContent(...), a stateless static utility that recursively walks Map, Collection, Object[] and Document nodes (used for plain BPMN tools whose content is a raw FEEL tree).

The default GatewayToolHandler.extractDocuments implementation also delegates to the walker, so third-party handlers that return raw maps work without overriding. ContentTreeDocumentWalker is public so handlers whose typed content embeds raw user-generated subtrees can call it directly on those subtrees.
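
The routing and fallback described above can be sketched roughly as follows. All type names here (Doc, TreeWalkerSketch, HandlerSketch, ExtractorSketch) are hypothetical stand-ins, and the handler lookup is modelled as a plain function rather than the real registry:

```java
import java.util.ArrayList;
import java.util.Collection;
import java.util.List;
import java.util.Map;
import java.util.Optional;
import java.util.function.Function;

// Stand-in for the Camunda Document type (assumption for this sketch).
record Doc(String id) {}

// Hypothetical stateless walker, modelled on the description of ContentTreeDocumentWalker.
final class TreeWalkerSketch {
  private TreeWalkerSketch() {}

  static List<Doc> extract(Object content) {
    List<Doc> found = new ArrayList<>();
    walk(content, found);
    return found;
  }

  private static void walk(Object node, List<Doc> found) {
    if (node instanceof Doc doc) {
      found.add(doc);
    } else if (node instanceof Map<?, ?> map) {
      map.values().forEach(value -> walk(value, found));
    } else if (node instanceof Collection<?> collection) {
      collection.forEach(item -> walk(item, found));
    } else if (node instanceof Object[] array) {
      for (Object item : array) {
        walk(item, found);
      }
    }
    // scalars and null carry no documents
  }
}

// Hypothetical handler SPI: the default delegates to the generic walker.
interface HandlerSketch {
  default List<Doc> extractDocuments(Object content) {
    return TreeWalkerSketch.extract(content);
  }
}

// Routing entrypoint: use the managing handler if one exists, otherwise walk the raw tree.
final class ExtractorSketch {
  private final Function<String, Optional<HandlerSketch>> handlerLookup;

  ExtractorSketch(Function<String, Optional<HandlerSketch>> handlerLookup) {
    this.handlerLookup = handlerLookup;
  }

  List<Doc> extract(String toolName, Object content) {
    return handlerLookup
        .apply(toolName)
        .map(handler -> handler.extractDocuments(content))
        .orElseGet(() -> TreeWalkerSketch.extract(content));
  }
}
```

Keeping the walker static means the interface default can call it without dependency injection, which matches the reasoning given in the commit messages.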

Gateway tool handlers

Gateway handlers (MCP, A2A) restore typed content as ToolCallResult.content() (the pre-PR shape: List<McpContent> for MCP, A2aSendMessageResult for A2A) and override extractDocuments to walk that typed structure:

  • MCP: collects documents from McpDocumentContent and McpEmbeddedResourceContent.BlobDocumentResource via a sealed-type switch over McpContent.
  • A2A: walks A2aMessage.contents at the root, plus A2aTask.artifacts and (recursively) A2aTask.history to collect DocumentContent entries.

This removes the earlier requirement that handlers preserve raw Map/List/Document content trees just so the document extractor's instanceof walk could find nested Document instances.
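
To illustrate the sealed-type switch, here is a simplified sketch assuming Java 21 and a reduced, hypothetical content hierarchy (ContentSketch and friends). The real McpContent types have more variants and fields than shown:

```java
import java.util.ArrayList;
import java.util.List;

// Stand-in for the Camunda Document type (assumption for this sketch).
record FileDoc(String filename) {}

// Hypothetical, simplified mirror of a typed content hierarchy.
sealed interface ContentSketch permits TextSketch, DocumentSketch, EmbeddedSketch {}

record TextSketch(String text) implements ContentSketch {}

record DocumentSketch(FileDoc document) implements ContentSketch {}

record EmbeddedSketch(ResourceSketch resource) implements ContentSketch {}

sealed interface ResourceSketch permits TextResourceSketch, BlobResourceSketch {}

record TextResourceSketch(String text) implements ResourceSketch {}

record BlobResourceSketch(FileDoc document) implements ResourceSketch {}

final class McpExtractionSketch {
  private McpExtractionSketch() {}

  static List<FileDoc> extractDocuments(Object content) {
    // unexpected shapes yield an empty list instead of failing
    if (!(content instanceof List<?> items)) {
      return List.of();
    }
    List<FileDoc> documents = new ArrayList<>();
    for (Object item : items) {
      if (item instanceof ContentSketch typed) {
        // exhaustive over the sealed hierarchy: a new variant breaks compilation until handled
        switch (typed) {
          case TextSketch text -> {}
          case DocumentSketch doc -> documents.add(doc.document());
          case EmbeddedSketch embedded -> {
            if (embedded.resource() instanceof BlobResourceSketch blob) {
              documents.add(blob.document());
            }
          }
        }
      }
    }
    return documents;
  }
}
```

Because the switch has no default branch, adding a content variant later forces the handler to decide explicitly whether it can carry documents.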

Message window memory

The synthetic document UserMessage does not count toward the maxMessages context window limit and is evicted together with its associated ToolCallResultMessage.
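
A rough sketch of this windowing rule, using a hypothetical message model (Kind, Msg, WindowMemorySketch). It ignores details the real memory likely handles, such as system messages, and only shows the counting and paired eviction:

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;

// Hypothetical message model; the real connector uses richer message types.
enum Kind { USER, ASSISTANT, TOOL_RESULT, TOOL_CALL_DOCUMENTS }

record Msg(Kind kind, String text) {}

final class WindowMemorySketch {
  private final int maxMessages;
  private final Deque<Msg> messages = new ArrayDeque<>();
  private int effectiveCount = 0; // tracked as an int, not recounted on every eviction

  WindowMemorySketch(int maxMessages) {
    this.maxMessages = maxMessages;
  }

  void add(Msg message) {
    messages.addLast(message);
    // synthetic document messages do not count toward the window limit
    if (message.kind() != Kind.TOOL_CALL_DOCUMENTS) {
      effectiveCount++;
    }
    while (effectiveCount > maxMessages && !messages.isEmpty()) {
      evictOldest();
    }
  }

  private void evictOldest() {
    Msg evicted = messages.removeFirst();
    // only counted messages reduce the effective count
    if (evicted.kind() != Kind.TOOL_CALL_DOCUMENTS) {
      effectiveCount--;
    }
    // a document message leaves together with the tool result it follows
    if (evicted.kind() == Kind.TOOL_RESULT
        && !messages.isEmpty()
        && messages.peekFirst().kind() == Kind.TOOL_CALL_DOCUMENTS) {
      messages.removeFirst();
    }
  }

  List<Msg> messages() {
    return new ArrayList<>(messages);
  }
}
```

With maxMessages set to 2, a user message, a tool result, and its document message all fit, because the document message is not counted.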

Cross-provider viability test

A manual cross-provider integration test (DocumentToolCallResultsIT, @Disabled by default) validates that real LLM providers can receive and reason about PDF documents extracted from tool call results. It covers single documents, multiple documents, and nested structures. Provider configs: OpenAI, Anthropic, AWS Bedrock, and OpenAI-compatible (Docker Model Runner, Ollama). CPT judge assertions verify the model correctly extracted facts from the PDFs.

Deleted

DocumentToContentModule, DocumentToContentResponseModel, DocumentToContentSerializer, and related tests.

ADR

See ADR-004: Document Handling in Tool Call Results.

Related issues

closes #7005

Checklist

  • Tests/Integration tests for the changes have been added if applicable.

@maff maff added the e2e-tests label Apr 20, 2026
@maff maff changed the title from "fix(agentic-ai): attach documents from tool call results as user messages" to "feat(agentic-ai): extract documents from tool call results into user messages" Apr 20, 2026
Contributor

Copilot AI left a comment

Pull request overview

This PR changes how Agentic-AI represents Camunda Documents returned from tool calls so LLMs can consume them effectively: documents are extracted out of tool call result payloads into a synthetic follow-up UserMessage containing DocumentContent blocks, while tool result text keeps document references (standard document serializer) for correlation.

Changes:

  • Introduces ToolCallResultDocumentExtractor and integrates it into AgentMessagesHandlerImpl to emit a synthetic document UserMessage (and to append documents to event messages).
  • Updates gateway tool handlers (MCP, A2A) to preserve raw Map/List/Document content trees to enable document extraction, and simplifies LangChain4J tool result serialization to JSON + document references.
  • Removes the deprecated document-to-base64-in-JSON infrastructure and updates unit/e2e tests + ADR accordingly.

Reviewed changes

Copilot reviewed 32 out of 32 changed files in this pull request and generated 6 comments.

File | Description
connectors/agentic-ai/src/test/java/io/camunda/connector/agenticai/mcp/discovery/McpClientGatewayToolHandlerTest.java | Adjusts test to simulate engine-style raw map content for MCP results.
connectors/agentic-ai/src/test/java/io/camunda/connector/agenticai/aiagent/memory/runtime/MessageWindowRuntimeMemoryTest.java | Adds tests for excluding synthetic doc messages from the max context window and evicting them with tool results.
connectors/agentic-ai/src/test/java/io/camunda/connector/agenticai/aiagent/framework/langchain4j/tool/ToolCallConverterTest.java | Updates expectations: tool results serialize documents as references (not embedded base64/text blobs).
connectors/agentic-ai/src/test/java/io/camunda/connector/agenticai/aiagent/framework/langchain4j/document/DocumentToContentSerializerTest.java | Deletes tests for the removed document-to-content JSON serializer.
connectors/agentic-ai/src/test/java/io/camunda/connector/agenticai/aiagent/framework/langchain4j/ContentConverterTest.java | Updates object content serialization expectations to document references.
connectors/agentic-ai/src/test/java/io/camunda/connector/agenticai/aiagent/agent/ToolCallResultDocumentExtractorTest.java | Adds coverage for recursive extraction from mixed map/list/array content trees and grouping by tool call.
connectors/agentic-ai/src/test/java/io/camunda/connector/agenticai/aiagent/agent/AgentMessagesHandlerTest.java | Adds/updates tests for synthetic document UserMessage creation, ordering, and event document extraction.
connectors/agentic-ai/src/test/java/io/camunda/connector/agenticai/a2a/client/agentic/tool/A2aGatewayToolHandlerTest.java | Verifies A2A handler preserves raw content so extractor can find nested documents.
connectors/agentic-ai/src/main/java/io/camunda/connector/agenticai/model/message/UserMessage.java | Adds METADATA_TOOL_CALL_DOCUMENTS key to mark synthetic doc messages.
connectors/agentic-ai/src/main/java/io/camunda/connector/agenticai/mcp/discovery/McpClientGatewayToolHandler.java | Preserves raw MCP result content and warns when expected shape differs.
connectors/agentic-ai/src/main/java/io/camunda/connector/agenticai/autoconfigure/AgenticAiConnectorsAutoConfiguration.java | Wires ToolCallResultDocumentExtractor bean and injects it into AgentMessagesHandlerImpl.
connectors/agentic-ai/src/main/java/io/camunda/connector/agenticai/aiagent/memory/runtime/MessageWindowRuntimeMemory.java | Excludes synthetic doc messages from maxMessages and evicts them with related tool results.
connectors/agentic-ai/src/main/java/io/camunda/connector/agenticai/aiagent/framework/langchain4j/tool/ToolCallConverterImpl.java | Removes ContentConverter dependency; serializes tool results via ObjectMapper to preserve document references.
connectors/agentic-ai/src/main/java/io/camunda/connector/agenticai/aiagent/framework/langchain4j/document/DocumentToContentSerializer.java | Deletes the custom document-to-content serializer.
connectors/agentic-ai/src/main/java/io/camunda/connector/agenticai/aiagent/framework/langchain4j/document/DocumentToContentResponseModel.java | Deletes the Claude-style response model used by the removed serializer.
connectors/agentic-ai/src/main/java/io/camunda/connector/agenticai/aiagent/framework/langchain4j/document/DocumentToContentModule.java | Deletes the Jackson module that registered the removed serializer.
connectors/agentic-ai/src/main/java/io/camunda/connector/agenticai/aiagent/framework/langchain4j/configuration/AgenticAiLangchain4JFrameworkConfiguration.java | Updates ToolCallConverter bean wiring after ToolCallConverterImpl constructor change.
connectors/agentic-ai/src/main/java/io/camunda/connector/agenticai/aiagent/framework/langchain4j/ContentConverterImpl.java | Stops using a dedicated ObjectMapper copy with the removed module; uses injected ObjectMapper directly.
connectors/agentic-ai/src/main/java/io/camunda/connector/agenticai/aiagent/agent/ToolCallResultDocumentExtractor.java | New extractor to find Documents in arbitrary Map/List/Collection/array trees.
connectors/agentic-ai/src/main/java/io/camunda/connector/agenticai/aiagent/agent/AgentMessagesHandlerImpl.java | Creates synthetic doc UserMessage after tool results and appends docs to event messages.
connectors/agentic-ai/src/main/java/io/camunda/connector/agenticai/a2a/client/agentic/tool/A2aGatewayToolHandler.java | Preserves raw content for document extraction instead of converting to typed A2A result POJOs.
connectors/agentic-ai/pom.xml | Adds jackson-datatype-document test dependency for document reference serialization in tests.
connectors/agentic-ai/docs/adr/004-document-handling-in-tool-call-results.plan.md | Adds implementation plan for the new extraction approach.
connectors/agentic-ai/docs/adr/004-document-handling-in-tool-call-results.md | Adds ADR describing the chosen approach and tradeoffs.
connectors-e2e-test/connectors-e2e-test-agentic-ai/src/test/java/.../outboundconnector/L4JAiAgentConnectorToolCallingTests.java | Updates e2e assertions for document reference tool results + synthetic doc user message.
connectors-e2e-test/connectors-e2e-test-agentic-ai/src/test/java/.../outboundconnector/L4JAiAgentConnectorMcpIntegrationTests.java | Adds e2e test verifying document extraction from MCP image tool results.
connectors-e2e-test/connectors-e2e-test-agentic-ai/src/test/java/.../outboundconnector/BaseL4JAiAgentConnectorTest.java | Removes legacy DownloadFileToolResult record using deleted response model.
connectors-e2e-test/connectors-e2e-test-agentic-ai/src/test/java/.../jobworker/L4JAiAgentJobWorkerToolCallingTests.java | Updates jobworker e2e assertions for new document extraction behavior.
connectors-e2e-test/connectors-e2e-test-agentic-ai/src/test/java/.../jobworker/L4JAiAgentJobWorkerMcpIntegrationTests.java | Adds jobworker e2e test verifying document extraction from MCP image tool results.
connectors-e2e-test/connectors-e2e-test-agentic-ai/src/test/java/.../jobworker/BaseL4JAiAgentJobWorkerTest.java | Removes legacy DownloadFileToolResult record using deleted response model.
connectors-e2e-test/connectors-e2e-test-agentic-ai/src/test/java/.../common/L4JAiAgentA2aIntegrationTestSupport.java | Re-serializes expected A2A results via raw Map to match runtime raw-content behavior.
connectors-e2e-test/connectors-e2e-test-agentic-ai/src/test/java/.../AiAgentTestFixtures.java | Adds helper to extract document short-id from serialized tool result reference text.

@maff maff force-pushed the agentic-ai-document-tool-call-results branch 3 times, most recently from 0f6b5f9 to df579e2 Compare April 21, 2026 11:46
@maff maff force-pushed the agentic-ai-document-tool-call-results branch from df579e2 to 83ebf6b Compare April 22, 2026 16:52
@maff maff requested a review from Copilot April 22, 2026 18:59
@maff maff force-pushed the agentic-ai-document-tool-call-results branch from 83ebf6b to 0f823b4 Compare April 22, 2026 19:01
Contributor

Copilot AI left a comment

Pull request overview

Copilot reviewed 32 out of 32 changed files in this pull request and generated 1 comment.

Contributor

Copilot AI left a comment

Pull request overview

Copilot reviewed 33 out of 33 changed files in this pull request and generated 2 comments.

Comment thread connectors/agentic-ai/pom.xml
@maff maff requested a review from Copilot April 23, 2026 06:03
@maff maff marked this pull request as ready for review April 23, 2026 06:03
@maff maff requested review from a team as code owners April 23, 2026 06:03
@maff maff requested review from ztefanie and removed request for ztefanie April 23, 2026 06:03
Contributor

Copilot AI left a comment

Pull request overview

Copilot reviewed 36 out of 39 changed files in this pull request and generated 1 comment.

Comment thread connectors/agentic-ai/docs/adr/004-document-handling-in-tool-call-results.md Outdated
maff added 24 commits May 7, 2026 13:38
Track effective message count as an int instead of recounting via
stream on every eviction iteration.
- add JSON-aware ToolExecutionResultMessageEqualsPredicate for
  order-independent comparison of tool result JSON in E2E tests
- lowercase all inline comments introduced in this PR
- replace string concatenation with String.formatted() in XML tag
  assertions
- inline E2E message order comments as suffixes on assertions
- assert exact base64 data in MCP image E2E tests
- restore chat request count assertions in tool calling E2E tests
- strengthen MCP handler test to assert on content values, not just
  list size
- add Javadoc example for extractDocumentShortId showing the document
  reference JSON format
- update document user message format to XML tags with correlation
  attributes (tool, call-id, document-short-id, filename)
- document the message window memory behavior for document messages
- document event document labeling consistency
- add future optimization note for UserMessage rebuild strategy
- update walker to include Object[] support
- fix provider list (remove specific provider references)
- delete implementation plan file
Only decrement the effective message count when the evicted message is
not a tool-call document message, preventing under-counting if an
orphaned document message ends up at the eviction position.
Move XML tag building, attribute escaping, and document short ID
extraction into a dedicated DocumentXmlTag record with factory methods
and toXml() serialization. Tests moved to DocumentXmlTagTest.
…ll results

Add a manual CPT test that validates real LLM providers can receive and
reason about PDF documents extracted from tool call results via the
synthetic UserMessage with XML correlation tags.

The test covers three scenarios with increasing complexity:
- Single document returned from a tool call
- Multiple documents returned as a list
- Documents embedded in a nested Map structure

A BPMN process downloads PDFs from WireMock, then uses FEEL script tasks
inside an ad-hoc subprocess to assemble tool results of varying shapes.
The AI Agent connector processes these with a real LLM, and CPT judge
assertions verify the model correctly extracted facts from the PDFs.

Provider configs (toggled via env vars): OpenAI, Anthropic, AWS Bedrock,
and OpenAI-compatible (Docker Model Runner). The test is @Disabled by
default and not part of CI.
Add Ollama provider configs (qwen3.5, llama3.1:8b) with OLLAMA_URL env
var. Add .disabled() toggle on ProviderConfig and a modelFilters
allowlist for quickly focusing test runs on specific models without
commenting code.
Move DocumentToolCallResultsIT to io.camunda.connector.e2e.agenticai.aiagent
package, rename PDF fixtures to descriptive names (project-launch.pdf,
headcount-report.pdf, author-info.pdf) under document-tool-call-results/
directory, and drop the cpt- prefix from the BPMN file.
Move document extraction off the raw-Map content tree and into the
GatewayToolHandler SPI. Each handler now exposes extractDocuments and
walks its own typed content (sealed-type switch); the generic content
tree walker stays as the default fallback for plain BPMN tools and
handlers that return raw maps.

Removes the constraint that gateway handlers must keep raw Map content
solely so the instanceof-based walker can find Documents inside them.

* New ContentTreeDocumentWalker: public utility extracted from the old
  ToolCallResultDocumentExtractor walker. Public so third-party handlers
  whose typed content embeds raw subtrees can reuse it.
* GatewayToolHandler.extractDocuments(ToolCallResult): default delegates
  to ContentTreeDocumentWalker, override to walk a typed structure.
* GatewayToolHandlerRegistry.extractDocuments routes per-result to the
  responsible handler, falling back to the walker.
* ToolCallResultDocumentExtractor becomes a thin coordinator that
  iterates ToolCallResults and calls the registry; constructor now takes
  the registry.
* MCP handler restores typed McpClientCallToolResult content and walks
  McpDocumentContent and McpEmbeddedResourceContent.BlobDocumentResource.
  Drops the getRawMcpContent workaround.
* A2A handler restores typed A2aSendMessageResult content and walks
  A2aMessage.contents, A2aTask.artifacts, and A2aTask.history (recursive).
  Drops the raw-content preservation comment.
* ADR-004 updated: replaces the "must preserve raw content" subsection
  with a "Per-handler document extraction" subsection.
* Tests split into ContentTreeDocumentWalkerTest (walker behaviour) and
  ToolCallResultDocumentExtractorTest (registry routing). Per-variant
  unit coverage added on McpClientGatewayToolHandlerTest and
  A2aGatewayToolHandlerTest.

https://claude.ai/code/session_01SM8HzedSAVWqnDaEKrmCpR
…utility

Two follow-up adjustments to the per-handler document extraction design:

1. ToolCallResultDocumentExtractor is the routing entrypoint. It asks
   GatewayToolHandlerRegistry.handlerForToolDefinition(name) to find a
   managing handler and delegates if one exists; otherwise it walks the
   raw content tree itself. The fallback no longer lives inside the
   registry — gateway handlers contribute extraction for the results
   they manage; the generic walker handles everything else. Drops
   GatewayToolHandlerRegistry.extractDocuments.

2. ContentTreeDocumentWalker is a fully static utility (private ctor,
   static methods). The previous singleton INSTANCE field was the worst
   of both worlds — the walker is stateless, has no dependencies, and
   the SPI default in GatewayToolHandler needs static-style access since
   interface defaults can't be DI'd. Tests use the real walker directly.

https://claude.ai/code/session_01SM8HzedSAVWqnDaEKrmCpR
…ndlers

The previous refactor commit changed the shape of the transformed
ToolCallResult.content() in both gateway handlers more than necessary:

* MCP set content to the full McpClientCallToolResult wrapper
  ({name, content[], isError}) instead of just the content list as it
  was pre-workaround (List<McpContent>). Reverted to passing
  callToolResult.content() — same shape the LLM saw before commit
  85c6617 introduced the raw-Map workaround. McpClientGatewayToolHandler
  .extractDocuments now walks List<McpContent> from content().
* A2A had no behavioural change pre-vs-post-workaround beyond the
  variable naming and builder usage style — the previous commit
  introduced cosmetic churn. Restored the original method shape with
  the sendMessageResult variable and explicit toolCallResultBuilder.

Tests updated to match the restored List<McpContent> content shape.

https://claude.ai/code/session_01SM8HzedSAVWqnDaEKrmCpR
…ss review nits

Review feedback follow-up:

* GatewayToolHandler javadoc: drop stale `ContentTreeDocumentWalker#INSTANCE`
  link, point at the static `extractDocumentsFromContent` method.
* ContentTreeDocumentWalkerTest: static-import the walker method (less noise)
  and convert the scalar-content test to a parameterised one that also covers
  null input.
* L4J E2E tests (4 sites): drop the regex-based document short-id extraction.
  `AiAgentTestFixtures.readDocumentReference(String)` parses the tool result
  JSON, asserts the camunda document discriminator, and returns a typed
  DocumentReferenceFields record with storeId / documentId / contentType /
  fileName + shortId(). Tests now assert the parsed contentType against the
  parameterised mimeType and read shortId from the parsed record.
* docs/reference/ai-agent.md §19: add `extractDocuments` to the
  GatewayToolHandler interface listing and a new "Document Extraction from
  Tool Call Results" subsection describing the extractor → registry → handler
  routing with ContentTreeDocumentWalker as the fallback.
* docs/reference/mcp.md §7 and a2a.md §7: tool call execution flows now
  document the typed transformed content shape and the per-handler
  extractDocuments step.

Regression test:

* New GatewayToolResultDocumentSerializationTest pins the JSON wire format
  produced by the connector ObjectMapper for Documents nested inside
  McpClientCallToolResult and A2aSendMessageResult — they must serialize as
  `camunda.document.type` references via DocumentSerializer, never as base64
  payloads or raw DocumentReference POJOs. Covers root McpDocumentContent,
  embedded BlobDocumentResource, A2aMessage contents, A2aTask artifacts and
  recursive history, plus an explicit base64-must-not-appear assertion.

https://claude.ai/code/session_01SM8HzedSAVWqnDaEKrmCpR
…onTest

The SDK already covers Camunda document serialization fidelity in
DocumentSerializerTest (connector-runtime/jackson-datatype-document). Once a
Document field is reachable by Jackson and the document module is registered,
DocumentSerializer is invoked regardless of the parent type — neither
McpDocumentContent, BlobDocumentResource, nor A2A DocumentContent overrides
this. The test added in 843e2e2 only re-ran the SDK serializer through
different wrapper objects and would have failed in lockstep with the SDK
tests, providing no additional signal. It also mixed MCP and A2A concerns in
one test class, which should have been split per handler if anything.

https://claude.ai/code/session_01SM8HzedSAVWqnDaEKrmCpR
…n, stale ADR reference

* AiAgentTestFixtures.findFirstCamundaDocumentNode: replace deprecated
  JsonNode.fields() with JsonNode.properties() (CodeQL: deprecated method
  invocation).
* ADR-004: the "Per-handler document extraction" subsection still pointed at
  ContentTreeDocumentWalker.INSTANCE. The walker is now a static utility —
  point readers at extractDocumentsFromContent(...) instead. The Java javadoc
  on GatewayToolHandler was already fixed; this brings the ADR in line.

https://claude.ai/code/session_01SM8HzedSAVWqnDaEKrmCpR
Moves the synthetic UserMessage assertion logic, document reference parsing,
and the content-block helper out of AiAgentTestFixtures and the per-test
duplicates into a dedicated ToolCallResultDocumentAssertions class.
assertExtractedDocumentsUserMessage takes ExtractedDocument varargs so it
can also assert UserMessages carrying multiple tool call results, and
builds expected XML tags via the production DocumentXmlTag. Parsed
references use the production DocumentReferenceModel.CamundaDocumentReferenceModel,
keeping the test in lockstep with the on-the-wire format.
…xtractor

Tool call result name is set on every production path (MCP/A2A handlers,
forCancelledToolCall, BPMN _meta.name) and events are filtered out before
the extractor runs. Propagating null produces a tag without the tool-name
attribute via DocumentXmlTag.appendAttribute, which is the right shape for
malformed inputs anyway.
…lId, toolName)

Aligns DocumentXmlTag's record fields and factory with the rest of the
new code (ToolCallDocuments, ExtractedDocument), which all put toolCallId
before toolName/toolCallName. The XML attribute order in toXml() is
unchanged.
Adds a single summary log per extractor invocation (results processed,
results with documents, total document count), plus targeted DEBUG logs
in the MCP and A2A gateway handlers when their extractDocuments path
silently returns an empty list because the content has an unexpected
shape.
@maff maff force-pushed the agentic-ai-document-tool-call-results branch from 97d21e3 to 5707929 Compare May 7, 2026 11:38
maff added a commit that referenced this pull request May 7, 2026
Refresh ADR-005 §"Tool Call Result Routing" and the Phase E3 section of
the implementation plan with the agreed design:

- single decision point at the ChatClient SPI boundary; the strategy is a
  pure function `apply(ChatRequest, ModelCapabilities) → (ChatRequest,
  List<UserMessage>)` that walks the request once and routes each document
  it finds — no extract-then-restore double pass
- tool-result-message documents are routed against
  `capabilities.toolResultModalities()`: inline-supported docs stay on
  `ToolCallResult.contentBlocks`; the rest fall back to a synthetic
  `UserMessage` (existing PR #6999 shape, `METADATA_TOOL_CALL_DOCUMENTS`)
- user-message and event-message documents are validated against
  `capabilities.userMessageModalities()`: supported docs stay inline,
  unsupported docs fail loud (`ConnectorException`)
- synthetic UMs land in `RuntimeMemory` inside `ChatClient.chat(...)` so
  the persisted `agentContext.conversation` matches the wire exactly —
  replay across iterations stays deterministic
- `AgentMessagesHandlerImpl` drops the `documentExtractor` field, the
  `createDocumentMessageForToolResults` private method, and the line-134
  call site — strategy owns extraction
- TODO captured: revisit `ChatClient` ↔ `BaseAgentRequestHandler`
  boundary post-Phase E; ChatClient now owns three responsibilities

Also bumps the bundled `anthropic-messages` `tool-result` modalities from
`[text, image]` to `[text, image, pdf]` — Anthropic's
`ToolResultBlockParam.Content.Block.ofDocument(...)` SDK factory confirms
PDF support in tool results. `BundledCapabilityMatrixTest` adjusted to
match.
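
As a rough illustration of the capability-based routing this commit message describes, here is a sketch under simplifying assumptions. All names are hypothetical, documents are reduced to a filename plus modality, and IllegalArgumentException stands in for the ConnectorException mentioned above:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;

// All names below are hypothetical stand-ins for this sketch.
enum Modality { TEXT, IMAGE, PDF }

record Capabilities(Set<Modality> toolResultModalities, Set<Modality> userMessageModalities) {}

record TypedDoc(String filename, Modality modality) {}

record RoutedDocs(List<TypedDoc> inline, List<TypedDoc> fallback) {}

final class ToolResultRoutingSketch {
  private ToolResultRoutingSketch() {}

  // Tool-result documents: keep inline when supported, otherwise fall back to a synthetic user message.
  static RoutedDocs routeToolResultDocs(List<TypedDoc> docs, Capabilities caps) {
    List<TypedDoc> inline = new ArrayList<>();
    List<TypedDoc> fallback = new ArrayList<>();
    for (TypedDoc doc : docs) {
      if (caps.toolResultModalities().contains(doc.modality())) {
        inline.add(doc);
      } else {
        fallback.add(doc);
      }
    }
    return new RoutedDocs(inline, fallback);
  }

  // User-message documents: no fallback, unsupported modalities fail loudly.
  static void validateUserMessageDocs(List<TypedDoc> docs, Capabilities caps) {
    for (TypedDoc doc : docs) {
      if (!caps.userMessageModalities().contains(doc.modality())) {
        throw new IllegalArgumentException(
            "Unsupported modality for user message: " + doc.modality());
      }
    }
  }
}
```

The sketch routes each document exactly once, which mirrors the single-pass intent, but it does not model the ChatRequest walk or the persistence into RuntimeMemory.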
maff added a commit that referenced this pull request May 7, 2026
…on (Phase E3+E4)

Single-pass routing of every document found in a ChatRequest at the
ChatClient boundary, plus the per-impl native multimodal emission paths
that consume the routed tool-result `contentBlocks`. Combined into one
phase because the bundled capability matrix declares modalities that the
impls have to actually emit — shipping E3 alone would silently drop
inline-routed documents on the floor between phases.

ToolCallResultStrategy (`framework/strategy/`):
- pure function `apply(ChatRequest, ModelCapabilities) -> (ChatRequest, List<UserMessage>)`
- single walk over the request:
  - tool-result documents -> `toolResultModalities`: inline-supported docs go onto
    `ToolCallResult.contentBlocks`; the rest fall back to a synthetic UserMessage
    (PR #6999 shape, `METADATA_TOOL_CALL_DOCUMENTS=true`)
  - user-message and event-message documents -> `userMessageModalities`: supported
    docs stay inline; unsupported docs throw `ConnectorException` (no synthesis
    fallback for user messages, mirroring L4J `DocumentConversionException`)

ChatClientImpl runs the strategy after capability resolution and persists
the synthetic context messages into `RuntimeMemory` immediately after the
anchor `ToolCallResultMessage` so the persisted `agentContext.conversation`
matches what the model saw on the wire (deterministic replay). The
clear()+addMessages() insertion dance is flagged with a TODO to revisit
once the ChatClient<->BARQ boundary settles after Phase E.

AgentMessagesHandlerImpl drops `ToolCallResultDocumentExtractor` from its
constructor and no longer creates `documentMessage` itself — extraction
is now exclusively the strategy's responsibility. The PR #6999 walker,
per-handler `extractDocuments` hook, XML correlation tag, and window-count
exclusion are reused unchanged.

Native multimodal emission (image + PDF only):
- AnthropicMessagesChatModelApi: `ContentBlockParam.ofImage(...)` + `ofDocument(...)`
  on user messages; `ToolResultBlockParam.Content.Block.ofImage(...)` + `ofDocument(...)`
  with `contentOfBlocks(...)` on tool results when contentBlocks is populated.
  Adds `ObjectMapper` to the impl + factory + Spring config for JSON-serialised
  inline tool-result text bodies.
- OpenAiResponsesChatModelApi: `ResponseInputContent.ofInputImage(...)` /
  `ofInputFile(...)` on user messages via
  `EasyInputMessage.contentOfResponseInputMessageContentList(...)`;
  `ResponseFunctionCallOutputItem.ofInputImage(...)` / `ofInputFile(...)` on
  tool results via `outputOfResponseFunctionCallOutputItemList(...)`.
- OpenAiChatCompletionsChatModelApi: `ChatCompletionContentPart.ofImageUrl(...)`
  + `ofFile(...)` on user messages via `addUserMessageOfArrayOfContentParts(...)`.
  Tool messages stay text-only (SDK enforces `ChatCompletionContentPartText`-only).

Bundled capability matrix: anthropic-messages tool-result modality bumped
to `[text, image, pdf]` (verified via `ToolResultBlockParam.Content.Block.ofDocument`);
openai-completions user-message bumped to `[text, image, pdf]`;
openai-responses user-message bumped to `[text, image, pdf]`.

Tests: 1380 unit tests + 3 wire-format e2e tests pass. New
`ToolCallResultStrategyImplTest` (8 cases) covers inline routing, fallback
synthesis, single-pass split, ordering, user-message validation, and the
no-document no-op path. `AgentMessagesHandlerTest` synthesis assertions
migrated to the strategy test; new test pins that `addUserMessages` no
longer emits a synthetic UserMessage.
Contributor

@nikonovd nikonovd left a comment

Please check my last comments 🍊

<artifactId>camunda-process-test-spring</artifactId>
<version>${version.camunda}</version>
</dependency>
<dependency>
Contributor

🔧 this is already a transitive dependency of camunda-process-test-spring


@Override
public List<Document> extractDocuments(ToolCallResult toolCallResult) {
if (!(toolCallResult.content() instanceof A2aSendMessageResult result)) {
Contributor

⛏️ similar to what we discussed in a previous PR: carrying a diamond type on the base interface would avoid the need for a type check. Feel free to ignore 😄


var documents = handler.extractDocuments(toolCallResult);

assertThat(documents).containsExactly(document);
Contributor

❓ does this test work reliably given the fact that a mock is supplied here?

var documents = handler.extractDocuments(toolCallResult);

// artifacts before history
assertThat(documents).containsExactly(artifactDoc, historyDoc);

❓ same question here

}

@Test
void usesContentTreeWalkerWhenNoHandlerManagesTheToolCall() {

⛏️ The test name suggests we use the default content walker fallback, but we are not actually verifying it. Maybe we could use a spy here?
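
To illustrate the spy idea without depending on a mocking library, here is a hand-rolled recording wrapper around a fallback walker. All types (`ContentWalker`, `MapWalker`, `RecordingWalker`, `Extractor`) are hypothetical simplifications and not the classes in this PR. In the real test, a Mockito `spy` on the walker followed by `verify` would presumably play the same role.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.Set;

public class WalkerSpySketch {
  // Hypothetical fallback walker: finds "documents" (plain strings here) in arbitrary content.
  interface ContentWalker {
    List<String> walk(Object content);
  }

  // Default walker: collects the String values of a Map.
  static final class MapWalker implements ContentWalker {
    @Override
    public List<String> walk(Object content) {
      List<String> found = new ArrayList<>();
      if (content instanceof Map<?, ?> map) {
        for (Object value : map.values()) {
          if (value instanceof String s) {
            found.add(s);
          }
        }
      }
      return found;
    }
  }

  // Hand-rolled spy: records every call, then delegates to the real walker.
  static final class RecordingWalker implements ContentWalker {
    final List<Object> calls = new ArrayList<>();
    private final ContentWalker delegate;

    RecordingWalker(ContentWalker delegate) {
      this.delegate = delegate;
    }

    @Override
    public List<String> walk(Object content) {
      calls.add(content);
      return delegate.walk(content);
    }
  }

  // Hypothetical extractor: falls back to the walker only for unmanaged tools.
  static final class Extractor {
    private final Set<String> managedTools;
    private final ContentWalker walker;

    Extractor(Set<String> managedTools, ContentWalker walker) {
      this.managedTools = managedTools;
      this.walker = walker;
    }

    List<String> extract(String toolName, Object content) {
      if (managedTools.contains(toolName)) {
        return List.of(); // a managing handler would be consulted here instead
      }
      return walker.walk(content);
    }
  }

  // Returns how often the fallback walker was invoked for the given tool name.
  static int fallbackCallsFor(String toolName) {
    var spy = new RecordingWalker(new MapWalker());
    var extractor = new Extractor(Set.of("managed_tool"), spy);
    extractor.extract(toolName, Map.of("attachment", "doc-1"));
    return spy.calls.size();
  }

  public static void main(String[] args) {
    System.out.println(fallbackCallsFor("plain_tool"));
    System.out.println(fallbackCallsFor("managed_tool"));
  }
}
```

Asserting on the recorded calls, rather than only on the extracted documents, is what ties the test to its name: the result alone cannot show which code path produced it.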

Comment on lines +159 to +160
assertThat(e.toolCallId()).isNull();
assertThat(e.toolCallName()).isNull();

❓ I guess that would contain event subprocess tool calls, right?

Could it potentially contain other documents as well, and would this design have any undesired side effects?

Comment on lines +165 to +227
void integrationWithRealRegistry_fallsBackToWalkerWhenNoHandlerMatches() {
  final var realExtractor =
      new ToolCallResultDocumentExtractor(new GatewayToolHandlerRegistryImpl(List.of()));

  final var doc = createDocument("hello", "text/plain", "test.txt");
  final var result =
      ToolCallResult.builder()
          .id("call_1")
          .name("plain_bpmn_tool")
          .content(Map.of("attachment", doc))
          .build();

  final var extracted = realExtractor.extractDocuments(List.of(result));

  assertThat(extracted).hasSize(1);
  assertThat(extracted.getFirst().documents()).containsExactly(doc);
}

@Test
void integrationWithRealRegistry_routesToManagingHandler(@Mock GatewayToolHandler handler) {
  final var doc = createDocument("typed", "text/plain", "typed.txt");
  final var typedContent = new TypedHandlerContent(doc);

  when(handler.type()).thenReturn("typed");
  when(handler.isGatewayManaged("typed_tool")).thenReturn(true);
  when(handler.extractDocuments(any(ToolCallResult.class))).thenReturn(List.of(doc));

  final var realExtractor =
      new ToolCallResultDocumentExtractor(new GatewayToolHandlerRegistryImpl(List.of(handler)));

  final var result =
      ToolCallResult.builder().id("call_1").name("typed_tool").content(typedContent).build();

  final var extracted = realExtractor.extractDocuments(List.of(result));

  assertThat(extracted).hasSize(1);
  assertThat(extracted.getFirst().documents()).containsExactly(doc);
  verify(handler).extractDocuments(result);
}

@Test
void integrationWithRealRegistry_doesNotConsultHandlerForUnmanagedTool(
    @Mock GatewayToolHandler handler) {
  when(handler.type()).thenReturn("typed");
  when(handler.isGatewayManaged("plain_tool")).thenReturn(false);

  final var realExtractor =
      new ToolCallResultDocumentExtractor(new GatewayToolHandlerRegistryImpl(List.of(handler)));

  final var doc = createDocument("hello", "text/plain", "test.txt");
  final var result =
      ToolCallResult.builder()
          .id("call_1")
          .name("plain_tool")
          .content(Map.of("attachment", doc))
          .build();

  final var extracted = realExtractor.extractDocuments(List.of(result));

  assertThat(extracted).hasSize(1);
  assertThat(extracted.getFirst().documents()).containsExactly(doc);
  verify(handler, never()).extractDocuments(any());
}

🔧 we should extract those into a nested test class to avoid prefixing and underscoring, WDYT?

}

@Test
void generatesTagWithoutToolAndCallId() {

❓ why could that potentially happen?

class DocumentXmlTagTest {

@Nested
class ToXml {

❓ why do we need a nested class here if it's still a flat test structure?

Comment on lines +301 to +302
<groupId>com.google.guava</groupId>
<artifactId>guava</artifactId>

❓ why is this library introduced?

Development

Successfully merging this pull request may close these issues.

Agentic AI: Make Camunda Documents in tool call results visible to LLMs

4 participants