feat(agentic-ai): extract documents from tool call results into user messages #6999
maff wants to merge 38 commits into
Pull request overview
This PR changes how Agentic-AI represents Camunda Documents returned from tool calls so LLMs can consume them effectively: documents are extracted out of tool call result payloads into a synthetic follow-up UserMessage containing DocumentContent blocks, while tool result text keeps document references (standard document serializer) for correlation.
Changes:
- Introduces `ToolCallResultDocumentExtractor` and integrates it into `AgentMessagesHandlerImpl` to emit a synthetic document `UserMessage` (and to append documents to event messages).
- Updates gateway tool handlers (MCP, A2A) to preserve raw `Map`/`List`/`Document` content trees to enable document extraction, and simplifies LangChain4J tool result serialization to JSON + document references.
- Removes the deprecated document-to-base64-in-JSON infrastructure and updates unit/e2e tests + ADR accordingly.
Reviewed changes
Copilot reviewed 32 out of 32 changed files in this pull request and generated 6 comments.
| File | Description |
|---|---|
| connectors/agentic-ai/src/test/java/io/camunda/connector/agenticai/mcp/discovery/McpClientGatewayToolHandlerTest.java | Adjusts test to simulate engine-style raw map content for MCP results. |
| connectors/agentic-ai/src/test/java/io/camunda/connector/agenticai/aiagent/memory/runtime/MessageWindowRuntimeMemoryTest.java | Adds tests for excluding synthetic doc messages from the max context window and evicting them with tool results. |
| connectors/agentic-ai/src/test/java/io/camunda/connector/agenticai/aiagent/framework/langchain4j/tool/ToolCallConverterTest.java | Updates expectations: tool results serialize documents as references (not embedded base64/text blobs). |
| connectors/agentic-ai/src/test/java/io/camunda/connector/agenticai/aiagent/framework/langchain4j/document/DocumentToContentSerializerTest.java | Deletes tests for the removed document-to-content JSON serializer. |
| connectors/agentic-ai/src/test/java/io/camunda/connector/agenticai/aiagent/framework/langchain4j/ContentConverterTest.java | Updates object content serialization expectations to document references. |
| connectors/agentic-ai/src/test/java/io/camunda/connector/agenticai/aiagent/agent/ToolCallResultDocumentExtractorTest.java | Adds coverage for recursive extraction from mixed map/list/array content trees and grouping by tool call. |
| connectors/agentic-ai/src/test/java/io/camunda/connector/agenticai/aiagent/agent/AgentMessagesHandlerTest.java | Adds/updates tests for synthetic document UserMessage creation, ordering, and event document extraction. |
| connectors/agentic-ai/src/test/java/io/camunda/connector/agenticai/a2a/client/agentic/tool/A2aGatewayToolHandlerTest.java | Verifies A2A handler preserves raw content so extractor can find nested documents. |
| connectors/agentic-ai/src/main/java/io/camunda/connector/agenticai/model/message/UserMessage.java | Adds METADATA_TOOL_CALL_DOCUMENTS key to mark synthetic doc messages. |
| connectors/agentic-ai/src/main/java/io/camunda/connector/agenticai/mcp/discovery/McpClientGatewayToolHandler.java | Preserves raw MCP result content and warns when expected shape differs. |
| connectors/agentic-ai/src/main/java/io/camunda/connector/agenticai/autoconfigure/AgenticAiConnectorsAutoConfiguration.java | Wires ToolCallResultDocumentExtractor bean and injects it into AgentMessagesHandlerImpl. |
| connectors/agentic-ai/src/main/java/io/camunda/connector/agenticai/aiagent/memory/runtime/MessageWindowRuntimeMemory.java | Excludes synthetic doc messages from maxMessages and evicts them with related tool results. |
| connectors/agentic-ai/src/main/java/io/camunda/connector/agenticai/aiagent/framework/langchain4j/tool/ToolCallConverterImpl.java | Removes ContentConverter dependency; serializes tool results via ObjectMapper to preserve document references. |
| connectors/agentic-ai/src/main/java/io/camunda/connector/agenticai/aiagent/framework/langchain4j/document/DocumentToContentSerializer.java | Deletes the custom document-to-content serializer. |
| connectors/agentic-ai/src/main/java/io/camunda/connector/agenticai/aiagent/framework/langchain4j/document/DocumentToContentResponseModel.java | Deletes the Claude-style response model used by the removed serializer. |
| connectors/agentic-ai/src/main/java/io/camunda/connector/agenticai/aiagent/framework/langchain4j/document/DocumentToContentModule.java | Deletes the Jackson module that registered the removed serializer. |
| connectors/agentic-ai/src/main/java/io/camunda/connector/agenticai/aiagent/framework/langchain4j/configuration/AgenticAiLangchain4JFrameworkConfiguration.java | Updates ToolCallConverter bean wiring after ToolCallConverterImpl constructor change. |
| connectors/agentic-ai/src/main/java/io/camunda/connector/agenticai/aiagent/framework/langchain4j/ContentConverterImpl.java | Stops using a dedicated ObjectMapper copy with the removed module; uses injected ObjectMapper directly. |
| connectors/agentic-ai/src/main/java/io/camunda/connector/agenticai/aiagent/agent/ToolCallResultDocumentExtractor.java | New extractor to find Documents in arbitrary Map/List/Collection/array trees. |
| connectors/agentic-ai/src/main/java/io/camunda/connector/agenticai/aiagent/agent/AgentMessagesHandlerImpl.java | Creates synthetic doc UserMessage after tool results and appends docs to event messages. |
| connectors/agentic-ai/src/main/java/io/camunda/connector/agenticai/a2a/client/agentic/tool/A2aGatewayToolHandler.java | Preserves raw content for document extraction instead of converting to typed A2A result POJOs. |
| connectors/agentic-ai/pom.xml | Adds jackson-datatype-document test dependency for document reference serialization in tests. |
| connectors/agentic-ai/docs/adr/004-document-handling-in-tool-call-results.plan.md | Adds implementation plan for the new extraction approach. |
| connectors/agentic-ai/docs/adr/004-document-handling-in-tool-call-results.md | Adds ADR describing the chosen approach and tradeoffs. |
| connectors-e2e-test/connectors-e2e-test-agentic-ai/src/test/java/.../outboundconnector/L4JAiAgentConnectorToolCallingTests.java | Updates e2e assertions for document reference tool results + synthetic doc user message. |
| connectors-e2e-test/connectors-e2e-test-agentic-ai/src/test/java/.../outboundconnector/L4JAiAgentConnectorMcpIntegrationTests.java | Adds e2e test verifying document extraction from MCP image tool results. |
| connectors-e2e-test/connectors-e2e-test-agentic-ai/src/test/java/.../outboundconnector/BaseL4JAiAgentConnectorTest.java | Removes legacy DownloadFileToolResult record using deleted response model. |
| connectors-e2e-test/connectors-e2e-test-agentic-ai/src/test/java/.../jobworker/L4JAiAgentJobWorkerToolCallingTests.java | Updates jobworker e2e assertions for new document extraction behavior. |
| connectors-e2e-test/connectors-e2e-test-agentic-ai/src/test/java/.../jobworker/L4JAiAgentJobWorkerMcpIntegrationTests.java | Adds jobworker e2e test verifying document extraction from MCP image tool results. |
| connectors-e2e-test/connectors-e2e-test-agentic-ai/src/test/java/.../jobworker/BaseL4JAiAgentJobWorkerTest.java | Removes legacy DownloadFileToolResult record using deleted response model. |
| connectors-e2e-test/connectors-e2e-test-agentic-ai/src/test/java/.../common/L4JAiAgentA2aIntegrationTestSupport.java | Re-serializes expected A2A results via raw Map to match runtime raw-content behavior. |
| connectors-e2e-test/connectors-e2e-test-agentic-ai/src/test/java/.../AiAgentTestFixtures.java | Adds helper to extract document short-id from serialized tool result reference text. |
0f6b5f9 to df579e2
df579e2 to 83ebf6b
83ebf6b to 0f823b4
Track effective message count as an int instead of recounting via stream on every eviction iteration.
- add JSON-aware ToolExecutionResultMessageEqualsPredicate for order-independent comparison of tool result JSON in E2E tests
- lowercase all inline comments introduced in this PR
- replace string concatenation with String.formatted() in XML tag assertions
- inline E2E message order comments as suffixes on assertions
- assert exact base64 data in MCP image E2E tests
- restore chat request count assertions in tool calling E2E tests
- strengthen MCP handler test to assert on content values, not just list size
- add Javadoc example for extractDocumentShortId showing the document reference JSON format
- update document user message format to XML tags with correlation attributes (tool, call-id, document-short-id, filename)
- document the message window memory behavior for document messages
- document event document labeling consistency
- add future optimization note for UserMessage rebuild strategy
- update walker to include Object[] support
- fix provider list (remove specific provider references)
- delete implementation plan file
Only decrement the effective message count when the evicted message is not a tool-call document message, preventing under-counting if an orphaned document message ends up at the eviction position.
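To make the counting rule concrete, here is a minimal Java sketch. `Msg`, `WindowSketch`, and the `"tool_result"` kind string are hypothetical; a boolean stands in for the `METADATA_TOOL_CALL_DOCUMENTS` marker, and eviction simply drops the oldest message. The real `MessageWindowRuntimeMemory` may apply further constraints that are not modelled here.

```java
import java.util.ArrayList;
import java.util.List;

// hypothetical message model; toolCallDocuments stands in for the METADATA_TOOL_CALL_DOCUMENTS flag
record Msg(String kind, boolean toolCallDocuments) {
  static Msg of(String kind) {
    return new Msg(kind, false);
  }

  // a synthetic document UserMessage
  static Msg documents() {
    return new Msg("user", true);
  }
}

final class WindowSketch {
  private WindowSketch() {}

  // evicts the oldest messages until the counted size fits maxMessages
  static List<Msg> evict(List<Msg> messages, int maxMessages) {
    List<Msg> window = new ArrayList<>(messages);
    // count once up front, then track the effective size as an int
    int effective = (int) window.stream().filter(m -> !m.toolCallDocuments()).count();
    while (effective > maxMessages && !window.isEmpty()) {
      Msg evicted = window.remove(0);
      if (!evicted.toolCallDocuments()) {
        effective--; // an orphaned document message never reduces the count
      }
      // a tool result takes the document message that directly follows it along
      if (evicted.kind().equals("tool_result")
          && !window.isEmpty()
          && window.get(0).toolCallDocuments()) {
        window.remove(0);
      }
    }
    return window;
  }
}
```

With `maxMessages = 1`, the conversation `user, assistant, tool_result, documents, assistant` shrinks to the final `assistant` message: the document message is never counted and leaves together with its tool result.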
Move XML tag building, attribute escaping, and document short ID extraction into a dedicated DocumentXmlTag record with factory methods and toXml() serialization. Tests moved to DocumentXmlTagTest.
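A minimal sketch of a tag builder with the behaviour described in this commit: attribute escaping, short-ID extraction, and `toXml()` serialization, plus omission of null attributes. `DocumentTagSketch` is a hypothetical name, not the PR's `DocumentXmlTag`; the escaped character set and the attribute order (taken from the `<document tool-name=… tool-call-id=… document-short-id=… filename=… />` format in the description) are assumptions of the sketch.

```java
// hypothetical stand-in for DocumentXmlTag; attribute order follows the described format
record DocumentTagSketch(String toolName, String toolCallId, String documentId, String filename) {

  // first UUID segment of the documentId, e.g. "3f2c9a1e" for "3f2c9a1e-1111-..."
  static String shortId(String documentId) {
    int dash = documentId.indexOf('-');
    return dash < 0 ? documentId : documentId.substring(0, dash);
  }

  String toXml() {
    StringBuilder sb = new StringBuilder("<document");
    appendAttribute(sb, "tool-name", toolName);
    appendAttribute(sb, "tool-call-id", toolCallId);
    appendAttribute(sb, "document-short-id", shortId(documentId));
    appendAttribute(sb, "filename", filename);
    return sb.append(" />").toString();
  }

  // null values are skipped, so event-message tags carry no tool attributes
  private static void appendAttribute(StringBuilder sb, String name, String value) {
    if (value != null) {
      sb.append(' ').append(name).append("=\"").append(escape(value)).append('"');
    }
  }

  // '&' must be escaped first so the other replacements are not double-escaped
  private static String escape(String value) {
    return value
        .replace("&", "&amp;")
        .replace("\"", "&quot;")
        .replace("<", "&lt;")
        .replace(">", "&gt;");
  }
}
```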
…ll results
Add a manual CPT test that validates real LLM providers can receive and reason about PDF documents extracted from tool call results via the synthetic UserMessage with XML correlation tags. The test covers three scenarios with increasing complexity:
- Single document returned from a tool call
- Multiple documents returned as a list
- Documents embedded in a nested Map structure

A BPMN process downloads PDFs from WireMock, then uses FEEL script tasks inside an ad-hoc subprocess to assemble tool results of varying shapes. The AI Agent connector processes these with a real LLM, and CPT judge assertions verify the model correctly extracted facts from the PDFs. Provider configs (toggled via env vars): OpenAI, Anthropic, AWS Bedrock, and OpenAI-compatible (Docker Model Runner). The test is @Disabled by default and not part of CI.
Add Ollama provider configs (qwen3.5, llama3.1:8b) with OLLAMA_URL env var. Add .disabled() toggle on ProviderConfig and a modelFilters allowlist for quickly focusing test runs on specific models without commenting code.
Move DocumentToolCallResultsIT to io.camunda.connector.e2e.agenticai.aiagent package, rename PDF fixtures to descriptive names (project-launch.pdf, headcount-report.pdf, author-info.pdf) under document-tool-call-results/ directory, and drop the cpt- prefix from the BPMN file.
Move document extraction off the raw-Map content tree and into the GatewayToolHandler SPI. Each handler now exposes extractDocuments and walks its own typed content (sealed-type switch); the generic content tree walker stays as the default fallback for plain BPMN tools and handlers that return raw maps. Removes the constraint that gateway handlers must keep raw Map content solely so the instanceof-based walker can find Documents inside them.

* New ContentTreeDocumentWalker: public utility extracted from the old ToolCallResultDocumentExtractor walker. Public so third-party handlers whose typed content embeds raw subtrees can reuse it.
* GatewayToolHandler.extractDocuments(ToolCallResult): default delegates to ContentTreeDocumentWalker, override to walk a typed structure.
* GatewayToolHandlerRegistry.extractDocuments routes per-result to the responsible handler, falling back to the walker.
* ToolCallResultDocumentExtractor becomes a thin coordinator that iterates ToolCallResults and calls the registry; constructor now takes the registry.
* MCP handler restores typed McpClientCallToolResult content and walks McpDocumentContent and McpEmbeddedResourceContent.BlobDocumentResource. Drops the getRawMcpContent workaround.
* A2A handler restores typed A2aSendMessageResult content and walks A2aMessage.contents, A2aTask.artifacts, and A2aTask.history (recursive). Drops the raw-content preservation comment.
* ADR-004 updated: replaces the "must preserve raw content" subsection with a "Per-handler document extraction" subsection.
* Tests split into ContentTreeDocumentWalkerTest (walker behaviour) and ToolCallResultDocumentExtractorTest (registry routing). Per-variant unit coverage added on McpClientGatewayToolHandlerTest and A2aGatewayToolHandlerTest.

https://claude.ai/code/session_01SM8HzedSAVWqnDaEKrmCpR
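The recursive walk can be pictured with the following sketch, which assumes Java 21 pattern matching for `switch`. `Document` is a placeholder record and `TreeWalkerSketch` is a hypothetical name; only the node kinds listed for the walker (`Map`, `Collection`, `Object[]`, `Document`) are handled, and maps are walked by value only. It illustrates the technique and is not the connector's implementation.

```java
import java.util.ArrayList;
import java.util.Collection;
import java.util.List;
import java.util.Map;

// placeholder for the Camunda Document type (assumption)
record Document(String id) {}

// hypothetical static utility in the spirit of ContentTreeDocumentWalker
final class TreeWalkerSketch {
  private TreeWalkerSketch() {} // stateless: no instances

  static List<Document> extractDocumentsFromContent(Object content) {
    List<Document> found = new ArrayList<>();
    walk(content, found);
    return found;
  }

  private static void walk(Object node, List<Document> found) {
    switch (node) {
      case null -> {} // absent content: nothing to extract
      case Document doc -> found.add(doc);
      case Map<?, ?> map -> map.values().forEach(value -> walk(value, found));
      case Collection<?> items -> items.forEach(item -> walk(item, found));
      case Object[] array -> {
        for (Object item : array) {
          walk(item, found);
        }
      }
      default -> {} // strings, numbers, booleans and other leaves
    }
  }
}
```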
…utility
Two follow-up adjustments to the per-handler document extraction design:
1. ToolCallResultDocumentExtractor is the routing entrypoint. It asks GatewayToolHandlerRegistry.handlerForToolDefinition(name) to find a managing handler and delegates if one exists; otherwise it walks the raw content tree itself. The fallback no longer lives inside the registry: gateway handlers contribute extraction for the results they manage, and the generic walker handles everything else. Drops GatewayToolHandlerRegistry.extractDocuments.
2. ContentTreeDocumentWalker is a fully static utility (private ctor, static methods). The previous singleton INSTANCE field was the worst of both worlds: the walker is stateless, has no dependencies, and the SPI default in GatewayToolHandler needs static-style access since interface defaults can't be DI'd. Tests use the real walker directly.

https://claude.ai/code/session_01SM8HzedSAVWqnDaEKrmCpR
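The routing rule from point 1 can be sketched as below. All types are simplified, hypothetical stand-ins: `GatewayToolHandler` here has only `isGatewayManaged` and a default `extractDocuments`, `handlerFor` plays the role of `handlerForToolDefinition`, and the fallback walker is reduced to `Map` values and collections for brevity.

```java
import java.util.ArrayList;
import java.util.Collection;
import java.util.List;
import java.util.Map;
import java.util.Optional;

// simplified stand-ins for the connector types (assumptions, not the real API)
record Document(String id) {}

record ToolCallResult(String id, String name, Object content) {}

interface GatewayToolHandler {
  boolean isGatewayManaged(String toolName);

  // default mirrors the SPI described above: fall back to the generic walker
  default List<Document> extractDocuments(ToolCallResult result) {
    return ContentWalker.walk(result.content());
  }
}

// minimal stand-in for ContentTreeDocumentWalker (maps and collections only)
final class ContentWalker {
  private ContentWalker() {}

  static List<Document> walk(Object node) {
    List<Document> out = new ArrayList<>();
    collect(node, out);
    return out;
  }

  private static void collect(Object node, List<Document> out) {
    if (node instanceof Document d) {
      out.add(d);
    } else if (node instanceof Map<?, ?> map) {
      map.values().forEach(v -> collect(v, out));
    } else if (node instanceof Collection<?> c) {
      c.forEach(v -> collect(v, out));
    }
  }
}

final class DocumentExtractorSketch {
  private final List<GatewayToolHandler> handlers;

  DocumentExtractorSketch(List<GatewayToolHandler> handlers) {
    this.handlers = handlers;
  }

  // hypothetical equivalent of handlerForToolDefinition(name)
  private Optional<GatewayToolHandler> handlerFor(String toolName) {
    return handlers.stream().filter(h -> h.isGatewayManaged(toolName)).findFirst();
  }

  List<Document> extract(ToolCallResult result) {
    return handlerFor(result.name())
        .map(h -> h.extractDocuments(result)) // managing handler walks its own typed content
        .orElseGet(() -> ContentWalker.walk(result.content())); // plain tool: walk raw tree
  }
}
```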
…ndlers
The previous refactor commit changed the shape of the transformed
ToolCallResult.content() in both gateway handlers more than necessary:
* MCP set content to the full McpClientCallToolResult wrapper
({name, content[], isError}) instead of just the content list as it
was pre-workaround (List<McpContent>). Reverted to passing
callToolResult.content() — same shape the LLM saw before commit
85c6617 introduced the raw-Map workaround. McpClientGatewayToolHandler
.extractDocuments now walks List<McpContent> from content().
* A2A had no behavioural change pre-vs-post-workaround beyond the
variable naming and builder usage style — the previous commit
introduced cosmetic churn. Restored the original method shape with
the sendMessageResult variable and explicit toolCallResultBuilder.
Tests updated to match the restored List<McpContent> content shape.
https://claude.ai/code/session_01SM8HzedSAVWqnDaEKrmCpR
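A sketch of walking typed `List<McpContent>`-style content with an exhaustive sealed-type `switch`, assuming Java 21 record patterns. The hierarchy below (`McpContentSketch`, `McpDocument`, `McpEmbedded`, and so on) is a hypothetical, simplified mirror of `McpContent`, `McpDocumentContent`, and `McpEmbeddedResourceContent.BlobDocumentResource`; the real types' fields may differ. Because the interface is sealed, the compiler rejects the `switch` if a new variant is added without a matching case.

```java
import java.util.ArrayList;
import java.util.List;

// placeholder for the Camunda Document type (assumption)
record Document(String id) {}

// hypothetical, simplified mirror of the sealed McpContent hierarchy
sealed interface McpContentSketch permits McpText, McpDocument, McpEmbedded {}

record McpText(String text) implements McpContentSketch {}

record McpDocument(Document document) implements McpContentSketch {}

record McpEmbedded(EmbeddedResource resource) implements McpContentSketch {}

sealed interface EmbeddedResource permits TextResource, BlobDocumentResource {}

record TextResource(String text) implements EmbeddedResource {}

record BlobDocumentResource(Document document) implements EmbeddedResource {}

final class McpExtractionSketch {
  private McpExtractionSketch() {}

  // walks the typed content list directly; no instanceof walk over raw maps is needed
  static List<Document> extractDocuments(List<? extends McpContentSketch> contents) {
    List<Document> documents = new ArrayList<>();
    for (McpContentSketch content : contents) {
      switch (content) {
        case McpDocument doc -> documents.add(doc.document());
        case McpEmbedded(BlobDocumentResource blob) -> documents.add(blob.document());
        case McpEmbedded embedded -> {} // text resources carry no document
        case McpText text -> {}
      }
    }
    return documents;
  }
}
```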
…ss review nits
Review feedback follow-up:
* GatewayToolHandler javadoc: drop stale `ContentTreeDocumentWalker#INSTANCE` link, point at the static `extractDocumentsFromContent` method.
* ContentTreeDocumentWalkerTest: static-import the walker method (less noise) and convert the scalar-content test to a parameterised one that also covers null input.
* L4J E2E tests (4 sites): drop the regex-based document short-id extraction. `AiAgentTestFixtures.readDocumentReference(String)` parses the tool result JSON, asserts the camunda document discriminator, and returns a typed DocumentReferenceFields record with storeId / documentId / contentType / fileName + shortId(). Tests now assert the parsed contentType against the parameterised mimeType and read shortId from the parsed record.
* docs/reference/ai-agent.md §19: add `extractDocuments` to the GatewayToolHandler interface listing and a new "Document Extraction from Tool Call Results" subsection describing the extractor → registry → handler routing with ContentTreeDocumentWalker as the fallback.
* docs/reference/mcp.md §7 and a2a.md §7: tool call execution flows now document the typed transformed content shape and the per-handler extractDocuments step.

Regression test:
* New GatewayToolResultDocumentSerializationTest pins the JSON wire format produced by the connector ObjectMapper for Documents nested inside McpClientCallToolResult and A2aSendMessageResult: they must serialize as `camunda.document.type` references via DocumentSerializer, never as base64 payloads or raw DocumentReference POJOs. Covers root McpDocumentContent, embedded BlobDocumentResource, A2aMessage contents, A2aTask artifacts and recursive history, plus an explicit base64-must-not-appear assertion.

https://claude.ai/code/session_01SM8HzedSAVWqnDaEKrmCpR
…onTest
The SDK already covers Camunda document serialization fidelity in DocumentSerializerTest (connector-runtime/jackson-datatype-document). Once a Document field is reachable by Jackson and the document module is registered, DocumentSerializer is invoked regardless of the parent type; neither McpDocumentContent, BlobDocumentResource, nor A2A DocumentContent overrides this. The test added in 843e2e2 only re-ran the SDK serializer through different wrapper objects and would have failed in lockstep with the SDK tests, providing no additional signal. It also mixed MCP and A2A concerns in one test class, which should have been split per handler if anything.

https://claude.ai/code/session_01SM8HzedSAVWqnDaEKrmCpR
…n, stale ADR reference
* AiAgentTestFixtures.findFirstCamundaDocumentNode: replace deprecated JsonNode.fields() with JsonNode.properties() (CodeQL: deprecated method invocation).
* ADR-004: the "Per-handler document extraction" subsection still pointed at ContentTreeDocumentWalker.INSTANCE. The walker is now a static utility, so point readers at extractDocumentsFromContent(...) instead. The Java javadoc on GatewayToolHandler was already fixed; this brings the ADR in line.

https://claude.ai/code/session_01SM8HzedSAVWqnDaEKrmCpR
…r + L4J adapter combo
Moves the synthetic UserMessage assertion logic, document reference parsing, and the content-block helper out of AiAgentTestFixtures and the per-test duplicates into a dedicated ToolCallResultDocumentAssertions class. assertExtractedDocumentsUserMessage takes ExtractedDocument varargs so it can also assert UserMessages carrying multiple tool call results, and builds expected XML tags via the production DocumentXmlTag. Parsed references use the production DocumentReferenceModel.CamundaDocumentReferenceModel, keeping the test in lockstep with the on-the-wire format.
…xtractor
Tool call result name is set on every production path (MCP/A2A handlers, forCancelledToolCall, BPMN _meta.name) and events are filtered out before the extractor runs. Propagating null produces a tag without the tool-name attribute via DocumentXmlTag.appendAttribute, which is the right shape for malformed inputs anyway.
…lId, toolName)
Aligns DocumentXmlTag's record fields and factory with the rest of the new code (ToolCallDocuments, ExtractedDocument), which all put toolCallId before toolName/toolCallName. The XML attribute order in toXml() is unchanged.
Adds a single summary log per extractor invocation (results processed, results with documents, total document count), plus targeted DEBUG logs in the MCP and A2A gateway handlers when their extractDocuments path silently returns an empty list because the content has an unexpected shape.
97d21e3 to 5707929
Refresh ADR-005 §"Tool Call Result Routing" and the Phase E3 section of the implementation plan with the agreed design:
- single decision point at the ChatClient SPI boundary; the strategy is a pure function `apply(ChatRequest, ModelCapabilities) → (ChatRequest, List<UserMessage>)` that walks the request once and routes each document it finds (no extract-then-restore double pass)
- tool-result-message documents are routed against `capabilities.toolResultModalities()`: inline-supported docs stay on `ToolCallResult.contentBlocks`; the rest fall back to a synthetic `UserMessage` (existing PR #6999 shape, `METADATA_TOOL_CALL_DOCUMENTS`)
- user-message and event-message documents are validated against `capabilities.userMessageModalities()`: supported docs stay inline, unsupported docs fail loud (`ConnectorException`)
- synthetic UMs land in `RuntimeMemory` inside `ChatClient.chat(...)` so the persisted `agentContext.conversation` matches the wire exactly, which keeps replay across iterations deterministic
- `AgentMessagesHandlerImpl` drops the `documentExtractor` field, the `createDocumentMessageForToolResults` private method, and the line-134 call site; the strategy owns extraction
- TODO captured: revisit `ChatClient` ↔ `BaseAgentRequestHandler` boundary post-Phase E; ChatClient now owns three responsibilities

Also bumps the bundled `anthropic-messages` `tool-result` modalities from `[text, image]` to `[text, image, pdf]`: Anthropic's `ToolResultBlockParam.Content.Block.ofDocument(...)` SDK factory confirms PDF support in tool results. `BundledCapabilityMatrixTest` adjusted to match.
…on (Phase E3+E4)
Single-pass routing of every document found in a ChatRequest at the
ChatClient boundary, plus the per-impl native multimodal emission paths
that consume the routed tool-result `contentBlocks`. Combined into one
phase because the bundled capability matrix declares modalities that the
impls have to actually emit — shipping E3 alone would silently drop
inline-routed documents on the floor between phases.
ToolCallResultStrategy (`framework/strategy/`):
- pure function `apply(ChatRequest, ModelCapabilities) -> (ChatRequest, List<UserMessage>)`
- single walk over the request:
- tool-result documents -> `toolResultModalities`: inline-supported docs go onto
`ToolCallResult.contentBlocks`; the rest fall back to a synthetic UserMessage
(PR #6999 shape, `METADATA_TOOL_CALL_DOCUMENTS=true`)
- user-message and event-message documents -> `userMessageModalities`: supported
docs stay inline; unsupported docs throw `ConnectorException` (no synthesis
fallback for user messages, mirroring L4J `DocumentConversionException`)
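The single-pass routing rule can be illustrated with plain collections. This is a hypothetical sketch: `Doc`, `Routed`, and `RoutingSketch` are invented names, modalities are plain strings, and `IllegalArgumentException` stands in for the `ConnectorException` the strategy throws. The real strategy operates on `ChatRequest` and `ModelCapabilities`.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;

// hypothetical stand-in: a document with its modality ("image", "pdf", "text", ...)
record Doc(String name, String modality) {}

// inline: stays on the tool result's content blocks; fallback: goes into the synthetic UserMessage
record Routed(List<Doc> inline, List<Doc> fallback) {}

final class RoutingSketch {
  private RoutingSketch() {}

  // tool-result documents: inline when supported, otherwise fall back to the synthetic message
  static Routed routeToolResultDocs(List<Doc> docs, Set<String> toolResultModalities) {
    List<Doc> inline = new ArrayList<>();
    List<Doc> fallback = new ArrayList<>();
    for (Doc d : docs) {
      (toolResultModalities.contains(d.modality()) ? inline : fallback).add(d);
    }
    return new Routed(List.copyOf(inline), List.copyOf(fallback));
  }

  // user/event-message documents: no fallback, unsupported modalities fail loudly
  static void validateUserMessageDocs(List<Doc> docs, Set<String> userMessageModalities) {
    for (Doc d : docs) {
      if (!userMessageModalities.contains(d.modality())) {
        throw new IllegalArgumentException(
            "unsupported user message modality '%s' for %s".formatted(d.modality(), d.name()));
      }
    }
  }
}
```

With the bumped `anthropic-messages` tool-result modalities `[text, image, pdf]`, a PDF returned from a tool would stay inline; with `[text, image]` it would fall back to the synthetic `UserMessage`.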
ChatClientImpl runs the strategy after capability resolution and persists
the synthetic context messages into `RuntimeMemory` immediately after the
anchor `ToolCallResultMessage` so the persisted `agentContext.conversation`
matches what the model saw on the wire (deterministic replay). The
clear()+addMessages() insertion dance is flagged with a TODO to revisit
once the ChatClient<->BARQ boundary settles after Phase E.
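The ordering guarantee (synthetic messages directly after their anchor `ToolCallResultMessage`) can be sketched as a list operation. `AnchorInsertSketch.insertAfterAnchor` is hypothetical; it returns a rebuilt copy instead of mutating `RuntimeMemory`, and appending when no anchor is found is an assumption of the sketch, not documented behaviour.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Predicate;

final class AnchorInsertSketch {
  private AnchorInsertSketch() {}

  // returns a copy of the conversation with `synthetic` placed directly after the last anchor;
  // if no anchor matches, the synthetic messages are appended (sketch assumption)
  static <T> List<T> insertAfterAnchor(
      List<T> conversation, Predicate<? super T> isAnchor, List<? extends T> synthetic) {
    int position = conversation.size();
    for (int i = conversation.size() - 1; i >= 0; i--) {
      if (isAnchor.test(conversation.get(i))) {
        position = i + 1;
        break;
      }
    }
    List<T> result = new ArrayList<>(conversation);
    result.addAll(position, synthetic);
    return result;
  }
}
```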
AgentMessagesHandlerImpl drops `ToolCallResultDocumentExtractor` from its
constructor and no longer creates `documentMessage` itself — extraction
is now exclusively the strategy's responsibility. The PR #6999 walker,
per-handler `extractDocuments` hook, XML correlation tag, and window-count
exclusion are reused unchanged.
Native multimodal emission (image + PDF only):
- AnthropicMessagesChatModelApi: `ContentBlockParam.ofImage(...)` + `ofDocument(...)`
on user messages; `ToolResultBlockParam.Content.Block.ofImage(...)` + `ofDocument(...)`
with `contentOfBlocks(...)` on tool results when contentBlocks is populated.
Adds `ObjectMapper` to the impl + factory + Spring config for JSON-serialised
inline tool-result text bodies.
- OpenAiResponsesChatModelApi: `ResponseInputContent.ofInputImage(...)` /
`ofInputFile(...)` on user messages via
`EasyInputMessage.contentOfResponseInputMessageContentList(...)`;
`ResponseFunctionCallOutputItem.ofInputImage(...)` / `ofInputFile(...)` on
tool results via `outputOfResponseFunctionCallOutputItemList(...)`.
- OpenAiChatCompletionsChatModelApi: `ChatCompletionContentPart.ofImageUrl(...)`
+ `ofFile(...)` on user messages via `addUserMessageOfArrayOfContentParts(...)`.
Tool messages stay text-only (SDK enforces `ChatCompletionContentPartText`-only).
Bundled capability matrix: anthropic-messages tool-result modality bumped
to `[text, image, pdf]` (verified via `ToolResultBlockParam.Content.Block.ofDocument`);
openai-completions user-message bumped to `[text, image, pdf]`;
openai-responses user-message bumped to `[text, image, pdf]`.
Tests: 1380 unit tests + 3 wire-format e2e tests pass. New
`ToolCallResultStrategyImplTest` (8 cases) covers inline routing, fallback
synthesis, single-pass split, ordering, user-message validation, and the
no-document no-op path. `AgentMessagesHandlerTest` synthesis assertions
migrated to the strategy test; new test pins that `addUserMessages` no
longer emits a synthetic UserMessage.
nikonovd
left a comment
Please check my last comments 🍊
    <artifactId>camunda-process-test-spring</artifactId>
    <version>${version.camunda}</version>
  </dependency>
  <dependency>
🔧 this is already a transitive dependency of camunda-process-test-spring
@Override
public List<Document> extractDocuments(ToolCallResult toolCallResult) {
  if (!(toolCallResult.content() instanceof A2aSendMessageResult result)) {
⛏️ Similar to what we discussed in a previous PR: carrying a generic type parameter (diamond type) on the base interface would avoid the need for a type check. Feel free to ignore 😄
var documents = handler.extractDocuments(toolCallResult);

assertThat(documents).containsExactly(document);
❓ does this test work reliably given the fact that a mock is supplied here?
var documents = handler.extractDocuments(toolCallResult);

// artifacts before history
assertThat(documents).containsExactly(artifactDoc, historyDoc);
}

@Test
void usesContentTreeWalkerWhenNoHandlerManagesTheToolCall() {
⛏️ the test name suggests we use the default content walker fallback, but we actually are not verifying it. Maybe we could use a spy here?
assertThat(e.toolCallId()).isNull();
assertThat(e.toolCallName()).isNull();
❓ I guess that would contain event subprocess tool calls, right?
Could it potentially contain other documents as well, and would this design have any undesired side effects?
void integrationWithRealRegistry_fallsBackToWalkerWhenNoHandlerMatches() {
  final var realExtractor =
      new ToolCallResultDocumentExtractor(new GatewayToolHandlerRegistryImpl(List.of()));

  final var doc = createDocument("hello", "text/plain", "test.txt");
  final var result =
      ToolCallResult.builder()
          .id("call_1")
          .name("plain_bpmn_tool")
          .content(Map.of("attachment", doc))
          .build();

  final var extracted = realExtractor.extractDocuments(List.of(result));

  assertThat(extracted).hasSize(1);
  assertThat(extracted.getFirst().documents()).containsExactly(doc);
}

@Test
void integrationWithRealRegistry_routesToManagingHandler(@Mock GatewayToolHandler handler) {
  final var doc = createDocument("typed", "text/plain", "typed.txt");
  final var typedContent = new TypedHandlerContent(doc);

  when(handler.type()).thenReturn("typed");
  when(handler.isGatewayManaged("typed_tool")).thenReturn(true);
  when(handler.extractDocuments(any(ToolCallResult.class))).thenReturn(List.of(doc));

  final var realExtractor =
      new ToolCallResultDocumentExtractor(new GatewayToolHandlerRegistryImpl(List.of(handler)));

  final var result =
      ToolCallResult.builder().id("call_1").name("typed_tool").content(typedContent).build();

  final var extracted = realExtractor.extractDocuments(List.of(result));

  assertThat(extracted).hasSize(1);
  assertThat(extracted.getFirst().documents()).containsExactly(doc);
  verify(handler).extractDocuments(result);
}

@Test
void integrationWithRealRegistry_doesNotConsultHandlerForUnmanagedTool(
    @Mock GatewayToolHandler handler) {
  when(handler.type()).thenReturn("typed");
  when(handler.isGatewayManaged("plain_tool")).thenReturn(false);

  final var realExtractor =
      new ToolCallResultDocumentExtractor(new GatewayToolHandlerRegistryImpl(List.of(handler)));

  final var doc = createDocument("hello", "text/plain", "test.txt");
  final var result =
      ToolCallResult.builder()
          .id("call_1")
          .name("plain_tool")
          .content(Map.of("attachment", doc))
          .build();

  final var extracted = realExtractor.extractDocuments(List.of(result));

  assertThat(extracted).hasSize(1);
  assertThat(extracted.getFirst().documents()).containsExactly(doc);
  verify(handler, never()).extractDocuments(any());
}
🔧 we should extract those into a nested test class to avoid prefixing and underscoring, WDYT?
}

@Test
void generatesTagWithoutToolAndCallId() {
❓ why could that potentially happen?
class DocumentXmlTagTest {

  @Nested
  class ToXml {
❓ why do we need a nested class here if it's still a flat test structure?
  <groupId>com.google.guava</groupId>
  <artifactId>guava</artifactId>
❓ why is this library introduced?
Description
Tool call results containing Camunda Documents were serialized as base64 strings embedded in JSON via a custom `DocumentToContentSerializer`. Most LLMs cannot properly interpret this format.

This PR extracts documents from tool call results into a synthetic `UserMessage` with `DocumentContent` blocks, appended after the `ToolCallResultMessage`. The tool result text retains document references (serialized by the standard `DocumentSerializer`) so the model can correlate references with actual content.

Document message format
A single `UserMessage` (metadata: `toolCallDocuments=true`) is appended with:
- the text `"Documents extracted from tool call results:"`
- for each document, a `<document tool-name="…" tool-call-id="…" document-short-id="…" filename="…" />` tag followed by its `DocumentContent` block
- `document-short-id` is the first UUID segment of the `documentId`, sufficient for in-conversation correlation

Event messages containing documents receive the same `<document>` XML labels for consistency (without `tool-name`/`tool-call-id` attributes).

Document extraction architecture
Document extraction is driven by `ToolCallResultDocumentExtractor`, called from `AgentMessagesHandlerImpl` after the `ToolCallResultMessage` is built. For each result it asks `GatewayToolHandlerRegistry.handlerForToolDefinition(toolName)` for the responsible handler:
- If a gateway handler manages the tool, extraction is delegated to `GatewayToolHandler.extractDocuments(ToolCallResult)`: handlers walk their own typed content (sealed-type `switch` over `McpContent` / `A2aSendMessageResult`).
- Otherwise, the extractor falls back to `ContentTreeDocumentWalker.extractDocumentsFromContent(...)`, a stateless static utility that recursively walks `Map`, `Collection`, `Object[]` and `Document` nodes (used for plain BPMN tools whose content is a raw FEEL tree).

The default `GatewayToolHandler.extractDocuments` implementation also delegates to the walker, so third-party handlers that return raw maps work without overriding. `ContentTreeDocumentWalker` is public so handlers whose typed content embeds raw user-generated subtrees can call it directly on those subtrees.

Gateway tool handlers
Gateway handlers (MCP, A2A) restore typed content as `ToolCallResult.content()` (the pre-PR shape: `List<McpContent>` for MCP, `A2aSendMessageResult` for A2A) and override `extractDocuments` to walk that typed structure:
- MCP: collects `McpDocumentContent` and `McpEmbeddedResourceContent.BlobDocumentResource` via a sealed-type switch over `McpContent`.
- A2A: walks `A2aMessage.contents` at the root, plus `A2aTask.artifacts` and (recursively) `A2aTask.history`, to collect `DocumentContent` entries.

This removes the earlier requirement that handlers preserve raw `Map`/`List`/`Document` content trees just so the document extractor's `instanceof` walk could find nested `Document` instances.

Message window memory
The synthetic document `UserMessage` does not count toward the `maxMessages` context window limit and is evicted together with its associated `ToolCallResultMessage`.

Cross-provider viability test
A manual cross-provider integration test (`DocumentToolCallResultsIT`, `@Disabled` by default) validates that real LLM providers can receive and reason about PDF documents extracted from tool call results. It covers single documents, multiple documents, and nested structures. Provider configs: OpenAI, Anthropic, AWS Bedrock, and OpenAI-compatible (Docker Model Runner, Ollama). CPT judge assertions verify the model correctly extracted facts from the PDFs.

Deleted
`DocumentToContentModule`, `DocumentToContentResponseModel`, `DocumentToContentSerializer`, and related tests.

ADR
See ADR-004: Document Handling in Tool Call Results.
Related issues
closes #7005
Checklist