diff --git a/1682-java-tool-ergonomics-prompts-remove-before-merge/1682-low-level-tool-definition.md b/1682-java-tool-ergonomics-prompts-remove-before-merge/1682-low-level-tool-definition.md new file mode 100644 index 000000000..34fb2ee91 --- /dev/null +++ b/1682-java-tool-ergonomics-prompts-remove-before-merge/1682-low-level-tool-definition.md @@ -0,0 +1,94 @@ +# Plan: Add E2E test for non-ergonomic (low-level) tool definition + +## Goal + +Add a failsafe IT test that exercises the **current explicit** `ToolDefinition.create()` / `ToolDefinition.createOverride()` API — the "non-ergonomic" approach — with multiple tools, `ToolSet` with `addCustom`/`addBuiltIn`, `getArgumentsAs()` deserialization into a record, and a tool handler that mutates application state. This establishes baseline test coverage before issue #1682 adds the annotation-driven ergonomic API. + +## Instructions + +Read `java.instructions.md` in my User level Copilot instructions. This session is about Java. + +Use the `new-java-e2e-test-yaml-and-test` skill to create a new failsafe IT test that exercises the non-ergonomic-tool-definition approach to tool definition. + +### What the test must exercise + +The test class should be `LowLevelToolDefinitionIT.java` in `java/src/test/java/com/github/copilot/`. It must demonstrate **all** of the following in a single session: + +1. **`ToolDefinition.create(name, description, schema, handler)`** — define at least two custom tools explicitly with `Map` schemas. +2. **`ToolDefinition.createOverride(name, description, schema, handler)`** — define one tool that overrides a built-in tool. +3. **`invocation.getArgumentsAs(SomeRecord.class)`** — at least one handler must deserialize arguments into a Java record (not `getArguments()` returning raw Map). +4. **`invocation.getArguments()`** — at least one handler must use the raw `Map` accessor. +5. **`ToolSet` with `addCustom("*").addBuiltIn("web_fetch")`** — pass `setAvailableTools(...)` on the `SessionConfig`. +6. **Handler mutates state** — one tool handler should mutate a field on the test class and the test should assert that the field was updated after the response. +7. **Handler returns `CompletableFuture.completedFuture(...)`** — all handlers return completed futures (as is the current pattern). + +### Concrete test design + +#### Snapshot category + +`tools` (reuse the existing category under `test/snapshots/tools/`). + +#### Snapshot file + +`test/snapshots/tools/low_level_tool_definition.yaml` + +#### Java test method name + +`lowLevelToolDefinition` (converts to `low_level_tool_definition` for snapshot lookup). + +#### Tool definitions for the test + +| Tool | Factory | Name | Description | Schema | Handler behavior | +|------|---------|------|-------------|--------|-----------------| +| Set Phase | `create` | `set_current_phase` | "Sets the current phase of the agent" | `{ type: object, properties: { phase: { type: string, enum: [searching, analyzing, done] } }, required: [phase] }` | Deserializes via `getArgumentsAs(PhaseArgs.class)` where `record PhaseArgs(String phase) {}`. Mutates a `currentPhase` field on the test. Returns `"Phase set to " + phase`. | +| Search | `create` | `search_items` | "Search for items by keyword" | `{ type: object, properties: { keyword: { type: string } }, required: [keyword] }` | Uses `getArguments()` raw Map. Returns a fixed string like `"Found: item_alpha, item_beta"`. | +| Override grep | `createOverride` | `grep` | "Custom grep override" | `{ type: object, properties: { query: { type: string } }, required: [query] }` | Uses `getArguments()`. Returns `"CUSTOM_GREP: " + query`. | + +#### Prompt + +``` +First, set the current phase to 'analyzing'. Then search for items with keyword 'copilot'. Report the phase and search results. +``` + +#### YAML snapshot structure + +Two conversations (one for the tool-call turn, one for the final response turn after tool results are provided): + +- **Conversation 1** (tool call turn): system `${system}` + user prompt → assistant with `tool_calls` for `set_current_phase` and `search_items`. +- **Conversation 2** (final response turn): full history including tool results → assistant final content mentioning "analyzing", "item_alpha", "item_beta". + +Study the existing snapshot files in `test/snapshots/tools/` carefully. In particular, study the snapshot file for the `testInvokesCustomTool` test in `ToolsTest.java` (`test/snapshots/tools/invokes_custom_tool.yaml`). It shows how tool call and tool result conversations are structured. Additionally, study `test/snapshots/tools/should_execute_multiple_custom_tools_in_parallel_single_turn.yaml` which shows multiple parallel tool calls in a single turn. + +#### Assertions + +1. `response` is not null. +2. Response content contains `"analyzing"` (confirming the phase tool was called). +3. Response content contains `"item_alpha"` or `"item_beta"` (confirming search tool was called). +4. The `currentPhase` field on the test class equals `"analyzing"` (confirming handler mutated state). + +#### Session config + +```java +new SessionConfig() + .setOnPermissionRequest(PermissionHandler.APPROVE_ALL) + .setAvailableTools(new ToolSet().addCustom("*").addBuiltIn("web_fetch")) + .setTools(List.of(setPhaseTool, searchTool, grepOverrideTool)) +``` + +### Step-by-step execution + +1. Create the YAML snapshot file at `test/snapshots/tools/low_level_tool_definition.yaml`. +2. Create the Java IT file at `java/src/test/java/com/github/copilot/LowLevelToolDefinitionIT.java`. +3. Run `mvn spotless:apply` from the `java/` directory (using the background + log pattern from `java.instructions.md`). +4. Run the test in isolation: + ```sh + cd java + LOG="$(date +%Y%m%d-%H%M)-job-logs.txt" && mvn failsafe:integration-test -Dit.test="LowLevelToolDefinitionIT#lowLevelToolDefinition" -Denforcer.skip=true > "$LOG" 2>&1 & tail -f "$LOG" + ``` +5. Fix any failures. Iterate until the isolated test passes cleanly. +6. Run the full build: + ```sh + cd java + LOG="$(date +%Y%m%d-%H%M)-job-logs.txt" && mvn clean verify > "$LOG" 2>&1 & tail -f "$LOG" + ``` +7. Fix any failures from the full build. Iterate until `mvn clean verify` passes cleanly. diff --git a/1682-java-tool-ergonomics-prompts-remove-before-merge/20260615-prompts.md b/1682-java-tool-ergonomics-prompts-remove-before-merge/20260615-prompts.md new file mode 100644 index 000000000..30ad01d76 --- /dev/null +++ b/1682-java-tool-ergonomics-prompts-remove-before-merge/20260615-prompts.md @@ -0,0 +1,605 @@ +### 18:28 Prompt + +Consider this C# usage of our API: + +```C# + [DisplayName("set_current_phase")] + [Description("Sets the current phase of the agent. Use this to report progress.")] + private void SetCurrentPhase(Phase phase) + { + Phase = phase; + updateUi(); + } + +... + + Session = await client.CreateSessionAsync(new() + { + OnPermissionRequest = PermissionHandler.ApproveAll, + SystemMessage = systemMessageConfig, + AvailableTools = new ToolSet().AddCustom("*").AddBuiltIn("web_fetch"), + Tools = [ + CopilotTool.DefineTool(SetCurrentPhase), + CopilotTool.DefineTool(ReportIntent, new() { OverridesBuiltInTool = true }), + CopilotTool.DefineTool(database.SearchProperties), + ], + }); +``` + +How would one do this with the Java API? + +### 18:30 Response + +Searched for regex `public static ToolDefinition create` (`**/java/src/main/java/com/github/copilot/rpc/ToolDefinition.java`), 4 results + +Searched for regex `class ToolSet|addCustom|addBuiltIn|AddCustom|AddBuiltIn` (`**/java/src/main/java/com/github/copilot/rpc/ToolSet.java`), 8 results + +Searched for regex `setAvailableTools|setTools|List` (`**/java/src/main/java/com/github/copilot/rpc/SessionConfig.java`), 5 results + +Here's the Java equivalent: + +```java +var session = client.createSession(new SessionConfig() + .setOnPermissionRequest(PermissionHandler.APPROVE_ALL) + .setSystemMessage(systemMessageConfig) + .setAvailableTools(new ToolSet().addCustom("*").addBuiltIn("web_fetch")) + .setTools(List.of( + ToolDefinition.create("set_current_phase", "Sets the current phase of the agent. Use this to report progress.", + Map.of("type", "object", + "properties", Map.of("phase", Map.of("type", "string", "enum", List.of("searching", "analyzing", "done"))), + "required", List.of("phase")), + invocation -> { + Phase phase = invocation.getArgumentsAs(PhaseArgs.class).phase(); + this.phase = phase; + updateUi(); + return CompletableFuture.completedFuture("Phase set to " + phase); + }), + ToolDefinition.createOverride("report_intent", "Reports the agent's intent", + Map.of("type", "object", "properties", Map.of(/* ... */)), + invocation -> { /* ... */ }), + ToolDefinition.create("search_properties", "Search property listings", + Map.of("type", "object", "properties", Map.of(/* ... */)), + invocation -> database.searchProperties(invocation.getArguments())) + ))).get(30, TimeUnit.SECONDS); +``` + +The key differences from the C# version: + +| C# | Java | +|---|---| +| `CopilotTool.DefineTool(SetCurrentPhase)` — infers name, description, and schema from the method's attributes via reflection | `ToolDefinition.create(name, description, schema, handler)` — all explicit | +| `new() { OverridesBuiltInTool = true }` (options object) | `ToolDefinition.createOverride(...)` (dedicated factory) | +| `[DisplayName]` / `[Description]` attributes on the method | No annotation-based discovery; you provide the strings directly | +| Implicit JSON Schema generation from parameter types | You supply the `Map` schema manually | + +The Java SDK does not currently have a reflection/annotation-based `DefineTool` that can inspect a method and auto-generate the name, description, and parameter schema. Everything is explicit. + +### 18:31 Prompt + +Oof. That is very ugly. So you're saying there is no `DefineTool` analog in Java? + +### 18:34 Response + +Searched for regex `defineTool|DefineTool|@Tool|@CopilotTool|annotation.*tool|reflect.*tool|schema.*generat` (`**/java/src/main/java/**`), no results + +Correct — there is no `DefineTool` analog in the Java SDK today. No reflection-based discovery, no annotations, no automatic JSON Schema generation from parameter types. The only API is `ToolDefinition.create(name, description, schema, handler)` where you supply everything manually. + +The C# version gets its ergonomics from `CopilotTool.DefineTool(MethodGroup)` which uses .NET reflection to pull `[DisplayName]`, `[Description]`, and parameter types to auto-generate the schema. The Go SDK has `DefineTool[T, U](name, desc, handler)` which uses generics to derive the schema from the input type `T`. Python has `@define_tool` which introspects the decorated function's signature and docstring. + +Java *could* support something similar — e.g. a `DefineTool.from(instance, "methodName")` that reflects on the method's parameters and annotations, or a generic approach using a record type for args (similar to Go). But that doesn't exist yet. + +### 18:34 Prompt + +Given that `ToolDefinition` is already a record, run with that fact and sketch out what a `DefineTool` analog might look like. + +### 18:38 Response + +Here's a sketch leveraging records as the args type (similar to Go's `DefineTool[T, U]`): + +```java +// ─── User defines args as a record ─────────────────────────────────────────── + +record WeatherArgs( + @Param("City name") String location, + @Param(value = "Temperature unit", required = false) String unit +) {} + +// ─── Usage: one-liner tool definition ──────────────────────────────────────── + +var tool = ToolDefinition.define("get_weather", "Get current weather for a location", + WeatherArgs.class, + (args, invocation) -> { + String weather = fetchWeather(args.location(), args.unit()); + return CompletableFuture.completedFuture(weather); + }); +``` + +The handler receives the **deserialized record** directly — no more `invocation.getArgumentsAs(...)` boilerplate. + +--- + +### Implementation sketch + +```java +// ─── New annotation ────────────────────────────────────────────────────────── + +package com.github.copilot.rpc; + +import java.lang.annotation.*; + +@Target(ElementType.RECORD_COMPONENT) +@Retention(RetentionPolicy.RUNTIME) +public @interface Param { + /** Description of this parameter. */ + String value(); + /** Whether this parameter is required. Default true. */ + boolean required() default true; +} + +// ─── New typed handler functional interface ────────────────────────────────── + +@FunctionalInterface +public interface TypedToolHandler { + CompletableFuture handle(T args, ToolInvocation invocation); +} + +// ─── New factory method on ToolDefinition ──────────────────────────────────── + +public record ToolDefinition(...) { + + // Existing factories... + + /** + * Defines a tool with automatic JSON Schema generation from a record type. + * + *

Record components become the schema's {@code properties}. Use + * {@link Param} to annotate descriptions and optional flags. + * + * @param the args record type + * @param name tool name + * @param description tool description + * @param argsType the record class (e.g. {@code WeatherArgs.class}) + * @param handler receives the deserialized record + invocation context + * @return a fully constructed ToolDefinition with auto-generated schema + */ + public static ToolDefinition define( + String name, + String description, + Class argsType, + TypedToolHandler handler) { + + Map schema = SchemaGenerator.fromRecord(argsType); + + ToolHandler rawHandler = invocation -> { + T args = invocation.getArgumentsAs(argsType); + return handler.handle(args, invocation); + }; + + return new ToolDefinition(name, description, schema, rawHandler, null, null, null); + } +} +``` + +--- + +### Schema generation from record components + +```java +package com.github.copilot.rpc; + +import java.lang.reflect.RecordComponent; +import java.util.*; + +final class SchemaGenerator { + + static Map fromRecord(Class type) { + RecordComponent[] components = type.getRecordComponents(); + Map properties = new LinkedHashMap<>(); + List required = new ArrayList<>(); + + for (RecordComponent comp : components) { + Map prop = new LinkedHashMap<>(); + prop.put("type", jsonType(comp.getType())); + + Param param = comp.getAnnotation(Param.class); + if (param != null) { + prop.put("description", param.value()); + if (param.required()) { + required.add(comp.getName()); + } + } else { + // No annotation → required by default + required.add(comp.getName()); + } + + properties.put(comp.getName(), prop); + } + + Map schema = new LinkedHashMap<>(); + schema.put("type", "object"); + schema.put("properties", properties); + if (!required.isEmpty()) { + schema.put("required", required); + } + return schema; + } + + private static String jsonType(Class type) { + if (type == String.class) return "string"; + if (type == int.class || type == Integer.class + || type == long.class || type == Long.class) return "integer"; + if (type == double.class || type == Double.class + || type == float.class || type == Float.class) return "number"; + if (type == boolean.class || type == Boolean.class) return "boolean"; + if (type.isArray() || List.class.isAssignableFrom(type)) return "array"; + return "object"; + } +} +``` + +--- + +### The BRK206 example rewritten + +```java +record PhaseArgs(@Param("The phase to transition to") Phase phase) {} + +record SearchArgs(@Param("Search query") String query, + @Param(value = "Max results", required = false) Integer limit) {} + +var session = client.createSession(new SessionConfig() + .setOnPermissionRequest(PermissionHandler.APPROVE_ALL) + .setSystemMessage(systemMessageConfig) + .setAvailableTools(new ToolSet().addCustom("*").addBuiltIn("web_fetch")) + .setTools(List.of( + ToolDefinition.define("set_current_phase", + "Sets the current phase of the agent. Use this to report progress.", + PhaseArgs.class, + (args, inv) -> { + this.phase = args.phase(); + updateUi(); + return CompletableFuture.completedFuture("Phase set to " + args.phase()); + }), + ToolDefinition.define("report_intent", + "Reports the agent's intent", + ReportIntentArgs.class, + (args, inv) -> { /* ... */ }) + .withOverridesBuiltInTool(true), // would need a wither + ToolDefinition.define("search_properties", + "Search property listings", + SearchArgs.class, + (args, inv) -> database.searchProperties(args.query(), args.limit())) + ))).get(30, TimeUnit.SECONDS); +``` + +--- + +### Trade-offs + +| Pro | Con | +|-----|-----| +| Schema auto-generated from record components | Requires runtime reflection (one-time, at definition) | +| Type-safe handler — no manual `getArgumentsAs()` | Nested/complex schemas (arrays of objects) need more work | +| `@Param` keeps description co-located with the field | New annotation + new interface = new public API surface | +| Records are already idiomatic for value objects in Java 17+ | Enums in records (like `Phase`) need custom mapping logic | +| Very close to Go's `DefineTool[T, U]` ergonomics | Can't infer name/description from the method itself (unlike C#) | + +The main limitation vs C# is that Java doesn't have method-group-to-delegate conversion, so you can't just pass `this::setCurrentPhase` and have it figure out the name and schema. You still provide the name and description strings — but the schema becomes free. + +### 18:41 Prompt + +I have added the folder `langchain4j-00-main` to this VS Code workspace. + +Look at the design of `./langchain4j-core/src/main/java/dev/langchain4j/agent/tool/Tool.java`. Explain how that design differs from the record based one you proposed. + +### 18:46 Response + +Searched for files matching `**/langchain4j-core/src/main/java/dev/langchain4j/agent/tool/Tool.java`, 1 match + +Read [](file:///home/edburns/workareas/langchain4j-00-main/langchain4j-core/src/main/java/dev/langchain4j/agent/tool/Tool.java#1-1), lines 1 to 50 + +Searched for text `@interface P` (`**/langchain4j-core/src/main/java/dev/langchain4j/agent/tool/P.java`), 1 result + +Searched for regex `class ToolSpecification|record ToolSpecification|String name|String description|JsonObjectSchema` (`**/langchain4j-core/src/main/java/dev/langchain4j/agent/tool/ToolSpecification.java`), 14 results + +Read [](file:///home/edburns/workareas/langchain4j-00-main/langchain4j-core/src/main/java/dev/langchain4j/agent/tool/Tool.java#50-50), lines 50 to 60 + +Searched for regex `toolSpecificationFrom|toolSpecificationsFrom|class ToolSpecifications` (`**/langchain4j-core/src/main/java/dev/langchain4j/agent/tool/ToolSpecifications.java`), 6 results + +Read [](file:///home/edburns/workareas/langchain4j-00-main/langchain4j-core/src/main/java/dev/langchain4j/agent/tool/P.java#1-1), lines 1 to 60 + +Read [](file:///home/edburns/workareas/langchain4j-00-main/langchain4j-core/src/main/java/dev/langchain4j/agent/tool/ToolSpecifications.java#100-100), lines 100 to 160 + +Here's how the two designs differ: + +--- + +### LangChain4j: **Annotation-on-method** approach + +```java +@Tool("Get current weather for a location") +String getWeather(@P("City name") String location, + @P(value = "Unit", required = false) String unit) { + return fetchWeather(location, unit); +} +``` + +The framework discovers tools by scanning an **object's methods** for `@Tool`. It then: +1. Reflects on the `Method` to get the name (from `@Tool(name=...)` or the method name itself) +2. Reflects on each `Parameter` to build the JSON Schema (using `@P` for description/required) +3. Generates a `ToolSpecification` (name + description + schema) automatically +4. At invocation time, dispatches to the annotated method with deserialized args passed as method parameters + +**The user never writes a schema or a handler lambda** — the method IS the handler. + +--- + +### My record-based proposal: **Record-as-schema + explicit lambda** approach + +```java +record WeatherArgs(@Param("City name") String location, + @Param(value = "Unit", required = false) String unit) {} + +var tool = ToolDefinition.define("get_weather", "Get current weather", + WeatherArgs.class, + (args, invocation) -> CompletableFuture.completedFuture(fetchWeather(args.location(), args.unit()))); +``` + +The record defines the schema shape; the lambda is the handler. + +--- + +### Key differences + +| Aspect | LangChain4j (`@Tool` on method) | Record-based proposal | +|--------|--------------------------------|----------------------| +| **Discovery** | Automatic — scan object for `@Tool` methods | Manual — you explicitly call `define(...)` | +| **Tool name** | Derived from method name or `@Tool(name=...)` | Provided as a string argument | +| **Description** | `@Tool("description")` on the method | Provided as a string argument | +| **Parameter schema** | Generated from method parameter types + `@P` annotations | Generated from record components + `@Param` annotations | +| **Handler** | The annotated method itself — no lambda | Explicit `TypedToolHandler` lambda | +| **Arg delivery** | Each parameter is a separate method argument | Single deserialized record object | +| **Return type** | Any (String sent as-is, others serialized to JSON) | Always `CompletableFuture` | +| **Registration** | Implicit (pass object to framework, it finds all `@Tool` methods) | Explicit (build a `List`) | +| **Async** | Synchronous method (framework handles threading) | Explicitly async via `CompletableFuture` | + +--- + +### The fundamental philosophical difference + +LangChain4j treats **the method as the tool**. You annotate existing methods and the framework does everything else. This is closest to C#'s `CopilotTool.DefineTool(SetCurrentPhase)`. + +My record-based proposal treats **the record as the schema** but still requires explicit wiring. It's closer to Go's `DefineTool[T, U]` — you get schema generation for free but still wire things manually. + +If you wanted the LangChain4j-style ergonomics in the Copilot Java SDK, it would look something like: + +```java +class MyTools { + @CopilotTool("Sets the current phase of the agent") + CompletableFuture setCurrentPhase(@Param("The phase") Phase phase) { + this.phase = phase; + updateUi(); + return CompletableFuture.completedFuture("Done"); + } +} + +// Registration: scan an object for @CopilotTool methods +var tools = ToolDefinition.fromObject(myToolsInstance); +``` + +That's the most ergonomic option but requires the most framework machinery (method scanning, reflective invocation, argument marshaling per-parameter rather than per-record). + +### 18:46 Prompt + +Yes, I like the langchain4j approach much better. Let's start by filing an issue in `copilot-sdk`. + +Title + +Java: Ergonomics: Defining tools + +Description + +The current way of defining tools in Java is quite awful. + + +```java +var session = client.createSession(new SessionConfig() + .setOnPermissionRequest(PermissionHandler.APPROVE_ALL) + .setSystemMessage(systemMessageConfig) + .setAvailableTools(new ToolSet().addCustom("*").addBuiltIn("web_fetch")) + .setTools(List.of( + ToolDefinition.create("set_current_phase", "Sets the current phase of the agent. Use this to report progress.", + Map.of("type", "object", + "properties", Map.of("phase", Map.of("type", "string", "enum", List.of("searching", "analyzing", "done"))), + "required", List.of("phase")), + invocation -> { + Phase phase = invocation.getArgumentsAs(PhaseArgs.class).phase(); + this.phase = phase; + updateUi(); + return CompletableFuture.completedFuture("Phase set to " + phase); + }), + ToolDefinition.createOverride("report_intent", "Reports the agent's intent", + Map.of("type", "object", "properties", Map.of(/* ... */)), + invocation -> { /* ... */ }), + ToolDefinition.create("search_properties", "Search property listings", + Map.of("type", "object", "properties", Map.of(/* ... */)), + invocation -> database.searchProperties(invocation.getArguments())) + ))).get(30, TimeUnit.SECONDS); +``` + +Compare that with what you get in langchain4j: + +```java +@Tool("Get current weather for a location") +String getWeather(@P("City name") String location, + @P(value = "Unit", required = false) String unit) { + return fetchWeather(location, unit); +} +``` + +We simply must do better. + +### 18:51 Response + +Ran terminal command: cd /home/edburns/workareas/copilot-sdk && gh issue create \ + --repo github/copilot-sdk \ + --title "Java: Ergonomics: Defining tools" \ + --body 'The current way of defining tools in Java is quite awful. + +```java +var session = client.createSession(new SessionConfig() + .setOnPermissionRequest(PermissionHandler.APPROVE_ALL) + .setSystemMessage(systemMessageConfig) + .setAvailableTools(new ToolSet().addCustom("*").addBuiltIn("web_fetch")) + .setTools(List.of( + ToolDefinition.create("set_current_phase", "Sets the current phase of the agent. Use this to report progress.", + Map.of("type", "object", + "properties", Map.of("phase", Map.of("type", "string", "enum", List.of("searching", "analyzing", "done"))), + "required", List.of("phase")), + invocation -> { + Phase phase = invocation.getArgumentsAs(PhaseArgs.class).phase(); + this.phase = phase; + updateUi(); + return CompletableFuture.completedFuture("Phase set to " + phase); + }), + ToolDefinition.createOverride("report_intent", "Reports the agent'''s intent", + Map.of("type", "object", "properties", Map.of(/* ... */)), + invocation -> { /* ... */ }), + ToolDefinition.create("search_properties", "Search property listings", + Map.of("type", "object", "properties", Map.of(/* ... */)), + invocation -> database.searchProperties(invocation.getArguments())) + ))).get(30, TimeUnit.SECONDS); +``` + +Compare that with what you get in [langchain4j](https://github.com/langchain4j/langchain4j): + +```java +@Tool("Get current weather for a location") +String getWeather(@P("City name") String location, + @P(value = "Unit", required = false) String unit) { + return fetchWeather(location, unit); +} +``` + +We simply must do better.' + +Filed: [#1682](https://github.com/github/copilot-sdk/issues/1682) — Java: Ergonomics: Defining tools. + +### 18:52 Prompt + +Define a new ADR in `java/docs/adr/adr-005-tool-definition.md`. Look at the existing ADRs in there, and follow ADR best practice. Create the new ADR describing the approaches we evaluated and how we ended up choosing the langchain4j approach as shown in https://github.com/langchain4j/langchain4j/blob/main/docs/docs/tutorials/tools.md#high-level-tool-api . + +Be explicit about how we looked at the record approach, the current approach, and chose the langchain4j approach for its Java idiomatic qualities. + +### 19:02 Response + +Searched for files matching `**/java/docs/adr/*.md`, 4 matches + +Read [](file:///home/edburns/workareas/copilot-sdk/java/docs/adr/adr-004-copilotexperimental.md#1-1), lines 1 to 80 + +Now let me fetch the langchain4j tools tutorial for reference: + + + +Fetched https://github.com/langchain4j/langchain4j/blob/main/docs/docs/tutorials/tools.md + +Now I have all the context. Let me create the ADR: + +Created [](file:///home/edburns/workareas/copilot-sdk/java/docs/adr/adr-005-tool-definition.md) + +Created adr-005-tool-definition.md. It documents all three options evaluated (status quo, record-as-schema, annotation-on-method), the rationale for choosing the langchain4j-style approach, and links to the issue and reference implementations. + +Made changes. + + +### 19:02 Prompt + +Is there a way around this? + +----- +- Build requirement: recommend `-parameters` javac flag in documentation. Without it, developers must use `@Param(name="...")` explicitly (same trade-off as langchain4j, Spring, etc.). +----- + +We did an automatic annotation processor for `@CopilotExperimental`. Can't we do something similar here? + +### 19:06 Response + +Yes — this is completely solvable with an annotation processor. + +The key insight: **`javax.lang.model.element.VariableElement.getSimpleName()` always returns the real parameter name at compile time**, regardless of whether `-parameters` is passed to `javac`. The `-parameters` flag only controls whether those names survive into the `.class` bytecode for runtime reflection. An annotation processor sees the source-level names unconditionally. + +### How it would work + +The processor runs at compile time, finds all `@CopilotTool`-annotated methods, and generates a metadata class (or resource) that bakes in the parameter names, types, and descriptions: + +```java +// GENERATED — do not edit +final class MyTools$$CopilotToolMeta { + static List definitions(MyTools instance) { + return List.of( + new ToolDefinition("set_current_phase", + "Sets the current phase of the agent.", + Map.of("type", "object", + "properties", Map.of("phase", Map.of("type", "string", "description", "The phase to transition to")), + "required", List.of("phase")), + invocation -> { + Phase phase = invocation.getArgumentsAs(Phase.class); + return CompletableFuture.completedFuture( + instance.setCurrentPhase(phase)); + }, null, null, null) + ); + } +} +``` + +At runtime, `ToolDefinition.fromObject(myTools)` just loads the generated `$$CopilotToolMeta` class — zero reflection, zero dependency on `-parameters`. + +### Precedent in the ecosystem + +| Framework | Approach | +|-----------|----------| +| **Micronaut** | Annotation processor generates all DI metadata at compile time — no runtime reflection, no `-parameters` needed | +| **Dagger 2** | Processor generates `_Factory` / `_MembersInjector` classes | +| **MapStruct** | Processor generates mapper implementations from interface method signatures | +| **Our own `@CopilotExperimental`** | Processor walks declared elements via JSR 269 | + +### Advantages over the `-parameters` fallback + +1. **No build configuration requirement** — works out of the box with any `javac` invocation. +2. **No runtime reflection** — the generated code calls the method directly. Faster, GraalVM-native-friendly, no `setAccessible` issues. +3. **Compile-time validation** — the processor can emit errors if `@Param` is missing on a parameter, if a type isn't serializable, if two tools have the same name, etc. (same as we did for `@CopilotExperimental`). +4. **Schema correctness guaranteed** — the schema is computed once from the source AST, not guessed at runtime from erased generics. + +### Trade-off vs. pure runtime reflection (langchain4j default) + +| | Annotation processor | Runtime reflection | +|---|---|---| +| Requires `-parameters`? | No | Yes (or `@P(name=...)`) | +| GraalVM native-image friendly? | Yes | Needs reflection config | +| Compile-time error checking? | Yes | Fails at runtime | +| Extra generated source files? | Yes | None | +| Works without running the processor? | No — but fails loudly at compile time | Yes (degraded) | + +So yes: we can follow the same JSR 269 pattern as `@CopilotExperimental` to eliminate the `-parameters` requirement entirely. The processor sees the real names from source, bakes them into generated code, and the runtime never needs to call `Parameter.getName()`. + +Want me to update the ADR to reflect this? + +### 19:07 Prompt + +Yes please. + +### 19:14 Response + +Read [](file:///home/edburns/workareas/copilot-sdk/java/docs/adr/adr-005-tool-definition.md#195-195), lines 195 to 220 + +Read [](file:///home/edburns/workareas/copilot-sdk/java/docs/adr/adr-005-tool-definition.md#218-218), lines 218 to 240 + +Replacing 15 lines with 10 lines in [](file:///home/edburns/workareas/copilot-sdk/java/docs/adr/adr-005-tool-definition.md) + + +``` + +``` +Done. Added the `@CopilotExperimental` decision as a consequence bullet — the new annotations and utility methods will be gated behind experimental opt-in until the API stabilizes. + +Made changes. diff --git a/1682-java-tool-ergonomics-prompts-remove-before-merge/20260616-prompts.md b/1682-java-tool-ergonomics-prompts-remove-before-merge/20260616-prompts.md new file mode 100644 index 000000000..931f02d08 --- /dev/null +++ b/1682-java-tool-ergonomics-prompts-remove-before-merge/20260616-prompts.md @@ -0,0 +1,218 @@ +# DAY: 2026-06-16 + +### 19:31 Prompt + +Consider this C# usage of our API: + +```C# + [DisplayName("set_current_phase")] + [Description("Sets the current phase of the agent. Use this to report progress.")] + private void SetCurrentPhase(Phase phase) + { + Phase = phase; + updateUi(); + } + +... + + Session = await client.CreateSessionAsync(new() + { + OnPermissionRequest = PermissionHandler.ApproveAll, + SystemMessage = systemMessageConfig, + AvailableTools = new ToolSet().AddCustom("*").AddBuiltIn("web_fetch"), + Tools = [ + CopilotTool.DefineTool(SetCurrentPhase), + CopilotTool.DefineTool(ReportIntent, new() { OverridesBuiltInTool = true }), + CopilotTool.DefineTool(database.SearchProperties), + ], + }); +``` + +According to our chat yesterday, here's how you'd do it with our Java API. For discussion call this the non-ergonomic-tool-definition. + +```java +var session = client.createSession(new SessionConfig() + .setOnPermissionRequest(PermissionHandler.APPROVE_ALL) + .setSystemMessage(systemMessageConfig) + .setAvailableTools(new ToolSet().addCustom("*").addBuiltIn("web_fetch")) + .setTools(List.of( + ToolDefinition.create("set_current_phase", "Sets the current phase of the agent. Use this to report progress.", + Map.of("type", "object", + "properties", Map.of("phase", Map.of("type", "string", "enum", List.of("searching", "analyzing", "done"))), + "required", List.of("phase")), + invocation -> { + Phase phase = invocation.getArgumentsAs(PhaseArgs.class).phase(); + this.phase = phase; + updateUi(); + return CompletableFuture.completedFuture("Phase set to " + phase); + }), + ToolDefinition.createOverride("report_intent", "Reports the agent's intent", + Map.of("type", "object", "properties", Map.of(/* ... */)), + invocation -> { /* ... */ }), + ToolDefinition.create("search_properties", "Search property listings", + Map.of("type", "object", "properties", Map.of(/* ... */)), + invocation -> database.searchProperties(invocation.getArguments())) + ))).get(30, TimeUnit.SECONDS); +``` + +Also according to our chat yesterday: + +----- + +The key differences from the C# version: + +| C# | Java | +|---|---| +| `CopilotTool.DefineTool(SetCurrentPhase)` — infers name, description, and schema from the method's attributes via reflection | `ToolDefinition.create(name, description, schema, handler)` — all explicit | +| `new() { OverridesBuiltInTool = true }` (options object) | `ToolDefinition.createOverride(...)` (dedicated factory) | +| `[DisplayName]` / `[Description]` attributes on the method | No annotation-based discovery; you provide the strings directly | +| Implicit JSON Schema generation from parameter types | You supply the `Map` schema manually | + +The Java SDK does not currently have a reflection/annotation-based `DefineTool` that can inspect a method and auto-generate the name, description, and parameter schema. Everything is explicit. + +------ + +We are working on issue https://github.com/github/copilot-sdk/issues/1682 that addresses this gap. + +But first, I want to add test coverage for the way to do it that apparently currently does work. + +Write a plan to `1682-java-tool-ergonomics-prompts-remove-before-merge/1682-low-level-tool-definition.md` to instruct a `copilot --yolo` session to: + +- Read `java.instructions.md` in my User level Copilot instructions. This session is about Java. +- Use the `new-java-e2e-test-yaml-and-test` skill to create a new failsafe IT test that exercises the non-ergonomic-tool-definition approach to tool definition. +- Run the test in isolation until you get a clean run of just that test. +- Run all the automated tests according to the `java.instructions.md` and `mvn clean verify`. + + + +### 19:40 Prompt + +Right now, the most important thing we did in this session was learn how to create a net new snapshot YAML, `test/snapshots/system_message_sections/should_use_replaced_identity_section_in_response.yaml` and use it in a net-new failsafe IT test. + +This is an important skill. + +I've never used skills before. + +Help me package up this knowledge as a reusable skill. + +I guess skills have names, right? Well, let's call this `new-java-e2e-test-yaml-and-test`, using whatever naming convention one uses for skills. + +Explain what you're donig step by step. I have a need for this skill right now. + +### 19:53 Prompt + +▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀ +◐ The user wants to create a reusable skill for creating new E2E test YAML snapshots and corresponding failsafe IT tests. Let me first understand how skills work in this repo - I can see from the environment that there's already a `java-coding-skill` defined. Let me look at how skills are structured. + +● Let me look at how skills are structured in this repo so I can explain and create one for you. + +● Find existing skills (shell) + │ find /home/edburns/workareas/copilot-sdk/.github/skills -type f | sort + └ 4 lines... + +● List directory .github/skills/java-coding-skill + └ 1 file found + +● List directory .github/skills/rust-coding-skill + └ 2 files found + +● Read SKILL.md + │ .github/skills/rust-coding-skill/SKILL.md + └ L1:30 (30 lines read) + +● Read SKILL.md + │ .github/skills/java-coding-skill/SKILL.md + └ L1:10 (10 lines read) + +● Good. Here's what I'm seeing and what I'll do: + + How Skills Work (Step by Step) + + 1. Skills live in .github/skills// — each skill is a directory containing at minimum a SKILL.md file + 2. SKILL.md has YAML frontmatter with name and description — the description tells Copilot when to invoke the skill + 3. The body of SKILL.md is the instruction content — the "knowledge" that gets loaded when the skill is invoked + 4. Optional extra files (like examples.md) can be included for reference material + + What I'll Create + + For your new-e2e-test-yaml-and-test skill, I'll create: + + - .github/skills/new-e2e-test-yaml-and-test/SKILL.md — the main skill with step-by-step instructions on how to create a new YAML snapshot and corresponding failsafe IT test + - .github/skills/new-e2e-test-yaml-and-test/examples.md — the actual example files we created this session as reference + + Let me create it now: + +✗ Create skill directory (shell) + │ mkdir -p /home/edburns/workareas/copilot-sdk/.github/skills/new-e2e-test-yaml-and-test + └ The user rejected this tool call. User feedback: I named it wrong. The skill should be called + `new-java-e2e-test-yaml-and-test` + +● Create skill directory (shell) + │ mkdir -p /home/edburns/workareas/copilot-sdk/.github/skills/new-java-e2e-test-yaml-and-test + └ 1 line... + +● IDE connection lost: Visual Studio Code - Insiders closed + +● Now let me create the main SKILL.md: + +● Create SKILL.md +222 + └ .github/skills/new-java-e2e-test-yaml-and-test/SKILL.md + +● Now let me add the examples file with the actual artifacts from this session: + +● Get example YAML (shell) + │ cat /home/edburns/workareas/copilot-sdk/test/snapshots/system_message_sections/should_use_replaced_identity_section_in + │ _response.yaml + └ 16 lines... + +● All permissions are now enabled. Tool, path, and URL requests will be automatically approved. + +● Get example test (shell) + │ cat /home/edburns/workareas/copilot-sdk/java/src/test/java/com/github/copilot/SystemMessageSectionsIT.java + └ 193 lines... + +● Create examples.md +178 + └ .github/skills/new-java-e2e-test-yaml-and-test/examples.md + +● Done. Here's what I created: + + .github/skills/new-java-e2e-test-yaml-and-test/ + ├── SKILL.md ← Main instructions (when/how to create new E2E tests) + └── examples.md ← Real examples from this session (simple + tool-call) + + How it works: + + - The description in the frontmatter tells Copilot to invoke this skill whenever you're creating a new Java E2E integration test with a YAML snapshot + - SKILL.md covers: YAML format, proxy matching logic, the Java test template, common pitfalls, and the key constraint that you must handcraft YAMLs (can't record via Java) + - examples.md has two working examples — the simple Botanica test and the multi-turn tool-call test + + The skill is ready to use. Next time you (or Copilot) need to add a new E2E test, the skill will be available as new-java-e2e-test-yaml-and-test. + +### 19:54 Prompt + +Ok, now, in a subsequent session, I will want to use the new skill `new-java-e2e-test-yaml-and-test`. + +How do I instruct the agent to use this skill to do its job? + +Is it something like + +`Use the skill new-java-e2e-test-yaml-and-test to create a new test and yaml that exercises the feature we just created.` + +Or something else? + +### 19:56 Response + + Based on the documentation, here's how skills work: + + You don't need to explicitly invoke skills. Skills are automatically loaded based on their description field in the frontmatter. The agent sees the skill listed in and invokes it when the task matches the description. + + However, you can be explicit if you want. Any of these would work: + + 1. Implicit (recommended): Just describe what you want — the skill's description matches "creating a new Java E2E integration test that requires a new replay proxy YAML snapshot": + Create a new failsafe IT test with a YAML snapshot that exercises the new feature. + 2. Explicit mention: Reference the skill by name to make sure it's invoked: + Use the new-java-e2e-test-yaml-and-test skill to create a test and YAML snapshot for the new feature. + 3. Check available skills: Use /skills in the CLI to see what's loaded, or /env to see the full environment including skills. + + Both approaches work. The agent sees the skill in its available tools and will call it before generating code. The explicit mention is a good safety net if you want to be sure. + diff --git a/1682-java-tool-ergonomics-prompts-remove-before-merge/add-tests-that-use-low_level_tool_definition.yaml.md b/1682-java-tool-ergonomics-prompts-remove-before-merge/add-tests-that-use-low_level_tool_definition.yaml.md new file mode 100644 index 000000000..b60937cd2 --- /dev/null +++ b/1682-java-tool-ergonomics-prompts-remove-before-merge/add-tests-that-use-low_level_tool_definition.yaml.md @@ -0,0 +1,151 @@ +## Goal + +Port the Java integration test behavior from: + +- `java/src/test/java/com/github/copilot/LowLevelToolDefinitionIT.java` +- test method: `lowLevelToolDefinition` +- snapshot: `test/snapshots/tools/low_level_tool_definition.yaml` + +to the following non-Java SDKs, using each language's native E2E test infrastructure: + +1. `dotnet` +2. `go` +3. `nodejs` +4. `python` +5. `rust` + +The new/updated tests in each language must use the **same snapshot scenario** (`tools/low_level_tool_definition`) and validate the same behavior. + +--- + +## Required test behavior to port + +From a test perspective, replicate this behavior: + +1. Define a `set_current_phase` tool that accepts a `phase` argument (string, enum: `["searching", "analyzing", "done"]`) and returns `"Phase set to {phase}"`. The tool handler must also store the phase value in test-local state. +2. Define a `search_items` tool that accepts a `keyword` argument (string) and returns `"Found: item_alpha, item_beta"`. +3. Define a `grep` override tool (using whatever "override" mechanism the language provides) that accepts a `query` argument (string) and returns `"CUSTOM_GREP: {query}"`. +4. Create a session with: + - Permission handler that auto-approves all requests. + - Available tools: all custom tools (`*`) plus built-in `web_fetch`. + - The three tool definitions registered on the session. +5. Send prompt: `"First, set the current phase to 'analyzing'. Then search for items with keyword 'copilot'. Report the phase and search results."` +6. Assert: + - The assistant response is non-null/non-empty. + - The response content (case-insensitive) contains `"analyzing"`. + - The response content contains `"item_alpha"` or `"item_beta"`. + - The test-local phase state equals `"analyzing"` (verifying the tool handler was actually invoked). + +Do not weaken these assertions. + +--- + +## Critical execution constraint (must follow exactly) + +Proceed through languages **one-at-a-time** in this exact order: + +1. `dotnet` +2. `go` +3. `nodejs` +4. `python` +5. `rust` + +❌❌ **Do not continue to the next language unless and until the current language gets a clean run with the new test in isolation.** ❌❌ + +Do **not** run full cross-language or full-repo test suites. Let CI/CD handle broad runs. + +--- + +## Snapshot/name mapping requirements + +Ensure each language's test naming/harness maps to: + +- snapshot folder: `tools` +- snapshot file: `low_level_tool_definition.yaml` + +Do not create alternate snapshot names for this scenario. + +--- + +## Per-language isolated run commands + +Use these commands for isolated validation while iterating. + +### 1) dotnet + +Implement in dotnet E2E tests (preferred: new `LowLevelToolDefinitionE2ETests` class or add to existing `ToolsE2ETests` class using snapshot category `tools`, test method `Low_Level_Tool_Definition`). + +Isolated run: + +```bash +cd dotnet && dotnet test test/GitHub.Copilot.SDK.Test.csproj --filter "FullyQualifiedName~Low_Level_Tool_Definition" +``` + +### 2) go + +Implement in Go E2E tests with snapshot mapping to `tools/low_level_tool_definition` (preferred: add to existing `go/internal/e2e/tools_e2e_test.go` or create new file, subtest name exactly `low_level_tool_definition`). + +Isolated run: + +```bash +cd go && go test ./internal/e2e -run 'TestToolsE2E/low_level_tool_definition$' -count=1 +``` + +### 3) nodejs + +Implement in Node E2E Vitest (preferred: add to existing `nodejs/test/e2e/tools.e2e.test.ts` or create new file, test name mapping to `low_level_tool_definition`). + +Isolated run: + +```bash +cd nodejs && npm test -- test/e2e/tools.e2e.test.ts -t "low_level_tool_definition" +``` + +### 4) python + +Implement in Python E2E pytest (preferred: add to existing `python/e2e/test_tools_e2e.py` or create new file, test function `test_low_level_tool_definition`). + +Isolated run: + +```bash +cd python && uv run pytest e2e/test_tools_e2e.py::test_low_level_tool_definition +``` + +### 5) rust + +Implement in Rust E2E tests (preferred: add to existing `rust/tests/e2e/tools.rs`; use `with_e2e_context("tools", "low_level_tool_definition", ...)`). + +Isolated run: + +```bash +cd rust && cargo test --features test-support --test e2e tools::low_level_tool_definition -- --exact +``` + +--- + +## Implementation notes + +1. Reuse existing per-language E2E harness helpers and style conventions. +2. Keep changes scoped to test code and required wiring. +3. Do not hand-edit generated code. +4. ❌❌❌ DO NOT CHANGE ANY non-test CODE.❌❌❌ +5. ✅✅Put the test in the "right place" for each language. That means put it "near" any similar existing tests. The existing tools E2E test files are: + - `dotnet/test/E2E/ToolsE2ETests.cs` + - `go/internal/e2e/tools_e2e_test.go` + - `nodejs/test/e2e/tools.e2e.test.ts` + - `python/e2e/test_tools_e2e.py` + - `rust/tests/e2e/tools.rs` + Put the new test near those. ✅✅ +6. The snapshot `test/snapshots/tools/low_level_tool_definition.yaml` involves **two conversations**: one where tool calls are made without prior tool results, and one full round-trip (tool calls → tool results → final assistant message). Each language's replay proxy handles this; just ensure the test sends the right prompt and processes tool invocations correctly. +7. The `grep` override tool uses whatever "tool override" mechanism exists in each language (e.g., `ToolDefinition.createOverride` in Java, or the equivalent in each SDK). If a language has no override concept, define it as a regular custom tool named `grep`. + +--- + +## Deliverable + +When done, provide: + +1. files changed per language, +2. isolated command used per language, +3. pass/fail result per language (must be passing before moving to next), +4. any blockers (if any language cannot be completed). diff --git a/nodejs/test/e2e/session_lifecycle.e2e.test.ts b/nodejs/test/e2e/session_lifecycle.e2e.test.ts index fae878273..0cb570ceb 100644 --- a/nodejs/test/e2e/session_lifecycle.e2e.test.ts +++ b/nodejs/test/e2e/session_lifecycle.e2e.test.ts @@ -32,7 +32,7 @@ describe("Session Lifecycle", async () => { // Sessions must have activity to be persisted to disk await session1.sendAndWait({ prompt: "Say hello" }); - await session2.sendAndWait({ prompt: "Say world" }); + await session2.sendAndWait({ prompt: "Say hi" }); // Poll until both sessions are visible on disk instead of a hard 500ms wait. await waitFor(async () => { diff --git a/test/snapshots/session_lifecycle/should_list_created_sessions_after_sending_a_message.yaml b/test/snapshots/session_lifecycle/should_list_created_sessions_after_sending_a_message.yaml index 274ab7d2f..779cffb03 100644 --- a/test/snapshots/session_lifecycle/should_list_created_sessions_after_sending_a_message.yaml +++ b/test/snapshots/session_lifecycle/should_list_created_sessions_after_sending_a_message.yaml @@ -13,6 +13,6 @@ conversations: - role: system content: ${system} - role: user - content: Say world + content: Say hi - role: assistant - content: world + content: Hi! I'm GitHub Copilot CLI, ready to help with your software engineering tasks. What would you like to work on?