@bedrin (Contributor) commented Dec 4, 2025


## Overview

This PR adds support for OpenAI's new **Responses API** to the `OpenAiApi` class, providing low-level access to OpenAI's latest agentic capabilities. The Responses API represents OpenAI's unified interface for building agent-like applications with built-in tools, multi-turn conversations, and enhanced reasoning capabilities.

**Important**: This PR adds support at the **low-level API layer only** (the `OpenAiApi` class). It does not integrate with the high-level `ChatModel` abstractions. The Responses API is a stateful, standalone API (OpenAI's latest agentic offering) rather than a traditional chat model: it doesn't fit the existing `ChatModel` abstractions and can't easily be integrated as just another chat-model provider. It represents an entirely new agentic category.

## Related Issues

- Closes spring-projects#4221 - Support for OpenAI Responses API
- Related to spring-projects#2962 - Enhanced reasoning model support
- Related to spring-projects#3022 - Multi-turn conversation handling

## Changes

### 1. Core API Support (`OpenAiApi.java`)

#### Added DTOs

**Request DTO - `ResponseRequest`**:
- Comprehensive request object with 24 parameters
- Parameters include: `model`, `input`, `instructions`, `temperature`, `tools`, `reasoning`, `conversation`, `previousResponseId`, etc.
- Supports all Responses API features: reasoning models, built-in tools, structured outputs, multi-turn conversations
- Includes nested records: `TextConfig`, `TextFormat`, `ReasoningConfig`

**Response DTO - `Response`**:
- Complete response structure with `id`, `status`, `model`, `output`, `usage`, etc.
- Nested records: `OutputItem`, `ContentItem`, `ReasoningDetails`, `ResponseError`, `IncompleteDetails`
- Supports multiple output types: messages, reasoning, tool calls

**Streaming DTO - `ResponseStreamEvent`**:
- Event-based streaming support
- Includes: `type`, `sequenceNumber`, `response`, `delta`, `text`, etc.
- Enables real-time processing of responses

#### Added Methods

- `responseEntity(ResponseRequest)` - Synchronous response creation
- `responseEntity(ResponseRequest, HttpHeaders)` - Synchronous with custom headers
- `responseStream(ResponseRequest)` - Streaming response creation
- `responseStream(ResponseRequest, HttpHeaders)` - Streaming with custom headers

#### Added Configuration

- `responsesPath` field (default: `/v1/responses`)
- Builder support for responses path configuration
- Updated constructors to include responses path

### 2. Autoconfiguration Support

#### `OpenAiChatProperties.java`
- Added `responsesPath` property with default value `/v1/responses`
- Added getter/setter methods
- Follows same pattern as `completionsPath` and `embeddingsPath`

#### `OpenAiChatAutoConfiguration.java`
- Updated `openAiApi()` bean to include `.responsesPath(chatProperties.getResponsesPath())`
- Enables Spring Boot property configuration

#### `OpenAiEmbeddingAutoConfiguration.java`
- Updated `openAiApi()` method to include responses path
- Uses default constant for consistency

### 3. Integration Tests (`OpenAiApiIT.java`)

Added 4 comprehensive integration tests:

1. **`responseEntity()`** - Basic synchronous response
   - Tests simple request/response flow
   - Validates response structure and content
   - Cost: ~10-20 tokens

2. **`responseStream()`** - Streaming responses
   - Tests event stream processing
   - Validates multiple event types
   - Cost: ~10-20 tokens

3. **`responseWithInstructionsAndConfiguration()`** - Advanced configuration
   - Tests system instructions and parameters
   - Validates parameter echo and content accuracy
   - Cost: ~10-20 tokens

4. **`responseWithWebSearchTool()`** - Built-in web_search tool
   - Demonstrates built-in tool usage (no custom implementation needed)
   - Tests tool execution and response handling
   - Validates output structure with tool calls
   - Cost: ~30-50 tokens

**Total estimated cost**: ~$0.0002 - $0.0005 per test run

### 4. Unit Tests (`ResponsesApiTest.java`)

Added comprehensive unit tests covering:
- `ResponseRequest` creation with various parameter combinations
- `Response` structure validation
- `ResponseStreamEvent` structure validation
- Convenience constructors

### 5. Documentation Updates

#### `openai-chat.adoc`
- Added `spring.ai.openai.chat.responses-path` property documentation
- Updated Chat Completions API references for clarity
- Changed note about Responses API availability (now supported via `OpenAiApi`)

## Key Features

### Built-in Tools
The Responses API provides tools without custom implementation:
- **`web_search`** - Search the internet (demonstrated in integration test)
- **`file_search`** - Search through uploaded files
- **`code_interpreter`** - Execute Python code
- **`computer_use`** - Interact with computer interfaces
- Remote MCPs (Model Context Protocol)

### Multi-turn Conversations
Native support for stateful conversations:
- Via `previousResponseId` parameter
- Via `conversation` object/ID

### Reasoning Models
Enhanced support for reasoning models (gpt-5, o-series):
- Configurable reasoning effort levels
- Access to reasoning content and summaries

### Structured Outputs
JSON schema validation via `TextConfig`:
- Type-safe structured responses
- Schema validation with `strict` mode

## Configuration

### Default Configuration (Minimal)
```yaml
spring:
  ai:
    openai:
      api-key: ${OPENAI_API_KEY}
```

### Custom Configuration
```yaml
spring:
  ai:
    openai:
      api-key: ${OPENAI_API_KEY}
      chat:
        responses-path: /v1/responses  # Can be customized for compatible servers
```

## Usage Examples

### Basic Synchronous Request
```java
@Autowired
private OpenAiApi openAiApi;

public void example() {
    var request = new OpenAiApi.ResponseRequest("What is AI?", "gpt-4o");
    ResponseEntity<OpenAiApi.Response> response = openAiApi.responseEntity(request);

    // Extract text from response
    String text = response.getBody()
        .output()
        .stream()
        .filter(item -> "message".equals(item.type()))
        .flatMap(item -> item.content().stream())
        .filter(content -> "output_text".equals(content.type()))
        .map(OpenAiApi.Response.ContentItem::text)
        .findFirst()
        .orElse(null);
}
```

### Streaming Request
```java
var request = new OpenAiApi.ResponseRequest("Tell me a story", "gpt-4o", true);
Flux<OpenAiApi.ResponseStreamEvent> stream = openAiApi.responseStream(request);

stream.subscribe(event -> {
    if ("response.output_text.delta".equals(event.type())) {
        System.out.print(event.delta());
    }
});
```
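
Since the text arrives as individual delta events, callers typically need to reassemble it. The sketch below is not part of the PR; it uses a minimal stand-in record (the real `ResponseStreamEvent` carries more fields) and shows the accumulation logic over a simulated, finished stream:

```java
import java.util.List;
import java.util.stream.Collectors;

class DeltaAccumulatorSketch {

    // Minimal stand-in for ResponseStreamEvent: only the two fields the
    // accumulator needs.
    record Event(String type, String delta) { }

    // Concatenates the text deltas, ignoring lifecycle events such as
    // response.created / response.completed.
    static String accumulate(List<Event> events) {
        return events.stream()
            .filter(e -> "response.output_text.delta".equals(e.type()))
            .map(Event::delta)
            .collect(Collectors.joining());
    }

    public static void main(String[] args) {
        List<Event> events = List.of(
            new Event("response.created", null),
            new Event("response.output_text.delta", "Once upon "),
            new Event("response.output_text.delta", "a time."),
            new Event("response.completed", null));
        System.out.println(accumulate(events)); // prints "Once upon a time."
    }
}
```

In reactive code the same idea maps onto the `Flux` returned by `responseStream`, e.g. filtering the delta events and reducing their payloads into a `StringBuilder`.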

### Using Built-in Web Search Tool
```java
var webSearchTool = Map.of("type", "web_search");

var request = new OpenAiApi.ResponseRequest(
    "gpt-4o",
    "What is the current weather in San Francisco?",
    null, null, null, null, null,
    List.of(webSearchTool),  // Enable web_search tool
    null, null, false, null, null, null, null, null, null,
    List.of("web_search_call.action.sources"),  // Include search sources
    null, null, null, null, null, null
);

ResponseEntity<OpenAiApi.Response> response = openAiApi.responseEntity(request);
```

### Multi-turn Conversation
```java
// First request
var request1 = new OpenAiApi.ResponseRequest("What is 2+2?", "gpt-4o");
var response1 = openAiApi.responseEntity(request1);
String responseId = response1.getBody().id();

// Follow-up request
var request2 = new OpenAiApi.ResponseRequest(
    "gpt-4o",
    "And what is that number multiplied by 3?",
    null, null, null, null, null, null, null, null,
    false, null, null, null,
    responseId,  // Reference previous response
    null, null, null, null, null, null, null, null, null
);

var response2 = openAiApi.responseEntity(request2);
```

## Design Decisions

### Why Low-Level API Only?

The Responses API is fundamentally different from traditional chat models:

1. **Stateful vs Stateless**: The Responses API is designed for stateful, multi-turn agent applications, while `ChatModel` is stateless
2. **Built-in Tools**: Responses API provides native tools (web search, file search, etc.) without custom implementation, unlike `ChatModel`'s function calling
3. **Different Abstractions**: The output structure (`output` array with multiple item types) doesn't map cleanly to `ChatResponse`
4. **Agent-First Design**: Represents a new category of agentic applications rather than a traditional chat interface
5. **Future Evolution**: OpenAI is positioning this as the future of agent development, separate from chat completions

### Implementation Patterns

1. **Follows Existing Conventions**: Mirrors `chatCompletionEntity` and `chatCompletionStream` patterns
2. **Comprehensive DTOs**: All major API fields included for maximum flexibility
3. **Convenience Constructors**: Simplified constructors for common use cases
4. **Type Safety**: Uses Java records for immutable, type-safe DTOs
5. **Spring Boot Integration**: Full support for externalized configuration

## Backward Compatibility

✅ **Fully backward compatible**
- No changes to existing `ChatModel` implementations
- No changes to existing Chat Completions API usage
- New functionality is additive only
- Default values match OpenAI standards

## Testing

### Unit Tests
- ✅ 5 unit tests in `ResponsesApiTest`
- ✅ All existing tests continue to pass
- ✅ No compilation errors

### Integration Tests
- ✅ 4 new integration tests in `OpenAiApiIT`
- ✅ Cover synchronous, streaming, configuration, and built-in tools
- ✅ Minimal cost (~$0.0002-$0.0005 per run)
- ✅ Serve as usage examples

### Build Verification
- ✅ `spring-ai-openai` module builds successfully
- ✅ `spring-ai-autoconfigure-model-openai` module builds successfully
- ✅ All existing tests pass

## Benefits

1. **Early Access**: Enables developers to use OpenAI's latest agentic capabilities
2. **Built-in Tools**: Simplifies integration with web search, file search, etc.
3. **Future-Ready**: Positions Spring AI for OpenAI's agent-first direction
4. **Flexible**: Low-level API allows custom abstractions to be built on top
5. **Well-Documented**: Comprehensive tests serve as usage examples
6. **Cost-Efficient**: Integration tests designed to minimize API costs

## Future Enhancements

Potential future additions (not in this PR):
1. Higher-level abstractions if patterns emerge
2. Conversation management utilities
3. Response accumulator helpers for streaming
4. Observability support for Responses API calls
5. Integration with Spring AI's advisor pattern (if applicable)

## Migration from Chat Completions

For users wanting to try the Responses API:

| Chat Completions | Responses API |
|------------------|---------------|
| `messages` array | `input` (simplified) |
| Custom function implementation | Built-in tools (no code needed) |
| Manual conversation state | Native multi-turn support |
| Limited reasoning access | Full reasoning capabilities |
| `ChatCompletionRequest` | `ResponseRequest` |

## References

- [OpenAI Responses API Documentation](https://platform.openai.com/docs/api-reference/responses)
- [OpenAI Migration Guide](https://platform.openai.com/docs/guides/migrate-to-responses)
- [Responses vs Chat Completions](https://platform.openai.com/docs/guides/responses-vs-chat-completions)
- [OpenAI Java SDK](https://github.com/openai/openai-java) - Referenced for implementation patterns

## Checklist

- [x] Code compiles without errors
- [x] All existing tests pass
- [x] New unit tests added and passing
- [x] New integration tests added and passing
- [x] Documentation updated
- [x] Autoconfiguration support added
- [x] Spring Boot properties supported
- [x] Backward compatible
- [x] Follows existing code conventions
- [x] No breaking changes

---

**Note**: This PR intentionally does **not** integrate the Responses API with the high-level `ChatModel` abstractions. The Responses API represents a fundamentally different paradigm (stateful agents vs stateless chat) that doesn't fit the existing abstractions. This low-level API access allows the Spring AI community to experiment and potentially develop appropriate higher-level abstractions in the future.

Signed-off-by: Dmitry Bedrin <[email protected]>
@filiphr (Contributor) commented Dec 5, 2025

> Note: This PR intentionally does not integrate the Responses API with the high-level ChatModel abstractions. The Responses API represents a fundamentally different paradigm (stateful agents vs stateless chat) that doesn't fit the existing abstractions. This low-level API access allows the Spring AI community to experiment and potentially develop appropriate higher-level abstractions in the future.

Why are you saying this, @bedrin? According to https://platform.openai.com/docs/guides/migrate-to-responses, OpenAI themselves recommend the new Responses API for new projects:

> While Chat Completions remains supported, Responses is recommended for all new projects.

Some models, e.g. gpt-5.1-codex, do not support the Chat Completions API and only support the Responses API. I believe a `ChatModel` abstraction can be built on top of the Responses API.

This PR is a step in a good direction. However, I do believe that Spring AI itself should support the Responses API through the `ChatModel` abstraction.

@bedrin (Contributor, Author) commented Dec 5, 2025

@filiphr I meant that we cannot use all the functionality of the new Responses API through the existing Spring AI `ChatModel` abstraction. But indeed we can still use it that way - you're right. I think that should be a separate PR, though.
