azure-ai-agentserver: Support background mode and resumable streaming for hosted agents #46015

@niallkeys

Description

Feature Request

Support background: true and resumable streaming (GET /responses/{id}?stream=true&starting_after=N) for hosted agents, matching the behavior already available for direct model invocations via the Foundry Responses API.

Current Behavior

Direct model invocation (works)

When calling the Responses API directly with a model deployment:

POST /openai/responses
{
  "model": "gpt-4o",
  "stream": true,
  "store": true,
  "background": true,
  "input": [{"role": "user", "content": "Hello"}]
}
  • Returns immediately with status: "in_progress" and "background": true
  • Every SSE event includes a sequence_number
  • GET /responses/{id}?stream=true&starting_after=0 replays all stored events, then continues with live events if still in progress
  • Response lifecycle completes correctly (status: "completed", output populated)
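To make the resumption mechanics concrete, here is a minimal client-side sketch (helper names and the `resp_123` id are illustrative, not SDK API) of how a consumer parses the SSE stream, tracks the last `sequence_number` it saw, and builds the GET path to resume:

```python
import json

def parse_sse_events(raw: str) -> list[dict]:
    """Parse the JSON payloads out of an SSE stream body (data: lines only)."""
    events = []
    for line in raw.splitlines():
        if line.startswith("data:"):
            payload = line[len("data:"):].strip()
            if payload and payload != "[DONE]":
                events.append(json.loads(payload))
    return events

def resume_path(response_id: str, events: list[dict]) -> str:
    """Build the GET path that replays everything after the last event seen.

    With no events seen yet, starting_after=0 replays the full stream.
    """
    last_seq = max(e["sequence_number"] for e in events) if events else 0
    return f"/openai/responses/{response_id}?stream=true&starting_after={last_seq}"

# Example: two events arrive before the connection drops.
raw = (
    'data: {"type": "response.created", "sequence_number": 1}\n'
    'data: {"type": "response.output_text.delta", "sequence_number": 2}\n'
)
events = parse_sse_events(raw)
print(resume_path("resp_123", events))
# → /openai/responses/resp_123?stream=true&starting_after=2
```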

Hosted agent invocation (does not work)

When calling the same Responses API with an agent_reference pointing to a hosted agent:

POST /openai/responses
{
  "model": "gpt-5",
  "stream": true,
  "store": true,
  "background": true,
  "agent_reference": {"type": "agent_reference", "name": "my-agent"},
  "input": [{"role": "user", "content": "Hello"}]
}
  • Returns immediately with status: "in_progress" but the stored response has "background": false — the flag is silently dropped
  • GET /responses/{id}?stream=true&starting_after=0 returns "Streaming is not enabled for this response"
  • The response object remains stuck at status: "in_progress" with empty output, even after the agent completes and conversation items are saved
  • The azure-ai-agentserver-core SDK (v1.0.0b16) has no reference to background anywhere in its source

Why This Matters

The primary use case is resumable/rejoinable streaming for chat UIs. When a user:

  • Refreshes the page mid-generation
  • Loses network connectivity temporarily
  • Opens a conversation that is still being generated in another tab

They should be able to call GET /responses/{id}?stream=true&starting_after=0 to replay past events and continue receiving live events. This works today for direct model calls but not for hosted agents, despite both using the same /openai/responses API surface.
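A reconnecting client also has to tolerate overlap between the replayed tail and the first live events (e.g. when it retries with a slightly stale `starting_after`). One way to handle that, sketched here as plain Python rather than SDK code, is to deduplicate on `sequence_number`:

```python
def merge_resumed(replayed: list[dict], live: list[dict]) -> list[dict]:
    """Merge replayed events with live ones, dropping any sequence_number
    already seen, so a retried starting_after boundary can't double-render."""
    seen: set[int] = set()
    merged = []
    for event in list(replayed) + list(live):
        seq = event["sequence_number"]
        if seq not in seen:
            seen.add(seq)
            merged.append(event)
    return merged

replayed = [{"sequence_number": 1}, {"sequence_number": 2}, {"sequence_number": 3}]
live = [{"sequence_number": 3}, {"sequence_number": 4}]  # event 3 delivered twice
print([e["sequence_number"] for e in merge_resumed(replayed, live)])
# → [1, 2, 3, 4]
```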

Observations

  • The SDK already assigns sequence_number to every ResponseStreamEvent via StreamEventState — the primitive for starting_after resumption is already in place
  • The SDK already supports store=true which saves completed items to the Conversations API after stream completion
  • The gap appears to be in Foundry's proxy layer between the Responses API endpoint and the hosted agent container — it doesn't buffer/store SSE events as they pass through for hosted agents the way it does for direct model calls

Expected Behavior

background: true + stream: true should work the same way for hosted agents as it does for direct model calls:

  1. Foundry buffers SSE events as they pass through from the hosted agent container
  2. GET /responses/{id}?stream=true&starting_after=N replays stored events and continues with live events
  3. The response lifecycle completes correctly when the agent finishes
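The three steps above amount to an append-only event log keyed by sequence_number. A minimal in-memory sketch of that behavior (class and method names are illustrative, not Foundry internals):

```python
class ResponseEventBuffer:
    """Buffers SSE events as they pass through so that a later
    GET ...?stream=true&starting_after=N can replay the stored tail."""

    def __init__(self) -> None:
        self._events: list[dict] = []
        self._next_seq = 1
        self.completed = False

    def append(self, event: dict) -> dict:
        """Step 1: stamp and store an event as it is forwarded downstream."""
        event = {**event, "sequence_number": self._next_seq}
        self._next_seq += 1
        self._events.append(event)
        return event

    def replay(self, starting_after: int = 0) -> list[dict]:
        """Step 2: return every stored event with sequence_number > starting_after."""
        return [e for e in self._events if e["sequence_number"] > starting_after]

    def complete(self) -> None:
        """Step 3: mark the response lifecycle as finished."""
        self.completed = True

buf = ResponseEventBuffer()
for kind in ("response.created", "response.output_text.delta", "response.completed"):
    buf.append({"type": kind})
buf.complete()
print([e["sequence_number"] for e in buf.replay(starting_after=1)])
# → [2, 3]
```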

Environment

  • azure-ai-agentserver-core==1.0.0b16
  • azure-ai-agentserver-langgraph==1.0.0b16
  • API version: 2025-11-15-preview

Metadata

Labels

  • customer-reported: Issues that are reported by GitHub users external to the Azure organization.
  • needs-triage: Workflow: This is a new issue that needs to be triaged to the appropriate team.
  • question: The issue doesn't require a change to the product in order to be resolved.
