fix: synchronize ACP telemetry and refresh remote final state#2460

Open
simonrosenberg wants to merge 2 commits into main from fix/issue-2375-acp-telemetry

@simonrosenberg
Collaborator

@simonrosenberg simonrosenberg commented Mar 16, 2026

Fixes #2375

This implements the fix direction from the issue discussion:

  • move ACP telemetry writes onto a single synchronized path in ACPAgent
  • stop mutating metrics directly from session_update()
  • wait for the turn's UsageUpdate before recording cost/tokens/latency
  • refresh the authoritative remote conversation state before run() returns
  • keep event reconciliation for history completeness after the final state refresh
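As a rough illustration of the first two bullets, here is a minimal sketch of a single synchronized write path. `TurnTelemetry`, `Metrics`, and `TelemetryRecorder` are hypothetical names, not the SDK's classes, and a `threading.Lock` is assumed as the synchronization primitive:

```python
import threading
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class TurnTelemetry:
    """Hypothetical bundle of everything known about one prompt round-trip."""

    cost: Optional[float] = None  # from the turn's UsageUpdate, if one arrived
    prompt_tokens: int = 0  # from the PromptResponse usage
    completion_tokens: int = 0
    elapsed: Optional[float] = None  # wall-clock seconds for the round-trip


@dataclass
class Metrics:
    """Hypothetical stand-in for the agent's LLM metrics object."""

    accumulated_cost: float = 0.0
    token_usages: list = field(default_factory=list)
    latencies: list = field(default_factory=list)


class TelemetryRecorder:
    """Single write path: notification handlers never mutate Metrics directly."""

    def __init__(self) -> None:
        self.metrics = Metrics()
        self._lock = threading.Lock()

    def record_turn(self, turn: TurnTelemetry) -> None:
        # One lock-protected update commits cost, tokens and latency together,
        # so a turn is never half-recorded or interleaved with another turn.
        with self._lock:
            if turn.cost is not None:
                self.metrics.accumulated_cost += turn.cost
            self.metrics.token_usages.append(
                (turn.prompt_tokens, turn.completion_tokens)
            )
            if turn.elapsed is not None:
                self.metrics.latencies.append(turn.elapsed)


recorder = TelemetryRecorder()
recorder.record_turn(
    TurnTelemetry(cost=0.25, prompt_tokens=120, completion_tokens=30, elapsed=1.5)
)
```

The design choice being illustrated is that the notification handler only hands data over; the decision to write metrics happens in exactly one place.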

Why:

The latest zero-cost ACP benchmark rows were caused by two separate correctness problems:

  1. ACP telemetry was split across notification handling and prompt response handling.
  2. RemoteConversation could return from REST fallback with stale cached state, leaving conversation_stats at zero even when the server had final stats.
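The stale-state problem in point 2 can be sketched as follows. `ConversationState`, `RemoteConversationSketch`, and the `fetch_state` callable are hypothetical; `fetch_state` stands in for whatever REST read the real client performs. The point is only that `run()` re-reads the server's state after completion instead of returning the cached snapshot:

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class ConversationState:
    """Hypothetical subset of the server's conversation state."""

    status: str
    accumulated_cost: float


class RemoteConversationSketch:
    """Illustrative client that refreshes authoritative state before run() returns."""

    def __init__(self, fetch_state: Callable[[], ConversationState]) -> None:
        # fetch_state stands in for the REST call that reads the conversation.
        self._fetch_state = fetch_state
        self.state = fetch_state()  # cached snapshot taken before the run

    def run(self, wait_until_finished: Callable[[], None]) -> ConversationState:
        # Block until the server reports completion (websocket or REST polling).
        wait_until_finished()
        # Without this refresh, the REST fallback could return the stale
        # snapshot above, whose stats were still zero.
        self.state = self._fetch_state()
        return self.state


# Usage: a fake "server" whose stats only become final during the run.
server = {"status": "running", "cost": 0.0}
conv = RemoteConversationSketch(
    lambda: ConversationState(server["status"], server["cost"])
)


def finish() -> None:
    server.update(status="finished", cost=0.75)


final = conv.run(finish)
```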

Tests:

  • PYTHONPATH=/tmp/sdk-issue-2375/openhands-sdk${PYTHONPATH:+:$PYTHONPATH} pytest tests/sdk/agent/test_acp_agent.py tests/sdk/conversation/remote/test_remote_conversation.py

Agent Server images for this PR

GHCR package: https://github.com/OpenHands/agent-sdk/pkgs/container/agent-server

Variants & Base Images

Variant | Architectures | Base Image
java | amd64, arm64 | eclipse-temurin:17-jdk
python | amd64, arm64 | nikolaik/python-nodejs:python3.13-nodejs22
golang | amd64, arm64 | golang:1.21-bookworm

Pull (multi-arch manifest)

# Each variant is a multi-arch manifest supporting both amd64 and arm64
docker pull ghcr.io/openhands/agent-server:d61f7eb-python

Run

docker run -it --rm \
  -p 8000:8000 \
  --name agent-server-d61f7eb-python \
  ghcr.io/openhands/agent-server:d61f7eb-python

All tags pushed for this build

ghcr.io/openhands/agent-server:d61f7eb-golang-amd64
ghcr.io/openhands/agent-server:d61f7eb-golang_tag_1.21-bookworm-amd64
ghcr.io/openhands/agent-server:d61f7eb-golang-arm64
ghcr.io/openhands/agent-server:d61f7eb-golang_tag_1.21-bookworm-arm64
ghcr.io/openhands/agent-server:d61f7eb-java-amd64
ghcr.io/openhands/agent-server:d61f7eb-eclipse-temurin_tag_17-jdk-amd64
ghcr.io/openhands/agent-server:d61f7eb-java-arm64
ghcr.io/openhands/agent-server:d61f7eb-eclipse-temurin_tag_17-jdk-arm64
ghcr.io/openhands/agent-server:d61f7eb-python-amd64
ghcr.io/openhands/agent-server:d61f7eb-nikolaik_s_python-nodejs_tag_python3.13-nodejs22-amd64
ghcr.io/openhands/agent-server:d61f7eb-python-arm64
ghcr.io/openhands/agent-server:d61f7eb-nikolaik_s_python-nodejs_tag_python3.13-nodejs22-arm64
ghcr.io/openhands/agent-server:d61f7eb-golang
ghcr.io/openhands/agent-server:d61f7eb-java
ghcr.io/openhands/agent-server:d61f7eb-python

About Multi-Architecture Support

  • Each variant tag (e.g., d61f7eb-python) is a multi-arch manifest supporting both amd64 and arm64
  • Docker automatically pulls the correct architecture for your platform
  • Individual architecture tags (e.g., d61f7eb-python-amd64) are also available if needed

@github-actions
Contributor

github-actions bot commented Mar 16, 2026

Python API breakage checks — ✅ PASSED

Result: PASSED

@github-actions
Contributor

github-actions bot commented Mar 16, 2026

REST API breakage checks (OpenAPI) — ❌ FAILED

Result: FAILED

⚠️ Breaking REST API changes or policy violations detected.

Log excerpt (first 1000 characters)
::error title=openhands-agent-server REST API::Breaking REST API change detected without MINOR version bump (1.14.0 -> 1.14.0).

Breaking REST API changes detected compared to baseline release:
- added '#/components/schemas/ACPAgent-Output, #/components/schemas/Agent-Output' to the '/items/anyOf[subschema #1: ConversationInfo]/agent' response property 'oneOf' list for the response status '200'
- the '/items/anyOf[subschema #1: ConversationInfo]/agent' response's property type/format changed from 'object'/'' to ''/'' for status '200'
- removed the required property '/items/anyOf[subschema #1: ConversationInfo]/agent/kind' from the response with the '200' status
- removed the required property '/items/anyOf[subschema #1: ConversationInfo]/agent/llm' from the response with the '200' status
- the 'agent' request property type/format changed from 'object'/'' to ''/''
- added '#/components/schemas/ACPAgent-Output, #/components/schemas/Agent-Output' to the 'agent' response property 'oneOf' li
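The failure above is a version-policy check rather than a test failure. A minimal sketch of the rule as the log message states it, assuming `MAJOR.MINOR.PATCH` versions and assuming a MAJOR bump also satisfies it; the helper names are hypothetical and this is not the CI job's actual script:

```python
def parse_version(version: str) -> tuple[int, int, int]:
    """Split 'MAJOR.MINOR.PATCH' into integers."""
    major, minor, patch = (int(part) for part in version.split("."))
    return major, minor, patch


def breaking_change_allowed(baseline: str, current: str) -> bool:
    """Breaking REST changes need at least a MINOR bump over the baseline."""
    # Compare (major, minor) only; a patch-only bump does not qualify.
    return parse_version(current)[:2] > parse_version(baseline)[:2]


print(breaking_change_allowed("1.14.0", "1.14.0"))  # False, as in the failing check
```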

Collaborator

@all-hands-bot all-hands-bot left a comment

Taste Rating: 🟡 Acceptable - Core telemetry fix is solid, but bundled breaking changes need attention

Verdict: ❌ Needs documentation - The synchronization fix is correct, but two undocumented breaking changes (retry removal + hook regression) should be called out in the PR description or split into separate PRs.

Key Insight: Moving cost recording to a single synchronized path after UsageUpdate receipt is the right fix for zero-cost telemetry. The per-session tracking is cleaner than global state. However, removing ~150 lines of retry logic and changing hook behavior are significant changes that deserve explicit justification.

# PromptResponse, so under normal conditions the notification handler
# completes almost immediately. This timeout is a safety net for slow
# or remote servers.
_USAGE_UPDATE_TIMEOUT: float = float(
Collaborator

🔴 Critical - Undocumented Breaking Change: The PR completely removes connection retry logic (previously handled ConnectionError, BrokenPipeError, EOFError with 3 retries + exponential backoff). This timeout constant replacement is good, but the retry removal is undocumented.

Why this matters: Production ACP servers can have temporary network blips, container restarts, or connection resets. The old code tolerated these via retry. The new code will fail immediately.

What is needed: Either (1) explain in the PR description why retry is no longer needed, or (2) restore retry logic, or (3) split this removal into a separate PR with justification.
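For reference, the retry behavior the review describes (retry on `ConnectionError`, `BrokenPipeError`, or `EOFError`, 3 retries, exponential backoff) looks roughly like this. It is a sketch, not the removed SDK code: `call_with_retry` and the 0.5 s base delay are assumptions, and "3 retries" is read as up to four attempts in total:

```python
import asyncio
from typing import Awaitable, Callable, TypeVar

T = TypeVar("T")

# Error types named in the review; BrokenPipeError is already a subclass of
# ConnectionError but is listed for clarity.
RETRYABLE_ERRORS = (ConnectionError, BrokenPipeError, EOFError)


async def call_with_retry(
    send: Callable[[], Awaitable[T]],
    max_retries: int = 3,
    base_delay: float = 0.5,
) -> T:
    """Retry transient connection failures with exponential backoff.

    Makes up to 1 + max_retries attempts, sleeping base_delay * 2**n between them.
    Any other exception propagates immediately.
    """
    attempt = 0
    while True:
        try:
            return await send()
        except RETRYABLE_ERRORS:
            if attempt >= max_retries:
                raise  # retries exhausted: surface the original error
            await asyncio.sleep(base_delay * (2**attempt))
            attempt += 1


calls = 0


async def flaky_prompt() -> str:
    # Fails twice with a transient error, then succeeds.
    global calls
    calls += 1
    if calls < 3:
        raise ConnectionResetError("simulated blip")
    return "PromptResponse"


print(asyncio.run(call_with_retry(flaky_prompt, base_delay=0.01)))  # PromptResponse
```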

elapsed: Wall-clock seconds for this prompt round-trip (optional).
usage_update: The synchronized ACP UsageUpdate for this turn, if any.
"""
if usage_update is not None and usage_update.cost is not None:
Collaborator

🟢 Acceptable: A single telemetry recording path is exactly what was needed. Processing cost, tokens, and latency in one place after synchronization eliminates the split-brain problem described in the issue. Per-session cost tracking (_last_cost_by_session) correctly handles multiple concurrent sessions.
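One plausible reason for keying cost by session is that a usage notification may carry a running total rather than a per-turn amount. The sketch below assumes exactly that (the PR excerpt does not confirm it) and converts cumulative cost into per-turn increments; only the `_last_cost_by_session` name comes from the PR, while `SessionCostTracker` and `turn_cost` are hypothetical:

```python
from typing import Dict


class SessionCostTracker:
    """Turns a cumulative per-session cost into per-turn increments.

    Assumes each update reports the session's running total (an assumption).
    """

    def __init__(self) -> None:
        self._last_cost_by_session: Dict[str, float] = {}

    def turn_cost(self, session_id: str, cumulative_cost: float) -> float:
        previous = self._last_cost_by_session.get(session_id, 0.0)
        self._last_cost_by_session[session_id] = cumulative_cost
        # Clamp at zero in case a server resets or reports a lower total.
        return max(cumulative_cost - previous, 0.0)


tracker = SessionCostTracker()
print(tracker.turn_cost("session-a", 0.25))  # 0.25
print(tracker.turn_cost("session-b", 0.5))  # 0.5 (independent session)
print(tracker.turn_cost("session-a", 0.75))  # 0.5 (0.75 - 0.25)
```

Keeping the last total per session means two concurrent sessions never subtract each other's totals, which a single global "last cost" would.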

self._session_id,
)
await _drain_notifications()
if self._client.get_turn_usage_update(self._session_id or "") is None:
Collaborator

🟢 Acceptable - Good Taste: This synchronization approach is correct. Preparing the event before prompt(), waiting for the UsageUpdate notification if it has not already been received, and then processing everything in a single path eliminates the race condition. The 2.0s timeout is reasonable (per the ACP protocol, the server writes UsageUpdate before PromptResponse). Making the timeout configurable via an environment variable is pragmatic.
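The prepare / wait / process sequence can be sketched with one `asyncio.Event` per session. `TurnUsageWaiter` and the `EXAMPLE_USAGE_UPDATE_TIMEOUT` environment variable are hypothetical (the real variable name is not shown in this excerpt); only the 2.0 s default comes from the review:

```python
import asyncio
import os
from typing import Any, Dict, Optional

# Hypothetical variable name; defaults to the 2.0 s mentioned in the review.
USAGE_UPDATE_TIMEOUT = float(os.environ.get("EXAMPLE_USAGE_UPDATE_TIMEOUT", "2.0"))


class TurnUsageWaiter:
    """Holds the current turn's UsageUpdate for each session."""

    def __init__(self) -> None:
        self._events: Dict[str, asyncio.Event] = {}
        self._updates: Dict[str, Any] = {}

    def prepare(self, session_id: str) -> None:
        # Called BEFORE prompt() so an update that arrives early is not missed.
        self._events[session_id] = asyncio.Event()
        self._updates.pop(session_id, None)

    def on_usage_update(self, session_id: str, update: Any) -> None:
        # The notification handler only stores the update; it never touches metrics.
        self._updates[session_id] = update
        event = self._events.get(session_id)
        if event is not None:
            event.set()

    async def wait(
        self, session_id: str, timeout: float = USAGE_UPDATE_TIMEOUT
    ) -> Optional[Any]:
        # Called after prompt() returns; skips waiting if the update already arrived.
        event = self._events[session_id]
        if not event.is_set():
            try:
                await asyncio.wait_for(event.wait(), timeout)
            except asyncio.TimeoutError:
                return None  # caller records what it has and logs a warning
        return self._updates.get(session_id)


async def demo() -> None:
    waiter = TurnUsageWaiter()
    waiter.prepare("s1")
    # Simulate the notification landing shortly after prompt() returns.
    asyncio.get_running_loop().call_later(
        0.01, waiter.on_usage_update, "s1", {"cost": 0.5}
    )
    print(await waiter.wait("s1"))  # {'cost': 0.5}


asyncio.run(demo())
```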

HookEventType.USER_PROMPT_SUBMIT,
HookEventType.STOP,
)
if any(hook_config.has_hooks_for_event(t) for t in unsupported):
Collaborator

🔴 Critical - Breaking Change: This changes hook behavior from "all hooks executed server-side" to "only SessionStart/SessionEnd supported locally; others emit a warning but do not execute".

What breaks: Code relying on PRE_TOOL_USE, POST_TOOL_USE, USER_PROMPT_SUBMIT, or STOP hooks will silently stop working (just a warning in logs).

Why risky: This is a functional regression unrelated to the telemetry fix. It should be: (1) Documented in the PR description as a breaking change, OR (2) Split into a separate PR with justification for the regression.

Collaborator

@all-hands-bot all-hands-bot left a comment

Additional feedback on test coverage:

mock_response = MagicMock()
mock_response.usage = None
assert len(agent.llm.metrics.token_usages) == 0

Collaborator

🟡 Suggestion - Test Gap: The PR covers the case where UsageUpdate is completely absent (cost=None in the fixture), but not the case where waiting for it times out after 2.0s. Consider adding a test that verifies the timeout path:

def test_step_records_partial_metrics_on_usage_timeout(self, tmp_path):
    """Timeout waiting for UsageUpdate logs warning but records available token metrics."""
    # Setup: executor returns response but never populates turn_usage_updates
    # Assert: warning logged, token_usages recorded from PromptResponse, but costs remain at 0

This would verify graceful degradation when the ACP server is slow/buggy and UsageUpdate never arrives before timeout.
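The suggested test can be fleshed out as a standalone sketch. It does not import the SDK: `Metrics`, `finish_turn`, and the logger name are hypothetical stand-ins for ACPAgent internals, and the sketch only demonstrates the intended assertions (warning logged, token usage recorded, cost left at 0):

```python
import asyncio
import logging
from dataclasses import dataclass, field
from typing import Optional

logger = logging.getLogger("acp_telemetry_sketch")


@dataclass
class Metrics:
    """Hypothetical stand-in for the agent's metrics."""

    accumulated_cost: float = 0.0
    token_usages: list = field(default_factory=list)


async def finish_turn(
    metrics: Metrics,
    usage_received: asyncio.Event,
    usage_cost: Optional[float],
    prompt_tokens: int,
    completion_tokens: int,
    timeout: float,
) -> None:
    """Wait for the UsageUpdate, then record whatever is available."""
    try:
        await asyncio.wait_for(usage_received.wait(), timeout)
    except asyncio.TimeoutError:
        logger.warning("UsageUpdate not received within %.2fs", timeout)
        usage_cost = None
    if usage_cost is not None:
        metrics.accumulated_cost += usage_cost
    # Token counts come from the PromptResponse, so they are recorded either way.
    metrics.token_usages.append((prompt_tokens, completion_tokens))


class _ListHandler(logging.Handler):
    """Collects log records so the test can assert on them."""

    def __init__(self) -> None:
        super().__init__()
        self.records: list = []

    def emit(self, record: logging.LogRecord) -> None:
        self.records.append(record)


def test_step_records_partial_metrics_on_usage_timeout() -> None:
    metrics = Metrics()
    handler = _ListHandler()
    logger.addHandler(handler)

    async def scenario() -> None:
        # The event is never set: the server's UsageUpdate never arrives.
        await finish_turn(metrics, asyncio.Event(), None, 100, 20, timeout=0.01)

    try:
        asyncio.run(scenario())
    finally:
        logger.removeHandler(handler)

    assert any(r.levelno == logging.WARNING for r in handler.records)
    assert metrics.token_usages == [(100, 20)]
    assert metrics.accumulated_cost == 0.0
```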

@simonrosenberg
Collaborator Author

Addressed the remaining review items:

  • restored ACP prompt retry behavior for transient connection errors
  • reverted the unrelated RemoteConversation hook behavior change so hooks remain server-side
  • added a timeout-path ACP telemetry test to verify graceful degradation when UsageUpdate does not arrive in time

Verification:

  • PYTHONPATH=/tmp/sdk-issue-2375/openhands-sdk${PYTHONPATH:+:$PYTHONPATH} pytest tests/sdk/agent/test_acp_agent.py tests/sdk/conversation/remote/test_remote_conversation.py
  • result: 109 passed

Development

Successfully merging this pull request may close these issues.

[Bug] ACP Cost Tracking Bug

2 participants