feat(otel): cxtx OTEL bootstrap, usage parser, LLM cost spans #28

Open
divySDM wants to merge 4 commits into main from feat/otel

@divySDM divySDM commented Apr 30, 2026

Summary

Adds OpenTelemetry observability to the cxtx wrapper for per-application LLM cost attribution. Server-side OTEL is intentionally deferred to a follow-up PR — this one ships the cxtx piece, where the cost-attribution KPI lives.

What's in

  • New cxdb-otel crate (workspace top-level) — env-gated OTEL bootstrap. Fully inert when OTEL_EXPORTER_OTLP_ENDPOINT is unset (no subscriber installed, no exporter spun up); W3C TraceContext + OTLP/gRPC traces + metrics when set. Helpers for tiny_http extraction and reqwest header injection. Lazy gen_ai.* metric instrument creation.
  • cxtx/src/otel/ — call context, finish-reason mapping (14 provider → canonical rows), derived buckets, finalize_llm_call (single emit site for chat <model> spans + gen_ai.client.token.usage histogram + gen_ai.calls counter, shared by Anthropic and OpenAI provider finalize paths).
  • cxtx/src/provider/usage.rs — typed UsageOutcome parser covering the 16-cell Anthropic/OpenAI matrix (SSE + JSON, ChatCompletions + Responses, with/without include_usage) plus aborted-stream classification. Real token counts now stamp TurnMetrics instead of zeros.
  • Wire-schema additions (additive, backward-compatible):
    • TurnMetrics.usage_status: Option<String> — msgpack tag 8, tags non-happy-path turns (not_reported, error:<class>).
    • ContextMetadata.tenant: Option<String> — msgpack tag 5, used for app.tenant OTEL attribution. Empty-string is treated as absent (no sentinels).
  • HTTP client wraps every outbound call in inject_reqwest so W3C traceparent/tracestate flow downstream. Async delivery worker captures parent context at enqueue and threads it through retries via an explicit-context variant (ContextGuard is not Send).
  • Replay-dedup invariance: session.rs normalization strips telemetry fields so OTEL attribution does not perturb HistoryItem equality. Pinned by regression test.
  • 17 fixtures under cxtx/tests/fixtures/usage/ (16-cell matrix + aborted stream) with a redaction lint test.
  • 6 new test suites: otel_emit, otel_noop, trace_continuity, usage_integration, usage_matrix, fixtures_lint.
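
As a rough illustration of the two wire-schema rules above (usage_status tagging only non-happy-path turns, and an empty tenant being treated as absent), a minimal sketch follows. The `UsageOutcome` enum here is a simplified, hypothetical stand-in for the typed parser result in cxtx/src/provider/usage.rs, and the function names `usage_status_for` and `normalize_tenant` are illustrative assumptions, not the crate's actual API.

```rust
/// Simplified, hypothetical stand-in for the PR's typed `UsageOutcome`.
#[derive(Debug, Clone, PartialEq)]
enum UsageOutcome {
    /// The provider reported token counts (happy path).
    Reported { input_tokens: u64, output_tokens: u64 },
    /// The call completed but the provider sent no usage object.
    NotReported,
    /// The call failed; the payload is the error class, e.g. "StreamAborted".
    Error(String),
}

/// Derives the optional `usage_status` string (msgpack tag 8).
/// Happy-path turns carry no status, so the field stays absent on the wire.
fn usage_status_for(outcome: &UsageOutcome) -> Option<String> {
    match outcome {
        UsageOutcome::Reported { .. } => None,
        UsageOutcome::NotReported => Some("not_reported".to_string()),
        UsageOutcome::Error(class) => Some(format!("error:{class}")),
    }
}

/// Treats an empty tenant string as absent instead of storing a sentinel.
fn normalize_tenant(raw: Option<&str>) -> Option<String> {
    raw.filter(|s| !s.is_empty()).map(str::to_owned)
}
```

Keeping both fields as `Option<String>` is what makes the change additive: readers that do not know tags 5 or 8 never see them on happy-path turns without a tenant.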

What's NOT in (deferred)

  • Server-side OTEL (request spans, route metrics, store/registry instrumentation, trace continuity on the receiving end, XFF trust contract, app.tenant server-side stamping). The server's HTTP layer has substantial drift that requires careful merging, so it goes in a separate PR. Note: ContextMetadata.tenant (msgpack tag 5) is wire-only in this PR: the server's extract_context_metadata does not yet read tag 5, and the cached ContextMetadata struct in server/src/store.rs has no tenant field, so list/projection APIs do not surface the tenant. Server-side tenant support will land alongside server-side OTEL in the follow-up.
  • Validation pipeline (golden + wire OTEL test gates against an OTLP collector). Separate effort.
  • Binary-protocol TRACE frame (capability bit + wire-format trailer for client → server context propagation over the binary protocol). No wire-protocol bump in this PR.

Configuration

OTEL stays fully inert by default. Operators opt in by setting OTEL_EXPORTER_OTLP_ENDPOINT (and standard companion env vars: OTEL_SERVICE_NAME, OTEL_RESOURCE_ATTRIBUTES, OTEL_TRACES_SAMPLER, etc.). cxtx adds one app-specific env: CXTX_TENANT for app.tenant attribution.
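
The opt-in gate described above could be sketched roughly as follows. This is not the cxdb-otel implementation: `OtelSettings`, `settings_from`, and `settings_from_env` are hypothetical names, and treating an empty endpoint value the same as an unset one is an assumption. The lookup is passed in as a closure so the gate can be exercised without touching the process environment.

```rust
use std::env;

/// Hypothetical settings struct; the real bootstrap config may differ.
#[derive(Debug, PartialEq)]
struct OtelSettings {
    endpoint: String,
    tenant: Option<String>,
}

/// Returns `None` (fully inert) unless an OTLP endpoint is configured.
/// Assumption: an empty value counts as unset for both variables.
fn settings_from(lookup: impl Fn(&str) -> Option<String>) -> Option<OtelSettings> {
    let endpoint = lookup("OTEL_EXPORTER_OTLP_ENDPOINT").filter(|v| !v.is_empty())?;
    let tenant = lookup("CXTX_TENANT").filter(|v| !v.is_empty());
    Some(OtelSettings { endpoint, tenant })
}

/// Production entry point: reads the real process environment.
fn settings_from_env() -> Option<OtelSettings> {
    settings_from(|key| env::var(key).ok())
}
```

A caller would install the subscriber and exporter only when `settings_from_env()` returns `Some`, which keeps the default path free of any OTEL setup.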

Cardinality control: gen_ai.client.token.usage histogram drops app.session_id / app.user / app.wrapper_version from metric attribute sets (kept on spans).
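
One way to picture the cardinality rule is a filter that derives the metric attribute set from the span attribute set. The sketch below is an assumption about the shape of that step, not the code in finalize_llm_call; `metric_attributes` is a hypothetical helper and attributes are modeled as plain string pairs instead of OpenTelemetry `KeyValue`s.

```rust
/// Attribute keys kept on spans but dropped from metric attribute sets.
const SPAN_ONLY_KEYS: [&str; 3] = ["app.session_id", "app.user", "app.wrapper_version"];

/// Returns the subset of span attributes that is safe to attach to metrics.
/// Order is preserved; only the high-cardinality keys are removed.
fn metric_attributes<'a>(span_attrs: &[(&'a str, &'a str)]) -> Vec<(&'a str, &'a str)> {
    span_attrs
        .iter()
        .copied()
        .filter(|(key, _)| !SPAN_ONLY_KEYS.contains(key))
        .collect()
}
```

Per-session and per-user values would otherwise create one histogram series per session or user, so they stay on spans, where each call is a separate record anyway.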

Test plan

  • cargo check --workspace --all-targets clean
  • cargo test --workspace — 271 tests across 27 suites pass, including all 6 new cxtx OTEL suites + the new stream_aborted suite
  • OTEL no-op regression: otel_noop confirms unset endpoint installs no subscriber
  • Replay dedup invariance: assert_dedup_ignores_telemetry confirms CallContext and tenant do not affect HistoryItem equality
  • Fixtures lint: fixtures_lint confirms no provider response bodies contain redactable tokens
  • Live OTLP-collector smoke test (not yet run; to be done once a tenant/collector is reachable)

🤖 Generated with Claude Code

divySDM and others added 4 commits April 30, 2026 14:48
Adds the cxtx-side observability layer:

- New cxdb-otel crate (workspace top-level) — env-gated OTEL bootstrap.
  Fully inert when OTEL_EXPORTER_OTLP_ENDPOINT is unset; W3C TraceContext
  + OTLP/gRPC traces + metrics when enabled. Helpers for tiny_http
  extraction and reqwest injection. gen_ai.* metric emit surface.

- cxtx/src/otel/ — call context, finish-reason mapping, derived buckets,
  finalize_llm_call (chat <model> spans + gen_ai.client.token.usage
  histogram + per-call counter, single emit site shared by Anthropic
  and OpenAI provider finalize paths).

- cxtx/src/provider/usage.rs — typed UsageOutcome parser covering 16-cell
  Anthropic/OpenAI matrix (SSE + JSON, ChatCompletions + Responses, with
  and without include_usage). Real token counts now flow into TurnMetrics.

- TurnMetrics gains usage_status: Option<String> (msgpack tag 8) tagging
  non-happy-path turns. ContextMetadata gains tenant: Option<String>
  (msgpack tag 5) for app.tenant attribution. Both are additive.

- cxtx HTTP client wraps every outbound call in inject_reqwest so W3C
  traceparent/tracestate flow to cxdb. Async delivery worker captures
  parent context at enqueue and threads it through retries via an
  explicit-context variant (ContextGuard is not Send).

- session.rs replay-dedup normalization strips telemetry fields so OTEL
  attribution does not perturb HistoryItem equality. Regression-pinned.

- Fixtures: 17 cxtx/tests/fixtures/usage/ (16-cell matrix + aborted)
  with redaction lint. Tests: otel_emit, otel_noop, trace_continuity,
  usage_integration, usage_matrix, fixtures_lint.

The integration-test start_http call temporarily drops a trusted_proxies
argument that the server side does not yet accept; the server OTEL port
restores it.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

Replaces "Sprint NNN" tags in doc comments with neutral phrasings
(Phase, Decision, Tenant, OTEL emit) so the OSS code reads
self-contained without referencing internal sprint planning.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

Wire-schema:
- emit usage_status (tag 8) in turn_metrics_payload; previously serialized
  in the wire type but dropped on every HTTP upload
- register tag 5 (tenant) on ContextMetadata and tag 8 (usage_status) on
  TurnMetrics in conversation_registry_bundle.json; bump bundle_id to
  v3.1 since the server caches by id and rejects same-id-different-content
- add Tenant/UsageStatus to the Go client so msgpack stays additive across
  Rust/Go consumers

Provider parsing:
- honor request.stream when building CallContext; was hardcoded to true
- return NotReported (with canonical finish reason) for Responses JSON
  bodies that omit usage; previously returned None and lost the breadcrumb
- compute UsageOutcome before the empty-content early-return in OpenAI
  finalize_stream so calls that complete with billed usage but no content
  (e.g. Responses completed with empty output) still emit cost telemetry

Span attribution:
- thread error_type from map_openai_responses through canonical_finish so
  failed:<code> stamps error.type=<code> on the span

Bootstrap config:
- drop unused OtelConfig fields (headers, traces_sampler,
  default_histogram_aggregation); the OTLP exporter and trace SDK already
  read these env vars on their own in opentelemetry 0.27, so leaving them
  parsed-but-unused was misleading

Tests:
- TurnMetrics payload usage_status round-trip (present + omitted)
- Responses failed:<code> stamps error.type on span
- Responses JSON without usage returns NotReported with finish reason

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

…ervation

Addresses PR #28 review findings:

* Stream-aborted upstream now routes to OTEL `Error(StreamAborted)` instead
  of misclassifying as `NotReported`. Adds `stream_aborted: Option<String>`
  to OpenAi/Anthropic exchange state with a `mark_stream_aborted` setter
  on the `ExchangeState` wrapper; `proxy.rs` calls it on the SSE-loop Err
  branch before break. `finalize_stream` checks this first (above
  malformed-remainder, parse-errors, and status>=400) so the surviving
  2xx upstream status no longer routes the abort through the happy path.
  Partial assistant content is still stored, tagged with the same
  `error:StreamAborted` usage_status. New `cxtx/tests/stream_aborted.rs`
  pins both providers, both empty and partial-content abort cases.

* Responses-API streaming now preserves the canonical finish reason when
  `response.completed` lacks a `usage` object. `absorb_responses_event`
  derives the reason before the usage check and pushes it into
  `finish_reasons_raw` so the NotReported partial keeps the
  incomplete/failed signal that distinguishes a clean stop from
  `incomplete:length` / `failed:<code>`. New unit tests cover both
  variants.
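
The check ordering described for `finalize_stream` can be sketched as below. Only the `stream_aborted` field and the `StreamAborted` class come from this PR; the other field names, the `Classification` type, and the remaining error class strings are placeholders chosen for illustration.

```rust
/// Simplified stand-in for per-exchange stream state.
/// Field names other than `stream_aborted` are hypothetical.
struct StreamState {
    /// Set via `mark_stream_aborted` on the SSE-loop Err branch.
    stream_aborted: Option<String>,
    malformed_remainder: bool,
    parse_errors: usize,
    /// Upstream HTTP status; may still be 2xx after a mid-stream abort.
    status: u16,
}

#[derive(Debug, PartialEq)]
enum Classification {
    /// Error class that ends up in `error:<class>`.
    Error(&'static str),
    /// No error detected; normal usage parsing proceeds.
    Proceed,
}

/// Checks the abort marker first so a surviving 2xx status cannot
/// route an aborted stream through the happy path.
fn classify(state: &StreamState) -> Classification {
    if state.stream_aborted.is_some() {
        return Classification::Error("StreamAborted");
    }
    if state.malformed_remainder {
        return Classification::Error("MalformedRemainder"); // placeholder class name
    }
    if state.parse_errors > 0 {
        return Classification::Error("ParseError"); // placeholder class name
    }
    if state.status >= 400 {
        return Classification::Error("HttpStatus"); // placeholder class name
    }
    Classification::Proceed
}
```

With this ordering, a stream that aborts after a 200 response with partial content is still classified as `StreamAborted`, matching the cases pinned in `cxtx/tests/stream_aborted.rs`.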

CI fixes bundled in:
* `cargo fmt` collapse on `TurnMetrics.usage_status` serde attribute
* Dockerfile cache step now copies `cxdb-otel/Cargo.toml` + dummy src so
  the workspace dependency resolution succeeds
* server `turn_store/mod.rs` clippy drive-by: `sort_by_key` for the
  toolchain-bumped `unnecessary_sort_by` lint

271 tests pass (was 261); all 7 cxtx test suites green.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>