Adds the cxtx-side observability layer:

- New cxdb-otel crate (workspace top-level): env-gated OTEL bootstrap. Fully inert when OTEL_EXPORTER_OTLP_ENDPOINT is unset; W3C TraceContext + OTLP/gRPC traces + metrics when enabled. Helpers for tiny_http extraction and reqwest injection. gen_ai.* metric emit surface.
- cxtx/src/otel/: call context, finish-reason mapping, derived buckets, finalize_llm_call (chat <model> spans + gen_ai.client.token.usage histogram + per-call counter, single emit site shared by Anthropic and OpenAI provider finalize paths).
- cxtx/src/provider/usage.rs: typed UsageOutcome parser covering the 16-cell Anthropic/OpenAI matrix (SSE + JSON, ChatCompletions + Responses, with and without include_usage). Real token counts now flow into TurnMetrics.
- TurnMetrics gains usage_status: Option<String> (msgpack tag 8) tagging non-happy-path turns. ContextMetadata gains tenant: Option<String> (msgpack tag 5) for app.tenant attribution. Both are additive.
- cxtx HTTP client wraps every outbound call in inject_reqwest so W3C traceparent/tracestate flow to cxdb. Async delivery worker captures parent context at enqueue and threads it through retries via an explicit-context variant (ContextGuard is not Send).
- session.rs replay-dedup normalization strips telemetry fields so OTEL attribution does not perturb HistoryItem equality. Regression-pinned.
- Fixtures: 17 cxtx/tests/fixtures/usage/ (16-cell matrix + aborted) with redaction lint.
- Tests: otel_emit, otel_noop, trace_continuity, usage_integration, usage_matrix, fixtures_lint.

The integration-test start_http call temporarily drops a trusted_proxies argument that the server side does not yet accept; the server OTEL port restores it.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
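The delivery-worker point in the commit message above (capture the parent context at enqueue, then pass it explicitly on every retry because `ContextGuard` is not `Send`) can be illustrated with a minimal std-only sketch. Everything named here (`ParentContext`, `Job`, `enqueue`, `deliver_with_retries`) is hypothetical and not taken from the cxtx code; a plain `String` holding a W3C `traceparent` value stands in for the real OpenTelemetry context type.

```rust
/// Hypothetical stand-in for a captured trace context. It is plain owned
/// data, so it is `Send` and can travel with a queued job, unlike a
/// thread-bound guard.
#[derive(Clone, Debug, PartialEq)]
pub struct ParentContext {
    pub traceparent: String,
}

/// A queued delivery. The parent context is captured once, at enqueue time.
pub struct Job {
    pub payload: Vec<u8>,
    pub parent: ParentContext,
}

/// Snapshot the caller's current context into the job.
pub fn enqueue(payload: Vec<u8>, current: &ParentContext) -> Job {
    Job {
        payload,
        parent: current.clone(),
    }
}

/// Calls `send` up to `max_attempts` times, handing it the captured parent
/// context explicitly on every attempt. Returns the 1-based attempt number
/// that succeeded, or the last error seen.
pub fn deliver_with_retries<F>(job: &Job, max_attempts: u32, mut send: F) -> Result<u32, String>
where
    F: FnMut(&[u8], &ParentContext) -> Result<(), String>,
{
    let mut last_err = String::from("no attempts made");
    for attempt in 1..=max_attempts {
        match send(&job.payload, &job.parent) {
            Ok(()) => return Ok(attempt),
            Err(e) => last_err = e,
        }
    }
    Err(last_err)
}
```

Because the context is an argument rather than ambient thread-local state, every retry is attributed to the same parent regardless of which worker thread runs it.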
Replaces "Sprint NNN" tags in doc comments with neutral phrasings (Phase, Decision, Tenant, OTEL emit) so the OSS code reads self-contained without referencing internal sprint planning.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Wire-schema:

- emit usage_status (tag 8) in turn_metrics_payload; previously serialized in the wire type but dropped on every HTTP upload
- register tag 5 (tenant) on ContextMetadata and tag 8 (usage_status) on TurnMetrics in conversation_registry_bundle.json; bump bundle_id to v3.1 since the server caches by id and rejects same-id-different-content
- add Tenant/UsageStatus to the Go client so msgpack stays additive across Rust/Go consumers

Provider parsing:

- honor request.stream when building CallContext; was hardcoded to true
- return NotReported (with canonical finish reason) for Responses JSON bodies that omit usage; previously returned None and lost the breadcrumb
- compute UsageOutcome before the empty-content early-return in OpenAI finalize_stream so calls that complete with billed usage but no content (e.g. Responses completed with empty output) still emit cost telemetry

Span attribution:

- thread error_type from map_openai_responses through canonical_finish so failed:<code> stamps error.type=<code> on the span

Bootstrap config:

- drop unused OtelConfig fields (headers, traces_sampler, default_histogram_aggregation); the OTLP exporter and trace SDK already read these env vars on their own in opentelemetry 0.27, so leaving them parsed-but-unused was misleading

Tests:

- TurnMetrics payload usage_status round-trip (present + omitted)
- Responses failed:<code> stamps error.type on span
- Responses JSON without usage returns NotReported with finish reason

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
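As a rough illustration of how the pieces in this commit fit together, the sketch below maps a usage outcome to the optional `usage_status` string and pulls the `error.type` value out of a `failed:<code>` finish reason. The enum shape, field names, and the helper `error_type_from_finish` are assumptions made for the example; the real `UsageOutcome` in `cxtx/src/provider/usage.rs` may look different. Only the string forms (`not_reported`, `error:<class>`, `failed:<code>`) come from this PR.

```rust
/// Hypothetical shape of a parsed usage outcome (fields are assumptions).
#[derive(Debug, Clone, PartialEq)]
pub enum UsageOutcome {
    /// Provider reported token counts: the happy path.
    Reported { input_tokens: u64, output_tokens: u64 },
    /// Provider omitted usage; the finish reason is kept as a breadcrumb.
    NotReported { finish_reason: Option<String> },
    /// The call failed in a classified way, e.g. class = "StreamAborted".
    Error { class: String },
}

impl UsageOutcome {
    /// Returns `None` on the happy path, so an optional wire field can be
    /// omitted, and a tag string for non-happy-path turns.
    pub fn usage_status(&self) -> Option<String> {
        match self {
            UsageOutcome::Reported { .. } => None,
            UsageOutcome::NotReported { .. } => Some("not_reported".to_string()),
            UsageOutcome::Error { class } => Some(format!("error:{}", class)),
        }
    }
}

/// Hypothetical helper: for a canonical finish reason of the form
/// `failed:<code>`, returns `<code>` as the value for `error.type`.
pub fn error_type_from_finish(finish: &str) -> Option<&str> {
    finish
        .strip_prefix("failed:")
        .filter(|code| !code.is_empty())
}
```

Returning `Option<String>` keeps the field additive on the wire: happy-path turns serialize nothing extra, which matches the "present + omitted" round-trip test listed above.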
…ervation

Addresses PR #28 review findings:

* Stream-aborted upstream now routes to OTEL `Error(StreamAborted)` instead of misclassifying as `NotReported`. Adds `stream_aborted: Option<String>` to OpenAi/Anthropic exchange state with a `mark_stream_aborted` setter on the `ExchangeState` wrapper; `proxy.rs` calls it on the SSE-loop Err branch before break. `finalize_stream` checks this first (above malformed-remainder, parse-errors, and status>=400) so the surviving 2xx upstream status no longer routes the abort through the happy path. Partial assistant content is still stored, tagged with the same `error:StreamAborted` usage_status. New `cxtx/tests/stream_aborted.rs` pins both providers, both empty and partial-content abort cases.
* Responses-API streaming now preserves the canonical finish reason when `response.completed` lacks a `usage` object. `absorb_responses_event` derives the reason before the usage check and pushes it into `finish_reasons_raw` so the NotReported partial keeps the incomplete/failed signal that distinguishes a clean stop from `incomplete:length` / `failed:<code>`. New unit tests cover both variants.

CI fixes bundled in:

* `cargo fmt` collapse on `TurnMetrics.usage_status` serde attribute
* Dockerfile cache step now copies `cxdb-otel/Cargo.toml` + dummy src so the workspace dependency resolution succeeds
* server `turn_store/mod.rs` clippy drive-by: `sort_by_key` for the toolchain-bumped `unnecessary_sort_by` lint

271 tests pass (was 261); all 7 cxtx test suites green.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
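The ordering fix described in the first bullet above can be sketched as a small classifier in which the abort flag is checked before anything else, so a surviving 2xx status cannot mask it. This is a simplified illustration, not the actual `finalize_stream`: the `Classification` enum, the fields other than `stream_aborted`, and the relative order of the three later checks are assumptions.

```rust
/// Hypothetical result of classifying a finished stream.
#[derive(Debug, PartialEq)]
pub enum Classification {
    StreamAborted,
    MalformedRemainder,
    ParseError,
    HttpError(u16),
    Completed,
}

/// Hypothetical subset of per-exchange state. Only `stream_aborted` and its
/// setter are named in the PR; the other fields are stand-ins.
#[derive(Default)]
pub struct ExchangeState {
    pub stream_aborted: Option<String>,
    pub malformed_remainder: bool,
    pub parse_errors: usize,
    pub status: u16,
}

impl ExchangeState {
    /// Called from the SSE loop's error branch before breaking out.
    pub fn mark_stream_aborted(&mut self, reason: impl Into<String>) {
        self.stream_aborted = Some(reason.into());
    }
}

/// Checks the abort flag first; the order of the remaining checks is an
/// assumption made for this sketch.
pub fn classify(state: &ExchangeState) -> Classification {
    if state.stream_aborted.is_some() {
        return Classification::StreamAborted;
    }
    if state.malformed_remainder {
        return Classification::MalformedRemainder;
    }
    if state.parse_errors > 0 {
        return Classification::ParseError;
    }
    if state.status >= 400 {
        return Classification::HttpError(state.status);
    }
    Classification::Completed
}
```

With this ordering, a stream that began with a 200 response and then broke mid-flight is classified as aborted rather than completed.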
Summary
Adds OpenTelemetry observability to the cxtx wrapper for per-application LLM cost attribution. Server-side OTEL is intentionally deferred to a follow-up PR — this one ships the cxtx piece, where the cost-attribution KPI lives.
What's in
- `cxdb-otel` crate (workspace top-level): env-gated OTEL bootstrap. Fully inert when `OTEL_EXPORTER_OTLP_ENDPOINT` is unset (no subscriber installed, no exporter spun up); W3C TraceContext + OTLP/gRPC traces + metrics when set. Helpers for tiny_http extraction and reqwest header injection. Lazy `gen_ai.*` metric instrument creation.
- `cxtx/src/otel/`: call context, finish-reason mapping (14 provider → canonical rows), derived buckets, `finalize_llm_call` (single emit site for `chat <model>` spans + `gen_ai.client.token.usage` histogram + `gen_ai.calls` counter, shared by Anthropic and OpenAI provider finalize paths).
- `cxtx/src/provider/usage.rs`: typed `UsageOutcome` parser covering the 16-cell Anthropic/OpenAI matrix (SSE + JSON, ChatCompletions + Responses, with/without `include_usage`) plus aborted-stream classification. Real token counts now stamp `TurnMetrics` instead of zeros.
- `TurnMetrics.usage_status: Option<String>`: msgpack tag 8, tags non-happy-path turns (`not_reported`, `error:<class>`).
- `ContextMetadata.tenant: Option<String>`: msgpack tag 5, used for `app.tenant` OTEL attribution. Empty-string is treated as absent (no sentinels).
- The cxtx HTTP client wraps every outbound call in `inject_reqwest` so W3C `traceparent`/`tracestate` flow downstream. Async delivery worker captures parent context at enqueue and threads it through retries via an explicit-context variant (`ContextGuard` is not `Send`).
- `session.rs` replay-dedup normalization strips telemetry fields so OTEL attribution does not perturb `HistoryItem` equality. Pinned by regression test.
- Fixtures: `cxtx/tests/fixtures/usage/` (16-cell matrix + aborted stream) with a redaction lint test.
- Tests: `otel_emit`, `otel_noop`, `trace_continuity`, `usage_integration`, `usage_matrix`, `fixtures_lint`.

What's NOT in (deferred)
- Server-side OTEL (… `app.tenant` server-side stamping). The server's HTTP layer has substantial drift that requires careful merging, so it goes in a separate PR.

Note: `ContextMetadata.tenant` (msgpack tag 5) is wire-only in this PR. The server's `extract_context_metadata` does not yet read tag 5, and the cached `ContextMetadata` struct in `server/src/store.rs` has no `tenant` field, so list/projection APIs do not surface the tenant. This will land alongside server-side OTEL in the follow-up.

Configuration
OTEL stays fully inert by default. Operators opt in by setting `OTEL_EXPORTER_OTLP_ENDPOINT` (and standard companion env vars: `OTEL_SERVICE_NAME`, `OTEL_RESOURCE_ATTRIBUTES`, `OTEL_TRACES_SAMPLER`, etc.). cxtx adds one app-specific env var: `CXTX_TENANT` for `app.tenant` attribution.

Cardinality control: the `gen_ai.client.token.usage` histogram drops `app.session_id` / `app.user` / `app.wrapper_version` from metric attribute sets (kept on spans).

Test plan
- `cargo check --workspace --all-targets` clean
- `cargo test --workspace`: 271 tests across 27 suites pass, including all 6 new cxtx OTEL suites + the new `stream_aborted` suite
- `otel_noop` confirms unset endpoint installs no subscriber
- `assert_dedup_ignores_telemetry` confirms `CallContext` and tenant do not affect `HistoryItem` equality
- `fixtures_lint` confirms no provider response bodies contain redactable tokens

🤖 Generated with Claude Code