chore: blob export reference by sumerman · Pull Request #2771 · langfuse/langfuse-docs

sumerman · 2026-04-02T12:20:44Z

No description provided.

vercel · 2026-04-02T12:20:49Z

The latest updates on your projects. Learn more about Vercel for GitHub.

| Project | Deployment | Actions | Updated (UTC) |
|---|---|---|---|
| langfuse-docs | Ready | Preview, Comment | Apr 2, 2026 3:23pm |

github-actions · 2026-04-02T12:20:53Z

@claude review

claude · 2026-04-02T15:33:47Z

content/docs/api-and-data-platform/features/blob-storage-export-fields.mdx

+| `name` | string | User-defined observation name. | Group/filter by name (e.g. function name, model call label). |
+| `metadata` | object | User-supplied key-value metadata. | Arbitrary context. Extract keys relevant to your analytics. |
+| `level` | string | Log level: `DEBUG`, `DEFAULT`, `WARNING`, `ERROR`. | Filter for errors or warnings. |
+| `status_message` | string | Status or error message. | Inspect for debugging failed observations. |
+| `version` | string | User-provided version string set via the SDK. | Informational. |
+| `input` | string | Observation input payload. | For generations: the prompt/messages sent to the LLM. May be plain text or JSON; may be large. |
+| `output` | string | Observation output payload. | For generations: the LLM response. May be plain text or JSON; may be large. |
+| `provided_model_name` | string | Model name as provided by the user/SDK. | The raw model string (e.g. `gpt-4o`, `claude-sonnet-4-20250514`). This is what the API returns as `model`. |
+| `model_parameters` | string | Model call parameters as a JSON-encoded string (e.g. `"{\"temperature\":0.7}"`). | Parse as JSON. Useful for analyzing how model settings affect quality/cost. |
+| `usage_details` | object (string → integer) | Token usage breakdown by category. | Extract keys: `input` for input tokens, `output` for output tokens, `total` for total. May contain additional keys like `input_cached_tokens`, `reasoning_tokens`, etc. |
+| `cost_details` | object (string → number) | Cost breakdown by category (USD). | Extract keys: `input` for input cost, `output` for output cost. |


🔴 Eight fields in the observations table are documented as non-nullable but can be null in practice: `name`, `version`, `input`, and `output` are optional SDK parameters that may be absent on any observation type; `provided_model_name` and `model_parameters` are GENERATION-specific and null for SPAN/EVENT observations; `usage_details` and `cost_details` are token/cost fields that are likewise null for non-GENERATION observations. All eight should be annotated with "or null" to match the document's own convention for adjacent nullable fields such as `model_id`, `prompt_version`, and `end_time`.

Extended reasoning

What the bug is and how it manifests

The observations table in the new `blob-storage-export-fields.mdx` documents eight fields without a null annotation, implying they are non-nullable. However, all eight can be null in real exports, depending on the observation type and on how the SDK was called. The document itself establishes a clear "X or null" convention for nullable fields (e.g., `parent_observation_id`: string or null, `end_time`: string (timestamp) or null, `model_id`: string or null, `prompt_version`: integer or null), so omitting it on these eight fields leaves the table internally inconsistent.

The specific code paths that trigger nulls

Fields `name`, `version`, `input`, `output` (lines 81-87): these are all optional parameters in the Langfuse SDK. A user can call `langfuse.span(trace_id=..., start_time=..., end_time=...)` or `langfuse.event(trace_id=..., start_time=...)` without providing any of them. EVENT observations in particular are point-in-time markers that often have no name, no input, and no output. The doc's own usage notes reinforce this: the `input` note says "For generations: the prompt/messages sent to the LLM" and the `output` note says "For generations: the LLM response". Both explicitly scope these fields to the GENERATION type, implying they may be absent (null) on SPAN and EVENT rows. The inconsistency is visible within the same table: `prompt_version` is correctly typed integer or null, but the analogous `version` field is plain string.

Fields `provided_model_name`, `model_parameters` (lines 88-89): these are LLM/model-specific fields. The `type` field documentation lists three values, SPAN, GENERATION, or EVENT, with the note "Generations are LLM calls; spans are arbitrary operations; events are point-in-time markers." Only GENERATION observations involve an LLM call and therefore have a model name or call parameters; SPAN and EVENT rows will be null for both fields. The document already acknowledges this pattern for the derived field `model_id`, which is correctly typed string or null with the note "Null if no model definition matched". If the resolved ID is nullable, `provided_model_name` is even more clearly nullable for non-GENERATION observations, which never supply a model at all.
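To illustrate what this means for a consumer, here is a minimal Python sketch that decodes `model_parameters` while tolerating null. It assumes each exported observation has already been parsed into a dict (for example from a JSON Lines export) and that a null field arrives either as `None` or as a missing key; `parse_model_parameters` is a hypothetical helper, not part of any Langfuse SDK.

```python
import json
from typing import Any, Optional


def parse_model_parameters(row: dict[str, Any]) -> Optional[dict[str, Any]]:
    """Decode the JSON-encoded model_parameters string of one exported row.

    Returns None when the field is null or missing (expected for SPAN and
    EVENT rows) or when the decoded JSON is not an object.
    """
    raw = row.get("model_parameters")
    if raw is None or raw == "":
        return None
    decoded = json.loads(raw)
    return decoded if isinstance(decoded, dict) else None


generation = {"type": "GENERATION", "model_parameters": '{"temperature": 0.7}'}
span = {"type": "SPAN", "model_parameters": None}
print(parse_model_parameters(generation))  # {'temperature': 0.7}
print(parse_model_parameters(span))        # None
```

Returning `None` rather than an empty dict keeps "no parameters recorded" distinguishable from "parameters recorded but empty".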

Fields `usage_details`, `cost_details` (lines 90-91): these track LLM token counts and monetary costs. The `usage_details` description says "Extract keys: `input` for input tokens, `output` for output tokens", and tokens only exist for GENERATION calls. A SPAN tracking a retrieval step or an EVENT marker has no token usage and no cost. All adjacent model-specific pricing fields (`input_price`, `output_price`, `total_price`, `usage_pricing_tier_name`) are already correctly documented as "or null". Documenting `usage_details` and `cost_details` as non-nullable while all of their derived pricing fields are nullable is internally contradictory.
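A consumer that aggregates tokens and cost across mixed observation types therefore has to treat both maps as optional. The sketch below assumes each row is a dict parsed from the export, with a null map arriving as `None` or a missing key; `summarize_usage` is a hypothetical helper, and it reads only the `input` and `output` keys named in the field reference, ignoring extras such as `reasoning_tokens`.

```python
from typing import Any, Iterable


def summarize_usage(rows: Iterable[dict[str, Any]]) -> dict[str, float]:
    """Sum input/output tokens and input/output cost across exported rows.

    usage_details and cost_details may be None or absent (SPAN/EVENT rows),
    so each falls back to an empty map instead of being dereferenced blindly.
    """
    totals = {"input_tokens": 0, "output_tokens": 0, "cost_usd": 0.0}
    for row in rows:
        usage = row.get("usage_details") or {}
        costs = row.get("cost_details") or {}
        totals["input_tokens"] += usage.get("input", 0)
        totals["output_tokens"] += usage.get("output", 0)
        totals["cost_usd"] += costs.get("input", 0.0) + costs.get("output", 0.0)
    return totals


rows = [
    {"type": "GENERATION",
     "usage_details": {"input": 120, "output": 30, "total": 150},
     "cost_details": {"input": 0.0006, "output": 0.0009}},
    {"type": "SPAN", "usage_details": None, "cost_details": None},
    {"type": "EVENT"},  # key missing entirely
]
summary = summarize_usage(rows)
print(summary["input_tokens"], summary["output_tokens"])  # 120 30
```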

Why existing documentation does not prevent it

This is a new documentation file added entirely by this PR, and no automated linter or schema validator checks type accuracy in MDX prose tables. The inconsistency arose from applying "or null" correctly to many nullable fields but omitting it from these eight.
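Since nothing checks these tables automatically, one option is a small script that compares the documented nullability against a sample of real export rows. In the sketch below, `DOCUMENTED_NULLABLE` is a hand transcription of the nullability of the fields discussed in this review; the helper name and the dict-per-row input are assumptions, not existing Langfuse tooling.

```python
from typing import Any, Iterable

# Nullability as currently documented (True = the table says "or null"),
# transcribed by hand for the fields discussed in this review.
DOCUMENTED_NULLABLE = {
    "name": False,
    "version": False,
    "input": False,
    "output": False,
    "provided_model_name": False,
    "model_parameters": False,
    "usage_details": False,
    "cost_details": False,
    "model_id": True,
    "prompt_version": True,
}


def undocumented_nulls(
    rows: Iterable[dict[str, Any]], documented: dict[str, bool]
) -> dict[str, set[str]]:
    """Map each field documented as non-nullable to the observation types
    on which it was actually null (None or missing) in the sample."""
    found: dict[str, set[str]] = {}
    for row in rows:
        for field, nullable in documented.items():
            if not nullable and row.get(field) is None:
                found.setdefault(field, set()).add(row.get("type", "UNKNOWN"))
    return found


sample = [
    {"type": "GENERATION", "name": "chat", "version": "v1", "input": "hi",
     "output": "hello", "provided_model_name": "gpt-4o",
     "model_parameters": '{"temperature": 0.7}',
     "usage_details": {"input": 1}, "cost_details": {"input": 0.0}},
    {"type": "SPAN", "name": "retrieve"},
]
for field, types in sorted(undocumented_nulls(sample, DOCUMENTED_NULLABLE).items()):
    print(field, sorted(types))
```

Any field it prints is a candidate for an "or null" annotation in the reference table.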

Impact

Data engineers using this reference to define warehouse schemas will reasonably read a type without "or null" as meaning the column is always populated. A data engineer who writes `name TEXT NOT NULL, input TEXT NOT NULL, provided_model_name TEXT NOT NULL` in their DDL, or who unconditionally dereferences `usage_details` keys in their pipeline, will hit NOT NULL constraint violations or null-dereference errors on the first batch containing any non-GENERATION observation, or any observation created without these optional fields.
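The DDL failure mode can be reproduced with Python's built-in `sqlite3` module standing in for a warehouse. This is an illustrative sketch: the four-column table and the two sample rows are hypothetical, chosen only to show a strict schema rejecting a SPAN row that a nullable schema accepts.

```python
import sqlite3

# Two hypothetical export rows: a GENERATION with a model, a SPAN without one.
ROWS = [
    ("obs-1", "GENERATION", "chat", "gpt-4o"),
    ("obs-2", "SPAN", "retrieve", None),
]

STRICT_DDL = """CREATE TABLE observations (
    id TEXT PRIMARY KEY,
    type TEXT NOT NULL,
    name TEXT NOT NULL,
    provided_model_name TEXT NOT NULL
)"""

NULLABLE_DDL = """CREATE TABLE observations (
    id TEXT PRIMARY KEY,
    type TEXT NOT NULL,
    name TEXT,
    provided_model_name TEXT
)"""


def try_load(ddl: str) -> str:
    """Create the table in a throwaway in-memory DB and insert ROWS."""
    conn = sqlite3.connect(":memory:")
    try:
        conn.execute(ddl)
        conn.executemany("INSERT INTO observations VALUES (?, ?, ?, ?)", ROWS)
        (count,) = conn.execute("SELECT COUNT(*) FROM observations").fetchone()
        return f"loaded {count} rows"
    except sqlite3.IntegrityError as exc:
        return f"failed: {exc}"
    finally:
        conn.close()


print(try_load(STRICT_DDL))    # reports a NOT NULL constraint failure
print(try_load(NULLABLE_DDL))  # loaded 2 rows
```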

Step-by-step proof

A user instruments a RAG pipeline: SPAN observations for retrieval, GENERATION observations for LLM calls, EVENT observations for cache hits.

In the export, every SPAN and EVENT row has null for `provided_model_name`, `model_parameters`, `usage_details`, and `cost_details`. Many rows also have null for `name`, `input`, `output`, and `version`, since those were not provided.

A data engineer reads the observations table and sees `name`: string, `input`: string, `provided_model_name`: string, `usage_details`: object, none of them marked "or null".

They create a warehouse table with NOT NULL constraints on these columns, plus pipeline code that always accesses `usage_details["input"]`.

The first export load fails: NOT NULL violations on every non-GENERATION row, and null-dereference errors when parsing `usage_details` on SPAN/EVENT rows.

How to fix

Change all eight fields to include "or null": `name`: string or null, `version`: string or null, `input`: string or null, `output`: string or null, `provided_model_name`: string or null, `model_parameters`: string or null, `usage_details`: object (string → integer) or null, `cost_details`: object (string → number) or null. Optionally, add usage notes clarifying that `provided_model_name`, `model_parameters`, `usage_details`, and `cost_details` are null for non-GENERATION observations.
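For pipelines written in Python, the corrected nullability could be mirrored in a typed row model. The dataclass below is a hypothetical sketch covering only `type` and the eight fields in question; unknown keys are dropped and absent fields default to `None`, so SPAN and EVENT rows load without special-casing.

```python
from dataclasses import dataclass, fields
from typing import Any, Optional


@dataclass
class ObservationRow:
    """Hypothetical typed view of one exported observation, limited to the
    fields discussed in this review; each of the eight may be None."""

    type: str
    name: Optional[str] = None
    version: Optional[str] = None
    input: Optional[str] = None
    output: Optional[str] = None
    provided_model_name: Optional[str] = None
    model_parameters: Optional[str] = None  # JSON-encoded string
    usage_details: Optional[dict[str, int]] = None
    cost_details: Optional[dict[str, float]] = None

    @classmethod
    def from_export(cls, raw: dict[str, Any]) -> "ObservationRow":
        """Build a row from a parsed export record, ignoring unknown keys."""
        known = {f.name for f in fields(cls)}
        return cls(**{k: v for k, v in raw.items() if k in known})


span = ObservationRow.from_export({"type": "SPAN", "name": "retrieve", "trace_id": "t-1"})
print(span.usage_details)  # None
```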

dosubot bot added the size:L label (This PR changes 100-499 lines, ignoring generated files.) Apr 2, 2026

dosubot bot added the documentation label (Improvements or additions to documentation) Apr 2, 2026

vercel bot deployed to Preview April 2, 2026 12:22

claude bot reviewed Apr 2, 2026

content/docs/api-and-data-platform/features/blob-storage-export-fields.mdx (resolved)

content/docs/api-and-data-platform/features/blob-storage-export-fields.mdx (outdated, resolved)

vercel bot deployed to Preview April 2, 2026 12:31

claude bot reviewed Apr 2, 2026

content/docs/api-and-data-platform/features/blob-storage-export-fields.mdx (resolved)

Copilot started work on behalf of sumerman April 2, 2026 15:17

Copilot stopped work on behalf of sumerman due to an error April 2, 2026 15:18
The session was cancelled by the user.

chore: blob export reference

2029874

sumerman force-pushed the valeriy/blob-export-field-reference branch from 54ac538 to 2029874 April 2, 2026 15:21

vercel bot deployed to Preview April 2, 2026 15:23

claude bot reviewed Apr 2, 2026

sumerman wants to merge 1 commit into main from valeriy/blob-export-field-reference