[Tracing] ByteStream objects cause oversized payloads in tracing backends

[Generated by Cursor]
## Description

### Problem
When Haystack's content tracing is enabled and components handle multimodal data (images, audio, or video passed as `ByteStream` objects), the tracing system serializes the full binary data, causing:

1. **Oversized payloads** that exceed backend limits (Langfuse, OpenTelemetry, etc.)
2. **Performance degradation** due to serializing and transmitting megabytes of encoded binary data
3. **Tracing failures** when backends reject large payloads
4. **No practical debugging value** (you rarely need the full image in traces)

`ImageContent` may be affected in the same way, since it carries the image as a base64 string.

### Root Cause

In `haystack/tracing/utils.py`, the `_serializable_value()` function calls `to_dict()` on objects that have it:

```python
from typing import Any


def _serializable_value(value: Any) -> Any:
    if isinstance(value, list):
        return [_serializable_value(v) for v in value]

    if isinstance(value, dict):
        return {k: _serializable_value(v) for k, v in value.items()}

    if getattr(value, "to_dict", None):
        return _serializable_value(value.to_dict())  # ⚠️ Problem here: serializes the full payload

    return value
```

When a `ByteStream` (or any object containing one) is traced:
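- `ByteStream.to_dict()` converts the binary data into a list of integers, one per byte (media types such as `ImageContent` carry a base64 string instead)
- A 1 MB image becomes ~1.3 MB as a base64 string, and several times that once a list of integers is JSON-encoded
- This cost is paid again in every span whose inputs or outputs contain the object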
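To put numbers on the expansion, the sketch below JSON-encodes 1 MiB of bytes both ways. It is a simplification that assumes the serialized value is JSON-encoded before it reaches the backend and that `to_dict()` produces `list(data)`; it uses only the standard library and does not call Haystack.

```python
import base64
import json

# 1 MiB of deterministic "binary" data covering every byte value
payload = bytes(range(256)) * 4096

# Encoding 1: list of integers (as from list(data)), then JSON-encoded
as_int_list = json.dumps(list(payload))

# Encoding 2: base64 string, then JSON-encoded
as_base64 = json.dumps(base64.b64encode(payload).decode("ascii"))

print(f"raw bytes: {len(payload):,}")
print(f"int list:  {len(as_int_list):,} ({len(as_int_list) / len(payload):.2f}x)")
print(f"base64:    {len(as_base64):,} ({len(as_base64) / len(payload):.2f}x)")
```

With this input the integer-list encoding is about 4.57 times the raw size (digits plus `", "` separators) and the base64 string about 1.33 times.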

### Example Scenario

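```python
import os

# Component inputs/outputs are only recorded in spans when content tracing is enabled;
# set it before importing Haystack so the default tracer picks it up.
os.environ["HAYSTACK_CONTENT_TRACING_ENABLED"] = "true"

from haystack import tracing
from haystack.dataclasses import ByteStream

# Assumed to exist already:
# - `tracer`: a configured haystack.tracing.Tracer (e.g. from the Langfuse or OpenTelemetry integration)
# - `pipeline`: a Pipeline with a component named "image_converter" that accepts ByteStream `sources`
tracing.enable_tracing(tracer)

# Wrap a large image in a ByteStream
with open("large_image.png", "rb") as f:
    image_data = f.read()  # 5 MB image
bytestream = ByteStream(data=image_data, mime_type="image/png")

pipeline.run({"image_converter": {"sources": [bytestream]}})

# Result: the full image payload (~6.7 MB even as base64, several times that as a JSON list
# of integers) lands in EACH span whose inputs or outputs contain this ByteStream.
# Langfuse/OpenTelemetry may reject the payload or time out.
```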

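The problem is **recursive**: `_serializable_value()` descends into lists and dicts and calls `to_dict()` on every nested object, so any structure that carries media (a list of `ByteStream` sources, a `ChatMessage` with image content, a `Document` with a `blob`) gets its full payload serialized too.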

---

## Proposed Solution

Add special handling for `ByteStream` objects before calling `to_dict()`:

```python
from typing import Any, Optional


def _serializable_value(value: Any) -> Any:
    # Special handling for ByteStream to avoid oversized payloads.
    # Matching on the class name (instead of isinstance) avoids importing haystack.dataclasses here.
    if type(value).__name__ == "ByteStream":
        return {
            "type": "ByteStream",
            "mime_type": getattr(value, "mime_type", None),
            "size_bytes": len(getattr(value, "data", b"")),
            "meta": _serializable_value(getattr(value, "meta", {})),
            # Optional: small preview for text content
            "preview": _get_text_preview(value, max_bytes=100),
        }

    if isinstance(value, list):
        return [_serializable_value(v) for v in value]

    if isinstance(value, dict):
        return {k: _serializable_value(v) for k, v in value.items()}

    if getattr(value, "to_dict", None):
        return _serializable_value(value.to_dict())

    return value


def _get_text_preview(bytestream: Any, max_bytes: int = 100) -> Optional[str]:
    """Return a short preview of ByteStream data if its MIME type is text/*, else None."""
    try:
        mime_type = getattr(bytestream, "mime_type", "")
        if mime_type and mime_type.startswith("text/"):
            data = getattr(bytestream, "data", b"")
            # Truncate before decoding; errors="ignore" drops a multi-byte character cut at the boundary
            preview = data[:max_bytes].decode("utf-8", errors="ignore")
            return preview + "..." if len(data) > max_bytes else preview
    except Exception:
        # The preview is best-effort and must never break tracing
        pass
    return None
```
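One caveat with the name check: it only fires when `_serializable_value()` receives the `ByteStream` object itself. If a container's `to_dict()` has already turned the stream into a plain dict (as could happen with a `Document` whose `blob` holds a `ByteStream`), the raw bytes slip through. The sketch below demonstrates this with simplified stand-ins: `ByteStream` is a minimal imitation of the Haystack class, `Container` is hypothetical, and `_serializable_value` is reduced to the name check plus the recursion.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, Optional


@dataclass
class ByteStream:
    """Simplified stand-in for haystack.dataclasses.ByteStream (not the real class)."""

    data: bytes
    meta: Dict[str, Any] = field(default_factory=dict)
    mime_type: Optional[str] = None

    def to_dict(self) -> Dict[str, Any]:
        # Mirrors the list-of-integers encoding described above
        return {"data": list(self.data), "meta": self.meta, "mime_type": self.mime_type}


@dataclass
class Container:
    """Hypothetical container whose to_dict() serializes its ByteStream eagerly."""

    blob: ByteStream

    def to_dict(self) -> Dict[str, Any]:
        return {"blob": self.blob.to_dict()}


def _serializable_value(value: Any) -> Any:
    # Reduced version of the proposed fix: the ByteStream check plus the recursion
    if type(value).__name__ == "ByteStream":
        return {"type": "ByteStream", "mime_type": value.mime_type, "size_bytes": len(value.data)}
    if isinstance(value, list):
        return [_serializable_value(v) for v in value]
    if isinstance(value, dict):
        return {k: _serializable_value(v) for k, v in value.items()}
    if getattr(value, "to_dict", None):
        return _serializable_value(value.to_dict())
    return value


stream = ByteStream(data=b"\x00" * 1024, mime_type="image/png")

# Passed directly (or inside a list/dict), the ByteStream is summarized
direct = _serializable_value({"sources": [stream]})
print(direct)  # {'sources': [{'type': 'ByteStream', 'mime_type': 'image/png', 'size_bytes': 1024}]}

# Wrapped in a container, to_dict() runs first and the raw byte list survives
nested = _serializable_value(Container(blob=stream))
print(len(nested["blob"]["data"]))  # 1024
```

Catching the nested case would need either a check on the already-serialized form or container classes that expose their own trace summary.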

### Alternative: Add `to_trace_dict()` Method

Add a tracing-specific serialization protocol:

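```python
from typing import Any


def _serializable_value(value: Any) -> Any:
    if isinstance(value, list):
        return [_serializable_value(v) for v in value]

    if isinstance(value, dict):
        return {k: _serializable_value(v) for k, v in value.items()}

    # Check for trace-specific serialization first
    if getattr(value, "to_trace_dict", None):
        return _serializable_value(value.to_trace_dict())

    # Fall back to the regular serialization
    if getattr(value, "to_dict", None):
        return _serializable_value(value.to_dict())

    return value
```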

`ByteStream` can then implement a `to_trace_dict()` method that returns a lightweight summary, leaving `to_dict()` unchanged.
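A minimal sketch of the protocol, assuming `to_trace_dict()` is the hypothetical method name proposed above: a simplified stand-in `ByteStream` keeps its full `to_dict()` and adds a payload-free summary, and a hypothetical `DocumentLike` container delegates to its child's summary. Delegation is what keeps nested streams out of traces, because the tracing check reaches the container's `to_trace_dict()` before any `to_dict()` has serialized the blob.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, Optional


@dataclass
class ByteStream:
    """Simplified stand-in for haystack.dataclasses.ByteStream (not the real class)."""

    data: bytes
    meta: Dict[str, Any] = field(default_factory=dict)
    mime_type: Optional[str] = None

    def to_dict(self) -> Dict[str, Any]:
        # Full serialization, still available for persistence
        return {"data": list(self.data), "meta": self.meta, "mime_type": self.mime_type}

    def to_trace_dict(self) -> Dict[str, Any]:
        # Hypothetical tracing summary: metadata only, no payload
        return {
            "type": "ByteStream",
            "mime_type": self.mime_type,
            "size_bytes": len(self.data),
            "meta": self.meta,
        }


@dataclass
class DocumentLike:
    """Hypothetical container holding an optional ByteStream, shaped like a document with a blob."""

    content: Optional[str] = None
    blob: Optional[ByteStream] = None

    def to_trace_dict(self) -> Dict[str, Any]:
        # Delegate to the child's summary so the raw bytes never reach the tracer
        return {
            "content": self.content,
            "blob": self.blob.to_trace_dict() if self.blob is not None else None,
        }


doc = DocumentLike(content="caption", blob=ByteStream(data=b"\xff" * 2048, mime_type="image/jpeg"))
print(doc.to_trace_dict())
# {'content': 'caption', 'blob': {'type': 'ByteStream', 'mime_type': 'image/jpeg', 'size_bytes': 2048, 'meta': {}}}
```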

---

## Impact

**Affected Users:**
- Anyone using multimodal pipelines with tracing enabled
- Vision/audio/video processing applications
- RAG systems that index images or PDFs containing media

**Severity:** High - This can make tracing completely unusable for multimodal applications

**Workaround:** Users must currently implement custom serializers or monkey-patch Haystack's tracing code
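Until a fix lands, a monkey-patch along these lines could serve as a stopgap. It is a sketch that relies on the private `haystack.tracing.utils._serializable_value` function quoted in the Root Cause section, which may change between releases; `make_bytestream_safe` is a hypothetical helper, and the patch is skipped when Haystack or that function is not available.

```python
from typing import Any, Callable


def make_bytestream_safe(original: Callable[[Any], Any]) -> Callable[[Any], Any]:
    """Wrap a serializer so ByteStream-like objects are summarized instead of fully serialized."""

    def patched(value: Any) -> Any:
        # Same name-based check as the proposed fix
        if type(value).__name__ == "ByteStream":
            return {
                "type": "ByteStream",
                "mime_type": getattr(value, "mime_type", None),
                "size_bytes": len(getattr(value, "data", b"")),
            }
        return original(value)

    return patched


try:
    import haystack.tracing.utils as tracing_utils
except ImportError:  # Haystack not installed: nothing to patch
    tracing_utils = None

if tracing_utils is not None and hasattr(tracing_utils, "_serializable_value"):
    # Rebinding the module attribute also affects the original function's recursive calls,
    # because it looks up `_serializable_value` in its module globals at call time.
    tracing_utils._serializable_value = make_bytestream_safe(tracing_utils._serializable_value)
```

Because the wrapper sits in front of the original function, `ByteStream` objects nested in lists and dicts are summarized too; streams already converted by a container's `to_dict()` are not.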

---

## Environment
- Haystack version: 2.x
- Tracing backend: Langfuse, OpenTelemetry (affects all backends)
- Python version: 3.9+

---

## Additional Context

This issue is particularly problematic because:
1. ByteStream is the recommended way to handle media in Haystack 2.x
2. Multimodal LLMs are becoming increasingly common
3. The serialization happens automatically and silently - users may not realize why tracing is failing
4. The fix is straightforward and backward compatible

Related: the same problem could affect other large data structures as well (embeddings, large documents, etc.)

[Tracing] ByteStream objects cause oversized payloads in tracing backends #10063
