[Go SDK] Multimodal helpers missing: audio and generic file inputs not implemented #440

@santoshkumarradha

Summary

The Go SDK AI package lacks WithAudioFile, WithAudioURL, and generic WithFile multimodal helper options that exist in the Python SDK, forcing Go agents to hand-build content parts for audio-capable models.

Context

sdk/go/ai/ exposes WithImageFile and WithImageURL but has no equivalent for audio inputs or generic file types. The Python SDK's multimodal.py supports audio and generic file content parts, enabling agents to work with Anthropic, Gemini, and OpenAI audio models using the same ergonomic option style. Go agents targeting these models must manually construct ContentPart slices, bypassing the SDK's abstraction layer and duplicating provider-specific encoding logic. This parity gap will grow as multimodal models expand.

Scope

In Scope

  • Add WithAudioFile(path string, mediaType string) RequestOption that reads a local audio file, base64-encodes it, and appends the appropriate content part.
  • Add WithAudioURL(url string, mediaType string) RequestOption for URL-referenced audio.
  • Add WithFile(path string, mediaType string) RequestOption as a generic file content part helper.
  • Ensure the content part format is compatible with at least the OpenAI and Anthropic provider schemas already supported by the image helpers.
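The three helpers listed above could share one read-and-encode path. The following is a minimal sketch, not the SDK's actual code: `ContentPart`, `Request`, the error-returning `RequestOption` shape, the `"audio"` / `"file"` type strings, and the `newInlinePart` / `withLocalFile` helpers are all assumptions made for illustration, since the real definitions in sdk/go/ai/ are not shown in this issue.

```go
package main

import (
	"encoding/base64"
	"fmt"
	"os"
)

// ContentPart is a hypothetical stand-in for the SDK's content part type.
type ContentPart struct {
	Type      string // "audio" or "file" in this sketch
	MediaType string // passed explicitly by the caller
	Data      string // base64 payload, set for local files
	URL       string // set for URL-referenced parts
}

// Request is a hypothetical stand-in for the SDK's request type.
type Request struct {
	Parts []ContentPart
}

// RequestOption assumes a functional-option shape that can report errors.
type RequestOption func(*Request) error

// newInlinePart base64-encodes raw bytes into a content part.
func newInlinePart(partType, mediaType string, raw []byte) ContentPart {
	return ContentPart{
		Type:      partType,
		MediaType: mediaType,
		Data:      base64.StdEncoding.EncodeToString(raw),
	}
}

// withLocalFile is the shared path for WithAudioFile and WithFile.
// Nothing is appended to the request if the file cannot be read.
func withLocalFile(partType, path, mediaType string) RequestOption {
	return func(r *Request) error {
		raw, err := os.ReadFile(path)
		if err != nil {
			return fmt.Errorf("read %s: %w", path, err)
		}
		r.Parts = append(r.Parts, newInlinePart(partType, mediaType, raw))
		return nil
	}
}

// WithAudioFile attaches a local audio file as a base64 content part.
func WithAudioFile(path, mediaType string) RequestOption {
	return withLocalFile("audio", path, mediaType)
}

// WithFile attaches any local file as a base64 content part.
func WithFile(path, mediaType string) RequestOption {
	return withLocalFile("file", path, mediaType)
}

// WithAudioURL attaches a URL-referenced audio part without fetching it.
func WithAudioURL(url, mediaType string) RequestOption {
	return func(r *Request) error {
		r.Parts = append(r.Parts, ContentPart{Type: "audio", MediaType: mediaType, URL: url})
		return nil
	}
}
```

Under these assumptions a caller would apply `WithAudioFile("clip.wav", "audio/wav")` to a request and check the returned error. If the existing image helpers use a different option signature, the new helpers should follow that one instead.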

Out of Scope

  • Implementing audio transcription or output generation — input content parts only.
  • Adding video content part helpers — a separate follow-up.
  • Changing the existing WithImageFile / WithImageURL behavior.

Files

  • sdk/go/ai/multimodal.go — add WithAudioFile, WithAudioURL, WithFile option functions
  • sdk/go/ai/request.go — extend ContentPart type / provider serialization to handle audio and file part types if not already present
  • sdk/go/ai/multimodal_test.go — unit tests: each helper correctly encodes content, attaches correct media type, produces valid provider-specific JSON
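Provider serialization is where the explicit media type has to be translated. As a hedged illustration, the sketch below assumes the OpenAI Chat Completions `input_audio` content part, which takes a short format name such as `wav` or `mp3` rather than a MIME type. The struct names and the `openAIAudioPartJSON` function are hypothetical, and the exact schema should be checked against current provider documentation. Other providers' audio schemas are not shown here.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// openAIInputAudio mirrors the assumed "input_audio" payload.
type openAIInputAudio struct {
	Data   string `json:"data"`   // base64-encoded audio bytes
	Format string `json:"format"` // short format name, not a MIME type
}

// openAIAudioPart mirrors the assumed content part envelope.
type openAIAudioPart struct {
	Type       string           `json:"type"`
	InputAudio openAIInputAudio `json:"input_audio"`
}

// openAIAudioPartJSON maps a caller-supplied media type to the short
// format name and marshals the part. Unknown media types are rejected
// rather than guessed.
func openAIAudioPartJSON(mediaType, b64 string) ([]byte, error) {
	var format string
	switch mediaType {
	case "audio/wav", "audio/x-wav":
		format = "wav"
	case "audio/mpeg", "audio/mp3":
		format = "mp3"
	default:
		return nil, fmt.Errorf("unsupported audio media type %q", mediaType)
	}
	return json.Marshal(openAIAudioPart{
		Type:       "input_audio",
		InputAudio: openAIInputAudio{Data: b64, Format: format},
	})
}
```

A unit test in multimodal_test.go could compare the marshaled bytes against a golden string for each supported media type, which covers the "valid provider-specific JSON" requirement without a network call.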

Acceptance Criteria

  • WithAudioFile reads a local audio file and attaches it as a base64-encoded content part with the specified media type
  • WithAudioURL attaches a URL-referenced audio content part
  • WithFile attaches a generic file content part (base64-encoded)
  • All three helpers produce request bodies that are accepted by at least OpenAI and Anthropic audio-capable endpoints (validate against provider schemas or mock round-trip)
  • Tests pass (go test ./sdk/go/...)
  • Linting passes (make lint)

Notes for Contributors

Severity: MEDIUM

Use sdk/python/agentfield/ai/multimodal.py as the reference implementation. Check the existing WithImageFile implementation for the base64-encode + content-part-append pattern and reuse it rather than duplicating it. The caller should pass the media type explicitly; do not try to detect it from the file extension. This keeps the helper simple and avoids magic.
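Because the media type comes straight from the caller, a light syntax check can catch typos early without adding any detection logic. This is an optional idea, not something the issue requires, and `validateAudioMediaType` is a hypothetical name. The sketch relies on the standard library's `mime.ParseMediaType`, which on its own accepts a bare token such as `wav`, so the prefix check is what enforces the `audio/` type.

```go
package main

import (
	"fmt"
	"mime"
	"strings"
)

// validateAudioMediaType checks that a caller-supplied media type is
// well-formed and belongs to the audio/ family. It never inspects the
// file name or contents.
func validateAudioMediaType(mediaType string) error {
	parsed, _, err := mime.ParseMediaType(mediaType)
	if err != nil {
		return fmt.Errorf("invalid media type %q: %w", mediaType, err)
	}
	// ParseMediaType lowercases the type, so a plain prefix check is enough.
	if !strings.HasPrefix(parsed, "audio/") {
		return fmt.Errorf("media type %q is not an audio type", mediaType)
	}
	return nil
}
```

WithFile would skip the prefix check, since a generic file part may carry any media type.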

Labels: area:ai (AI/LLM integration), enhancement (New feature or request), sdk:go (Go SDK related)
