
Feature Request: Native video file input support in Responses API (parity with Google Gemini & Volcengine) #1778

@leeclouddragon


Summary

The OpenAI Responses API currently supports images, PDFs, documents, spreadsheets, and code files as input_file, but does not support video files (mp4, webm, mov, etc.) as native input. Google Gemini and Volcengine Seed2 have supported native video input for over a year.

The gap

GPT-4o launch (May 2024) explicitly stated the model accepts "any combination of text, audio, image, and video inputs." The demo showed real-time video understanding. ChatGPT's Advanced Voice Mode supports live camera input today.

Nearly 2 years later, the API still has no video input support.

Current input_file accepted types (from docs):

  • PDF, DOCX, PPTX, XLSX, CSV ✅
  • Text, code, markdown ✅
  • Images (via input_image) ✅
  • Video: not listed, not supported ❌
  • Audio files as input_file: not supported ❌ (only via the Realtime API or input_audio in Chat Completions)

The official cookbook recommends extracting video frames with ffmpeg and sending them as an image array — a workaround, not a solution. This loses temporal information, audio track, and motion context.
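For reference, the workaround looks roughly like this — a sketch, assuming `ffmpeg` on PATH and the `openai` Python SDK; the sampling rate, file names, and `extract_frames`/`build_frame_content` helpers are illustrative, not from the cookbook verbatim:

```python
import base64
import glob
import subprocess

def extract_frames(video_path: str, fps: int = 1) -> list[str]:
    """Sample frames with ffmpeg and return them as base64-encoded JPEGs."""
    subprocess.run(
        ["ffmpeg", "-y", "-i", video_path, "-vf", f"fps={fps}", "frame_%04d.jpg"],
        check=True,
    )
    frames = []
    for path in sorted(glob.glob("frame_*.jpg")):
        with open(path, "rb") as f:
            frames.append(base64.b64encode(f.read()).decode("utf-8"))
    return frames

def build_frame_content(frames_b64: list[str], prompt: str) -> list[dict]:
    """Build a Responses API content array: one input_image per sampled frame."""
    content = [
        {"type": "input_image", "image_url": f"data:image/jpeg;base64,{b64}"}
        for b64 in frames_b64
    ]
    content.append({"type": "input_text", "text": prompt})
    return content

# frames = extract_frames("video.mp4")
# client.responses.create(
#     model="gpt-4o",
#     input=[{"role": "user",
#             "content": build_frame_content(frames, "Describe this video")}],
# )
```

Note what the model receives here: a bag of still images sampled at 1 fps, with no audio track and no notion of frame timing or ordering beyond array position.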

What competitors offer

Google Gemini (available since Gemini 1.5, mid-2024)

import time
from google import genai

client = genai.Client()  # reads GEMINI_API_KEY from the environment

# Upload video via the File API
video_file = client.files.upload(file="video.mp4")

# Video files are processed asynchronously; wait until the file is ready
while video_file.state.name == "PROCESSING":
    time.sleep(2)
    video_file = client.files.get(name=video_file.name)

# Pass the uploaded file directly to generate_content
response = client.models.generate_content(
    model="gemini-2.5-flash",
    contents=[video_file, "Describe what happens in this video"],
)
  • Native video input with audio
  • Supports mp4, webm, mov, avi, etc.
  • Model sees full temporal + audio information
  • Up to 1 hour of video

Volcengine Seed2

  • input_video field in API request
  • Native video understanding with audio

Proposed API

Following the existing input_file pattern:

{
  "role": "user",
  "content": [
    {
      "type": "input_file",
      "file_id": "file-xxxxx"
    },
    {
      "type": "input_text",
      "text": "Describe what happens in this video"
    }
  ]
}

Or add video/mp4, video/webm, video/quicktime to the accepted MIME types for input_file. The Files API already supports arbitrary uploads with purpose: "user_data".
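Under this proposal, Python SDK usage would mirror today's PDF flow. A sketch — the request shape below is hypothetical, since video MIME types are not accepted today, and the model name is illustrative:

```python
def build_video_request(file_id: str, prompt: str, model: str = "gpt-4o") -> dict:
    """Assemble the proposed Responses API request body for a video upload."""
    return {
        "model": model,
        "input": [{
            "role": "user",
            "content": [
                {"type": "input_file", "file_id": file_id},
                {"type": "input_text", "text": prompt},
            ],
        }],
    }

# With the openai SDK this would become (hypothetical — rejected today):
# from openai import OpenAI
# client = OpenAI()
# video = client.files.create(file=open("video.mp4", "rb"), purpose="user_data")
# resp = client.responses.create(**build_video_request(video.id, "Describe this video"))
# print(resp.output_text)
```

The upload step already works; it is only the `responses.create` call that rejects video file IDs, which is why this feels like a small lift on the API surface.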

Why this matters

  1. Video editing / understanding products cannot use OpenAI as a provider for their core workflows
  2. Frame extraction workaround loses audio, temporal context, and motion — the model is literally blind to what happens between frames
  3. Competitive gap is widening — Gemini has had this for 1.5 years, and OpenAI's own ChatGPT product already has video understanding via Advanced Voice Mode
  4. The developer community has been asking for this since May 2024 with zero official response.

Scope

At minimum:

  • Accept video files (mp4, webm) via input_file in the Responses API
  • Extract and process both visual frames and audio track natively

Stretch:

  • Support in Realtime API for streaming video input
  • Video file support in the Files API with purpose: "user_data"
