Release v3.4.0 with image variant detection and docs

dbccccccc · dbccccccc · commit 1030232175d3 · 2025-10-28T19:48:55.000+08:00
Major update to v3.4.0: adds automatic Docker image variant detection, a runtime capabilities API, speed adjustment, and ffmpeg-based format conversion. The README and documentation are rewritten to cover the new features, including improved error handling and usage guides. The slim and full Docker images are now clearly differentiated, and the footer is removed from all web pages for a cleaner interface. Version numbers are updated throughout, and the changelog lists all new features and fixes.
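
The variant detection described here can be pictured roughly as follows. This is a minimal sketch, not the contents of `ttsfm/capabilities.py`: the class name `SystemCapabilities`, the `ffmpeg_available` attribute, and the `get_capabilities()["image_variant"]` key appear in this commit, while the `which` parameter, the `supported_formats` and `speed_adjustment` keys, the `get_system_capabilities()` accessor, and the use of `shutil.which` for detection are assumptions made for illustration.

```python
import shutil
from functools import lru_cache
from typing import Callable, Optional

# Format split taken from the changelog: MP3/WAV always, the rest need ffmpeg.
ALWAYS_AVAILABLE = ("mp3", "wav")
FFMPEG_ONLY = ("opus", "aac", "flac", "pcm")


class SystemCapabilities:
    """Probe the runtime once and report which features are usable."""

    def __init__(self, which: Callable[[str], Optional[str]] = shutil.which) -> None:
        # Assumption: ffmpeg counts as available when it is found on PATH.
        # `which` is injectable only so the probe can be faked in tests.
        self.ffmpeg_available = which("ffmpeg") is not None

    def get_capabilities(self) -> dict:
        formats = list(ALWAYS_AVAILABLE)
        if self.ffmpeg_available:
            formats += FFMPEG_ONLY
        return {
            "image_variant": "full" if self.ffmpeg_available else "slim",
            "ffmpeg_available": self.ffmpeg_available,
            "supported_formats": formats,       # hypothetical key
            "speed_adjustment": self.ffmpeg_available,  # hypothetical key
        }


@lru_cache(maxsize=1)
def get_system_capabilities() -> SystemCapabilities:
    """Hypothetical accessor: build the instance once and reuse it (singleton)."""
    return SystemCapabilities()
```

Caching the instance means the PATH lookup runs once per process rather than on every request, which is one plausible reading of the "global singleton" wording in the changelog.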
diff --git a/.github/workflows/docker-build-slim.yml b/.github/workflows/docker-build-slim.yml
@@ -105,7 +105,7 @@ jobs:
             ${{ env.REGISTRY_GHCR }}/${{ env.IMAGE_NAME }}
           tags: |
             type=semver,pattern=v{{version}},suffix=-slim
-            type=semver,pattern=v{{major}}.{{minor}},suffix=-slim,enable=${{ !contains(github.ref, 'alpha') && !contains(github.ref, 'beta') }}
+            type=raw,value=slim,enable=${{ !contains(github.ref, 'alpha') && !contains(github.ref, 'beta') }}
           labels: |
             org.opencontainers.image.source=${{ github.repositoryUrl }}
             org.opencontainers.image.description=Free TTS API server compatible with OpenAI's TTS API format using openai.fm (slim variant without ffmpeg)
diff --git a/CHANGELOG.md b/CHANGELOG.md
@@ -5,6 +5,48 @@ All notable changes to this project will be documented in this file.
 The format is based on [Keep a Changelog](https://keepachangelog.com/en/1.0.0/),
 and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0.html).
 
+## [3.4.0] - 2025-10-28
+
+### Added
+- **Image variant detection system**: Automatic detection of full vs slim Docker images
+  - New `ttsfm/capabilities.py` module with `SystemCapabilities` class
+  - Runtime detection of ffmpeg availability
+  - Global singleton instance for efficient capability checking
+- **New API endpoints**:
+  - `/api/capabilities` - Complete system capabilities report
+  - Enhanced `/api/health` endpoint with image variant information
+- **Comprehensive format support**: 6 audio formats with real ffmpeg-based conversion
+  - Always available: MP3, WAV
+  - Full image only: OPUS, AAC, FLAC, PCM
+- **Speed adjustment**: 0.25x to 4.0x playback speed (requires ffmpeg)
+- **Enhanced error handling**: Clear error messages with helpful hints
+- **Improved web documentation**: Complete rewrite with v3.4.0 features
+  - Docker image variants section
+  - OpenAI-compatible API documentation
+  - System capabilities documentation
+  - Speed adjustment guide
+  - Format conversion details
+  - Long text handling
+  - Python package examples
+  - WebSocket streaming
+  - Error handling reference
+
+### Fixed
+- Slim image error handling: Proper error reporting instead of silent failures
+- RuntimeError exception handling in web API
+- Footer removed from all web pages for cleaner interface
+
+### Changed
+- Improved error response format with structured messages
+- Updated README.md with v3.4.0 features and examples
+- Playground UI enhancements for feature detection
+- Documentation reorganized for better clarity
+
+### Technical
+- Capabilities detection uses singleton pattern
+- Early validation prevents expensive operations
+- All tests passing (25 unit tests + integration tests)
+
 ## [3.4.0-beta1] - 2025-10-28
 
 ### Added
diff --git a/README.md b/README.md
@@ -13,7 +13,29 @@
 
 ## Overview
 
-TTSFM is a free, OpenAI-compatible text-to-speech stack powered by the openai.fm backend. It ships with Python clients, a REST API, and a web playground.
+TTSFM is a free, OpenAI-compatible text-to-speech API service that provides a complete solution for converting text to natural-sounding speech based on OpenAI's GPT-4o mini TTS. Built on top of the openai.fm backend, it offers a powerful Python SDK, RESTful API endpoints, and an intuitive web playground for easy testing and integration.
+
+**What TTSFM Can Do:**
+- 🎤 **Multiple Voices**: Choose from 6 high-quality voices (alloy, echo, fable, onyx, nova, shimmer)
+- 🎵 **Flexible Audio Formats**: Support for 6 audio formats (MP3, WAV, OPUS, AAC, FLAC, PCM)
+- ⚡ **Speed Control**: Adjust playback speed from 0.25x to 4.0x for different use cases
+- 📝 **Long Text Support**: Automatic text splitting and audio combining for content of any length
+- 🔄 **Real-time Streaming**: WebSocket support for streaming audio generation
+- 🐍 **Python SDK**: Easy-to-use synchronous and asynchronous clients
+- 🌐 **Web Playground**: Interactive web interface for testing and experimentation
+- 🐳 **Docker Ready**: Pre-built Docker images for instant deployment
+- 🔍 **Smart Detection**: Automatic capability detection and helpful error messages
+- 🤖 **OpenAI Compatible**: Drop-in replacement for OpenAI's TTS API
+
+**Key Features in v3.4.0:**
+- 🎯 Image variant detection (full vs slim Docker images)
+- 🔍 Runtime capabilities API for feature availability checking
+- ⚡ Speed adjustment with ffmpeg-based audio processing
+- 🎵 Real format conversion for all 6 audio formats
+- 📊 Enhanced error handling with clear, actionable messages
+- 🐳 Dual Docker images optimized for different use cases
+
+> **⚠️ Disclaimer**: This project is intended for **educational and research purposes only**. It is a reverse-engineered implementation of the openai.fm service and should not be used for commercial purposes or in production environments. Users are responsible for ensuring compliance with applicable laws and terms of service.
 
 ## Installation
 
@@ -33,24 +55,32 @@ TTSFM offers two Docker image variants to suit different needs:
 docker run -p 8000:8000 dbcccc/ttsfm:latest
 ```
 
-Includes ffmpeg for advanced features:
-- ✅ MP3 auto-combine for long text
+**Includes ffmpeg for advanced features:**
+- ✅ All 6 audio formats (MP3, WAV, OPUS, AAC, FLAC, PCM)
 - ✅ Speed adjustment (0.25x - 4.0x)
-- ✅ Additional audio formats (AAC, FLAC, OPUS)
+- ✅ Format conversion with ffmpeg
+- ✅ MP3 auto-combine for long text
+- ✅ WAV auto-combine for long text
 
-#### Slim variant
+#### Slim variant - ~100MB
 ```bash
-docker run -p 8000:8000 dbcccc/ttsfm:v3.4.0-alpha1-slim
+docker run -p 8000:8000 dbcccc/ttsfm:slim
 ```
 
-Minimal image without ffmpeg:
-- ✅ Basic TTS (MP3/WAV)
-- ✅ WAV auto-combine (simple concatenation)
-- ❌ No MP3 auto-combine
+**Minimal image without ffmpeg:**
+- ✅ Basic TTS functionality
+- ✅ 2 audio formats (MP3, WAV only)
+- ✅ WAV auto-combine for long text
 - ❌ No speed adjustment
 - ❌ No format conversion
+- ❌ No MP3 auto-combine
 
-The container exposes the web playground at `http://localhost:8000` and an OpenAI-style endpoint at `/v1/audio/speech`.
+The container exposes the web playground at `http://localhost:8000` and an OpenAI-compatible endpoint at `/v1/audio/speech`.
+
+**Check available features:**
+```bash
+curl http://localhost:8000/api/capabilities
+```
 
 ## Quick start
 
@@ -85,12 +115,35 @@ response.save_to_file("fast")  # -> fast.mp3
 ttsfm "Hello, world" --voice nova --format mp3 --output hello.mp3
 ```
 
-### REST API
+### REST API (OpenAI-compatible)
 
 ```bash
-curl -X POST http://localhost:8000/v1/audio/speech   -H "Content-Type: application/json"   -d '{"model":"gpt-4o-mini-tts","input":"Hello world!","voice":"alloy"}'   --output speech.mp3
+# Basic request
+curl -X POST http://localhost:8000/v1/audio/speech \
+  -H "Content-Type: application/json" \
+  -d '{
+    "model": "tts-1",
+    "input": "Hello world!",
+    "voice": "alloy",
+    "response_format": "mp3"
+  }' --output speech.mp3
+
+# With speed adjustment (requires full image)
+curl -X POST http://localhost:8000/v1/audio/speech \
+  -H "Content-Type: application/json" \
+  -d '{
+    "model": "tts-1",
+    "input": "Hello world!",
+    "voice": "alloy",
+    "response_format": "mp3",
+    "speed": 1.5
+  }' --output speech_fast.mp3
 ```
 
+**Available voices:** alloy, echo, fable, onyx, nova, shimmer
+**Available formats:** mp3, wav (always) + opus, aac, flac, pcm (full image only)
+**Speed range:** 0.25 - 4.0 (requires full image)
+
 ## Learn more
 
 - Browse the full API reference and operational notes in the [web documentation](http://localhost:8000/docs) (or see `ttsfm-web/templates/docs.html`).
diff --git a/README.zh.md b/README.zh.md
@@ -13,7 +13,29 @@
 
 ## 概述
 
-TTSFM 是一个免费的 OpenAI 兼容文本转语音解决方案，基于 openai.fm 后端，同时提供 Python 客户端、REST API 与网页端 Playground。
+TTSFM 是一个免费的、兼容 OpenAI 的文本转语音 API 服务，提供将文本转换为自然语音的完整解决方案，使用OpenAI的GPT-4o mini TTS。基于 openai.fm 后端构建，提供强大的 Python SDK、RESTful API 接口以及直观的网页 Playground，方便测试和集成。
+
+**TTSFM 的功能：**
+- 🎤 **多种语音选择**：6 种高质量语音（alloy、echo、fable、onyx、nova、shimmer）
+- 🎵 **灵活的音频格式**：支持 6 种音频格式（MP3、WAV、OPUS、AAC、FLAC、PCM）
+- ⚡ **语速控制**：0.25x 到 4.0x 的播放速度调节，适应不同使用场景
+- 📝 **长文本支持**：自动文本分割和音频合并，支持任意长度内容
+- 🔄 **实时流式传输**：WebSocket 支持流式音频生成
+- 🐍 **Python SDK**：易用的同步和异步客户端
+- 🌐 **网页 Playground**：交互式网页界面，方便测试和实验
+- 🐳 **Docker 就绪**：预构建的 Docker 镜像，即刻部署
+- 🔍 **智能检测**：自动功能检测和友好的错误提示
+- 🤖 **OpenAI 兼容**：可直接替代 OpenAI 的 TTS API
+
+**v3.4.0 版本的主要特性：**
+- 🎯 镜像变体检测（完整版 vs 精简版 Docker 镜像）
+- 🔍 运行时功能 API，检查特性可用性
+- ⚡ 基于 ffmpeg 的语速调节
+- 🎵 所有 6 种音频格式的真实格式转换
+- 📊 增强的错误处理，提供清晰、可操作的错误信息
+- 🐳 针对不同使用场景优化的双镜像版本
+
+> **⚠️ 免责声明**：本项目仅用于**学习和研究目的**。这是对 openai.fm 服务的逆向工程实现，不应用于商业用途或生产环境。用户需自行确保遵守适用的法律法规和服务条款。
 
 ## 安装
 
@@ -33,25 +55,33 @@ TTSFM 提供两种 Docker 镜像变体以满足不同需求：
 docker run -p 8000:8000 dbcccc/ttsfm:latest
 ```
 
-包含 ffmpeg，支持高级功能：
-- ✅ 长文本 MP3 自动合并
+**包含 ffmpeg，支持高级功能：**
+- ✅ 所有 6 种音频格式（MP3、WAV、OPUS、AAC、FLAC、PCM）
 - ✅ 语速调节（0.25x - 4.0x）
-- ✅ 额外音频格式（AAC、FLAC、OPUS）
+- ✅ 使用 ffmpeg 进行格式转换
+- ✅ 长文本 MP3 自动合并
+- ✅ 长文本 WAV 自动合并
 
 #### 精简版
 ```bash
-docker run -p 8000:8000 dbcccc/ttsfm:v3.4.0-alpha1-slim
+docker run -p 8000:8000 dbcccc/ttsfm:slim
 ```
 
-不含 ffmpeg 的最小化镜像：
-- ✅ 基础 TTS（MP3/WAV）
-- ✅ WAV 自动合并（简单拼接）
-- ❌ 不支持 MP3 自动合并
+**不含 ffmpeg 的最小化镜像：**
+- ✅ 基础 TTS 功能
+- ✅ 2 种音频格式（仅 MP3、WAV）
+- ✅ 长文本 WAV 自动合并
 - ❌ 不支持语速调节
 - ❌ 不支持格式转换
+- ❌ 不支持 MP3 自动合并
 
 容器默认开放网页 Playground（`http://localhost:8000`）以及兼容 OpenAI 的 `/v1/audio/speech` 接口。
 
+**检查可用功能：**
+```bash
+curl http://localhost:8000/api/capabilities
+```
+
 ## 快速开始
 
 ### Python 客户端
@@ -85,12 +115,35 @@ response.save_to_file("fast")  # -> fast.mp3
 ttsfm "你好，世界" --voice nova --format mp3 --output hello.mp3
 ```
 
-### REST API
+### REST API（兼容 OpenAI）
 
 ```bash
-curl -X POST http://localhost:8000/v1/audio/speech   -H "Content-Type: application/json"   -d '{"model":"gpt-4o-mini-tts","input":"你好，世界","voice":"alloy"}'   --output speech.mp3
+# 基础请求
+curl -X POST http://localhost:8000/v1/audio/speech \
+  -H "Content-Type: application/json" \
+  -d '{
+    "model": "tts-1",
+    "input": "你好，世界",
+    "voice": "alloy",
+    "response_format": "mp3"
+  }' --output speech.mp3
+
+# 使用语速调节（需要完整版镜像）
+curl -X POST http://localhost:8000/v1/audio/speech \
+  -H "Content-Type: application/json" \
+  -d '{
+    "model": "tts-1",
+    "input": "你好，世界",
+    "voice": "alloy",
+    "response_format": "mp3",
+    "speed": 1.5
+  }' --output speech_fast.mp3
 ```
 
+**可用语音：** alloy、echo、fable、onyx、nova、shimmer
+**可用格式：** mp3、wav（始终可用）+ opus、aac、flac、pcm（仅完整版镜像）
+**语速范围：** 0.25 - 4.0（需要完整版镜像）
+
 ## 了解更多
 
 - 在 [Web 文档](http://localhost:8000/docs)（或 `ttsfm-web/templates/docs.html`）查看完整接口说明与运行注意事项。
diff --git a/pyproject.toml b/pyproject.toml
@@ -86,7 +86,7 @@ ttsfm = "ttsfm.cli:main"
 version_scheme = "no-guess-dev"
 local_scheme = "no-local-version"
 
-fallback_version = "3.4.0-beta1"
+fallback_version = "3.4.0"
 [tool.setuptools]
 packages = ["ttsfm"]
 
diff --git a/ttsfm-web/app.py b/ttsfm-web/app.py
@@ -499,7 +499,7 @@ def get_status():
             {
                 "status": "online",
                 "tts_service": "openai.fm (free)",
-                "package_version": "3.4.0b1",
+                "package_version": "3.4.0",
                 "timestamp": datetime.now().isoformat(),
             }
         )
@@ -526,7 +526,7 @@ def health_check():
     return jsonify(
         {
             "status": "healthy",
-            "package_version": "3.4.0b1",
+            "package_version": "3.4.0",
             "image_variant": caps.get_capabilities()["image_variant"],
             "ffmpeg_available": caps.ffmpeg_available,
             "timestamp": datetime.now().isoformat(),
diff --git a/ttsfm-web/templates/base.html b/ttsfm-web/templates/base.html
@@ -88,7 +88,7 @@
             <a class="navbar-brand" href="{{ url_for('index') }}">
                 <i class="fas fa-microphone-alt me-2"></i>
                 <span class="fw-bold">TTSFM</span>
-                <span class="badge bg-primary ms-2 small">v3.4.0-beta1</span>
+                <span class="badge bg-primary ms-2 small">v3.4.0</span>
             </a>
 
             <button class="navbar-toggler border-0" type="button" data-bs-toggle="collapse" data-bs-target="#navbarNav" aria-controls="navbarNav" aria-expanded="false" aria-label="Toggle navigation">
@@ -147,32 +147,10 @@
     </nav>
 
     <!-- Main Content -->
-    <main id="main-content" style="padding-top: 76px;">
+    <main id="main-content" style="padding-top: 76px; min-height: calc(100vh - 76px);">
         {% block content %}{% endblock %}
     </main>
 
-    <!-- Simplified Footer -->
-    <footer class="footer py-3" style="background-color: #f9fafb; border-top: 1px solid #e5e7eb;" role="contentinfo">
-        <div class="container">
-            <div class="row align-items-center">
-                <div class="col-md-6">
-                    <div class="d-flex align-items-center">
-                        <i class="fas fa-microphone-alt me-2 text-primary"></i>
-                        <strong class="text-dark">TTSFM</strong>
-                        <span class="ms-2 text-muted">v3.4.0-beta1</span>
-                    </div>
-                </div>
-                <div class="col-md-6 text-md-end">
-                    <small class="text-muted">
-                        {{ _('home.footer_copyright') }} •
-                        <a href="{{ url_for('docs') }}" class="text-decoration-none text-muted">{{ _('nav.documentation') }}</a> •
-                        <a href="https://github.com/dbccccccc/ttsfm" class="text-decoration-none text-muted" target="_blank">{{ _('nav.github') }}</a>
-                    </small>
-                </div>
-            </div>
-        </div>
-    </footer>
-
     <!-- Bootstrap JS -->
     <script src="https://cdn.jsdelivr.net/npm/bootstrap@5.1.3/dist/js/bootstrap.bundle.min.js"></script>
 
diff --git a/ttsfm-web/templates/docs.html b/ttsfm-web/templates/docs.html
diff --git a/ttsfm/__init__.py b/ttsfm/__init__.py

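
The changelog in this commit notes that early validation prevents expensive operations and that the slim image now reports errors instead of failing silently. A rough sketch of what such a check could look like is below. The function name `validate_request`, its parameters, and the `error`/`hint` response shape are hypothetical; only the format split (MP3/WAV always, OPUS/AAC/FLAC/PCM on the full image) and the 0.25 to 4.0 speed range come from the diff.

```python
from typing import Optional

ALWAYS_AVAILABLE = {"mp3", "wav"}
FFMPEG_ONLY = {"opus", "aac", "flac", "pcm"}
MIN_SPEED, MAX_SPEED = 0.25, 4.0


def validate_request(
    response_format: str, speed: float, ffmpeg_available: bool
) -> Optional[dict]:
    """Return a structured error, or None when the request can proceed.

    Intended to run before any text is sent to the TTS backend, so an
    unsupported request is rejected without doing the expensive work.
    """
    fmt = response_format.lower()
    if fmt not in ALWAYS_AVAILABLE | FFMPEG_ONLY:
        return {
            "error": f"Unsupported format '{response_format}'",
            "hint": "Use one of: mp3, wav, opus, aac, flac, pcm",
        }
    if not MIN_SPEED <= speed <= MAX_SPEED:
        return {
            "error": f"Speed {speed} is out of range",
            "hint": f"Use a value between {MIN_SPEED} and {MAX_SPEED}",
        }
    if not ffmpeg_available:
        if fmt in FFMPEG_ONLY:
            return {
                "error": f"Format '{fmt}' requires ffmpeg",
                "hint": "Use the full Docker image, or request mp3 or wav",
            }
        if speed != 1.0:
            return {
                "error": "Speed adjustment requires ffmpeg",
                "hint": "Use the full Docker image, or omit the speed parameter",
            }
    return None
```

For example, `validate_request("opus", 1.0, ffmpeg_available=False)` returns an error dictionary on a slim image, while the same call with `ffmpeg_available=True` returns `None`.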