Commit 1030232

Release v3.4.0 with image variant detection and docs
Major update to v3.4.0: adds automatic Docker image variant detection, runtime capabilities API, speed adjustment, and real format conversion. README and documentation are rewritten to reflect new features, including improved error handling and usage guides. Slim and full Docker images are now clearly differentiated, and all web pages have a cleaner interface with the footer removed. Version numbers updated throughout, and changelog details all new features and fixes.
1 parent dd59d6b commit 1030232

9 files changed, +593 -526 lines changed

.github/workflows/docker-build-slim.yml

Lines changed: 1 addition & 1 deletion
@@ -105,7 +105,7 @@ jobs:
     ${{ env.REGISTRY_GHCR }}/${{ env.IMAGE_NAME }}
   tags: |
     type=semver,pattern=v{{version}},suffix=-slim
-    type=semver,pattern=v{{major}}.{{minor}},suffix=-slim,enable=${{ !contains(github.ref, 'alpha') && !contains(github.ref, 'beta') }}
+    type=raw,value=slim,enable=${{ !contains(github.ref, 'alpha') && !contains(github.ref, 'beta') }}
   labels: |
     org.opencontainers.image.source=${{ github.repositoryUrl }}
     org.opencontainers.image.description=Free TTS API server compatible with OpenAI's TTS API format using openai.fm (slim variant without ffmpeg)
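
The tag rules in this hunk amount to a small decision: every release gets a versioned `-slim` tag, and the floating `slim` tag is added only when the git ref contains neither `alpha` nor `beta`. A minimal Python sketch of that logic follows. The function name `slim_tags` and its arguments are illustrative and not part of the workflow; the real evaluation is done by the workflow's metadata step.

```python
from typing import List


def slim_tags(ref: str, version: str) -> List[str]:
    """Approximate the tags the slim workflow would publish for a git ref.

    Assumption: this mirrors only the two tag rules in the workflow hunk;
    it is a reading aid, not the workflow's actual implementation.
    """
    # Rule 1: type=semver,pattern=v{{version}},suffix=-slim (always emitted)
    tags = [f"v{version}-slim"]
    # Rule 2: type=raw,value=slim, enabled only for non-alpha, non-beta refs
    is_prerelease = "alpha" in ref or "beta" in ref
    if not is_prerelease:
        tags.append("slim")
    return tags


print(slim_tags("refs/tags/v3.4.0", "3.4.0"))              # ['v3.4.0-slim', 'slim']
print(slim_tags("refs/tags/v3.4.0-beta1", "3.4.0-beta1"))  # ['v3.4.0-beta1-slim']
```

With this change, stable releases move the floating `slim` tag, which is what lets the README use `dbcccc/ttsfm:slim` instead of a pinned pre-release tag.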

CHANGELOG.md

Lines changed: 42 additions & 0 deletions
@@ -5,6 +5,48 @@ All notable changes to this project will be documented in this file.
 The format is based on [Keep a Changelog](https://keepachangelog.com/en/1.0.0/),
 and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0.html).

+## [3.4.0] - 2025-10-28
+
+### Added
+- **Image variant detection system**: Automatic detection of full vs slim Docker images
+  - New `ttsfm/capabilities.py` module with `SystemCapabilities` class
+  - Runtime detection of ffmpeg availability
+  - Global singleton instance for efficient capability checking
+- **New API endpoints**:
+  - `/api/capabilities` - Complete system capabilities report
+  - Enhanced `/api/health` endpoint with image variant information
+- **Comprehensive format support**: 6 audio formats with real ffmpeg-based conversion
+  - Always available: MP3, WAV
+  - Full image only: OPUS, AAC, FLAC, PCM
+- **Speed adjustment**: 0.25x to 4.0x playback speed (requires ffmpeg)
+- **Enhanced error handling**: Clear error messages with helpful hints
+- **Improved web documentation**: Complete rewrite with v3.4.0 features
+  - Docker image variants section
+  - OpenAI-compatible API documentation
+  - System capabilities documentation
+  - Speed adjustment guide
+  - Format conversion details
+  - Long text handling
+  - Python package examples
+  - WebSocket streaming
+  - Error handling reference
+
+### Fixed
+- Slim image error handling: Proper error reporting instead of silent failures
+- RuntimeError exception handling in web API
+- Footer removed from all web pages for cleaner interface
+
+### Changed
+- Improved error response format with structured messages
+- Updated README.md with v3.4.0 features and examples
+- Playground UI enhancements for feature detection
+- Documentation reorganized for better clarity
+
+### Technical
+- Capabilities detection uses singleton pattern
+- Early validation prevents expensive operations
+- All tests passing (25 unit tests + integration tests)
+
 ## [3.4.0-beta1] - 2025-10-28

 ### Added
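
As a rough illustration of the runtime detection and singleton described in the 3.4.0 entry, the sketch below probes for ffmpeg with `shutil.which` and caches one shared instance. Only the class name `SystemCapabilities`, the `get_capabilities()` method, the `ffmpeg_available` attribute, and the `image_variant` key appear in this commit. The constructor argument, the other dictionary keys, the `"full"`/`"slim"` values, and `get_shared_capabilities` are assumptions made for the example and should not be read as the contents of `ttsfm/capabilities.py`.

```python
import shutil
from functools import lru_cache
from typing import Optional

ALWAYS_AVAILABLE = ("mp3", "wav")             # formats listed as always available
FFMPEG_ONLY = ("opus", "aac", "flac", "pcm")  # formats listed as full image only


class SystemCapabilities:
    """Hypothetical stand-in for the class named in the changelog."""

    def __init__(self, ffmpeg_available: Optional[bool] = None):
        if ffmpeg_available is None:
            # shutil.which returns None when no ffmpeg executable is on PATH.
            ffmpeg_available = shutil.which("ffmpeg") is not None
        self.ffmpeg_available = ffmpeg_available

    def get_capabilities(self) -> dict:
        formats = list(ALWAYS_AVAILABLE)
        if self.ffmpeg_available:
            formats.extend(FFMPEG_ONLY)
        return {
            # Assumed values: the commit does not show how the variant is named.
            "image_variant": "full" if self.ffmpeg_available else "slim",
            "ffmpeg_available": self.ffmpeg_available,
            "supported_formats": formats,                # assumed key
            "speed_adjustment": self.ffmpeg_available,   # assumed key
        }


@lru_cache(maxsize=1)
def get_shared_capabilities() -> SystemCapabilities:
    """Return one shared instance so the PATH lookup runs only once."""
    return SystemCapabilities()
```

Caching the instance is one simple way to get singleton behaviour in Python; passing `ffmpeg_available` explicitly keeps the class testable on machines with or without ffmpeg.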

README.md

Lines changed: 66 additions & 13 deletions
@@ -13,7 +13,29 @@

 ## Overview

-TTSFM is a free, OpenAI-compatible text-to-speech stack powered by the openai.fm backend. It ships with Python clients, a REST API, and a web playground.
+TTSFM is a free, OpenAI-compatible text-to-speech API service that provides a complete solution for converting text to natural-sounding speech based on OpenAI's GPT-4o mini TTS. Built on top of the openai.fm backend, it offers a powerful Python SDK, RESTful API endpoints, and an intuitive web playground for easy testing and integration.
+
+**What TTSFM Can Do:**
+- 🎤 **Multiple Voices**: Choose from 6 high-quality voices (alloy, echo, fable, onyx, nova, shimmer)
+- 🎵 **Flexible Audio Formats**: Support for 6 audio formats (MP3, WAV, OPUS, AAC, FLAC, PCM)
+- ⚡ **Speed Control**: Adjust playback speed from 0.25x to 4.0x for different use cases
+- 📝 **Long Text Support**: Automatic text splitting and audio combining for content of any length
+- 🔄 **Real-time Streaming**: WebSocket support for streaming audio generation
+- 🐍 **Python SDK**: Easy-to-use synchronous and asynchronous clients
+- 🌐 **Web Playground**: Interactive web interface for testing and experimentation
+- 🐳 **Docker Ready**: Pre-built Docker images for instant deployment
+- 🔍 **Smart Detection**: Automatic capability detection and helpful error messages
+- 🤖 **OpenAI Compatible**: Drop-in replacement for OpenAI's TTS API
+
+**Key Features in v3.4.0:**
+- 🎯 Image variant detection (full vs slim Docker images)
+- 🔍 Runtime capabilities API for feature availability checking
+- ⚡ Speed adjustment with ffmpeg-based audio processing
+- 🎵 Real format conversion for all 6 audio formats
+- 📊 Enhanced error handling with clear, actionable messages
+- 🐳 Dual Docker images optimized for different use cases
+
+> **⚠️ Disclaimer**: This project is intended for **educational and research purposes only**. It is a reverse-engineered implementation of the openai.fm service and should not be used for commercial purposes or in production environments. Users are responsible for ensuring compliance with applicable laws and terms of service.

 ## Installation

@@ -33,24 +55,32 @@ TTSFM offers two Docker image variants to suit different needs:
 docker run -p 8000:8000 dbcccc/ttsfm:latest
 ```

-Includes ffmpeg for advanced features:
-- ✅ MP3 auto-combine for long text
+**Includes ffmpeg for advanced features:**
+- ✅ All 6 audio formats (MP3, WAV, OPUS, AAC, FLAC, PCM)
 - ✅ Speed adjustment (0.25x - 4.0x)
-- ✅ Additional audio formats (AAC, FLAC, OPUS)
+- ✅ Format conversion with ffmpeg
+- ✅ MP3 auto-combine for long text
+- ✅ WAV auto-combine for long text

-#### Slim variant
+#### Slim variant - ~100MB
 ```bash
-docker run -p 8000:8000 dbcccc/ttsfm:v3.4.0-alpha1-slim
+docker run -p 8000:8000 dbcccc/ttsfm:slim
 ```

-Minimal image without ffmpeg:
-- ✅ Basic TTS (MP3/WAV)
-- ✅ WAV auto-combine (simple concatenation)
-- ❌ No MP3 auto-combine
+**Minimal image without ffmpeg:**
+- ✅ Basic TTS functionality
+- ✅ 2 audio formats (MP3, WAV only)
+- ✅ WAV auto-combine for long text
 - ❌ No speed adjustment
 - ❌ No format conversion
+- ❌ No MP3 auto-combine

-The container exposes the web playground at `http://localhost:8000` and an OpenAI-style endpoint at `/v1/audio/speech`.
+The container exposes the web playground at `http://localhost:8000` and an OpenAI-compatible endpoint at `/v1/audio/speech`.
+
+**Check available features:**
+```bash
+curl http://localhost:8000/api/capabilities
+```

 ## Quick start

@@ -85,12 +115,35 @@ response.save_to_file("fast") # -> fast.mp3
 ttsfm "Hello, world" --voice nova --format mp3 --output hello.mp3
 ```

-### REST API
+### REST API (OpenAI-compatible)

 ```bash
-curl -X POST http://localhost:8000/v1/audio/speech -H "Content-Type: application/json" -d '{"model":"gpt-4o-mini-tts","input":"Hello world!","voice":"alloy"}' --output speech.mp3
+# Basic request
+curl -X POST http://localhost:8000/v1/audio/speech \
+  -H "Content-Type: application/json" \
+  -d '{
+    "model": "tts-1",
+    "input": "Hello world!",
+    "voice": "alloy",
+    "response_format": "mp3"
+  }' --output speech.mp3
+
+# With speed adjustment (requires full image)
+curl -X POST http://localhost:8000/v1/audio/speech \
+  -H "Content-Type: application/json" \
+  -d '{
+    "model": "tts-1",
+    "input": "Hello world!",
+    "voice": "alloy",
+    "response_format": "mp3",
+    "speed": 1.5
+  }' --output speech_fast.mp3
 ```

+**Available voices:** alloy, echo, fable, onyx, nova, shimmer
+**Available formats:** mp3, wav (always) + opus, aac, flac, pcm (full image only)
+**Speed range:** 0.25 - 4.0 (requires full image)
+
 ## Learn more

 - Browse the full API reference and operational notes in the [web documentation](http://localhost:8000/docs) (or see `ttsfm-web/templates/docs.html`).
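
For readers who prefer Python to `curl`, the REST request added to the README can be sketched with the standard library alone. The endpoint, field names, voices, formats, and speed range come from the README text in this diff. The helper names `build_speech_payload` and `synthesize`, the `ffmpeg_available` flag, and the client-side validation are assumptions for illustration; they are not the `ttsfm` package's SDK.

```python
import json
import urllib.request

VOICES = {"alloy", "echo", "fable", "onyx", "nova", "shimmer"}
BASIC_FORMATS = {"mp3", "wav"}                      # always available
FULL_ONLY_FORMATS = {"opus", "aac", "flac", "pcm"}  # full image only


def build_speech_payload(text, voice="alloy", response_format="mp3",
                         speed=None, ffmpeg_available=True):
    """Build the JSON body for POST /v1/audio/speech, rejecting bad input early."""
    if voice not in VOICES:
        raise ValueError(f"unknown voice: {voice}")
    allowed = set(BASIC_FORMATS)
    if ffmpeg_available:
        allowed |= FULL_ONLY_FORMATS
    if response_format not in allowed:
        raise ValueError(f"format not available on this image: {response_format}")
    payload = {
        "model": "tts-1",
        "input": text,
        "voice": voice,
        "response_format": response_format,
    }
    if speed is not None:
        if not ffmpeg_available:
            raise ValueError("speed adjustment requires the full image")
        if not 0.25 <= speed <= 4.0:
            raise ValueError("speed must be between 0.25 and 4.0")
        payload["speed"] = speed
    return payload


def synthesize(payload, base_url="http://localhost:8000"):
    """Send the request and return the raw audio bytes (needs a running server)."""
    request = urllib.request.Request(
        f"{base_url}/v1/audio/speech",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request) as response:
        return response.read()
```

A typical call would be `synthesize(build_speech_payload("Hello world!", speed=1.5))`, with the returned bytes written to a file such as `speech_fast.mp3`. Validating on the client is optional, since the changelog says the server now reports these errors itself.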

README.zh.md

Lines changed: 64 additions & 11 deletions
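@@ -13,7 +13,29 @@

 ## Overview

-TTSFM is a free, OpenAI-compatible text-to-speech solution built on the openai.fm backend; it provides Python clients, a REST API, and a web Playground.
+TTSFM is a free, OpenAI-compatible text-to-speech API service that provides a complete solution for converting text into natural speech, using OpenAI's GPT-4o mini TTS. Built on the openai.fm backend, it offers a powerful Python SDK, RESTful API endpoints, and an intuitive web Playground for easy testing and integration.
+
+**What TTSFM can do:**
+- 🎤 **Multiple voices**: 6 high-quality voices (alloy, echo, fable, onyx, nova, shimmer)
+- 🎵 **Flexible audio formats**: Supports 6 audio formats (MP3, WAV, OPUS, AAC, FLAC, PCM)
+- ⚡ **Speed control**: Playback speed adjustable from 0.25x to 4.0x for different use cases
+- 📝 **Long text support**: Automatic text splitting and audio merging for content of any length
+- 🔄 **Real-time streaming**: WebSocket support for streaming audio generation
+- 🐍 **Python SDK**: Easy-to-use synchronous and asynchronous clients
+- 🌐 **Web Playground**: Interactive web interface for testing and experimentation
+- 🐳 **Docker ready**: Pre-built Docker images for instant deployment
+- 🔍 **Smart detection**: Automatic capability detection and friendly error messages
+- 🤖 **OpenAI compatible**: Drop-in replacement for OpenAI's TTS API
+
+**Key features in v3.4.0:**
+- 🎯 Image variant detection (full vs slim Docker images)
+- 🔍 Runtime capabilities API for checking feature availability
+- ⚡ ffmpeg-based speed adjustment
+- 🎵 Real format conversion for all 6 audio formats
+- 📊 Enhanced error handling with clear, actionable error messages
+- 🐳 Dual image variants optimized for different use cases
+
+> **⚠️ Disclaimer**: This project is intended for **learning and research purposes only**. It is a reverse-engineered implementation of the openai.fm service and should not be used commercially or in production environments. Users are responsible for complying with applicable laws, regulations, and terms of service.

 ## Installation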

@@ -33,25 +55,33 @@ TTSFM offers two Docker image variants to suit different needs:
 docker run -p 8000:8000 dbcccc/ttsfm:latest
 ```

-Includes ffmpeg for advanced features:
-- ✅ MP3 auto-combine for long text
+**Includes ffmpeg for advanced features:**
+- ✅ All 6 audio formats (MP3, WAV, OPUS, AAC, FLAC, PCM)
 - ✅ Speed adjustment (0.25x - 4.0x)
-- ✅ Additional audio formats (AAC, FLAC, OPUS)
+- ✅ Format conversion with ffmpeg
+- ✅ MP3 auto-combine for long text
+- ✅ WAV auto-combine for long text

 #### Slim variant
 ```bash
-docker run -p 8000:8000 dbcccc/ttsfm:v3.4.0-alpha1-slim
+docker run -p 8000:8000 dbcccc/ttsfm:slim
 ```

-Minimal image without ffmpeg:
-- ✅ Basic TTS (MP3/WAV)
-- ✅ WAV auto-combine (simple concatenation)
-- ❌ No MP3 auto-combine
+**Minimal image without ffmpeg:**
+- ✅ Basic TTS functionality
+- ✅ 2 audio formats (MP3 and WAV only)
+- ✅ WAV auto-combine for long text
 - ❌ No speed adjustment
 - ❌ No format conversion
+- ❌ No MP3 auto-combine

 By default the container exposes the web Playground (`http://localhost:8000`) and an OpenAI-compatible `/v1/audio/speech` endpoint.

+**Check available features:**
+```bash
+curl http://localhost:8000/api/capabilities
+```
+
 ## Quick start

 ### Python client
@@ -85,12 +115,35 @@ response.save_to_file("fast") # -> fast.mp3
 ttsfm "你好,世界" --voice nova --format mp3 --output hello.mp3
 ```

-### REST API
+### REST API (OpenAI-compatible)

 ```bash
-curl -X POST http://localhost:8000/v1/audio/speech -H "Content-Type: application/json" -d '{"model":"gpt-4o-mini-tts","input":"你好,世界","voice":"alloy"}' --output speech.mp3
+# Basic request
+curl -X POST http://localhost:8000/v1/audio/speech \
+  -H "Content-Type: application/json" \
+  -d '{
+    "model": "tts-1",
+    "input": "你好,世界",
+    "voice": "alloy",
+    "response_format": "mp3"
+  }' --output speech.mp3
+
+# With speed adjustment (requires the full image)
+curl -X POST http://localhost:8000/v1/audio/speech \
+  -H "Content-Type: application/json" \
+  -d '{
+    "model": "tts-1",
+    "input": "你好,世界",
+    "voice": "alloy",
+    "response_format": "mp3",
+    "speed": 1.5
+  }' --output speech_fast.mp3
 ```

+**Available voices:** alloy, echo, fable, onyx, nova, shimmer
+**Available formats:** mp3, wav (always available) + opus, aac, flac, pcm (full image only)
+**Speed range:** 0.25 - 4.0 (requires the full image)
+
 ## Learn more

 - See the [web documentation](http://localhost:8000/docs) (or `ttsfm-web/templates/docs.html`) for the full API reference and operational notes.
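
Both READMEs suggest checking `/api/capabilities` before relying on ffmpeg-only features. A client might use that report to pick a safe output format, as in the sketch below. The response schema of `/api/capabilities` is not shown in this commit, so the `ffmpeg_available` key is an assumption borrowed from the `/api/health` payload, and the helper names `fetch_capabilities` and `choose_format` are hypothetical.

```python
import json
import urllib.request

FFMPEG_FORMATS = {"opus", "aac", "flac", "pcm"}  # full image only, per the README


def fetch_capabilities(base_url="http://localhost:8000"):
    """GET /api/capabilities and decode the JSON body (needs a running server)."""
    with urllib.request.urlopen(f"{base_url}/api/capabilities") as response:
        return json.loads(response.read().decode("utf-8"))


def choose_format(preferred, capabilities):
    """Fall back to MP3 when the preferred format needs ffmpeg and it is absent."""
    # Assumption: the report carries an `ffmpeg_available` boolean.
    has_ffmpeg = bool(capabilities.get("ffmpeg_available", False))
    if preferred in FFMPEG_FORMATS and not has_ffmpeg:
        return "mp3"
    return preferred


print(choose_format("flac", {"ffmpeg_available": False}))  # mp3
print(choose_format("flac", {"ffmpeg_available": True}))   # flac
```

Falling back to MP3 is only one possible policy; a stricter client could raise an error instead, which matches the server's new explicit error reporting on the slim image.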

pyproject.toml

Lines changed: 1 addition & 1 deletion
@@ -86,7 +86,7 @@ ttsfm = "ttsfm.cli:main"
 version_scheme = "no-guess-dev"
 local_scheme = "no-local-version"

-fallback_version = "3.4.0-beta1"
+fallback_version = "3.4.0"
 [tool.setuptools]
 packages = ["ttsfm"]

ttsfm-web/app.py

Lines changed: 2 additions & 2 deletions
@@ -499,7 +499,7 @@ def get_status():
         {
             "status": "online",
             "tts_service": "openai.fm (free)",
-            "package_version": "3.4.0b1",
+            "package_version": "3.4.0",
             "timestamp": datetime.now().isoformat(),
         }
     )
@@ -526,7 +526,7 @@ def health_check():
     return jsonify(
         {
             "status": "healthy",
-            "package_version": "3.4.0b1",
+            "package_version": "3.4.0",
             "image_variant": caps.get_capabilities()["image_variant"],
             "ffmpeg_available": caps.ffmpeg_available,
             "timestamp": datetime.now().isoformat(),

ttsfm-web/templates/base.html

Lines changed: 2 additions & 24 deletions
@@ -88,7 +88,7 @@
 <a class="navbar-brand" href="{{ url_for('index') }}">
     <i class="fas fa-microphone-alt me-2"></i>
     <span class="fw-bold">TTSFM</span>
-    <span class="badge bg-primary ms-2 small">v3.4.0-beta1</span>
+    <span class="badge bg-primary ms-2 small">v3.4.0</span>
 </a>

 <button class="navbar-toggler border-0" type="button" data-bs-toggle="collapse" data-bs-target="#navbarNav" aria-controls="navbarNav" aria-expanded="false" aria-label="Toggle navigation">
@@ -147,32 +147,10 @@
 </nav>

 <!-- Main Content -->
-<main id="main-content" style="padding-top: 76px;">
+<main id="main-content" style="padding-top: 76px; min-height: calc(100vh - 76px);">
     {% block content %}{% endblock %}
 </main>

-<!-- Simplified Footer -->
-<footer class="footer py-3" style="background-color: #f9fafb; border-top: 1px solid #e5e7eb;" role="contentinfo">
-    <div class="container">
-        <div class="row align-items-center">
-            <div class="col-md-6">
-                <div class="d-flex align-items-center">
-                    <i class="fas fa-microphone-alt me-2 text-primary"></i>
-                    <strong class="text-dark">TTSFM</strong>
-                    <span class="ms-2 text-muted">v3.4.0-beta1</span>
-                </div>
-            </div>
-            <div class="col-md-6 text-md-end">
-                <small class="text-muted">
-                    {{ _('home.footer_copyright') }} •
-                    <a href="{{ url_for('docs') }}" class="text-decoration-none text-muted">{{ _('nav.documentation') }}</a>
-                    <a href="https://github.com/dbccccccc/ttsfm" class="text-decoration-none text-muted" target="_blank">{{ _('nav.github') }}</a>
-                </small>
-            </div>
-        </div>
-    </div>
-</footer>
-
 <!-- Bootstrap JS -->
 <script src="https://cdn.jsdelivr.net/npm/[email protected]/dist/js/bootstrap.bundle.min.js"></script>
