Commit dd59d6b

Add system capabilities detection and slim image support
Introduces a new capabilities module for runtime detection of ffmpeg and image variant, with API endpoints for feature discovery. Enhances error handling and validation for ffmpeg-dependent features, updates UI to reflect available features and image variant, and bumps version to 3.4.0-beta1. Improves user feedback for unavailable features and adds comprehensive tests for slim image scenarios.
1 parent f6c8c51 commit dd59d6b

7 files changed: +346 -8 lines changed

CHANGELOG.md

Lines changed: 50 additions & 0 deletions
@@ -5,6 +5,56 @@ All notable changes to this project will be documented in this file.
 The format is based on [Keep a Changelog](https://keepachangelog.com/en/1.0.0/),
 and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0.html).

+## [3.4.0-beta1] - 2025-10-28
+
+### Added
+- **Image variant detection system**: Automatic detection of full vs slim Docker images
+  - New `ttsfm/capabilities.py` module with `SystemCapabilities` class
+  - Runtime detection of ffmpeg availability using `shutil.which("ffmpeg")`
+  - Global singleton instance via `get_capabilities()` function
+- **New API endpoints for feature discovery**:
+  - `/api/capabilities` - Returns complete system capabilities report
+    - `ffmpeg_available`: Boolean indicating ffmpeg availability
+    - `image_variant`: "full" or "slim"
+    - `features`: Dictionary of available features (speed_adjustment, format_conversion, mp3_auto_combine, basic_formats)
+    - `supported_formats`: List of available audio formats
+  - Enhanced `/api/health` endpoint with `image_variant` and `ffmpeg_available` fields
+- **Early validation for ffmpeg-dependent features**:
+  - Advanced formats (OPUS, AAC, FLAC, PCM) checked before processing
+  - Speed adjustment (speed != 1.0) validated before processing
+  - MP3 auto-combine for long text validated before processing
+  - Returns 400 error with helpful hints when features unavailable
+- **Playground UI enhancements for slim image**:
+  - Automatic capabilities loading on page load
+  - Image variant badge in navbar ("Full Image" green / "Slim Image" yellow)
+  - Speed slider disabled with tooltip when ffmpeg unavailable
+  - Advanced format options disabled and marked "(requires full image)"
+  - Error messages include hints from API responses
+- **Comprehensive test scripts**:
+  - `scripts/test_slim_image.py` - Integration tests against running server
+  - `scripts/test_slim_simulation.py` - Unit tests with mocked ffmpeg unavailability
+
+### Fixed
+- **Slim image error handling**: Slim image now properly reports errors instead of failing silently
+  - Clear error messages for unavailable features
+  - Helpful hints directing users to full Docker image
+  - Proper HTTP 400 status codes with structured error responses
+- **RuntimeError exception handling**: Web API now catches ffmpeg-related errors from audio_processing module
+
+### Changed
+- **Improved error response format**: All feature unavailability errors now include:
+  - `message`: Clear description of the issue
+  - `type`: "feature_unavailable_error"
+  - `code`: "ffmpeg_required"
+  - `hint`: Helpful suggestion to use full Docker image
+  - `available_formats`: List of supported formats (when applicable)
+
+### Technical
+- Capabilities detection uses singleton pattern for efficiency
+- Early validation prevents expensive operations when features unavailable
+- Playground JavaScript loads capabilities asynchronously
+- All 25 tests passing plus new integration and simulation tests
+
 ## [3.4.0-alpha4] - 2025-10-28

 ### Added
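
The changelog entry above names a `SystemCapabilities` class and a `get_capabilities()` singleton in `ttsfm/capabilities.py`, but that file is not shown on this page. The following is a minimal sketch of what such a module might look like, not the project's actual implementation. Only `shutil.which("ffmpeg")`, the `ffmpeg_available` attribute, the method names `get_capabilities()` and `get_supported_formats()`, and the report keys come from the diff. The choice of MP3 and WAV as the ffmpeg-free formats and the rule "variant is full exactly when ffmpeg is found" are assumptions.

```python
import shutil
from typing import Any, Dict, List, Optional

# Assumption: formats that work without ffmpeg. The diff only confirms
# that opus, aac, flac and pcm DO require it.
BASIC_FORMATS = ["mp3", "wav"]
FFMPEG_FORMATS = ["opus", "aac", "flac", "pcm"]


class SystemCapabilities:
    """Hypothetical stand-in for the class in ttsfm/capabilities.py."""

    def __init__(self) -> None:
        # shutil.which returns the executable's path, or None if it is
        # not found on PATH.
        self.ffmpeg_available: bool = shutil.which("ffmpeg") is not None

    def get_supported_formats(self) -> List[str]:
        if self.ffmpeg_available:
            return BASIC_FORMATS + FFMPEG_FORMATS
        return list(BASIC_FORMATS)

    def get_capabilities(self) -> Dict[str, Any]:
        ffmpeg = self.ffmpeg_available
        return {
            "ffmpeg_available": ffmpeg,
            # Assumption: the variant is inferred from ffmpeg presence alone.
            "image_variant": "full" if ffmpeg else "slim",
            "features": {
                "speed_adjustment": ffmpeg,
                "format_conversion": ffmpeg,
                "mp3_auto_combine": ffmpeg,
                "basic_formats": True,
            },
            "supported_formats": self.get_supported_formats(),
        }


_capabilities: Optional[SystemCapabilities] = None


def get_capabilities() -> SystemCapabilities:
    """Create the detector on first use and reuse it afterwards."""
    global _capabilities
    if _capabilities is None:
        _capabilities = SystemCapabilities()
    return _capabilities
```

With a singleton like this, detection runs once per process, so installing ffmpeg into a running container would only be picked up after a restart.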

pyproject.toml

Lines changed: 1 addition & 1 deletion
@@ -86,7 +86,7 @@ ttsfm = "ttsfm.cli:main"
 version_scheme = "no-guess-dev"
 local_scheme = "no-local-version"

-fallback_version = "3.4.0-alpha4"
+fallback_version = "3.4.0-beta1"
 [tool.setuptools]
 packages = ["ttsfm"]

ttsfm-web/app.py

Lines changed: 103 additions & 3 deletions
@@ -44,6 +44,7 @@
 try:
     from ttsfm import AudioFormat, TTSClient, Voice
     from ttsfm.audio import combine_audio_chunks
+    from ttsfm.capabilities import get_capabilities
     from ttsfm.exceptions import (
         APIException,
         AudioProcessingException,
@@ -58,6 +59,7 @@
     sys.path.insert(0, os.path.join(os.path.dirname(__file__), ".."))
     from ttsfm import AudioFormat, TTSClient, Voice
     from ttsfm.audio import combine_audio_chunks
+    from ttsfm.capabilities import get_capabilities
     from ttsfm.exceptions import APIException, NetworkException, ValidationException
     from ttsfm.utils import split_text_by_length

@@ -497,7 +499,7 @@ def get_status():
             {
                 "status": "online",
                 "tts_service": "openai.fm (free)",
-                "package_version": "3.4.0a4",
+                "package_version": "3.4.0b1",
                 "timestamp": datetime.now().isoformat(),
             }
         )
@@ -519,12 +521,26 @@ def get_status():

 @app.route("/api/health", methods=["GET"])
 def health_check():
-    """Simple health check endpoint."""
+    """Health check endpoint with capabilities info."""
+    caps = get_capabilities()
     return jsonify(
-        {"status": "healthy", "package_version": "3.4.0a4", "timestamp": datetime.now().isoformat()}
+        {
+            "status": "healthy",
+            "package_version": "3.4.0b1",
+            "image_variant": caps.get_capabilities()["image_variant"],
+            "ffmpeg_available": caps.ffmpeg_available,
+            "timestamp": datetime.now().isoformat(),
+        }
     )


+@app.route("/api/capabilities", methods=["GET"])
+def get_system_capabilities():
+    """Get system capabilities and available features."""
+    caps = get_capabilities()
+    return jsonify(caps.get_capabilities())
+
+
 @app.route("/api/websocket/status", methods=["GET"])
 def websocket_status():
     """Get WebSocket server status and active connections."""
@@ -688,6 +704,66 @@ def openai_speech():
                 400,
             )

+        # Check feature availability before processing
+        caps = get_capabilities()
+
+        # Check if requested format requires ffmpeg
+        if format_enum.value in ["opus", "aac", "flac", "pcm"] and not caps.ffmpeg_available:
+            return (
+                jsonify(
+                    {
+                        "error": {
+                            "message": f"Format '{format_enum.value}' requires ffmpeg. "
+                            f"Available formats: {', '.join(caps.get_supported_formats())}",
+                            "type": "feature_unavailable_error",
+                            "code": "ffmpeg_required",
+                            "available_formats": caps.get_supported_formats(),
+                            "hint": "Use the full Docker image (dbcccc/ttsfm:latest) instead of the slim variant.",
+                        }
+                    }
+                ),
+                400,
+            )
+
+        # Check if speed adjustment requires ffmpeg
+        if speed is not None and speed != 1.0 and not caps.ffmpeg_available:
+            return (
+                jsonify(
+                    {
+                        "error": {
+                            "message": "Speed adjustment requires ffmpeg. "
+                            "Use the full Docker image (dbcccc/ttsfm:latest).",
+                            "type": "feature_unavailable_error",
+                            "code": "ffmpeg_required",
+                            "hint": "Speed adjustment is only available in the full Docker image.",
+                        }
+                    }
+                ),
+                400,
+            )
+
+        # Check if MP3 auto-combine requires ffmpeg (for long text)
+        if (
+            len(input_text) > max_length
+            and auto_combine
+            and format_enum == AudioFormat.MP3
+            and not caps.ffmpeg_available
+        ):
+            return (
+                jsonify(
+                    {
+                        "error": {
+                            "message": "MP3 auto-combine for long text requires ffmpeg. "
+                            "Use WAV format, disable auto_combine, or use the full Docker image.",
+                            "type": "feature_unavailable_error",
+                            "code": "ffmpeg_required",
+                            "hint": "MP3 auto-combine is only available in the full Docker image.",
+                        }
+                    }
+                ),
+                400,
+            )
+
         logger.info(
             "OpenAI API: Generating speech: text='%s...', voice=%s, "
             "requested_format=%s, auto_combine=%s, speed=%s",
@@ -899,6 +975,30 @@ def openai_speech():
             400,
         )

+    except RuntimeError as e:
+        # Catch ffmpeg-related errors from audio_processing module
+        error_msg = str(e)
+        logger.error(f"OpenAI API runtime error: {error_msg}")
+
+        # Check if it's an ffmpeg-related error
+        if "ffmpeg" in error_msg.lower():
+            return (
+                jsonify(
+                    {
+                        "error": {
+                            "message": error_msg,
+                            "type": "feature_unavailable_error",
+                            "code": "ffmpeg_required",
+                            "hint": "This feature requires the full Docker image. "
+                            "Use dbcccc/ttsfm:latest instead of the slim variant.",
+                        }
+                    }
+                ),
+                400,
+            )
+        # Re-raise if not ffmpeg-related
+        raise
+
     except Exception as e:
         logger.error(f"OpenAI API unexpected error: {e}")
         return (
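
The three early-validation blocks in `openai_speech()` build nearly identical payloads. As a sketch only, and not something this commit contains, the same checks could be expressed as one framework-free function that returns a `(payload, status)` pair or `None`. The names `feature_unavailable` and `check_ffmpeg_requirements` are hypothetical, and the sketch assumes that `AudioFormat.MP3.value` is the string `"mp3"`. The messages, hints, and field names are copied from the diff.

```python
from typing import Any, Dict, List, Optional, Tuple

FFMPEG_FORMATS = {"opus", "aac", "flac", "pcm"}
ErrorResponse = Tuple[Dict[str, Any], int]


def feature_unavailable(
    message: str, hint: str, available_formats: Optional[List[str]] = None
) -> ErrorResponse:
    """Build the structured 400 payload used for ffmpeg-dependent features."""
    error: Dict[str, Any] = {
        "message": message,
        "type": "feature_unavailable_error",
        "code": "ffmpeg_required",
        "hint": hint,
    }
    if available_formats is not None:
        error["available_formats"] = list(available_formats)
    return {"error": error}, 400


def check_ffmpeg_requirements(
    fmt: str,
    speed: Optional[float],
    text_length: int,
    max_length: int,
    auto_combine: bool,
    ffmpeg_available: bool,
    supported_formats: List[str],
) -> Optional[ErrorResponse]:
    """Return an error response for the first unmet requirement, else None."""
    if ffmpeg_available:
        return None
    if fmt in FFMPEG_FORMATS:
        return feature_unavailable(
            f"Format '{fmt}' requires ffmpeg. "
            f"Available formats: {', '.join(supported_formats)}",
            "Use the full Docker image (dbcccc/ttsfm:latest) instead of the slim variant.",
            supported_formats,
        )
    if speed is not None and speed != 1.0:
        return feature_unavailable(
            "Speed adjustment requires ffmpeg. "
            "Use the full Docker image (dbcccc/ttsfm:latest).",
            "Speed adjustment is only available in the full Docker image.",
        )
    # Assumption: AudioFormat.MP3.value == "mp3".
    if text_length > max_length and auto_combine and fmt == "mp3":
        return feature_unavailable(
            "MP3 auto-combine for long text requires ffmpeg. "
            "Use WAV format, disable auto_combine, or use the full Docker image.",
            "MP3 auto-combine is only available in the full Docker image.",
        )
    return None
```

In a Flask route, a non-`None` result would be returned as `jsonify(payload), status`. Separately, a bare `raise` inside `except RuntimeError` is not caught by a sibling `except Exception` clause of the same `try` statement, so a RuntimeError unrelated to ffmpeg propagates out of the function to the framework's own error handling.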

ttsfm-web/static/js/playground-enhanced-fixed.js

Lines changed: 67 additions & 1 deletion
@@ -19,7 +19,8 @@ const PlaygroundApp = (() => {
         wsClient: null,
         streamingMode: false,
         activeStreamId: null,
-        defaultText: ''
+        defaultText: '',
+        capabilities: null // System capabilities
     };

     const els = {};
@@ -38,6 +39,7 @@ const PlaygroundApp = (() => {
         initWebSocket();

         checkAuthStatus();
+        loadCapabilities(); // Load system capabilities first
         loadVoices();

         if (document.getElementById('format-select')) {
@@ -352,11 +354,19 @@ const PlaygroundApp = (() => {

             if (!response.ok) {
                 let message = `Error: ${response.status} ${response.statusText}`;
+                let hint = null;
                 try {
                     const errorData = await response.json();
                     if (errorData.error?.message) {
                         message = errorData.error.message;
                     }
+                    if (errorData.error?.hint) {
+                        hint = errorData.error.hint;
+                    }
+                    // Add hint to message if available
+                    if (hint) {
+                        message += `\n\n💡 ${hint}`;
+                    }
                 } catch (error) {
                     // ignore parse errors
                 }
@@ -628,6 +638,62 @@ const PlaygroundApp = (() => {
         }
     }

+    async function loadCapabilities() {
+        try {
+            const response = await fetch('/api/capabilities');
+            if (!response.ok) {
+                console.warn('Failed to load capabilities, assuming full image');
+                return;
+            }
+            const caps = await response.json();
+            state.capabilities = caps;
+            updateUIForCapabilities(caps);
+        } catch (error) {
+            console.error('Failed to load capabilities:', error);
+        }
+    }
+
+    function updateUIForCapabilities(caps) {
+        if (!caps) return;
+
+        // Update speed slider if ffmpeg not available
+        const speedSlider = document.getElementById('speed-slider');
+        const speedValue = document.getElementById('speed-value');
+        if (speedSlider && !caps.features.speed_adjustment) {
+            speedSlider.disabled = true;
+            speedSlider.title = 'Speed adjustment requires full Docker image';
+            if (speedValue) {
+                speedValue.insertAdjacentHTML('afterend',
+                    '<small class="text-warning ms-2">⚠️ Requires full image</small>');
+            }
+        }
+
+        // Filter format options based on availability
+        const formatSelect = document.getElementById('format-select');
+        if (formatSelect && caps.supported_formats) {
+            Array.from(formatSelect.options).forEach(option => {
+                if (!caps.supported_formats.includes(option.value)) {
+                    option.disabled = true;
+                    option.textContent += ' (requires full image)';
+                }
+            });
+        }
+
+        // Show image variant badge in navbar
+        const variant = caps.image_variant;
+        const badgeHtml = variant === 'full'
+            ? '<span class="badge bg-success ms-2">Full Image</span>'
+            : '<span class="badge bg-warning ms-2">Slim Image</span>';
+
+        const navbar = document.querySelector('.navbar-brand');
+        if (navbar && !document.querySelector('.image-variant-badge')) {
+            const badge = document.createElement('span');
+            badge.className = 'image-variant-badge';
+            badge.innerHTML = badgeHtml;
+            navbar.appendChild(badge);
+        }
+    }
+
     async function loadVoices({ refresh = false } = {}) {
         try {
             const data = await fetchVoices({ refresh });
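
The decisions made by `updateUIForCapabilities()` can be restated as a pure function, which makes them easy to check without a browser. The sketch below is a Python rendering of that logic for illustration only. The function name `ui_state_for_capabilities` and the shape of its return value are hypothetical, and the input is assumed to be the parsed JSON body of `/api/capabilities`.

```python
from typing import Any, Dict, List, Optional


def ui_state_for_capabilities(
    caps: Optional[Dict[str, Any]], format_options: List[str]
) -> Dict[str, Any]:
    """Mirror the playground's decisions for a parsed /api/capabilities body."""
    if caps is None:
        # Same fallback as the JavaScript when loading fails:
        # every control stays enabled and no badge is shown.
        return {"speed_enabled": True, "disabled_formats": [], "badge": None}

    # The JavaScript reads caps.features.speed_adjustment directly and would
    # throw if "features" were missing; this sketch treats that case as
    # "speed adjustment unavailable" instead.
    features = caps.get("features") or {}
    supported = caps.get("supported_formats")
    disabled = (
        [fmt for fmt in format_options if fmt not in supported]
        if supported is not None
        else []
    )
    return {
        "speed_enabled": bool(features.get("speed_adjustment")),
        "disabled_formats": disabled,
        "badge": "Full Image" if caps.get("image_variant") == "full" else "Slim Image",
    }
```

Any `image_variant` other than `"full"` yields the "Slim Image" badge, which matches the ternary in the JavaScript.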

ttsfm-web/templates/base.html

Lines changed: 2 additions & 2 deletions
@@ -88,7 +88,7 @@
 <a class="navbar-brand" href="{{ url_for('index') }}">
     <i class="fas fa-microphone-alt me-2"></i>
     <span class="fw-bold">TTSFM</span>
-    <span class="badge bg-primary ms-2 small">v3.4.0-alpha4</span>
+    <span class="badge bg-primary ms-2 small">v3.4.0-beta1</span>
 </a>

 <button class="navbar-toggler border-0" type="button" data-bs-toggle="collapse" data-bs-target="#navbarNav" aria-controls="navbarNav" aria-expanded="false" aria-label="Toggle navigation">
@@ -159,7 +159,7 @@
         <div class="d-flex align-items-center">
             <i class="fas fa-microphone-alt me-2 text-primary"></i>
             <strong class="text-dark">TTSFM</strong>
-            <span class="ms-2 text-muted">v3.4.0-alpha4</span>
+            <span class="ms-2 text-muted">v3.4.0-beta1</span>
         </div>
     </div>
     <div class="col-md-6 text-md-end">

ttsfm/__init__.py

Lines changed: 1 addition & 1 deletion
@@ -62,7 +62,7 @@
 )
 from .utils import split_text_by_length, validate_text_length

-__version__ = "3.4.0-alpha4"
+__version__ = "3.4.0-beta1"
 __author__ = "dbcccc"
 __email__ = "[email protected]"
 __description__ = "Text-to-Speech API Client with OpenAI compatibility"
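
This commit spells the version as `3.4.0-beta1` in `ttsfm/__init__.py`, `pyproject.toml`, and the template, and as `3.4.0b1` in `ttsfm-web/app.py`. Under PEP 440 these denote the same release, with `3.4.0b1` being the normalized spelling. The helper below is a deliberately narrow sketch of that normalization, covering only the `release` plus `pre-release` shape seen in this diff. The name `normalize_prerelease` is hypothetical, and for real use the `packaging` library's `Version` class is the usual tool.

```python
import re

# PEP 440 alternate spellings mapped to their normalized pre-release labels.
_PRE_RELEASE_ALIASES = {
    "alpha": "a",
    "beta": "b",
    "c": "rc",
    "pre": "rc",
    "preview": "rc",
}

_PATTERN = re.compile(
    r"(\d+(?:\.\d+)*)[-._]?(alpha|beta|a|b|rc|c|pre|preview)[-._]?(\d+)"
)


def normalize_prerelease(version: str) -> str:
    """Normalize e.g. '3.4.0-beta1' to '3.4.0b1'; return other input unchanged."""
    match = _PATTERN.fullmatch(version.lower())
    if match is None:
        return version
    release, label, number = match.groups()
    return f"{release}{_PRE_RELEASE_ALIASES.get(label, label)}{int(number)}"
```

Comparing normalized strings is one way to confirm that hard-coded version literals in different files still refer to the same release.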
