
Language Toolkit API

A REST API for the Language Toolkit providing document processing, translation, transcription, and video creation capabilities.

Features

  • Advanced PPTX Translation: Translate PowerPoint presentations while preserving formatting (fonts, colors, styles, typography)
  • Text Translation: Translate text files using DeepL API
  • Audio Transcription: Convert audio files to text using OpenAI Whisper
  • PPTX Conversion: Convert PowerPoint files to PDF or PNG images
  • Text-to-Speech: Generate audio from text files using ElevenLabs
  • Video Merging: Combine audio and images into videos
  • Smart Downloads: Single files download directly, multiple files as ZIP
  • Individual File Downloads: Download specific files from multi-file results
  • Asynchronous Processing: Handle long-running tasks with progress tracking
  • File Size Validation: Automatic validation of upload sizes with configurable limits

Installation

  1. Install API-specific dependencies:
pip install -r api_requirements.txt
  2. Configure API keys in the .env file (copy it from .env.example):
OPENAI_API_KEY=your-openai-api-key
DEEPL_API_KEY=your-deepl-api-key
ELEVENLABS_API_KEY=your-elevenlabs-api-key
CONVERTAPI_SECRET=your-convertapi-secret
  3. Configure authentication in the .env file:
# Client credentials for OAuth2 authentication
CLIENT_ID=your-client-id
CLIENT_SECRET=your-client-secret

# Or for multiple clients:
# CLIENT_ID_1=first-client-id
# CLIENT_SECRET_1=first-client-secret
# CLIENT_ID_2=second-client-id
# CLIENT_SECRET_2=second-client-secret
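A quick pre-flight check can catch a missing key before the server starts. This is a minimal sketch, assuming the server reads exactly these variable names from the process environment (for example after loading .env). The helpers missing_api_keys and load_clients are hypothetical and not part of api_server.py; load_clients only illustrates how the single-client and numbered CLIENT_ID_n / CLIENT_SECRET_n forms could be read together.

```python
import os
import re
from typing import Dict, List, Mapping

# Provider keys listed in the Installation steps above.
REQUIRED_API_KEYS = [
    "OPENAI_API_KEY",
    "DEEPL_API_KEY",
    "ELEVENLABS_API_KEY",
    "CONVERTAPI_SECRET",
]


def missing_api_keys(env: Mapping[str, str]) -> List[str]:
    """Return the required provider keys that are unset or empty."""
    return [name for name in REQUIRED_API_KEYS if not env.get(name)]


def load_clients(env: Mapping[str, str]) -> Dict[str, str]:
    """Collect client_id -> client_secret pairs (hypothetical helper).

    Reads the single CLIENT_ID/CLIENT_SECRET pair plus any numbered
    CLIENT_ID_<n>/CLIENT_SECRET_<n> pairs; a pair missing either half is skipped.
    """
    clients: Dict[str, str] = {}
    if env.get("CLIENT_ID") and env.get("CLIENT_SECRET"):
        clients[env["CLIENT_ID"]] = env["CLIENT_SECRET"]
    for name, client_id in env.items():
        match = re.fullmatch(r"CLIENT_ID_(\d+)", name)
        if not match:
            continue
        secret = env.get(f"CLIENT_SECRET_{match.group(1)}")
        if client_id and secret:
            clients[client_id] = secret
    return clients


if __name__ == "__main__":
    missing = missing_api_keys(os.environ)
    if missing:
        print("Missing API keys:", ", ".join(missing))
    print(f"{len(load_clients(os.environ))} client(s) configured")
```

Run as a script after exporting the .env values, it lists any missing provider keys and the number of configured clients.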

Running the API

Start the server:

python api_server.py

Or with uvicorn directly (--reload restarts the server when the code changes; use it for development only):

uvicorn api_server:app --host 0.0.0.0 --port 8000 --reload

The API will be available at http://localhost:8000

File Size Limits

The API enforces file size limits to prevent resource exhaustion:

File type       Default limit   Environment variable (value in bytes)
PPTX files      50MB            MAX_PPTX_SIZE
Text files      10MB            MAX_TEXT_SIZE
Audio files     200MB           MAX_AUDIO_SIZE
General files   100MB           MAX_FILE_SIZE

Error Response: Files exceeding limits return HTTP 413 (Payload Too Large) with details:

{
  "detail": "File 'large.pptx' is too large (75.2MB). Maximum allowed size for pptx files is 50.0MB."
}

Configuration: Override limits via environment variables:

export MAX_PPTX_SIZE=104857600  # 100MB in bytes
export MAX_AUDIO_SIZE=524288000 # 500MB in bytes
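A client can apply the same limits before uploading, which saves sending a large file only to receive a 413. This is a minimal sketch, not the server's validator: it assumes 1MB = 1,048,576 bytes (consistent with the override examples above) and that a file exactly at the limit is accepted. check_size and size_limit are hypothetical helpers, and only the pptx wording of the message is taken from the documented response.

```python
import os
from typing import Mapping, Optional

MB = 1024 * 1024  # 104857600 bytes = 100MB in the override example above

# Defaults from the File Size Limits table: kind -> (environment variable, bytes).
DEFAULT_LIMITS = {
    "pptx": ("MAX_PPTX_SIZE", 50 * MB),
    "text": ("MAX_TEXT_SIZE", 10 * MB),
    "audio": ("MAX_AUDIO_SIZE", 200 * MB),
    "general": ("MAX_FILE_SIZE", 100 * MB),
}


def size_limit(kind: str, env: Mapping[str, str]) -> int:
    """Return the byte limit for a file kind, honoring an environment override."""
    var, default = DEFAULT_LIMITS[kind]
    return int(env.get(var, default))


def check_size(filename: str, size: int, kind: str,
               env: Mapping[str, str] = os.environ) -> Optional[str]:
    """Return an error message shaped like the documented 413 detail, or None."""
    limit = size_limit(kind, env)
    if size <= limit:
        return None
    return (f"File '{filename}' is too large ({size / MB:.1f}MB). "
            f"Maximum allowed size for {kind} files is {limit / MB:.1f}MB.")
```

For example, check_size("large.pptx", os.path.getsize("large.pptx"), "pptx") returns None when the upload is within the limit.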

API Documentation

Interactive API documentation is available at:

  • Swagger UI: http://localhost:8000/docs
  • ReDoc: http://localhost:8000/redoc

API Endpoints

Core Endpoints

  • GET / - API information and available endpoints
  • GET /health - Health check
  • GET /tasks - List all active tasks
  • GET /tasks/{task_id} - Get task status
  • DELETE /tasks/{task_id} - Clean up task and temporary files
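  • GET /download/{task_id} - Download task results (a single file directly, multiple files as ZIP)
  • GET /download/{task_id}/{index} - Download one file from a multi-file result by 0-based index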

Processing Endpoints

PPTX Translation

POST /translate/pptx
  • Files: Upload PPTX files
  • Form Data:
    • source_lang: Source language code (e.g., "en")
    • target_lang: Target language code (e.g., "fr")

Text Translation

POST /translate/text
  • Files: Upload TXT files
  • Form Data:
    • source_lang: Source language code
    • target_lang: Target language code

Audio Transcription

POST /transcribe/audio
  • Files: Upload audio files (MP3, WAV, M4A, etc.)

PPTX Conversion

POST /convert/pptx
  • Files: Upload PPTX files
  • Form Data:
    • output_format: "pdf" or "png"

Text-to-Speech

POST /tts
  • Files: Upload TXT files; each filename must include the name of the voice to use

Text Translation from S3

POST /translate/text_s3
  • JSON Body:
    • input_keys: Array of S3 object keys for the input TXT files
    • output_prefix: (Optional) Destination S3 prefix for translated files
    • source_lang: Source language code (e.g., "en")
    • target_lang: Target language code (e.g., "fr")

Course Translation from S3

POST /translate/course_s3
  • JSON Body:
    • course_id: Unique identifier for the course
    • source_lang: Language code of the course content currently stored in S3
    • target_langs: Array of target language codes (e.g., ["fr", "it"])
    • output_prefix: (Optional) Root prefix where translated course will be written

PPTX Translation from S3

POST /translate/pptx_s3
  • JSON Body:
    • input_keys: Array of S3 object keys for the input PPTX files
    • output_prefix: (Optional) Destination S3 prefix for translated files
    • source_lang: Source language code (e.g., "en")
    • target_lang: Target language code (e.g., "fr")

Audio Transcription from S3

POST /transcribe/audio_s3
  • JSON Body:
    • input_keys: Array of S3 object keys for the input audio files
    • output_prefix: (Optional) Destination S3 prefix for transcription results
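The S3 endpoints take a JSON body instead of uploaded files. A minimal standard-library sketch of building those requests, assuming a local server and that an omitted output_prefix should simply be left out of the body. s3_request, translate_text_s3 and translate_course_s3 are hypothetical helpers; the field names come from the endpoint descriptions above.

```python
import json
import urllib.request
from typing import List, Optional

BASE_URL = "http://localhost:8000"  # assumption: local development server


def s3_request(path: str, body: dict, token: str) -> urllib.request.Request:
    """Build (but do not send) an authenticated JSON POST to an S3 endpoint."""
    payload = {k: v for k, v in body.items() if v is not None}  # drop unset optional fields
    return urllib.request.Request(
        BASE_URL + path,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Authorization": f"Bearer {token}",
                 "Content-Type": "application/json"},
        method="POST",
    )


def translate_text_s3(input_keys: List[str], source_lang: str, target_lang: str,
                      token: str, output_prefix: Optional[str] = None) -> urllib.request.Request:
    return s3_request("/translate/text_s3", {
        "input_keys": input_keys,
        "output_prefix": output_prefix,
        "source_lang": source_lang,
        "target_lang": target_lang,
    }, token)


def translate_course_s3(course_id: str, source_lang: str, target_langs: List[str],
                        token: str, output_prefix: Optional[str] = None) -> urllib.request.Request:
    return s3_request("/translate/course_s3", {
        "course_id": course_id,
        "source_lang": source_lang,
        "target_langs": target_langs,
        "output_prefix": output_prefix,
    }, token)
```

s3_request works the same way for /translate/pptx_s3 and /transcribe/audio_s3. To send a request, pass it to urllib.request.urlopen(req) and read the JSON response; this assumes the S3 endpoints return a task_id like the upload endpoints.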

Usage Examples

Using curl

The examples use the sample token token_admin_abc123def456; substitute the bearer token issued for your client.

  1. Translate a PPTX file:
curl -X POST "http://localhost:8000/translate/pptx" \
  -H "Authorization: Bearer token_admin_abc123def456" \
  -F "source_lang=en" \
  -F "target_lang=fr" \
  -F "files=@presentation.pptx"
  2. Check task status:
curl -H "Authorization: Bearer token_admin_abc123def456" \
  "http://localhost:8000/tasks/{task_id}"
  3. Download results:
# Download all results (single file directly, multiple files as ZIP).
# -J names the saved file from the Content-Disposition header, if the server
# sends one, instead of the last URL segment.
curl -H "Authorization: Bearer token_admin_abc123def456" \
  -OJ "http://localhost:8000/download/{task_id}"

# Download specific file by index (0-based)
curl -H "Authorization: Bearer token_admin_abc123def456" \
  -OJ "http://localhost:8000/download/{task_id}/0"
  4. Translate a TXT file stored in S3:
curl -X POST "http://localhost:8000/translate/text_s3" \
  -H "Authorization: Bearer token_admin_abc123def456" \
  -H "Content-Type: application/json" \
  -d '{
        "input_keys": ["bucket/folder/document.txt"],
        "output_prefix": "translated/",
        "source_lang": "en",
        "target_lang": "fr"
      }'
  5. Translate a PPTX stored in S3:
curl -X POST "http://localhost:8000/translate/pptx_s3" \
  -H "Authorization: Bearer token_admin_abc123def456" \
  -H "Content-Type: application/json" \
  -d '{
        "input_keys": ["bucket/folder/presentation.pptx"],
        "output_prefix": "translated/",
        "source_lang": "en",
        "target_lang": "fr"
      }'
  6. Translate an entire course from S3:
curl -X POST "http://localhost:8000/translate/course_s3" \
  -H "Authorization: Bearer token_admin_abc123def456" \
  -H "Content-Type: application/json" \
  -d '{
        "course_id": "cad798e6-3acf-11f0-b82c-771d758cf407",
        "source_lang": "en",
        "target_langs": ["fr", "it"],
        "output_prefix": "translated/"
      }'
  7. Transcribe an audio file stored in S3:
curl -X POST "http://localhost:8000/transcribe/audio_s3" \
  -H "Authorization: Bearer token_admin_abc123def456" \
  -H "Content-Type: application/json" \
  -d '{
        "input_keys": ["bucket/folder/lecture.mp3"],
        "output_prefix": "transcripts/"
      }'

Using Python requests

import time

import requests

BASE_URL = 'http://localhost:8000'

# Setup authentication
headers = {'Authorization': 'Bearer token_admin_abc123def456'}

# Upload file for translation (the with-block closes the file afterwards)
with open('presentation.pptx', 'rb') as pptx_file:
    response = requests.post(
        f'{BASE_URL}/translate/pptx',
        files={'files': pptx_file},
        data={'source_lang': 'en', 'target_lang': 'fr'},
        headers=headers,
    )
response.raise_for_status()
task_id = response.json()['task_id']

# Poll the task status until it finishes
while True:
    status_response = requests.get(f'{BASE_URL}/tasks/{task_id}', headers=headers)
    status_response.raise_for_status()
    task = status_response.json()
    print(task)
    if task['status'] in ('completed', 'failed'):
        break
    time.sleep(2)

# Download when complete
if task['status'] == 'completed':
    download_response = requests.get(f'{BASE_URL}/download/{task_id}', headers=headers)
    download_response.raise_for_status()

    # Save with proper extension based on Content-Type
    content_type = download_response.headers.get('content-type', '')
    if 'presentation' in content_type:
        filename = 'translated_presentation.pptx'
    elif 'application/zip' in content_type:
        filename = 'results.zip'
    else:
        filename = 'result.file'

    with open(filename, 'wb') as f:
        f.write(download_response.content)
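If the download response carries a Content-Disposition header (an assumption here; Starlette's FileResponse sets one when given a filename), the server's own filename is usually a better choice than a guess from Content-Type. A minimal sketch using the standard library's email.message.Message to parse the header; choose_filename is a hypothetical helper that falls back to the Content-Type rules in the example above.

```python
import os
from email.message import Message
from typing import Optional


def choose_filename(content_disposition: Optional[str], content_type: str) -> str:
    """Prefer the server-supplied filename, else fall back to Content-Type rules."""
    if content_disposition:
        msg = Message()
        msg["Content-Disposition"] = content_disposition
        name = msg.get_filename()  # None when the header has no filename parameter
        if name:
            # Keep only the final path component so a crafted name cannot escape the folder.
            return os.path.basename(name)
    if "presentation" in content_type:
        return "translated_presentation.pptx"
    if "application/zip" in content_type:
        return "results.zip"
    return "result.file"
```

In the example above it would replace the if/elif chain: filename = choose_filename(download_response.headers.get('content-disposition'), content_type).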

Advanced PPTX Translation

The API translates PPTX files while preserving their formatting:

Complete Formatting Preservation

  • Fonts: Names, sizes, styles maintained
  • Colors: RGB and theme colors preserved
  • Typography: Bold, italic, underline styles
  • Layout: Paragraph spacing, alignment, indentation
  • Structure: Text frames, runs, paragraph levels
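To illustrate why working at the run level keeps formatting intact: in python-pptx, font name, size, color and bold/italic/underline are stored on each run, and spacing and alignment on each paragraph, so replacing only run.text leaves them untouched. This is a minimal sketch, not the toolkit's translation engine; translate_runs is hypothetical, translate stands in for any text-translation call, and tables, grouped shapes and speaker notes are not visited.

```python
from typing import Callable


def translate_runs(presentation, translate: Callable[[str], str]) -> int:
    """Translate every non-blank run in place and return how many runs changed.

    `presentation` is expected to be a python-pptx Presentation (or anything
    exposing slides -> shapes -> text_frame.paragraphs -> runs).
    """
    changed = 0
    for slide in presentation.slides:
        for shape in slide.shapes:
            if not shape.has_text_frame:
                continue  # pictures, charts, etc.
            for paragraph in shape.text_frame.paragraphs:
                for run in paragraph.runs:
                    if run.text.strip():
                        # Only the text changes; the run's formatting properties stay as they were.
                        run.text = translate(run.text)
                        changed += 1
    return changed
```

Usage with python-pptx: prs = Presentation("presentation.pptx"); translate_runs(prs, my_translate); prs.save("translated.pptx"). The trade-off is that a sentence split across several runs is translated piece by piece without shared context; a fuller engine may translate whole paragraphs and then redistribute the text across runs.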

🎯 Same Quality as GUI App

The API uses the same advanced translation engine as the desktop application, ensuring identical results between interfaces.

📊 Professional Results

  • Maintains original presentation design
  • Preserves corporate branding and styling
  • Ready for professional use without reformatting

Task Management

The API uses asynchronous task processing:

  1. Submit a processing request → Get a task_id
  2. Poll the task status using the task_id
  3. Download results when status is "completed"
  4. Clean up the task when done

Task Status Values

  • pending: Task queued but not started
  • running: Task currently processing
  • completed: Task finished successfully
  • failed: Task encountered an error
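The submit, poll, download cycle above can be wrapped in a small polling helper. This is a minimal sketch under stated assumptions: the status values are the four listed above, while the starting interval, exponential backoff, 30-second cap and 10-minute timeout are arbitrary choices. wait_for_task is hypothetical; get_status is any callable returning the JSON body of GET /tasks/{task_id}, and sleep/clock are injectable so the loop can be tested without waiting.

```python
import time
from typing import Callable, Dict

TERMINAL_STATES = {"completed", "failed"}


def wait_for_task(get_status: Callable[[], Dict], interval: float = 2.0,
                  timeout: float = 600.0, max_interval: float = 30.0,
                  sleep: Callable[[float], None] = time.sleep,
                  clock: Callable[[], float] = time.monotonic) -> Dict:
    """Poll until the task is completed or failed, backing off between polls.

    Raises TimeoutError if the task is still pending or running after `timeout` seconds.
    """
    deadline = clock() + timeout
    while True:
        task = get_status()
        if task.get("status") in TERMINAL_STATES:
            return task
        if clock() >= deadline:
            raise TimeoutError(f"task still {task.get('status')!r} after {timeout}s")
        sleep(interval)
        interval = min(interval * 2, max_interval)  # exponential backoff, capped
```

With requests, for example: task = wait_for_task(lambda: requests.get(f"{BASE_URL}/tasks/{task_id}", headers=headers).json()), then download only if task["status"] == "completed".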

Error Handling

The API returns standard HTTP status codes:

  • 200: Success
  • 400: Bad Request (invalid parameters)
  • 404: Not Found (task or file not found)
  • 413: Payload Too Large (file exceeds the configured size limit)
  • 422: Validation Error
  • 500: Internal Server Error

Error responses include details:

{
    "detail": "Error description"
}
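A client can turn any error response into a one-line message by reading the detail field. This is a minimal sketch; describe_error is hypothetical, the status labels are the ones listed above, and it falls back to the raw body when the response is not a JSON object (for example an error page from a reverse proxy). For 422 responses FastAPI's detail is typically a list of validation errors, which is printed as-is here.

```python
import json
from typing import Optional

# Labels from the Error Handling list (413 from File Size Limits).
STATUS_MEANINGS = {
    400: "Bad Request",
    404: "Not Found",
    413: "Payload Too Large",
    422: "Validation Error",
    500: "Internal Server Error",
}


def describe_error(status: int, body: str) -> Optional[str]:
    """Return a one-line summary for an error response, or None on success."""
    if status < 400:
        return None
    try:
        detail = json.loads(body).get("detail", body)
    except (ValueError, AttributeError):
        detail = body  # not JSON, or JSON that is not an object
    label = STATUS_MEANINGS.get(status, "Error")
    return f"{status} {label}: {detail}"
```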

Security Considerations

For production deployment:

  1. Authentication: Use strong client credentials and keep the .env file out of version control
  2. Rate Limiting: Implement request rate limiting
  3. File Validation: Enhanced file type validation (size limits are already enforced; see File Size Limits)
  4. HTTPS: Use HTTPS in production
  5. Resource Limits: Set memory and processing limits
  6. Monitoring: Add logging and monitoring
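For rate limiting, one common technique is a token bucket: each client's bucket refills at a fixed rate and every request spends one token, which allows short bursts while capping the average rate. This is a minimal in-process sketch; TokenBucket and PerClientLimiter are hypothetical names and the rate and capacity are up to you. Buckets live in one process's memory, so with several workers or instances a shared store or a reverse-proxy limit (for example Nginx limit_req) is the usual choice.

```python
import time
from typing import Callable, Dict


class TokenBucket:
    """Allow `rate` requests per second on average, with bursts up to `capacity`."""

    def __init__(self, rate: float, capacity: float,
                 clock: Callable[[], float] = time.monotonic):
        self.rate = rate
        self.capacity = capacity
        self.clock = clock
        self.tokens = capacity  # start full so a new client can burst
        self.updated = clock()

    def allow(self) -> bool:
        now = self.clock()
        # Refill in proportion to elapsed time, never above capacity.
        self.tokens = min(self.capacity, self.tokens + (now - self.updated) * self.rate)
        self.updated = now
        if self.tokens >= 1:
            self.tokens -= 1
            return True
        return False


class PerClientLimiter:
    """Keep one bucket per client ID (e.g. the authenticated CLIENT_ID)."""

    def __init__(self, rate: float, capacity: float,
                 clock: Callable[[], float] = time.monotonic):
        self.rate, self.capacity, self.clock = rate, capacity, clock
        self.buckets: Dict[str, TokenBucket] = {}

    def allow(self, client_id: str) -> bool:
        bucket = self.buckets.get(client_id)
        if bucket is None:
            bucket = self.buckets[client_id] = TokenBucket(self.rate, self.capacity, self.clock)
        return bucket.allow()
```

In a request handler, respond with HTTP 429 (Too Many Requests) when allow(client_id) returns False. The sketch is not thread-safe; guard allow() with a threading.Lock if it is called from sync endpoints running in a thread pool.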

Deployment

Docker Deployment

Create a Dockerfile:

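FROM python:3.9-slim

WORKDIR /app

# Install dependencies first so this layer is reused when only the code changes
COPY api_requirements.txt .
RUN pip install --no-cache-dir -r api_requirements.txt

# Copy the application code. Keep .env out of the image (list it in
# .dockerignore) and pass the keys at runtime, e.g. docker run --env-file .env
COPY . .

EXPOSE 8000
CMD ["uvicorn", "api_server:app", "--host", "0.0.0.0", "--port", "8000"]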

Production Deployment

For production, consider:

  • Gunicorn with uvicorn workers
  • Nginx as reverse proxy
  • Docker containerization
  • Load balancing for multiple instances
  • Database for task persistence
  • Redis for task queues

Example with Gunicorn:

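gunicorn api_server:app -w 4 -k uvicorn.workers.UvicornWorker --bind 0.0.0.0:8000

If task state is held in process memory (the list above suggests adding a database for task persistence), each worker keeps its own copy, so a status or download request routed to a different worker may not find the task. Run a single worker or move task state to shared storage first.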
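Gunicorn can also read these options from a Python configuration file, which keeps the command short and the settings under version control. This sketch mirrors the command above using the standard Gunicorn settings bind, workers, worker_class and accesslog; the file name gunicorn.conf.py is a convention. Recent uvicorn releases deprecate uvicorn.workers in favor of the separate uvicorn-worker package, so check which worker class your installed versions expect.

```python
# gunicorn.conf.py: start with  gunicorn -c gunicorn.conf.py api_server:app

bind = "0.0.0.0:8000"
workers = 4                                   # same as -w 4 above
worker_class = "uvicorn.workers.UvicornWorker"  # same as -k above
accesslog = "-"                               # "-" writes access logs to stdout
```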

Supported File Formats

  • PPTX Translation: .pptx files
  • Text Translation: .txt files
  • Audio Transcription: .wav, .mp3, .m4a, .webm, .mp4, .mpga, .mpeg
  • PPTX Conversion: .pptx files → PDF/PNG
  • Text-to-Speech: .txt files (with voice name in filename)
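A client can reject unsupported files before uploading them. This is a minimal sketch built from the list above; unsupported_files is a hypothetical helper, case-insensitive matching is an assumption (the server may be stricter), and it does not check the voice-name rule for text-to-speech filenames.

```python
from pathlib import Path
from typing import List

# Extensions from the Supported File Formats list, keyed by endpoint.
ALLOWED_EXTENSIONS = {
    "/translate/pptx": {".pptx"},
    "/translate/text": {".txt"},
    "/transcribe/audio": {".wav", ".mp3", ".m4a", ".webm", ".mp4", ".mpga", ".mpeg"},
    "/convert/pptx": {".pptx"},
    "/tts": {".txt"},
}


def unsupported_files(endpoint: str, filenames: List[str]) -> List[str]:
    """Return the filenames whose extension the endpoint does not accept."""
    allowed = ALLOWED_EXTENSIONS[endpoint]
    return [name for name in filenames if Path(name).suffix.lower() not in allowed]
```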

API Limits

Current implementation limits:

  • File Size: Limits depend on the file type and are configurable (see File Size Limits)
  • Concurrent Tasks: Limited by server resources
  • Audio Size: 20MB per audio file sent to the transcription API (upstream API limitation)

Support

For issues and questions:

  1. Check the interactive API documentation at /docs
  2. Review the logs for detailed error information
  3. Ensure all required API keys are configured