A Node.js application for real-time speech-to-text and translation during conferences, featuring WebSocket-based audio streaming and multiple STT service integrations.

## Features
- 🎤 Real-time Audio Capture: Browser-based microphone access with audio level monitoring
- 🔄 WebSocket Communication: Low-latency bidirectional communication between frontend and backend
- 🗣️ Multiple STT Services: Support for Mock, Google Cloud, OpenAI Whisper, and Local AI services
- 🌍 Translation Support: Optional real-time translation to target languages
- 📱 Responsive UI: Modern React frontend with real-time subtitle display
- 🔧 Configurable: Easy switching between different STT services via environment variables
## Prerequisites

- Node.js 18+
- npm or yarn
- Microphone access in your browser
## Installation

1. Clone and set up the project:

   ```bash
   cd /home/vlelicanin/Projects/translator/realtime-titling
   npm install
   ```

2. Set up environment variables:

   ```bash
   cp env.example .env
   # Edit .env with your configuration
   ```

3. Install frontend dependencies:

   ```bash
   cd frontend
   npm install
   cd ..
   ```

4. Start the backend server:

   ```bash
   npm run dev
   # or
   npm start
   ```

5. Start the frontend (in a new terminal):

   ```bash
   npm run frontend
   ```

6. Access the application:
   - Frontend: http://localhost:3000
   - Backend API: http://localhost:3001
   - WebSocket: ws://localhost:3001/ws
## Configuration

Create a `.env` file in the root directory:

```env
# Server Configuration
PORT=3001
NODE_ENV=development

# STT Service Configuration
STT_SERVICE=mock
# Options: mock, google, openai, local

# Google Cloud Speech-to-Text (if using the google STT service)
GOOGLE_APPLICATION_CREDENTIALS=path/to/service-account-key.json
GOOGLE_PROJECT_ID=your-project-id

# OpenAI API (if using the openai STT service)
OPENAI_API_KEY=your-openai-api-key

# Local AI Service (if using the local STT service)
LOCAL_AI_URL=http://localhost:8080/v1/audio/transcriptions

# Translation Configuration
ENABLE_TRANSLATION=true
TARGET_LANGUAGE=en
SOURCE_LANGUAGE=sr

# WebSocket Configuration
WS_HEARTBEAT_INTERVAL=30000
WS_MAX_CONNECTIONS=100
```

## STT Services

### Mock Service

- Purpose: Development and testing
- Features: Simulated responses, no API keys required
- Configuration:

  ```env
  STT_SERVICE=mock
  ```
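As a sketch of how the backend might consume these variables, here is a hypothetical config loader with the defaults shown above; the function and field names are illustrative, not the actual `app.js` code.

```javascript
// Hypothetical config loader: reads the environment variables documented
// above and falls back to the defaults from env.example.
const VALID_STT_SERVICES = ['mock', 'google', 'openai', 'local'];

function loadConfig(env = process.env) {
  const sttService = env.STT_SERVICE || 'mock';
  if (!VALID_STT_SERVICES.includes(sttService)) {
    throw new Error(`Unknown STT_SERVICE: ${sttService}`);
  }
  return {
    port: Number(env.PORT) || 3001,
    sttService,
    enableTranslation: env.ENABLE_TRANSLATION === 'true',
    targetLanguage: env.TARGET_LANGUAGE || 'en',
    sourceLanguage: env.SOURCE_LANGUAGE || 'sr',
    wsHeartbeatInterval: Number(env.WS_HEARTBEAT_INTERVAL) || 30000,
  };
}
```

Validating `STT_SERVICE` up front fails fast on typos instead of silently falling back to the mock service.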
### Google Cloud Speech-to-Text

- Purpose: Production-ready cloud STT
- Features: High accuracy, streaming support, multiple languages
- Configuration:

  ```env
  STT_SERVICE=google
  GOOGLE_APPLICATION_CREDENTIALS=path/to/key.json
  GOOGLE_PROJECT_ID=your-project-id
  ```
### OpenAI Whisper

- Purpose: High-quality cloud STT with translation
- Features: Excellent accuracy, built-in translation
- Configuration:

  ```env
  STT_SERVICE=openai
  OPENAI_API_KEY=your-api-key
  ```
### Local AI

- Purpose: Self-hosted STT (LocalAI, Whisper.cpp, etc.)
- Features: Privacy, no cloud costs, offline operation
- Configuration:

  ```env
  STT_SERVICE=local
  LOCAL_AI_URL=http://localhost:8080/v1/audio/transcriptions
  ```
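The service is selected by a factory keyed on `STT_SERVICE`. The stub below sketches that dispatch pattern; only the mock path is filled in, and the class and function names are assumptions rather than the real `sttServiceFactory.js` contents.

```javascript
// Hypothetical sketch of the dispatch inside a service factory. A real
// implementation would also wire up the google, openai, and local services.
class MockSttService {
  getServiceInfo() {
    return { name: 'mock', requiresApiKey: false };
  }
  async processAudio(audioBuffer, options = {}) {
    // Simulated transcription; a real service would call an STT engine here.
    return { text: '[simulated transcript]', confidence: 0.99 };
  }
}

function createSttService(name = process.env.STT_SERVICE || 'mock') {
  switch (name) {
    case 'mock':
      return new MockSttService();
    // case 'google': ...  case 'openai': ...  case 'local': ...
    default:
      throw new Error(`Unsupported STT service: ${name}`);
  }
}
```

Keeping the switch in one place means callers never import a concrete service directly, which is what makes swapping services via an environment variable possible.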
## API Endpoints

- `GET /health` - Health check
- `GET /api/config` - Get current configuration
- `GET /*` - Serve React frontend
## WebSocket Messages

### Client → Server

```javascript
// Start audio streaming
{ type: 'start_stream' }

// Send audio data
{ type: 'audio', data: 'base64-encoded-audio' }

// Stop audio streaming
{ type: 'stop_stream' }

// Ping for connection health
{ type: 'ping' }
```

### Server → Client

```javascript
// Connection established
{ type: 'connected', connectionId: 'uuid', config: {...} }

// Subtitle data
{ type: 'subtitle', data: { text: '...', translatedText: '...', confidence: 0.95 } }

// Error message
{ type: 'error', message: 'Error description' }

// Pong response
{ type: 'pong' }
```

## Architecture

```
┌─────────────────┐    WebSocket    ┌─────────────────┐
│ React Frontend  │ ◄─────────────► │ Node.js Backend │
│                 │                 │                 │
│ • Microphone    │                 │ • WebSocket     │
│ • Audio Capture │                 │ • STT Service   │
│ • Subtitle UI   │                 │ • Translation   │
└─────────────────┘                 └─────────────────┘
                                            │
                                            ▼
                                    ┌─────────────────┐
                                    │  STT Services   │
                                    │                 │
                                    │ • Mock          │
                                    │ • Google Cloud  │
                                    │ • OpenAI        │
                                    │ • Local AI      │
                                    └─────────────────┘
```
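The client→server frames above can be produced with small helpers like these; the helper names are illustrative and not part of the actual frontend code, but the payloads match the documented protocol.

```javascript
// Illustrative builders for the client→server WebSocket frames.
function buildControlMessage(type) {
  const allowed = ['start_stream', 'stop_stream', 'ping'];
  if (!allowed.includes(type)) {
    throw new Error(`Unknown control message type: ${type}`);
  }
  return JSON.stringify({ type });
}

function buildAudioMessage(audioBuffer) {
  // The protocol expects the raw audio chunk base64-encoded in 'data'.
  return JSON.stringify({
    type: 'audio',
    data: Buffer.from(audioBuffer).toString('base64'),
  });
}

// Typical send order over an open WebSocket:
//   ws.send(buildControlMessage('start_stream'));
//   ws.send(buildAudioMessage(pcmChunk));   // repeat per captured chunk
//   ws.send(buildControlMessage('stop_stream'));
```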
## Project Structure

```
realtime-titling/
├── src/
│   ├── services/              # STT service implementations
│   │   ├── sttServiceFactory.js
│   │   ├── mockSttService.js
│   │   ├── googleSttService.js
│   │   ├── openaiSttService.js
│   │   └── localSttService.js
│   ├── websocket/             # WebSocket handling
│   │   └── titlingHandler.js
│   └── app.js                 # Main server file
├── frontend/                  # React frontend
│   ├── src/
│   │   ├── components/        # React components
│   │   ├── hooks/             # Custom React hooks
│   │   ├── types/             # TypeScript types
│   │   └── App.tsx            # Main React app
│   └── package.json
├── package.json               # Backend dependencies
└── .env                       # Environment configuration
```
## Adding a New STT Service

1. Create a new service class in `src/services/`
2. Implement the required methods:
   - `processAudio(audioBuffer, options)`
   - `getServiceInfo()`
   - `initialize()` (optional)
   - `cleanup()` (optional)
3. Add the service to `sttServiceFactory.js`
4. Update the environment configuration
## Testing

```bash
# Run backend tests
npm test

# Test the health endpoint
curl http://localhost:3001/health

# Test the configuration endpoint
curl http://localhost:3001/api/config
```

## Deployment

```bash
# Build frontend
npm run frontend-build

# Start production server
NODE_ENV=production npm start
```

Example Dockerfile:

```dockerfile
FROM node:18-alpine
WORKDIR /app
COPY package*.json ./
RUN npm ci --only=production
COPY src/ ./src/
COPY frontend/build/ ./frontend/build/
EXPOSE 3001
CMD ["npm", "start"]
```
## Troubleshooting

**Microphone Permission Denied**
- Ensure the browser has microphone access
- Note that browsers require HTTPS for microphone access in production

**WebSocket Connection Failed**
- Verify the backend server is running on port 3001
- Check firewall settings

**STT Service Errors**
- Verify API keys and credentials
- Check the service-specific configuration

**Audio Quality Issues**
- Use a good-quality microphone
- Check the audio processing settings
Enable debug logging:

```bash
DEBUG=* npm run dev
```

## Contributing

- Fork the repository
- Create a feature branch
- Make your changes
- Add tests if applicable
- Submit a pull request
## License

MIT License - see LICENSE file for details.