This platform implements distributed reinforcement learning using an actor-learner architecture, with Kafka as the message broker.
Actors generate environment experiences and stream them to Kafka:
- Actors run parallel environments, collecting (state, action, reward, next_state)
- Experiences pushed to Kafka topic: experiences
- Learners batch-consume experiences for training
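
The experience tuple described above could be serialized for the experiences topic roughly as follows. This is a minimal sketch: the `Experience` dataclass, its field types, and the JSON encoding are assumptions made for illustration, not the platform's actual wire format. The resulting bytes are what a Kafka producer would send as the message value.

```python
import json
from dataclasses import asdict, dataclass
from typing import List


@dataclass
class Experience:
    """One environment transition (hypothetical schema)."""
    state: List[float]
    action: int
    reward: float
    next_state: List[float]


def encode_experience(exp: Experience) -> bytes:
    """Serialize a transition to UTF-8 JSON for use as a message value."""
    return json.dumps(asdict(exp)).encode("utf-8")


def decode_experience(payload: bytes) -> Experience:
    """Inverse of encode_experience, as a learner would apply it."""
    return Experience(**json.loads(payload.decode("utf-8")))


# Round trip: what the actor encodes, the learner can decode.
exp = Experience(state=[0.1, 0.2], action=1, reward=1.0, next_state=[0.2, 0.3])
assert decode_experience(encode_experience(exp)) == exp
```

JSON keeps the example readable; a binary format would likely be a better fit for large observation tensors.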
Learners train and broadcast model weights:
- Learners train PPO/other RL algorithms on batched experiences
- Updated model weights pushed to Kafka topic: model_updates
- Actors consume weights to sync their local policy networks
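
One way an actor might apply weight broadcasts safely is to tag each update with a version number and ignore anything older than what it already holds. The sketch below assumes a hypothetical message layout (`version` plus a `weights` mapping of parameter names to flat float lists) and uses JSON only for readability; it is not the platform's actual format.

```python
import json
from typing import Dict, List


def encode_weights(version: int, weights: Dict[str, List[float]]) -> bytes:
    """Learner side: wrap weights with a monotonically increasing version."""
    return json.dumps({"version": version, "weights": weights}).encode("utf-8")


class LocalPolicy:
    """Actor-side weight holder that skips stale or duplicate updates."""

    def __init__(self) -> None:
        self.version = -1  # no weights received yet
        self.weights: Dict[str, List[float]] = {}

    def apply_update(self, payload: bytes) -> bool:
        """Apply an update if it is newer; return True when weights changed."""
        msg = json.loads(payload.decode("utf-8"))
        if msg["version"] <= self.version:
            return False  # older or same version: keep current weights
        self.version = msg["version"]
        self.weights = msg["weights"]
        return True


policy = LocalPolicy()
assert policy.apply_update(encode_weights(1, {"layer0": [0.5, -0.5]}))
assert not policy.apply_update(encode_weights(0, {"layer0": [9.0, 9.0]}))
```

The version check matters because actors may read updates late or out of order after a restart.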
Web interface for training control:
- Frontend sends control requests (start/stop, scale actors, configure hyperparams)
- API forwards commands to Kafka topic: commands.control
- API consumes metrics from Kafka (metrics.training, metrics.actors)
- Returns status/metrics via REST API and WebSocket
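
A control request forwarded to commands.control could be wrapped in a small envelope like the one below. The action names mirror the operations listed above, but the field names (`id`, `action`, `params`, `source`, `timestamp`) and the `build_command` helper are hypothetical; the real message schema is not specified here.

```python
import json
import time
import uuid
from typing import Any, Dict, Optional

# Assumed action names, derived from the control operations listed above.
ALLOWED_ACTIONS = {"start", "stop", "scale_actors", "configure"}


def build_command(
    action: str,
    params: Optional[Dict[str, Any]] = None,
    source: str = "api",
) -> bytes:
    """Validate a control request and encode it as a message value."""
    if action not in ALLOWED_ACTIONS:
        raise ValueError(f"unknown action: {action!r}")
    envelope = {
        "id": str(uuid.uuid4()),   # lets consumers de-duplicate commands
        "action": action,
        "params": params or {},
        "source": source,          # e.g. "api" or "mcp"
        "timestamp": time.time(),
    }
    return json.dumps(envelope).encode("utf-8")


message = build_command("scale_actors", {"replicas": 8})
assert json.loads(message)["params"] == {"replicas": 8}
```

Rejecting unknown actions at the API boundary keeps malformed requests off the topic.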
Parallel control interface for AI agents:
- AI assistants (like Claude) control training via MCP (Model Context Protocol) tools
- Same functionality as API (start/stop, scale, configure)
- Sends commands to Kafka topic: commands.control
- Consumes metrics from Kafka for reporting
Both control interfaces scale pods:
- Scale actor deployment (0-16 replicas)
- Scale learner deployment (0-1 replicas)
- Update ConfigMaps for environment changes
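
The replica limits above can be enforced before any scaling request reaches the cluster. The sketch below assumes the deployments are named `actor` and `learner` (inferred from the manifest file names, not confirmed) and returns a dict in the shape of a Kubernetes replica-count patch body. Sending that patch through a Kubernetes client is deliberately left out.

```python
from typing import Dict, Tuple

# (min, max) replicas per deployment, taken from the limits listed above.
# The deployment names are assumptions.
REPLICA_LIMITS: Dict[str, Tuple[int, int]] = {
    "actor": (0, 16),
    "learner": (0, 1),
}


def scale_patch(deployment: str, replicas: int) -> dict:
    """Return a replica-count patch body, or raise if out of bounds."""
    if deployment not in REPLICA_LIMITS:
        raise ValueError(f"unknown deployment: {deployment!r}")
    low, high = REPLICA_LIMITS[deployment]
    if not low <= replicas <= high:
        raise ValueError(
            f"{deployment} replicas must be in [{low}, {high}], got {replicas}"
        )
    return {"spec": {"replicas": replicas}}


assert scale_patch("actor", 8) == {"spec": {"replicas": 8}}
```

Keeping the limits in one table means the API and the MCP server can share the same check.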
All services containerized:
- Dockerfile.actor: Actor service container
- Dockerfile.api: API service container
- Dockerfile.learner: Learner service container
- docker-compose.yaml: Orchestrates all services with Kafka
K8s configs for scalability:
- actor.yaml: Actor pod deployment
- api.yaml: API service deployment
- learner.yaml: Learner pod deployment
- kafka.yaml: Kafka broker deployment
- configmap.yaml: Shared configuration
Frontend → API ↘
                 ↘ commands.control
AI Agent → MCP → Kafka ↔ Actors ↔ Learner (core RL loop)
                 ↗ metrics.training
             K8s (pod scaling)
Core RL Loop (isolated):
Actor → Kafka (experiences) → Learner → Train → Kafka (weights) → Actor
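
The loop above can be traced in-process with two queues standing in for the experiences and model_updates topics. This is a toy sketch for following the data flow only: the queues replace Kafka, the "training" step is a placeholder that just bumps a version counter, and all function names are hypothetical.

```python
import queue
from typing import List, Tuple

# In-memory stand-ins for the two Kafka topics.
experiences: "queue.Queue[Tuple[int, float]]" = queue.Queue()
model_updates: "queue.Queue[int]" = queue.Queue()


def actor_step(policy_version: int, n: int) -> None:
    """Actor: publish n dummy transitions tagged with the policy version."""
    for i in range(n):
        experiences.put((policy_version, float(i)))


def learner_step(version: int, batch_size: int) -> int:
    """Learner: consume up to batch_size transitions, then publish weights."""
    batch: List[Tuple[int, float]] = []
    while len(batch) < batch_size:
        try:
            batch.append(experiences.get_nowait())
        except queue.Empty:
            break
    if not batch:
        return version  # nothing to train on
    new_version = version + 1  # placeholder for an actual training update
    model_updates.put(new_version)
    return new_version


def actor_sync(version: int) -> int:
    """Actor: drain pending updates and keep the newest version."""
    while True:
        try:
            version = max(version, model_updates.get_nowait())
        except queue.Empty:
            return version


actor_version, learner_version = 0, 0
actor_step(actor_version, 4)
learner_version = learner_step(learner_version, batch_size=4)
actor_version = actor_sync(actor_version)
assert actor_version == learner_version == 1
```

Because both sides only touch the queues, neither blocks on the other, which is the property the Kafka topics provide in the real deployment.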
- Actor-Learner: Kafka pub/sub (asynchronous, core training)
- Control→Kafka: Commands via commands.control topic
- Kafka→Control: Metrics via metrics.training, metrics.actors topics
- Frontend-API: HTTP REST + WebSocket (synchronous)
- AI Agent-MCP: MCP protocol (tool-based control)
- rl_core/actor.py: Environment interaction logic
- rl_core/learner.py: Training loop implementation
- rl_core/agent.py: Neural network policy/value functions
- mcp_server/api.py: REST API endpoints
- mcp_server/server.py: MCP protocol server