Core SRL trains multiple modern LLMs simultaneously with reinforcement learning under cooperative-competitive dynamics. Train 4-8 models, such as Qwen3-14B, together with real-time weight updates.
- Multi-Model Training: Simultaneous RL training of 4-8 modern LLMs
- Live Weight Updates: Real-time parameter synchronization during training
- Cooperative-Competitive RL: Novel algorithm balancing cooperation and competition
- Modern Model Support: Qwen3-14B, Llama-3.1, and other open-weight models
- VERL/AReaL Integration: Efficient training with advanced caching
- Checkpoint Management: Automatic saving and recovery
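The README does not spell out how the cooperative-competitive algorithm combines its two signals. As a rough illustration only, the sketch below blends a shared team reward (the group mean) with a relative, zero-sum advantage (a model's reward minus the mean of the other models' rewards). The function `shape_rewards` and its `coop_weight` parameter are hypothetical and are not part of Core SRL's API; the library's actual algorithm may differ.

```python
from statistics import mean


def shape_rewards(raw: list[float], coop_weight: float = 0.5) -> list[float]:
    """Hypothetical reward shaping: mix a cooperative and a competitive term.

    Illustration only; this is not Core SRL's documented algorithm.
    """
    if not 0.0 <= coop_weight <= 1.0:
        raise ValueError("coop_weight must be in [0, 1]")
    if len(raw) < 2:
        raise ValueError("need rewards for at least two models")

    team = mean(raw)  # cooperative term: every model shares the group average
    shaped = []
    for i, r in enumerate(raw):
        others = mean(raw[:i] + raw[i + 1:])
        advantage = r - others  # competitive term: lead over the other models
        shaped.append(coop_weight * team + (1.0 - coop_weight) * advantage)
    return shaped


if __name__ == "__main__":
    # Raw episode rewards for four models (made-up numbers for illustration)
    print(shape_rewards([1.0, 0.0, 0.5, 0.5], coop_weight=0.5))
```

With `coop_weight=1.0` every model receives the same team reward; with `coop_weight=0.0` the shaped rewards sum to zero, so one model's gain is another's loss.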
```text
┌───────────────────────────────────────────────────────────────────┐
│                       Core SRL Architecture                       │
├───────────────────────────────────────────────────────────────────┤
│  Multi-Model Trainer                                              │
│  ┌─────────────┐ ┌─────────────┐ ┌─────────────┐ ┌─────────────┐  │
│  │  Qwen3-14B  │ │  Qwen-Math  │ │ Qwen-Coder  │ │  Llama-3.1  │  │
│  │   + LoRA    │ │   + LoRA    │ │   + LoRA    │ │   + LoRA    │  │
│  └─────────────┘ └─────────────┘ └─────────────┘ └─────────────┘  │
│         │               │               │               │         │
│  ┌─────────────────────────────────────────────────────────────┐  │
│  │              Cooperative-Competitive RL Engine              │  │
│  │  • Weight Update Coordination  • Parameter Sharing          │  │
│  │  • VERL Integration            • AReaL Optimization         │  │
│  └─────────────────────────────────────────────────────────────┘  │
└───────────────────────────────────────────────────────────────────┘
```
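Live weight updates need some way for the trainer to publish fresh parameters and for rollout workers to notice that their copy is stale. A minimal single-process sketch, assuming one version counter per model: `VersionedWeights`, `publish`, and `pull_if_newer` are hypothetical names, and plain Python dicts stand in for LoRA adapter tensors. Core SRL's actual synchronization mechanism is not described in this README.

```python
import copy
import threading


class VersionedWeights:
    """Hypothetical trainer-to-worker weight channel (not Core SRL's API)."""

    def __init__(self, initial: dict):
        self._lock = threading.Lock()
        self._version = 0
        self._weights = copy.deepcopy(initial)

    def publish(self, weights: dict) -> int:
        """Trainer side: store a new snapshot and bump the version."""
        with self._lock:
            self._weights = copy.deepcopy(weights)
            self._version += 1
            return self._version

    def pull_if_newer(self, seen_version: int):
        """Worker side: return (version, snapshot) only if newer than seen_version."""
        with self._lock:
            if self._version <= seen_version:
                return None
            return self._version, copy.deepcopy(self._weights)


if __name__ == "__main__":
    store = VersionedWeights({"lora_A": [0.0, 0.0]})
    worker_version = 0

    # Trainer finishes an update step and publishes new adapter weights
    store.publish({"lora_A": [0.1, -0.2]})

    # Worker checks for updates between rollouts
    update = store.pull_if_newer(worker_version)
    if update is not None:
        worker_version, weights = update
        print("worker now on version", worker_version, weights)
```

The deep copies keep each worker's snapshot stable even if the trainer publishes again mid-rollout.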
```bash
git clone https://github.com/NoakLiu/SandBox-RL.git
cd core-srl
pip install -r requirements.txt
```

```python
import asyncio

from core_srl import quick_start_multimodel_training


async def main():
    # Train 4 models for 100 episodes with the default settings
    results = await quick_start_multimodel_training(
        num_models=4,
        max_episodes=100,
    )
    print(f"Training completed: {results['status']}")


asyncio.run(main())
```

```python
import asyncio

from core_srl import MultiModelConfig, MultiModelTrainer, TrainingMode

config = MultiModelConfig(
    num_models=6,
    model_types=["qwen3", "qwen_coder", "llama3"],
    training_mode=TrainingMode.MIXED,
    max_episodes=1000,
    checkpoint_dir="./my_checkpoints",
)

trainer = MultiModelTrainer(config)
results = asyncio.run(trainer.train())  # train() is a coroutine
```

```python
from core_srl import list_available_checkpoints

# List checkpoints
checkpoints = list_available_checkpoints()
print("Available:", checkpoints)

# Resume training (reuses `trainer` from the previous example)
if checkpoints:
    trainer.load_checkpoint(checkpoints[0])
```

```python
MODERN_MODELS = {
    "qwen3": "Qwen/Qwen2.5-14B-Instruct",  # General-purpose Qwen (despite the key name, a Qwen2.5 checkpoint)
    "qwen_coder": "Qwen/Qwen2.5-Coder-14B-Instruct",  # Code-specialized
    "qwen_math": "Qwen/Qwen2.5-Math-14B-Instruct",  # Math-specialized
    "llama3": "meta-llama/Llama-3.1-8B-Instruct",  # Llama 3.1, 8B
}
```

```text
core-srl/
├── core_srl/       # Core framework (8 files)
├── examples/       # Training examples (8 examples)
├── tests/          # Test suites
├── docs/           # Documentation (6 docs)
├── data/           # Training data and results
└── checkpoints/    # Model checkpoints
```
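The `MultiModelConfig` example earlier in this README requests six models from three `model_types`, but does not say how the six slots are filled. The sketch below assumes round-robin assignment and resolves each key through the `MODERN_MODELS` mapping shown above; `resolve_models` is a hypothetical helper, not part of `core_srl`, and the library's real assignment policy may differ.

```python
from itertools import cycle, islice

# Copied from the MODERN_MODELS mapping in this README.
MODERN_MODELS = {
    "qwen3": "Qwen/Qwen2.5-14B-Instruct",
    "qwen_coder": "Qwen/Qwen2.5-Coder-14B-Instruct",
    "qwen_math": "Qwen/Qwen2.5-Math-14B-Instruct",
    "llama3": "meta-llama/Llama-3.1-8B-Instruct",
}


def resolve_models(model_types: list[str], num_models: int) -> list[str]:
    """Hypothetical helper: map config keys to Hugging Face repo IDs.

    Assumes round-robin assignment when num_models exceeds len(model_types);
    an illustration, not Core SRL's documented behavior.
    """
    if num_models < 1 or not model_types:
        raise ValueError("need num_models >= 1 and at least one model type")
    unknown = [t for t in model_types if t not in MODERN_MODELS]
    if unknown:
        raise KeyError(f"unknown model types: {unknown}")
    # cycle() repeats model_types indefinitely; islice() keeps the first num_models
    return [MODERN_MODELS[t] for t in islice(cycle(model_types), num_models)]


if __name__ == "__main__":
    for slot, repo in enumerate(resolve_models(["qwen3", "qwen_coder", "llama3"], 6)):
        print(slot, repo)
```

Validating the keys up front turns a typo such as `"qwen_3"` into an immediate error instead of a failure partway through model loading.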
- Quick Start - 5-minute setup
- Multi-Model Training - Training guide
- Model Configuration - Modern LLM setup
- Checkpoints - Save/restore training
- VERL/AReaL - Advanced optimization
- API Reference - Complete API
Contribution focus areas:
- New modern LLM integrations
- Advanced multi-model strategies
- Performance optimizations
MIT License
Core SRL v2.0.0 - Multi-Model RL Training Made Efficient

