wisc-arch/SandBox-RL

SandBox RL - Multi-Model Reinforcement Learning

Advanced Multi-Model RL Framework for Training Modern LLMs

Python 3.8+ License: MIT

What is Core SRL?

Core SRL trains multiple modern LLMs simultaneously using reinforcement learning with cooperative-competitive dynamics. It can train 4-8 models, such as Qwen3-14B, together with real-time weight updates.

Key Features

  • Multi-Model Training: Simultaneous RL training of 4-8 modern LLMs
  • Live Weight Updates: Real-time parameter synchronization during training
  • Cooperative-Competitive RL: Novel algorithm balancing cooperation and competition
  • Modern Model Support: Qwen3-14B, Llama-3.1, and other open-weight models
  • VERL/AReaL Integration: Efficient training with advanced caching
  • Checkpoint Management: Automatic saving and recovery
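The feature list does not spell out how cooperation and competition are balanced. As a minimal sketch of one possible scheme (an assumption for illustration, not Core SRL's actual algorithm), each model's reward can be blended from a shared team term and a relative term. The helper name `blend_rewards` and the `coop_weight` parameter are hypothetical.

```python
from typing import List


def blend_rewards(individual: List[float], coop_weight: float = 0.5) -> List[float]:
    """Blend a shared team reward with a relative (competitive) reward.

    Hypothetical helper for illustration only; not part of core_srl.
    coop_weight=1.0 is fully cooperative, coop_weight=0.0 fully competitive.
    """
    if not 0.0 <= coop_weight <= 1.0:
        raise ValueError("coop_weight must be in [0, 1]")
    n = len(individual)
    if n < 2:
        raise ValueError("need at least two models")

    total = sum(individual)
    team_mean = total / n  # cooperative term: identical for every model

    blended = []
    for reward in individual:
        others_mean = (total - reward) / (n - 1)
        competitive = reward - others_mean  # positive if above the other models
        blended.append(coop_weight * team_mean + (1.0 - coop_weight) * competitive)
    return blended


# Four models with different raw rewards, mixed half and half
mixed = blend_rewards([1.0, 0.0, 0.5, 0.5], coop_weight=0.5)
```

With `coop_weight=1.0` every model receives the team mean, and with `coop_weight=0.0` the blended rewards sum to zero, so one model's gain is the others' loss.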

System Architecture

┌─────────────────────────────────────────────────────────────────┐
│                    Core SRL Architecture                        │
├─────────────────────────────────────────────────────────────────┤
│  Multi-Model Trainer                                           │
│  ┌─────────────┐ ┌─────────────┐ ┌─────────────┐ ┌─────────────┐│
│  │   Qwen3-14B │ │   Qwen-Math │ │ Qwen-Coder  │ │ Llama-3.1   ││
│  │   + LoRA    │ │   + LoRA    │ │   + LoRA    │ │   + LoRA    ││
│  └─────────────┘ └─────────────┘ └─────────────┘ └─────────────┘│
│         │               │               │               │        │
│  ┌─────────────────────────────────────────────────────────────┐ │
│  │           Cooperative-Competitive RL Engine               │ │
│  │  • Weight Update Coordination  • Parameter Sharing        │ │
│  │  • VERL Integration           • AReaL Optimization        │ │
│  └─────────────────────────────────────────────────────────────┘ │
└─────────────────────────────────────────────────────────────────┘
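Each model in the diagram carries a LoRA adapter. The sketch below shows the standard LoRA formulation, in which a frozen weight matrix W is combined with a low-rank update as W + (alpha / r) · B · A. Whether Core SRL uses exactly this scaling is an assumption, and the function names are hypothetical. Plain Python lists keep the example dependency-free.

```python
from typing import List

Matrix = List[List[float]]


def matmul(a: Matrix, b: Matrix) -> Matrix:
    """Multiply two small matrices stored as nested lists."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def lora_effective_weight(w: Matrix, lora_b: Matrix, lora_a: Matrix, alpha: float) -> Matrix:
    """Return W + (alpha / r) * B @ A, where r is the adapter rank.

    Illustrative only; the base weight W stays frozen and only A and B
    would be trained.
    """
    rank = len(lora_a)  # A has shape (r, in_features)
    scale = alpha / rank
    delta = matmul(lora_b, lora_a)  # shape (out_features, in_features)
    return [
        [w_ij + scale * d_ij for w_ij, d_ij in zip(w_row, d_row)]
        for w_row, d_row in zip(w, delta)
    ]


w = [[1.0, 0.0], [0.0, 1.0]]  # frozen 2x2 base weight
b = [[1.0], [2.0]]            # 2x1 adapter matrix B
a = [[0.5, 0.0]]              # 1x2 adapter matrix A (rank 1)
w_eff = lora_effective_weight(w, b, a, alpha=2.0)
```

Because only A and B change during training, a weight update for one model only needs to move the adapter matrices rather than the full base weights.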

Quick Start

Installation

git clone https://github.com/NoakLiu/SandBox-RL.git core-srl
cd core-srl
pip install -r requirements.txt

Basic Training

import asyncio
from core_srl import quick_start_multimodel_training

async def main():
    results = await quick_start_multimodel_training(
        num_models=4,
        max_episodes=100
    )
    print(f"Training completed: {results['status']}")

asyncio.run(main())

Advanced Configuration

import asyncio

from core_srl import MultiModelTrainer, MultiModelConfig, TrainingMode

# Configure six models across three model types in mixed training mode
config = MultiModelConfig(
    num_models=6,
    model_types=["qwen3", "qwen_coder", "llama3"],
    training_mode=TrainingMode.MIXED,
    max_episodes=1000,
    checkpoint_dir="./my_checkpoints"
)

# train() is a coroutine, so run it on an event loop
trainer = MultiModelTrainer(config)
results = asyncio.run(trainer.train())
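The configuration above requests six models but lists only three model types. How Core SRL maps types to model slots is not documented here. One plausible reading, shown purely as an assumption, is a round-robin assignment; `assign_model_types` is a hypothetical helper, not a `core_srl` function.

```python
from itertools import cycle, islice
from typing import List


def assign_model_types(num_models: int, model_types: List[str]) -> List[str]:
    """Assign a model type to each slot by cycling through model_types.

    Hypothetical illustration; the real framework may assign types differently.
    """
    if num_models < 1:
        raise ValueError("num_models must be at least 1")
    if not model_types:
        raise ValueError("model_types must not be empty")
    return list(islice(cycle(model_types), num_models))


slots = assign_model_types(6, ["qwen3", "qwen_coder", "llama3"])
print(slots)
```

Under this assumption each type is used twice, so the six slots hold two general-purpose Qwen models, two coder models, and two Llama models.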

Checkpoint Management

from core_srl import list_available_checkpoints

# List checkpoints
checkpoints = list_available_checkpoints()
print("Available:", checkpoints)

# Resume training (reuses the `trainer` created under Advanced Configuration)
if checkpoints:
    trainer.load_checkpoint(checkpoints[0])
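The ordering of the list returned by `list_available_checkpoints()` is not stated, so `checkpoints[0]` may not be the newest entry. As a sketch, assuming checkpoints are stored as files or directories directly under `checkpoint_dir`, the most recently modified one can be found with the standard library. `latest_checkpoint` is a hypothetical helper, not part of `core_srl`.

```python
from pathlib import Path
from typing import Optional


def latest_checkpoint(checkpoint_dir: str) -> Optional[Path]:
    """Return the most recently modified entry in checkpoint_dir, or None.

    Assumes each checkpoint is a file or directory directly inside
    checkpoint_dir; hidden entries (names starting with '.') are skipped.
    """
    root = Path(checkpoint_dir)
    if not root.is_dir():
        return None
    entries = [p for p in root.iterdir() if not p.name.startswith(".")]
    if not entries:
        return None
    return max(entries, key=lambda p: p.stat().st_mtime)


newest = latest_checkpoint("./my_checkpoints")
if newest is None:
    print("No checkpoints found; starting from scratch")
else:
    print("Most recent checkpoint:", newest)
```

The returned path could then be passed to `trainer.load_checkpoint(...)`, provided that method accepts a path, which this README does not confirm.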

Supported Models

MODERN_MODELS = {
    "qwen3": "Qwen/Qwen2.5-14B-Instruct",            # General purpose
    "qwen_coder": "Qwen/Qwen2.5-Coder-14B-Instruct", # Code specialized
    "qwen_math": "Qwen/Qwen2.5-Math-14B-Instruct",   # Math specialized
    "llama3": "meta-llama/Llama-3.1-8B-Instruct"     # General purpose (Llama 3.1)
}

📁 Project Structure

core-srl/
├── core_srl/           # Core framework (8 files)
├── examples/           # Training examples (8 examples)
├── tests/              # Test suites
├── docs/               # Documentation (6 docs)
├── data/               # Training data and results
└── checkpoints/        # Model checkpoints

📚 Documentation

Contributing

Focus areas:

  • New modern LLM integrations
  • Advanced multi-model strategies
  • Performance optimizations

📄 License

MIT License


Core SRL v2.0.0 - Multi-Model RL Training Made Efficient
