Releases: MatN23/AdaptiveTrainingSystem
LuminaAI 1.4.0
LuminaAI v1.4.0
Release Date: 2025-01-09
🚀 What's New
Autonomous Training System
- AI-Driven Orchestrator: Real-time hyperparameter optimization and anomaly detection
- 18 Adaptive Methods: Dynamic expert management, routing adjustments, emergency recovery
- Meta-Learning: Learns from previous runs to optimize future training
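The release notes don't spell out the individual adaptive methods, but the anomaly-detection side can be sketched as a simple loss-spike monitor. This is a hypothetical illustration of the idea, not LuminaAI's actual implementation:

```python
from collections import deque
import statistics

class LossAnomalyDetector:
    """Toy sketch of an orchestrator-style anomaly check: flag a training
    step whose loss spikes far above the recent moving statistics."""

    def __init__(self, window=50, threshold=3.0):
        self.history = deque(maxlen=window)
        self.threshold = threshold

    def check(self, loss):
        # Wait for a few samples before the statistics are meaningful.
        if len(self.history) >= 10:
            mean = statistics.fmean(self.history)
            std = statistics.pstdev(self.history) or 1e-8
            if loss > mean + self.threshold * std:
                # The caller could roll back to the last checkpoint here.
                return True
        self.history.append(loss)
        return False
```

A real orchestrator would layer many such signals (gradient norms, routing entropy, throughput) on top of this basic pattern.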
Chinchilla Scaling
- Auto Epoch Calculation: Compute-optimal training (20 tokens/parameter)
- Loss Landscape Analysis: Real-time plateau and divergence detection
- Smart Early Stopping: Convergence-aware termination
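The auto epoch calculation follows the Chinchilla heuristic of roughly 20 training tokens per model parameter. A plausible sketch of the arithmetic (the function names and cap are illustrative, not LuminaAI's API):

```python
import math

def optimal_tokens(n_params: int, tokens_per_param: int = 20) -> int:
    """Compute-optimal token budget under the Chinchilla heuristic."""
    return n_params * tokens_per_param

def auto_epochs(n_params: int, dataset_tokens: int,
                tokens_per_param: int = 20, max_epochs: int = 100) -> int:
    """Epochs needed for the dataset to cover the token budget, capped."""
    budget = optimal_tokens(n_params, tokens_per_param)
    return max(1, min(max_epochs, math.ceil(budget / dataset_tokens)))
```

For a 1B-parameter model, the budget is 20B tokens; a 10B-token dataset would therefore be scheduled for 2 epochs.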
Enhanced Precision & Quantization
- 16 Precision Modes: FP64 → FP8, mixed precision variants
- 3 Quantization Methods: BitsAndBytes (8-bit), GPTQ (4-bit), Optimum Quanto
- Hardware-Aware: Auto-optimizes for CUDA, Apple Silicon (MPS), CPU
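The `'auto'` precision mode can be pictured as a hardware probe followed by a lookup. A toy version (in practice the flags would be probed at runtime, e.g. via `torch.cuda.is_bf16_supported()`):

```python
def select_precision(device: str, bf16_supported: bool = False) -> str:
    """Toy model of hardware-aware precision selection."""
    if device == "cuda":
        # bf16 has fp32-like dynamic range and needs no loss scaling,
        # so prefer it when the GPU (Ampere or newer) supports it.
        return "bf16" if bf16_supported else "fp16"
    if device == "mps":
        return "fp16"  # Apple Silicon GPUs
    return "fp32"      # CPU fallback
```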
Architectures
- Dense: GQA, RoPE, SwiGLU, RMSNorm
- MoE: 40-60% parameter savings, dynamic expert management
- MoD: 30-50% compute savings, adaptive routing
- Hybrid: MoE + MoD combined
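The parameter savings come from activating only a few experts per token. The core of a top-k MoE gate can be sketched in a few lines (a generic textbook router, not LuminaAI's routing code):

```python
import math

def softmax(logits):
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def route_top_k(router_logits, k=2):
    """Pick the k highest-scoring experts for a token and renormalize
    their gate weights, as in a standard top-k MoE router."""
    probs = softmax(router_logits)
    top = sorted(range(len(probs)), key=probs.__getitem__, reverse=True)[:k]
    z = sum(probs[i] for i in top)
    return {i: probs[i] / z for i in top}
```

With 8 experts and k=2, only a quarter of the expert parameters are active for any given token; MoD applies the same idea along depth, letting tokens skip layers entirely.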
📦 Quick Start
git clone https://github.com/matn23/luminaai
cd luminaai
pip install -r requirements.txt
cd Src/Main_Scripts
python Main.py
Minimal Example
config_choice = 'b1'
use_adaptive_training = True
training_params = {
'num_epochs': 3,
'batch_size': 8,
'learning_rate': 1e-4,
'precision': 'auto',
}
data_params = {
'training_mode': 'finetuning_only',
'finetuning_paths': ['data/train.jsonl'],
}
📊 Pre-Configured Models
| Preset | Active Params | Total Params | Hardware |
|---|---|---|---|
| debug | 500K | 4M | Any |
| b1 | 1B | 8B | RTX 3090, M1 Max |
| b7 | 7B | 56B | A100 40GB |
| b14 | 14B | 112B | A100 80GB |
| b50 | 50B | 400B | Multi-H100 |
| b100 | 100B | 800B | H200 Server |
| b200 | 200B | 1600B | H200 Server |
| b300 | 300B | 2400B | H200 Server |
✨ Key Features
- Fully Autonomous: Self-optimizing with minimal config
- Emergency Recovery: Auto rollback, gradient explosion handling, OOM recovery
- Multi-Device: CUDA, MPS, CPU support with distributed training
- Production-Ready: Comprehensive checkpointing, monitoring, error handling
Documentation: README.md
LuminaAI v2.0.0
🚀 LuminaAI v2.0 – Hybrid MoE + MoD Release
LuminaAI just leveled up. This release brings hybrid Mixture-of-Experts (MoE) + Mixture-of-Depths (MoD) architectures, unlocking next-level efficiency and scalability for transformer models. Train massive models with fewer resources, smarter routing, and lightning-fast attention.
Key Highlights
Hybrid MoE + MoD: Combine expert routing with token-level dynamic depth for max efficiency.
Flash Attention 2 & GQA: 2-4x faster attention and optimized KV caching for long sequences.
Advanced Sparse & Dense Architectures: Flexible MoE patterns, dynamic depth skipping, SwiGLU + RMSNorm, RoPE embeddings.
Multi-Dataset & Streaming Support: Pre-training, fine-tuning, hybrid, or interleaved modes with automatic data validation.
Adaptive Training Orchestrator: Real-time monitoring, anomaly detection, auto-recovery from OOM errors, and meta-learning hyperparameters.
Quantization & Gradient Checkpointing: INT8/4-bit inference, mixed precision training, memory-efficient large model support.
Comprehensive Metrics & Profiling: Track loss, perplexity, token routing, per-layer performance, and training health in real time.
Multi-Hardware Scaling: NVIDIA GPUs, Apple M-series (M1/M2/M3), CPU fallback, and multi-node distributed training.
Automatic Checkpointing & Best Model Tracking: Keep your training safe and recoverable at all times.
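The streaming mode with automatic validation can be pictured as a generator that parses and checks one record at a time instead of materializing the dataset in memory. A sketch under assumed field names (`messages`/`role`/`content` follow the common chat schema and may differ from LuminaAI's actual keys):

```python
import json

def stream_conversations(lines):
    """Yield parsed, validated conversations one at a time from an
    iterable of raw JSONL lines (e.g. an open file handle)."""
    for raw in lines:
        raw = raw.strip()
        if not raw:
            continue
        try:
            record = json.loads(raw)
        except json.JSONDecodeError:
            continue  # drop malformed rows instead of aborting the run
        msgs = record.get("messages")
        if msgs and all(isinstance(m, dict) and "role" in m and "content" in m
                        for m in msgs):
            yield record
```

Because it is a generator, memory stays flat no matter how large the dataset file is.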
Why v2.0
This release is about maximum efficiency and flexibility. Whether you’re experimenting with small models or pushing the limits with multi-billion-parameter architectures, LuminaAI v2.0 gives you the tools to train smarter, faster, and more reliably than ever.
LuminaAI 1.2.6
LuminaAI v1.1.6
🚀 LuminaAI Conversational Transformer v1.1.6
Production-Ready Conversational AI Training Framework with Comprehensive Precision Support
🎯 What's New in v1.1.6
🔥 Major Features
- 🎯 Multi-Precision Training & Inference: Comprehensive support for FP32, FP16, BF16, Mixed Precision, and TensorFloat-32 with automatic optimization
- ⚡ Production-Grade Architecture: Grouped Query Attention (GQA), RoPE, SwiGLU, RMSNorm, and Flash Attention support
- 🛡️ Enterprise-Level Monitoring: Real-time health monitoring, fault tolerance, and automatic recovery systems
- 📊 Advanced Analytics: Comprehensive precision benchmarking, performance profiling, and training insights
- 🎪 Dynamic Precision Selection: Auto-tuning capabilities that select optimal precision based on hardware and use case
- 💾 Robust Checkpointing: Automatic backup systems with emergency recovery and training resumption
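Mixed precision training in PyTorch pairs `torch.autocast` with a gradient scaler. A minimal, device-agnostic sketch of the pattern (a toy linear model standing in for the transformer; not LuminaAI's training loop):

```python
import torch

# Tiny model and optimizer standing in for the real transformer.
model = torch.nn.Linear(16, 4)
opt = torch.optim.AdamW(model.parameters(), lr=1e-4)

use_cuda = torch.cuda.is_available()
device_type = "cuda" if use_cuda else "cpu"
# fp16 autocast requires CUDA; bf16 autocast also works on CPU.
amp_dtype = torch.float16 if use_cuda else torch.bfloat16
# Loss scaling is only needed for fp16; disabled, it is a pass-through.
scaler = torch.cuda.amp.GradScaler(enabled=use_cuda)

x, y = torch.randn(8, 16), torch.randn(8, 4)
with torch.autocast(device_type=device_type, dtype=amp_dtype):
    loss = torch.nn.functional.mse_loss(model(x), y)
scaler.scale(loss).backward()
scaler.step(opt)
scaler.update()
```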
🔧 Technical Highlights
- Multi-Device Support: Seamless CPU/GPU training with hardware-specific optimizations
- Memory Optimization: Advanced GPU memory management with configurable limits
- Scalable Data Processing: Multi-threaded OASST dataset processing with comprehensive validation
- Enhanced Tokenization: GPT-4 compatible tokenizer with conversation-aware encoding
- Model Compilation: PyTorch 2.0+ compilation support for accelerated training
📋 System Requirements
Minimum Requirements
- Python: 3.8+
- PyTorch: 1.13.0+
- CUDA: 11.0+ (for GPU acceleration)
- RAM: 16GB system memory
- Storage: 10GB free space
Recommended Requirements
- Python: 3.10+
- PyTorch: 2.0+
- CUDA: 12.0+ with Compute Capability 8.0+ (for TF32 support)
- GPU: NVIDIA RTX 3090/4090, A100, or H100
- RAM: 32GB+ system memory
- Storage: 100GB+ NVMe SSD
🚀 Quick Start
1️⃣ Installation
# Clone the repository
git clone https://github.com/MatN23/LuminaAI.git
cd LuminaAI
# Install core dependencies
pip install "torch>=1.13.0" "tiktoken>=0.5.0" "numpy>=1.21.0" "psutil>=5.8.0"
# Install optional dependencies for enhanced features
pip install "flash-attn>=2.0.0" "wandb>=0.15.0"  # Optional but recommended
2️⃣ Basic Training
python main.py \
  --config medium \
  --train-data data/train.jsonl \
  --eval-data data/eval.jsonl \
  --epochs 10 \
  --lr 1e-4 \
  --batch-size 4 \
  --precision fp16 \
  --inference-precision auto \
  --experiment-name my_first_model
3️⃣ Data Preparation
# Process...
LuminaAI v1.1.4
LuminaAI v1.1.3 Release
A conversational transformer training system with comprehensive monitoring, fault tolerance, and production-ready features.
What's Included
Core Training System
Transformer Model: Implementation with Grouped Query Attention, RoPE positional encoding, and SwiGLU activation
Conversation Tokenizer: Handles multi-turn conversations with proper role formatting
Training Pipeline: Complete training loop with gradient accumulation and mixed precision support
Dataset Handling: JSONL conversation format with validation and preprocessing
Configuration & Presets
4 Built-in Presets: Debug (6M params), Small (50M params), Medium (400M params), Large (1.2B params)
Flexible Configuration: YAML-based config system with validation
Easy Customization: Simple variable modification in Main.py for common settings
Monitoring & Logging
Multi-backend Logging: File logs, optional Wandb and TensorBoard integration
Health Monitoring: Training stability tracking with anomaly detection
Performance Metrics: Loss, perplexity, throughput, and system resource monitoring
Structured Logging: JSON-formatted metrics for analysis
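Perplexity, one of the tracked metrics, is just the exponential of the mean token cross-entropy:

```python
import math

def perplexity(mean_cross_entropy_nats: float) -> float:
    """Perplexity from the mean per-token cross-entropy loss (in nats)."""
    return math.exp(mean_cross_entropy_nats)
```

A loss of 0 corresponds to perplexity 1 (perfect prediction); lower perplexity means the model assigns higher probability to the observed tokens.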
Fault Tolerance
Checkpoint Management: Automatic saving with configurable frequency
Recovery System: Resume training from interruptions
Error Handling: Comprehensive exception handling with detailed logging
Data Validation: Pre-training data quality checks
Utilities
Environment Validation: System compatibility checks
Data Processing: OASST format conversion and quality analysis
Report Generation: Training summaries and dataset analysis
Performance Estimation: Training time and resource usage predictions
Technical Specifications
Model Architecture
Transformer decoder with modern optimizations
Supports sequence lengths up to 4096 tokens
Mixed precision training (FP16/BF16)
Optional model compilation with PyTorch 2.0
Configurable attention mechanisms
Training Features
Gradient accumulation for large effective batch sizes
Learning rate scheduling (cosine, linear, one-cycle)
Early stopping with patience-based monitoring
Weighted loss computation for conversation training
Automatic gradient clipping and normalization
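Gradient accumulation trades wall-clock steps for memory: the optimizer sees an effective batch that is the product of the per-device batch, the accumulation steps, and the device count. As a quick helper (illustrative, not part of LuminaAI's API):

```python
def effective_batch_size(per_device_batch: int, accum_steps: int,
                         num_devices: int = 1) -> int:
    """Examples the optimizer effectively sees per update when gradients
    are accumulated over accum_steps micro-batches on num_devices devices."""
    return per_device_batch * accum_steps * num_devices
```

For example, a per-device batch of 4 with 8 accumulation steps yields an effective batch of 32 on one GPU.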
Data Support
JSONL conversation format
Multi-turn conversation handling
Role-based message formatting (user, assistant, system)
Automatic data validation and quality scoring
Support for OASST and similar datasets
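A single record in the JSONL conversation format might look like the following (wrapped here for readability; each record occupies one line in the file, and the exact keys LuminaAI expects may differ from this common chat schema):

```json
{"messages": [
  {"role": "system", "content": "You are a helpful assistant."},
  {"role": "user", "content": "What is RoPE?"},
  {"role": "assistant", "content": "Rotary positional embeddings encode token positions as rotations applied to query and key vectors."}
]}
```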
System Requirements
Python 3.8+
PyTorch 2.0+
CUDA-capable GPU (recommended)
8GB+ system RAM
Variable VRAM requirements based on model size
Usage
Basic Training
python Main.py
Starts training with debug preset and sample data.
Custom Configuration
python Main.py --config medium --epochs 10 --lr 1e-4
Data Processing
python Main.py --validate-data data.jsonl
python Main.py --process-oasst input.jsonl output.jsonl
Configuration Options
Model Presets
debug: 6M parameters, minimal resources for testing
small: 50M parameters, 8GB VRAM requirement
medium: 400M parameters, 16GB VRAM requirement
large: 1.2B parameters, 32GB+ VRAM requirement
Key Parameters
Learning rates: Configurable with scheduling options
Batch sizes: Per-device and gradient accumulation settings
Precision: FP32, FP16, or BF16 training
Sequence length: Up to 4096 tokens
Checkpoint frequency: Configurable save intervals
Dependencies
Core Requirements:
torch>=2.0.0
numpy
tiktoken
pyyaml
psutil
Optional Monitoring:
wandb (for experiment tracking)
tensorboard (for local visualization)
License
Custom License - see LICENSE file for terms and conditions.
Installation
git clone https://github.com/MatN23/LuminaAI.git
pip install -r requirements.txt
cd LuminaAI/Src/Main_Scripts
python Setup.py
The setup script validates the environment and creates necessary directories.
LuminaAI v.1.1.3
LuminaAI v1.1.3 Release
A conversational transformer training system with comprehensive monitoring, fault tolerance, and production-ready features.
What's Included
Core Training System
- Transformer Model: Implementation with Grouped Query Attention, RoPE positional encoding, and SwiGLU activation
- Conversation Tokenizer: Handles multi-turn conversations with proper role formatting
- Training Pipeline: Complete training loop with gradient accumulation and mixed precision support
- Dataset Handling: JSONL conversation format with validation and preprocessing
Configuration & Presets
- 4 Built-in Presets: Debug (6M params), Small (50M params), Medium (400M params), Large (1.2B params)
- Flexible Configuration: YAML-based config system with validation
- Easy Customization: Simple variable modification in Main.py for common settings
Monitoring & Logging
- Multi-backend Logging: File logs, optional Wandb and TensorBoard integration
- Health Monitoring: Training stability tracking with anomaly detection
- Performance Metrics: Loss, perplexity, throughput, and system resource monitoring
- Structured Logging: JSON-formatted metrics for analysis
Fault Tolerance
- Checkpoint Management: Automatic saving with configurable frequency
- Recovery System: Resume training from interruptions
- Error Handling: Comprehensive exception handling with detailed logging
- Data Validation: Pre-training data quality checks
Utilities
- Environment Validation: System compatibility checks
- Data Processing: OASST format conversion and quality analysis
- Report Generation: Training summaries and dataset analysis
- Performance Estimation: Training time and resource usage predictions
Technical Specifications
Model Architecture
- Transformer decoder with modern optimizations
- Supports sequence lengths up to 4096 tokens
- Mixed precision training (FP16/BF16)
- Optional model compilation with PyTorch 2.0
- Configurable attention mechanisms
Training Features
- Gradient accumulation for large effective batch sizes
- Learning rate scheduling (cosine, linear, one-cycle)
- Early stopping with patience-based monitoring
- Weighted loss computation for conversation training
- Automatic gradient clipping and normalization
Data Support
- JSONL conversation format
- Multi-turn conversation handling
- Role-based message formatting (user, assistant, system)
- Automatic data validation and quality scoring
- Support for OASST and similar datasets
System Requirements
- Python 3.8+
- PyTorch 2.0+
- CUDA-capable GPU (recommended)
- 8GB+ system RAM
- Variable VRAM requirements based on model size
Usage
Basic Training
python Main.py
Starts training with debug preset and sample data.
Custom Configuration
python Main.py --config medium --epochs 10 --lr 1e-4
Data Processing
python Main.py --validate-data data.jsonl
python Main.py --process-oasst input.jsonl output.jsonl
Configuration Options
Model Presets
- debug: 6M parameters, minimal resources for testing
- small: 50M parameters, 8GB VRAM requirement
- medium: 400M parameters, 16GB VRAM requirement
- large: 1.2B parameters, 32GB+ VRAM requirement
Key Parameters
- Learning rates: Configurable with scheduling options
- Batch sizes: Per-device and gradient accumulation settings
- Precision: FP32, FP16, or BF16 training
- Sequence length: Up to 4096 tokens
- Checkpoint frequency: Configurable save intervals
Dependencies
Core Requirements:
- torch>=2.0.0
- numpy
- tiktoken
- pyyaml
- psutil
Optional Monitoring:
- wandb (for experiment tracking)
- tensorboard (for local visualization)
License
Custom License - see LICENSE file for terms and conditions.
Installation
git clone https://github.com/MatN23/LuminaAI.git
pip install -r requirements.txt
cd LuminaAI/Src/Main_Scripts
python Setup.py
The setup script validates the environment and creates necessary directories.