The Advanced ML Training Engine streamlines the entire machine-learning lifecycle, from data ingestion to model deployment. It now features a modern Gradio-powered web interface, intelligent preprocessing, state-of-the-art hyper-parameter optimisation, device-aware acceleration, and first-class experiment tracking.
- Gradio-powered UI with intuitive tabbed interface
- Real-time data visualization and comprehensive data previews
- Interactive model training with progress tracking
- Dedicated inference server for production deployments
- Sample dataset integration with popular ML datasets
- Secure model management with encryption support
- Multi-task support: classification, regression, clustering
- Seamless integration with scikit-learn, XGBoost, LightGBM & CatBoost
- Automated model selection & tuning
Classification | Regression |
---|---|
Logistic Regression | Linear Regression |
Random Forest Classifier | Random Forest Regressor |
Gradient Boosting Classifier | Gradient Boosting Regressor |
XGBoost Classifier | XGBoost Regressor |
LightGBM Classifier | LightGBM Regressor |
CatBoost Classifier | CatBoost Regressor |
Support Vector Classifier | Support Vector Regressor |
Neural Network | Neural Network |
- Grid Search, Random Search, Bayesian Optimisation
- ASHT (Adaptive Surrogate-Assisted Hyper-parameter Tuning)
- HyperX (meta-optimiser for large search spaces)
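Grid Search and Random Search differ only in how candidate configurations are drawn. The sketch below illustrates that difference in plain Python over a made-up objective (`toy_score` is hypothetical and stands in for a cross-validated score); the search space reuses the `param_grid` values from the Python example further down. It is not the engine's implementation.

```python
import itertools
import random

# Search space borrowed from the README's RandomForest param_grid.
SPACE = {"n_estimators": [50, 100, 200], "max_depth": [None, 5, 10]}


def toy_score(params):
    """Hypothetical objective (higher is better), peaking at n_estimators=100, max_depth=5."""
    depth = params["max_depth"] if params["max_depth"] is not None else 20
    return -abs(params["n_estimators"] - 100) / 100 - abs(depth - 5) / 10


def grid_search(space, score):
    """Evaluate every combination in the Cartesian product of the space."""
    keys = list(space)
    candidates = [dict(zip(keys, values)) for values in itertools.product(*space.values())]
    return max(candidates, key=score), len(candidates)


def random_search(space, score, n_iter, seed=0):
    """Evaluate n_iter configurations sampled independently from the space."""
    rng = random.Random(seed)
    candidates = [{k: rng.choice(v) for k, v in space.items()} for _ in range(n_iter)]
    return max(candidates, key=score), len(candidates)


best_grid, n_grid = grid_search(SPACE, toy_score)
best_rand, n_rand = random_search(SPACE, toy_score, n_iter=4)
print(best_grid, n_grid)  # {'n_estimators': 100, 'max_depth': 5} 9
print(best_rand, n_rand)
```

Random search trades exhaustiveness for a fixed budget (`n_iter`), which is why it scales better than a full grid as the number of hyper-parameters grows.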
- Auto-scaling & encoding
- Robust missing-value & outlier handling
- Feature selection / extraction pipelines
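The preprocessing strategies the engine actually uses are not specified here. As an assumption, the sketch below shows one common, robust combination on a single numeric column: median imputation for missing values, clipping outliers at Tukey's IQR fences, then standard scaling. All function names are hypothetical.

```python
import statistics


def impute_median(column):
    """Replace None entries with the median of the observed values."""
    observed = [v for v in column if v is not None]
    median = statistics.median(observed)
    return [median if v is None else v for v in column]


def clip_iqr(column, k=1.5):
    """Clip values outside [Q1 - k*IQR, Q3 + k*IQR] (Tukey's fences)."""
    q1, _, q3 = statistics.quantiles(column, n=4)
    iqr = q3 - q1
    low, high = q1 - k * iqr, q3 + k * iqr
    return [min(max(v, low), high) for v in column]


def standardize(column):
    """Scale to zero mean and unit (population) standard deviation."""
    mean = statistics.fmean(column)
    std = statistics.pstdev(column)
    return [(v - mean) / std if std else 0.0 for v in column]


raw = [1.0, 2.0, None, 3.0, 4.0, 100.0]  # one missing value, one outlier
imputed = impute_median(raw)              # None -> 3.0 (median of observed values)
clipped = clip_iqr(imputed)               # 100.0 is pulled down to the upper fence
cleaned = standardize(clipped)
print(cleaned)
```

Median imputation and IQR clipping are used here because, unlike the mean and standard deviation, they are not dragged around by the very outliers they are meant to handle.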
- Device-aware config & adaptive batching
- Quantisation & parallel execution
- Memory-efficient data loaders
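How the engine's quantizer works internally is not documented here. For orientation only, the sketch below shows a standard affine (asymmetric) int8 scheme: each float is mapped to an 8-bit code via a `scale` and `zero_point`, and dequantizing recovers it to within about one quantization step. The function names are hypothetical.

```python
def quantize_int8(values):
    """Affine (asymmetric) quantization of floats to int8 codes in [-128, 127]."""
    lo, hi = min(values), max(values)
    scale = (hi - lo) / 255 or 1.0  # fall back to 1.0 for constant input
    zero_point = round(-128 - lo / scale)
    codes = [max(-128, min(127, round(v / scale) + zero_point)) for v in values]
    return codes, scale, zero_point


def dequantize_int8(codes, scale, zero_point):
    """Map int8 codes back to approximate float values."""
    return [(q - zero_point) * scale for q in codes]


weights = [-1.0, -0.25, 0.0, 0.5, 1.0]
codes, scale, zp = quantize_int8(weights)
restored = dequantize_int8(codes, scale, zp)
max_error = max(abs(a - b) for a, b in zip(weights, restored))
print(codes, round(scale, 6), zp, max_error)
```

Each value now occupies one byte instead of eight (Python float) or four (float32), at the cost of a reconstruction error bounded by roughly one `scale` step.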
- Real-time learning curves & metric dashboards
- Built-in experiment tracker
- Performance comparison across models
- Feature importance visualizations
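The README does not say how feature importance is computed; model-native importances (e.g. from tree ensembles) are one option. A model-agnostic alternative is permutation importance, sketched below under the assumption of a regression model scored by mean squared error. The `predict` function is a hypothetical stand-in for a trained model that uses only feature 0.

```python
import random


def mse(y_true, y_pred):
    return sum((a - b) ** 2 for a, b in zip(y_true, y_pred)) / len(y_true)


def permutation_importance(predict, X, y, n_repeats=5, seed=0):
    """Mean increase in MSE when one feature column is shuffled, per feature."""
    rng = random.Random(seed)
    baseline = mse(y, [predict(row) for row in X])
    importances = []
    for j in range(len(X[0])):
        increases = []
        for _ in range(n_repeats):
            column = [row[j] for row in X]
            rng.shuffle(column)  # break the link between feature j and the target
            X_perm = [row[:j] + [v] + row[j + 1:] for row, v in zip(X, column)]
            increases.append(mse(y, [predict(row) for row in X_perm]) - baseline)
        importances.append(sum(increases) / n_repeats)
    return importances


def predict(row):
    """Hypothetical trained model: depends on feature 0 only."""
    return 2.0 * row[0]


X = [[float(i), float(i % 3)] for i in range(8)]
y = [predict(row) for row in X]
print(permutation_importance(predict, X, y))  # feature 0 > 0, feature 1 == 0
```

Shuffling a feature the model ignores leaves the error unchanged, so its importance is exactly zero.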
- Python 3.10 or newer
Option 1: Fast Setup with UV (Recommended)
# 1. Clone the repository
git clone https://github.com/Genta-Technology/kolosal_automl.git
cd kolosal_automl
# 2. Install uv (if not already installed)
curl -LsSf https://astral.sh/uv/install.sh | sh
# or on Windows:
# powershell -c "irm https://astral.sh/uv/install.ps1 | iex"
# 3. Create and activate a virtual environment
uv venv
# Activate virtual environment
# Windows: .venv\Scripts\activate
# macOS/Linux: source .venv/bin/activate
# 4. Install dependencies ultra-fast with uv
uv pip install -r requirements.txt
# Optional: Install GPU-accelerated packages
uv pip install xgboost lightgbm catboost
Option 2: Standard Setup with pip
git clone https://github.com/Genta-Technology/kolosal_automl.git
cd kolosal_automl
python -m venv venv && source venv/bin/activate # create & activate venv
pip install --upgrade pip
pip install -r requirements.txt
Tip: For GPU-accelerated algorithms (XGBoost, LightGBM, CatBoost), install the respective packages:
uv pip install xgboost lightgbm catboost # or with pip: pip install xgboost lightgbm catboost
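Because these boosting libraries are optional, it can help to check which ones are importable in the active environment before selecting them. The sketch below uses only the standard library (`importlib.util.find_spec`); the helper name `available_boosters` is hypothetical, and how the engine itself reacts to a missing package is not covered here.

```python
import importlib.util

OPTIONAL_BOOSTERS = ("xgboost", "lightgbm", "catboost")


def available_boosters(names=OPTIONAL_BOOSTERS):
    """Return {package_name: True/False} depending on whether it can be imported."""
    return {name: importlib.util.find_spec(name) is not None for name in names}


status = available_boosters()
for name, ok in status.items():
    print(f"{name:10s} {'installed' if ok else 'missing (uv pip install ' + name + ')'}")
```

`find_spec` only locates the package without importing it, so the check is cheap even for heavy libraries.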
Launch the full-featured web interface:
# Using uv (recommended)
uv run python app.py
# Or with standard Python
python app.py
# Launch in inference-only mode
uv run python app.py --inference-only
# Custom host and port
uv run python app.py --host 0.0.0.0 --port 8080
# Create public shareable link
uv run python app.py --share
Available Command Line Options:
- --inference-only: Run in inference-only mode (no training capabilities)
- --model-path: Path to a pre-trained model file (for inference-only mode)
- --config-path: Path to the model configuration file
- --host: Host address (default: 0.0.0.0)
- --port: Port number (default: 7860)
- --share: Create a public Gradio link
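The options above can be read as two boolean switches plus four valued options. The `argparse` sketch below mirrors that reading, using only the documented flag names and defaults; the argument types (e.g. `--port` as an integer) and the file name `model.pkl` are assumptions, and the real `app.py` may parse its arguments differently.

```python
import argparse


def build_parser():
    """Sketch mirroring the documented CLI flags; not the actual app.py code."""
    parser = argparse.ArgumentParser(description="Kolosal AutoML web interface (sketch)")
    parser.add_argument("--inference-only", action="store_true",
                        help="Run in inference-only mode (no training capabilities)")
    parser.add_argument("--model-path", help="Path to pre-trained model file (inference-only mode)")
    parser.add_argument("--config-path", help="Path to model configuration file")
    parser.add_argument("--host", default="0.0.0.0", help="Host address")
    parser.add_argument("--port", type=int, default=7860, help="Port number")
    parser.add_argument("--share", action="store_true", help="Create a public Gradio link")
    return parser


args = build_parser().parse_args(
    ["--inference-only", "--model-path", "model.pkl", "--port", "8080"]
)
print(args.inference_only, args.model_path, args.host, args.port, args.share)
# True model.pkl 0.0.0.0 8080 False
```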
from modules.engine.train_engine import MLTrainingEngine
from modules.configs import MLTrainingEngineConfig, TaskType, OptimizationStrategy
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.ensemble import RandomForestClassifier
# Load your data (Iris is used here as a stand-in; replace with your own X, y)
X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)
# Configure the engine
config = MLTrainingEngineConfig(
task_type=TaskType.CLASSIFICATION,
optimization_strategy=OptimizationStrategy.HYPERX,
cv_folds=5,
test_size=0.2,
)
engine = MLTrainingEngine(config)
best_model, metrics = engine.train_model(
model=RandomForestClassifier(),
model_name="RandomForest",
param_grid={
"n_estimators": [50, 100, 200],
"max_depth": [None, 5, 10],
},
X=X_train,
y=y_train,
)
engine.save_model(best_model)
predictions = engine.predict(X_test)
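The configuration above sets `cv_folds=5` and `test_size=0.2`. Assuming these carry their usual meaning (a held-out test fraction plus k-fold cross-validation on the remainder), the index bookkeeping looks roughly like the sketch below. `holdout_and_kfold` is a hypothetical helper, not part of the engine; 150 samples matches the size of the Iris dataset.

```python
import random


def holdout_and_kfold(n_samples, test_size=0.2, cv_folds=5, seed=42):
    """Split indices into a held-out test set, then k (train, validation) folds."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    n_test = round(n_samples * test_size)
    test_idx, train_idx = indices[:n_test], indices[n_test:]
    fold_size, remainder = divmod(len(train_idx), cv_folds)
    folds, start = [], 0
    for k in range(cv_folds):
        # The first `remainder` folds get one extra sample so every index is used.
        stop = start + fold_size + (1 if k < remainder else 0)
        val = train_idx[start:stop]
        train = train_idx[:start] + train_idx[stop:]
        folds.append((train, val))
        start = stop
    return test_idx, folds


test_idx, folds = holdout_and_kfold(150)
print(len(test_idx), [len(val) for _, val in folds])  # 30 [24, 24, 24, 24, 24]
```

Hyper-parameters are scored on the validation folds only; the held-out indices are touched once, for the final estimate.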
- Upload your CSV, Excel, Parquet, or JSON files
- Or try built-in sample datasets (Iris, Titanic, Boston Housing, etc.)
- View comprehensive data previews with statistics and visualizations
- Explore missing values, data types, and feature distributions
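As a rough illustration of what a per-column preview involves, the sketch below computes the missing-value count, an inferred type, and the mean of numeric columns from CSV text using only the standard library. The function `summarize` and the three-row sample (Titanic-style column names with made-up values) are hypothetical; the web interface's own previews are richer than this.

```python
import csv
import io
import statistics


def summarize(csv_text):
    """Per-column missing count, inferred type, and mean (assumes at least one data row)."""
    rows = list(csv.DictReader(io.StringIO(csv_text)))
    summary = {}
    for col in rows[0].keys():
        values = [r[col] for r in rows]
        present = [v for v in values if v not in ("", None)]
        try:
            numbers = [float(v) for v in present]
            dtype, mean = "numeric", (statistics.fmean(numbers) if numbers else None)
        except ValueError:  # at least one value is not a number
            dtype, mean = "categorical", None
        summary[col] = {"missing": len(values) - len(present), "dtype": dtype, "mean": mean}
    return summary


sample = "age,sex,fare\n22,male,7.25\n,female,71.28\n26,female,\n"
for column, info in summarize(sample).items():
    print(column, info)
```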
- Select task type (Classification/Regression)
- Choose optimization strategy (Random Search, Grid Search, Bayesian, HyperX)
- Configure cross-validation settings
- Set preprocessing options (normalization, feature selection)
- Enable advanced features (quantization, early stopping)
- Select your target column
- Choose from multiple algorithms (Random Forest, XGBoost, Neural Networks, etc.)
- Monitor training progress in real-time
- View training metrics and feature importance
- Make predictions on new data
- Compare model performance across different algorithms
- Visualize results with confusion matrices and residual plots
- Test with external datasets
- Save trained models with optional encryption
- Load previously saved models
- Export models in multiple formats (Pickle, Joblib, ONNX)
- Secure model deployment with access controls
- Dedicated inference endpoint for production use
- Real-time predictions with minimal latency
- Support for encrypted model files
- RESTful API compatibility
from modules.configs import MLTrainingEngineConfig, TaskType, OptimizationStrategy
config = MLTrainingEngineConfig(
task_type=TaskType.CLASSIFICATION,
optimization_strategy=OptimizationStrategy.BAYESIAN,
cv_folds=5,
test_size=0.2,
random_state=42,
enable_quantization=True,
batch_size=64,
n_jobs=-1,
feature_selection=True,
early_stopping=True,
early_stopping_rounds=10,
)
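The `early_stopping_rounds=10` setting above presumably acts as a patience window. The sketch below shows that standard rule on a list of validation losses: training stops once 10 consecutive epochs fail to beat the best loss seen so far. The function name and the `min_delta` parameter are hypothetical, and the engine's exact criterion may differ.

```python
def early_stopping_epoch(val_losses, patience=10, min_delta=0.0):
    """Return (best_epoch, stop_epoch): stop once `patience` epochs pass without improvement."""
    best_loss, best_epoch = float("inf"), -1
    for epoch, loss in enumerate(val_losses):
        if loss < best_loss - min_delta:
            best_loss, best_epoch = loss, epoch  # new best: reset the patience window
        elif epoch - best_epoch >= patience:
            return best_epoch, epoch            # patience exhausted
    return best_epoch, len(val_losses) - 1      # ran out of epochs without stopping


losses = [1.0, 0.8, 0.7, 0.65] + [0.66] * 20
print(early_stopping_epoch(losses, patience=10))  # (3, 13)
```

The model weights from `best_epoch`, not `stop_epoch`, are the ones typically kept.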
The web interface includes several popular datasets for quick experimentation:
- Iris: Classic flower classification dataset
- Titanic: Passenger survival classification
- Boston Housing: House price regression
- Wine Quality: Wine rating prediction
- Diabetes: Medical classification dataset
- Car Evaluation: Multi-class classification
kolosal_automl/
├── main.py                      # Main application entry point
├── app.py                       # Gradio web interface
├── modules/
│   ├── __init__.py
│   ├── configs.py               # Configuration management
│   ├── api/                     # API endpoints
│   │   ├── __init__.py
│   │   ├── app.py
│   │   ├── data_preprocessor_api.py
│   │   ├── device_optimizer_api.py
│   │   ├── inference_engine_api.py
│   │   ├── model_manager_api.py
│   │   ├── quantizer_api.py
│   │   └── train_engine_api.py
│   ├── engine/                  # Core ML engines
│   │   ├── __init__.py
│   │   ├── batch_processor.py
│   │   ├── data_preprocessor.py
│   │   ├── inference_engine.py
│   │   ├── lru_ttl_cache.py
│   │   ├── quantizer.py
│   │   └── train_engine.py
│   ├── optimizer/               # Optimization algorithms
│   │   ├── __init__.py
│   │   ├── configs.py
│   │   ├── device_optimizer.py  # Device optimization
│   │   └── model_manager.py     # Secure model management
│   ├── static/                  # Static assets
│   └── utils/                   # Utility functions
├── temp_data/                   # Temporary data storage
├── tests/                       # Test suites
│   ├── .gitignore
│   ├── env/                     # Test environments
│   ├── functional/              # Functional tests
│   ├── integration/             # Integration tests
│   ├── templates/               # Test templates
│   │   ├── .gitattributes
│   │   └── .gitignore
│   └── unit/                    # Unit tests
├── .gitignore
├── app.py                       # Alternative app launcher
├── compose.yaml                 # Docker Compose configuration
├── Dockerfile                   # Docker containerization
├── kolosal_apilog               # API logging
├── LICENSE                      # MIT License
├── python-version               # Python version specification
├── README.md                    # Project documentation
└── requirements.txt             # Dependencies
File | Status |
---|---|
tests/functional/test/app_api.py | FAILED |
tests/functional/test/quantizer_api.py | FAILED |
tests/functional/test/data_preprocessor_api.py | FAILED |
tests/functional/test/device_optimizer_api.py | FAILED |
tests/functional/test/inference_engine_api.py | FAILED |
tests/functional/test/train_engine_api.py | FAILED |
tests/functional/test/model_manager_api.py | FAILED |
File | Status |
---|---|
tests/unit/test/batch_processor.py | PASSED |
tests/unit/test/data_preprocessor.py | FAILED |
tests/unit/test/device_optimizer.py | FAILED |
tests/unit/test/inference_engine.py | FAILED |
tests/unit/test/lru_ttl_cache.py | PASSED |
tests/unit/test/model_manager.py | FAILED |
tests/unit/test/optimizer_asht.py | FAILED |
tests/unit/test/optimizer_hyperx.py | PASSED |
tests/unit/test/quantizer.py | FAILED |
tests/unit/test/train_engine.py | FAILED |
Run all tests:
pytest -vv
- Gradio Web Interface: Complete redesign from Streamlit to Gradio for better performance and user experience
- Enhanced UV Integration: Streamlined installation and dependency management with the UV package manager
- Dedicated Inference Server: Production-ready inference endpoint with minimal latency
- Advanced Data Visualization: Comprehensive data previews with correlation matrices and distribution plots
- Secure Model Management: Enhanced model encryption and access control features
- Sample Dataset Integration: Built-in access to popular ML datasets (Iris, Titanic, Boston Housing, etc.)
- Real-time Training Progress: Live updates during model training with detailed metrics
- Performance Comparison Dashboard: Side-by-side model evaluation and ranking
- Enhanced Device Optimization: Better GPU detection and memory management
- Improved Error Handling: More robust error messages and debugging information
- Multiple Export Formats: Support for Pickle, Joblib, and ONNX model exports
- Command Line Interface: Flexible CLI options for different deployment scenarios
- Interactive Data Exploration: In-browser data analysis with statistical summaries
- Feature Importance Visualization: Automated generation of feature importance plots
- Model Encryption: Secure model storage with password protection
- Faster Model Loading: Optimized model serialization and deserialization
- Memory Optimization: Reduced memory footprint during training and inference
- Parallel Processing: Enhanced multi-core utilization for training workflows
- Caching System: Intelligent caching for faster repeated operations
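The project tree lists `modules/engine/lru_ttl_cache.py`, which suggests a cache that combines least-recently-used eviction with time-based expiry. The sketch below shows that general idea with `collections.OrderedDict`; the class name `LRUTTLCache` and its methods are hypothetical and are not that module's API. An injectable `clock` makes expiry easy to demonstrate.

```python
import time
from collections import OrderedDict


class LRUTTLCache:
    """Least-recently-used cache whose entries also expire after `ttl` seconds."""

    def __init__(self, max_size=128, ttl=60.0, clock=time.monotonic):
        self.max_size, self.ttl, self.clock = max_size, ttl, clock
        self._data = OrderedDict()  # key -> (expires_at, value), oldest use first

    def get(self, key, default=None):
        item = self._data.get(key)
        if item is None:
            return default
        expires_at, value = item
        if self.clock() >= expires_at:
            del self._data[key]  # expired: drop lazily on access
            return default
        self._data.move_to_end(key)  # mark as most recently used
        return value

    def put(self, key, value):
        self._data[key] = (self.clock() + self.ttl, value)
        self._data.move_to_end(key)
        while len(self._data) > self.max_size:
            self._data.popitem(last=False)  # evict the least recently used entry


now = [0.0]  # fake clock so the example is deterministic
cache = LRUTTLCache(max_size=2, ttl=10.0, clock=lambda: now[0])
cache.put("a", 1)
cache.put("b", 2)
cache.get("a")     # touch "a", so "b" becomes least recently used
cache.put("c", 3)  # exceeds max_size=2, so "b" is evicted
now[0] = 11.0      # advance past the TTL; "a" and "c" are now expired
```

Expired entries are removed lazily on `get`, which keeps `put` and `get` O(1) without a background cleanup thread.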
- Complete Test Suite & passing CI
- REST API Endpoints for programmatic access
- Docker Containerization for easy deployment
- Model Monitoring & drift detection
- AutoML Pipeline with automated feature engineering
- Time-series & anomaly-detection modules
- Cloud-native deployment recipes (AWS, GCP, Azure)
- MLOps Integration with popular platforms
Purpose | Library |
---|---|
Web UI | Gradio |
Package Mgmt | UV |
Data Ops | Pandas / NumPy |
Core ML | scikit-learn |
Boosting | XGBoost / LightGBM / CatBoost |
Visuals | Matplotlib / Seaborn |
Serialisation | Joblib / Pickle |
Optimization | Optuna / Hyperopt |
- Fork the repository
- Create a feature branch:
git checkout -b feature/amazing-feature
- Make your changes and add tests
- Verify tests pass:
uv run pytest -q
- Commit your changes:
git commit -m 'Add amazing feature'
- Push to the branch:
git push origin feature/amazing-feature
- Open a Pull Request
For comprehensive documentation and tutorials:
- API Reference: docs/api.md
- Configuration Guide: docs/configuration.md
- Deployment Guide: docs/deployment.md
- Contributing Guide: CONTRIBUTING.md
Released under the MIT License. See LICENSE for details.
Ready to explore advanced machine learning? Try our quickstart:
# Clone and setup
git clone https://github.com/Genta-Technology/kolosal_automl.git
cd kolosal_automl
# Quick install with UV
curl -LsSf https://astral.sh/uv/install.sh | sh
uv venv && source .venv/bin/activate
uv pip install -r requirements.txt
# Launch the web interface
uv run python app.py
# Open http://localhost:7860 in your browser and start experimenting!
Built by the Kolosal AI Team
Star us on GitHub | Documentation | Report Issues