This directory contains examples and documentation for using the CUDA-accelerated Support Vector Machine (SVM) implementation with automatic CPU fallback support.
from SVM.cuda_svm import CudaSVC, CudaSVR
import numpy as np
# Classification - automatically uses CUDA if available, falls back to CPU
svc = CudaSVC(C=1.0, kernel='rbf', gamma='scale')
svc.fit(X_train, y_train)
predictions = svc.predict(X_test)
# Regression - same cross-platform compatibility
svr = CudaSVR(C=1.0, epsilon=0.1, kernel='rbf')
svr.fit(X_train, y_train)
predictions = svr.predict(X_test)- GPU (Optional): NVIDIA GPU with CUDA Compute Capability 7.0+ (RTX 20 series, GTX 1650+, Tesla V100+)
- CPU (Required): Any modern x86_64 processor
- RAM: 4GB+ system memory (8GB+ recommended for large datasets)
- CUDA Toolkit (Optional): Version 12.0+ for GPU acceleration
- Python: 3.7+
- Dependencies: numpy, scikit-learn
- GPU-Accelerated: Systems with CUDA-capable NVIDIA GPUs
- CPU-Only: Any system (automatic fallback when CUDA unavailable)
- Cloud Platforms: Google Colab, AWS, Azure, etc.
- Cross-Platform: Linux, Windows, macOS (CPU mode only on macOS)
cd /path/to/Cuda-ML-Library/SVM
make clean
makeThe build process will:
- Auto-detect CUDA availability and GPU architecture
- Compile CUDA kernels when GPU is available
- Create CPU fallback implementation when CUDA is unavailable
- Generate
libcuda_svm.soshared library with universal compatibility
pip install numpy scikit-learnpython -c "from SVM.cuda_svm import CudaSVC; print('CUDA SVM imported successfully!')"from SVM.cuda_svm import CudaSVC
# Check if CUDA acceleration is available (available in optimized version)
try:
svc = CudaSVC()
print("CUDA SVM loaded successfully")
# Hardware detection features available in optimized version
except Exception as e:
print(f"Error: {e}")The library automatically detects and utilizes available hardware:
- Full GPU acceleration for training and prediction
- Optimized CUDA kernels for maximum performance
- Support for all kernel types (linear, RBF, polynomial, sigmoid)
- Memory pooling for efficient GPU memory management
- High-performance CPU implementation using OpenMP
- Automatic activation when CUDA is unavailable
- Compatible with all platforms (Linux, Windows, macOS)
- Optimized for multi-core processors
# Works on any platform - automatic hardware detection
from SVM.cuda_svm import CudaSVC
# Colab with GPU
svc_colab = CudaSVC() # Uses GPU acceleration
print(f"Running on: {svc_colab.device_name}")
# Local machine without GPU
svc_cpu = CudaSVC() # Falls back to CPU
print(f"Running on: CPU Mode")
# Same API, same results, optimal performanceimport numpy as np
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
from SVM.cuda_svm import CudaSVC
# Generate sample data
X, y = make_classification(n_samples=10000, n_features=20, n_classes=2, random_state=42)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)
# Preprocess data
scaler = StandardScaler()
X_train_scaled = scaler.fit_transform(X_train)
X_test_scaled = scaler.transform(X_test)
# Convert labels to -1, 1 format
y_train_svm = np.where(y_train == 0, -1, 1)
y_test_svm = np.where(y_test == 0, -1, 1)
# Train SVM (automatically uses best available hardware)
svc = CudaSVC(C=1.0, kernel='rbf', gamma='scale', probability=True)
# Check hardware status
print(f"Using: {'GPU (' + svc.device_name + ')' if svc.cuda_available else 'CPU'}")
svc.fit(X_train_scaled, y_train_svm)
# Make predictions
predictions = svc.predict(X_test_scaled)
probabilities = svc.predict_proba(X_test_scaled)
accuracy = svc.score(X_test_scaled, y_test_svm)
print(f"Accuracy: {accuracy:.4f}")from sklearn.datasets import make_regression
from SVM.cuda_svm import CudaSVR
# Generate regression data
X, y = make_regression(n_samples=5000, n_features=15, noise=0.1, random_state=42)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)
# Standardize features and targets
scaler_X = StandardScaler()
scaler_y = StandardScaler()
X_train_scaled = scaler_X.fit_transform(X_train)
X_test_scaled = scaler_X.transform(X_test)
y_train_scaled = scaler_y.fit_transform(y_train.reshape(-1, 1)).ravel()
y_test_scaled = scaler_y.transform(y_test.reshape(-1, 1)).ravel()
# Train SVM (automatic hardware detection)
svr = CudaSVR(C=1.0, epsilon=0.1, kernel='rbf', gamma='scale')
print(f"Training on: {'GPU (' + svr.device_name + ')' if svr.cuda_available else 'CPU'}")
svr.fit(X_train_scaled, y_train_scaled)
# Evaluate
r2_score = svr.score(X_test_scaled, y_test_scaled)
print(f"R² Score: {r2_score:.4f}")C(float, default=1.0): Regularization parameterkernel(str, default='rbf'): Kernel type ('linear', 'rbf', 'poly', 'sigmoid')gamma(str/float, default='scale'): Kernel coefficient ('scale', 'auto', or numeric value)probability(bool, default=False): Enable probability estimatestolerance(float, default=1e-3): Tolerance for stopping criterionmax_iter(int, default=1000): Maximum iterations
C(float, default=1.0): Regularization parameterepsilon(float, default=0.1): Epsilon-tube parameterkernel(str, default='rbf'): Kernel typegamma(str/float, default='scale'): Kernel coefficienttolerance(float, default=1e-3): Tolerance for stopping criterionmax_iter(int, default=1000): Maximum iterations
- Linear:
K(x,y) = x·y - RBF:
K(x,y) = exp(-γ||x-y||²) - Polynomial:
K(x,y) = (γx·y + r)^d - Sigmoid:
K(x,y) = tanh(γx·y + r)
# Always standardize your features
scaler = StandardScaler()
X_scaled = scaler.fit_transform(X)
# For classification, use -1/+1 labels
y_svm = np.where(y == 0, -1, 1)# For large datasets, process in batches
batch_size = 10000
predictions = []
for i in range(0, len(X_test), batch_size):
batch_pred = svc.predict(X_test[i:i+batch_size])
predictions.extend(batch_pred)from sklearn.model_selection import GridSearchCV
from sklearn.base import BaseEstimator, ClassifierMixin
class CudaSVMWrapper(BaseEstimator, ClassifierMixin):
def __init__(self, C=1.0, kernel='rbf', gamma='scale'):
self.C = C
self.kernel = kernel
self.gamma = gamma
def fit(self, X, y):
self.svm_ = CudaSVC(C=self.C, kernel=self.kernel, gamma=self.gamma)
self.svm_.fit(X, y)
return self
def predict(self, X):
return self.svm_.predict(X)
def score(self, X, y):
return self.svm_.score(X, y)
# Grid search
param_grid = {
'C': [0.1, 1, 10, 100],
'gamma': ['scale', 'auto', 0.01, 0.1, 1],
'kernel': ['rbf', 'linear', 'poly']
}
grid_search = GridSearchCV(CudaSVMWrapper(), param_grid, cv=5, scoring='accuracy')
grid_search.fit(X_train_scaled, y_train_svm)
print(f"Best parameters: {grid_search.best_params_}")- Small datasets (< 1K samples): 2-5x faster than scikit-learn
- Medium datasets (1K-10K samples): 5-15x faster than scikit-learn
- Large datasets (10K+ samples): 10-50x faster than scikit-learn
- Small datasets (< 1K samples): Comparable to scikit-learn
- Medium datasets (1K-10K samples): 1.5-3x faster than scikit-learn (multi-core optimized)
- Large datasets (10K+ samples): 2-5x faster than scikit-learn
from SVM.cuda_svm import CudaSVC
import time
# The library automatically chooses the best available hardware
svc = CudaSVC(C=1.0, kernel='rbf', gamma='scale')
print(f"Hardware: {'GPU (' + svc.device_name + ')' if svc.cuda_available else 'CPU'}")
# Same performance-optimized code for both modes
start_time = time.time()
svc.fit(X_train, y_train)
training_time = time.time() - start_time
print(f"Training completed in {training_time:.2f}s")Performance depends on dataset size, features, kernel type, and hardware specifications. The library automatically optimizes for your specific hardware configuration.
"CUDA SVM library not found"
cd SVM/
make clean && make"CUDA error" during execution
- Check GPU memory usage:
nvidia-smi - Reduce batch size or dataset size
- Ensure CUDA drivers are up to date
- The library will automatically fall back to CPU mode if GPU issues persist
Poor performance compared to CPU
- Small datasets may not benefit from GPU acceleration
- Ensure data is preprocessed (standardized)
- Try different kernel parameters
- Check if CPU mode is active (library automatically chooses optimal hardware)
Out of memory errors
- Reduce dataset size or process in batches
- Use data types with lower precision (float32)
- Check available GPU/CPU memory
- Library automatically manages memory across hardware types
Cross-platform compatibility issues
# Verify library installation
from SVM.cuda_svm import CudaSVC
svc = CudaSVC()
print("Library loaded successfully")
# For advanced hardware detection, use the optimized version:
# from HBM_SVM.cuda_svm_optimized import OptimizedCudaSVMColab-specific issues
- Ensure GPU runtime is enabled in Colab
- The library works on Colab but hardware detection features require the optimized version
- Falls back gracefully to CPU if GPU unavailable
from SVM.cuda_svm import CudaSVC, CudaSVR
# Check classification hardware
svc = CudaSVC()
print(f"SVC Hardware: {'GPU (' + svc.device_name + ')' if svc.cuda_available else 'CPU'}")
# Check regression hardware
svr = CudaSVR()
print(f"SVR Hardware: {'GPU (' + svr.device_name + ')' if svr.cuda_available else 'CPU'}")
# Force CPU mode (for testing)
svc_cpu = CudaSVC(force_cpu=True)
print(f"Forced CPU Mode: {svc_cpu.cuda_available}")import time
# Benchmark your SVM
start_time = time.time()
svc.fit(X_train, y_train)
fit_time = time.time() - start_time
start_time = time.time()
predictions = svc.predict(X_test)
predict_time = time.time() - start_time
print(f"Training time: {fit_time:.2f}s")
print(f"Prediction time: {predict_time:.2f}s")See svm_usage.py for comprehensive examples including:
- Performance comparison with scikit-learn
- Hyperparameter tuning with GridSearchCV
- Advanced usage patterns
- Error handling and best practices
For issues, questions, or contributions:
- Check the troubleshooting section above
- Review the main project README
- Submit issues to the project repository
Note: This implementation provides CUDA acceleration when available. For advanced hardware detection and optimization features, use the optimized version in the HBM_SVM directory. The library is designed to work across platforms with automatic compatibility.