A comprehensive platform for fine-tuning, managing, and deploying large language models with vLLM, Open WebUI, and Unsloth.
Features • Architecture • Installation • Usage • Configuration • Troubleshooting
Llora Lab provides a complete environment for experimenting with large language models - from dataset preparation to fine-tuning with LoRA adapters to deployment and testing. It combines user-friendly interfaces with powerful backend capabilities, making advanced LLM workflows accessible to both researchers and developers.
- Simplified Workflows: From raw datasets to deployed models in just a few clicks
- Resource Efficiency: Fine-tune powerful models on affordable hardware
- Complete Solution: Everything you need for the full LLM lifecycle in one integrated platform
- Docker-based: Easy deployment with containers that handle the complexity for you
- Model Management: Import, configure, and organize LLM models from Hugging Face
- LoRA Adapter Training: Create and train efficient adapters on custom datasets
- Dataset Handling: Upload, preview, and manage training datasets in JSONL format
- Serving Interface: Deploy models with an OpenAI-compatible API endpoint
- Testing Environment: Test models directly within the UI or via API
- System Monitoring: Track GPU usage, memory, and container status
- Real-time Logs: Access training and serving logs in real-time
Llora Lab is built as a containerized application with several core components:
- Admin API: FastAPI backend that orchestrates the entire system
- Admin UI: React-based interface for managing all operations
- Trainer: Container for running model fine-tuning jobs
- vLLM Server: High-performance inference server for model deployment
- Open WebUI: Chat interface for interacting with deployed models
Prerequisites:

- Docker and Docker Compose
- NVIDIA GPU with CUDA support
- NVIDIA Container Toolkit installed (for GPU access)
- 16GB+ system RAM (32GB+ recommended)
- 100GB+ disk space for models and datasets
To install:

- Clone the repository:

  ```bash
  git clone https://github.com/yourusername/llora-lab.git
  cd llora-lab
  ```

- Create a `.env` file with your Hugging Face token:

  ```bash
  echo "HF_TOKEN=your_huggingface_token" > .env
  ```

- Build and start the services:

  ```bash
  make build
  make build-ui  # Build the admin UI frontend
  make start
  ```

- Access the admin interface at http://localhost:3001
Alternatively, you can use Docker Compose directly:
```bash
# Build the images
docker compose build admin-api trainer vllm

# Start the admin services
docker compose up -d admin-api admin-ui
```

The typical workflow:

- Configure Models: Add model configurations from Hugging Face
- Upload Datasets: Prepare and upload training data in JSONL format
- Create Adapters: Configure LoRA adapters for your models
- Train Adapters: Start training jobs with your datasets
- Deploy Models: Serve models with or without adapters
- Test Models: Interact with your deployed models through the UI or API
- Navigate to the "Models" tab
- Click "Add Model"
- Enter the Hugging Face model ID (e.g., `meta-llama/Llama-3.1-8B-Instruct`)
- Configure model parameters as needed
- Click "Save Model"
- Navigate to the "Datasets" tab
- Click "Upload Dataset"
- Select a JSONL file with your training data
- Wait for validation and processing
- Preview the dataset to ensure proper formatting
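
A minimal sketch of what an uploaded dataset can look like, assuming a simple single-field JSONL layout; the `text` field and prompt/response formatting are illustrative assumptions, not a fixed schema, and the file name is just an example:

```bash
# Illustrative only: one JSON object per line.
# The exact field names your trainer expects may differ.
cat > datasets/example.jsonl <<'EOF'
{"text": "### Instruction:\nSummarize the ticket.\n\n### Response:\nCustomer reports login failures after the latest update."}
{"text": "### Instruction:\nTranslate to French: Hello, world.\n\n### Response:\nBonjour, le monde."}
EOF
```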
- Navigate to the "Adapters" tab
- Click "Create Adapter"
- Select a base model and dataset
- Configure LoRA parameters (rank, alpha, etc.)
- Click "Start Training"
- Monitor progress in the "Training" tab
- Navigate to the "Serving" tab
- Select a model and optionally an adapter
- Click "Start Serving"
- Wait for initialization to complete
- Access your model via the API or the integrated chat UI
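
For example, once serving reports ready you can confirm the endpoint is reachable by listing the models it exposes (`/v1/models` is a standard route on OpenAI-compatible servers such as vLLM):

```bash
# List the models the running server exposes
curl http://localhost:8000/v1/models
```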
To test a deployed model:

- Use the built-in testing interface in the "Serving" tab
- Access the Open WebUI chat interface at http://localhost:3000
- Connect via the OpenAI-compatible API at http://localhost:8000/v1 (example request below)
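
A minimal chat-completion request against the OpenAI-compatible endpoint; the model name below is a placeholder and should match whatever `/v1/models` reports for your deployment:

```bash
curl http://localhost:8000/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
        "model": "meta-llama/Llama-3.1-8B-Instruct",
        "messages": [{"role": "user", "content": "Say hello in one sentence."}],
        "max_tokens": 64
      }'
```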
Create a `.env` file with these options:

```bash
# Required
HF_TOKEN=your_huggingface_token

# Optional
LOG_LEVEL=info      # Log level (debug, info, warning, error)
CORS_ORIGINS=*      # CORS allowed origins
CUDA_VERSION=124    # CUDA version for the trainer
```

Model configs support these parameters (an illustrative example follows the list):
- name: Unique identifier for the model
- model_id: HuggingFace model ID
- quantization: Quantization method (bitsandbytes, awq, gptq, gguf)
- max_model_len: Maximum sequence length
- gpu_memory_utilization: GPU memory usage (0.0-1.0)
- tensor_parallel_size: Number of GPUs for tensor parallelism
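
A hypothetical model configuration using the parameters above. The file name and JSON layout are assumptions for illustration; in practice these settings are entered through the admin UI:

```bash
# Hypothetical example: field names follow the parameter list above,
# but the on-disk file name and format are assumptions.
cat > configs/llama-3.1-8b-instruct.json <<'EOF'
{
  "name": "llama-3.1-8b-instruct",
  "model_id": "meta-llama/Llama-3.1-8B-Instruct",
  "quantization": "bitsandbytes",
  "max_model_len": 8192,
  "gpu_memory_utilization": 0.9,
  "tensor_parallel_size": 1
}
EOF
```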
Adapter configs support these parameters (an illustrative example follows the list):
- name: Unique identifier for the adapter
- base_model: Reference to a configured model
- dataset: Training dataset filename
- lora_rank: LoRA rank parameter (typically 8-64)
- lora_alpha: LoRA alpha parameter (typically 16-32)
- steps: Number of training steps
- learning_rate: Learning rate for training
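
And a matching hypothetical adapter configuration, again only to illustrate the fields; the values shown (rank 16, alpha 32, 1000 steps) are common starting points rather than recommendations from this project:

```bash
# Hypothetical example: file name, format, and values are illustrative.
cat > configs/support-bot-adapter.json <<'EOF'
{
  "name": "support-bot-adapter",
  "base_model": "llama-3.1-8b-instruct",
  "dataset": "example.jsonl",
  "lora_rank": 16,
  "lora_alpha": 32,
  "steps": 1000,
  "learning_rate": 0.0002
}
EOF
```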
If the GPU is not detected:

- Ensure NVIDIA drivers are installed and up-to-date
- Verify the NVIDIA Container Toolkit is properly installed
- Run `nvidia-smi` to confirm the GPU is accessible
- Check GPU memory availability before starting jobs
For container issues:

- View container logs: `make logs service=admin-api`
- Check container status: `docker compose ps`
- If containers are stuck, try stopping and restarting: `make stop && make start`
If a training job fails:

- Verify the dataset is valid JSONL
- Check adapter parameters are appropriate for your hardware
- Ensure sufficient disk space for model weights
- Review logs in the UI or with `make logs service=trainer`
If serving fails to start:

- Make sure there's enough GPU memory for the selected model
- Verify no other serving containers are running
- Check network connectivity between containers
- Review logs in the UI or with `make logs service=vllm`
Project layout:

```
llora-lab/
├── admin/               # Admin API backend
├── admin-ui/            # React frontend
├── configs/             # Model and adapter configurations
├── docker/              # Dockerfiles for services
├── datasets/            # Training datasets
├── adapters/            # Trained adapters
├── logs/                # Log files
├── huggingface-cache/   # Cached model files
└── scripts/             # Utility scripts
```
The UI uses Vite and React:
```bash
cd admin-ui
npm install
npm run dev    # Development mode
npm run build  # Production build
```

The Admin API uses FastAPI:
```bash
cd admin
pip install -r requirements.txt
uvicorn main:app --reload  # Development mode
```

This project is licensed under the MIT License.
Acknowledgements:

- vLLM for high-performance inference
- Open WebUI for the chat interface
- Hugging Face for model hosting and libraries
- FastAPI and React for the tech stack
