The HPC-AI Python SDK provides a powerful interface for distributed GPU training and fine-tuning on HPC-AI's cloud infrastructure.
We recommend using conda to install the SDK:

```shell
conda create -n hpcai python=3.12 -y
conda activate hpcai
git clone https://github.com/hpcaitech/HPC-AI-SDK
cd HPC-AI-SDK
pip install .
```

Installation from source is currently the only supported method; an official pip package will be released soon.
```python
from hpcai import ServiceClient, TrainingClient

# Initialize the service client
client = ServiceClient(
    base_url="https://www.hpc-ai.com/finetunesdk",
    api_key="your-api-key"
)

# Create a training client for LoRA fine-tuning
training_client = client.create_lora_training_client(
    base_model="Qwen/Qwen2.5-7B",
    rank=8,
    seed=42
)
```

The SDK uses the `hpcai://` protocol for model and checkpoint paths:

```python
model_path = "hpcai://run-123/weights/checkpoint-001"
```

Configure the SDK using these environment variables:

- `HPCAI_API_KEY` - Your API key
- `HPCAI_BASE_URL` - API endpoint (default: `https://www.hpc-ai.com/finetunesdk`)
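As a sketch, these variables can be read with `os.environ`, falling back to the documented default endpoint. Passing them straight to `ServiceClient` as keyword arguments is an assumption based on the quick-start example above, not confirmed SDK behavior:

```python
import os

# Placeholder key so the snippet runs standalone; in practice,
# export HPCAI_API_KEY in your shell instead.
os.environ.setdefault("HPCAI_API_KEY", "your-api-key")

api_key = os.environ["HPCAI_API_KEY"]
# Fall back to the documented default endpoint when unset.
base_url = os.environ.get("HPCAI_BASE_URL", "https://www.hpc-ai.com/finetunesdk")

# Assumption: ServiceClient accepts these keyword arguments, mirroring
# the quick-start example. Uncomment once the SDK is installed:
# client = ServiceClient(base_url=base_url, api_key=api_key)
```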
- Distributed Training: Leverage HPC-AI's GPU cloud for efficient model training
- LoRA Fine-tuning: Memory-efficient fine-tuning with LoRA adapters
- Async Support: Full async/await support for concurrent operations
- Type Safety: Comprehensive type hints for better IDE support
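To illustrate the async/await pattern the SDK supports, here is a minimal sketch using stand-in coroutines. The name `submit_batch` is hypothetical, standing in for an awaitable SDK call; only the `asyncio` fan-out pattern itself is the point:

```python
import asyncio

async def submit_batch(batch_id: int) -> str:
    # Stand-in for an awaitable SDK operation (the real method names
    # are not shown in this README); simulates I/O with a short sleep.
    await asyncio.sleep(0.01)
    return f"batch-{batch_id} done"

async def main() -> list[str]:
    # Fan out several operations concurrently and await them together;
    # gather preserves submission order in its results.
    return await asyncio.gather(*(submit_batch(i) for i in range(3)))

results = asyncio.run(main())
print(results)  # ['batch-0 done', 'batch-1 done', 'batch-2 done']
```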
A usage example for fine-tuning the "Qwen3-8B" model is also provided.
- ServiceClient API Reference - Main entry point for creating clients and querying server capabilities
- TrainingClient API Reference - Training operations including forward/backward passes and optimization
- RestClient API Reference - REST API operations for querying training runs and checkpoints
This SDK provides interoperability with components based on the Tinker project (Apache License 2.0). Tinker is a trademark of its respective owner. This project is not affiliated with or endorsed by Thinking Machines Lab.
Licensed under the Apache License, Version 2.0. See LICENSE file for details.