4 changes: 3 additions & 1 deletion litgpt/deploy/serve.py
@@ -233,7 +233,9 @@ def run_server(
The "auto" setting (default) chooses a GPU if available, and otherwise uses a CPU.
port: The network port number on which the model is configured to be served.
stream: Whether to stream the responses.
openai_spec: Whether to use the OpenAISpec.
openai_spec: Whether to use the OpenAISpec and expose OpenAI-compatible API endpoints. When True, the server provides
a `/v1/chat/completions` endpoint that works with the OpenAI SDK and other OpenAI-compatible clients,
making it easy to integrate with existing applications built on the OpenAI API.
access_token: Optional API token to access models with restrictions.
"""
checkpoint_dir = auto_download_checkpoint(model_name=checkpoint_dir, access_token=access_token)
54 changes: 54 additions & 0 deletions tutorials/deploy.md
@@ -80,6 +80,60 @@ Sure, here is the corrected sentence:
Example input
```

 
## Serve an LLM with OpenAI-compatible API

LitGPT provides OpenAI-compatible endpoints that allow you to use the OpenAI SDK or any OpenAI-compatible client to interact with your models. This is useful for integrating LitGPT into existing applications that use the OpenAI API.

 
### Step 1: Start the server with OpenAI specification

```bash
# 1) Download a pretrained model (alternatively, use your own finetuned model)
litgpt download HuggingFaceTB/SmolLM2-135M-Instruct

# 2) Start the server with OpenAI-compatible endpoints
litgpt serve HuggingFaceTB/SmolLM2-135M-Instruct --openai_spec true
```

> [!TIP]
> The `--openai_spec true` flag enables OpenAI-compatible endpoints at `/v1/chat/completions` instead of the default `/predict` endpoint.

 
### Step 2: Query using OpenAI-compatible endpoints

You can now send requests to the OpenAI-compatible endpoint using curl:

```bash
curl -X POST http://127.0.0.1:8000/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "SmolLM2-135M-Instruct",
    "messages": [{"role": "user", "content": "Hello! How are you?"}]
  }'
```

Or use the OpenAI Python SDK:

```python
from openai import OpenAI

# Configure the client to use your local LitGPT server
client = OpenAI(
    base_url="http://127.0.0.1:8000/v1",
    api_key="not-needed"  # LitGPT doesn't require authentication by default
)

response = client.chat.completions.create(
    model="SmolLM2-135M-Instruct",
    messages=[
        {"role": "user", "content": "Hello! How are you?"}
    ]
)

print(response.choices[0].message.content)
```
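
The OpenAI SDK can also consume streamed responses. The sketch below is a minimal example assuming the endpoint supports streaming (for instance, when the server is started with `--stream true`); adjust the model name to match the checkpoint you are serving:

```python
from openai import OpenAI

client = OpenAI(base_url="http://127.0.0.1:8000/v1", api_key="not-needed")

# Request a streamed completion; chunks arrive as the model generates tokens.
stream = client.chat.completions.create(
    model="SmolLM2-135M-Instruct",
    messages=[{"role": "user", "content": "Write a haiku about local LLMs."}],
    stream=True,
)

for chunk in stream:
    # Each chunk carries the next piece of the assistant message in `delta.content`.
    delta = chunk.choices[0].delta.content
    if delta:
        print(delta, end="", flush=True)
print()
```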

 
## Serve an LLM UI with Chainlit
