
@CharlesR-W

Summary

Adds an optional --server-port CLI argument that allows connecting to an external vLLM server via its OpenAI-compatible API, instead of loading the model in-process.

This is useful when:

  • Sharing a single model server across multiple delphi runs
  • Managing GPU memory separately from the delphi process
  • Running the model on a different machine

Usage

# Start vLLM server separately
vllm serve meta-llama/Meta-Llama-3.1-70B-Instruct-AWQ-INT4 --port 8000

# Run delphi with server mode
python -m delphi --server-port 8000 ...

Default behavior is unchanged (server_port=None loads the model locally, as before).
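
Under the hood, server mode simply issues requests against the OpenAI-compatible endpoint that vLLM exposes at /v1. A minimal standalone sketch of an equivalent request using the openai client (the model name, prompt, and sampling settings here are placeholders, not delphi's actual prompts):

# Sketch of a request to a vLLM server via its OpenAI-compatible API.
# vLLM ignores the API key, but the openai client requires one to be set.
from openai import OpenAI

client = OpenAI(base_url="http://localhost:8000/v1", api_key="EMPTY")

response = client.chat.completions.create(
    model="meta-llama/Meta-Llama-3.1-70B-Instruct-AWQ-INT4",
    messages=[{"role": "user", "content": "Describe when this latent fires."}],
    max_tokens=256,
    temperature=0.0,
)
print(response.choices[0].message.content)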

Changes

  • delphi/clients/offline.py: Add a server_port parameter and a _generate_server() method for OpenAI-compatible API calls (sketched below)
  • delphi/config.py: Add server_port config field
  • delphi/__main__.py: Pass server_port through to the Offline client
  • pyproject.toml: Add openai>=1.0.0 dependency
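
For reviewers, the local/server dispatch is roughly of the following shape. This is a hypothetical sketch, not the actual diff: apart from server_port and _generate_server(), the attribute and method names are illustrative, and the existing local-loading path is elided.

# Hypothetical sketch of the Offline client's local/server dispatch;
# the real code in delphi/clients/offline.py may differ in detail.
from openai import AsyncOpenAI


class Offline:
    def __init__(self, model: str, server_port: int | None = None, **kwargs):
        self.model = model
        self.server_port = server_port
        if server_port is not None:
            # Server mode: no local weights are loaded; all generation goes
            # through the external vLLM server's OpenAI-compatible API.
            self._client = AsyncOpenAI(
                base_url=f"http://localhost:{server_port}/v1", api_key="EMPTY"
            )
        else:
            # Local mode: load the model in-process as before (elided).
            ...

    async def _generate_server(self, prompt: str, **sampling_kwargs) -> str:
        # One chat-completion request to the external server.
        response = await self._client.chat.completions.create(
            model=self.model,
            messages=[{"role": "user", "content": prompt}],
            **sampling_kwargs,
        )
        return response.choices[0].message.content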

🤖 Generated with Claude Code

CLAassistant commented Jan 15, 2026

CLA assistant check
All committers have signed the CLA.
