8 changes: 8 additions & 0 deletions mcp/.gitignore
@@ -0,0 +1,8 @@
.venv/
__pycache__/
*.pyc
*.egg-info/
dist/
build/
.olive-mcp/
.olive-cache/
8 changes: 8 additions & 0 deletions mcp/.mcp.json.example
@@ -0,0 +1,8 @@
{
  "mcpServers": {
    "olive": {
      "command": "uv",
      "args": ["run", "--directory", "/path/to/Olive/mcp", "python", "-m", "olive_mcp"]
    }
  }
}
125 changes: 125 additions & 0 deletions mcp/README.md
@@ -0,0 +1,125 @@
# Olive MCP Server

MCP server for Microsoft Olive model optimization. Provides tools for model optimization, quantization, fine-tuning, and benchmarking through the [Model Context Protocol](https://modelcontextprotocol.io/).

## Features

| Tool | Description |
|------|-------------|
| `optimize` | End-to-end optimization with automatic pass scheduling |
| `quantize` | Model quantization (RTN, GPTQ, AWQ, HQQ, and more) |
| `finetune` | LoRA / QLoRA fine-tuning (including diffusion LoRA for SD 1.5, SDXL, Flux) |
| `capture_onnx_graph` | Capture ONNX graph via PyTorch Exporter or Model Builder |
| `benchmark` | Model evaluation using lm-eval tasks |
| `detect_hardware` | Auto-detect CPU, RAM, GPU, and disk space for smart defaults |
| `manage_outputs` | List or delete previous optimization outputs |
| `get_job_status` | Check progress of a running job with structured phase detection |
| `cancel_job` | Cancel a running background job |

Each tool runs in an **isolated Python environment** (managed by uv) with the appropriate dependencies, so different onnxruntime variants (CPU, CUDA, DirectML, OpenVINO, etc.) never conflict.
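To make the environment isolation concrete, here is a minimal sketch of how an execution provider could be resolved to the matching `olive-ai` extra. The mapping mirrors `PROVIDER_TO_EXTRAS` from this PR's `constants.py`; the `install_spec` helper and the CPU fallback are illustrative assumptions, not the server's actual resolution logic.

```python
# Sketch only: maps an execution provider to the olive-ai extra that pulls
# in the matching onnxruntime variant. Mirrors PROVIDER_TO_EXTRAS in
# constants.py; install_spec() itself is a hypothetical helper.
PROVIDER_TO_EXTRAS = {
    "CPUExecutionProvider": "cpu",
    "CUDAExecutionProvider": "gpu",
    "DmlExecutionProvider": "directml",
    "OpenVINOExecutionProvider": "openvino",
}


def install_spec(provider: str) -> str:
    """Return a pip requirement string selecting the matching variant."""
    extra = PROVIDER_TO_EXTRAS.get(provider, "cpu")  # assumed CPU fallback
    return f"olive-ai[{extra}]"


print(install_spec("CUDAExecutionProvider"))  # olive-ai[gpu]
```

Because each extra is installed into its own uv-managed venv, the CUDA and DirectML onnxruntime wheels never end up in the same environment.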

## Prerequisites

- Python 3.10+
- [uv](https://docs.astral.sh/uv/) (recommended) or pip

## Installation

```bash
git clone https://github.com/microsoft/Olive.git
cd Olive/mcp
uv sync
```

## Configuration

All MCP clients use the same server config — only the config file location differs.

**Server definition:**

```json
{
  "command": "uv",
  "args": ["run", "--directory", "/path/to/Olive/mcp", "python", "-m", "olive_mcp"]
}
```

> Replace `/path/to/Olive/mcp` with your actual project path.

| Client | Config file | Key |
|--------|------------|-----|
| **VS Code (Copilot)** | `.vscode/mcp.json` | `servers.olive` |
| **Claude Desktop** | `%APPDATA%\Claude\claude_desktop_config.json` (Win) / `~/Library/Application Support/Claude/claude_desktop_config.json` (Mac) | `mcpServers.olive` |
| **Claude Code** | `.mcp.json` in project root | `mcpServers.olive` |
| **Cursor** | `.cursor/mcp.json` | `mcpServers.olive` |
| **Windsurf** | `~/.codeium/windsurf/mcp_config.json` | `mcpServers.olive` |

<details>
<summary>VS Code example (.vscode/mcp.json)</summary>

```json
{
  "servers": {
    "olive": {
      "type": "stdio",
      "command": "uv",
      "args": ["run", "--directory", "/path/to/Olive/mcp", "python", "-m", "olive_mcp"]
    }
  }
}
```
</details>

<details>
<summary>Claude Desktop / Claude Code / Cursor / Windsurf example</summary>

```json
{
  "mcpServers": {
    "olive": {
      "command": "uv",
      "args": ["run", "--directory", "/path/to/Olive/mcp", "python", "-m", "olive_mcp"]
    }
  }
}
```
</details>

## Usage with VS Code Copilot

1. Open **Copilot Chat** panel (`Ctrl+Alt+I`) and switch to **Agent** mode
2. Click the **Tools** icon to verify the Olive MCP tools are listed
3. Ask Copilot, for example: *"Optimize microsoft/Phi-3-mini-4k-instruct for CPU with int4"*
4. Copilot will ask for your confirmation before calling each MCP tool

## Example Prompts

```
Optimize microsoft/Phi-3-mini-4k-instruct

Quantize microsoft/Phi-3-mini-4k-instruct

Fine-tune microsoft/Phi-3-mini-4k-instruct on nampdn-ai/tiny-codes

Capture ONNX graph from microsoft/Phi-3-mini-4k-instruct

Benchmark microsoft/Phi-3-mini-4k-instruct

Train a LoRA for runwayml/stable-diffusion-v1-5 with dataset linoyts/Tuxemon

What's the best way to optimize Phi-4-mini for my hardware?

What passes are available for int4 quantization?

Help me write a custom Olive config with OnnxQuantization and GraphSurgeries
```

## Output

All optimization outputs are saved to `~/.olive-mcp/outputs/` with timestamped directories.

Completed jobs include:
- **Pass summary** — which passes ran and how long each took
- **File sizes** — output model size (and input model size when available) for before/after comparison
- **Structured progress** — `get_job_status` returns a `phase` field (e.g. "downloading", "quantizing", "saving") in addition to raw logs
- **Smart error suggestions** — if a job fails, actionable suggestions are attached (e.g. "Out of GPU memory, try int4" or "CPU does not support fp16")
19 changes: 19 additions & 0 deletions mcp/pyproject.toml
@@ -0,0 +1,19 @@
[build-system]
requires = ["hatchling"]
build-backend = "hatchling.build"

[project]
name = "olive-mcp"
version = "0.1.0"
description = "MCP server for Microsoft Olive model optimization"
requires-python = ">=3.10"
dependencies = [
    "mcp[cli]",
    "psutil",
]

[project.scripts]
olive-mcp = "olive_mcp:main"

[tool.hatch.build.targets.wheel]
packages = ["src/olive_mcp"]
10 changes: 10 additions & 0 deletions mcp/src/olive_mcp/__init__.py
@@ -0,0 +1,10 @@
# -------------------------------------------------------------------------
# Copyright (c) Microsoft Corporation. All rights reserved.
# Licensed under the MIT License.
# --------------------------------------------------------------------------
import olive_mcp.tools # noqa: F401 — registers @mcp.tool() and @mcp.prompt() on import
from olive_mcp.server import mcp


def main():
    mcp.run()
7 changes: 7 additions & 0 deletions mcp/src/olive_mcp/__main__.py
@@ -0,0 +1,7 @@
# -------------------------------------------------------------------------
# Copyright (c) Microsoft Corporation. All rights reserved.
# Licensed under the MIT License.
# --------------------------------------------------------------------------
from olive_mcp import main

main()
86 changes: 86 additions & 0 deletions mcp/src/olive_mcp/constants.py
@@ -0,0 +1,86 @@
# -------------------------------------------------------------------------
# Copyright (c) Microsoft Corporation. All rights reserved.
# Licensed under the MIT License.
# --------------------------------------------------------------------------
from enum import StrEnum
from pathlib import Path

# ---------------------------------------------------------------------------
# Paths
# ---------------------------------------------------------------------------

VENV_BASE = Path.home() / ".olive-mcp" / "venvs"
OUTPUT_BASE = Path.home() / ".olive-mcp" / "outputs"
WORKER_PATH = Path(__file__).parent / "worker.py"

# Auto-purge venvs not used within this many days.
_VENV_MAX_AGE_DAYS = 14

# ---------------------------------------------------------------------------
# Command names
# ---------------------------------------------------------------------------


class Command(StrEnum):
    OPTIMIZE = "optimize"
    QUANTIZE = "quantize"
    FINETUNE = "finetune"
    CAPTURE_ONNX_GRAPH = "capture_onnx_graph"
    BENCHMARK = "benchmark"
    DIFFUSION_LORA = "diffusion_lora"
    EXPLORE_PASSES = "explore_passes"
    VALIDATE_CONFIG = "validate_config"
    RUN_CONFIG = "run_config"


# ---------------------------------------------------------------------------
# Constants
# ---------------------------------------------------------------------------


class SupportedProvider(StrEnum):
    CPU = "CPUExecutionProvider"
    CUDA = "CUDAExecutionProvider"
    DML = "DmlExecutionProvider"
    OPENVINO = "OpenVINOExecutionProvider"
    TENSORRT = "TensorrtExecutionProvider"
    ROCM = "ROCMExecutionProvider"
    QNN = "QNNExecutionProvider"
    VITISAI = "VitisAIExecutionProvider"
    WEBGPU = "WebGpuExecutionProvider"
    NV_TENSORRT_RTX = "NvTensorRTRTXExecutionProvider"


class SupportedPrecision(StrEnum):
    FP32 = "fp32"
    FP16 = "fp16"
    BF16 = "bf16"
    INT4 = "int4"
    INT8 = "int8"
    INT16 = "int16"
    INT32 = "int32"
    UINT4 = "uint4"
    UINT8 = "uint8"
    UINT16 = "uint16"
    UINT32 = "uint32"


SUPPORTED_QUANT_ALGORITHMS = ["rtn", "gptq", "awq", "hqq"]

# Maps provider → olive-ai extras key for onnxruntime variant
PROVIDER_TO_EXTRAS = {
    "CPUExecutionProvider": "cpu",
    "CUDAExecutionProvider": "gpu",
    "TensorrtExecutionProvider": "gpu",
    "ROCMExecutionProvider": "gpu",
    "OpenVINOExecutionProvider": "openvino",
    "DmlExecutionProvider": "directml",
    "QNNExecutionProvider": "qnn",
}

# Maps provider → onnxruntime-genai variant (for ModelBuilder pass)
PROVIDER_TO_GENAI = {
    "CPUExecutionProvider": "onnxruntime-genai",
    "CUDAExecutionProvider": "onnxruntime-genai-cuda",
    "DmlExecutionProvider": "onnxruntime-genai-directml",
}