Merged
13 changes: 13 additions & 0 deletions CHANGELOG.md
@@ -5,6 +5,19 @@ All notable changes to OpenContracts will be documented in this file.
The format is based on [Keep a Changelog](https://keepachangelog.com/en/1.0.0/),
and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0.html).

## [Unreleased] - 2025-12-28

### Added

#### MCP (Model Context Protocol) Interface Proposal (Issue #387)
- **Comprehensive MCP interface design** (`docs/mcp/mcp_interface_proposal.md`): Read-only access to public OpenContracts resources for AI assistants
- **4 resource types**: corpus, document, annotation, thread - with hierarchical URI patterns
- **7 tools for discovery and retrieval**: `list_public_corpuses`, `list_documents`, `get_document_text`, `list_annotations`, `search_corpus`, `list_threads`, `get_thread_messages`
- **Anonymous user permission model**: Operates as AnonymousUser with automatic filtering to `is_public=True` resources
- **Synchronous Django ORM implementation**: Uses `sync_to_async` wrapper pattern for MCP server integration
- **Performance optimizations**: Uses existing `AnnotationQueryOptimizer`, `prefetch_related` for threaded messages, and proper pagination
- **Robust URI parsing**: Regex-based URI parsing with slug validation to prevent injection attacks
- **Helper function implementations**: Complete `format_*` functions for corpus, document, annotation, thread, and message formatting
## [Unreleased] - 2025-12-27

### Added
25 changes: 24 additions & 1 deletion config/asgi.py
@@ -41,6 +41,7 @@
from config.websocket.consumers.unified_agent_conversation import (  # noqa: E402
    UnifiedAgentConsumer,
)
from opencontractserver.mcp.server import mcp_asgi_app # noqa: E402

logger = logging.getLogger(__name__)

@@ -52,6 +53,28 @@
# This application object is used by any ASGI server configured to use this file.
django_application = get_asgi_application()


def create_http_router(django_app, mcp_app):
    """
    Create an HTTP router that dispatches to MCP or Django based on path.

    Routes /mcp and /mcp/* to the MCP ASGI app, everything else to Django.
    The MCP server uses Streamable HTTP transport in stateless mode.
    """

    async def router(scope, receive, send):
        path = scope.get("path", "")
        # Match /mcp exactly or /mcp/* paths
        if path == "/mcp" or path.startswith("/mcp/"):
            await mcp_app(scope, receive, send)
        else:
            await django_app(scope, receive, send)

    return router


http_application = create_http_router(django_application, mcp_asgi_app)

document_query_pattern = re_path(
    r"ws/document/(?P<document_id>[-a-zA-Z0-9_=]+)/query/(?:corpus/(?P<corpus_id>[-a-zA-Z0-9_=]+)/)?$",
    DocumentQueryConsumer.as_asgi(),
@@ -119,7 +142,7 @@
 # 4. URL routing
 application = ProtocolTypeRouter(
     {
-        "http": django_application,
+        "http": http_application,  # Routes /mcp/* to MCP, rest to Django
         "websocket": websocket_auth_middleware(URLRouter(websocket_urlpatterns)),
     }
 )
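
The dispatch rule in `create_http_router` can be exercised without Django or the MCP SDK. The following is a minimal sketch, assuming two stub ASGI apps stand in for `django_application` and `mcp_asgi_app`; `make_stub` and `dispatch` are hypothetical helpers for illustration and are not part of this PR.

```python
import asyncio


def create_http_router(django_app, mcp_app):
    """Same dispatch rule as the router added to config/asgi.py."""

    async def router(scope, receive, send):
        path = scope.get("path", "")
        if path == "/mcp" or path.startswith("/mcp/"):
            await mcp_app(scope, receive, send)
        else:
            await django_app(scope, receive, send)

    return router


def make_stub(name, calls):
    """Hypothetical stand-in ASGI app that records which app saw which path."""

    async def app(scope, receive, send):
        calls.append((name, scope["path"]))

    return app


def dispatch(paths):
    """Send each path through the router and return (app_name, path) pairs."""
    calls = []
    router = create_http_router(make_stub("django", calls), make_stub("mcp", calls))

    async def run():
        for path in paths:
            # receive/send are unused by the stubs, so None is enough here.
            await router({"type": "http", "path": path}, None, None)

    asyncio.run(run())
    return calls
```

Calling `dispatch(["/mcp", "/mcp/tools", "/mcpfoo", "/graphql/"])` shows that a path such as `/mcpfoo` falls through to Django, because the prefix check requires the trailing slash after `/mcp`.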
15 changes: 14 additions & 1 deletion config/settings/base.py
@@ -720,7 +720,7 @@
     "http://127.0.0.1:5173",
 ]
 
-DEFAULT_IMAGE = """""" # noqa
+DEFAULT_IMAGE = """""" # noqa
 
 # Model paths
 DOCLING_MODELS_PATH = env.str("DOCLING_MODELS_PATH", default="/models/docling")
@@ -983,3 +983,16 @@
)
POSTHOG_HOST = env.str("POSTHOG_HOST", default="https://us.i.posthog.com")
MODE = "LOCAL"

# MCP Server Configuration
# ------------------------------------------------------------------------------
# See docs/mcp/mcp_interface_proposal.md for details
MCP_SERVER = {
    "enabled": env.bool("MCP_SERVER_ENABLED", default=False),
    "max_results_per_page": env.int("MCP_MAX_RESULTS_PER_PAGE", default=100),
    "rate_limit": {
        "requests": env.int("MCP_RATE_LIMIT_REQUESTS", default=100),
        "window": env.int("MCP_RATE_LIMIT_WINDOW", default=60),
    },
    "cache_ttl": env.int("MCP_CACHE_TTL", default=300),
}
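
The diff does not show how these settings are consumed. As a rough sketch only, assuming `rate_limit` means "at most `requests` calls per `window` seconds per client" and `max_results_per_page` is an upper bound on a client-supplied `limit`, the values could be applied as below. `FixedWindowRateLimiter` and `clamp_limit` are hypothetical names, and a fixed-window counter is just one of several possible strategies.

```python
import time


class FixedWindowRateLimiter:
    """Allow at most `requests` calls per `window` seconds for each client key."""

    def __init__(self, requests, window, clock=time.monotonic):
        self.requests = requests
        self.window = window
        self.clock = clock
        self._buckets = {}  # key -> (window_start, count)

    def allow(self, key):
        now = self.clock()
        start, count = self._buckets.get(key, (now, 0))
        if now - start >= self.window:
            start, count = now, 0  # window elapsed: start a fresh one
        if count >= self.requests:
            self._buckets[key] = (start, count)
            return False
        self._buckets[key] = (start, count + 1)
        return True


def clamp_limit(requested, max_results_per_page):
    """Clamp a client-supplied page size to the range [1, max_results_per_page]."""
    return max(1, min(requested, max_results_per_page))


# Plain-dict copy of the defaults above (no django-environ needed for the sketch).
MCP_SERVER = {
    "max_results_per_page": 100,
    "rate_limit": {"requests": 100, "window": 60},
}

limiter = FixedWindowRateLimiter(**MCP_SERVER["rate_limit"])
```

The injectable `clock` argument is there so the window logic can be tested without sleeping.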
174 changes: 174 additions & 0 deletions docs/mcp/README.md
@@ -0,0 +1,174 @@
# OpenContracts MCP Server

## TL;DR

OpenContracts exposes a read-only [Model Context Protocol (MCP)](https://modelcontextprotocol.io/) server for AI assistants to access **public** corpuses, documents, annotations, and discussion threads.

- **Endpoint**: `POST /mcp/` (Streamable HTTP, stateless)
- **Scope**: Public resources only (anonymous user visibility)
- **Auth**: None required (public data only)

### Claude Desktop Quick Start

Add to `~/.config/Claude/claude_desktop_config.json` (on macOS: `~/Library/Application Support/Claude/claude_desktop_config.json`):

```json
{
  "mcpServers": {
    "opencontracts": {
      "command": "npx",
      "args": [
        "mcp-remote",
        "https://your-instance.com/mcp/"
      ]
    }
  }
}
```

---

## Available Tools

| Tool | Description |
|------|-------------|
| `list_public_corpuses` | List all public corpuses (paginated, searchable) |
| `list_documents` | List documents in a corpus |
| `get_document_text` | Get full extracted text from a document |
| `list_annotations` | List annotations on a document (filter by page/label) |
| `search_corpus` | Semantic vector search within a corpus |
| `list_threads` | List discussion threads in a corpus |
| `get_thread_messages` | Get messages in a thread (flat or hierarchical) |
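
The "flat or hierarchical" option of `get_thread_messages` can be pictured as two views of the same data. The sketch below is an assumption about the shape, not the PR's implementation: it supposes each message carries an `id` and an optional `parent_id`, and that the hierarchical form nests children under a `replies` key. All three field names are hypothetical.

```python
def nest_messages(flat_messages):
    """Turn a flat list of messages into a tree using parent_id links."""
    # Copy each message so the input list is left untouched.
    nodes = {m["id"]: {**m, "replies": []} for m in flat_messages}
    roots = []
    for m in flat_messages:
        node = nodes[m["id"]]
        parent = nodes.get(m.get("parent_id"))
        if parent is None:
            roots.append(node)  # top-level message, or parent not in this page
        else:
            parent["replies"].append(node)
    return roots
```

Messages whose parent is missing from the input (for example, because of pagination) are treated as roots rather than dropped.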

## Available Resources

Resources use URI patterns for direct access:

| URI Pattern | Description |
|-------------|-------------|
| `corpus://{corpus_slug}` | Corpus metadata and document list |
| `document://{corpus_slug}/{document_slug}` | Document with extracted text |
| `annotation://{corpus_slug}/{document_slug}/{annotation_id}` | Specific annotation |
| `thread://{corpus_slug}/threads/{thread_id}` | Thread with messages |
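
The changelog describes the URI handling as regex-based with slug validation. A minimal sketch of that idea follows, assuming slugs consist of ASCII letters, digits, hyphens, and underscores and that `annotation_id` and `thread_id` are numeric. The actual patterns in `opencontractserver/mcp/server.py` are not shown in this diff and may differ; `parse_resource_uri` is a hypothetical name.

```python
import re

# Assumption: slug and ID character sets; adjust to match the real models.
SLUG = r"[A-Za-z0-9_-]+"
URI_PATTERNS = {
    "corpus": re.compile(rf"corpus://(?P<corpus_slug>{SLUG})"),
    "document": re.compile(
        rf"document://(?P<corpus_slug>{SLUG})/(?P<document_slug>{SLUG})"
    ),
    "annotation": re.compile(
        rf"annotation://(?P<corpus_slug>{SLUG})/(?P<document_slug>{SLUG})"
        r"/(?P<annotation_id>\d+)"
    ),
    "thread": re.compile(
        rf"thread://(?P<corpus_slug>{SLUG})/threads/(?P<thread_id>\d+)"
    ),
}


def parse_resource_uri(uri):
    """Return (resource_type, params) or raise ValueError for a malformed URI."""
    for resource_type, pattern in URI_PATTERNS.items():
        # fullmatch rejects trailing junk such as extra path segments or newlines.
        match = pattern.fullmatch(uri)
        if match:
            return resource_type, match.groupdict()
    raise ValueError(f"Unsupported or malformed resource URI: {uri!r}")
```

Because every component must match a restricted character class in full, inputs such as `corpus://a/../b` or slugs containing spaces or quotes are rejected before they reach any database query.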

---

## Transport Options

### HTTP (Streamable HTTP)

The primary transport. Stateless mode - each request is independent.

```bash
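# Test with curl (Streamable HTTP clients must accept both JSON and SSE responses)
curl -X POST https://your-instance.com/mcp/ \
  -H "Content-Type: application/json" \
  -H "Accept: application/json, text/event-stream" \
  -d '{"jsonrpc": "2.0", "method": "tools/list", "id": 1}'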
```

### stdio (CLI)

For local development or direct integration:

```bash
cd /path/to/OpenContracts
python -m opencontractserver.mcp.server
```

---

## Example Usage

### List Public Corpuses

```json
{
  "jsonrpc": "2.0",
  "method": "tools/call",
  "params": {
    "name": "list_public_corpuses",
    "arguments": {"limit": 10}
  },
  "id": 1
}
```

### Semantic Search

```json
{
  "jsonrpc": "2.0",
  "method": "tools/call",
  "params": {
    "name": "search_corpus",
    "arguments": {
      "corpus_slug": "my-corpus",
      "query": "indemnification clause",
      "limit": 5
    }
  },
  "id": 2
}
```

### Read Resource

```json
{
  "jsonrpc": "2.0",
  "method": "resources/read",
  "params": {
    "uri": "document://my-corpus/contract-2024"
  },
  "id": 3
}
```
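
The same payloads can be built and posted from Python using only the standard library. This is a sketch under assumptions: the helper names (`jsonrpc_request`, `tool_call`, `read_resource`, `build_http_request`, `send`) are hypothetical, `your-instance.com` is the placeholder host used throughout this README, and the `Accept` header is included because the MCP Streamable HTTP transport expects clients to list both `application/json` and `text/event-stream`.

```python
import json
import urllib.request

MCP_URL = "https://your-instance.com/mcp/"  # placeholder host, replace before use


def jsonrpc_request(method, params, request_id):
    """Build a JSON-RPC 2.0 request envelope."""
    return {"jsonrpc": "2.0", "method": method, "params": params, "id": request_id}


def tool_call(name, arguments, request_id):
    """Payload for invoking one of the tools listed above."""
    return jsonrpc_request("tools/call", {"name": name, "arguments": arguments}, request_id)


def read_resource(uri, request_id):
    """Payload for reading a resource by URI."""
    return jsonrpc_request("resources/read", {"uri": uri}, request_id)


def build_http_request(payload, url=MCP_URL):
    """Wrap a payload in a POST request; nothing is sent until send() is called."""
    return urllib.request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            "Accept": "application/json, text/event-stream",
        },
        method="POST",
    )


def send(payload, url=MCP_URL):
    """POST the payload and return the raw response body as text."""
    with urllib.request.urlopen(build_http_request(payload, url)) as resp:
        return resp.read().decode("utf-8")
```

For example, `send(tool_call("search_corpus", {"corpus_slug": "my-corpus", "query": "indemnification clause", "limit": 5}, 2))` would issue the semantic search request against a reachable instance. The response may arrive as plain JSON or as an SSE-framed body depending on the server, so `send` returns the raw text rather than parsing it.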

---

## Architecture

```
┌─────────────────┐    POST /mcp/      ┌──────────────────────┐
│   MCP Client    │ ◄────────────────► │  StreamableHTTP      │
│  (Claude, etc)  │    JSON-RPC 2.0    │  Session Manager     │
└─────────────────┘                    │  (stateless mode)    │
                                       └──────────┬───────────┘
                                       ┌──────────▼───────────┐
                                       │  MCP Server          │
                                       │  - Tools (7)         │
                                       │  - Resources (4)     │
                                       └──────────┬───────────┘
                                       ┌──────────▼───────────┐
                                       │  Django ORM          │
                                       │  visible_to_user()   │
                                       │  (AnonymousUser)     │
                                       └──────────────────────┘
```

**Key files**:
- `opencontractserver/mcp/server.py` - Server setup, ASGI app, URI parsing
- `opencontractserver/mcp/tools.py` - Tool implementations
- `opencontractserver/mcp/resources.py` - Resource handlers
- `opencontractserver/mcp/formatters.py` - Response formatters
- `config/asgi.py` - HTTP routing (`/mcp/*` → MCP app)

---

## Security Model

- **Read-only**: No mutations, no writes
- **Public only**: Uses `AnonymousUser` for all permission checks
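- **Slug-based**: Corpuses and documents are addressed by URL-safe slugs rather than internal IDs; annotation and thread URIs additionally take an `{annotation_id}` or `{thread_id}` component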
- **No auth required**: Only public resources are accessible

---

## Limitations

- No authentication (future: JWT/API key support for private resources)
- No write operations (by design)
- No streaming of large documents (text returned in full)
- Semantic search requires corpus to have embeddings configured