Commit b3c7737 (parent 82c70fa): CLAUDE.md, 1 file changed, +237 -0 lines
# CLAUDE.md

This file provides guidance to Claude Code (claude.ai/code) when working with code in this repository.

## Project Overview

Academia MCP is an MCP (Model Context Protocol) server that provides tools for searching, fetching, analyzing, and reporting on scientific papers and datasets. It integrates with multiple academic APIs (ArXiv, ACL Anthology, Semantic Scholar, Hugging Face) and web search providers (Exa, Brave, Tavily), plus optional LLM-powered document analysis tools.

**Key Features:**
- ArXiv and ACL Anthology search/download
- OpenAlex comprehensive search (works, authors, institutions)
- Semantic Scholar citation graphs
- Hugging Face datasets search
- Web search and page crawling
- LaTeX compilation and PDF reading
- LLM-powered document QA and research proposal workflows

**Tech Stack:**
- Python 3.12+ with type hints (strict mypy)
- FastMCP framework for the MCP server
- OpenAI SDK for LLM calls (via OpenRouter)
- Pydantic for data models and settings
- Fire for CLI argument parsing
- Multiple transport options: stdio, SSE, streamable-http

## Development Commands

**IMPORTANT: Always prefer `make` commands when available.** The Makefile provides consistent, tested workflows.

### Setup
```bash
# Create virtual environment and install dependencies
uv venv .venv
make install
```

### Validation (ALWAYS run before committing)
```bash
# Format code with black (line length: 100)
make black

# Run all validation: black, flake8, mypy --strict
make validate
```

This is the most important command - run `make validate` frequently during development.

### Testing
```bash
# Run full test suite (via make)
make test

# Run a single test file
uv run pytest -s ./tests/test_arxiv_search.py

# Run a specific test
uv run pytest -s ./tests/test_arxiv_search.py::test_arxiv_search
```

### Running the Server Locally
```bash
# Run with streamable-http (default, port 5056)
uv run -m academia_mcp --transport streamable-http

# Run with stdio (for Claude Desktop)
uv run -m academia_mcp --transport stdio

# Run with a custom port
uv run -m academia_mcp --transport streamable-http --port 8080
```

### Publishing
```bash
make publish  # Builds and publishes to PyPI
```

## Architecture

### Server Initialization (server.py)

The `create_server()` function in `academia_mcp/server.py` is the heart of the application:

1. **Core Tools** (always available): arxiv_search, arxiv_download, anthology_search, openalex_* (OpenAlex search), s2_* (Semantic Scholar), hf_datasets_search, visit_webpage, get_latex_templates_list, show_image, yt_transcript

2. **Conditional Tool Registration** (based on environment variables):
   - `WORKSPACE_DIR` set → enables compile_latex, download_pdf_paper, read_pdf
   - `OPENROUTER_API_KEY` set → enables LLM tools (document_qa, review_pdf_paper, bitflip tools, describe_image)
   - `EXA_API_KEY`/`BRAVE_API_KEY`/`TAVILY_API_KEY` set → enables the respective web_search tools

3. **Transport Modes**:
   - `stdio`: for local MCP clients (Claude Desktop)
   - `streamable-http`: HTTP with CORS enabled for browser clients
   - `sse`: server-sent events

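The conditional-registration pattern above can be sketched as follows. This is a hedged illustration, not the actual server.py code: the helper name `select_optional_tools` and the stub tool functions are hypothetical stand-ins.

```python
from typing import Any, Callable


# Hypothetical stand-ins for the real tool functions.
def compile_latex() -> str: ...
def document_qa() -> str: ...
def web_search_exa() -> str: ...


def select_optional_tools(env: dict[str, str]) -> list[Callable[..., Any]]:
    # Mirror the pattern: a tool group is included only when its
    # enabling environment variable is present and non-empty.
    tools: list[Callable[..., Any]] = []
    if env.get("WORKSPACE_DIR"):
        tools.append(compile_latex)
    if env.get("OPENROUTER_API_KEY"):
        tools.append(document_qa)
    if env.get("EXA_API_KEY"):
        tools.append(web_search_exa)
    return tools


names = [t.__name__ for t in select_optional_tools({"WORKSPACE_DIR": "/tmp/ws"})]
```

The real server applies the same checks before calling `server.add_tool()` for each group.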
### Tool Structure

All tools live in `academia_mcp/tools/` and follow this pattern:
- Each tool is a standalone async function with type hints
- Tools use Pydantic models for inputs/outputs (enables structured_output mode)
- Most tools are registered with `structured_output=True` for schema validation
- Tools import from shared utilities (`utils.py`, `llm.py`, `settings.py`)

**Key Tool Categories:**
- **Search tools**: arxiv_search.py, anthology_search.py, openalex.py, s2.py, hf_datasets_search.py, web_search.py
- **Fetch/download tools**: arxiv_download.py, visit_webpage.py, review.py
- **Document processing**: latex.py (compile_latex, read_pdf), image_processing.py
- **LLM-powered tools**: document_qa.py, bitflip.py (research proposals), review.py

### Settings Management (settings.py)

Uses `pydantic-settings` to load configuration from a `.env` file or environment variables:
- API keys: OPENROUTER_API_KEY, TAVILY_API_KEY, EXA_API_KEY, BRAVE_API_KEY, OPENAI_API_KEY
- Model names: REVIEW_MODEL_NAME, BITFLIP_MODEL_NAME, DOCUMENT_QA_MODEL_NAME, DESCRIBE_IMAGE_MODEL_NAME
- Workspace: WORKSPACE_DIR (Path), PORT (int)
- All settings are accessible via `from academia_mcp.settings import settings`

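As an illustration of the pattern only: the project itself uses `pydantic-settings`, but the same environment-variable fallback idea can be sketched with the stdlib. The class name `SettingsSketch` is hypothetical; the field names come from the list above.

```python
from dataclasses import dataclass
from pathlib import Path
from typing import Optional


@dataclass
class SettingsSketch:
    # Same idea as pydantic-settings: each field is populated from an
    # environment variable of the matching upper-case name, with defaults.
    openrouter_api_key: Optional[str] = None
    workspace_dir: Optional[Path] = None
    port: int = 5056

    @classmethod
    def from_env(cls, env: dict[str, str]) -> "SettingsSketch":
        return cls(
            openrouter_api_key=env.get("OPENROUTER_API_KEY"),
            workspace_dir=Path(env["WORKSPACE_DIR"]) if "WORKSPACE_DIR" in env else None,
            port=int(env.get("PORT", "5056")),
        )


settings_sketch = SettingsSketch.from_env({"PORT": "8080", "WORKSPACE_DIR": "/tmp/ws"})
```

`pydantic-settings` additionally reads the `.env` file and validates types for you; this sketch only shows the lookup-and-convert shape.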
### LLM Integration (llm.py)

Two main functions for calling LLMs via OpenRouter:
- `llm_acall()`: unstructured text response
- `llm_acall_structured()`: structured response with Pydantic validation (uses OpenAI's `.parse()` with retry logic)

Both use the `ChatMessage` model for message formatting.

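The message shape both helpers expect is the usual chat-completions list. A hypothetical stand-in (the real `ChatMessage` is a Pydantic model, so the actual class differs):

```python
from dataclasses import dataclass, asdict


@dataclass
class ChatMessageSketch:
    # Minimal stand-in: role ("system"/"user"/"assistant") plus content.
    role: str
    content: str


messages = [
    ChatMessageSketch("system", "You answer questions about a paper."),
    ChatMessageSketch("user", "What is the main contribution?"),
]

# The list-of-dicts form is what ultimately goes to the OpenAI SDK.
payload = [asdict(m) for m in messages]
```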
### Utilities (utils.py)

Common helper functions used across tools:
- `get_with_retries()`: HTTP GET with retry logic
- File handling utilities
- Text processing helpers

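The retry idea behind `get_with_retries()` can be sketched like this; the real helper's signature, exception handling, and backoff policy may differ, and the flaky fetch function below is just a stand-in for an HTTP GET.

```python
import time
from typing import Callable, TypeVar

T = TypeVar("T")


def with_retries(fetch: Callable[[], T], max_attempts: int = 3, delay: float = 0.0) -> T:
    # Retry the fetch on transient errors, re-raising on the final attempt.
    for attempt in range(1, max_attempts + 1):
        try:
            return fetch()
        except ConnectionError:
            if attempt == max_attempts:
                raise
            time.sleep(delay)
    raise RuntimeError("unreachable")


calls = 0


def flaky_get() -> str:
    # Stand-in for an HTTP GET: fails once, then succeeds.
    global calls
    calls += 1
    if calls < 2:
        raise ConnectionError("transient failure")
    return "<html>ok</html>"


body = with_retries(flaky_get)
```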
## Adding New Tools

To add a new tool:

1. Create a new file in `academia_mcp/tools/` (e.g., `my_tool.py`)
2. Define Pydantic models for input/output if using structured output
3. Implement an async function with proper type hints
4. Export the function in `academia_mcp/tools/__init__.py`
5. Register the tool in `create_server()` in `academia_mcp/server.py`
6. Add tests in `tests/test_my_tool.py`

Example pattern:
```python
from pydantic import BaseModel, Field


class MyToolInput(BaseModel):
    query: str = Field(description="Search query")


class MyToolOutput(BaseModel):
    result: str = Field(description="Result")


async def my_tool(query: str) -> MyToolOutput:
    # Implementation
    return MyToolOutput(result="...")
```

Then in server.py:
```python
from academia_mcp.tools.my_tool import my_tool

# ...
server.add_tool(my_tool, structured_output=True)
```

## Testing Notes

- Tests use pytest with asyncio support (see `pytest.ini_options` in pyproject.toml)
- `conftest.py` contains shared fixtures
- Tests requiring API keys should check for env vars or use mocking
- Workspace-dependent tests use `tests/workdir/` for temporary files

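For the API-key point above, one common pytest pattern is a reusable skip marker; the marker and test names here are hypothetical, not taken from this repository's test suite.

```python
import os

import pytest

# Skip (rather than fail) tests that need a real key when it is absent.
requires_openrouter = pytest.mark.skipif(
    not os.getenv("OPENROUTER_API_KEY"),
    reason="OPENROUTER_API_KEY is not set",
)


@requires_openrouter
def test_document_qa_smoke() -> None:
    ...
```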
## Code Style

- Line length: 100 characters (black)
- Strict mypy type checking
- Import sorting with isort
- All public APIs should have type hints
- Use Pydantic models for data validation

### Comments and Documentation

**DO NOT write inline comments explaining what code does.** The code should be self-explanatory through:
- Clear variable and function names
- Type hints
- Well-structured code

**ONLY write docstrings for MCP tools** (functions registered with `server.add_tool()`). These docstrings become the tool descriptions in the MCP protocol, so they must clearly explain:
- What the tool does
- What parameters it accepts
- What it returns

Example of an acceptable docstring for an MCP tool:
```python
async def arxiv_search(query: str, limit: int = 10) -> ArxivSearchResponse:
    """
    Search arXiv for papers matching the query.

    Supports field-specific queries (e.g., 'ti:neural networks' for title search).
    Returns paper metadata including title, authors, abstract, and arXiv ID.
    """
    ...
```

**Do not write docstrings** for internal helper functions, utilities, or Pydantic models - type hints and clear naming are sufficient.

## Environment Variables for Testing

When testing locally, create a `.env` file in the project root:
```
OPENROUTER_API_KEY=your_key_here
WORKSPACE_DIR=/path/to/workspace
# Optional: EXA_API_KEY, BRAVE_API_KEY, TAVILY_API_KEY, OPENAI_API_KEY
```

## LaTeX/PDF Requirements

For LaTeX compilation and PDF processing:
- Install TeX Live: `sudo apt install texlive-latex-base texlive-fonts-recommended texlive-latex-extra texlive-science latexmk`
- Ensure `pdflatex` and `latexmk` are on PATH

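A quick, stdlib-only way to check the PATH requirement from Python (the helper name is illustrative):

```python
import shutil


def missing_latex_binaries() -> list[str]:
    # Report which required executables are not currently on PATH.
    required = ["pdflatex", "latexmk"]
    return [name for name in required if shutil.which(name) is None]


missing = missing_latex_binaries()
```

An empty list means the LaTeX tools should be able to run.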
## Docker

Pre-built image available: `phoenix120/academia_mcp`

Build locally:
```bash
docker build -t academia_mcp .
```

Run with a workspace volume:
```bash
docker run --rm -p 5056:5056 \
  -e OPENROUTER_API_KEY=your_key \
  -e WORKSPACE_DIR=/workspace \
  -v "$PWD/workdir:/workspace" \
  academia_mcp
```
