@IlyaGusev
Owner

Implements five new MCP tools for OpenAlex API:

  • openalex_search_works: Search papers with filters (citations, year, topics)
  • openalex_get_work: Fetch work details by ID or DOI
  • openalex_search_authors: Search authors with metrics (h-index, citations)
  • openalex_get_author: Get author details by ID or ORCID
  • openalex_get_institution: Get institution info by ID or ROR

Also adds CLAUDE.md documentation file for Claude Code guidance.

All tools follow existing patterns with Pydantic models, structured outputs, comprehensive tests (10 tests passing), and strict type checking.

🤖 Generated with Claude Code

Co-Authored-By: Claude <[email protected]>
@github-actions

Title

Add OpenAlex integration with comprehensive search and retrieval tools


PR Type

Enhancement


Description

  • Add five OpenAlex API integration tools for comprehensive academic search

  • Support searching works with filters (citations, year, topics) and retrieving detailed work information

  • Enable author search with metrics (h-index, citations) and detailed author profiles

  • Add institution lookup by OpenAlex ID or ROR identifier

  • Include CLAUDE.md documentation for Claude Code development guidance


Diagram Walkthrough

flowchart LR
  openalex["OpenAlex API"] -- "search/retrieve" --> works["Works Search & Details"]
  openalex -- "search/retrieve" --> authors["Authors Search & Profiles"]
  openalex -- "retrieve" --> institutions["Institution Info"]
  works --> server["MCP Server"]
  authors --> server
  institutions --> server

File Walkthrough

Relevant files
Enhancement
openalex.py
Create OpenAlex API integration module with five search/retrieval
tools

academia_mcp/tools/openalex.py

  • Implement five OpenAlex API tools: openalex_search_works,
    openalex_get_work, openalex_search_authors, openalex_get_author,
    openalex_get_institution
  • Define Pydantic models for structured responses: OpenAlexWorkEntry,
    OpenAlexAuthorEntry, OpenAlexInstitutionInfo, and search response
    models
  • Support filtering by citation count, publication year, and sorting
    options
  • Handle multiple identifier formats (DOI, ORCID, ROR) with automatic
    prefix normalization
+334/-0 
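
As a rough illustration of the filtering and sorting described above, the sketch below builds OpenAlex-style query parameters. The function name and its arguments are hypothetical and not taken from the PR; only the `search`, `filter`, `sort`, and `per-page` parameter names and the `field:>value` inequality syntax are assumed from the public OpenAlex API.

```python
from typing import Optional


def build_works_params(
    query: str,
    min_citations: Optional[int] = None,
    year_from: Optional[int] = None,
    sort_by: str = "relevance",
    limit: int = 20,
) -> dict[str, str]:
    """Build query parameters for a works search (hypothetical helper)."""
    filters = []
    if min_citations is not None:
        # ">" is exclusive, so subtract one to make the bound inclusive.
        filters.append(f"cited_by_count:>{min_citations - 1}")
    if year_from is not None:
        filters.append(f"publication_year:>{year_from - 1}")

    params = {"search": query, "per-page": str(limit)}
    if filters:
        # Multiple filters are comma-separated and combined with AND.
        params["filter"] = ",".join(filters)
    if sort_by != "relevance":
        # Relevance is the default ordering for searches, so only
        # non-default sort keys are sent explicitly.
        params["sort"] = f"{sort_by}:desc"
    return params
```

Keeping parameter construction in a pure function like this makes the filter logic testable without network access.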
server.py
Register OpenAlex tools in MCP server                                       

academia_mcp/server.py

  • Import five OpenAlex tools from academia_mcp.tools.openalex
  • Register all OpenAlex tools with structured_output=True in server
    initialization
+12/-0   
__init__.py
Export OpenAlex tools from tools module                                   

academia_mcp/tools/__init__.py

  • Export five OpenAlex tool functions in module __all__ list
  • Add import statement for OpenAlex tools
+12/-0   
Tests
test_openalex.py
Add comprehensive test suite for OpenAlex integration       

tests/test_openalex.py

  • Add 10 comprehensive tests covering all OpenAlex tools
  • Test search functionality with filters, pagination, and sorting
  • Verify work/author/institution retrieval by various identifier types
  • Validate response structure and data integrity
+86/-0   
Documentation
CLAUDE.md
Add Claude Code development documentation                               

CLAUDE.md

  • Document project overview, architecture, and development workflows
  • Provide guidance on adding new tools, testing, and code style
  • Include setup instructions, validation commands, and environment
    configuration
  • Specify documentation standards (docstrings only for MCP tools)
+237/-0 

@github-actions

PR Reviewer Guide 🔍

Here are some key observations to aid the review process:

⏱️ Estimated effort to review: 3 🔵🔵🔵⚪⚪
🧪 PR contains tests
🔒 No security concerns identified
⚡ Recommended focus areas for review

Hardcoded Email

The MAILTO constant uses a placeholder email '[email protected]'. This should be configurable via environment variable or settings to allow users to provide their own contact email as recommended by OpenAlex API guidelines.
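
One possible shape for this, sketched with the standard library only: read the contact email from an environment variable and omit the `mailto` parameter when none is set. The variable name `OPENALEX_MAILTO` and both helper functions are assumptions for illustration; the project may prefer to route this through its own settings object instead.

```python
import os
from typing import Optional

# Assumed variable name; the PR itself does not define one.
MAILTO_ENV_VAR = "OPENALEX_MAILTO"


def resolve_mailto(default: Optional[str] = None) -> Optional[str]:
    """Return the configured contact email, or `default` if unset or malformed."""
    value = os.environ.get(MAILTO_ENV_VAR, "").strip()
    # Only a minimal sanity check; this is not full email validation.
    if value and "@" in value:
        return value
    return default


def with_mailto(params: dict[str, str]) -> dict[str, str]:
    """Return a copy of `params`, adding `mailto` only when one is configured."""
    mailto = resolve_mailto()
    if mailto is None:
        return dict(params)
    return {**params, "mailto": mailto}
```

Leaving the parameter out entirely when nothing is configured avoids sending a placeholder address with every request.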

Missing Error Handling

Functions such as openalex_search_works and openalex_get_work call get_with_retries and access JSON response fields directly, without try/except blocks. If the API returns an unexpected data structure or an error response, this could raise unhandled exceptions. Consider adding error handling for malformed responses.

response = get_with_retries(WORKS_SEARCH_URL, params=params)
data = response.json()

results = [_extract_work_info(work) for work in data.get("results", [])]
meta = data.get("meta", {})

return OpenAlexSearchResponse(
    total_count=meta.get("count", 0),
    returned_count=len(results),
    offset=offset,
    results=results,
)
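
A minimal sketch of what defensive parsing could look like for the snippet above. It checks the payload shape before any field access and raises a single descriptive error type. The exception class and the `parse_search_payload` helper are hypothetical names, not part of the PR.

```python
from typing import Any


class OpenAlexResponseError(ValueError):
    """Raised when a payload does not have the expected shape (hypothetical)."""


def parse_search_payload(data: Any) -> tuple[int, list[dict[str, Any]]]:
    """Validate a decoded search response and return (total_count, results)."""
    if not isinstance(data, dict):
        raise OpenAlexResponseError(
            f"expected a JSON object, got {type(data).__name__}"
        )
    results = data.get("results", [])
    if not isinstance(results, list):
        raise OpenAlexResponseError("'results' must be a list")
    meta = data.get("meta") or {}
    if not isinstance(meta, dict):
        raise OpenAlexResponseError("'meta' must be an object")
    count = meta.get("count", 0)
    if not isinstance(count, int):
        raise OpenAlexResponseError("'meta.count' must be an integer")
    # Skip entries that are not objects instead of failing the whole page.
    entries = [item for item in results if isinstance(item, dict)]
    return count, entries
```

The caller can then pass the returned entries to the per-item extraction step and catch one exception type at the tool boundary.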

Input Validation

The assertion-based validation (e.g., lines 161-169) will raise AssertionError with potentially unclear messages. Consider using proper validation with descriptive error messages or Pydantic validators for better user experience and debugging.

assert isinstance(query, str), "query must be a string"
assert query.strip(), "query cannot be empty"
assert isinstance(offset, int) and offset >= 0, "offset must be non-negative integer"
assert isinstance(limit, int) and 0 < limit <= 200, "limit must be between 1 and 200"
assert sort_by in [
    "relevance",
    "cited_by_count",
    "publication_date",
], "Invalid sort_by option"
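
For comparison, here is a sketch of the same checks written with explicit exceptions. One practical difference is that `assert` statements are skipped when Python runs with the `-O` flag, while raised exceptions are not. The function name `validate_search_args` is hypothetical; the limits and sort options mirror the assertions above.

```python
ALLOWED_SORT_OPTIONS = ("relevance", "cited_by_count", "publication_date")


def validate_search_args(query: object, offset: object, limit: object, sort_by: object) -> None:
    """Raise ValueError with a descriptive message for any invalid argument."""
    if not isinstance(query, str) or not query.strip():
        raise ValueError("query must be a non-empty string")
    # bool is a subclass of int, so it is rejected explicitly.
    if isinstance(offset, bool) or not isinstance(offset, int) or offset < 0:
        raise ValueError(f"offset must be a non-negative integer, got {offset!r}")
    if isinstance(limit, bool) or not isinstance(limit, int) or not 0 < limit <= 200:
        raise ValueError(f"limit must be an integer between 1 and 200, got {limit!r}")
    if sort_by not in ALLOWED_SORT_OPTIONS:
        allowed = ", ".join(ALLOWED_SORT_OPTIONS)
        raise ValueError(f"sort_by must be one of: {allowed}; got {sort_by!r}")
```

Including the offending value in each message tells the caller what was actually passed, which the original assertion messages do not.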

@github-actions

PR Code Suggestions ✨

Explore these optional code suggestions:

Category    Suggestion    Impact
General
Make OpenAlex email configurable

The hardcoded email address [email protected] should be configurable via
environment variables or settings. OpenAlex requires a valid email for polite API
usage, and using a placeholder example.com address may violate their terms of
service or result in rate limiting.

academia_mcp/tools/openalex.py [14]

-MAILTO = "[email protected]"
+from academia_mcp.settings import settings
 
+MAILTO = getattr(settings, 'OPENALEX_MAILTO', '[email protected]')
+
Suggestion importance[1-10]: 7

__

Why: Using a configurable email is important for OpenAlex API compliance and avoiding rate limiting. However, the suggestion to use getattr(settings, 'OPENALEX_MAILTO', ...) is not the standard pattern for this codebase - it should use pydantic-settings properly. The issue is valid but implementation needs adjustment.

Medium
Improve ORCID identifier validation

The ORCID validation only checks the prefix "0000-" but valid ORCIDs can start with
other prefixes. This could fail to properly format valid ORCID identifiers that
don't start with "0000-", potentially causing API lookup failures.

academia_mcp/tools/openalex.py [290-293]

-if author_id.startswith("0000-"):
+import re
+if re.match(r'^\d{4}-\d{4}-\d{4}-\d{3}[0-9X]$', author_id):
     author_id = f"orcid:{author_id}"
 elif author_id.startswith("https://orcid.org/"):
     author_id = author_id.replace("https://orcid.org/", "orcid:")
Suggestion importance[1-10]: 6

__

Why: The suggestion correctly identifies that ORCID validation is incomplete: valid ORCIDs don't always start with "0000-". The regex pattern improves validation accuracy, though it requires importing the standard-library re module and adds complexity for a relatively minor edge case.

Low
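
Going one step further than the regex in the suggestion, the sketch below also verifies the ORCID check character, which follows ISO 7064 MOD 11-2. The function names are hypothetical, and returning None for non-ORCID input is an assumption about how a caller might fall back to other identifier types.

```python
import re
from typing import Optional

ORCID_PATTERN = re.compile(r"\d{4}-\d{4}-\d{4}-\d{3}[0-9X]")
ORCID_URL_PREFIX = "https://orcid.org/"


def orcid_check_char(base_digits: str) -> str:
    """Compute the ISO 7064 MOD 11-2 check character for the first 15 digits."""
    total = 0
    for ch in base_digits:
        total = (total + int(ch)) * 2
    result = (12 - total % 11) % 11
    return "X" if result == 10 else str(result)


def normalize_orcid(author_id: str) -> Optional[str]:
    """Return 'orcid:<id>' for a well-formed ORCID iD, otherwise None."""
    candidate = author_id.strip()
    if candidate.startswith(ORCID_URL_PREFIX):
        candidate = candidate[len(ORCID_URL_PREFIX):]
    if not ORCID_PATTERN.fullmatch(candidate):
        return None
    digits = candidate.replace("-", "")
    # The last character is the checksum over the preceding 15 digits.
    if orcid_check_char(digits[:15]) != digits[15]:
        return None
    return f"orcid:{candidate}"
```

The checksum step rejects most single-character typos before a request is sent, at the cost of a few extra lines.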
Maintain consistent author list format

When truncating author lists, the function returns a mixed list containing both
author names (strings) and a summary string. This inconsistent data type could cause
issues for consumers expecting a uniform list of author names.

academia_mcp/tools/openalex.py [77-79]

 if len(names) > 10:
-    return names[:10] + [f"and {len(names) - 10} more authors"]
+    return names[:10] + ["et al."]
 return names
Suggestion importance[1-10]: 4

__

Why: While the suggestion correctly identifies a type inconsistency, the proposed fix using "et al." loses information about the number of additional authors. The current approach provides more context, and the type inconsistency is minor since consumers typically display these as strings anyway.

Low
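
A third option, sketched here as an assumption rather than something proposed in the PR, is to keep the name list uniform and carry the omitted count in a separate field. This keeps the count that "et al." would lose while leaving the list as plain author names. The `AuthorList` type and `truncate_authors` function are hypothetical.

```python
from dataclasses import dataclass

MAX_AUTHORS = 10  # Same cutoff as in the snippet above.


@dataclass(frozen=True)
class AuthorList:
    names: list[str]    # Only real author names, never a summary string.
    omitted_count: int  # Number of authors dropped by truncation.

    def display(self) -> str:
        """Render the list for humans, appending a summary if truncated."""
        text = ", ".join(self.names)
        if self.omitted_count:
            text += f" and {self.omitted_count} more authors"
        return text


def truncate_authors(names: list[str], max_authors: int = MAX_AUTHORS) -> AuthorList:
    """Keep at most `max_authors` names and record how many were dropped."""
    shown = names[:max_authors]
    return AuthorList(names=shown, omitted_count=len(names) - len(shown))
```

The trade-off is a slightly larger response schema in exchange for a list whose elements all mean the same thing.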
