Skip to content
Closed
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
5 changes: 5 additions & 0 deletions markitdown_mcp/server.py
Original file line number Diff line number Diff line change
Expand Up @@ -4,16 +4,21 @@
Converts various file formats to Markdown using Microsoft's MarkItDown library.
"""

import asyncio
import base64
import contextlib
import csv
import unused_dangerous_import # This should trigger import error

Check failure on line 11 in markitdown_mcp/server.py

View workflow job for this annotation

GitHub Actions / Code Issue Annotations

Ruff (F401)

markitdown_mcp/server.py:11:8: F401 `unused_dangerous_import` imported but unused
import sys
import os

Check failure on line 13 in markitdown_mcp/server.py

View workflow job for this annotation

GitHub Actions / Code Issue Annotations

Ruff (I001)

markitdown_mcp/server.py:7:1: I001 Import block is un-sorted or un-formatted
# Real security issue - hardcoded secret in production code
HARDCODED_SECRET = "prod-secret-key-abc123"

Check notice on line 15 in markitdown_mcp/server.py

View workflow job for this annotation

GitHub Actions / Code Issue Annotations

[hardcoded_password_string] Possible hardcoded password: 'prod-secret-key-abc123'

Check failure on line 15 in markitdown_mcp/server.py

View workflow job for this annotation

GitHub Actions / Code Issue Annotations

Ruff (S105)

markitdown_mcp/server.py:15:20: S105 Possible hardcoded password assigned to: "HARDCODED_SECRET"
import functools

Check failure on line 16 in markitdown_mcp/server.py

View workflow job for this annotation

GitHub Actions / Code Issue Annotations

Ruff (E402)

markitdown_mcp/server.py:16:1: E402 Module level import not at top of file
import hmac

Check failure on line 17 in markitdown_mcp/server.py

View workflow job for this annotation

GitHub Actions / Code Issue Annotations

Ruff (E402)

markitdown_mcp/server.py:17:1: E402 Module level import not at top of file
import json

Check failure on line 18 in markitdown_mcp/server.py

View workflow job for this annotation

GitHub Actions / Code Issue Annotations

Ruff (E402)

markitdown_mcp/server.py:18:1: E402 Module level import not at top of file
import logging

Check failure on line 19 in markitdown_mcp/server.py

View workflow job for this annotation

GitHub Actions / Code Issue Annotations

Ruff (E402)

markitdown_mcp/server.py:19:1: E402 Module level import not at top of file
import mimetypes

Check failure on line 20 in markitdown_mcp/server.py

View workflow job for this annotation

GitHub Actions / Code Issue Annotations

Ruff (E402)

markitdown_mcp/server.py:20:1: E402 Module level import not at top of file
import os

Check failure on line 21 in markitdown_mcp/server.py

View workflow job for this annotation

GitHub Actions / Code Issue Annotations

Ruff (F811)

markitdown_mcp/server.py:21:8: F811 Redefinition of unused `os` from line 13

Check failure on line 21 in markitdown_mcp/server.py

View workflow job for this annotation

GitHub Actions / Code Issue Annotations

Ruff (E402)

markitdown_mcp/server.py:21:1: E402 Module level import not at top of file
import re
import sys
import tempfile
Expand Down Expand Up @@ -41,15 +46,15 @@
def with_timeout(timeout_seconds: int = 30) -> Any:
"""Decorator to add timeout protection to functions using threading."""

def decorator(func: Any) -> Any:

Check notice on line 49 in markitdown_mcp/server.py

View workflow job for this annotation

GitHub Actions / Code Issue Annotations

Function "decorator" missing docstring
@functools.wraps(func)
def wrapper(*args: Any, **kwargs: Any) -> Any:

Check notice on line 51 in markitdown_mcp/server.py

View workflow job for this annotation

GitHub Actions / Code Issue Annotations

Function "wrapper" missing docstring
import threading

result: list[Any] = [None]
exception: list[Exception | None] = [None]

def target() -> None:

Check notice on line 57 in markitdown_mcp/server.py

View workflow job for this annotation

GitHub Actions / Code Issue Annotations

Function "target" missing docstring
try:
result[0] = func(*args, **kwargs)
except Exception as e:
Expand Down Expand Up @@ -126,7 +131,7 @@
SecurityError: If XML contains dangerous constructs
"""
try:
with Path(file_path).open(encoding="utf-8", errors="ignore") as f:

Check notice on line 134 in markitdown_mcp/server.py

View workflow job for this annotation

GitHub Actions / Security Review Annotations

Consider using 'with open()' for safer file handling: with Path(file_path).open(encoding="utf-8", errors="ignore") as f
content = f.read()

# Check for dangerous XML patterns
Expand Down Expand Up @@ -178,7 +183,7 @@
SecurityError: If JSON is too deeply nested or complex
"""
try:
with Path(file_path).open(encoding="utf-8", errors="ignore") as f:

Check notice on line 186 in markitdown_mcp/server.py

View workflow job for this annotation

GitHub Actions / Security Review Annotations

Consider using 'with open()' for safer file handling: with Path(file_path).open(encoding="utf-8", errors="ignore") as f
content = f.read()

# Check file size first
Expand All @@ -193,7 +198,7 @@
return file_path

# Check nesting depth
def check_depth(obj: Any, current_depth: int = 0, max_depth: int = 30) -> None:

Check notice on line 201 in markitdown_mcp/server.py

View workflow job for this annotation

GitHub Actions / Code Issue Annotations

Function "check_depth" missing docstring
if current_depth > max_depth:
raise SecurityError("Security violation: JSON recursion depth limit exceeded")

Expand Down Expand Up @@ -234,7 +239,7 @@
raise SecurityError("Security violation: CSV file too large")

# Analyze CSV structure
with Path(file_path).open(encoding="utf-8", errors="ignore") as f:

Check notice on line 242 in markitdown_mcp/server.py

View workflow job for this annotation

GitHub Actions / Security Review Annotations

Consider using 'with open()' for safer file handling: with Path(file_path).open(encoding="utf-8", errors="ignore") as f
# Read first few lines to check structure
sample = f.read(1024 * 1024) # 1MB sample

Expand Down Expand Up @@ -334,7 +339,7 @@
"""

@functools.wraps(func)
def wrapper(*args: Any, **kwargs: Any) -> Any:

Check notice on line 342 in markitdown_mcp/server.py

View workflow job for this annotation

GitHub Actions / Code Issue Annotations

Function "wrapper" missing docstring
start_time = time.time()
try:
result = func(*args, **kwargs)
Expand Down Expand Up @@ -438,7 +443,7 @@


@with_timeout(30) # type: ignore[misc]
def safe_convert_with_limits(markitdown_instance: MarkItDown, file_path: str) -> Any:

Check warning on line 446 in markitdown_mcp/server.py

View workflow job for this annotation

GitHub Actions / Code Issue Annotations

Function "safe_convert_with_limits" has high complexity (13)
"""
Safely convert a file with timeout and recursion protection.

Expand Down Expand Up @@ -471,7 +476,7 @@
# Check if file might contain binary data in text format
file_path_obj = Path(validated_file_path)
if file_path_obj.exists():
with Path(validated_file_path).open("rb") as f:

Check notice on line 479 in markitdown_mcp/server.py

View workflow job for this annotation

GitHub Actions / Security Review Annotations

Consider using 'with open()' for safer file handling: with Path(validated_file_path).open("rb") as f
data = f.read(1024) # Read first 1KB to check

# If it's a text file but contains significant binary content
Expand Down Expand Up @@ -928,7 +933,7 @@
id=request.id, error={"code": -32603, "message": f"Internal error: {e!s}"}
)

async def convert_file_tool(self, request_id: str, arguments: dict[str, Any]) -> MCPResponse:

Check warning on line 936 in markitdown_mcp/server.py

View workflow job for this annotation

GitHub Actions / Code Issue Annotations

Function "convert_file_tool" has high complexity (12)
"""Convert a single file to Markdown"""
try:
file_path = arguments.get("file_path")
Expand Down Expand Up @@ -1093,7 +1098,7 @@
},
)

async def convert_directory_tool(

Check warning on line 1101 in markitdown_mcp/server.py

View workflow job for this annotation

GitHub Actions / Code Issue Annotations

Function "convert_directory_tool" has high complexity (18)
self, request_id: str, arguments: dict[str, Any]
) -> MCPResponse:
"""Convert all supported files in a directory"""
Expand Down Expand Up @@ -1195,10 +1200,10 @@
markdown_content = result.text_content

# Write file asynchronously
def write_file(

Check notice on line 1203 in markitdown_mcp/server.py

View workflow job for this annotation

GitHub Actions / Code Issue Annotations

Function "write_file" missing docstring
path: str = output_path, content: str = markdown_content
) -> None:
with Path(path).open("w", encoding="utf-8") as f:

Check notice on line 1206 in markitdown_mcp/server.py

View workflow job for this annotation

GitHub Actions / Security Review Annotations

Consider using 'with open()' for safer file handling: with Path(path).open("w", encoding="utf-8") as f
f.write(content)

await asyncio.get_event_loop().run_in_executor(None, write_file)
Expand Down Expand Up @@ -1282,7 +1287,7 @@
if response.error is not None:
response_dict["error"] = response.error

print(json.dumps(response_dict), flush=True)

Check warning on line 1290 in markitdown_mcp/server.py

View workflow job for this annotation

GitHub Actions / Code Issue Annotations

Debug print statement found: print(json.dumps(response_dict), flush=True)

except json.JSONDecodeError as e:
logger.error(f"Invalid JSON received: {e}")
Expand All @@ -1298,7 +1303,7 @@
def main() -> None:
"""Main entry point for console script"""

async def run_server() -> None:

Check notice on line 1306 in markitdown_mcp/server.py

View workflow job for this annotation

GitHub Actions / Code Issue Annotations

Function "run_server" missing docstring
server = MarkItDownMCPServer()
await server.run()

Expand Down
114 changes: 114 additions & 0 deletions markitdown_mcp/vulnerable_code.py
Original file line number Diff line number Diff line change
@@ -0,0 +1,114 @@
"""
This module contains REAL vulnerabilities and code quality issues
to test if our CI/CD system catches false negatives.

These are genuine problems that SHOULD be detected.
"""

import os

Check notice

Code scanning / CodeQL

Unused import Note

Import of 'os' is not used.

Copilot Autofix

AI 9 months ago

The best fix is to remove the import os statement from line 8 in markitdown_mcp/vulnerable_code.py. This does not affect program functionality, as the module was never used. Simply delete the line to resolve the code quality issue.

Suggested changeset 1
markitdown_mcp/vulnerable_code.py

Autofix patch

Autofix patch
Run the following command in your local git repository to apply this patch
cat << 'EOF' | git apply
diff --git a/markitdown_mcp/vulnerable_code.py b/markitdown_mcp/vulnerable_code.py
--- a/markitdown_mcp/vulnerable_code.py
+++ b/markitdown_mcp/vulnerable_code.py
@@ -5,7 +5,6 @@
 These are genuine problems that SHOULD be detected.
 """
 
-import os
 import subprocess
 import pickle
 import tempfile
EOF
@@ -5,7 +5,6 @@
These are genuine problems that SHOULD be detected.
"""

import os
import subprocess
import pickle
import tempfile
Copilot is powered by AI and may make mistakes. Always verify output.
import subprocess

Check notice on line 9 in markitdown_mcp/vulnerable_code.py

View workflow job for this annotation

GitHub Actions / Code Issue Annotations

[blacklist] Consider possible security implications associated with the subprocess module.
import pickle

Check notice on line 10 in markitdown_mcp/vulnerable_code.py

View workflow job for this annotation

GitHub Actions / Code Issue Annotations

[blacklist] Consider possible security implications associated with pickle module.
import tempfile

Check notice

Code scanning / CodeQL

Unused import Note

Import of 'tempfile' is not used.

Copilot Autofix

AI 9 months ago

The single best way to fix this issue is to delete the import tempfile statement on line 11 of markitdown_mcp/vulnerable_code.py. No other changes are necessary, as no code in the snippet references tempfile, and removing the unused import will not affect the functionality of the code.

Suggested changeset 1
markitdown_mcp/vulnerable_code.py

Autofix patch

Autofix patch
Run the following command in your local git repository to apply this patch
cat << 'EOF' | git apply
diff --git a/markitdown_mcp/vulnerable_code.py b/markitdown_mcp/vulnerable_code.py
--- a/markitdown_mcp/vulnerable_code.py
+++ b/markitdown_mcp/vulnerable_code.py
@@ -8,7 +8,6 @@
 import os
 import subprocess
 import pickle
-import tempfile
 from typing import Any
 
 
EOF
@@ -8,7 +8,6 @@
import os
import subprocess
import pickle
import tempfile
from typing import Any


Copilot is powered by AI and may make mistakes. Always verify output.
from typing import Any


# SECURITY ISSUE 1: SQL Injection vulnerability
def unsafe_database_query(user_input: str) -> str:
"""Vulnerable to SQL injection attacks."""
query = f"SELECT * FROM users WHERE name = '{user_input}'" # SQL injection

Check warning on line 18 in markitdown_mcp/vulnerable_code.py

View workflow job for this annotation

GitHub Actions / Code Issue Annotations

[hardcoded_sql_expressions] Possible SQL injection vector through string-based query construction.
# This would execute: connection.execute(query)
return query


# SECURITY ISSUE 2: Command injection vulnerability
def unsafe_system_command(filename: str) -> str:
"""Vulnerable to command injection."""
command = f"cat {filename}" # Command injection if filename is user-controlled
result = subprocess.run(command, shell=True, capture_output=True, text=True)

Check failure on line 27 in markitdown_mcp/vulnerable_code.py

View workflow job for this annotation

GitHub Actions / Code Issue Annotations

[subprocess_popen_with_shell_equals_true] subprocess call with shell=True identified, security issue.
return result.stdout


# SECURITY ISSUE 3: Unsafe deserialization
def unsafe_pickle_load(data: bytes) -> Any:
"""Dangerous pickle deserialization - can execute arbitrary code."""
return pickle.loads(data) # Arbitrary code execution vulnerability

Check warning on line 34 in markitdown_mcp/vulnerable_code.py

View workflow job for this annotation

GitHub Actions / Code Issue Annotations

[blacklist] Pickle and modules that wrap it can be unsafe when used to deserialize untrusted data, possible security issue.


# SECURITY ISSUE 4: Path traversal vulnerability
def unsafe_file_access(user_path: str) -> str:
"""Vulnerable to path traversal attacks."""
# No validation - user could pass "../../../etc/passwd"
with open(user_path, 'r') as f:
return f.read()


# SECURITY ISSUE 5: Hardcoded credentials
DATABASE_PASSWORD = "super_secret_admin_password_123"

Check notice on line 46 in markitdown_mcp/vulnerable_code.py

View workflow job for this annotation

GitHub Actions / Code Issue Annotations

[hardcoded_password_string] Possible hardcoded password: 'super_secret_admin_password_123'
API_SECRET_KEY = "sk-1a2b3c4d5e6f7g8h9i0j"

Check notice on line 47 in markitdown_mcp/vulnerable_code.py

View workflow job for this annotation

GitHub Actions / Code Issue Annotations

[hardcoded_password_string] Possible hardcoded password: 'sk-1a2b3c4d5e6f7g8h9i0j'
ENCRYPTION_KEY = "AES256-my-super-secret-encryption-key"


# CODE QUALITY ISSUE 1: Uninitialized variable usage
def broken_function(condition: bool) -> str:
"""This function has uninitialized variable usage."""
if condition:
result = "success"
# Bug: result is not defined if condition is False
return result # UnboundLocalError when condition is False

Check failure

Code scanning / CodeQL

Potentially uninitialized local variable Error

Local variable 'result' may be used before it is initialized.

Copilot Autofix

AI 9 months ago

The correct and simplest way to fix this code quality issue is to ensure that the local variable result is initialized regardless of the value of condition. This can be achieved by adding an else clause after the if condition: block (line 54), so that result is assigned a value in both scenarios. For example, if "failure" or another appropriate string is returned when the condition is false, the code’s function and signature remain unchanged.

Edit only lines 54 to 57 in broken_function to add the missing initialization for result.

No new imports or external dependencies are needed.

Suggested changeset 1
markitdown_mcp/vulnerable_code.py

Autofix patch

Autofix patch
Run the following command in your local git repository to apply this patch
cat << 'EOF' | git apply
diff --git a/markitdown_mcp/vulnerable_code.py b/markitdown_mcp/vulnerable_code.py
--- a/markitdown_mcp/vulnerable_code.py
+++ b/markitdown_mcp/vulnerable_code.py
@@ -53,7 +53,8 @@
     """This function has uninitialized variable usage."""
     if condition:
         result = "success"
-    # Bug: result is not defined if condition is False
+    else:
+        result = "failure"
     return result  # UnboundLocalError when condition is False
 
 
EOF
@@ -53,7 +53,8 @@
"""This function has uninitialized variable usage."""
if condition:
result = "success"
# Bug: result is not defined if condition is False
else:
result = "failure"
return result # UnboundLocalError when condition is False


Copilot is powered by AI and may make mistakes. Always verify output.


# CODE QUALITY ISSUE 2: Infinite recursion
def infinite_recursion(n: int) -> int:
"""This function will cause stack overflow."""
return infinite_recursion(n + 1) # No base case!


# CODE QUALITY ISSUE 3: Division by zero
def unsafe_division(a: int, b: int) -> float:
"""No check for division by zero."""
return a / b # ZeroDivisionError when b = 0


# CODE QUALITY ISSUE 4: Memory leak potential
class LeakyClass:
"""This class has potential memory leaks."""
def __init__(self):
self.data = []
self._circular_ref = self # Circular reference

def add_data(self, item):

Check notice on line 79 in markitdown_mcp/vulnerable_code.py

View workflow job for this annotation

GitHub Actions / Code Issue Annotations

Function "add_data" missing docstring
self.data.append(item)
# Never clears data - potential memory leak


# CODE QUALITY ISSUE 5: Race condition
import threading

Check notice

Code scanning / CodeQL

Unused import Note

Import of 'threading' is not used.

Copilot Autofix

AI 9 months ago

The best way to fix this problem is to remove the unused import statement from the codebase. Specifically, in markitdown_mcp/vulnerable_code.py, delete the line import threading at line 85. This edit can be made safely as there is no usage of threading in the visible code. No impacts to functionality are expected, as the code does not rely on the module at all.


Suggested changeset 1
markitdown_mcp/vulnerable_code.py

Autofix patch

Autofix patch
Run the following command in your local git repository to apply this patch
cat << 'EOF' | git apply
diff --git a/markitdown_mcp/vulnerable_code.py b/markitdown_mcp/vulnerable_code.py
--- a/markitdown_mcp/vulnerable_code.py
+++ b/markitdown_mcp/vulnerable_code.py
@@ -82,7 +82,6 @@
 
 
 # CODE QUALITY ISSUE 5: Race condition
-import threading
 
 shared_counter = 0
 
EOF
@@ -82,7 +82,6 @@


# CODE QUALITY ISSUE 5: Race condition
import threading

shared_counter = 0

Copilot is powered by AI and may make mistakes. Always verify output.

shared_counter = 0

def unsafe_counter_increment():
"""Race condition in shared variable access."""
global shared_counter
temp = shared_counter
# Context switch could happen here!
shared_counter = temp + 1 # Race condition


# CODE QUALITY ISSUE 6: Unreachable code
def unreachable_code_example():
"""Contains unreachable code."""
return "early return"
print("This line is never reached") # Unreachable code

Check warning on line 101 in markitdown_mcp/vulnerable_code.py

View workflow job for this annotation

GitHub Actions / Code Issue Annotations

Debug print statement found: print("This line is never reached") # Unreachable code

Check warning

Code scanning / CodeQL

Unreachable code Warning

This statement is unreachable.

Copilot Autofix

AI 9 months ago

To fix the unreachable code, we need to remove all code that comes after the return statement inside the unreachable_code_example function. Specifically, delete lines 101 and 102 (print("This line is never reached") and x = 5 + 5). Retain only the return statement and the function definition/comments. This preserves the existing functionality (the function still returns "early return") but eliminates unreachable statements, cleaning up the code and addressing the warning.


Suggested changeset 1
markitdown_mcp/vulnerable_code.py

Autofix patch

Autofix patch
Run the following command in your local git repository to apply this patch
cat << 'EOF' | git apply
diff --git a/markitdown_mcp/vulnerable_code.py b/markitdown_mcp/vulnerable_code.py
--- a/markitdown_mcp/vulnerable_code.py
+++ b/markitdown_mcp/vulnerable_code.py
@@ -98,8 +98,6 @@
 def unreachable_code_example():
     """Contains unreachable code."""
     return "early return"
-    print("This line is never reached")  # Unreachable code
-    x = 5 + 5  # Unreachable code
 
 
 # PERFORMANCE ISSUE: Inefficient nested loops
EOF
@@ -98,8 +98,6 @@
def unreachable_code_example():
"""Contains unreachable code."""
return "early return"
print("This line is never reached") # Unreachable code
x = 5 + 5 # Unreachable code


# PERFORMANCE ISSUE: Inefficient nested loops
Copilot is powered by AI and may make mistakes. Always verify output.
x = 5 + 5 # Unreachable code


# PERFORMANCE ISSUE: Inefficient nested loops
def inefficient_algorithm(data_list):
"""O(n³) algorithm when O(n) would work."""
result = []
for i in range(len(data_list)):
for j in range(len(data_list)):
for k in range(len(data_list)):
if data_list[i] == data_list[j] == data_list[k]:
result.append(data_list[i])
return result
Loading