refactor: modularize MCP server into dedicated modules #27

Merged
Hmbown merged 3 commits into main from refactor/mcp-server-modularization
Apr 5, 2026
Conversation

@Hmbown (Owner) commented Apr 5, 2026

Summary

  • Extracted cohesive logic from aleph/mcp/local_server.py (~3500→1030 lines) into 9 focused modules
  • All extractions use thin compatibility wrappers to preserve backward compatibility
  • New in this pass: node_bridge.py (Node REPL lifecycle, callback bridges, context sync) and recipe_runtime.py (recipe execution engine + DSL compilation)

Modules extracted

| Module | Responsibility | Status |
| --- | --- | --- |
| action_tools.py | Action tool registration | new |
| context_tools.py | Context/session MCP tools | new |
| workspace_tools.py | Workspace manifest/refresh | new |
| workspace_contexts.py | Workspace binding helpers | new |
| formatting.py | Response formatting/truncation | extended |
| session.py | Session lifecycle + memory-pack | extended |
| sub_query_orchestration.py | Sub-query/sub-Aleph orchestration | new |
| recipe_runtime.py | Recipe execution + compilation | new |
| node_bridge.py | Node REPL lifecycle + sync | new |

What remains in local_server.py (~1030 lines)

  • Server init/configuration and normalization helpers
  • Streamable HTTP server lifecycle
  • REPL injection helpers (_inject_repl_sub_query, _inject_repl_sub_aleph, _inject_repl_config_helpers, _configure_session)
  • Sub-query / sub-Aleph thin wrapper methods
  • Session lifecycle delegation methods
  • Remote server delegation methods
  • Formatting / tool-registration delegation
  • main() entry point

Verification

  • 553 tests passed (up from 541 — 12 new node_bridge tests)
  • ruff clean — no lint errors
  • Backward compatibility verified for NodeREPLEnvironment, _Evidence, _Session re-exports

Residual risks

  • REPL injection helpers remain in local_server.py (tightly coupled to self closures — extraction would add complexity without meaningful reduction)
  • compile_recipe_code in recipe_runtime.py calls owner._get_or_create_node_repl — acceptable interface coupling

Test plan

  • python -m pytest tests/ -q — 553 passed
  • ruff check aleph/ tests/ — clean
  • Import compatibility verified (NodeREPLEnvironment, _Evidence, _Session)
  • Manual smoke test with MCP client

🤖 Generated with Claude Code


Extract cohesive logic from local_server.py (~3500 lines) into focused
modules, reducing it to ~1030 lines of orchestration + thin wrappers:

- action_tools.py: action tool registration (read_file, write_file, etc.)
- context_tools.py: context/session MCP tool registration
- workspace_tools.py: workspace manifest/refresh tools
- workspace_contexts.py: workspace binding helpers
- formatting.py: response formatting, truncation, JSON limiting
- session.py: session lifecycle + memory-pack serialization
- sub_query_orchestration.py: sub-query/sub-Aleph orchestration
- recipe_runtime.py: recipe execution engine + compilation
- node_bridge.py: Node REPL lifecycle, callback bridges, context sync

All extractions use thin compatibility wrappers in local_server.py to
preserve backward compatibility for external imports and patch points.

Verification: 553 tests passed, ruff clean.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
@gemini-code-assist (Bot) left a comment


Code Review

This pull request introduces significant enhancements to Aleph's workspace management, including new MCP tools for workspace manifests and refreshable contexts, a read-only action policy, and modularization of the MCP server components. The review feedback highlights potential robustness issues, specifically regarding error handling during file system traversal, inefficient file reading for large files, and the need for more defensive error handling when refreshing workspace bindings.

Comment on lines +118 to +128
current = Path(dirpath)
dirnames[:] = sorted(
    d for d in dirnames
    if d not in _SKIP_DIRS and (include_hidden or not d.startswith("."))
)
for filename in sorted(filenames):
    if not include_hidden and filename.startswith("."):
        continue
    path = current / filename
    if path.is_file():
        yield path

Severity: high

The _iter_workspace_files function uses os.walk but does not handle potential PermissionError exceptions when accessing directories, which could cause the entire workspace indexing to fail.
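A minimal sketch of the fix this comment suggests, using the standard `os.walk` `onerror` hook so one unreadable directory does not abort the traversal. The function name, `_SKIP_DIRS` values, and error-collection behavior are illustrative, not the project's actual implementation:

```python
import os
from pathlib import Path

_SKIP_DIRS = {".git", "node_modules", "__pycache__"}  # illustrative values

def iter_workspace_files(root, include_hidden=False):
    errors = []

    def _on_error(exc):
        # os.walk passes the OSError (including PermissionError) here
        # instead of raising, so indexing continues past unreadable dirs.
        errors.append(exc)

    for dirpath, dirnames, filenames in os.walk(root, onerror=_on_error):
        current = Path(dirpath)
        # prune skipped/hidden directories in place so walk never descends
        dirnames[:] = sorted(
            d for d in dirnames
            if d not in _SKIP_DIRS and (include_hidden or not d.startswith("."))
        )
        for filename in sorted(filenames):
            if not include_hidden and filename.startswith("."):
                continue
            path = current / filename
            try:
                if path.is_file():
                    yield path
            except PermissionError:
                continue
```

Callers that want to surface skipped directories could return `errors` alongside the paths instead of discarding it.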

Comment thread aleph/mcp/action_tools.py
if not p.exists() or not p.is_file():
return format_error(f"File not found: {path}", output=output)

data = p.read_bytes()

Severity: medium

Using read_bytes() on a file without explicitly checking for potential errors or handling large file sizes beyond the config check can be risky. Consider using a context manager or a more robust streaming approach if the file size is large.
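One hedged way to address this: check the size before reading and convert I/O failures into a clear error. The function name and `MAX_READ_BYTES` limit below are assumptions for illustration, standing in for the project's config cap:

```python
from pathlib import Path

MAX_READ_BYTES = 1_000_000  # illustrative limit, mirroring a config cap

def read_file_safely(path: str) -> bytes:
    p = Path(path)
    if not p.is_file():
        raise FileNotFoundError(path)
    size = p.stat().st_size
    if size > MAX_READ_BYTES:
        raise ValueError(f"file too large: {size} bytes > {MAX_READ_BYTES}")
    try:
        return p.read_bytes()
    except OSError as exc:  # permission errors, transient I/O failures
        raise RuntimeError(f"read failed for {path}: {exc}") from exc
```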

Comment thread aleph/mcp/actions.py
Comment on lines +208 to +220
with open(path, "r", encoding="utf-8", errors="replace") as handle:
    for idx, line in enumerate(handle, start=1):
        match = rx.search(line)
        if not match:
            continue
        results.append(
            {
                "path": str(path),
                "line": idx,
                "column": match.start() + 1,
                "text": line.rstrip("\n"),
            }
        )

Severity: medium

The manual file reading loop is inefficient for large files. Consider using a more optimized approach like reading in chunks or using a library that handles large file scanning more efficiently.

Comment on lines +116 to +121
try:
    text, fmt, note, refreshed_binding = refresh_workspace_binding(
        session.workspace_binding,
        max_read_bytes=owner.action_config.max_read_bytes,
        timeout_seconds=owner.action_config.max_cmd_seconds,
    )

Severity: medium

The refresh_workspace_binding function is called within a tool without proper error handling for the specific case where the binding might have been deleted or modified in a way that makes it invalid, potentially leading to a crash.
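A sketch of one defensive pattern for this: validate the persisted binding and convert refresh failures into a tool-level error rather than a crash. The wrapper name and `refresh_fn` parameter are hypothetical, not part of the actual module:

```python
def refresh_binding_defensively(binding, refresh_fn):
    """Hypothetical wrapper: validate a persisted binding before
    refreshing, returning (result, error) instead of raising into
    the MCP tool layer."""
    if not isinstance(binding, dict) or "path" not in binding:
        return None, "invalid or missing workspace binding"
    try:
        return refresh_fn(binding), None
    except FileNotFoundError:
        return None, f"bound file no longer exists: {binding['path']}"
    except (OSError, ValueError) as exc:
        return None, f"refresh failed: {exc}"
```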

@Hmbown Hmbown marked this pull request as ready for review April 5, 2026 17:34
Copilot AI review requested due to automatic review settings April 5, 2026 17:34

Copilot AI left a comment


Pull request overview

This PR continues the modularization of the Aleph MCP local server by extracting previously inlined logic into dedicated aleph/mcp/* modules (workspace bindings/tools, node bridge, recipe runtime, sub-query orchestration, REPL injection, etc.), while preserving backward compatibility via thin wrappers and re-exports. It also introduces a new read-only action policy and documents a workspace-first workflow for large codebases.

Changes:

  • Extracted and/or introduced focused MCP modules for workspace refresh/bindings, Node REPL bridging, recipe execution/compilation, REPL helper injection, and sub-query orchestration.
  • Added ALEPH_ACTION_POLICY / --action-policy with read-only enforcement across action tools and bootstrap/configure flows.
  • Added/expanded tests covering new modules, workspace contracts, and compatibility aliases; bumped version/docs/changelog to 0.9.3.

Reviewed changes

Copilot reviewed 31 out of 32 changed files in this pull request and generated 6 comments.

Show a summary per file

| File | Description |
| --- | --- |
| web/index.html | Updates site version badge to 0.9.3. |
| pyproject.toml | Bumps package version to 0.9.3. |
| aleph/__init__.py | Bumps __version__ to 0.9.3. |
| CHANGELOG.md | Documents 0.9.3 features/refactors/tests. |
| README.md | Adds workspace-first workflow guidance + read-only action policy docs. |
| DEVELOPMENT.md | Updates architecture map and dev commands for new modules/action policy. |
| aleph/settings.py | Adds ALEPH_ACTION_POLICY env setting + coercion. |
| aleph/mcp/server_bootstrap.py | Adds --action-policy CLI flag and env wiring into runtime config. |
| aleph/mcp/actions.py | Refactors action config/helpers; adds read-only enforcement in require_actions. |
| aleph/mcp/action_tools.py | Registers action MCP tools (run/read/write/search/tests) via extracted module. |
| aleph/mcp/context_tools.py | Registers context/session tools (load/list/diff/save/load) via extracted module. |
| aleph/mcp/workspace.py | Adds shared _resolve_line_number_base helper used by action tooling. |
| aleph/mcp/workspace_contexts.py | Introduces workspace bindings (file/manifest), status/summary, refresh helpers. |
| aleph/mcp/workspace_tools.py | Adds load_workspace_manifest and refresh_context MCP tools. |
| aleph/mcp/formatting.py | Expands payload/error/execution formatting and truncation/redaction behavior. |
| aleph/mcp/repl_injection.py | Extracts REPL helper injection (sub_query, sub_aleph, config helpers). |
| aleph/mcp/node_bridge.py | Extracts Node REPL lifecycle + callback registration + sync-back to session. |
| aleph/mcp/sub_query_orchestration.py | Extracts sub-query/sub-Aleph orchestration helpers and HTTP transport helpers. |
| aleph/mcp/recipe_runtime.py | Extracts recipe execution engine + code compilation for recipe DSL. |
| aleph/mcp/reasoning_tools.py | Extends status/task handling to include workspace binding metadata + action policy. |
| aleph/mcp/admin_tools.py | Allows runtime configure() to update action policy. |
| tests/test_workspace_tools.py | New tests for workspace manifest/bindings, refresh, and memory-pack persistence. |
| tests/test_repl_injection.py | New tests validating REPL injection functions and wrapper delegation. |
| tests/test_node_bridge.py | New tests for node bridge functions and wrapper delegation. |
| tests/test_mcp_sub_query_orchestration.py | New tests ensuring wrapper/module equivalence + URL/path helpers. |
| tests/test_mcp_server_bootstrap.py | Extends bootstrap tests for action policy env + runtime config defaults. |
| tests/test_mcp_recipe_runtime.py | New tests for recipe runtime execution/compile and wrapper delegation. |
| tests/test_mcp_formatting.py | New tests for payload redaction/truncation. |
| tests/test_mcp_contracts.py | New JSON/object contract tests for workspace/status/list/refresh tools. |
| tests/test_compatibility_aliases.py | Adds alias/guard tests for ActionConfig re-export + read-only enforcement. |


Comment thread aleph/mcp/formatting.py
Comment on lines +74 to +77
rendered = json.dumps(safe_payload, ensure_ascii=False, indent=2)
if output == "json":
return json.dumps(payload, ensure_ascii=False, indent=2)
return "```json\n" + json.dumps(payload, ensure_ascii=False, indent=2) + "\n```"
return _truncate_inline(rendered, max_chars)


Copilot AI Apr 5, 2026


_format_payload(..., output='json') truncates the serialized JSON string (return _truncate_inline(rendered, max_chars)), which can produce invalid / unparsable JSON for MCP clients. Since this formatter is used by action tools and workspace tools for output='json', it should preserve JSON validity (e.g., only truncate individual fields / limit list sizes, or return an object with a truncated flag rather than cutting the JSON text).
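A sketch of the JSON-preserving truncation the reviewer describes: instead of cutting the serialized text, shrink individual string fields until the rendered document fits, and set a `truncated` flag. The function name and shrinking strategy are illustrative, not the project's implementation:

```python
import json

def truncate_json_payload(payload: dict, max_chars: int) -> str:
    """Hypothetical JSON-safe truncation: the output is always valid
    JSON, unlike slicing the serialized string."""
    def shrink(value, budget):
        if isinstance(value, str) and len(value) > budget:
            return value[:budget] + "…[truncated]"
        if isinstance(value, dict):
            return {k: shrink(v, budget) for k, v in value.items()}
        if isinstance(value, list):
            return [shrink(v, budget) for v in value]
        return value

    budget = max_chars
    rendered = json.dumps(payload, ensure_ascii=False, indent=2)
    # halve the per-field budget until the whole document fits
    while len(rendered) > max_chars and budget > 16:
        budget //= 2
        safe = shrink(payload, budget)
        safe["truncated"] = True
        rendered = json.dumps(safe, ensure_ascii=False, indent=2)
    return rendered
```

Clients can then always `json.loads` the result and check the `truncated` flag.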

Comment on lines +116 to +121
try:
    text, fmt, note, refreshed_binding = refresh_workspace_binding(
        session.workspace_binding,
        max_read_bytes=owner.action_config.max_read_bytes,
        timeout_seconds=owner.action_config.max_cmd_seconds,
    )

Copilot AI Apr 5, 2026


refresh_context trusts session.workspace_binding and passes it directly into refresh_workspace_binding, which reads absolute paths/roots from the binding. Because bindings are persisted via memory packs, a crafted/edited memory pack can bypass workspace_mode scoping (e.g., fixed/git) and refresh arbitrary files outside the workspace root. Consider re-validating/re-scoping bindings against owner.action_config.workspace_root + workspace_mode before reading, and rejecting bindings that escape the allowed scope.
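The re-scoping check the reviewer asks for can be sketched with `pathlib` path resolution. The function name is hypothetical; the point is resolving both paths before comparing, so `..` segments and symlinks in a crafted binding cannot escape the root:

```python
from pathlib import Path

def is_within_workspace(candidate: str, workspace_root: str) -> bool:
    """Hypothetical scope check: validate a persisted binding path
    against the configured workspace root before reading it, so a
    crafted memory pack cannot point refresh at arbitrary files."""
    root = Path(workspace_root).resolve()
    target = Path(candidate).resolve()
    try:
        target.relative_to(root)
        return True
    except ValueError:
        return False
```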

Comment on lines +272 to +275
stat = path.stat()
stale = (
    stat.st_size != int(binding.get("size_bytes") or 0)
    or stat.st_mtime_ns != int(binding.get("mtime_ns") or 0)

Copilot AI Apr 5, 2026


binding_status() casts size_bytes / mtime_ns with int(binding.get(...) or 0). If a persisted workspace_binding is missing these keys or contains non-numeric strings (possible when loading older/crafted memory packs), this will raise and break get_status/list_contexts. Make this defensive (e.g., wrap casts in try/except and treat invalid metadata as stale=True with a clear reason).

Suggested change (replace the original lines:)

stat = path.stat()
stale = (
    stat.st_size != int(binding.get("size_bytes") or 0)
    or stat.st_mtime_ns != int(binding.get("mtime_ns") or 0)

(with:)

def _coerce_binding_int(key: str) -> tuple[int | None, str | None]:
    value = binding.get(key)
    if value in (None, ""):
        return 0, None
    try:
        return int(value), None
    except (TypeError, ValueError):
        return None, f"invalid persisted file metadata: {key}"

expected_size, size_error = _coerce_binding_int("size_bytes")
expected_mtime_ns, mtime_error = _coerce_binding_int("mtime_ns")
if size_error or mtime_error:
    return {
        "kind": "file",
        "path": path_text,
        "display_path": binding.get("display_path"),
        "exists": True,
        "refreshable": True,
        "stale": True,
        "reason": size_error or mtime_error,
        "last_refreshed_at": binding.get("refreshed_at"),
    }
stat = path.stat()
stale = (
    stat.st_size != expected_size
    or stat.st_mtime_ns != expected_mtime_ns

Comment on lines +238 to +260
    ok, out, _trunc, _bk = await owner._run_sub_query(
        prompt=step["prompt"],
        context_slice=ctx_slice,
        context_id=resolved_context_id,
        backend=step.get("backend", "auto"),
    )
    return idx, ok, out

tasks = [_run_item(i, it) for i, it in enumerate(items)]
results = await asyncio.gather(*tasks, return_exceptions=True)
outputs: list[str] = [""] * len(items)
for r in results:
    if isinstance(r, BaseException):
        if not continue_on_error:
            raise RuntimeError(f"sub_query failed: {r}")
        outputs[0] = f"[ERROR] {r}"
    else:
        idx, ok, item_output = r
        if not ok and not continue_on_error:
            raise RuntimeError(f"sub_query failed: {item_output}")
        outputs[idx] = (
            item_output if ok else f"[ERROR] {item_output}"
        )

Copilot AI Apr 5, 2026


In map_sub_query parallel mode, exceptions returned by asyncio.gather(..., return_exceptions=True) are always written to outputs[0], which mis-attributes failures and can silently overwrite the first result. To keep the output aligned with inputs, include the index in exception handling (e.g., wrap _run_item to catch and return (idx, False, error)), or record errors in the correct slot.

Suggested change (replace the original lines:)

    ok, out, _trunc, _bk = await owner._run_sub_query(
        prompt=step["prompt"],
        context_slice=ctx_slice,
        context_id=resolved_context_id,
        backend=step.get("backend", "auto"),
    )
    return idx, ok, out

tasks = [_run_item(i, it) for i, it in enumerate(items)]
results = await asyncio.gather(*tasks, return_exceptions=True)
outputs: list[str] = [""] * len(items)
for r in results:
    if isinstance(r, BaseException):
        if not continue_on_error:
            raise RuntimeError(f"sub_query failed: {r}")
        outputs[0] = f"[ERROR] {r}"
    else:
        idx, ok, item_output = r
        if not ok and not continue_on_error:
            raise RuntimeError(f"sub_query failed: {item_output}")
        outputs[idx] = (
            item_output if ok else f"[ERROR] {item_output}"
        )

(with:)

    try:
        ok, out, _trunc, _bk = await owner._run_sub_query(
            prompt=step["prompt"],
            context_slice=ctx_slice,
            context_id=resolved_context_id,
            backend=step.get("backend", "auto"),
        )
        return idx, ok, out
    except Exception as exc:
        return idx, False, str(exc)

tasks = [_run_item(i, it) for i, it in enumerate(items)]
results = await asyncio.gather(*tasks)
outputs: list[str] = [""] * len(items)
for idx, ok, item_output in results:
    if not ok and not continue_on_error:
        raise RuntimeError(f"sub_query failed: {item_output}")
    outputs[idx] = (
        item_output if ok else f"[ERROR] {item_output}"
    )
Comment thread aleph/mcp/session.py
Comment on lines 516 to +523
tasks_payload = obj.get("tasks")
tasks: list[dict[str, Any]] = []
if isinstance(tasks_payload, list):
    for task in tasks_payload:
        if not isinstance(task, dict):
            continue
        if "id" not in task or "title" not in task:
            continue
        tasks.append(dict(task))


Copilot AI Apr 5, 2026


_session_from_payload currently accepts any dict in tasks and stores it as-is. The tasks MCP tool later assumes each task has id, description, and status keys (indexing directly), so loading older or corrupted memory packs can cause KeyError and break task listing/status. Consider normalizing tasks on load (e.g., map the legacy title field to description, default missing fields, and drop invalid items).
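The normalization the comment describes could look like the following sketch. The function name, the legacy `title` mapping, and the `"pending"` default are assumptions for illustration:

```python
def normalize_task(raw):
    """Hypothetical normalizer for tasks loaded from memory packs:
    map a legacy 'title' field to 'description', default missing
    fields, and drop entries that lack an id."""
    if not isinstance(raw, dict) or "id" not in raw:
        return None
    task = dict(raw)
    if "description" not in task and "title" in task:
        task["description"] = task.pop("title")
    task.setdefault("description", "")
    task.setdefault("status", "pending")
    return task
```

Applied at load time (`tasks = [t for t in map(normalize_task, payload) if t]`), this keeps the downstream direct indexing safe.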

Comment thread aleph/mcp/action_tools.py
Comment on lines +458 to +466
# Heuristics for test runner
runner_bin: str = str(runner)
if runner == "auto":
    runner_bin = "pytest"

argv: list[str] = [runner_bin]
if args:
    argv.extend(args)


Copilot AI Apr 5, 2026


run_tests executes pytest directly (argv = ['pytest', ...]). This is less reliable than invoking the module via the current interpreter (python -m pytest), especially in environments where the pytest script isn’t on PATH or where multiple Python installs are present. Consider switching to sys.executable -m pytest for the pytest runner (and only using a bare binary when the user explicitly supplies one).
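A sketch of the suggested invocation, assuming a hypothetical `build_test_argv` helper: route the pytest case through the current interpreter, and only use a bare binary when the caller names one explicitly:

```python
import sys

def build_test_argv(runner, args=None):
    """Hypothetical sketch of the reviewer's suggestion: run pytest via
    sys.executable -m so it works even when the 'pytest' script is not
    on PATH or multiple Python installs are present."""
    if runner in ("auto", "pytest"):
        argv = [sys.executable, "-m", "pytest"]
    else:
        argv = [runner]  # explicit user-supplied runner binary
    if args:
        argv.extend(args)
    return argv
```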

@Hmbown Hmbown merged commit f84f8d3 into main Apr 5, 2026
4 checks passed
@Hmbown Hmbown deleted the refactor/mcp-server-modularization branch April 5, 2026 17:45
@devin-ai-integration (Bot) left a comment


Devin Review found 5 new potential issues.

View 12 additional findings in Devin Review.


Comment on lines +1523 to +1525
if (ch === terminatorChar && stack.length === 0) {
  return index;
}

🔴 Fallback TS stripper incorrectly matches = in => arrow tokens as the type terminator

findTypeTerminator scans for = as the terminator character with a bracket-balancing stack. For function types in variable declarations like const fn: () => void = implementation, after scanning past () the stack is empty, so the = character inside the => arrow token is matched as the assignment terminator. This causes stripVariableDeclarationTypes to strip const fn: () and leave => void = implementation, producing invalid JavaScript that crashes at runtime.

Example trace through the code

Input: const fn: () => number = myFn

  1. stripVariableDeclarationTypes regex matches const fn:, colonIndex points to :
  2. findTypeTerminator(source, typeStart, "=") scans from after :
  3. ( pushes to stack, ) pops — stack is now empty
  4. Space is skipped, then = (from =>) is found with empty stack → returns this index
  5. Result: const fn=> number = myFn — invalid JavaScript
Prompt for agents
The findTypeTerminator function in node_worker.cjs matches single characters against terminatorChar. When terminatorChar is '=' and the type contains an arrow function type like () => void, the '=' in '=>' is incorrectly matched as the terminator because the parentheses have already balanced.

To fix this, before returning a match for '=' at line 1523-1524, check whether the next character is '>' (forming '=>'). If so, skip past both characters instead of returning. Something like:

if (ch === terminatorChar && stack.length === 0) {
  if (terminatorChar === '=' && index + 1 < source.length && source[index + 1] === '>') {
    index += 1; // skip past '=>'
    continue;
  }
  return index;
}

This ensures that arrow tokens in function types like () => void, (a: string) => boolean are not confused with assignment operators.


Comment on lines +1551 to +1598
if (ch === ":") {
  let scan = index + 1;
  let nestedQuote = null;
  let parenDepth = 0;
  let bracketDepth = 0;
  let braceDepth = 0;
  let angleDepth = 0;
  while (scan < paramsSource.length) {
    const current = paramsSource[scan];
    const currentPrev = scan > index + 1 ? paramsSource[scan - 1] : "";
    if (nestedQuote) {
      if (current === nestedQuote && currentPrev !== "\\") {
        nestedQuote = null;
      }
      scan += 1;
      continue;
    }
    if (current === "'" || current === '"' || current === "`") {
      nestedQuote = current;
      scan += 1;
      continue;
    }
    if (current === "(") parenDepth += 1;
    else if (current === ")") {
      if (parenDepth === 0 && bracketDepth === 0 && braceDepth === 0 && angleDepth === 0) break;
      parenDepth -= 1;
    } else if (current === "[") bracketDepth += 1;
    else if (current === "]") bracketDepth -= 1;
    else if (current === "{") braceDepth += 1;
    else if (current === "}") braceDepth -= 1;
    else if (current === "<") angleDepth += 1;
    else if (current === ">") angleDepth -= 1;
    else if (
      (current === "," || current === ")") &&
      parenDepth === 0 &&
      bracketDepth === 0 &&
      braceDepth === 0 &&
      angleDepth === 0
    ) {
      break;
    }
    scan += 1;
  }
  while (scan < paramsSource.length && /\s/.test(paramsSource[scan])) {
    scan += 1;
  }
  index = scan;
  continue;

🟡 Fallback TS stripper strips default parameter values alongside type annotations

stripTypeAnnotationsFromParams strips everything from : to the next , or ) delimiter, but this range includes both the type annotation AND any default value. For (a: string = "hello"), the scanner finds : then scans forward past string = "hello" until it hits ), stripping the entire span. The result is (a) instead of the correct (a = "hello"), silently dropping the default value and changing the function's runtime behavior.

Correct vs actual behavior

Input params: a: string = "hello", b: number

  • Expected output: a = "hello", b
  • Actual output: a, b

The function at aleph/repl/node_worker.cjs:1551-1598 needs to detect where the type annotation ends and the = default-value assignment begins, stopping the type strip at the = rather than continuing to the next ,/).

Prompt for agents
The stripTypeAnnotationsFromParams function in node_worker.cjs at line 1551-1598 strips everything after ':' until the next ',' or ')' at depth 0. This incorrectly includes default parameter values (the '= value' portion).

The fix should modify the inner scanning loop to also break when it encounters '=' at depth 0 (all nesting counters zero), similar to how it breaks on ',' and ')'. This way, for 'a: string = "hello"', the scanner stops at '=' and only strips ': string', preserving '= "hello"' in the output.

However, care must be taken to not confuse '=' inside nested types (like '{ key: value }' or arrow types '() => void') with the assignment operator. The nesting depth checks already handle '{}', '()', '[]', and '<>' — so adding '=' as a break condition at depth 0 should work correctly for the common case. You may also want to check that the '=' is not followed by '>' (to avoid matching '=>' arrow tokens in callback parameter types).


Comment thread aleph/mcp/action_tools.py
Comment on lines +459 to +465
runner_bin: str = str(runner)
if runner == "auto":
    runner_bin = "pytest"

argv: list[str] = [runner_bin]
if args:
    argv.extend(args)

🔴 run_tests tool loses sys.executable invocation and default pytest flags during extraction

During the modularization from actions.py to action_tools.py, the run_tests tool was simplified in a way that changes its runtime behavior. The old code (aleph/mcp/actions.py:559) used [sys.executable, "-m", "pytest", "-vv", "--tb=short", "--maxfail=20"] to ensure pytest runs under the current Python interpreter with sensible defaults. The new code uses bare ["pytest"] with no default flags. This causes two problems: (1) In virtual environments where pytest is not on PATH but is installed in the current Python's site-packages, the bare pytest command will fail with a "command not found" error, while sys.executable -m pytest would succeed. (2) The default flags --tb=short --maxfail=20 -vv that controlled output formatting and early termination are lost, changing the test output and behavior.

Suggested change (replace the original lines:)

runner_bin: str = str(runner)
if runner == "auto":
    runner_bin = "pytest"
argv: list[str] = [runner_bin]
if args:
    argv.extend(args)

(with:)

runner_bin: str = str(runner)
if runner == "auto":
    runner_bin = "pytest"
import sys as _sys
argv: list[str] = [_sys.executable, "-m", runner_bin, "-vv", "--tb=short", "--maxfail=20"]
if args:
    argv.extend(args)


Comment thread aleph/mcp/formatting.py
Comment on lines +70 to +72
safe_payload = cast(dict[str, Any], _sanitize(payload))
if output == "object":
    return payload
    return safe_payload

🔴 _format_payload now truncates and mutates payloads in "object" output mode, breaking the raw-data contract

The old _format_payload (aleph/mcp/formatting.py) returned the raw payload dict untouched for output="object". The new version runs all payloads through _sanitize() first, which truncates any string value longer than 10,000 chars down to ~817 chars (400-char head + suffix + 400-char tail) and redacts any dict key named "ctx". This silently breaks the output="object" contract for action tools (run_command, read_file, run_tests) that pass their payloads through format_payload. For example, read_file with limit=200 lines can produce ~18,000 chars of numbered content that gets truncated to ~817 chars. Notably, rg_search in action_tools.py:210-213 explicitly handles object/json modes before calling format_payload, suggesting the author intended object mode to be raw — but the other action tools don't follow this pattern.

Comparison of old vs new behavior

Old (output="object"):

if output == "object":
    return payload  # raw, unmodified

New (output="object"):

safe_payload = cast(dict[str, Any], _sanitize(payload))  # truncates strings, redacts ctx
if output == "object":
    return safe_payload
Prompt for agents
The _format_payload function in aleph/mcp/formatting.py now sanitizes payloads even for output="object" mode, which is the programmatic raw-data API. The old behavior returned the payload as-is for object mode. The fix should either: (1) skip sanitization for output="object" to preserve the old raw-data contract (matching what rg_search already does by handling object mode before calling format_payload), or (2) have action tools in action_tools.py handle object mode explicitly before delegating to format_payload, similar to how rg_search does it at lines 210-213. Approach (1) is simpler and preserves backward compatibility. The sanitization is still valuable for json/markdown modes which feed into the LLM context window.
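Approach (1) from the finding can be sketched as follows. The `_sanitize` stub, its limits, and the `format_payload` signature here are illustrative stand-ins for the module's real helpers:

```python
import json

def _sanitize(value, limit=400):
    # illustrative sanitizer: truncate long strings, redact "ctx" keys
    if isinstance(value, str) and len(value) > 2 * limit:
        return value[:limit] + "…[truncated]…" + value[-limit:]
    if isinstance(value, dict):
        return {
            k: ("[redacted]" if k == "ctx" else _sanitize(v, limit))
            for k, v in value.items()
        }
    if isinstance(value, list):
        return [_sanitize(v, limit) for v in value]
    return value

def format_payload(payload, output="markdown"):
    """Hypothetical sketch of fix approach (1): skip sanitization for
    output="object" so programmatic callers keep the raw-data contract,
    and sanitize only the modes rendered into the LLM context window."""
    if output == "object":
        return payload
    return json.dumps(_sanitize(payload), ensure_ascii=False, indent=2)
```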


if isinstance(r, BaseException):
    if not continue_on_error:
        raise RuntimeError(f"sub_query failed: {r}")
    outputs[0] = f"[ERROR] {r}"

🔴 Parallel map_sub_query error placed at outputs[0] instead of the failing task's index

In recipe_runtime.py, when a parallel map_sub_query task raises an exception and continue_on_error=True, the error message is unconditionally stored at outputs[0] (outputs[0] = f"[ERROR] {r}"). Since asyncio.gather(return_exceptions=True) returns exceptions at the position of the failing task, the error should be placed at the task's positional index, not index 0. The current code overwrites a potentially valid result at index 0 while leaving the actual failing task's slot as an empty string. This is a pre-existing bug that was carried over during the extraction from local_server.py.

Prompt for agents
In aleph/mcp/recipe_runtime.py execute_recipe(), the parallel map_sub_query error handling loop (around line 249-260) iterates over results from asyncio.gather(return_exceptions=True) without tracking the positional index. When a BaseException is caught, it's stored at outputs[0] instead of the correct index. The fix is to use enumerate: change `for r in results:` to `for task_idx, r in enumerate(results):` and then use `outputs[task_idx] = f"[ERROR] {r}"` instead of `outputs[0] = f"[ERROR] {r}"`. This preserves the correct mapping between tasks and their output slots.


Hmbown added a commit that referenced this pull request Apr 5, 2026

fix: address review findings from PR #27 modularization

- Fix parallel map_sub_query storing errors at outputs[0] instead of correct index
- Restore sys.executable -m pytest invocation and default flags in run_tests
- Fix _format_payload output="object" mode truncating raw payloads
- Fix TS fallback stripper matching = in => arrow tokens as type terminator
- Fix TS param stripper dropping default values alongside type annotations
- Add PermissionError handling in workspace file walker
- Add defensive int cast in binding_status for corrupted metadata

563 tests pass, ruff clean.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>