Skip to content

Conversation

@nndn
Copy link
Contributor

@nndn nndn commented Nov 11, 2025

Summary by CodeRabbit

  • New Features

    • Read-only bash command tool for safe, sandboxed repository inspection
    • Local repo manager and provider wrapper for serving cached repository copies
    • Optional Phoenix tracing integration for LLM monitoring (configurable)
  • Infrastructure

    • gVisor-based command isolation with runner, installer, Docker integration, and Dockerfile support
    • CI Docker build simplifications (streamlined build/push steps)
  • Documentation

    • New gVisor setup, quickstart, usage, and Docker Desktop guides; Phoenix tracing notes
  • Tests / Scripts

    • New gVisor install/verify scripts and test utilities
  • Security / Access

    • Integration endpoints now enforce authenticated ownership checks
  • Chores

    • Added .repos/ to .gitignore; updated environment template and README section on tracing

dhirenmathur and others added 30 commits October 21, 2025 21:36
- Fixed GitHub/GitBucket URL routing for multi-provider setup
- GH_TOKEN_LIST tokens now always use github.com API
- CODE_PROVIDER_BASE_URL only applies to CODE_PROVIDER_TOKEN
- Implemented targeted 401 fallback for invalid credentials
- Fixed rate limiting: 60 req/hr → 5,000 req/hr (authenticated)
- Fixed OAuth token KeyError with safe dict access
- Removed diagnostic endpoints (were temporary debugging tools)
- Cleaned up DEBUG logging that exposed token details
- Added comprehensive GitHub authentication documentation

Core Authentication Flow:
1. Try GitHub App (if configured)
2. Try PAT pool from GH_TOKEN_LIST (if configured)
3. Try single PAT from CODE_PROVIDER_TOKEN (if configured)
4. Fall back to unauthenticated for public repos

Rate Limiting:
- Authenticated (PAT): 5,000 requests/hour
- Unauthenticated: 60 requests/hour per IP

Fallback Behavior:
- Only 401 errors trigger unauthenticated fallback
- 404 errors propagate (no redundant retry)
- Prevents double API calls while handling invalid tokens

Files Modified:
- app/modules/code_provider/provider_factory.py (GitHub URL routing)
- app/modules/code_provider/code_provider_service.py (401 fallback)
- app/modules/code_provider/github/github_provider.py (auth logging)
- app/modules/code_provider/github/github_service.py (OAuth safety)
- app/modules/users/user_service.py (provider_info handling)
- app/main.py (removed diagnostic router)
- README.md (added auth documentation)
- .env.template (documented auth variables)

Files Deleted:
- app/diagnostic_router.py (temporary debugging endpoints)

Related Handoffs:
- thoughts/shared/handoffs/general/2025-11-05_12-06-51_github-auth-rate-limit-fix.md
- thoughts/shared/handoffs/general/2025-11-04_14-58-10_github-rate-limit-fix.md
Copy link
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 3

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
app/modules/parsing/graph_construction/parsing_helper.py (1)

1-21: Import Optional and Path so the module loads.
Adding the repo-manager logic introduces annotations that reference Optional and Path, but neither is imported. Python evaluates annotations at import time (PEP 563 isn’t enabled), so this module now dies with NameError: name 'Optional' is not defined. Please import both before we ship.

-import uuid
-from typing import Any, Tuple
+import uuid
+from pathlib import Path
+from typing import Any, Optional, Tuple
♻️ Duplicate comments (8)
scripts/install_gvisor_in_docker_vm.sh (1)

29-53: Stop masking the Docker VM install failure.
We’re still swallowing the docker run failure (the || true path flagged in the earlier review), so the script exits 0 even when nothing gets installed. This regresses automation. Propagate the error and exit immediately when the copy step fails.

-INSTALL_OUTPUT=$(docker run --rm --privileged \
-    -v /:/host \
-    -v "$(pwd)/runsc:/runsc:ro" \
-    alpine sh -c "
-        # Copy runsc to /usr/local/bin in the host (Docker Desktop VM)
-        cp /runsc /host/usr/local/bin/runsc 2>&1
-        chmod +x /host/usr/local/bin/runsc 2>&1
-        echo 'runsc installed to /usr/local/bin/runsc in Docker Desktop VM'
-    " 2>&1 | grep -v "WARNING" | grep -v "SECURITY" || true)
-
-if echo "$INSTALL_OUTPUT" | grep -q "runsc installed"; then
-    echo "runsc installed to /usr/local/bin/runsc in Docker Desktop VM"
-    SUCCESS=1
-else
-    # Installation failed - this is okay, might need manual setup
-    SUCCESS=0
-fi
-
-if [ "$SUCCESS" = "0" ]; then
-    # Fallback method (if needed in future)
-    if [ "$VERBOSE" = "1" ]; then
-        echo "⚠️  Installation method 1 failed (may need different approach)"
-        echo "Alternative installation script would be created if needed."
-    fi
-fi
+INSTALL_LOG=$(docker run --rm --privileged \
+    -v /:/host \
+    -v "$(pwd)/runsc:/runsc:ro" \
+    alpine sh -c "
+        cp /runsc /host/usr/local/bin/runsc 2>&1
+        chmod +x /host/usr/local/bin/runsc 2>&1
+        echo 'runsc installed to /usr/local/bin/runsc in Docker Desktop VM'
+    " 2>&1)
+
+INSTALL_OUTPUT=$(printf "%s\n" "$INSTALL_LOG" | grep -v "WARNING" | grep -v "SECURITY")
+
+if echo "$INSTALL_OUTPUT" | grep -q "runsc installed"; then
+    echo "runsc installed to /usr/local/bin/runsc in Docker Desktop VM"
+else
+    if [ "$VERBOSE" = "1" ]; then
+        echo "⚠️  Installation method 1 failed (may need different approach)"
+        echo "Alternative installation script would be created if needed."
+        echo "$INSTALL_LOG"
+    else
+        echo "❌ Failed to install runsc into Docker Desktop VM."
+    fi
+    exit 1
+fi
app/modules/code_provider/repo_manager_wrapper.py (2)

567-571: Stop filtering every dot-prefixed path

Dropping everything that starts with “.” strips real files like .github/, .gitignore, and .env from the structure, so local trees diverge from what the provider returns. Please narrow the ignore rule to the Git control directories only.

Apply this diff:

-                # Skip hidden files and .git directory
-                if item.startswith("."):
-                    continue
+                # Skip only Git control directories
+                if item in {".", "..", ".git", ".gitmodules"}:
+                    continue

572-574: Use POSIX separators for returned paths

os.path.join emits backslashes on Windows, so the path field from the repo manager won’t line up with the POSIX-style paths produced by the remote provider. Please switch to posixpath.join (and import posixpath) to keep separators consistent.

Apply this diff:

-import os
+import os
+import posixpath
@@
-                item_path = os.path.join(full_path, item)
-                rel_path = os.path.join(path, item) if path else item
+                item_path = os.path.join(full_path, item)
+                rel_path = posixpath.join(path, item) if path else item
docs/gvisor_setup.md (1)

11-13: Correct Mac/Windows guidance

We now support gVisor on macOS/Windows via Docker Desktop’s runsc runtime, yet this table still says those platforms “automatically fall back” because gVisor is “not supported.” Please align the guidance with the new Docker-based workflow instead of telling users it’s unavailable.

test_gvisor.py (1)

47-49: Relax the non-Linux availability assertion

The runner deliberately reports True on macOS/Windows once Docker Desktop exposes the runsc runtime, so this unconditional assertion will fail on the supported configuration we just introduced.

Apply this diff:

-    if platform.system().lower() != "linux":
-        print(f"✓ Expected: gVisor not available on {platform.system()}")
-        assert not available, "gVisor should not be available on non-Linux platforms"
+    if platform.system().lower() != "linux":
+        if available:
+            print(f"✓ gVisor available via Docker Desktop on {platform.system()}")
+        else:
+            print(f"✓ Expected fallback: gVisor not available on {platform.system()} (Docker Desktop not configured)")

Consider updating the summary text in main() as well so it no longer states gVisor is unavailable on Mac/Windows.

app/modules/intelligence/tools/code_query_tools/bash_command_tool.py (1)

384-402: Pipes and shell operators don't work with the current command execution.

The command is split into arguments on line 388, causing shell operators like | (pipe) to be passed as literal arguments rather than being interpreted by the shell. Line 402 passes command_parts directly, which breaks commands like grep pattern | head -20 that the documentation explicitly advertises as supported.

Apply this diff to fix the issue by always using shell interpretation:

-            # Parse command into list for gVisor runner
-            # Split the command properly, handling quoted strings
-            try:
-                # Use shlex to properly parse the command while preserving quotes
-                command_parts = shlex.split(command)
-            except ValueError as e:
-                # If parsing fails, try a simpler approach
-                logger.warning(
-                    f"[BASH_COMMAND] Failed to parse command with shlex: {e}, using simple split"
-                )
-                command_parts = command.split()
-
-            # If we need to run in a subdirectory, prepend cd command
             if relative_working_dir and relative_working_dir != ".":
-                # Prepend cd command to change to the subdirectory
-                cd_command = f"cd {shlex.quote(relative_working_dir)} && {' '.join(shlex.quote(arg) for arg in command_parts)}"
-                final_command = ["sh", "-c", cd_command]
+                final_command = [
+                    "sh",
+                    "-c",
+                    f"cd {shlex.quote(relative_working_dir)} && {command}",
+                ]
             else:
-                final_command = command_parts
+                final_command = ["sh", "-c", command]
app/modules/utils/gvisor_runner.py (2)

67-70: Cache the Docker availability probe to avoid duplicate expensive check.

The _check_docker_available() function runs an expensive Docker test container (lines 358-363). Calling it twice on lines 67 and 69 doubles the work for every sandboxed command on macOS/Windows.

Apply this diff to cache the result:

-        if _check_docker_available():
-            # Docker is available, check if runsc runtime is configured
-            return _check_docker_available()  # This already checks for runsc runtime
-        return False
+        docker_ready = _check_docker_available()
+        if docker_ready:
+            # Docker is available, and the probe already confirmed runsc works
+            return True
+        return False

228-240: Allow Docker-backed gVisor when the host lacks runsc binary.

On Docker Desktop (macOS/Windows), is_gvisor_available() returns True even without a local runsc binary because it detects Docker with the runsc runtime (lines 67-69). However, the early return on lines 229-238 prevents using Docker-backed gVisor, forcing a fallback to regular subprocess. This defeats the isolation that was confirmed to be available.

Apply this diff to continue with Docker-backed gVisor when the binary is missing:

     runsc_path = get_runsc_binary()
-    if not runsc_path:
-        logger.warning(
-            "[GVISOR] gVisor runsc binary not found, falling back to regular subprocess (less secure)"
-        )
-        return _run_command_regular(
-            command=command,
-            working_dir=working_dir,
-            env=env,
-            timeout=timeout,
-        )
-
-    logger.info(f"[GVISOR] gVisor available, using runsc at {runsc_path}")
+    if runsc_path:
+        logger.info(f"[GVISOR] gVisor available, using runsc at {runsc_path}")
+    else:
+        logger.info(
+            "[GVISOR] runsc binary not found locally; will rely on Docker runtime when available"
+        )
🧹 Nitpick comments (3)
app/modules/intelligence/tools/code_query_tools/bash_command_tool.py (2)

438-445: Prefer logger.exception for automatic traceback inclusion.

Line 439 uses logger.error which requires manually passing exc_info=True or the exception object. Using logger.exception automatically includes the traceback.

Apply this diff:

             except Exception as e:
-                logger.error(f"[BASH_COMMAND] Error executing command with gVisor: {e}")
+                logger.exception("[BASH_COMMAND] Error executing command with gVisor")
                 return {
                     "success": False,
                     "error": f"Error executing command: {str(e)}",

454-461: Remove redundant exception object from logger.exception call.

logger.exception automatically logs the current exception context, so explicitly passing f"... {e}" is redundant.

Apply this diff:

         except Exception as e:
-            logger.exception(f"[BASH_COMMAND] Unexpected error: {e}")
+            logger.exception("[BASH_COMMAND] Unexpected error")
             return {
                 "success": False,
                 "error": f"Unexpected error: {str(e)}",
app/modules/utils/gvisor_runner.py (1)

533-573: Consider documenting or removing unused parameters.

The function parameters repo_path and runsc_path are not used because direct runsc execution requires OCI bundle creation (lines 551-554). Consider either removing them or adding underscore prefixes (_repo_path, _runsc_path) to signal they're intentionally unused.

Apply this diff to clarify the intent:

 def _run_with_runsc_direct(
     command: List[str],
     working_dir: Optional[str],
-    repo_path: Optional[str],
+    _repo_path: Optional[str],
     env: Optional[Dict[str, str]],
     timeout: Optional[int],
-    runsc_path: Path,
+    _runsc_path: Path,
 ) -> CommandResult:
📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 356e3e5 and c6c09b3.

📒 Files selected for processing (27)
  • .gitignore (1 hunks)
  • app/modules/code_provider/repo_manager_wrapper.py (1 hunks)
  • app/modules/intelligence/agents/chat_agents/system_agents/code_gen_agent.py (1 hunks)
  • app/modules/intelligence/agents/chat_agents/system_agents/debug_agent.py (1 hunks)
  • app/modules/intelligence/agents/chat_agents/system_agents/low_level_design_agent.py (1 hunks)
  • app/modules/intelligence/agents/chat_agents/system_agents/qna_agent.py (1 hunks)
  • app/modules/intelligence/agents/chat_agents/tool_helpers.py (4 hunks)
  • app/modules/intelligence/tools/change_detection/change_detection_tool.py (1 hunks)
  • app/modules/intelligence/tools/code_query_tools/bash_command_tool.py (1 hunks)
  • app/modules/intelligence/tools/tool_service.py (2 hunks)
  • app/modules/parsing/graph_construction/parsing_helper.py (2 hunks)
  • app/modules/repo_manager/repo_manager_interface.py (1 hunks)
  • app/modules/utils/gvisor_runner.py (1 hunks)
  • app/modules/utils/install_gvisor.py (1 hunks)
  • deployment/stage/celery/Jenkinsfile_CELERY (1 hunks)
  • deployment/stage/mom-api/Jenkinsfile_API (1 hunks)
  • docs/docker_desktop_gvisor_config.md (1 hunks)
  • docs/gvisor_mac_setup.md (1 hunks)
  • docs/gvisor_quickstart.md (1 hunks)
  • docs/gvisor_setup.md (1 hunks)
  • docs/gvisor_usage.md (1 hunks)
  • scripts/__init__.py (1 hunks)
  • scripts/install_gvisor.py (1 hunks)
  • scripts/install_gvisor_in_docker_vm.sh (1 hunks)
  • scripts/setup_gvisor_docker.sh (1 hunks)
  • scripts/verify_gvisor_docker.sh (1 hunks)
  • test_gvisor.py (1 hunks)
✅ Files skipped from review due to trivial changes (1)
  • docs/gvisor_mac_setup.md
🚧 Files skipped from review as they are similar to previous changes (10)
  • .gitignore
  • app/modules/intelligence/agents/chat_agents/system_agents/code_gen_agent.py
  • scripts/setup_gvisor_docker.sh
  • docs/gvisor_quickstart.md
  • scripts/init.py
  • scripts/install_gvisor.py
  • deployment/stage/mom-api/Jenkinsfile_API
  • deployment/stage/celery/Jenkinsfile_CELERY
  • app/modules/intelligence/agents/chat_agents/system_agents/debug_agent.py
  • app/modules/intelligence/agents/chat_agents/system_agents/qna_agent.py
🧰 Additional context used
🧬 Code graph analysis (8)
app/modules/intelligence/tools/tool_service.py (1)
app/modules/intelligence/tools/code_query_tools/bash_command_tool.py (1)
  • bash_command_tool (477-499)
app/modules/repo_manager/repo_manager_interface.py (1)
app/modules/repo_manager/repo_manager.py (9)
  • is_repo_available (260-278)
  • register_repo (280-307)
  • get_repo_path (309-327)
  • update_last_accessed (329-348)
  • get_repo_info (350-365)
  • list_available_repos (367-384)
  • evict_repo (386-419)
  • evict_stale_repos (421-450)
  • get_repo_size (452-479)
app/modules/intelligence/tools/code_query_tools/bash_command_tool.py (3)
app/modules/projects/projects_service.py (1)
  • get_project_from_db_by_id_sync (282-295)
app/modules/repo_manager/repo_manager.py (1)
  • get_repo_path (309-327)
app/modules/utils/gvisor_runner.py (2)
  • run_command_isolated (182-329)
  • CommandResult (31-37)
test_gvisor.py (1)
app/modules/utils/gvisor_runner.py (5)
  • is_gvisor_available (50-72)
  • run_command_isolated (182-329)
  • run_shell_command_isolated (660-694)
  • get_runsc_binary (40-47)
  • _is_running_in_container (160-179)
app/modules/utils/install_gvisor.py (2)
test_gvisor.py (1)
  • main (159-196)
test_gvisor_docker.py (1)
  • main (136-185)
app/modules/parsing/graph_construction/parsing_helper.py (1)
app/modules/repo_manager/repo_manager.py (5)
  • RepoManager (21-479)
  • is_repo_available (260-278)
  • update_last_accessed (329-348)
  • _get_repo_local_path (256-258)
  • register_repo (280-307)
app/modules/code_provider/repo_manager_wrapper.py (5)
app/modules/code_provider/base/code_provider_interface.py (1)
  • AuthMethod (6-12)
app/modules/repo_manager/repo_manager_interface.py (5)
  • IRepoManager (12-220)
  • get_repo_path (70-89)
  • list_available_repos (140-155)
  • is_repo_available (22-41)
  • update_last_accessed (92-110)
app/modules/code_provider/code_provider_service.py (1)
  • provider (194-196)
app/modules/intelligence/tools/code_query_tools/bash_command_tool.py (1)
  • _get_worktree_path (252-280)
app/modules/repo_manager/repo_manager.py (4)
  • get_repo_path (309-327)
  • list_available_repos (367-384)
  • is_repo_available (260-278)
  • update_last_accessed (329-348)
app/modules/utils/gvisor_runner.py (1)
app/modules/utils/install_gvisor.py (2)
  • install_gvisor (199-271)
  • get_runsc_path (274-292)
🪛 Ruff (0.14.4)
app/modules/intelligence/tools/code_query_tools/bash_command_tool.py

238-238: Do not catch blind exception: Exception

(BLE001)


245-245: Avoid specifying long messages outside the exception class

(TRY003)


247-249: Avoid specifying long messages outside the exception class

(TRY003)


408-408: Probable insecure usage of temporary file or directory: "/tmp"

(S108)


432-437: Consider moving this statement to an else block

(TRY300)


438-438: Do not catch blind exception: Exception

(BLE001)


439-439: Use logging.exception instead of logging.error

Replace with exception

(TRY400)


442-442: Use explicit conversion flag

Replace with conversion flag

(RUF010)


455-455: Redundant exception object included in logging.exception call

(TRY401)


458-458: Use explicit conversion flag

Replace with conversion flag

(RUF010)

test_gvisor.py

1-1: Shebang is present but file is not executable

(EXE001)


186-186: Consider moving this statement to an else block

(TRY300)


191-191: Do not catch blind exception: Exception

(BLE001)

app/modules/utils/install_gvisor.py

105-105: subprocess call: check for execution of untrusted input

(S603)


108-108: Consider moving this statement to an else block

(TRY300)


109-109: Do not catch blind exception: Exception

(BLE001)


136-136: Consider moving this statement to an else block

(TRY300)


143-143: Audit URL open for permitted schemes. Allowing use of file: or custom schemes is often unexpected.

(S310)


144-144: Consider moving this statement to an else block

(TRY300)


146-146: Use logging.exception instead of logging.error

Replace with exception

(TRY400)


148-148: Do not catch blind exception: Exception

(BLE001)


149-149: Use logging.exception instead of logging.error

Replace with exception

(TRY400)


194-194: Do not catch blind exception: Exception

(BLE001)


250-250: os.chmod setting a permissive mask 0o755 on file or directory

(S103)

app/modules/parsing/graph_construction/parsing_helper.py

50-50: Do not catch blind exception: Exception

(BLE001)


900-900: Do not catch blind exception: Exception

(BLE001)


911-911: Undefined name Optional

(F821)


912-912: Undefined name Optional

(F821)


978-978: Use logging.exception instead of logging.error

Replace with exception

(TRY400)


981-981: Undefined name Path

(F821)


995-995: Consider moving this statement to an else block

(TRY300)


1022-1022: Do not catch blind exception: Exception

(BLE001)


1029-1029: Undefined name Path

(F821)


1045-1045: Undefined name Path

(F821)


1092-1093: try-except-pass detected, consider logging the exception

(S110)


1092-1092: Do not catch blind exception: Exception

(BLE001)

app/modules/code_provider/repo_manager_wrapper.py

208-210: Avoid specifying long messages outside the exception class

(TRY003)


262-262: Consider moving this statement to an else block

(TRY300)


264-264: Do not catch blind exception: Exception

(BLE001)


316-316: Consider moving this statement to an else block

(TRY300)


318-318: Do not catch blind exception: Exception

(BLE001)


473-473: Consider moving this statement to an else block

(TRY300)


475-475: Do not catch blind exception: Exception

(BLE001)


481-481: Undefined name Repo

(F821)


514-514: Do not catch blind exception: Exception

(BLE001)


519-520: try-except-pass detected, consider logging the exception

(S110)


519-519: Do not catch blind exception: Exception

(BLE001)


539-539: Consider moving this statement to an else block

(TRY300)


541-541: Do not catch blind exception: Exception

(BLE001)


542-542: Use logging.exception instead of logging.error

Replace with exception

(TRY400)


591-591: Do not catch blind exception: Exception

(BLE001)


596-596: Do not catch blind exception: Exception

(BLE001)


624-624: Do not catch blind exception: Exception

(BLE001)

app/modules/utils/gvisor_runner.py

177-178: try-except-pass detected, consider logging the exception

(S110)


177-177: Do not catch blind exception: Exception

(BLE001)


313-313: Use logging.exception instead of logging.error

Replace with exception

(TRY400)


337-337: Starting a process with a partial executable path

(S607)


346-346: Starting a process with a partial executable path

(S607)


359-359: Starting a process with a partial executable path

(S607)


389-389: Consider moving this statement to an else block

(TRY300)


393-393: Do not catch blind exception: Exception

(BLE001)


407-407: Starting a process with a partial executable path

(S607)


416-416: Consider moving this statement to an else block

(TRY300)


417-417: Do not catch blind exception: Exception

(BLE001)


464-464: Probable insecure usage of temporary file or directory: "/tmp:rw,noexec,nosuid,size=100m"

(S108)


497-497: subprocess call: check for execution of untrusted input

(S603)


517-517: subprocess call: check for execution of untrusted input

(S603)


518-518: Starting a process with a partial executable path

(S607)


522-523: try-except-pass detected, consider logging the exception

(S110)


522-522: Do not catch blind exception: Exception

(BLE001)


536-536: Unused function argument: repo_path

(ARG001)


539-539: Unused function argument: runsc_path

(ARG001)


607-607: Probable insecure usage of temporary file or directory: "/tmp"

(S108)


616-616: subprocess call: check for execution of untrusted input

(S603)

🪛 Shellcheck (0.11.0)
scripts/install_gvisor_in_docker_vm.sh

[warning] 20-20: Use 'cd ... || exit' or 'cd ... || return' in case cd fails.

(SC2164)


[warning] 84-84: Use 'cd ... || exit' or 'cd ... || return' in case cd fails.

(SC2164)

🔇 Additional comments (5)
app/modules/intelligence/tools/change_detection/change_detection_tool.py (1)

678-680: LGTM! Good addition for observability.

The logging statement provides helpful visibility into which functions were detected as changed, making debugging and monitoring easier. The format is consistent with existing logs in the file.

app/modules/intelligence/tools/code_query_tools/bash_command_tool.py (2)

22-48: LGTM: Comprehensive security blocklist.

The blocked commands provide strong protection against write operations and network access.


315-363: LGTM: Robust directory traversal protection.

The path validation logic correctly prevents escaping the repository boundaries through multiple defensive checks.

app/modules/utils/gvisor_runner.py (2)

75-157: LGTM: Comprehensive environment variable filtering.

The multi-layered approach (whitelist + pattern blocking + value inspection) provides strong protection against secret exposure.


421-531: LGTM: Secure Docker-based gVisor implementation.

The function correctly implements defense-in-depth with read-only mounts, network isolation, environment filtering, and proper command escaping.

Comment on lines +131 to +135
if "../" in command or ".." in command.split():
return (
False,
"Accessing parent directories is not allowed for security reasons.",
)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟡 Minor

Overly restrictive check blocks benign pattern matching.

The check ".." in command.split() would incorrectly reject legitimate read-only commands like grep ".." file.txt or awk '/\.\./ {print}' data.txt that search for literal ".." patterns. The "../" in command check already prevents directory traversal attempts.

Apply this diff to remove the overly broad restriction:

     # Block commands that try to access parent directories
-    if "../" in command or ".." in command.split():
+    if "../" in command:
         return (
             False,
             "Accessing parent directories is not allowed for security reasons.",
         )
🤖 Prompt for AI Agents
In app/modules/intelligence/tools/code_query_tools/bash_command_tool.py around
lines 131 to 135, the condition currently blocks any command containing literal
".." tokens which rejects benign pattern-matching commands; update the guard to
only check for "../" (remove the `".." in command.split()` part) so directory
traversal is still prevented but legitimate searches for literal ".." are
allowed. Ensure the change preserves security intent and add/update tests to
cover commands like `grep ".." file.txt` and traversal attempts with "../".

Copy link
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

🧹 Nitpick comments (4)
app/modules/parsing/graph_construction/parsing_helper.py (4)

979-980: Use logging.exception for better diagnostic context.

When logging an error that is being raised, logging.exception automatically includes the stack trace, which aids debugging.

Apply this diff:

         except Exception as e:
-            logger.error(f"Error creating worktree for repo manager: {e}")
+            logger.exception(f"Error creating worktree for repo manager: {e}")
             raise

Based on static analysis hints.


1021-1024: Log exception when initial commit fails.

Silent exception handling makes debugging difficult. Consider logging the failure reason.

Apply this diff:

             try:
                 base_repo.index.commit("Initial commit from parsing")
             except Exception as e:
-                logger.warning(f"Could not create initial commit: {e}")
+                logger.warning(f"Could not create initial commit: {e}", exc_info=True)

Based on static analysis hints.


992-996: Consider refactoring the early return for better code structure.

Moving the return statement to an else block makes the control flow clearer and is more idiomatic.

Apply this diff:

         if base_repo_path.exists():
             try:
                 base_repo = Repo(base_repo_path)
                 logger.info(f"Using existing base repo at {base_repo_path}")
                 return base_repo
             except InvalidGitRepositoryError:
                 logger.warning(
                     f"Path {base_repo_path} exists but is not a git repo, removing"
                 )
                 shutil.rmtree(base_repo_path)
-
-        # Create base directory
-        base_repo_path.parent.mkdir(parents=True, exist_ok=True)
+        else:
+            # Create base directory
+            base_repo_path.parent.mkdir(parents=True, exist_ok=True)

Based on static analysis hints.


1091-1094: Log the exception when commit fails.

Silent exception handling makes it difficult to diagnose issues. At minimum, log the failure.

Apply this diff:

                     try:
                         worktree_repo.index.commit(f"Initial commit for {ref}")
-                    except Exception:
-                        pass
+                    except Exception as e:
+                        logger.warning(f"Could not create initial commit for worktree: {e}")

Based on static analysis hints.

📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between c6c09b3 and a8ca0a3.

📒 Files selected for processing (1)
  • app/modules/parsing/graph_construction/parsing_helper.py (3 hunks)
🧰 Additional context used
🧬 Code graph analysis (1)
app/modules/parsing/graph_construction/parsing_helper.py (1)
app/modules/repo_manager/repo_manager.py (5)
  • RepoManager (21-479)
  • is_repo_available (260-278)
  • update_last_accessed (329-348)
  • _get_repo_local_path (256-258)
  • register_repo (280-307)
🪛 Ruff (0.14.4)
app/modules/parsing/graph_construction/parsing_helper.py

51-51: Do not catch blind exception: Exception

(BLE001)


901-901: Do not catch blind exception: Exception

(BLE001)


979-979: Use logging.exception instead of logging.error

Replace with exception

(TRY400)


996-996: Consider moving this statement to an else block

(TRY300)


1023-1023: Do not catch blind exception: Exception

(BLE001)


1093-1094: try-except-pass detected, consider logging the exception

(S110)


1093-1093: Do not catch blind exception: Exception

(BLE001)

🔇 Additional comments (3)
app/modules/parsing/graph_construction/parsing_helper.py (3)

40-52: LGTM! Safe initialization pattern for optional functionality.

The lazy import and graceful fallback with logging provides a robust way to make RepoManager optional. The broad exception catch is appropriate here since any initialization failure should fall back to disabled state.


890-905: LGTM! Appropriate error handling for optional feature.

The try-except wrapper ensures that repo manager failures don't break the parsing flow. The broad exception catch is justified here since this is an optional enhancement that shouldn't block primary functionality.


1215-1323: LGTM! Well-structured commit status logic with clear distinction between parse modes.

The implementation correctly handles:

  • Pinned commit parses (explicit commit_id provided)
  • Branch-based parses (tracking latest branch commit)
  • Local repositories (always reparse)
  • Error cases (conservative fallback)

The extensive logging will help with debugging parse issues.

Copy link
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 4

♻️ Duplicate comments (2)
scripts/setup_gvisor_docker.sh (1)

41-66: Do not continue after checksum verification fails for runsc.

Right now a checksum mismatch prints a warning but still proceeds to install runsc, which weakens supply‑chain guarantees and can silently install a corrupted or tampered binary.

Given this script is automating a security boundary, it should fail fast on checksum mismatch instead of “continuing anyway”. For example:

-    if [ "$EXPECTED_HASH" = "$ACTUAL_HASH" ]; then
-        echo "✓ Checksum verified"
-    else
-        echo "⚠️  Checksum verification failed, but continuing..."
-        echo "   Expected: ${EXPECTED_HASH:0:16}..."
-        echo "   Actual:   ${ACTUAL_HASH:0:16}..."
-        echo "   (This might be okay if the checksum file format is different)"
-    fi
+    if [ "$EXPECTED_HASH" = "$ACTUAL_HASH" ]; then
+        echo "✓ Checksum verified"
+    else
+        echo "❌ Checksum verification failed"
+        echo "   Expected: ${EXPECTED_HASH:0:16}..."
+        echo "   Actual:   ${ACTUAL_HASH:0:16}..."
+        echo "Aborting to avoid installing an unverified runsc binary."
+        exit 1
+    fi
@@
-else
-    # Linux uses sha512sum
-    sha512sum -c runsc.sha512 || echo "⚠️  Checksum verification failed, but continuing..."
-fi
+else
+    # Linux uses sha512sum
+    if ! sha512sum -c runsc.sha512; then
+        echo "❌ Checksum verification failed"
+        echo "Aborting to avoid installing an unverified runsc binary."
+        exit 1
+    fi
+fi

This keeps the script safe by default while still allowing us to refine checksum parsing if format issues ever arise.

app/modules/parsing/graph_construction/parsing_helper.py (1)

992-1001: Address the unresolved safety concern from previous review.

The code still removes directories unconditionally when they exist but aren't git repositories. This was flagged in a previous review as potentially dangerous—the path might contain unrelated data or the issue could be permissions-related.

Based on the previous review guidance, apply this safer approach:

             except InvalidGitRepositoryError:
-                logger.warning(
-                    f"Path {base_repo_path} exists but is not a git repo, removing"
-                )
-                shutil.rmtree(base_repo_path)
+                logger.error(
+                    f"Path {base_repo_path} exists but is not a git repo. "
+                    f"Manual cleanup required or use a different path."
+                )
+                raise ValueError(
+                    f"Path {base_repo_path} exists but is not a valid git repository"
+                )

Based on learnings

🧹 Nitpick comments (12)
.env.template (1)

75-81: Consider environment-specific defaults for Phoenix tracing.

Phoenix tracing is enabled by default (PHOENIX_ENABLED=true), which is convenient for local development but may introduce overhead in production environments. Consider documenting that production deployments should review this setting, or using a different default for production (e.g., false) with explicit enabling via environment-specific config.

test_gvisor.py (1)

1-1: Optional: Make test file executable or remove shebang.

The shebang is present at line 1, but the file is not marked executable. Either make the file executable with chmod +x test_gvisor.py to allow direct execution, or remove the shebang if the test is always run via python test_gvisor.py.

To make executable:

chmod +x test_gvisor.py
docs/gvisor_setup.md (1)

140-142: Replace bare Docker Desktop URL with a Markdown link.

To keep markdown lint‑clean and improve readability, consider turning the bare URL into a link:

-   - Download from: https://www.docker.com/products/docker-desktop
+   - Download from: [Docker Desktop](https://www.docker.com/products/docker-desktop)
README.md (1)

200-227: Normalize headings/lists to satisfy markdown linters and improve structure.

The GitHub auth options and Phoenix section currently use bold text as pseudo‑headings and have nested list indentation that markdownlint flags (MD036/MD007/MD046). Content is good; structure can be tweaked:

  • Promote the option labels to actual headings under “For GitHub.com Repositories” and for the Phoenix section, e.g.:
-**Option 1: GitHub App (Recommended for Production)**
+###### Option 1: GitHub App (Recommended for Production)

-**Option 2: Personal Access Token (PAT) Pool**
+###### Option 2: Personal Access Token (PAT) Pool

-**Option 3: Unauthenticated Access (Public Repos Only)**
+###### Option 3: Unauthenticated Access (Public Repos Only)

-**Optional: Phoenix Tracing Setup (Local Only)**
+#### Optional: Phoenix Tracing Setup (Local Only)
  • Under “For Self-Hosted Git Servers…”, you can also remove the extra leading spaces before “Set the following environment variables:” and its fenced code block so they’re left‑aligned with the heading; this will clear the MD046 “indented code block” warning.

These are purely formatting cleanups; behavior/docs semantics stay the same.

Also applies to: 268-275

app/main.py (1)

55-56: Make the broad Phoenix init exception handling explicit (or narrower).

Catching Exception here is reasonable given tracing is optional and should never block startup, but Ruff flags it. To make intent clear and avoid silently swallowing unexpected bugs, consider either:

  • Narrowing to the expected failures:
def setup_phoenix_tracing(self):
    try:
        from app.modules.intelligence.tracing.phoenix_tracer import (
            initialize_phoenix_tracing,
        )
        initialize_phoenix_tracing()
    except ImportError as e:
        logging.warning(
            "Phoenix tracing not available (missing deps); continuing without tracing: %s",
            e,
        )
    except Exception as e:  # noqa: BLE001
        logging.warning(
            "Phoenix tracing initialization failed (non-fatal): %s", e, exc_info=True
        )

or at least add a # noqa: BLE001 and exc_info=True to indicate the broad catch is intentional and keep a stack trace in logs.

Also applies to: 68-77

app/celery/celery_app.py (1)

83-99: Deduplicate Phoenix setup and clarify broad exception handling in Celery.

This Celery setup_phoenix_tracing() duplicates the logic in app.main.MainApp.setup_phoenix_tracing and also catches a broad Exception, which Ruff flags.

Two possible cleanups:

  • Factor a shared helper in phoenix_tracer.py, e.g. safe_initialize_phoenix_tracing(logger), and call it from both FastAPI and Celery startup to avoid drift.
  • As in app.main, either narrow the exception types or add # noqa: BLE001 with exc_info=True to make the non‑fatal, catch‑all behavior explicit for linters and debugging.

Behavior stays the same; this just reduces duplication and clarifies intent.

app/modules/code_provider/gitbucket/gitbucket_provider.py (1)

1047-1059: Remove unused repo binding in get_archive_link while preserving error semantics.

In get_archive_link, the repo variable is assigned but never used, which Ruff flags (F841):

try:
    repo = self._get_repo(repo_name)
    ...

Since the only purpose is to surface a GithubException if the repository doesn’t exist, you can safely drop the assignment and just call the helper for its side effect:

-        try:
-            repo = self._get_repo(repo_name)
+        try:
+            self._get_repo(repo_name)

This keeps the existing error behavior (404s still raise) while satisfying the linter.

app/modules/parsing/graph_construction/parsing_helper.py (3)

731-732: Remove commented code or document the reason.

The commented-out else branch lacks explanation. If suppressing "other" language counting is intentional, either remove these lines entirely or add a comment explaining why. Commented code creates maintenance confusion.

Apply this diff to remove the commented code:

                            elif ext in [".xml", ".xsq"]:
                                lang_count["xml"] += 1
-                            # else:
-                            #     lang_count["other"] += 1
                        except Exception as e:

978-980: Use logging.exception to preserve stack trace.

When logging inside an exception handler, use logger.exception instead of logger.error to automatically include the traceback for easier debugging.

Apply this diff:

         except Exception as e:
-            logger.error(f"Error creating worktree for repo manager: {e}")
+            logger.exception("Error creating worktree for repo manager")
             raise

1091-1094: Log the exception instead of silently passing.

The try-except-pass block silently swallows commit failures. While the worktree might still function, logging at debug level would aid troubleshooting.

Apply this diff:

                     worktree_repo.git.add(A=True)
                     try:
                         worktree_repo.index.commit(f"Initial commit for {ref}")
-                    except Exception:
-                        pass
+                    except Exception as e:
+                        logger.debug(f"Could not commit to worktree for {ref}: {e}")
app/modules/repo_manager/repo_manager.py (1)

474-477: Rename unused loop variable.

The dirnames variable in the os.walk unpacking is never used. Follow Python convention by prefixing with underscore to indicate intentionally unused.

Apply this diff:

         try:
-            for dirpath, dirnames, filenames in os.walk(local_path):
+            for dirpath, _dirnames, filenames in os.walk(local_path):
                 if ".git" in dirpath:
app/modules/code_provider/repo_manager_wrapper.py (1)

481-544: Consider moving Git import to module level for type hints.

The Repo type in the method signature at line 481 references a type that's only imported inside the function body (line 493). While this doesn't cause runtime issues, it can confuse type checkers. Consider importing Repo at the module level or using a string literal for the type hint.

Option 1: Import at module level

 import os
 import logging
+from git import Repo, GitCommandError
 from typing import List, Dict, Any, Optional

Option 2: Use string literal (if avoiding import at module level)

-    def _ensure_worktree(self, repo: "Repo", ref: str, is_commit: bool) -> str:
+    def _ensure_worktree(self, repo: 'git.Repo', ref: str, is_commit: bool) -> str:
📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between a8ca0a3 and fc5d140.

📒 Files selected for processing (25)
  • .env.template (3 hunks)
  • README.md (2 hunks)
  • app/celery/celery_app.py (1 hunks)
  • app/main.py (2 hunks)
  • app/modules/code_provider/gitbucket/gitbucket_provider.py (19 hunks)
  • app/modules/code_provider/local_repo/local_provider.py (1 hunks)
  • app/modules/code_provider/local_repo/local_repo_service.py (1 hunks)
  • app/modules/code_provider/repo_manager_wrapper.py (1 hunks)
  • app/modules/integrations/integrations_router.py (38 hunks)
  • app/modules/integrations/integrations_service.py (10 hunks)
  • app/modules/intelligence/agents/chat_agents/pydantic_agent.py (1 hunks)
  • app/modules/intelligence/tracing/phoenix_tracer.py (1 hunks)
  • app/modules/parsing/graph_construction/parsing_helper.py (4 hunks)
  • app/modules/repo_manager/repo_manager.py (1 hunks)
  • app/modules/utils/gvisor_runner.py (1 hunks)
  • deployment/stage/celery/Jenkinsfile_CELERY (2 hunks)
  • deployment/stage/mom-api/Jenkinsfile_API (2 hunks)
  • docs/gvisor_mac_setup.md (1 hunks)
  • docs/gvisor_setup.md (1 hunks)
  • docs/gvisor_usage.md (1 hunks)
  • requirements.txt (1 hunks)
  • scripts/install_gvisor_in_docker_vm.sh (1 hunks)
  • scripts/setup_gvisor_docker.sh (1 hunks)
  • scripts/verify_gvisor_docker.sh (1 hunks)
  • test_gvisor.py (1 hunks)
🚧 Files skipped from review as they are similar to previous changes (2)
  • deployment/stage/celery/Jenkinsfile_CELERY
  • docs/gvisor_usage.md
🧰 Additional context used
🧠 Learnings (1)
📚 Learning: 2025-11-10T11:14:57.689Z
Learnt from: Zingzy
Repo: potpie-ai/potpie PR: 474
File: app/modules/integrations/jira_oauth.py:11-40
Timestamp: 2025-11-10T11:14:57.689Z
Learning: In the potpie repository, the JiraOAuthStore in app/modules/integrations/jira_oauth.py is intentionally in-memory and used solely for testing environments. OAuth tokens are already persisted in the database for production use.

Applied to files:

  • app/modules/integrations/integrations_service.py
🧬 Code graph analysis (11)
app/main.py (2)
app/celery/celery_app.py (1)
  • setup_phoenix_tracing (84-95)
app/modules/intelligence/tracing/phoenix_tracer.py (1)
  • initialize_phoenix_tracing (44-173)
test_gvisor.py (1)
app/modules/utils/gvisor_runner.py (6)
  • is_gvisor_available (50-73)
  • run_command_isolated (183-344)
  • run_shell_command_isolated (682-716)
  • get_runsc_binary (40-47)
  • _is_running_in_container (161-180)
  • _check_docker_available (347-410)
app/modules/utils/gvisor_runner.py (1)
app/modules/utils/install_gvisor.py (2)
  • install_gvisor (199-271)
  • get_runsc_path (274-292)
app/modules/integrations/integrations_router.py (3)
app/api/router.py (1)
  • get_api_key_user (50-81)
app/modules/auth/auth_service.py (2)
  • AuthService (14-98)
  • check_auth (49-98)
app/modules/integrations/integrations_service.py (9)
  • save_linear_integration (1522-1731)
  • get_integration_by_id (821-832)
  • save_jira_integration (1804-2031)
  • save_sentry_integration (622-819)
  • get_integrations_by_user (866-874)
  • get_integration_schema (1136-1159)
  • list_integrations_schema (1161-1203)
  • get_sentry_organizations (600-604)
  • refresh_sentry_token (158-261)
app/modules/integrations/integrations_service.py (3)
app/modules/integrations/integrations_schema.py (5)
  • SentrySaveRequest (203-221)
  • Integration (64-95)
  • IntegrationListResponse (131-139)
  • LinearSaveRequest (253-271)
  • JiraSaveRequest (295-309)
app/modules/integrations/integration_model.py (1)
  • Integration (7-43)
app/modules/integrations/integrations_router.py (1)
  • save_jira_integration (793-809)
app/modules/intelligence/tracing/phoenix_tracer.py (1)
app/modules/utils/logger.py (1)
  • setup_logger (4-15)
app/modules/code_provider/repo_manager_wrapper.py (3)
app/modules/code_provider/base/code_provider_interface.py (2)
  • ICodeProvider (15-259)
  • AuthMethod (6-12)
app/modules/repo_manager/repo_manager_interface.py (5)
  • IRepoManager (12-220)
  • get_repo_path (70-89)
  • list_available_repos (140-155)
  • is_repo_available (22-41)
  • update_last_accessed (92-110)
app/modules/repo_manager/repo_manager.py (4)
  • get_repo_path (319-337)
  • list_available_repos (377-394)
  • is_repo_available (270-288)
  • update_last_accessed (339-358)
app/modules/code_provider/gitbucket/gitbucket_provider.py (2)
app/modules/parsing/utils/repo_name_normalizer.py (1)
  • get_actual_repo_name_for_lookup (58-116)
app/modules/code_provider/code_provider_service.py (3)
  • get_repo (28-213)
  • get_repo (365-366)
  • get_commits (105-110)
app/celery/celery_app.py (2)
app/main.py (1)
  • setup_phoenix_tracing (68-76)
app/modules/intelligence/tracing/phoenix_tracer.py (1)
  • initialize_phoenix_tracing (44-173)
app/modules/parsing/graph_construction/parsing_helper.py (2)
app/modules/repo_manager/repo_manager.py (5)
  • RepoManager (21-489)
  • is_repo_available (270-288)
  • update_last_accessed (339-358)
  • _get_repo_local_path (263-268)
  • register_repo (290-317)
app/modules/repo_manager/repo_manager_interface.py (3)
  • is_repo_available (22-41)
  • update_last_accessed (92-110)
  • register_repo (44-67)
app/modules/repo_manager/repo_manager.py (1)
app/modules/repo_manager/repo_manager_interface.py (10)
  • IRepoManager (12-220)
  • is_repo_available (22-41)
  • register_repo (44-67)
  • get_repo_path (70-89)
  • update_last_accessed (92-110)
  • get_repo_info (113-137)
  • list_available_repos (140-155)
  • evict_repo (158-180)
  • evict_stale_repos (183-198)
  • get_repo_size (201-220)
🪛 dotenv-linter (4.0.0)
.env.template

[warning] 65-65: [UnorderedKey] The GH_TOKEN_LIST key should go before the GITHUB_APP_ID key

(UnorderedKey)


[warning] 79-79: [UnorderedKey] The PHOENIX_COLLECTOR_ENDPOINT key should go before the PHOENIX_ENABLED key

(UnorderedKey)

🪛 LanguageTool
docs/gvisor_mac_setup.md

[grammar] ~101-~101: Ensure spelling is correct
Context: ...when Docker Desktop is not available or runsc is not configured. ### When to Use Eac...

(QB_NEW_EN_ORTHOGRAPHY_ERROR_IDS_1)

🪛 markdownlint-cli2 (0.18.1)
README.md

206-206: Emphasis used instead of a heading

(MD036, no-emphasis-as-heading)


207-207: Unordered list indentation
Expected: 0; Actual: 2

(MD007, ul-indent)


208-208: Unordered list indentation
Expected: 0; Actual: 2

(MD007, ul-indent)


214-214: Emphasis used instead of a heading

(MD036, no-emphasis-as-heading)


215-215: Unordered list indentation
Expected: 0; Actual: 2

(MD007, ul-indent)


216-216: Unordered list indentation
Expected: 0; Actual: 2

(MD007, ul-indent)


220-220: Unordered list indentation
Expected: 0; Actual: 2

(MD007, ul-indent)


221-221: Unordered list indentation
Expected: 0; Actual: 2

(MD007, ul-indent)


223-223: Emphasis used instead of a heading

(MD036, no-emphasis-as-heading)


224-224: Unordered list indentation
Expected: 0; Actual: 2

(MD007, ul-indent)


225-225: Unordered list indentation
Expected: 0; Actual: 2

(MD007, ul-indent)


226-226: Unordered list indentation
Expected: 0; Actual: 2

(MD007, ul-indent)


230-230: Code block style
Expected: fenced; Actual: indented

(MD046, code-block-style)


268-268: Emphasis used instead of a heading

(MD036, no-emphasis-as-heading)

docs/gvisor_setup.md

141-141: Bare URL used

(MD034, no-bare-urls)

🪛 Ruff (0.14.4)
app/main.py

75-75: Do not catch blind exception: Exception

(BLE001)

test_gvisor.py

1-1: Shebang is present but file is not executable

(EXE001)


204-204: Consider moving this statement to an else block

(TRY300)


209-209: Do not catch blind exception: Exception

(BLE001)

app/modules/utils/gvisor_runner.py

178-179: try-except-pass detected, consider logging the exception

(S110)


178-178: Do not catch blind exception: Exception

(BLE001)


328-328: Use logging.exception instead of logging.error

Replace with exception

(TRY400)


352-352: Starting a process with a partial executable path

(S607)


361-361: Starting a process with a partial executable path

(S607)


374-374: Starting a process with a partial executable path

(S607)


404-404: Consider moving this statement to an else block

(TRY300)


408-408: Do not catch blind exception: Exception

(BLE001)


422-422: Starting a process with a partial executable path

(S607)


431-431: Consider moving this statement to an else block

(TRY300)


432-432: Do not catch blind exception: Exception

(BLE001)


486-486: Probable insecure usage of temporary file or directory: "/tmp:rw,noexec,nosuid,size=100m"

(S108)


519-519: subprocess call: check for execution of untrusted input

(S603)


539-539: subprocess call: check for execution of untrusted input

(S603)


540-540: Starting a process with a partial executable path

(S607)


544-545: try-except-pass detected, consider logging the exception

(S110)


544-544: Do not catch blind exception: Exception

(BLE001)


558-558: Unused function argument: repo_path

(ARG001)


561-561: Unused function argument: runsc_path

(ARG001)


629-629: Probable insecure usage of temporary file or directory: "/tmp"

(S108)


638-638: subprocess call: check for execution of untrusted input

(S603)

app/modules/integrations/integrations_router.py

120-120: Do not perform function call Depends in argument defaults; instead, perform the call within the function, or read the default from a module-level singleton variable

(B008)


121-121: Do not perform function call Depends in argument defaults; instead, perform the call within the function, or read the default from a module-level singleton variable

(B008)


146-146: Do not perform function call Depends in argument defaults; instead, perform the call within the function, or read the default from a module-level singleton variable

(B008)


147-147: Do not perform function call Depends in argument defaults; instead, perform the call within the function, or read the default from a module-level singleton variable

(B008)


404-404: Do not perform function call Depends in argument defaults; instead, perform the call within the function, or read the default from a module-level singleton variable

(B008)


405-405: Do not perform function call Depends in argument defaults; instead, perform the call within the function, or read the default from a module-level singleton variable

(B008)


430-430: Do not perform function call Depends in argument defaults; instead, perform the call within the function, or read the default from a module-level singleton variable

(B008)


431-431: Do not perform function call Depends in argument defaults; instead, perform the call within the function, or read the default from a module-level singleton variable

(B008)


456-456: Do not perform function call Depends in argument defaults; instead, perform the call within the function, or read the default from a module-level singleton variable

(B008)


643-643: Do not perform function call Depends in argument defaults; instead, perform the call within the function, or read the default from a module-level singleton variable

(B008)


660-660: Do not perform function call Depends in argument defaults; instead, perform the call within the function, or read the default from a module-level singleton variable

(B008)


689-689: Do not perform function call Depends in argument defaults; instead, perform the call within the function, or read the default from a module-level singleton variable

(B008)


696-698: Abstract raise to an inner function

(TRY301)


726-726: Do not perform function call Depends in argument defaults; instead, perform the call within the function, or read the default from a module-level singleton variable

(B008)


733-735: Abstract raise to an inner function

(TRY301)


763-763: Do not perform function call Depends in argument defaults; instead, perform the call within the function, or read the default from a module-level singleton variable

(B008)


770-772: Abstract raise to an inner function

(TRY301)


796-796: Do not perform function call Depends in argument defaults; instead, perform the call within the function, or read the default from a module-level singleton variable

(B008)


1316-1316: Do not perform function call Depends in argument defaults; instead, perform the call within the function, or read the default from a module-level singleton variable

(B008)


1357-1357: Do not perform function call Depends in argument defaults; instead, perform the call within the function, or read the default from a module-level singleton variable

(B008)


1391-1391: Do not perform function call Depends in argument defaults; instead, perform the call within the function, or read the default from a module-level singleton variable

(B008)


1433-1433: Do not perform function call Depends in argument defaults; instead, perform the call within the function, or read the default from a module-level singleton variable

(B008)


1442-1444: Abstract raise to an inner function

(TRY301)


1448-1451: Abstract raise to an inner function

(TRY301)


1467-1467: Do not perform function call Depends in argument defaults; instead, perform the call within the function, or read the default from a module-level singleton variable

(B008)


1489-1492: Abstract raise to an inner function

(TRY301)


1530-1530: Do not perform function call Depends in argument defaults; instead, perform the call within the function, or read the default from a module-level singleton variable

(B008)


1540-1542: Abstract raise to an inner function

(TRY301)


1546-1549: Abstract raise to an inner function

(TRY301)


1577-1577: Do not perform function call Depends in argument defaults; instead, perform the call within the function, or read the default from a module-level singleton variable

(B008)


1588-1588: Do not perform function call Depends in argument defaults; instead, perform the call within the function, or read the default from a module-level singleton variable

(B008)


1589-1589: Do not perform function call Depends in argument defaults; instead, perform the call within the function, or read the default from a module-level singleton variable

(B008)


1613-1613: Do not perform function call Depends in argument defaults; instead, perform the call within the function, or read the default from a module-level singleton variable

(B008)


1614-1614: Do not perform function call Depends in argument defaults; instead, perform the call within the function, or read the default from a module-level singleton variable

(B008)


1663-1663: Do not perform function call Depends in argument defaults; instead, perform the call within the function, or read the default from a module-level singleton variable

(B008)


1714-1714: Do not perform function call Depends in argument defaults; instead, perform the call within the function, or read the default from a module-level singleton variable

(B008)


1730-1730: Do not perform function call Depends in argument defaults; instead, perform the call within the function, or read the default from a module-level singleton variable

(B008)


1737-1739: Abstract raise to an inner function

(TRY301)


1744-1747: Consider moving this statement to an else block

(TRY300)


1762-1762: Do not perform function call Depends in argument defaults; instead, perform the call within the function, or read the default from a module-level singleton variable

(B008)


1769-1771: Abstract raise to an inner function

(TRY301)


1798-1798: Do not perform function call Depends in argument defaults; instead, perform the call within the function, or read the default from a module-level singleton variable

(B008)


1805-1807: Abstract raise to an inner function

(TRY301)


1834-1834: Do not perform function call Depends in argument defaults; instead, perform the call within the function, or read the default from a module-level singleton variable

(B008)


1841-1843: Abstract raise to an inner function

(TRY301)


1867-1867: Do not perform function call Depends in argument defaults; instead, perform the call within the function, or read the default from a module-level singleton variable

(B008)


1874-1876: Abstract raise to an inner function

(TRY301)


1879-1883: Consider moving this statement to an else block

(TRY300)


1897-1897: Do not perform function call Depends in argument defaults; instead, perform the call within the function, or read the default from a module-level singleton variable

(B008)


1904-1906: Abstract raise to an inner function

(TRY301)

app/modules/integrations/integrations_service.py

771-773: Do not call setattr with a constant attribute value. It is not any safer than normal property access.

Replace setattr with assignment

(B010)


2007-2007: Do not call setattr with a constant attribute value. It is not any safer than normal property access.

Replace setattr with assignment

(B010)

app/modules/intelligence/tracing/phoenix_tracer.py

161-161: Consider moving this statement to an else block

(TRY300)


232-232: Do not catch blind exception: Exception

(BLE001)


233-233: Use logging.exception instead of logging.error

Replace with exception

(TRY400)

app/modules/code_provider/repo_manager_wrapper.py

208-210: Avoid specifying long messages outside the exception class

(TRY003)


262-262: Consider moving this statement to an else block

(TRY300)


264-264: Do not catch blind exception: Exception

(BLE001)


316-316: Consider moving this statement to an else block

(TRY300)


318-318: Do not catch blind exception: Exception

(BLE001)


473-473: Consider moving this statement to an else block

(TRY300)


475-475: Do not catch blind exception: Exception

(BLE001)


481-481: Undefined name Repo

(F821)


514-514: Do not catch blind exception: Exception

(BLE001)


519-520: try-except-pass detected, consider logging the exception

(S110)


519-519: Do not catch blind exception: Exception

(BLE001)


539-539: Consider moving this statement to an else block

(TRY300)


541-541: Do not catch blind exception: Exception

(BLE001)


542-542: Use logging.exception instead of logging.error

Replace with exception

(TRY400)


591-591: Do not catch blind exception: Exception

(BLE001)


596-596: Do not catch blind exception: Exception

(BLE001)


624-624: Do not catch blind exception: Exception

(BLE001)

app/modules/code_provider/gitbucket/gitbucket_provider.py

241-241: Use logging.exception instead of logging.error

Replace with exception

(TRY400)


1059-1059: Local variable repo is assigned to but never used

Remove assignment to unused variable repo

(F841)

app/celery/celery_app.py

92-92: Do not catch blind exception: Exception

(BLE001)

app/modules/parsing/graph_construction/parsing_helper.py

51-51: Do not catch blind exception: Exception

(BLE001)


901-901: Do not catch blind exception: Exception

(BLE001)


979-979: Use logging.exception instead of logging.error

Replace with exception

(TRY400)


996-996: Consider moving this statement to an else block

(TRY300)


1023-1023: Do not catch blind exception: Exception

(BLE001)


1093-1094: try-except-pass detected, consider logging the exception

(S110)


1093-1093: Do not catch blind exception: Exception

(BLE001)

app/modules/repo_manager/repo_manager.py

71-71: Avoid specifying long messages outside the exception class

(TRY003)


267-267: Avoid specifying long messages outside the exception class

(TRY003)


300-300: Avoid specifying long messages outside the exception class

(TRY003)


475-475: Loop control variable dirnames not used within loop body

Rename unused dirnames to _dirnames

(B007)

🪛 Shellcheck (0.11.0)
scripts/install_gvisor_in_docker_vm.sh

[warning] 20-20: Use 'cd ... || exit' or 'cd ... || return' in case cd fails.

(SC2164)


[warning] 87-87: Use 'cd ... || exit' or 'cd ... || return' in case cd fails.

(SC2164)

🔇 Additional comments (40)
app/modules/intelligence/agents/chat_agents/pydantic_agent.py (1)

142-142: The instrumentation parameter is correctly configured and properly integrated.

PydanticAI's Agent accepts an instrument parameter (type: InstrumentationSettings | bool | None), and the codebase has already set up the necessary infrastructure:

  • Phoenix tracing is initialized at application startup (app/main.py line 55 calls setup_phoenix_tracing()) before any agents are created
  • All required packages are declared: pydantic_ai==0.4.3, arize-phoenix==12.12.0, arize-phoenix-otel==0.13.1, and openinference-instrumentation-pydantic-ai==0.1.9
  • The setup follows the documented pattern in app/modules/intelligence/tracing/phoenix_tracer.py which explicitly states agents should be created with instrument=True after Phoenix initialization
deployment/stage/mom-api/Jenkinsfile_API (2)

60-60: Verify the container name in your Kubernetes deployment manifest matches the kubectl command.

The Jenkinsfile references a container named mom-api on line 106:

kubectl set image deployment/mom-api-deployment mom-api=${DOCKER_REGISTRY}/mom-api:${imageTag}

The Kubernetes deployment manifest is not present in this repository. You must verify that your mom-api-deployment definition has a container named exactly mom-api, otherwise the kubectl set image command will fail to update the image.


48-48: Dockerfile does not require build arguments; however, manual verification is needed.

The api.Dockerfile contains no ARG instructions, so the current docker build command at line 48 is correct and will not fail due to missing build arguments. The simplified command is appropriate for this Dockerfile.

However, the original review comment's claims about "removed" build arguments and --no-cache flags cannot be verified in this environment due to unavailable git history. To ensure the changes are correct:

  • Confirm the actual modifications made in this PR against the base branch
  • Verify that removing --no-cache (if applicable) aligns with your deployment requirements—layer caching improves build speed but may use stale base images or dependencies
  • Ensure the image name change from mom-api-ss to mom-api matches your Kubernetes deployment expectations
app/modules/code_provider/local_repo/local_provider.py (1)

302-304: LGTM! Traversal logic correctly updated.

The change from skipping all hidden files to skipping only the .git directory is appropriate for the local repository management feature. This preserves hidden configuration files and directories that may be relevant in locally checked-out worktrees while still excluding Git metadata.

.env.template (2)

64-65: LGTM! Inline comment issue resolved.

The comment has been correctly moved to its own line above the assignment, preventing parser issues with libraries that don't strip inline comments by default.


90-95: LGTM! Git provider configuration cleaned up.

The inline comments have been properly moved to separate lines (lines 88-89, 92-95 provide context), leaving clean assignment statements that won't be misinterpreted by environment parsers.

docs/gvisor_mac_setup.md (1)

1-154: Excellent documentation for macOS gVisor setup.

This documentation provides clear, comprehensive guidance for running gVisor on macOS across multiple approaches. The Docker Desktop option is appropriately highlighted as recommended, with practical setup steps and verification commands. The fallback behavior is well-explained, and the Python usage example accurately reflects the module API.

scripts/install_gvisor_in_docker_vm.sh (1)

47-56: LGTM! Error handling properly implemented.

The script now correctly fails fast when installation doesn't succeed, addressing the previous review feedback. Both verbose and non-verbose modes provide appropriate feedback, and the script exits with status 1 on failure.

app/modules/integrations/integrations_service.py (4)

622-624: LGTM! Per-user integration ownership correctly implemented.

The Sentry integration now properly uses Potpie user_id for uniqueness and ownership, preventing duplicate integrations and ensuring proper per-user scoping. The unique identifier combining org_slug and user_id at line 769 correctly prevents the same user from integrating the same Sentry org multiple times.

Note: The Ruff warnings about setattr at lines 771-773 are false positives - this is the recommended pattern for SQLAlchemy column assignments to avoid linter issues.

Also applies to: 725-737, 769-773


866-874: LGTM! User-scoped integration retrieval implemented correctly.

The new get_integrations_by_user method properly filters integrations by the created_by field, enabling per-user views as mentioned in the AI summary. This aligns with the router changes that enforce user ownership.


1161-1183: LGTM! Per-user filtering added to list operations.

The list_integrations_schema method now supports optional user_id filtering at lines 1166 and 1173-1174, enabling both user-specific and global integration queries. This aligns with the router changes that propagate authenticated user context.


1522-1524: LGTM! Consistent per-user ownership across all integration types.

Linear and Jira integrations now properly use Potpie user_id for ownership (lines 1682, 2007) and token association (line 1901), consistent with the Sentry integration changes. The Jira in-memory token store usage at line 1901 aligns with the testing-only pattern noted in the retrieved learnings.

Note: The Ruff warning about setattr at line 2007 is a false positive - this is the recommended SQLAlchemy pattern.

Based on learnings

Also applies to: 1682-1682, 1804-1806, 1901-1901, 2007-2007

app/modules/utils/gvisor_runner.py (4)

50-74: LGTM! Availability check properly caches Docker probe.

The Docker availability check is now correctly cached in the docker_ready variable at line 67, avoiding redundant probes as requested in past reviews. The platform-specific logic appropriately handles Linux, macOS, and Windows scenarios.


76-158: Excellent security: Environment variable filtering prevents secret exposure.

The _filter_safe_environment_variables function implements defense-in-depth by combining an explicit whitelist of safe variables (lines 89-103), pattern-based blocking of sensitive variable names (lines 106-119), and value-based detection of secret-like content (lines 141-148). This comprehensive approach effectively prevents API keys, tokens, and other secrets from leaking into sandboxed commands.


229-236: LGTM! Docker-backed gVisor properly enabled without local runsc.

The code now correctly continues when runsc_path is None instead of falling back immediately, as requested in past reviews. This allows Docker Desktop/macOS/Windows setups to use the Docker runtime's built-in gVisor support even when no local runsc binary is present.


347-410: LGTM! Comprehensive Docker+gVisor availability probe.

The _check_docker_available function goes beyond configuration checks to actually test gVisor functionality with a test container (lines 372-382), catching architecture mismatches and other runtime issues. The error pattern detection (lines 384-401) provides helpful fallback when gVisor is configured but non-functional.

Note: Static analysis warnings (S607, S603, BLE001) are false positives - subprocess calls to docker are intentional and safe for this availability probe, and broad exception catching is appropriate for graceful degradation.

test_gvisor.py (2)

36-68: LGTM! Test properly handles Docker Desktop gVisor on Mac/Windows.

The test_gvisor_availability function now correctly handles the case where gVisor is available via Docker Desktop on macOS/Windows (lines 49-63). The test checks Docker availability and only asserts unavailability when Docker isn't present, addressing past review feedback that the previous unconditional assertion would fail with Docker Desktop + gVisor configured.


70-171: Excellent test coverage for gVisor command execution.

The test suite thoroughly exercises both success paths (simple commands, shell commands, working directories, explicit fallback) and error handling (non-existent commands and directories). The assertions properly validate return codes, success flags, and output content.

docs/gvisor_setup.md (1)

11-15: Environment support matrix reads clearly and matches tooling.

The K8s/Linux vs Local Linux vs Mac/Windows bullets now align with the Docker-based setup scripts (scripts/setup_gvisor_docker.sh, scripts/verify_gvisor_docker.sh) and correctly describe the Docker Desktop + runsc fallback behavior. No changes needed here.

scripts/verify_gvisor_docker.sh (1)

7-41: Verification script now correctly propagates failure to callers.

The script cleanly tracks failures via status and exits non‑zero when runsc is missing, the runtime isn’t registered, or the test container fails, which makes it safe to use in CI or automated checks. No changes needed.

app/modules/code_provider/gitbucket/gitbucket_provider.py (1)

99-117: Centralized _get_repo helper nicely fixes the normalization issues.

Routing all repo lookups through _get_repo() with get_actual_repo_name_for_lookup(..., "gitbucket") is a solid improvement over the previous scattered conversions and should resolve the earlier 404s when using normalized names. The updated usages in get_repository, branch/PR/issue methods, and file modification APIs all look consistent.

No behavioral concerns here; this is a good cleanup.

Also applies to: 136-149, 451-465, 646-671, 852-907

app/modules/parsing/graph_construction/parsing_helper.py (4)

40-52: LGTM: Graceful initialization with safe fallback.

The optional RepoManager initialization correctly implements graceful degradation—if the feature is disabled or initialization fails, the service continues without it. The broad exception catch is appropriate here since this is a non-critical enhancement.


890-904: LGTM: Robust optional repo copy with proper fallback.

The integration correctly treats the repo manager copy as optional—failures are logged but don't disrupt the main parsing flow. The broad exception catch is appropriate here since this is an enhancement that shouldn't break existing functionality.


1215-1227: LGTM: Well-documented enhancement for commit pinning.

The addition of requested_commit_id properly distinguishes between pinned-commit and branch-based parsing. The parameter is optional (backward compatible), and the implementation includes clear logging for debugging.


1247-1267: LGTM: Correct pinned commit detection logic.

The implementation properly handles the distinction between pinned commits (which should match exactly) and branch-based parsing (which compares with the latest branch commit). The logging provides good visibility into the decision-making process.

app/modules/repo_manager/repo_manager.py (5)

67-72: LGTM: Path traversal protection properly implemented.

The validation correctly prevents directory traversal attacks by rejecting paths containing ".." or absolute paths. This addresses the security concern raised in the previous review.

Based on learnings


114-156: LGTM: Robust metadata I/O with atomic writes.

The implementation correctly uses atomic writes (temp file + os.replace) to prevent corruption, and handles errors gracefully with appropriate logging. Good defensive coding.


157-208: LGTM: Clean metadata cleanup and robust iteration.

The deletion method correctly cleans up empty parent directories to prevent tree bloat, and the iteration method handles corrupt files gracefully without stopping enumeration. Good resilience.


247-261: LGTM: User-scoped keys prevent multi-tenant collisions.

The inclusion of user_id in the repository key (when provided) correctly addresses the previous review concern about cross-user collisions in multi-tenant scenarios. Each (repo, branch, commit, user) tuple now remains distinct.

Based on learnings


270-395: LGTM: Comprehensive and consistent public API.

The public methods consistently apply user filtering, validate inputs, and handle edge cases (missing entries, non-existent paths) appropriately. The list operation correctly sorts by last access time and respects the limit parameter.

app/modules/code_provider/repo_manager_wrapper.py (4)

48-210: LGTM: Complete and correct delegation.

All ICodeProvider methods are properly delegated to the wrapped provider, ensuring the wrapper maintains full compatibility. The client property gracefully handles cases where the wrapped provider doesn't expose a client attribute.


214-286: LGTM: Robust local-first file access with provider fallback.

The implementation correctly prioritizes local copies and gracefully falls back to the provider on any error. The broad exception catch is appropriate here—it ensures resilience when dealing with filesystem operations. The logging helps track whether local copies or API calls are being used.


288-338: LGTM: Consistent local-first structure retrieval.

Follows the same robust pattern as get_file_content: prioritize local copies, fall back to provider on errors, with appropriate logging. The approach is consistent and reliable.


567-599: LGTM: Correct dotfile handling preserves important config files.

The code now only skips .git (not all dotfiles), which addresses the previous review concern. This ensures .github, .gitignore, and other configuration files remain visible in the structure, maintaining parity with provider APIs.

Based on learnings

app/modules/integrations/integrations_router.py (6)

117-128: LGTM: Consistent ownership validation across status endpoints.

The authentication and ownership checks are correctly implemented across all status endpoints (Sentry, Linear, Jira). Users can only view their own integration status, with proper 403 responses for unauthorized access attempts.

Also applies to: 401-412, 639-650


452-470: LGTM: Authentication properly integrated with save operations.

The save endpoints correctly extract user_id from the authenticated user token (not from request parameters), preventing user ID spoofing. The user_id is then propagated to the service layer to ensure integrations are associated with the correct owner.

Also applies to: 792-809, 1312-1330


685-717: LGTM: Comprehensive ownership checks protect resource access.

The resource endpoints (Jira resources, projects) correctly validate that the authenticated user owns the integration before exposing any data. The explicit re-raise of HTTPException ensures proper error propagation without masking the original status codes.

Also applies to: 720-755, 758-789


1429-1460: LGTM: Proper authorization for integration CRUD operations.

The CRUD endpoints correctly enforce ownership checks before allowing access, updates, or deletion. The distinction between 404 (not found) and 403 (forbidden) responses is appropriate, and audit logging includes user IDs for traceability.

Also applies to: 1463-1522, 1525-1569


1585-1606: LGTM: Schema endpoints maintain consistent authorization.

The schema-based endpoints follow the same robust ownership validation pattern. The use of structured response objects (IntegrationResponse) ensures consistent error handling across the API.

Also applies to: 1609-1656, 1659-1705


1354-1427: LGTM: Comprehensive user-scoped access across all endpoints.

The listing and Sentry API endpoints correctly filter by authenticated user, preventing cross-user data leakage. The ownership validation is consistently applied across all resource access paths, establishing a strong security posture for the integrations module.

Also applies to: 1708-1722, 1726-1890

Comment on lines +228 to +229
# Skip .git directory only
if item["name"] == ".git":
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🔴 Critical

🧩 Analysis chain

Critical security risk: sensitive dotfiles may be exposed.

The change from excluding all hidden files to only excluding .git could expose sensitive dotfiles containing secrets, credentials, and API keys. Files like .env, .env.local, .env.production, .npmrc, .pypirc, .aws/credentials, and similar configuration files are commonly present but not always gitignored.

Relying solely on gitignore (lines 239-247) is insufficient because:

  • Not all projects maintain comprehensive gitignore files
  • Developers may forget to gitignore new secret files
  • Some local configuration files are intentionally not gitignored for development purposes

Apply this diff to add explicit exclusions for common secret-containing files:

-                # Skip .git directory only
-                if item["name"] == ".git":
+                # Skip .git and common secret-containing dotfiles/directories
+                sensitive_patterns = {
+                    ".git", ".env", ".env.local", ".env.production", ".env.development",
+                    ".npmrc", ".pypirc", ".aws", ".ssh", ".gnupg", ".docker",
+                    ".kube", ".netrc", ".credentials"
+                }
+                if item["name"] in sensitive_patterns or (
+                    item["name"].startswith(".env.") or item["name"].startswith(".secret")
+                ):
                     continue

Run the following script to verify if there are other security safeguards in the codebase that prevent exposing sensitive files:


🏁 Script executed:

#!/bin/bash
# Description: Check for security measures around file access and common secret file patterns

echo "=== Searching for access control or security checks in code provider modules ==="
rg -n --type=py -C3 'class.*Provider|def.*get.*structure|def.*get.*content' app/modules/code_provider/

echo -e "\n=== Checking if sensitive file patterns are handled elsewhere ==="
rg -n --type=py '\.env|\.aws|\.ssh|secret|credential|api[_-]?key' app/modules/code_provider/

echo -e "\n=== Looking for authentication/authorization checks ==="
rg -n --type=py -C2 'auth|permission|access.*control' app/modules/code_provider/

Length of output: 115142



---

🏁 Script executed:

```shell
#!/bin/bash
# Verify the actual code at lines 228-229 and surrounding context
echo "=== Fetching local_repo_service.py lines 220-250 ==="
sed -n '220,250p' app/modules/code_provider/local_repo/local_repo_service.py

echo -e "\n=== Checking gitignore filtering logic (lines 235-250) ==="
sed -n '235,260p' app/modules/code_provider/local_repo/local_repo_service.py

echo -e "\n=== Checking if get_file_content has access controls ==="
rg -n --type=py -A 10 'def get_file_content' app/modules/code_provider/local_repo/local_repo_service.py | head -20

echo -e "\n=== Searching for path validation or security checks in local_repo ==="
rg -n --type=py -C 3 'path.*validat|security|allowed|forbidden|restrict' app/modules/code_provider/local_repo/

Length of output: 623


🏁 Script executed:

#!/bin/bash
# Verify the actual code at lines 228-229 and surrounding context in local_repo_service.py
echo "=== Fetching local_repo_service.py lines 220-250 ==="
sed -n '220,250p' app/modules/code_provider/local_repo/local_repo_service.py

echo -e "\n=== Checking gitignore filtering logic (lines 235-270) ==="
sed -n '235,270p' app/modules/code_provider/local_repo/local_repo_service.py

echo -e "\n=== Checking if get_file_content has path validation or security checks ==="
rg -n --type=py -A 15 'def get_file_content' app/modules/code_provider/local_repo/local_repo_service.py

echo -e "\n=== Searching for any path validation or allowed/forbidden checks ==="
rg -n --type=py 'path.*validat|allowed|forbidden|restrict' app/modules/code_provider/local_repo/

Length of output: 3452


Critical security risk: sensitive dotfiles may be exposed through repository structure API.

The change at lines 228-229 now only excludes .git, but previously excluded all hidden files. This exposes sensitive dotfiles like .env, .env.local, .npmrc, .pypirc, .aws/credentials, .ssh, and .gnupg to the structure listing.

Relying solely on gitignore filtering (lines 239-247) is insufficient because:

  • Not all projects maintain comprehensive gitignore files
  • Developers may forget to gitignore new secret-containing files
  • Some local configuration files are intentionally not gitignored for development purposes
  • No downstream path validation exists in get_file_content() to prevent access

Since this service remains in active use (despite being deprecated for backward compatibility), exposed sensitive file paths could be leveraged to extract secrets from local repositories.

Add explicit exclusions for common secret-containing files:

-                # Skip .git directory only
-                if item["name"] == ".git":
-                    continue
+                # Skip .git and common secret-containing dotfiles/directories
+                sensitive_patterns = {
+                    ".git", ".env", ".env.local", ".env.production", ".env.development",
+                    ".npmrc", ".pypirc", ".aws", ".ssh", ".gnupg", ".docker",
+                    ".kube", ".netrc", ".credentials"
+                }
+                if item["name"] in sensitive_patterns or (
+                    item["name"].startswith(".env.") or item["name"].startswith(".secret")
+                ):
+                    continue
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
# Skip .git directory only
if item["name"] == ".git":
# Skip .git and common secret-containing dotfiles/directories
sensitive_patterns = {
".git", ".env", ".env.local", ".env.production", ".env.development",
".npmrc", ".pypirc", ".aws", ".ssh", ".gnupg", ".docker",
".kube", ".netrc", ".credentials"
}
if item["name"] in sensitive_patterns or (
item["name"].startswith(".env.") or item["name"].startswith(".secret")
):
continue

Comment on lines 176 to 203
def get_tracer(name: str = __name__):
"""
Get an OpenTelemetry tracer for manual span creation.
Args:
name: Name for the tracer (usually __name__ of the module)
Returns:
Tracer instance or None if Phoenix not initialized
Example:
tracer = get_tracer(__name__)
def my_function(input: str) -> str:
return process(input)
"""
try:
from opentelemetry import trace

if not _PHOENIX_INITIALIZED:
logger.debug("Phoenix not initialized, returning default tracer")

return trace.get_tracer(name)

except ImportError:
logger.debug("OpenTelemetry not available")
return None

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟡 Minor

Align get_tracer docstring with its actual behavior.

The docstring says:

Returns: Tracer instance or None if Phoenix not initialized

but the implementation always returns trace.get_tracer(name) whenever OpenTelemetry is importable, regardless of _PHOENIX_INITIALIZED. The only way to get None today is an ImportError.

To avoid surprising callers, either:

  • Update the docstring to reflect reality, e.g.:
    Returns:
        An OpenTelemetry tracer if OpenTelemetry is installed.
        If Phoenix tracing was not initialized, this will be the default
        global tracer (which may be effectively no-op). Returns None only
        when OpenTelemetry is not available.

or

  • Change the behavior to match the current docstring by returning None when _PHOENIX_INITIALIZED is False:
-    try:
-        from opentelemetry import trace
-
-        if not _PHOENIX_INITIALIZED:
-            logger.debug("Phoenix not initialized, returning default tracer")
-
-        return trace.get_tracer(name)
+    try:
+        from opentelemetry import trace
+
+        if not _PHOENIX_INITIALIZED:
+            logger.debug("Phoenix not initialized; returning None from get_tracer()")
+            return None
+
+        return trace.get_tracer(name)

Given this is a new API, clarifying the contract now will prevent misuses later.

🤖 Prompt for AI Agents
app/modules/intelligence/tracing/phoenix_tracer.py lines 176-203: The docstring
claims the function "Returns: Tracer instance or None if Phoenix not
initialized" but the code always returns trace.get_tracer(name) when
OpenTelemetry is importable, so update the contract to match behavior or change
behavior to return None when _PHOENIX_INITIALIZED is False. Fix by either
(preferred) editing the Returns docstring to state that an OpenTelemetry tracer
is returned when the opentelemetry package is present and None only if
opentelemetry is unavailable (noting that an uninitialized Phoenix yields the
global/default tracer), or change the function to return None early when
_PHOENIX_INITIALIZED is False and keep the current docstring; ensure tests and
callers are adjusted accordingly.

@nndn nndn requested a review from dhirenmathur November 17, 2025 13:35
dhirenmathur
dhirenmathur previously approved these changes Nov 20, 2025
@nndn nndn dismissed dhirenmathur’s stale review November 20, 2025 06:20

The merge-base changed after approval.

dhirenmathur
dhirenmathur previously approved these changes Nov 20, 2025
@nndn nndn requested review from ASCE-D and Dsantra92 November 20, 2025 06:34
@nndn nndn dismissed dhirenmathur’s stale review November 20, 2025 06:34

The merge-base changed after approval.

Dsantra92
Dsantra92 previously approved these changes Nov 20, 2025
Copy link

@Dsantra92 Dsantra92 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

5.7K lines of change? yes, LGTM!

@nndn nndn dismissed Dsantra92’s stale review November 20, 2025 06:40

The merge-base changed after approval.

ASCE-D
ASCE-D previously approved these changes Nov 20, 2025
@sonarqubecloud
Copy link

Quality Gate Failed Quality Gate failed

Failed conditions
3 Security Hotspots
D Reliability Rating on New Code (required ≥ A)

See analysis details on SonarQube Cloud

Catch issues before they fail your Quality Gate with our IDE extension SonarQube for IDE

@nndn nndn merged commit ce3a204 into main Nov 20, 2025
2 of 3 checks passed
@nndn nndn deleted the feat/grep branch November 20, 2025 06:48
@coderabbitai coderabbitai bot mentioned this pull request Nov 24, 2025
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

6 participants