
Conversation

Contributor

@nndn nndn commented Nov 17, 2025

Summary by CodeRabbit

Release Notes

  • New Features

    • Added Language Server Protocol (LSP) support for code analysis, including definition lookup, find references, hover details, and symbol search.
    • Introduced workspace indexing for C/C++, Python, and C# projects to enable faster code queries.
    • Added repository management with worktree support for improved code organization.
  • Improvements

    • Enhanced GitHub integration with email-based user lookup.
    • Implemented API key caching for improved performance.
    • Extended code query tools for AI agent capabilities.
  • Documentation

    • Added Clangd integration guide and gVisor setup documentation.


dhirenmathur and others added 30 commits October 21, 2025 21:36
- Fixed GitHub/GitBucket URL routing for multi-provider setup
- GH_TOKEN_LIST tokens now always use github.com API
- CODE_PROVIDER_BASE_URL only applies to CODE_PROVIDER_TOKEN
- Implemented targeted 401 fallback for invalid credentials
- Fixed rate limiting: 60 req/hr → 5,000 req/hr (authenticated)
- Fixed OAuth token KeyError with safe dict access
- Removed diagnostic endpoints (were temporary debugging tools)
- Cleaned up DEBUG logging that exposed token details
- Added comprehensive GitHub authentication documentation

Core Authentication Flow (a hedged sketch follows this list):
1. Try GitHub App (if configured)
2. Try PAT pool from GH_TOKEN_LIST (if configured)
3. Try single PAT from CODE_PROVIDER_TOKEN (if configured)
4. Fall back to unauthenticated for public repos
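
A minimal sketch of that chain, assuming the env-var names mentioned in this message; the GitHub App variable names in the sketch are hypothetical placeholders, not the actual configuration keys:

import os


def resolve_github_auth():
    """Pick credentials in priority order; fall back to unauthenticated access."""
    # 1. GitHub App, if configured (GITHUB_APP_ID / GITHUB_APP_PRIVATE_KEY are hypothetical names)
    if os.getenv("GITHUB_APP_ID") and os.getenv("GITHUB_APP_PRIVATE_KEY"):
        return {"mode": "github_app"}
    # 2. PAT pool from GH_TOKEN_LIST (always targets the github.com API)
    pool = [t.strip() for t in os.getenv("GH_TOKEN_LIST", "").split(",") if t.strip()]
    if pool:
        return {"mode": "pat_pool", "tokens": pool}
    # 3. Single PAT from CODE_PROVIDER_TOKEN (honors CODE_PROVIDER_BASE_URL)
    token = os.getenv("CODE_PROVIDER_TOKEN")
    if token:
        return {"mode": "single_pat", "token": token}
    # 4. Unauthenticated fallback: public repos only, 60 requests/hour per IP
    return {"mode": "unauthenticated"}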

Rate Limiting:
- Authenticated (PAT): 5,000 requests/hour
- Unauthenticated: 60 requests/hour per IP

Fallback Behavior:
- Only 401 errors trigger unauthenticated fallback
- 404 errors propagate (no redundant retry)
- Prevents double API calls while handling invalid tokens

Files Modified:
- app/modules/code_provider/provider_factory.py (GitHub URL routing)
- app/modules/code_provider/code_provider_service.py (401 fallback)
- app/modules/code_provider/github/github_provider.py (auth logging)
- app/modules/code_provider/github/github_service.py (OAuth safety)
- app/modules/users/user_service.py (provider_info handling)
- app/main.py (removed diagnostic router)
- README.md (added auth documentation)
- .env.template (documented auth variables)

Files Deleted:
- app/diagnostic_router.py (temporary debugging endpoints)

Related Handoffs:
- thoughts/shared/handoffs/general/2025-11-05_12-06-51_github-auth-rate-limit-fix.md
- thoughts/shared/handoffs/general/2025-11-04_14-58-10_github-rate-limit-fix.md
Contributor

coderabbitai bot commented Nov 17, 2025

Walkthrough

This PR introduces a comprehensive Language Server Protocol (LSP) integration framework for multi-language code analysis. It adds LSP session management (supporting Python/Jedi, C/C++/Clangd, C#/OmniSharp), compile-commands generation utilities, LSP-aware agent tools, repository manager integration for local repositories, email-based user resolution in GitHub operations, and infrastructure optimizations including API key caching.

Changes

  • LSP Core Infrastructure (app/modules/intelligence/tools/code_query_tools/lsp_types.py, lsp_session.py, lsp_query_tool.py): Introduces LSP type definitions (LspMethod enum, Position, Location, HoverResult, SymbolInformation), an immutable SessionKey dataclass, an abstract BaseLspServerSession, and LspQueryTool/LspQueryToolInput for structured LSP querying with environment-driven language server configuration. A hedged sketch of these core types follows this list.
  • LSP Server Management (app/modules/intelligence/tools/code_query_tools/lsp_server_manager.py, pygls_client_session.py): Implements a global LSP server manager with singleton-like access, language registration, session lifecycle (startup/indexing/shutdown), result normalization, and PyglsClientSession for spawning language servers via stdio with initialization/diagnostics.
  • Language-Specific LSP Sessions (app/modules/intelligence/tools/code_query_tools/jedi_session.py, clangd_session.py): Adds JediCache with disk-backed caching for Python analysis (definitions, references, hover, symbols) and ClangdSession delegating to PyglsClientSession for C/C++ via clangd.
  • Build System Utilities (app/modules/intelligence/tools/code_query_tools/clangd_helpers.py): New module for compile_commands.json generation/location across CMake, Make (Bear, compiledb), and a compile_flags.txt fallback, with build-system detection and symlink handling.
  • Agent LSP Tool Integration (app/modules/intelligence/agents/chat_agents/pydantic_multi_agent.py, tool_helpers.py): Extends PydanticMultiAgent to dynamically generate and integrate five LSP-derived tools (definition, references, hover, document symbols, workspace symbols) into delegate/supervisor agents; enhances tool_helpers with LSP-specific message formatting and structured call/result display.
  • Code Provider & User Resolution (app/modules/code_provider/code_provider_controller.py, app/modules/code_provider/github/github_service.py): Updates the GitHub service to support an optional user_email parameter for email-based user resolution when user_id lookup fails, with explicit HTTPException re-raising.
  • Service Optimizations (app/modules/intelligence/provider/provider_service.py, app/modules/intelligence/tools/tool_service.py, app/modules/intelligence/tools/jira_tools/get_jira_project_users_tool.py): Adds API key caching in ProviderService, registers the lsp_query tool, and changes the GetJiraProjectUsersTool user_id from int to str.
  • Project & Parsing Integration (app/modules/projects/projects_service.py, app/modules/parsing/graph_construction/parsing_controller.py, parsing_helper.py, parsing_service.py, parsing_repomap.py): Widens project_id signatures to accept str | int, integrates the repo manager for local repos with worktree creation, adds LSP workspace indexing post-inference, implements tarball retry logic with backoff, and expands language detection for headers (.h, .hpp, .hxx).
  • Utilities & Infrastructure (app/modules/utils/gvisor_runner.py, .gitignore, requirements.txt): Early exit for missing runsc with fallback to a regular subprocess; ignores .repos/ and .lsp_binaries/; adds pygls, jedi, and compiledb dependencies.
  • Setup & Documentation (scripts/install_lsp_servers.sh, docs/clangd_integration_plan.md, docs/gvisor_setup.md, docs/gvisor_mac_setup.md): New script for LSP server installation (Pyright, python-lsp-server, OmniSharp) with OS-specific download/extraction; clangd integration blueprint; gVisor documentation enhancements.
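
A minimal sketch of the core LSP types named in the first cohort, assuming plain dataclasses; the LSP method strings follow the protocol spec, while field names beyond those listed in the summary are illustrative:

from dataclasses import dataclass
from enum import Enum
from typing import Optional


class LspMethod(str, Enum):
    DEFINITION = "textDocument/definition"
    REFERENCES = "textDocument/references"
    HOVER = "textDocument/hover"
    DOCUMENT_SYMBOL = "textDocument/documentSymbol"
    WORKSPACE_SYMBOL = "workspace/symbol"


@dataclass(frozen=True)
class Position:
    line: int        # 0-based, per the LSP spec
    character: int   # 0-based


@dataclass(frozen=True)
class Location:
    file_path: str
    start: Position
    end: Optional[Position] = None


@dataclass(frozen=True)
class SessionKey:
    project_id: str
    language: str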

Sequence Diagram(s)

sequenceDiagram
    participant Agent as Delegate/Supervisor<br/>Agent
    participant Mgr as LSP Server<br/>Manager
    participant Sess as Language-Specific<br/>Session
    participant Server as LSP Server<br/>(clangd/Jedi/etc)

    Agent->>Mgr: execute_query(LspQueryRequest)<br/>(language, method, position)
    Mgr->>Mgr: _get_or_create_session(project_id, language)
    Mgr->>Sess: ensure_started()
    Sess->>Server: initialize + wait_for_indexing
    Server-->>Sess: ready
    Sess-->>Mgr: session ready
    Mgr->>Mgr: _build_params(request)
    Mgr->>Sess: send_request(method, payload)
    Sess->>Server: LSP request (definition/references/etc)
    Server-->>Sess: raw LSP response
    Sess-->>Mgr: raw response
    Mgr->>Mgr: _normalize_* (locations/symbols/hover)
    Mgr-->>Agent: LspQueryResponse<br/>(success, locations, symbols, hover)

    Note over Sess,Server: Python: JediSession<br/>C/C++: ClangdSession<br/>C#: PyglsClientSession<br/>Others: PyglsClientSession
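
A self-contained toy version of the manager flow in the diagram above; the real manager also normalizes results into locations/symbols/hover, and the session internals here are stand-ins:

import asyncio


class FakeSession:
    """Stand-in for a language-specific session (Jedi, clangd, OmniSharp via pygls)."""

    async def ensure_started(self):
        await asyncio.sleep(0)  # real sessions initialize and wait for indexing here

    async def send_request(self, method, params):
        return {"method": method, "params": params, "result": []}


class LspServerManagerSketch:
    def __init__(self):
        self._sessions = {}

    async def _get_or_create_session(self, project_id, language):
        key = (project_id, language)
        if key not in self._sessions:
            self._sessions[key] = FakeSession()
        return self._sessions[key]

    async def execute_query(self, project_id, language, method, params):
        session = await self._get_or_create_session(project_id, language)
        await session.ensure_started()
        raw = await session.send_request(method, params)
        return raw  # the real manager runs _normalize_* on this payload


# asyncio.run(LspServerManagerSketch().execute_query("proj-1", "python", "textDocument/definition", {}))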
sequenceDiagram
    participant PC as ParsingController
    participant PH as ParsingHelper
    participant PM as RepoManager
    participant PS as ParsingService
    participant LSP as LSP Server<br/>Manager

    PC->>PC: parse_directory(repo_url, branch)
    PC->>PH: setup_project_directory(repo_url, ...)
    alt Local repo path
        PH->>PH: extract_metadata(local_repo)
        PH->>PM: copy_repo + create_worktree
        PM-->>PH: worktree_path
    else Remote repo
        PH->>PH: download_tarball (with retry+backoff)
    end
    PH-->>PC: project_id, extract_dir
    PC->>PS: parse_project(project_id)
    PS->>PS: detect_languages (file-based + GitHub API)
    PS->>LSP: index_workspace(project_id, languages)
    LSP->>LSP: register_languages + _get_or_create_session
    LSP-->>PS: indexing complete
    PS-->>PC: status (INDEXED)
    PC-->>PC: return submission status

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~75 minutes

  • LSP Infrastructure (~30 min): Eight new modules implementing LSP types, sessions, server management, and build utilities require thorough review of type safety, session lifecycle, initialization handshakes, error handling, and concurrency patterns.
  • Agent & Tool Integration (~15 min): Review dynamic LSP tool generation, structured tool mapping, and agent toolset updates; verify proper delegation and parameter passing.
  • Parsing/Project Integration (~20 min): Examine repo-manager integration, tarball retry logic, worktree creation, language detection expansion, and LSP indexing orchestration.
  • Cross-cutting Changes (~10 min): Review service modifications (GitHub email resolution, API key caching, project_id type widening) and error handling consistency.

Areas requiring extra attention:

  • Session lifecycle and async/await patterns in LSP server manager and pygls client (initialization, warmup, indexing waits, shutdown)
  • Compile commands generation across multiple build systems (CMake, Make, Bear, compiledb) and fallback logic
  • Tarball download retry+backoff implementation and error propagation
  • Repo manager integration points: worktree creation, path resolution, and consistency with LSP queries
  • Type conversions (user_id int→str, project_id int→str) and their propagation through call chains
  • LSP result normalization and hover content shape handling (string vs dict vs structured)
  • Email-based user resolution fallback logic and HTTPException re-raising pattern
  • Concurrency guards (async locks) in LSP server manager for multi-session scenarios

Possibly related PRs

  • Feat/grep #483: Overlapping changes to .gitignore (.repos/ entry) and repo-manager functionality for local repository handling and worktree management.
  • Inference caching during parsing #487: Related infrastructure layer—both PRs involve caching strategies, with this PR adding API key caching and potential inference-cache surface integration.

Suggested reviewers

  • dhirenmathur
  • Dsantra92
  • ASCE-D

Poem

🐰 Code queries now take flight,
Through LSP's guiding light,
Jedi speaks to Clangd's song,
Sessions born and steady strong,
Language servers dance in sync,
No more parsing's missing link!

Pre-merge checks and finishing touches

❌ Failed checks (1 warning, 1 inconclusive)

  • Docstring Coverage (⚠️ Warning): Docstring coverage is 47.92%, which is below the required threshold of 80.00%. Resolution: run @coderabbitai generate docstrings to improve docstring coverage.
  • Title check (❓ Inconclusive): The title 'Feat/lsp servers' is vague and uses non-standard formatting. While it references LSP servers (which is central to the changeset), it is abbreviated, lacks clarity, and doesn't convey the full scope of the substantial changes introduced. Consider a more descriptive title such as 'Add LSP server infrastructure for code analysis' or 'Integrate language server protocol support for multi-language code querying'.

✅ Passed checks (1 passed)

  • Description Check (✅ Passed): Check skipped because CodeRabbit's high-level summary is enabled.


Contributor

@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 14

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (2)
deployment/stage/mom-api/Jenkinsfile_API (1)

48-60: Critical: Image name mismatch between build and push stages will cause pipeline failure.

The Build stage creates an image tagged as mom-api:${imageTag} (line 48), but the Push stage attempts to push mom-api-ss:${imageTag} (line 60). This mismatch will cause the push to fail because the mom-api-ss image was never built. Additionally, the Deploy stage (line 106) references mom-api, not mom-api-ss, creating further inconsistency.

Apply this diff to align the push target with the build target:

-                    sh "sudo docker push ${DOCKER_REGISTRY}/mom-api-ss:${imageTag}"
+                    sh "sudo docker push ${DOCKER_REGISTRY}/mom-api:${imageTag}"
deployment/stage/celery/Jenkinsfile_CELERY (1)

60-60: Critical: Image tag mismatch between build and push.

Line 48 builds the image as celery-api:${imageTag}, but this line attempts to push celery-api-ss:${imageTag}. The push will fail because the image doesn't exist with that tag.

Apply this diff to fix the tag mismatch:

-                    sh "sudo docker push ${DOCKER_REGISTRY}/celery-api-ss:${imageTag}"
+                    sh "sudo docker push ${DOCKER_REGISTRY}/celery-api:${imageTag}"
♻️ Duplicate comments (1)
app/modules/intelligence/agents/chat_agents/system_agents/code_gen_agent.py (1)

86-86: Same verification needed as in qna_agent.py.

The bash_command tool is also added to the CodeGenAgent. Please ensure the same security measures apply here, especially since code generation workflows may involve executing build commands or dependency installations that could have broader system impact.

🧹 Nitpick comments (35)
app/modules/intelligence/tools/change_detection/change_detection_tool.py (1)

678-680: Good addition for observability.

The logging statement improves debugging by showing how many functions changed and their identifiers. Consider truncating the identifiers list if it's very long to keep logs readable.

For example:

 logging.info(
-    f"[CHANGE_DETECTION] Found {len(identifiers)} changed functions: {identifiers}"
+    f"[CHANGE_DETECTION] Found {len(identifiers)} changed functions: {identifiers[:10]}{'...' if len(identifiers) > 10 else ''}"
 )
scripts/verify_gvisor_docker.sh (1)

7-30: Make verification failures visible via a non-zero exit code

Right now the script only prints messages and never exits non-zero; the last echo means callers (or CI) will see a 0 status even if runsc/runtime are missing or the test container fails. Consider explicitly exiting with 1 when checks fail and optionally adding set -euo pipefail so this can be safely used in automation.

scripts/install_lsp_servers.sh (1)

13-15: Clean up unused variables and tighten OmniSharp download signaling

PYTHON_BIN (Line 13) and DOWNLOAD_SUCCESS (Line 151) are never used, which triggers Shellcheck warnings and makes the retry logic harder to read. Either remove them or actually use DOWNLOAD_SUCCESS in the post-download check instead of relying solely on the presence of omnisharp.${EXT}.

For example, you could gate extraction on a positive DOWNLOAD_SUCCESS flag and fall back to the existing archive-exists shortcut for manual/previous downloads.

Also applies to: 101-163

scripts/install_gvisor_in_docker_vm.sh (1)

18-21: Guard cd calls into/out of the temporary directory

Shellcheck correctly points out the unguarded cd operations (Lines 20 and 84). If mktemp -d or cd - ever fail, subsequent commands run in an unexpected directory.

You can harden this with minimal change:

-TMPDIR=$(mktemp -d)
-cd "$TMPDIR"
+TMPDIR=$(mktemp -d)
+cd "$TMPDIR" || exit 1
@@
-cd - > /dev/null
+cd - > /dev/null || exit 1

Also applies to: 84-85

start.sh (1)

42-74: Consider making gVisor installation explicitly opt-in and use script exit codes

The new gVisor section always runs on startup:

  • python scripts/install_gvisor.py is invoked every time after dependency install, and
  • On Mac/Windows, scripts/install_gvisor_in_docker_vm.sh is executed and success is inferred via grep "runsc installed".

For local dev this is fine, but in slower or offline environments (or CI) it can add unexpected latency and network dependency. Two adjustments worth considering:

  1. Gate these steps behind an env flag (e.g. ENABLE_GVISOR=1) or a simple check in .env so users/CI can skip them cleanly.
  2. In addition to grepping stdout, also check the helper script’s exit code so “success” is not tied solely to a printed string.
docs/gvisor_mac_setup.md (1)

20-31: Reference the concrete helper scripts to avoid doc/tooling drift

The doc mentions using a “helper script (if available)” for installing runsc in Docker Desktop’s VM, but doesn’t name the actual helpers introduced in this repo (e.g. scripts/setup_gvisor_docker.sh and scripts/install_gvisor_in_docker_vm.sh). Explicitly referencing those scripts and their expected usage would make the Mac flow easier to follow and reduce the risk that the docs drift away from the implemented setup.

Also applies to: 92-123

app/modules/utils/install_gvisor.py (1)

29-62: Short-circuit non-Linux hosts and narrow exception handling for clearer behavior

A couple of smaller points in the install flow:

  1. Non-Linux hosts
    get_architecture warns when platform.system().lower() != "linux" but still returns an architecture (Lines 54‑60). On macOS/Windows, the downloaded Linux runsc will never successfully run, so install_gvisor will always fail verification and leave a non-working binary in the install path. Consider returning None (and having install_gvisor fail fast) for non-Linux systems so the behavior is explicit and no dead binaries are left behind.

  2. Exception handling and logging
    Several places (check_runsc_installed, download_file, verify_checksum, install_gvisor) catch bare Exception and log generic errors. Narrowing these to expected exceptions where possible and using logging.exception for unexpected ones would improve debuggability without changing semantics.

These changes would make the install behavior on unsupported platforms more predictable and keep error reporting more actionable.

Also applies to: 199-216, 274-292

app/modules/repo_manager/repo_manager_interface.py (1)

12-220: IRepoManager interface aligns cleanly with RepoManager implementation

The abstract method set and parameter ordering match the concrete RepoManager methods shown in app/modules/repo_manager/repo_manager.py, so this gives a clear, consistent contract for callers without changing behavior.

If you want to tighten types later, consider introducing a TypedDict or pydantic model for the repo info dicts returned by get_repo_info / list_available_repos instead of using Dict[str, Any] everywhere.

app/modules/projects/projects_service.py (3)

267-280: Project ID handling in by-id accessors looks consistent

Allowing project_id: str | int and deferring to get_project_by_id (which casts to str(project_id)) keeps behavior consistent with Project.id being a text column while giving callers more flexibility. The async and sync variants also now return the same shape (including repo_path / user_id), which is good for downstream consumers.

You might add an explicit return type like -> dict[str, Any] | None to make the dict shape and None possibility clearer to callers.

Also applies to: 282-297


299-317: ID casting in repo-details and by-id-and-user helpers is now robust

Using Project.id == str(project_id) in both get_project_repo_details_from_db and get_project_from_db_by_id_and_user_id aligns with the new str | int annotations and the underlying text PK, so lookups will work regardless of whether callers pass an int or string.

For consistency, consider updating the type hints of other methods that still advertise project_id: int (e.g., get_repo_and_branch_name, update_project_status) to also accept str | int, since they ultimately pass through to these helpers.

Also applies to: 331-346


348-350: Helper/query logic is correct; minor optimization possible in update_project

get_project_by_id’s str(project_id) cast is consistent with the rest of the service and centralizes the PK handling. In update_project, you first query for the project, then execute a separate update with the same filter. That’s functionally fine but does one extra round-trip.

You could simplify update_project to load the project once, mutate its attributes from kwargs, and commit, or reuse the initial query for the update to avoid the duplicate filter.

Also applies to: 371-379
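
A hedged sketch of that single-round-trip variant; the model class is passed in to keep the example generic, since the real method lives on the service and uses the Project model directly:

def update_record(db, model, record_id, **kwargs):
    """Load once, mutate attributes from kwargs, and commit in a single round-trip."""
    obj = db.query(model).filter(model.id == str(record_id)).first()
    if obj is None:
        return None
    for field, value in kwargs.items():
        if hasattr(obj, field):
            setattr(obj, field, value)
    db.commit()
    db.refresh(obj)
    return obj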

app/modules/intelligence/tools/tool_service.py (2)

126-132: Be explicit about auto-injecting lsp_query from get_tools

get_tools now always appends the lsp_query tool to the returned list when configured, even if callers didn’t request it in tool_names. That’s probably intentional (ensuring a base LSP tool is always present), but it does slightly change the contract of get_tools.

If you want to keep semantics strictly “only tools requested by name,” consider gating this behind an explicit flag or a separate helper (e.g., get_tools_with_lsp_base). Otherwise, a small comment in the docstring explaining this behavior would help future readers.


192-239: Clean up repeated # type: ignore[assignment] on tool registrations

Several tool registrations (todo_tools, code_changes_tools, webpage_extractor, code_provider_*, web_search_tool) rely on # type: ignore[assignment] when assigning into the tools: Dict[str, StructuredTool] map. This works, but it hides type drift between the declared mapping type and the actual values.

If feasible, consider tightening the types at the source (e.g., ensuring these helpers all return StructuredTool) or relaxing the dict annotation (e.g., Dict[str, StructuredTool] → Dict[str, StructuredTool] | Dict[str, Any], or via a Protocol/wrapper) so these ignores are unnecessary.

app/modules/parsing/graph_construction/parsing_controller.py (1)

279-351: Consider tightening _check_repo_indexed responsibilities and error handling

The _check_repo_indexed helper is doing a lot in one place: reading env, instantiating RepoManager, hitting the DB again, detecting languages, and consulting the LSP server manager, with a broad except Exception that turns any error into “not indexed.”

This is reasonable as a safety net, but a couple of minor improvements would make behavior clearer:

  • Treat REPO_MANAGER_ENABLED as a precondition (handled in parse_directory as above) and drop the early env check here, so the function’s only concern is “given a repo/worktree, is it indexed?”
  • Narrow the exception handling to expected failure modes (e.g., filesystem/JSON/DB/LSP-related errors) or at least log a more explicit marker when an unexpected exception path is hit, so configuration bugs don’t silently degrade into infinite re-parsing attempts.

These are optional, but would make future debugging of LSP indexing issues easier.

test_gvisor.py (1)

1-5: Shebang vs. execution mode

There’s a shebang (#!/usr/bin/env python3) but the file isn’t necessarily executable in the repo, which static analysis is flagging. If this script is meant to be run via python test_gvisor.py, you can safely drop the shebang to keep Ruff happy:

-#!/usr/bin/env python3
-"""
-Test script for gVisor functionality.
-Tests that gVisor detection and fallback work correctly on Mac/Windows.
-"""
+"""
+Test script for gVisor functionality.
+Tests that gVisor detection and fallback work correctly on Mac/Windows.
+"""

Alternatively, ensure the file is checked in as executable.

test_gvisor_docker.py (1)

1-12: Tighten project-root resolution and consider removing the shebang.

project_root = Path(__file__).parent only adds the directory containing this file; if tests live under a subdirectory, imports like app.modules... may still fail. Consider resolving the actual repo root (e.g., Path(__file__).resolve().parents[1] or relying on installed package paths instead of mutating sys.path. Also, since this file is intended to be run via the Python interpreter (and not marked executable in the repo), the shebang is not strictly needed and can be dropped to satisfy EXE001.

app/modules/intelligence/agents/chat_agents/tool_helpers.py (1)

273-279: Good structured output for LSP and bash tools; be mindful of content contracts.

The new bash_command and LSP branches produce clear, truncated summaries (command + working dir, LSP project/language/target, status messages, locations/symbol counts, hover preview, stdout/stderr on failures). This is a solid UX. The only caveat is that get_tool_result_info_content assumes the LSP payloads include success, method, and the documented fields—if upstream tooling ever omits success, the code will treat it as an error and show “Unknown error”. If that’s acceptable, no changes needed; otherwise consider defaulting success to True when method-specific data is present.

Also applies to: 431-444, 558-594, 595-634
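
If the looser contract is preferred, a small guard along these lines would avoid mislabeling partial payloads as errors; the field names are taken from the summary above and should be treated as assumptions:

def lsp_payload_succeeded(payload: dict) -> bool:
    """Explicit success flag wins; otherwise infer success from method-specific data."""
    if "success" in payload:
        return bool(payload["success"])
    return any(payload.get(key) for key in ("locations", "symbols", "hover"))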

app/modules/intelligence/tools/code_query_tools/lsp_session.py (1)

15-45: Core LSP session abstractions are clean; consider minor cleanup.

SessionKey and BaseLspServerSession define a straightforward contract that matches the concrete implementations. You can optionally remove the unused BaseModel import or convert BaseLspServerSession into an ABC with @abstractmethod decorators for stricter typing, but the current design is functionally fine.
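
A minimal sketch of the stricter ABC variant; the method names mirror how sessions are used elsewhere in this review (ensure_started, send_request, shutdown) and are assumptions about the real signatures:

from abc import ABC, abstractmethod
from typing import Any, Dict


class BaseLspServerSessionSketch(ABC):
    """Abstract contract for language-specific LSP sessions."""

    @abstractmethod
    async def ensure_started(self) -> None:
        """Start the server, run the initialize handshake, and wait for indexing."""

    @abstractmethod
    async def send_request(self, method: str, params: Dict[str, Any]) -> Any:
        """Send a single LSP request and return the raw response."""

    @abstractmethod
    async def shutdown(self) -> None:
        """Terminate the underlying language server."""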

app/modules/parsing/graph_construction/parsing_service.py (2)

115-192: Language detection flow is solid but GitHub languages semantics and duplication are worth revisiting.

The combination of file-based detection plus filtered GitHub get_languages() results is a good approach, and preferring file_based_language for language keeps behavior predictable. Two nuances to double‑check:

  • repo.get_languages() (for GitHub repos) typically returns byte counts per language, not percentages; using percentage > 5 is effectively “more than 5 bytes of code”. If you truly want a percentage threshold (e.g., >5% of the codebase), you’ll need to normalize by the total and adjust the variable naming (see the sketch after this list).
  • The supported_languages set here duplicates the language set maintained by detect_repo_language. If that list changes in parsing_helper, this code could silently diverge. Consider centralizing the allowed-language list (or exposing it from ParseHelper) instead of re‑declaring it.
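
A sketch of the percentage normalization for the first point, assuming get_languages() returns the usual {language: byte_count} mapping:

def languages_above_threshold(byte_counts: dict, threshold_pct: float = 5.0) -> list:
    """Convert byte counts into percentages and keep languages above the cutoff."""
    total = sum(byte_counts.values())
    if total == 0:
        return []
    return [
        lang
        for lang, count in byte_counts.items()
        if (count / total) * 100 > threshold_pct
    ]


# languages_above_threshold({"Python": 90_000, "C++": 8_000, "Shell": 2_000}) -> ["Python", "C++"]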

193-274: LSP indexing is now a hard dependency for parsing when enabled; confirm that this failure mode is desired.

The new _index_workspace_with_lsp integration makes parsing fail (HTTP 500 + ERROR status + Slack notification) whenever:

  • REPO_MANAGER_ENABLED is true,
  • a RepoManager exists and at least one language is registered with the LSP server manager,
  • and any of those languages fail index_workspace.

That’s a strong coupling: transient LSP issues (e.g., misconfigured server, missing executable, indexing flakiness) will mark otherwise successfully parsed projects as failed. If the intent is “LSP is mandatory for supported languages”, this is fine; otherwise, you might want to:

  • Treat indexing failures as a degraded-but-usable state (log and notify Slack, but still return success), or
  • Gate _index_workspace_with_lsp behind a separate feature flag or project-level setting.

Also, _index_workspace_with_lsp instantiates LspQueryTool(self.db, "") solely to register servers; that’s reasonable, but it does rely on RepoManager being available there as well as via self.parse_helper.repo_manager. If those ever diverge, parses will fail even though graph construction succeeded.

Also applies to: 313-443

app/modules/intelligence/tools/code_query_tools/clangd_session.py (1)

29-77: Forwarding to PyglsClientSession looks correct; minor cleanup around unused state.

The ClangdSession wrapper correctly delegates lifecycle and requests to PyglsClientSession, relying on clangd’s own cache. The cache_dir argument and the _initialized flag on this class are currently unused beyond assignments; you can either document them explicitly as legacy/compatibility artifacts or drop them to avoid confusion, since the underlying client already tracks initialization.

app/modules/intelligence/agents/chat_agents/pydantic_multi_agent.py (1)

56-92: LSP tool wiring and validation look solid; consider small robustness tweaks.

The LSP tool specs and _create_lsp_method_tools implementation are correct and nicely validate position/query requirements, and the closure default arguments avoid the usual loop-binding bug. The integration into delegate/supervisor tools via _lsp_method_tools is also clean.

Two small optional improvements:

  • In __init__, consider stripping all whitespace (\s+) rather than just spaces when normalizing tool names for _structured_tool_map.
  • Optionally assert/log once at startup that an lsp_query StructuredTool is present in self.tools so misconfiguration fails fast rather than only logging on first use.

Also applies to: 338-345, 672-755, 767-770, 800-806

app/modules/parsing/graph_construction/parsing_helper.py (2)

40-53: RepoManager initialization is safe but catching Exception is broader than necessary.

Lazy, env‑gated RepoManager setup with a warning on failure is a good pattern. You might narrow the except Exception to ImportError and configuration‑related errors if you want to avoid masking unexpected bugs in initialization, but the current behavior is acceptable for best‑effort integration.


331-373: Tarball retry/backoff logic is correct but duplicated in two branches.

The new connection‑retry loops for GitBucket Basic Auth and the generic header‑based path correctly:

  • Retry on ConnectTimeout/ConnectionError with simple backoff.
  • Use extended read timeouts suitable for large archives.
  • Preserve HTTP‑error semantics via raise_for_status and outer HTTPError handling.

If you want to reduce maintenance overhead, consider extracting the retrying requests.get(...) + backoff into a small helper so GitBucket and non‑GitBucket paths share the same implementation.

Also applies to: 383-423, 447-457
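
A sketch of the shared retry helper, using only standard requests calls; the timeout and backoff constants are illustrative:

import time

import requests


def get_with_retry(url, headers=None, auth=None, max_attempts=3, backoff_seconds=2.0):
    """GET with retries on connection-level failures; HTTP errors still raise immediately."""
    for attempt in range(1, max_attempts + 1):
        try:
            response = requests.get(
                url,
                headers=headers,
                auth=auth,
                timeout=(10, 300),  # short connect timeout, long read timeout for large archives
                stream=True,
            )
            response.raise_for_status()
            return response
        except (requests.exceptions.ConnectTimeout, requests.exceptions.ConnectionError):
            if attempt == max_attempts:
                raise
            time.sleep(backoff_seconds * attempt)  # simple linear backoff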

app/modules/utils/gvisor_runner.py (3)

50-73: gVisor availability check works but is heavier than necessary.

The platform‑aware is_gvisor_available logic is sound, but two things to consider:

  • is_gvisor_available calls _check_docker_available() twice on Mac/Windows; you can store the result once to avoid redundant Docker invocations.
  • _check_docker_available always runs a docker run --runtime=runsc busybox ... test container, which is relatively expensive (and may trigger an image pull) if called frequently.

You might want to:

  • Cache the result of _check_docker_available for some period (a small TTL-cache sketch follows this comment), or
  • Make the “test container” run optional (e.g., gated by an env flag) and rely primarily on docker info runtime inspection for common paths.

Also applies to: 332-395
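
A sketch of the time-bounded cache mentioned in the first bullet; the TTL value is illustrative:

import time

_DOCKER_CHECK_TTL_SECONDS = 300  # illustrative TTL
_docker_check_cache = {"checked_at": 0.0, "available": False}


def check_docker_available_cached(check_fn):
    """Wrap an expensive availability probe with a simple TTL cache."""
    now = time.monotonic()
    if now - _docker_check_cache["checked_at"] > _DOCKER_CHECK_TTL_SECONDS:
        _docker_check_cache["available"] = check_fn()
        _docker_check_cache["checked_at"] = now
    return _docker_check_cache["available"]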


182-239: Isolation orchestration is correct but _run_with_runsc_direct never actually uses runsc.

run_command_isolated correctly:

  • Applies the use_gvisor flag.
  • Falls back when gVisor/runsc is not available.
  • Chooses between container‑aware and host execution paths.

However, _run_with_runsc_direct currently just logs and calls _run_command_regular, completely ignoring the runsc_path argument. That means:

  • In container/K8s and Linux‑host “direct runsc” paths, you never actually get an extra gVisor layer beyond the container itself.

This is logically consistent with the comments (OCI bundle complexity), but the function name and signature are misleading. Consider either:

  • Implementing a minimal real runsc execution path using OCI bundles, or
  • Renaming and removing the unused arguments (repo_path, runsc_path) so it’s explicit that these branches intentionally fall back to non‑gVisor execution.

Also, note that on any exception inside the gVisor branches, you fall back to _run_command_regular, which may or may not be desirable from a security posture standpoint. You might want a configuration flag to “fail closed” instead of running unsandboxed if gVisor setup fails.

Also applies to: 248-311, 533-573


75-157: Environment filtering and fallback subprocess execution are generally safe and well‑contained.

The _filter_safe_environment_variables and _run_command_regular implementations are thoughtfully defensive:

  • Only a small safe set of env vars is always allowed; others are filtered out based on name patterns and value heuristics to reduce secret leakage.
  • The fallback subprocess environment is a minimal, sanitized baseline (PATH, fake HOME, USER, SHELL, TERM), which is appropriate for sandboxed tooling.
  • Docker+gVisor runs mount working_dir/repo_path read‑only, disable network, and use a tiny image (busybox), all of which are good isolation defaults.

If you want to further tighten things:

  • Consider making the safe env var allow‑list configurable (e.g., via settings) so operators can explicitly add non‑secret variables as needed.
  • Optionally log when you drop env vars due to heuristic value checks at a lower level (e.g., debug) for easier troubleshooting.

Also applies to: 421-491, 576-657

app/modules/intelligence/tools/code_query_tools/jedi_session.py (2)

34-305: JediCache design is solid; consider a lightweight invalidation story for workspace symbols.

The caching layer around Jedi is well done:

  • Per‑file caches are keyed by a hash of the file contents plus line/column/operation, so they naturally invalidate on content changes.
  • All Jedi calls are wrapped in try/except with logging, returning empty results on failure instead of blowing up the caller.
  • Hover formatting into a contents array matches the HoverResult.from_lsp expectations.

For workspace symbols, index_workspace short‑circuits if workspace_symbols.json already exists, which means the index never refreshes even if the project changes. If this becomes an issue, you could add:

  • A timestamp in the cache file and a simple age threshold, or
  • A checksum of the listed Python files to trigger reindexing when the set changes (a minimal fingerprint sketch follows this list).
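
A sketch of the file-set fingerprint option; it hashes paths and mtimes so the stored index can be invalidated when the Python files change:

import hashlib
from pathlib import Path


def workspace_python_fingerprint(workspace_root: str) -> str:
    """Hash the sorted Python file paths and mtimes to detect workspace changes."""
    digest = hashlib.sha256()
    for path in sorted(Path(workspace_root).rglob("*.py")):
        stat = path.stat()
        digest.update(str(path).encode())
        digest.update(str(stat.st_mtime_ns).encode())
    return digest.hexdigest()


# Reindex when the stored fingerprint differs from workspace_python_fingerprint(root).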

307-374: JediSession’s LSP method routing and threading model look correct.

JediSession.send_request correctly maps:

  • DEFINITION → get_definitions
  • REFERENCES → get_references
  • HOVER → get_hover
  • DOCUMENT_SYMBOL → get_document_symbols
  • WORKSPACE_SYMBOL → get_workspace_symbols

using LSP’s 0‑based line/character positions and letting JediCache handle the conversion to Jedi’s 1‑based API. The executor‑based index_workspace avoids blocking the event loop, which is good.

Only minor nit: for unsupported methods you raise ValueError(f"Unsupported method: {method}"); if this ever propagates out of the LSP orchestration layer, you might want a more domain‑specific exception type, but it’s not required.

app/modules/intelligence/tools/code_query_tools/lsp_query_tool.py (1)

372-389: Propagate user_id into RepoManager calls for stricter isolation

RepoManager.get_repo_path and update_last_accessed support a user_id filter, but _execute_async does not pass it. You already validate the project’s user_id via ProjectService, so wiring that through keeps repo metadata usage consistent and avoids accidental cross‑user reuse if you ever support per‑user worktrees.

For example:

-        worktree_path = self.repo_manager.get_repo_path(
-            repo_name, branch=branch, commit_id=commit_id
-        )
+        worktree_path = self.repo_manager.get_repo_path(
+            repo_name,
+            branch=branch,
+            commit_id=commit_id,
+            user_id=details.get("user_id"),
+        )
@@
-            self.repo_manager.update_last_accessed(
-                repo_name, branch=branch, commit_id=commit_id
-            )
+            self.repo_manager.update_last_accessed(
+                repo_name,
+                branch=branch,
+                commit_id=commit_id,
+                user_id=details.get("user_id"),
+            )

This keeps the RepoManager’s user scoping aligned with your project scoping.

app/modules/intelligence/tools/code_query_tools/lsp_server_manager.py (1)

59-71: reset_lsp_server_manager leaves sessions running; consider cleanup hook

reset_lsp_server_manager currently just drops the global reference and does not attempt to shut down active sessions, so any underlying language server processes (pygls, clangd, OmniSharp) will continue running until process exit.

If this is used in tests or long‑running workers, you may want to:

  • Add an async shutdown_all() method on LspServerManager that iterates sessions and calls await session.shutdown() (sketched after this comment).
  • Provide a sync wrapper in reset_lsp_server_manager that uses asyncio.run() in non‑event‑loop contexts for test cleanup.

At minimum, logging instead of silent pass in the inner try/except would help visibility when cleanup fails.
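
A sketch of the shutdown_all() idea from the first bullet; the session storage attribute and logging details are assumptions:

import asyncio
import logging

logger = logging.getLogger(__name__)


class ShutdownAllSketch:
    """Illustrative only; assumes a self._sessions dict of active sessions."""

    def __init__(self):
        self._sessions = {}

    async def shutdown_all(self) -> None:
        for key, session in list(self._sessions.items()):
            try:
                await session.shutdown()
            except Exception:
                logger.exception("Failed to shut down LSP session %s", key)
        self._sessions.clear()


def reset_lsp_server_manager_sync(manager) -> None:
    """Synchronous wrapper usable for test cleanup outside an event loop."""
    if manager is not None:
        asyncio.run(manager.shutdown_all())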

app/modules/intelligence/tools/code_query_tools/clangd_helpers.py (1)

71-205: Subprocess flow for compile_commands generation looks sound

The helpers build cmake / bear / compiledb commands from fixed tokens, only varying cwd (workspace_root), and handle failures by logging and falling back to other strategies or compile_flags.txt. There’s no untrusted shell concatenation here, so the security lint warnings about subprocess usage are low‑risk in this context.

If you want to tighten things further later, the main improvement would be narrowing some except Exception blocks to OSError/subprocess.TimeoutExpired and logging stderr more prominently, but the current behavior is acceptable.

Also applies to: 207-258, 260-372

app/modules/repo_manager/repo_manager.py (1)

61-70: Consider sanitizing repo_name when constructing metadata paths

You already sanitize branch/commit when building metadata filenames, but _metadata_dir and _get_repo_local_path use Path(repo_name) directly:

def _metadata_dir(self, repo_name: str) -> Path:
    return self.metadata_base_path / Path(repo_name)

def _get_repo_local_path(self, repo_name: str) -> Path:
    return self.repos_base_path / Path(repo_name)

If repo_name ever contained .. or absolute paths (e.g., via bad upstream data), this could escape the intended .repos / .meta trees. If upstream guarantees owner/repo‑style names, this is mostly theoretical, but it’s cheap to harden:

-    def _metadata_dir(self, repo_name: str) -> Path:
-        """Return the metadata directory for a given repository."""
-        return self.metadata_base_path / Path(repo_name)
+    def _metadata_dir(self, repo_name: str) -> Path:
+        """Return the metadata directory for a given repository."""
+        safe = Path(self._sanitize_for_filename(repo_name))
+        return self.metadata_base_path / safe
@@
-    def _get_repo_local_path(self, repo_name: str) -> Path:
-        """Expose repository location for callers that already rely on it."""
-        return self.repos_base_path / Path(repo_name)
+    def _get_repo_local_path(self, repo_name: str) -> Path:
+        """Expose repository location for callers that already rely on it."""
+        safe = Path(self._sanitize_for_filename(repo_name))
+        return self.repos_base_path / safe

This keeps metadata and repo directories safely contained even if repo_name is ever malformed.

Also applies to: 256-259

app/modules/code_provider/repo_manager_wrapper.py (2)

342-480: Heuristic branch/commit detection and stale docstring

The commit-vs-branch detection is a simple hex-length heuristic; that’s probably fine, but note it will treat any hex-like branch name (e.g. deadbeef) as a commit. Also, the _get_worktree_path docstring still mentions “registered in Redis”, which no longer matches the IRepoManager filesystem-backed implementation and could confuse future maintainers.

If you want to tighten things up:

  • Clarify in the docstring that you’re consulting IRepoManager metadata/filesystem, not Redis.
  • Optionally expose branch vs commit explicitly at call sites (or add a small helper) instead of relying solely on the hex heuristic.

Also applies to: 601-623


481-545: Broad except Exception and silent passes make debugging harder

Several places use bare except Exception (and one try/except with pass) around filesystem and Git operations, then silently fall back. That’s reasonable for resiliency, but it also risks masking real bugs and makes triaging failures harder.

Consider tightening this pattern, for example:

  • Catch more specific exception types where feasible (e.g., OSError, GitCommandError).
  • Replace logger.error(...); on failures with logger.exception(...) to preserve stack traces.
  • In the inner try/except that currently passes when removing an invalid worktree, log at least a debug/warning message.

Illustrative changes:

-            except Exception:
-                # Worktree exists but is invalid, remove it
-                logger.warning(f"Invalid worktree at {worktree_dir}, removing")
-                try:
-                    repo.git.worktree("remove", worktree_dir, force=True)
-                except Exception:
-                    pass
+            except Exception:
+                # Worktree exists but is invalid, try to remove it
+                logger.warning("Invalid worktree at %s, removing", worktree_dir)
+                try:
+                    repo.git.worktree("remove", worktree_dir, force=True)
+                except Exception as remove_exc:
+                    logger.debug(
+                        "Failed to remove invalid worktree at %s: %s",
+                        worktree_dir,
+                        remove_exc,
+                    )
@@
-        except Exception as e:
-            logger.error(f"Failed to create worktree for {ref}: {e}")
+        except Exception:
+            logger.exception("Failed to create worktree for %s", ref)
@@
-        except Exception as e:
-            logger.warning(f"Error building structure from {full_path}: {e}")
+        except Exception:
+            logger.exception("Error building structure from %s", full_path)
@@
-        except Exception as e:
-            logger.debug(f"Failed to update last accessed time: {e}")
+        except Exception:
+            logger.debug("Failed to update last accessed time", exc_info=True)

This preserves your graceful fallbacks while making operational issues more observable.

Also applies to: 514-520, 541-543, 596-598, 624-625

Comment on lines +170 to +171
        except HTTPException as http_exc:
            raise http_exc
Contributor


🛠️ Refactor suggestion | 🟠 Major

Use bare raise to preserve exception traceback.

When re-raising an exception, use bare raise instead of raise http_exc to preserve the original traceback, which aids debugging.

As per coding guidelines

Apply this diff:

         except HTTPException as http_exc:
-            raise http_exc
+            raise
         except Exception as e:
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
-        except HTTPException as http_exc:
-            raise http_exc
+        except HTTPException as http_exc:
+            raise
🧰 Tools
🪛 Ruff (0.14.4)

171-171: Use raise without specifying exception name

Remove exception name

(TRY201)

🤖 Prompt for AI Agents
In app/modules/code_provider/code_provider_controller.py around lines 170-171,
the except block currently re-raises the caught HTTPException using "raise
http_exc", which loses the original traceback; replace that re-raise with a bare
"raise" so the original exception traceback is preserved for debugging and
logging consistency.

Comment on lines 214 to 286
    def get_file_content(
        self,
        repo_name: str,
        file_path: str,
        ref: Optional[str] = None,
        start_line: Optional[int] = None,
        end_line: Optional[int] = None,
    ) -> str:
        """
        Get file content from local copy if available, otherwise fallback to provider.
        Uses git worktree to access the correct branch/commit.
        """
        # Try to get local copy path
        worktree_path = self._get_worktree_path(repo_name, ref)
        if worktree_path:
            try:
                # Update last accessed time
                self._update_last_accessed(repo_name, ref)

                # Read file from local filesystem
                full_path = os.path.join(worktree_path, file_path)
                if not os.path.exists(full_path):
                    logger.info(
                        f"[REPO_MANAGER] File {file_path} not found in local copy at {full_path}, "
                        f"falling back to provider API for {repo_name}@{ref}"
                    )
                    logger.info(
                        f"[PROVIDER_API] Fetching file content: {repo_name}/{file_path}@{ref}"
                    )
                    return self._provider.get_file_content(
                        repo_name, file_path, ref, start_line, end_line
                    )

                with open(full_path, "r", encoding="utf-8", errors="replace") as f:
                    content = f.read()

                # Apply line range if specified
                if start_line is not None or end_line is not None:
                    lines = content.split("\n")
                    start = (start_line - 1) if start_line is not None else 0
                    end = end_line if end_line is not None else len(lines)
                    content = "\n".join(lines[start:end])

                logger.info(
                    f"[REPO_MANAGER] Retrieved file content from local copy: "
                    f"{repo_name}/{file_path}@{ref} (path: {full_path})"
                )
                return content

            except Exception as e:
                logger.warning(
                    f"[REPO_MANAGER] Error reading file from local copy: {e}, "
                    f"falling back to provider API for {repo_name}/{file_path}@{ref}"
                )
                logger.info(
                    f"[PROVIDER_API] Fetching file content: {repo_name}/{file_path}@{ref}"
                )
                return self._provider.get_file_content(
                    repo_name, file_path, ref, start_line, end_line
                )

        # Fallback to provider
        logger.info(
            f"[REPO_MANAGER] No local copy available for {repo_name}@{ref}, "
            f"using provider API"
        )
        logger.info(
            f"[PROVIDER_API] Fetching file content: {repo_name}/{file_path}@{ref}"
        )
        return self._provider.get_file_content(
            repo_name, file_path, ref, start_line, end_line
        )
Contributor


⚠️ Potential issue | 🔴 Critical

Path traversal risk when joining file_path with worktree_path

get_file_content and _build_structure_from_filesystem join user-controlled file_path/path directly to the local repo root without normalizing or enforcing that the final path stays under the repository. A malicious or buggy caller could pass values like ../../etc/passwd or an absolute path and cause the server to read or list files outside the repo root, which is a serious security issue.

Recommend normalizing and enforcing containment before accessing the filesystem, e.g.:

@@ def get_file_content(
-                # Read file from local filesystem
-                full_path = os.path.join(worktree_path, file_path)
-                if not os.path.exists(full_path):
+                # Resolve and validate file path under repo root to prevent traversal
+                base_path = os.path.realpath(worktree_path)
+                requested_path = os.path.realpath(
+                    os.path.join(base_path, file_path)
+                )
+                # Ensure requested_path stays within the repository
+                if os.path.commonpath([base_path, requested_path]) != base_path:
+                    logger.warning(
+                        "[REPO_MANAGER] Unsafe file path %s for repo %s@%s; "
+                        "falling back to provider API",
+                        file_path,
+                        repo_name,
+                        ref,
+                    )
+                    logger.info(
+                        "[PROVIDER_API] Fetching file content: %s/%s@%s",
+                        repo_name,
+                        file_path,
+                        ref,
+                    )
+                    return self._provider.get_file_content(
+                        repo_name, file_path, ref, start_line, end_line
+                    )
+
+                full_path = requested_path
+                if not os.path.exists(full_path):
@@ def _build_structure_from_filesystem(
-        structure = []
-        full_path = os.path.join(base_path, path) if path else base_path
+        structure = []
+        base_real = os.path.realpath(base_path)
+        full_path = (
+            os.path.realpath(os.path.join(base_real, path)) if path else base_real
+        )
+
+        # Prevent traversal outside the repository root
+        if os.path.commonpath([base_real, full_path]) != base_real:
+            logger.warning(
+                "Unsafe path %s when building structure under %s; skipping",
+                path,
+                base_real,
+            )
+            return structure

This keeps local access semantics aligned with the remote provider’s expectation that paths are repository-relative, while protecting the host filesystem.

Also applies to: 546-595

🧰 Tools
🪛 Ruff (0.14.4)

262-262: Consider moving this statement to an else block

(TRY300)


264-264: Do not catch blind exception: Exception

(BLE001)

Comment on lines 288 to 338
    def get_repository_structure(
        self,
        repo_name: str,
        path: str = "",
        ref: Optional[str] = None,
        max_depth: int = 4,
    ) -> List[Dict[str, Any]]:
        """
        Get repository structure from local copy if available, otherwise fallback to provider.
        Uses git worktree to access the correct branch/commit.
        """
        # Try to get local copy path
        worktree_path = self._get_worktree_path(repo_name, ref)
        if worktree_path:
            try:
                # Update last accessed time
                self._update_last_accessed(repo_name, ref)

                # Build structure from local filesystem
                structure = self._build_structure_from_filesystem(
                    worktree_path, path, max_depth
                )

                logger.info(
                    f"[REPO_MANAGER] Retrieved repository structure from local copy: "
                    f"{repo_name}@{ref} (path: {worktree_path}, depth: {max_depth})"
                )
                return structure

            except Exception as e:
                logger.warning(
                    f"[REPO_MANAGER] Error reading structure from local copy: {e}, "
                    f"falling back to provider API for {repo_name}@{ref}"
                )
                logger.info(
                    f"[PROVIDER_API] Fetching repository structure: {repo_name}@{ref} (path: {path})"
                )
                return self._provider.get_repository_structure(
                    repo_name, path, ref, max_depth
                )

        # Fallback to provider
        logger.info(
            f"[REPO_MANAGER] No local copy available for {repo_name}@{ref}, "
            f"using provider API"
        )
        logger.info(
            f"[PROVIDER_API] Fetching repository structure: {repo_name}@{ref} (path: {path})"
        )
        return self._provider.get_repository_structure(repo_name, path, ref, max_depth)
Contributor


⚠️ Potential issue | 🔴 Critical

🧩 Analysis chain

Repository-structure listing should reuse the same path-safety checks

Once you harden _build_structure_from_filesystem against traversal, get_repository_structure’s use of path will inherit that safety; just ensure you don’t bypass _build_structure_from_filesystem elsewhere and that callers treat path as repo-relative only.


🏁 Script executed:

#!/bin/bash
# Quick grep to ensure `_build_structure_from_filesystem` is the only place
# walking local directories for repo structure.
rg -n "_build_structure_from_filesystem" app

Length of output: 357


🏁 Script executed:

#!/bin/bash
# Examine the _build_structure_from_filesystem implementation
# and how the path parameter is used
sed -n '546,650p' app/modules/code_provider/repo_manager_wrapper.py

Length of output: 2775


🏁 Script executed:

#!/bin/bash
# Also check the exact context around line 308 where the function is called
sed -n '300,320p' app/modules/code_provider/repo_manager_wrapper.py

Length of output: 845


🏁 Script executed:

#!/bin/bash
# Search for any other directory/filesystem walking patterns in the file
rg -n -i "listdir|walk|glob|Path|open.*r\)" app/modules/code_provider/repo_manager_wrapper.py | head -30

Length of output: 1967


🏁 Script executed:

#!/bin/bash
# Search for any path validation or normalization logic
# that might handle the `path` parameter before it's used
rg -n -B5 -A5 "get_repository_structure" app/modules/code_provider/repo_manager_wrapper.py | head -50

Length of output: 1719


🏁 Script executed:

#!/bin/bash
# Check if there are any normalization utilities used elsewhere
rg -n "normpath|realpath|abspath|sanitize|traversal" app/modules/code_provider/

Length of output: 871


🏁 Script executed:

#!/bin/bash
# Look for any path validation in the provider interface or base class
find app -name "*.py" -type f -exec grep -l "get_repository_structure" {} \;

Length of output: 323


🏁 Script executed:

#!/bin/bash
# Check the interface definition
cat app/modules/code_provider/base/code_provider_interface.py | grep -A20 "get_repository_structure"

Length of output: 639


🏁 Script executed:

#!/bin/bash
# Check how local_provider handles path for repository structure
sed -n '1,50p' app/modules/code_provider/local_repo/local_provider.py

Length of output: 1620


🏁 Script executed:

#!/bin/bash
# Search for get_repository_structure implementation in local_provider
rg -n -A30 "def get_repository_structure" app/modules/code_provider/local_repo/local_provider.py

Length of output: 1101


🏁 Script executed:

#!/bin/bash
# Check if worktree paths are sandboxed or validated
rg -n -A10 "_get_worktree_path" app/modules/code_provider/repo_manager_wrapper.py

Length of output: 1461


Hardening required for path traversal protection

The current code is vulnerable to directory traversal. _build_structure_from_filesystem and similar methods construct os.path.join(base_path, path) without validating that the result stays within the repository boundary. A caller could use path="../../etc/passwd" to escape the repo.

Required fixes:

  • Normalize and validate path parameter (e.g., os.path.normpath(), then verify it's repo-relative)
  • Apply same validation to file_path in get_file_content (line 235)
  • Ensure the resolved full path is within the intended base directory (use os.path.realpath() and check containment)

Lines 308–309, 235 (and recursion at line 584).

🧰 Tools
🪛 Ruff (0.14.4)

316-316: Consider moving this statement to an else block

(TRY300)


318-318: Do not catch blind exception: Exception

(BLE001)

🤖 Prompt for AI Agents
In app/modules/code_provider/repo_manager_wrapper.py around lines 288–338 (and
also address line 235 and recursive calls around line 584), the code joins
user-supplied paths to the repo worktree without validation, allowing directory
traversal (e.g., "../../etc"). Normalize and validate the incoming path before
using it: call os.path.normpath(path) and reject or strip paths that resolve to
parent references (leading ".." segments); then compute the absolute real path
for the target (os.path.realpath(os.path.join(worktree_path, normalized_path)))
and ensure it is contained within the worktree (use os.path.commonpath or
os.path.commonprefix / commonpath(worktree_path, resolved_target) comparison) —
if containment fails, raise a ValueError or return an appropriate error. Apply
the same validation logic in get_file_content (line ~235), in
_build_structure_from_filesystem and any recursive descent (line ~584), so every
place that forms full filesystem paths enforces normalization + realpath
containment checks before reading or listing files.

Comment on lines 75 to 138
def _validate_command_safety(command: str) -> tuple[bool, Optional[str]]:
    """
    Validate that a command is safe (read-only) and doesn't attempt write operations.
    Args:
        command: The command string to validate
    Returns:
        Tuple of (is_safe, error_message)
        is_safe: True if command is safe, False otherwise
        error_message: Error message if command is unsafe, None if safe
    """
    command_lower = command.lower().strip()

    # Check for write operation patterns (redirection, in-place editing)
    for pattern in DANGEROUS_PATTERNS:
        if pattern in command_lower:
            return (
                False,
                f"Command contains write operation pattern '{pattern}'. Write operations are not allowed.",
            )

    # Check for command injection patterns
    for pattern in INJECTION_PATTERNS:
        if pattern in command:
            return (
                False,
                f"Command contains injection pattern '{pattern}'. Command chaining/substitution is not allowed.",
            )

    # Check if command starts with an always-blocked command
    first_word = command_lower.split()[0] if command_lower.split() else ""
    if first_word in ALWAYS_BLOCKED_COMMANDS:
        return (
            False,
            f"Command '{first_word}' is not allowed. This tool only supports read-only operations.",
        )

    # Check for write-blocked commands with dangerous flags
    for cmd, dangerous_flags in WRITE_BLOCKED_COMMANDS.items():
        if command_lower.startswith(cmd):
            for flag in dangerous_flags:
                if flag in command_lower:
                    return (
                        False,
                        f"Command '{cmd}' with flag '{flag}' is not allowed. This would modify files.",
                    )

    # Block environment variable access that might expose secrets
    if command_lower.strip() == "env":
        return (
            False,
            "The 'env' command is blocked to prevent exposure of sensitive environment variables.",
        )

    # Block commands that try to access parent directories
    if "../" in command or ".." in command.split():
        return (
            False,
            "Accessing parent directories is not allowed for security reasons.",
        )

    return (True, None)


⚠️ Potential issue | 🟠 Major

🧩 Analysis chain

Interpreter bypass of “read‑only” enforcement; tighten safety model

The current safety checks are easy to circumvent with an interpreter while still passing _validate_command_safety. For example:

  • python -c "open('some_file', 'w').write('x')"
  • bash -c 'rm -rf .' (no ;, &&, ||, redirection, or env usage)

These will:

  • Not hit ALWAYS_BLOCKED_COMMANDS (first token is python/bash/sh, which are allowed).
  • Not contain any of the configured DANGEROUS_PATTERNS or INJECTION_PATTERNS.
  • Be executed directly in the repo worktree (and possibly without read‑only mounts if run_command_isolated ever falls back from gVisor).

To keep the “read‑only bash” guarantee realistic, I recommend:

  1. Blocking generic shells and interpreters by default (unless you have a very constrained use case):
 ALWAYS_BLOCKED_COMMANDS = {
     "ssh",
     ...
     "sudo",
     "su",
+    "bash",
+    "sh",
+    "zsh",
+    "python",
+    "python3",
+    "node",
+    "ruby",
+    "perl",
 }
  2. Optionally switching to an allow-list model for first tokens (e.g. only grep, find, rg, ls, cat, head, tail, awk, sed with safe flags, etc.) and rejecting any other base command.

  3. Verifying run_command_isolated semantics to ensure the repo path is truly mounted read-only even in fallback paths. If that helper only treats repo_path as read-only, you may want to pass repo_path=worktree_path and use working_dir purely as the in-container CWD.

-                result: CommandResult = run_command_isolated(
-                    command=final_command,
-                    working_dir=worktree_path,  # Mount only the worktree root (as read-only)
-                    repo_path=None,  # Don't mount separately since working_dir is the repo
+                result: CommandResult = run_command_isolated(
+                    command=final_command,
+                    working_dir=worktree_path,
+                    repo_path=worktree_path,
                     env=safe_env,
                     timeout=30,
                     use_gvisor=True,
                 )

Given how central this tool is and that it’s exposed to agents, I’d treat closing this interpreter bypass as a priority.

Also applies to: 282-445


🏁 Script executed:

#!/bin/bash
# Inspect gvisor_runner to confirm how read-only mounts are configured.
set -euo pipefail
echo "=== Searching for run_command_isolated implementation and mount semantics ==="
rg -n "def run_command_isolated" -C5
rg -n "repo_path" -C3 app/modules/utils/gvisor_runner.py || true

Length of output: 4361


After inspecting the mount semantics in gvisor_runner and the bash_command_tool call site, the original review comment's security concerns are confirmed. Rewritten review:

Interpreter bypass of read-only enforcement; tighten safety model

The current safety checks can be circumvented by invoking interpreters (python, bash, sh, etc.) while still passing _validate_command_safety. For example:

  • python -c "open('some_file', 'w').write('x')"
  • bash -c 'rm -rf .'

These commands bypass all configured DANGEROUS_PATTERNS and INJECTION_PATTERNS since the first token is an allowed interpreter, not a dangerous command.

Additionally, bind mounts allow containers to access the bound host filesystem tree, and changes to bind mounts are always propagated to the host and persist after the container exits. The current implementation passes repo_path=None to run_command_isolated, leaving only working_dir mounted. Without explicit read-only enforcement at the mount level, the isolation depends entirely on gVisor's mitigation, which should not be the sole defense.

To strengthen the read-only guarantee:

  1. Block generic shells and interpreters by default (or switch to an allow-list model for safe commands only):
 ALWAYS_BLOCKED_COMMANDS = {
     "ssh",
     "sudo",
     "su",
+    "bash",
+    "sh",
+    "zsh",
+    "python",
+    "python3",
+    "node",
+    "ruby",
+    "perl",
 }
  2. Ensure repo path is always mounted read-only by passing both working_dir and repo_path to run_command_isolated:
-                result: CommandResult = run_command_isolated(
-                    command=final_command,
-                    working_dir=worktree_path,
-                    repo_path=None,
+                result: CommandResult = run_command_isolated(
+                    command=final_command,
+                    working_dir=worktree_path,
+                    repo_path=worktree_path,
                     env=safe_env,
                     timeout=30,
                     use_gvisor=True,
                 )

This ensures the directory is mounted with read-only protection (:ro flag) even if gVisor isolation is unavailable or misconfigured. Given how central this tool is, I'd treat closing this interpreter bypass as a priority.
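
For illustration, a small pre-check that rejects interpreter invocations could sit alongside _validate_command_safety; the INTERPRETERS set and helper name below are assumptions, not the tool's current constants:

```python
import os
import shlex
from typing import Optional

# Hypothetical deny-list of shells/interpreters that would defeat the
# read-only validation if allowed as the base command.
INTERPRETERS = {"bash", "sh", "zsh", "dash", "python", "python3", "node", "ruby", "perl"}


def _reject_interpreters(command: str) -> tuple[bool, Optional[str]]:
    """Reject commands whose base token is a shell or interpreter."""
    try:
        tokens = shlex.split(command)
    except ValueError:
        return False, "Command could not be parsed safely."
    if not tokens:
        return False, "Empty command."
    base = os.path.basename(tokens[0]).lower()  # also catches /usr/bin/python3
    if base in INTERPRETERS:
        return False, f"Interpreter '{base}' is not allowed; this tool only runs read-only commands."
    return True, None
```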

Committable suggestion skipped: line range outside the PR's diff.

🤖 Prompt for AI Agents
In app/modules/intelligence/tools/code_query_tools/bash_command_tool.py around
lines 75 to 138, the current safety validator can be bypassed by launching
interpreters/shells (python, bash, sh, etc.) which then perform write
operations; also the runtime mounts may not enforce read-only on the repo since
repo_path is passed as None. Fix by adding interpreter names to
ALWAYS_BLOCKED_COMMANDS (or invert to an explicit allow-list of safe commands)
and detect interpreter invocations (e.g., "python -c", "sh -c", "bash -c") in
_validate_command_safety to reject them; and change the call site that invokes
run_command_isolated to always pass repo_path (and ensure it is mounted with a
read-only flag) so the repo is mounted as read-only even if gVisor isolation
fails.

Comment on lines 797 to +987
if repo_path is not None:
# Local repository detected - return the path directly without downloading tarball
logger.info(f"ParsingHelper: Using local repository at {repo_path}")
return repo_path, project_id
# Local repository detected - use the path directly without downloading tarball
repo_path_str = str(repo_path) if repo_path else None
logger.info(f"ParsingHelper: Using local repository at {repo_path_str}")

local_repo = None
latest_commit_sha = None
repo_metadata = {}

# Get commit SHA for local repo
try:
local_repo = Repo(repo_path_str)
if commit_id:
latest_commit_sha = commit_id
elif branch:
try:
local_repo.git.checkout(branch)
latest_commit_sha = local_repo.head.commit.hexsha
except GitCommandError:
# Branch might not exist, use current HEAD
latest_commit_sha = local_repo.head.commit.hexsha
else:
latest_commit_sha = local_repo.head.commit.hexsha
except Exception as e:
logger.warning(f"Could not get commit SHA for local repo: {e}")

# Extract metadata for local repo
if local_repo:
try:
repo_metadata = ParseHelper.extract_repository_metadata(local_repo)
repo_metadata["error_message"] = None
project_metadata = json.dumps(repo_metadata).encode("utf-8")
ProjectService.update_project(
self.db,
project_id,
properties=project_metadata,
commit_id=latest_commit_sha,
status=ProjectStatusEnum.CLONED.value,
)
except Exception as e:
logger.warning(f"Could not extract metadata for local repo: {e}")

# Copy local repo to .repos if repo manager is enabled
if self.repo_manager and repo_path_str and os.path.exists(repo_path_str):
try:
await self._copy_repo_to_repo_manager(
normalized_full_name,
repo_path_str,
branch,
latest_commit_sha,
user_id,
repo_metadata,
)
except Exception as e:
logger.warning(
f"Failed to copy local repo to repo manager: {e}. Continuing with parsing."
)

return repo_path_str, project_id

⚠️ Potential issue | 🟠 Major

🧩 Analysis chain

Local‑repo handling now mutates the caller’s repo via git checkout; this is risky.

The new local‑repo path in setup_project_directory plus _copy_repo_to_repo_manager / worktree helpers are generally well‑structured and integrate cleanly with RepoManager. However, one behavior is risky:

  • In the local‑repo branch you do:
    local_repo = Repo(repo_path_str)
    if commit_id:
        latest_commit_sha = commit_id
    elif branch:
        try:
            local_repo.git.checkout(branch)
            latest_commit_sha = local_repo.head.commit.hexsha
        except GitCommandError:
            latest_commit_sha = local_repo.head.commit.hexsha
    This actively checks out branch in the caller’s working tree, potentially changing their local repo state (current branch, working‑tree files) just to compute a commit SHA and metadata.

For developer‑owned paths (e.g., a user pointing at their local clone), this side effect can be surprising or destructive.

Consider instead:

  • Deriving the commit without changing HEAD, e.g.:
    if commit_id:
        latest_commit_sha = commit_id
    elif branch:
        try:
            latest_commit_sha = local_repo.git.rev_parse(branch).strip()
        except GitCommandError:
            latest_commit_sha = local_repo.head.commit.hexsha
    else:
        latest_commit_sha = local_repo.head.commit.hexsha
  • And not calling git.checkout on the local clone at all.

The rest of the RepoManager mirroring (_initialize_base_repo, _create_worktree) operates on copies under .repos, so you already have a safe place to materialize specific branches/commits without touching the original working tree.

Also applies to: 1062-1077, 1080-1295



---

🏁 Script executed:

```shell
#!/bin/bash
# Extract the full context around lines 929-987, 1062-1077, 1080-1295
cat app/modules/parsing/graph_construction/parsing_helper.py | sed -n '900,1300p' | head -400
```

Length of output: 16744


🏁 Script executed:

#!/bin/bash
# Search for all git.checkout calls in the file
rg -n 'git\.checkout|Repo\(' app/modules/parsing/graph_construction/parsing_helper.py -A 2 -B 2

Length of output: 1679


🏁 Script executed:

#!/bin/bash
# Look for _copy_repo_to_repo_manager and related worktree helpers
rg -n '_copy_repo_to_repo_manager|_initialize_base_repo|_create_worktree' app/modules/parsing/graph_construction/parsing_helper.py -A 5 -B 2

Length of output: 2104


🏁 Script executed:

#!/bin/bash
# Check imports to understand GitPython usage
head -50 app/modules/parsing/graph_construction/parsing_helper.py | grep -E 'import|from'

Length of output: 766


🏁 Script executed:

#!/bin/bash
# Find if local_repo is used elsewhere and how it's initialized
rg -n 'Repo\(repo_path' app/modules/parsing/graph_construction/parsing_helper.py -A 3 -B 1

Length of output: 194



---

🌐 Web query:

```web
GitPython git.rev_parse method usage for branch commit SHA
```

💡 Result:

Short answer — use Repo.rev_parse(...) and read the Commit object's hexsha. Example:

  • Resolve a branch name to the commit SHA (full 40-char hex):
    repo = git.Repo("path/to/repo")
    sha = repo.rev_parse("branch-name").hexsha

  • You can also pass explicit ref names:
    sha = repo.rev_parse("refs/heads/branch-name").hexsha
    sha = repo.rev_parse("origin/branch-name").hexsha # remote ref

Notes:

  • rev_parse returns a git object (Commit, Tag, Tree, etc.); .hexsha gives the commit id. [1]
  • rev_parse will raise gitdb.exc.BadObject / ValueError if the name is invalid, so catch exceptions if needed. [1][2]

References:
[1] GitPython API — Repo.rev_parse / return types.
[2] Example usage in projects / Q&A showing .rev_parse(...).hexsha usage.


Replace git.checkout with git.rev_parse to avoid mutating the caller's repository.

The code at line 945 directly mutates the caller's working tree via local_repo.git.checkout(branch). This side effect is risky for developer-owned paths and occurs before the safe mirroring to .repos.

Instead, use:

if commit_id:
    latest_commit_sha = commit_id
elif branch:
    try:
        latest_commit_sha = local_repo.rev_parse(branch).hexsha
    except Exception:
        latest_commit_sha = local_repo.head.commit.hexsha
else:
    latest_commit_sha = local_repo.head.commit.hexsha

This same pattern applies to the remote repository case at lines 994 and 997, where repo_details.git.checkout() should also be replaced with repo_details.rev_parse() to avoid unnecessary HEAD mutation.

🧰 Tools
🪛 Ruff (0.14.4)

952-952: Do not catch blind exception: Exception

(BLE001)


968-968: Do not catch blind exception: Exception

(BLE001)


982-982: Do not catch blind exception: Exception

(BLE001)

Comment on lines 153 to 197
def verify_checksum(file_path: Path, checksum_url: str) -> bool:
    """
    Verify file checksum.

    Args:
        file_path: Path to the file to verify
        checksum_url: URL to the checksum file

    Returns:
        True if checksum matches, False otherwise
    """
    try:
        # Download checksum file
        checksum_path = file_path.parent / f"{file_path.name}.sha512"
        if not download_file(checksum_url, checksum_path):
            logger.warning("Failed to download checksum, skipping verification")
            return True  # Continue anyway

        # Read expected checksum
        with open(checksum_path, "r") as f:
            expected_checksum = f.read().split()[0]

        # Calculate actual checksum
        import hashlib

        sha512 = hashlib.sha512()
        with open(file_path, "rb") as f:
            for chunk in iter(lambda: f.read(4096), b""):
                sha512.update(chunk)
        actual_checksum = sha512.hexdigest()

        # Clean up checksum file
        checksum_path.unlink()

        if expected_checksum == actual_checksum:
            logger.info("Checksum verification passed")
            return True
        else:
            logger.error("Checksum verification failed")
            return False

    except Exception as e:
        logger.warning(f"Error verifying checksum: {e}, continuing anyway")
        return True  # Continue anyway


⚠️ Potential issue | 🟠 Major

🧩 Analysis chain

Checksum handling should fail closed instead of silently accepting unverified binaries

verify_checksum currently treats several failure modes as “OK”:

  • If the checksum download fails (download_file returns False), it logs a warning and returns True (Lines 167‑170).
  • If any exception is raised during verification, it logs a warning and returns True (Lines 194‑197).

Given this script installs a security boundary binary (runsc) that will be used to isolate untrusted code, silently proceeding without a valid checksum undermines the integrity check. It’s safer to fail the installation and let the rest of the system fall back to “no gVisor available” than to run with a potentially corrupted or tampered binary.

Consider treating checksum download/verification failures as hard failures:

-        if not download_file(checksum_url, checksum_path):
-            logger.warning("Failed to download checksum, skipping verification")
-            return True  # Continue anyway
+        if not download_file(checksum_url, checksum_path):
+            logger.warning("Failed to download checksum; aborting installation")
+            return False
@@
-    except Exception as e:
-        logger.warning(f"Error verifying checksum: {e}, continuing anyway")
-        return True  # Continue anyway
+    except Exception as e:
+        logger.warning(f"Error verifying checksum: {e}; aborting installation")
+        return False

This way install_gvisor will bail out cleanly on any integrity issues and keep runsc disabled instead of silently accepting an unverified binary.

Also applies to: 199-247


I'll verify the review comment by examining the actual code and its context.


---

🏁 Script executed:

```shell
#!/bin/bash
# Check the actual code at lines 153-197 in install_gvisor.py
cat app/modules/utils/install_gvisor.py | head -200 | tail -50
```

Length of output: 1470


🏁 Script executed:

#!/bin/bash
# Find all usages of verify_checksum function
rg -n "verify_checksum" app/modules/utils/install_gvisor.py -B2 -A2

Length of output: 315


🏁 Script executed:

#!/bin/bash
# Check install_gvisor function to see how it handles the return value
rg -n "def install_gvisor" app/modules/utils/install_gvisor.py -A50

Length of output: 1773


🏁 Script executed:

#!/bin/bash
# Look for any fallback behavior or error handling around gvisor
rg -n "gVisor\|runsc" app/modules/utils/install_gvisor.py -B1 -A1

Length of output: 0



---

🏁 Script executed:

```shell
#!/bin/bash
# Get full install_gvisor function (lines 199 onwards)
rg -n "def install_gvisor" app/modules/utils/install_gvisor.py -A80

Length of output: 2781


🏁 Script executed:

#!/bin/bash
# Find callers of install_gvisor
rg -n "install_gvisor\(" --type py -B3 -A3

Length of output: 1255


🏁 Script executed:

#!/bin/bash
# Check what happens after verify_checksum succeeds - lines 248-260
sed -n '248,265p' app/modules/utils/install_gvisor.py

Length of output: 559


Checksum handling should fail closed instead of silently accepting unverified binaries

verify_checksum currently treats several failure modes as "OK":

  • If the checksum download fails (download_file returns False), it logs a warning and returns True (Lines 167‑170).
  • If any exception is raised during verification, it logs a warning and returns True (Lines 194‑197).

When verify_checksum returns True (including these failure cases), install_gvisor proceeds to chmod the binary to 0o755, move it to its final location, and return success (Lines 249‑262). This means an unverified binary is installed and becomes executable.

Given that runsc is a security boundary binary used to isolate untrusted code, silently accepting an unverified binary undermines the integrity check and creates a security risk. It is safer to fail the installation and allow the system to operate without gVisor than to run with a potentially corrupted or tampered binary.

Consider treating checksum download/verification failures as hard failures:

-        if not download_file(checksum_url, checksum_path):
-            logger.warning("Failed to download checksum, skipping verification")
-            return True  # Continue anyway
+        if not download_file(checksum_url, checksum_path):
+            logger.warning("Failed to download checksum; aborting installation")
+            return False
@@
-    except Exception as e:
-        logger.warning(f"Error verifying checksum: {e}, continuing anyway")
-        return True  # Continue anyway
+    except Exception as e:
+        logger.warning(f"Error verifying checksum: {e}; aborting installation")
+        return False

This way install_gvisor will cleanly return False on any integrity issues (Lines 244‑247), prevent the binary from being installed, and allow the system to fall back to operating without gVisor instead of with a potentially compromised isolation mechanism.

📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
def verify_checksum(file_path: Path, checksum_url: str) -> bool:
"""
Verify file checksum.
Args:
file_path: Path to the file to verify
checksum_url: URL to the checksum file
Returns:
True if checksum matches, False otherwise
"""
try:
# Download checksum file
checksum_path = file_path.parent / f"{file_path.name}.sha512"
if not download_file(checksum_url, checksum_path):
logger.warning("Failed to download checksum, skipping verification")
return True # Continue anyway
# Read expected checksum
with open(checksum_path, "r") as f:
expected_checksum = f.read().split()[0]
# Calculate actual checksum
import hashlib
sha512 = hashlib.sha512()
with open(file_path, "rb") as f:
for chunk in iter(lambda: f.read(4096), b""):
sha512.update(chunk)
actual_checksum = sha512.hexdigest()
# Clean up checksum file
checksum_path.unlink()
if expected_checksum == actual_checksum:
logger.info("Checksum verification passed")
return True
else:
logger.error("Checksum verification failed")
return False
except Exception as e:
logger.warning(f"Error verifying checksum: {e}, continuing anyway")
return True # Continue anyway
def verify_checksum(file_path: Path, checksum_url: str) -> bool:
"""
Verify file checksum.
Args:
file_path: Path to the file to verify
checksum_url: URL to the checksum file
Returns:
True if checksum matches, False otherwise
"""
try:
# Download checksum file
checksum_path = file_path.parent / f"{file_path.name}.sha512"
if not download_file(checksum_url, checksum_path):
logger.warning("Failed to download checksum; aborting installation")
return False
# Read expected checksum
with open(checksum_path, "r") as f:
expected_checksum = f.read().split()[0]
# Calculate actual checksum
import hashlib
sha512 = hashlib.sha512()
with open(file_path, "rb") as f:
for chunk in iter(lambda: f.read(4096), b""):
sha512.update(chunk)
actual_checksum = sha512.hexdigest()
# Clean up checksum file
checksum_path.unlink()
if expected_checksum == actual_checksum:
logger.info("Checksum verification passed")
return True
else:
logger.error("Checksum verification failed")
return False
except Exception as e:
logger.warning(f"Error verifying checksum: {e}; aborting installation")
return False
🧰 Tools
🪛 Ruff (0.14.4)

194-194: Do not catch blind exception: Exception

(BLE001)

🤖 Prompt for AI Agents
In app/modules/utils/install_gvisor.py around lines 153 to 197, the checksum
routine currently treats download failures and exceptions as success (returning
True) which allows installing unverified binaries; change it to fail closed by
returning False on checksum download failure and on any exception, log an
error-level message with context, and ensure cleanup of the temporary checksum
file only when present; in short: if download_file(...) returns False ->
logger.error(...) and return False; if any exception occurs -> logger.error(...)
and return False; keep successful verification flow unchanged.

Comment on lines 30 to 37
```python
from app.modules.utils.gvisor_runner import run_command_isolated

result = run_command_isolated(
    command=["ls", "-la"],
    working_dir="/path/to/repo",
    repo_path="/.repos/repo-name"
)
```

⚠️ Potential issue | 🟡 Minor

Clarify repo_path example to better match typical .repos layout

The example uses repo_path="/.repos/repo-name", which suggests a top-level filesystem directory and may not match where your repos are actually stored (often something like /path/to/project/.repos/repo-name or whatever RepoManager config uses).

Consider tweaking the example to make that clearer, e.g.:

-    working_dir="/path/to/repo",
-    repo_path="/.repos/repo-name"
+    working_dir="/path/to/repo",
+    repo_path="/path/to/project/.repos/repo-name"

or add a short note that repo_path should be the real host path to the checked-out repo/worktree.

🤖 Prompt for AI Agents
In docs/gvisor_quickstart.md around lines 30 to 37, the repo_path example points
to a misleading top-level path ("/.repos/repo-name"); update the example to show
a realistic host filesystem path that includes the project root (e.g.
"/path/to/project/.repos/repo-name") or add a one-line note stating that
repo_path must be the actual host path to the checked-out repo/worktree
(matching RepoManager config).

Comment on lines 32 to 37
# Create temporary directory
TMPDIR=$(mktemp -d)
cd "$TMPDIR"

echo "Downloading gVisor runsc..."
URL="https://storage.googleapis.com/gvisor/releases/release/latest/${ARCH}"

⚠️ Potential issue | 🟠 Major

🧩 Analysis chain

Avoid hard failure on best-effort Docker VM copy and be careful overwriting daemon.json

Two things to tighten up under set -e:

  1. The best-effort container copy can currently abort the script:

    • docker run ... gvisor-installer ... || true (Line 82) is guarded, but docker cp runsc gvisor-installer:/usr/local/bin/runsc (Line 85) is not.
    • If the container fails to start (image pull error, no network, etc.), docker cp will fail and terminate the script before the local ~/.local/bin/runsc install runs.
    • Add || true (or a container-exists check) to the docker cp call to keep this path best-effort.
  2. Overwriting Docker’s daemon configuration is potentially disruptive:

    • The cat > "$DAEMON_JSON" block (Lines 128‑137) replaces any existing settings with a runtimes-only JSON, which can silently drop other configuration (e.g., registries, proxies).
    • You do create a backup, but it’s easy for users to miss this side effect. Consider either:
      • Merging in the runsc runtime into existing JSON (e.g., via jq), or
      • At minimum, documenting clearly that existing Docker Engine settings will be replaced and how to restore from the backup.

Also consider using cd "$TMPDIR" || exit 1 and cd - > /dev/null || exit 1 to satisfy Shellcheck and handle unexpected cd failures.

- cd "$TMPDIR"
+ cd "$TMPDIR" || exit 1
@@
- docker cp runsc gvisor-installer:/usr/local/bin/runsc
+ docker cp runsc gvisor-installer:/usr/local/bin/runsc || true
@@
-cd - > /dev/null
+cd - > /dev/null || exit 1

Also applies to: 81-99, 118-137, 160-162


🏁 Script executed:

#!/bin/bash
# Retrieve the setup_gvisor_docker.sh script and check for set -e, docker commands, and daemon.json operations
cat scripts/setup_gvisor_docker.sh | head -100

Length of output: 2945


🏁 Script executed:

#!/bin/bash
# Check lines around 82-85 for docker run and docker cp
sed -n '75,95p' scripts/setup_gvisor_docker.sh

Length of output: 745


🏁 Script executed:

#!/bin/bash
# Check lines around 128-137 for daemon.json operations
sed -n '120,145p' scripts/setup_gvisor_docker.sh

Length of output: 642


🏁 Script executed:

#!/bin/bash
# Check lines around 160-162 for cd command
sed -n '155,165p' scripts/setup_gvisor_docker.sh

Length of output: 249


🏁 Script executed:

#!/bin/bash
# Search for 'set -e' or similar options at the start
head -20 scripts/setup_gvisor_docker.sh

Length of output: 446


Guard the best-effort docker cp, merge rather than overwrite daemon.json, and handle cd failures

The review comment identifies three legitimate issues in the script, all of which are confirmed:

  1. Line 84: Unguarded docker cp — The docker cp runsc gvisor-installer:/usr/local/bin/runsc call is not guarded with || true, despite being intended as best-effort. Under set -e, if the container fails to start (line 82 guarded with || true), the subsequent docker cp will still execute, fail, and abort the script before the fallback local install path runs.

  2. Lines 128–137: Destructive daemon.json overwrite — The cat > "$DAEMON_JSON" unconditionally replaces the entire file with a minimal runtimes-only config. While a timestamped backup is created, existing Docker settings (registries, proxies, log drivers, etc.) are silently lost.

  3. Line 160: Unguarded cd - — The cd - > /dev/null lacks error handling and will not catch unexpected cd failures under set -e.

The suggested fixes are appropriate:

  • Add || true to the docker cp line (line 84)
  • Consider merging the runsc runtime into existing daemon.json (e.g., with jq; see the sketch below) instead of overwriting, or at minimum document this clearly
  • Add || exit 1 to both cd "$TMPDIR" (line 34) and cd - (line 160)
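
If jq is not guaranteed on the host, the same merge can be done with a few lines of Python; a rough sketch of the idea (the daemon.json path and runsc location are assumptions):

```python
import json
from pathlib import Path

DAEMON_JSON = Path("/etc/docker/daemon.json")

# Load whatever configuration already exists instead of clobbering it.
config = json.loads(DAEMON_JSON.read_text()) if DAEMON_JSON.exists() else {}

# Add or replace only the runsc runtime entry; registries, proxies, log
# drivers and other existing keys are preserved.
config.setdefault("runtimes", {})["runsc"] = {"path": "/usr/local/bin/runsc"}

DAEMON_JSON.write_text(json.dumps(config, indent=2) + "\n")
```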
🤖 Prompt for AI Agents
In scripts/setup_gvisor_docker.sh around lines 32 to 37 (and referencing line
84, lines 128–137, and line 160), the script currently: performs an unguarded
docker cp that can fail under set -e; unconditionally overwrites
/etc/docker/daemon.json losing existing config; and uses cd without checking
failure. Fix by adding a permissive failure operator to the docker cp call
(append "|| true" so the script continues if the copy fails); replace the
destructive cat > "$DAEMON_JSON" approach with a merge strategy (use jq to read
existing daemon.json, inject or update the runsc runtime entry, and write back
safely, keeping other settings and making a timestamped backup), or if jq is
unavailable clearly document the destructive behavior and require explicit
confirmation; and add error handling to directory changes (append "|| exit 1" to
cd "$TMPDIR" and to cd - so the script fails with a clear error if directory
changes fail).

@sonarqubecloud
Copy link

Quality Gate failed

Failed conditions
3 Security Hotspots
3.2% Duplication on New Code (required ≤ 3%)
D Security Rating on New Code (required ≥ A)

See analysis details on SonarQube Cloud

Catch issues before they fail your Quality Gate with our IDE extension SonarQube for IDE


@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 7

♻️ Duplicate comments (5)
app/modules/code_provider/github/github_service.py (1)

243-252: Critical: This uid update logic was already flagged in a previous review.

The concerns raised in the earlier review remain unaddressed:

  1. Data integrity risk: Modifying the primary key can break foreign key relationships to projects, conversations, prompts, preferences, and custom_agents.
  2. Security concern: No authorization check before updating uid based on email match.
  3. Race condition: Concurrent requests could cause conflicts.
  4. Transaction isolation: Immediate commit outside broader transaction context.

Please refer to the previous review comment for detailed analysis and recommended alternatives (either remove this logic entirely or implement as a separate admin operation with proper safeguards).

app/modules/parsing/graph_construction/parsing_helper.py (1)

929-987: Avoid git.checkout on caller-owned local repositories

In the local-repo path you create a fresh Repo and then run:

elif branch:
    try:
        local_repo.git.checkout(branch)
        latest_commit_sha = local_repo.head.commit.hexsha
    except GitCommandError:
        latest_commit_sha = local_repo.head.commit.hexsha

This mutates the caller’s working tree (current branch and files) just to discover a commit SHA, which is risky for developer-owned clones and repeats a previously flagged pattern. A safer approach is to resolve the ref without changing HEAD, e.g.:

-            try:
-                local_repo = Repo(repo_path_str)
-                if commit_id:
-                    latest_commit_sha = commit_id
-                elif branch:
-                    try:
-                        local_repo.git.checkout(branch)
-                        latest_commit_sha = local_repo.head.commit.hexsha
-                    except GitCommandError:
-                        # Branch might not exist, use current HEAD
-                        latest_commit_sha = local_repo.head.commit.hexsha
-                else:
-                    latest_commit_sha = local_repo.head.commit.hexsha
-            except Exception as e:
-                logger.warning(f"Could not get commit SHA for local repo: {e}")
+            try:
+                local_repo = Repo(repo_path_str)
+                if commit_id:
+                    latest_commit_sha = commit_id
+                elif branch:
+                    try:
+                        latest_commit_sha = local_repo.rev_parse(branch).hexsha
+                    except GitCommandError:
+                        # Branch might not exist, use current HEAD
+                        latest_commit_sha = local_repo.head.commit.hexsha
+                else:
+                    latest_commit_sha = local_repo.head.commit.hexsha
+            except GitCommandError as e:
+                logger.warning(
+                    f"Could not resolve commit for local repo {repo_path_str}: {e}"
+                )

This keeps the caller’s checkout intact and narrows the exception type instead of catching bare Exception (BLE001). The same concern was raised in an earlier review for similar code in this file.

app/modules/parsing/graph_construction/parsing_controller.py (1)

222-260: READY projects will be re-parsed on every call when RepoManager is disabled

Here:

repo_exists_and_indexed = await ParsingController._check_repo_indexed(...)
if not repo_exists_and_indexed:
    # re-submit parsing to create LSP index

_check_repo_indexed explicitly returns False when REPO_MANAGER_ENABLED is not "true", so for READY projects with up-to-date commits you’ll still re-submit parsing on every parse_directory call whenever RepoManager/LSP is off. This was already flagged in an earlier review.

Two concrete fixes:

  1. Gate the index check on the env flag and keep READY projects stable when RepoManager is disabled:
-                # Check if repo exists in .repos and is LSP indexed
-                repo_exists_and_indexed = await ParsingController._check_repo_indexed(
-                    project_id,
-                    repo_details.repo_name,
-                    repo_details.branch_name,
-                    repo_details.commit_id,
-                    db,
-                )
-
-                if not repo_exists_and_indexed:
+                repo_manager_enabled = (
+                    os.getenv("REPO_MANAGER_ENABLED", "false").lower() == "true"
+                )
+
+                if repo_manager_enabled:
+                    repo_exists_and_indexed = (
+                        await ParsingController._check_repo_indexed(
+                            project_id,
+                            normalized_repo_name,
+                            repo_details.branch_name,
+                            repo_details.commit_id,
+                            db,
+                        )
+                    )
+
+                    if not repo_exists_and_indexed:
                         ...
                         return {
                             "project_id": project_id,
                             "status": ProjectStatusEnum.SUBMITTED.value,
                         }
-
-                return {"project_id": project_id, "status": project.status}
+                return {"project_id": project_id, "status": project.status}
  2. Use the canonical/normalized repo name (normalized_repo_name) when talking to RepoManager so you're consistent with how repos are registered in .repos.

Otherwise, READY projects will churn parsing jobs forever in non-RepoManager deployments.

Also applies to: 279-353

app/modules/intelligence/tools/code_query_tools/lsp_query_tool.py (1)

59-105: Past validation issues remain unaddressed.

The previous review correctly identified that the validator is incomplete:

  1. It doesn't require a document (path or uri) for DOCUMENT_SYMBOL and other document-based methods
  2. It doesn't enforce non-negative constraints on line and character at the schema level
  3. Missing document checks will cause runtime errors in LspServerManager._build_params rather than failing fast at validation

The prior review provided a complete fix. Please apply it to ensure all method-specific requirements are enforced at the schema layer.
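
For reference, the schema-level checks could look roughly like this, assuming Pydantic v2 and the field names discussed above (path, uri, line, character); the model name and method strings are illustrative, not the exact fix from the earlier review:

```python
from typing import Optional

from pydantic import BaseModel, Field, model_validator


class LspQueryToolInputSketch(BaseModel):
    method: str
    path: Optional[str] = None
    uri: Optional[str] = None
    line: Optional[int] = Field(default=None, ge=0)       # non-negative at the schema level
    character: Optional[int] = Field(default=None, ge=0)

    @model_validator(mode="after")
    def _require_document_and_position(self) -> "LspQueryToolInputSketch":
        document_methods = {"definition", "references", "hover", "documentSymbol"}
        position_methods = {"definition", "references", "hover"}
        if self.method in document_methods and not (self.path or self.uri):
            raise ValueError(f"'{self.method}' requires 'path' or 'uri'")
        if self.method in position_methods and (self.line is None or self.character is None):
            raise ValueError(f"'{self.method}' requires 'line' and 'character'")
        return self
```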

app/modules/intelligence/tools/code_query_tools/lsp_server_manager.py (1)

311-331: CRITICAL: Arbitrary file read vulnerability remains unaddressed.

The previous review correctly identified that this method will read any absolute file:// URI without verifying it's within workspace_root. Combined with LspQueryToolInput.uri (which overrides path), an attacker could access arbitrary host files (e.g., file:///etc/passwd).

The prior review provided a complete fix using Path.resolve() and relative_to() to enforce workspace boundaries. This MUST be applied before merging to prevent information disclosure.
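
For reference, a containment check along those lines might look like the following (the function name is illustrative):

```python
from pathlib import Path


def resolve_uri_within_workspace(workspace_root: str, uri: str) -> Path:
    """Resolve a file:// URI and refuse anything outside the workspace."""
    if not uri.startswith("file://"):
        raise ValueError(f"Unsupported URI scheme: {uri}")
    candidate = Path(uri[len("file://"):]).resolve()
    root = Path(workspace_root).resolve()
    try:
        candidate.relative_to(root)  # raises ValueError if outside the workspace
    except ValueError as exc:
        raise ValueError(f"URI escapes workspace: {uri}") from exc
    return candidate
```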

🧹 Nitpick comments (15)
docs/gvisor_mac_setup.md (1)

113-127: Verify section clarity: "Recommendation" text could be slightly redundant.

The "Recommendation" section (lines 113–127) presents similar content across lines 115–120 and 121–126. While both perspectives are helpful, consider whether the duplication is intentional. If consolidation is desired, the dual perspective could be unified into a single decision tree.

scripts/install_lsp_servers.sh (3)

13-14: Remove unused PYTHON_BIN variable.

Line 13 defines PYTHON_BIN but it's never used—only PIP_BIN is referenced. Remove the unused variable or document its purpose if it's intended for future extensibility.

-PYTHON_BIN="$VENV_PATH/bin/python"
 PIP_BIN="$VENV_PATH/bin/pip"

140-140: Redirect error message to stderr.

Error output on line 140 should be sent to stderr for proper error stream handling:

-                        echo "   Timeout error - connection took too long."
+                        echo "   Timeout error - connection took too long." >&2

Apply the same redirection (>&2) to related error messages at lines 139, 142, and other diagnostic outputs that report failures or issues.


151-151: Remove unused DOWNLOAD_SUCCESS variable.

Line 151 sets DOWNLOAD_SUCCESS=true, but the variable is never checked. The retry loop relies on break to exit after a successful download. Either remove the variable or use it to guard the logic:

-                    DOWNLOAD_SUCCESS=true
-                    break
+                    break
requirements.txt (1)

72-74: Consider updating to current stable versions; all are free from CVEs.

Verification confirms no public CVE entries for any dependency. However, all three versions are now outdated:

  • pygls: 1.3.1 (in PR) vs. 2.0.0 (latest, Oct 17, 2025) — major version bump
  • jedi: 0.19.1 (in PR) vs. 0.19.2 (latest, Nov 11, 2024)
  • compiledb: 0.10.1 (in PR) vs. 0.10.7 (latest, Mar 23, 2025)

While these pinned versions are secure and will function correctly, consider updating to the latest stable versions (especially pygls 2.0.0, which represents significant changes) for improved compatibility and long-term maintainability.

docs/clangd_integration_plan.md (1)

185-191: Add a language tag to the fenced file-structure block

Markdownlint flags the code fence without a language. Consider using something like:

-```
+```text
 app/modules/intelligence/tools/code_query_tools/
 ├── clangd_session.py          # ClangdSession class (existing)
 ...

to satisfy MD040 and make the intent explicit.

app/modules/parsing/graph_construction/parsing_helper.py (2)

331-373: Retry logic is solid; tighten exception logging for better diagnostics

The connection-timeout retry loops for both GitBucket basic auth and standard token-based downloads look good (bounded retries, exponential backoff, async sleeps). When the final attempts fail (Lines 365-367, 415-417, 453-456), consider using logger.exception(...) so stack traces are preserved, and consistently chaining with raise ... from e when wrapping into ParsingFailedError to make root causes easier to debug.

This is non-blocking but will make investigating archive-download failures much easier.

Also applies to: 383-423, 447-456


782-785: Language-detection changes look good; verify they align with LSP language keys

The updated C/C++ classification (.c/.h → "c", common C++ extensions → "cpp") and the valid_languages filter that excludes non-code categories ("markdown", "xml", "other") make sense.

Please double-check that the returned strings (e.g., "c_sharp", "cpp", "python") match whatever language keys are used when registering LSP servers in lsp_server_manager so is_workspace_indexed sees the expected languages. A mismatch here would silently disable indexing checks for some languages.

Also applies to: 830-863
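
One low-effort way to keep the two sides aligned is a single shared mapping that both the language detector and the LSP registration import; a sketch (module name and constant are hypothetical):

```python
# e.g. code_query_tools/language_keys.py (hypothetical shared module)
EXTENSION_TO_LANGUAGE = {
    ".py": "python",
    ".c": "c",
    ".h": "c",
    ".cc": "cpp",
    ".cpp": "cpp",
    ".hpp": "cpp",
    ".cs": "c_sharp",
}

# The set the indexing check can compare against.
SUPPORTED_LSP_LANGUAGES = set(EXTENSION_TO_LANGUAGE.values())
```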

app/modules/intelligence/tools/code_query_tools/clangd_helpers.py (1)

42-48: Autotools detection currently just falls back to compile_flags.txt

detect_build_system can return "autotools", but ensure_compile_commands only has explicit branches for "cmake" and "make". Right now autotools projects will skip directly to the compile_flags.txt fallback.

If you intend autotools-specific handling (e.g., ./configure + make with Bear/compiledb), you might either add a dedicated branch (sketched below) or clarify in the docstring that autotools is detected but not yet used for compile_commands generation.

Also applies to: 351-391
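
If a dedicated branch is added, it could be as small as the sketch below (assumes compiledb is on PATH, as pinned in requirements.txt, and that its `compiledb make` wrapper is used; error handling elided):

```python
import subprocess
from pathlib import Path


def _generate_compile_commands_autotools(workspace: Path) -> bool:
    """Hypothetical autotools branch: run configure, then wrap make with compiledb."""
    if not (workspace / "configure").exists():
        return False
    subprocess.run(["./configure"], cwd=workspace, check=True)
    # compiledb parses the make output and writes compile_commands.json
    subprocess.run(["compiledb", "make"], cwd=workspace, check=True)
    return (workspace / "compile_commands.json").exists()
```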

app/modules/intelligence/tools/tool_service.py (1)

47-49: Be explicit about always injecting lsp_query into tool lists

get_tools now appends "lsp_query" whenever it is configured, even if the caller did not request it in tool_names:

lsp_tool = self.tools.get("lsp_query")
if lsp_tool is not None and all(...):
    tools.append(lsp_tool)

This is convenient for making the LSP base tool universally available, but it is also a behavioral change: callers can no longer rely on get_tools returning only the requested tools.

If that’s intentional, consider documenting it at the method docstring and/or constraining it (e.g., only when a code/introspection tool is requested). If it’s not intentional, you could instead only append lsp_query when "lsp_query" is explicitly requested in tool_names.

Also applies to: 129-142, 206-214

app/modules/intelligence/tools/code_query_tools/clangd_session.py (1)

38-54: Store cache_dir or mark it intentionally unused

The constructor accepts cache_dir: Path but never uses it, which triggers ARG002 in Ruff and may confuse future readers:

def __init__(..., cache_dir: Path, config: LanguageServerConfig) -> None:
    super().__init__(session_key, workspace_root)
    # cache_dir is ignored
    self._pygls_session = PyglsClientSession(...)

If you expect to route clangd’s index dir through this later, you can simply retain it:

     def __init__(..., cache_dir: Path, config: LanguageServerConfig) -> None:
         super().__init__(session_key, workspace_root)
-        # Note: cache_dir is kept for compatibility but not used for custom caching
-        # Clangd uses its native cache in .cache/clangd/index/ via symlink
+        # Clangd uses its native cache in .cache/clangd/index/ via symlink,
+        # but keep cache_dir for future configuration or debugging.
+        self.cache_dir = cache_dir

Otherwise, consider renaming to _cache_dir or removing from the signature.

app/modules/intelligence/tools/code_query_tools/jedi_session.py (1)

247-260: Tighten silent exception handling and use get_running_loop for async indexing

Two small robustness tweaks:

  1. In get_workspace_symbols, errors while loading the workspace cache are silently swallowed:
if cache_file.exists():
    try:
        with open(cache_file) as f:
            cached_symbols = json.load(f)
    except Exception:
        pass

Silencing all exceptions makes it hard to notice corrupted caches. Consider at least logging at debug level:

-    except Exception:
-        pass
+    except Exception as exc:
+        logger.debug("Failed to load workspace symbols cache: %s", exc)
  1. In index_workspace, you call asyncio.get_event_loop() inside an async method:
loop = asyncio.get_event_loop()
await loop.run_in_executor(None, self._jedi_cache.index_workspace, python_files)

On modern Python this is better written as:

-        loop = asyncio.get_event_loop()
+        loop = asyncio.get_running_loop()

to avoid surprises with loop policy changes.

Also applies to: 382-388

app/modules/intelligence/tools/code_query_tools/lsp_query_tool.py (1)

326-326: Consider unpacking over concatenation.

Apply this diff for a more idiomatic approach:

-final_command = [resolved_command[0]] + final_flags
+final_command = [resolved_command[0], *final_flags]
app/modules/intelligence/tools/code_query_tools/lsp_server_manager.py (2)

982-982: Remove unused method parameter.

The config parameter in _warmup_session is never used within the method. Consider removing it or documenting why it's kept for future use.

 async def _warmup_session(
     self,
     request: LspQueryRequest,
     workspace_root: str,
     session: BaseLspServerSession,
-    config: LanguageServerConfig,
     status_messages: List[str],
 ) -> None:

2009-2009: Remove redundant exception object in logging.exception calls.

logging.exception() automatically includes exception info, so passing exc is redundant:

-logger.exception(
-    "[LSP_MANAGER] Failed to execute %s for %s: %s",
-    request.method.value,
-    request.language,
-    exc,
-)
+logger.exception(
+    "[LSP_MANAGER] Failed to execute %s for %s",
+    request.method.value,
+    request.language,
+)

Apply the same fix at line 2111.

Also applies to: 2111-2111

📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 9309134 and 991327c.

📒 Files selected for processing (19)
  • app/modules/code_provider/github/github_service.py (5 hunks)
  • app/modules/intelligence/agents/chat_agents/tool_helpers.py (5 hunks)
  • app/modules/intelligence/tools/code_query_tools/clangd_helpers.py (1 hunks)
  • app/modules/intelligence/tools/code_query_tools/clangd_session.py (1 hunks)
  • app/modules/intelligence/tools/code_query_tools/jedi_session.py (1 hunks)
  • app/modules/intelligence/tools/code_query_tools/lsp_query_tool.py (1 hunks)
  • app/modules/intelligence/tools/code_query_tools/lsp_server_manager.py (1 hunks)
  • app/modules/intelligence/tools/code_query_tools/lsp_session.py (1 hunks)
  • app/modules/intelligence/tools/jira_tools/get_jira_project_users_tool.py (2 hunks)
  • app/modules/intelligence/tools/tool_service.py (4 hunks)
  • app/modules/parsing/graph_construction/parsing_controller.py (3 hunks)
  • app/modules/parsing/graph_construction/parsing_helper.py (6 hunks)
  • app/modules/projects/projects_service.py (6 hunks)
  • app/modules/utils/gvisor_runner.py (1 hunks)
  • docs/clangd_integration_plan.md (1 hunks)
  • docs/gvisor_mac_setup.md (4 hunks)
  • docs/gvisor_setup.md (4 hunks)
  • requirements.txt (1 hunks)
  • scripts/install_lsp_servers.sh (1 hunks)
🚧 Files skipped from review as they are similar to previous changes (2)
  • docs/gvisor_setup.md
  • app/modules/projects/projects_service.py
🧰 Additional context used
🧬 Code graph analysis (6)
app/modules/intelligence/tools/tool_service.py (2)
app/modules/intelligence/tools/code_query_tools/lsp_query_tool.py (1)
  • lsp_query_tool (467-491)
app/modules/intelligence/tools/web_tools/code_provider_tool.py (1)
  • code_provider_tool (227-256)
app/modules/parsing/graph_construction/parsing_controller.py (4)
app/modules/projects/projects_schema.py (1)
  • ProjectStatusEnum (6-12)
app/modules/repo_manager/repo_manager.py (1)
  • RepoManager (21-489)
app/modules/intelligence/tools/code_query_tools/lsp_server_manager.py (1)
  • is_workspace_indexed (336-390)
app/modules/parsing/graph_construction/parsing_helper.py (1)
  • ParseHelper (34-1495)
app/modules/intelligence/tools/code_query_tools/lsp_query_tool.py (6)
app/modules/intelligence/tools/code_query_tools/lsp_types.py (1)
  • LspQueryRequest (146-162)
app/modules/intelligence/tools/code_query_tools/pygls_client_session.py (2)
  • LanguageServerConfig (30-41)
  • from_command_string (40-41)
app/modules/projects/projects_service.py (1)
  • get_project_from_db_by_id_sync (282-295)
app/modules/repo_manager/repo_manager.py (1)
  • RepoManager (21-489)
app/modules/intelligence/tools/code_query_tools/lsp_server_manager.py (4)
  • get_lsp_server_manager (50-56)
  • register_language (93-99)
  • is_language_registered (101-102)
  • execute_query (1907-2022)
tests/conftest.py (1)
  • get (142-150)
app/modules/intelligence/tools/code_query_tools/lsp_server_manager.py (6)
app/modules/intelligence/tools/code_query_tools/lsp_session.py (5)
  • BaseLspServerSession (22-44)
  • SessionKey (15-19)
  • ensure_started (34-35)
  • send_notification (40-41)
  • send_request (37-38)
app/modules/intelligence/tools/code_query_tools/lsp_types.py (11)
  • HoverResult (69-112)
  • LspMethod (18-25)
  • LspQueryRequest (146-162)
  • LspQueryResponse (165-188)
  • Location (45-57)
  • Position (28-36)
  • SymbolInformation (115-143)
  • from_lsp (30-31)
  • from_lsp (53-57)
  • from_lsp (83-112)
  • from_lsp (117-134)
app/modules/intelligence/tools/code_query_tools/pygls_client_session.py (6)
  • LanguageServerConfig (30-41)
  • PyglsClientSession (44-370)
  • ensure_started (62-228)
  • send_notification (349-354)
  • send_request (342-347)
  • wait_for_indexing_complete (230-340)
app/modules/intelligence/tools/code_query_tools/jedi_session.py (6)
  • JediSession (319-388)
  • ensure_started (334-336)
  • send_notification (372-376)
  • index_workspace (274-316)
  • index_workspace (382-388)
  • send_request (338-370)
app/modules/intelligence/tools/code_query_tools/clangd_session.py (4)
  • ClangdSession (29-79)
  • ensure_started (55-58)
  • send_notification (70-74)
  • send_request (60-68)
app/modules/intelligence/tools/code_query_tools/clangd_helpers.py (1)
  • ensure_compile_commands (320-403)
app/modules/parsing/graph_construction/parsing_helper.py (2)
app/modules/projects/projects_service.py (1)
  • update_project (369-381)
app/modules/projects/projects_schema.py (1)
  • ProjectStatusEnum (6-12)
app/modules/code_provider/github/github_service.py (1)
app/modules/users/user_model.py (1)
  • User (17-34)
🪛 GitHub Check: SonarCloud Code Analysis
scripts/install_lsp_servers.sh

[failure] 102-102: Use '[[' instead of '[' for conditional tests. The '[[' construct is safer and more feature-rich.

See more on https://sonarcloud.io/project/issues?id=getmomentum_momentum-server&issues=AZqRonGawdMfc7aJIehl&open=AZqRonGawdMfc7aJIehl&pullRequest=492


[failure] 212-212: Use '[[' instead of '[' for conditional tests. The '[[' construct is safer and more feature-rich.

See more on https://sonarcloud.io/project/issues?id=getmomentum_momentum-server&issues=AZqRonGawdMfc7aJIehw&open=AZqRonGawdMfc7aJIehw&pullRequest=492


[failure] 112-112: Use '[[' instead of '[' for conditional tests. The '[[' construct is safer and more feature-rich.

See more on https://sonarcloud.io/project/issues?id=getmomentum_momentum-server&issues=AZqRonGawdMfc7aJIehn&open=AZqRonGawdMfc7aJIehn&pullRequest=492


[failure] 190-190: Use '[[' instead of '[' for conditional tests. The '[[' construct is safer and more feature-rich.

See more on https://sonarcloud.io/project/issues?id=getmomentum_momentum-server&issues=AZqRonGawdMfc7aJIeht&open=AZqRonGawdMfc7aJIeht&pullRequest=492


[warning] 140-140: Redirect this error message to stderr (>&2).

See more on https://sonarcloud.io/project/issues?id=getmomentum_momentum-server&issues=AZqRonGZwdMfc7aJIehi&open=AZqRonGZwdMfc7aJIehi&pullRequest=492


[failure] 246-246: Use '[[' instead of '[' for conditional tests. The '[[' construct is safer and more feature-rich.

See more on https://sonarcloud.io/project/issues?id=getmomentum_momentum-server&issues=AZqRonGawdMfc7aJIehz&open=AZqRonGawdMfc7aJIehz&pullRequest=492


[failure] 250-250: Use '[[' instead of '[' for conditional tests. The '[[' construct is safer and more feature-rich.

See more on https://sonarcloud.io/project/issues?id=getmomentum_momentum-server&issues=AZqRonGawdMfc7aJIeh2&open=AZqRonGawdMfc7aJIeh2&pullRequest=492


[failure] 223-223: Use '[[' instead of '[' for conditional tests. The '[[' construct is safer and more feature-rich.

See more on https://sonarcloud.io/project/issues?id=getmomentum_momentum-server&issues=AZqRonGawdMfc7aJIehx&open=AZqRonGawdMfc7aJIehx&pullRequest=492


[failure] 247-247: Use '[[' instead of '[' for conditional tests. The '[[' construct is safer and more feature-rich.

See more on https://sonarcloud.io/project/issues?id=getmomentum_momentum-server&issues=AZqRonGawdMfc7aJIeh1&open=AZqRonGawdMfc7aJIeh1&pullRequest=492


[failure] 111-111: Use '[[' instead of '[' for conditional tests. The '[[' construct is safer and more feature-rich.

See more on https://sonarcloud.io/project/issues?id=getmomentum_momentum-server&issues=AZqRonGawdMfc7aJIehm&open=AZqRonGawdMfc7aJIehm&pullRequest=492


[failure] 185-185: Use '[[' instead of '[' for conditional tests. The '[[' construct is safer and more feature-rich.

See more on https://sonarcloud.io/project/issues?id=getmomentum_momentum-server&issues=AZqRonGawdMfc7aJIehs&open=AZqRonGawdMfc7aJIehs&pullRequest=492

app/modules/intelligence/tools/code_query_tools/lsp_server_manager.py

[failure] 640-640: Don't use datetime.datetime.utcnow to create this datetime object.

See more on https://sonarcloud.io/project/issues?id=getmomentum_momentum-server&issues=AZqRom-DwdMfc7aJIegD&open=AZqRom-DwdMfc7aJIegD&pullRequest=492


[failure] 977-977: Refactor this function to reduce its Cognitive Complexity from 285 to the 15 allowed.

See more on https://sonarcloud.io/project/issues?id=getmomentum_momentum-server&issues=AZqRom-DwdMfc7aJIegM&open=AZqRom-DwdMfc7aJIegM&pullRequest=492

🪛 markdownlint-cli2 (0.18.1)
docs/clangd_integration_plan.md

185-185: Fenced code blocks should have a language specified

(MD040, fenced-code-language)

🪛 OSV Scanner (2.2.4)
requirements.txt

[HIGH] 78-78: python-multipart 0.0.7: Denial of service (DoS) via deformation multipart/form-data boundary

(GHSA-59g5-xgcq-4qw3)

🪛 Ruff (0.14.5)
app/modules/parsing/graph_construction/parsing_controller.py

345-345: Consider moving this statement to an else block

(TRY300)


347-347: Do not catch blind exception: Exception

(BLE001)

app/modules/intelligence/tools/code_query_tools/clangd_helpers.py

101-101: Do not catch blind exception: Exception

(BLE001)


110-110: subprocess call: check for execution of untrusted input

(S603)


139-139: Do not catch blind exception: Exception

(BLE001)


146-146: Do not catch blind exception: Exception

(BLE001)


183-183: subprocess call: check for execution of untrusted input

(S603)


192-192: subprocess call: check for execution of untrusted input

(S603)


213-213: Do not catch blind exception: Exception

(BLE001)


245-245: subprocess call: check for execution of untrusted input

(S603)


266-266: Do not catch blind exception: Exception

(BLE001)


303-307: Consider iterable unpacking instead of concatenation

(RUF005)


314-314: Consider moving this statement to an else block

(TRY300)


315-315: Do not catch blind exception: Exception

(BLE001)


340-340: Local variable workspace_path is assigned to but never used

Remove assignment to unused variable workspace_path

(F841)

app/modules/intelligence/tools/code_query_tools/jedi_session.py

46-46: Consider moving this statement to an else block

(TRY300)


47-47: Do not catch blind exception: Exception

(BLE001)


67-67: Do not catch blind exception: Exception

(BLE001)


95-95: Do not catch blind exception: Exception

(BLE001)


98-98: Consider moving this statement to an else block

(TRY300)


99-99: Do not catch blind exception: Exception

(BLE001)


115-115: Do not catch blind exception: Exception

(BLE001)


140-140: Do not catch blind exception: Exception

(BLE001)


143-143: Consider moving this statement to an else block

(TRY300)


144-144: Do not catch blind exception: Exception

(BLE001)


160-160: Do not catch blind exception: Exception

(BLE001)


194-194: Do not catch blind exception: Exception

(BLE001)


197-197: Consider moving this statement to an else block

(TRY300)


198-198: Do not catch blind exception: Exception

(BLE001)


214-214: Do not catch blind exception: Exception

(BLE001)


239-239: Do not catch blind exception: Exception

(BLE001)


242-242: Consider moving this statement to an else block

(TRY300)


243-243: Do not catch blind exception: Exception

(BLE001)


259-260: try-except-pass detected, consider logging the exception

(S110)


259-259: Do not catch blind exception: Exception

(BLE001)


307-307: Do not catch blind exception: Exception

(BLE001)


315-315: Do not catch blind exception: Exception

(BLE001)


370-370: Avoid specifying long messages outside the exception class

(TRY003)

app/modules/intelligence/tools/code_query_tools/lsp_query_tool.py

101-103: Avoid specifying long messages outside the exception class

(TRY003)


142-142: Do not catch blind exception: Exception

(BLE001)


151-151: Avoid specifying long messages outside the exception class

(TRY003)


153-155: Avoid specifying long messages outside the exception class

(TRY003)


173-175: Avoid specifying long messages outside the exception class

(TRY003)


206-206: Do not catch blind exception: Exception

(BLE001)


252-252: Loop control variable dirs not used within loop body

Rename unused dirs to _dirs

(B007)


326-326: Consider [resolved_command[0], *final_flags] instead of concatenation

Replace with [resolved_command[0], *final_flags]

(RUF005)


387-387: Do not catch blind exception: Exception

(BLE001)

app/modules/intelligence/tools/code_query_tools/lsp_server_manager.py

65-65: Loop control variable session not used within loop body

Rename unused session to _session

(B007)


69-70: try-except-pass detected, consider logging the exception

(S110)


69-69: Do not catch blind exception: Exception

(BLE001)


111-113: Avoid specifying long messages outside the exception class

(TRY003)


138-140: Avoid specifying long messages outside the exception class

(TRY003)


151-153: Avoid specifying long messages outside the exception class

(TRY003)


159-159: Avoid specifying long messages outside the exception class

(TRY003)


328-328: Consider moving this statement to an else block

(TRY300)


386-386: Do not catch blind exception: Exception

(BLE001)


397-397: Unused method argument: opened_files

(ARG002)


451-451: Do not catch blind exception: Exception

(BLE001)


465-465: Do not catch blind exception: Exception

(BLE001)


505-505: Local variable total_time is assigned to but never used

Remove assignment to unused variable total_time

(F841)


552-553: try-except-pass detected, consider logging the exception

(S110)


552-552: Do not catch blind exception: Exception

(BLE001)


617-617: Do not catch blind exception: Exception

(BLE001)


646-647: try-except-pass detected, consider logging the exception

(S110)


646-646: Do not catch blind exception: Exception

(BLE001)


649-649: Consider moving this statement to an else block

(TRY300)


654-654: Do not catch blind exception: Exception

(BLE001)


726-726: Do not catch blind exception: Exception

(BLE001)


730-731: try-except-pass detected, consider logging the exception

(S110)


730-730: Do not catch blind exception: Exception

(BLE001)


744-744: Do not catch blind exception: Exception

(BLE001)


773-773: Do not catch blind exception: Exception

(BLE001)


777-778: try-except-pass detected, consider logging the exception

(S110)


777-777: Do not catch blind exception: Exception

(BLE001)


797-797: Do not catch blind exception: Exception

(BLE001)


830-830: Do not catch blind exception: Exception

(BLE001)


982-982: Unused method argument: config

(ARG002)


1116-1116: Do not catch blind exception: Exception

(BLE001)


1163-1163: Do not catch blind exception: Exception

(BLE001)


1204-1204: Loop control variable file_path not used within loop body

(B007)


1218-1219: try-except-pass detected, consider logging the exception

(S110)


1218-1218: Do not catch blind exception: Exception

(BLE001)


1253-1253: Do not catch blind exception: Exception

(BLE001)


1282-1283: try-except-pass detected, consider logging the exception

(S110)


1282-1282: Do not catch blind exception: Exception

(BLE001)


1354-1354: Do not catch blind exception: Exception

(BLE001)


1372-1372: Loop control variable file_path not used within loop body

(B007)


1383-1384: try-except-pass detected, consider logging the exception

(S110)


1383-1383: Do not catch blind exception: Exception

(BLE001)


1449-1449: Do not catch blind exception: Exception

(BLE001)


1518-1519: try-except-pass detected, consider logging the exception

(S110)


1518-1518: Do not catch blind exception: Exception

(BLE001)


1672-1672: Do not catch blind exception: Exception

(BLE001)


1777-1778: try-except-pass detected, consider logging the exception

(S110)


1777-1777: Do not catch blind exception: Exception

(BLE001)


1849-1850: try-except-pass detected, consider logging the exception

(S110)


1849-1849: Do not catch blind exception: Exception

(BLE001)


1904-1904: Do not catch blind exception: Exception

(BLE001)


2003-2003: Consider moving this statement to an else block

(TRY300)


2009-2009: Redundant exception object included in logging.exception call

(TRY401)


2111-2111: Redundant exception object included in logging.exception call

(TRY401)

app/modules/parsing/graph_construction/parsing_helper.py

365-367: Use logging.exception instead of logging.error

Replace with exception

(TRY400)


371-373: Within an except clause, raise exceptions with raise ... from err or raise ... from None to distinguish them from errors in exception handling

(B904)


371-373: Avoid specifying long messages outside the exception class

(TRY003)


415-417: Use logging.exception instead of logging.error

Replace with exception

(TRY400)


421-423: Abstract raise to an inner function

(TRY301)


421-423: Avoid specifying long messages outside the exception class

(TRY003)


453-456: Use logging.exception instead of logging.error

Replace with exception

(TRY400)


952-952: Do not catch blind exception: Exception

(BLE001)


968-968: Do not catch blind exception: Exception

(BLE001)


982-982: Do not catch blind exception: Exception

(BLE001)

app/modules/intelligence/tools/code_query_tools/clangd_session.py

42-42: Unused method argument: cache_dir

(ARG002)

🪛 Shellcheck (0.11.0)
scripts/install_lsp_servers.sh

[warning] 13-13: PYTHON_BIN appears unused. Verify use (or export if used externally).

(SC2034)


[warning] 151-151: DOWNLOAD_SUCCESS appears unused. Verify use (or export if used externally).

(SC2034)

🔇 Additional comments (32)
docs/gvisor_mac_setup.md (2)

106-111: Table reformatting improves readability and alignment.

The updated table structure with consistent column headers and alignment makes the option comparison clearer at a glance. The content and recommendations remain intact.


1-156: Documentation content is accurate and well-structured.

The gVisor setup guide accurately reflects the constraints (gVisor is Linux-only, fallback to subprocess on Mac), provides clear options with appropriate complexity levels, and includes actionable testing examples. The formatting updates enhance readability without compromising clarity or completeness.

app/modules/code_provider/github/github_service.py (3)

234-234: LGTM! Backward-compatible signature change.

The addition of the optional user_email parameter maintains backward compatibility while enabling email-based user resolution as a fallback.


500-501: LGTM! Proper exception handling.

Explicitly re-raising HTTPException preserves the original status code and error detail, preventing them from being wrapped in a generic 500 error. This improves error clarity for API consumers.


513-515: LGTM! Consistent parameter propagation.

The signature change and parameter forwarding are consistent with the get_repos_for_user update. The changes maintain backward compatibility while enabling email-based user resolution throughout the call chain.

Also applies to: 550-552

app/modules/intelligence/tools/jira_tools/get_jira_project_users_tool.py (1)

33-33: No issues found — the type change is properly coordinated.

The user_id parameter change from int to str has been correctly implemented:

  • Downstream functions accept str: Both check_jira_integration_exists and get_jira_client_for_user in jira_client.py have signatures (user_id: str, db: Session).
  • Caller updated: ToolService passes self.user_id which is typed as str (line 103, tool_service.py).
  • Consistent pattern: All tool factories throughout the codebase uniformly use user_id: str, with no type mismatches.
requirements.txt (1)

80-83: Documentation comment provides helpful context for LSP server setup.

The added note clearly documents that LSP servers require separate installation and directs users to the installation script and respective package managers. This is helpful for contributors unfamiliar with LSP infrastructure.

However, please verify that scripts/install_lsp_servers.sh exists and contains instructions for all mentioned servers (pyright, OmniSharp, clangd, gopls, rust-analyzer).

app/modules/intelligence/agents/chat_agents/tool_helpers.py (5)

3-9: LGTM!

The LSP tool labels dictionary is well-structured with clear, user-friendly descriptions for each LSP operation.


125-126: LGTM!

The guard pattern correctly handles LSP tools and provides appropriate starting messages.


241-242: LGTM!

The completion message for LSP tools is clear and consistent with the starting message pattern.


630-666: LSP result structures verified as correct.

The verification confirms that LSP tool implementations return results matching the expected structure. The LspQueryResponse includes success, method, status_messages, locations, hover, symbols, and error fields, and the code in lines 630-666 correctly handles all these fields with appropriate defensive checks and data extraction.


459-472: Approve — LSP argument structure verified. LspQueryToolInput (app/modules/intelligence/tools/code_query_tools/lsp_query_tool.py) defines project_id, language, uri/path, line, and character. LspQueryTool._execute_async/_run/_arun accept and validate those fields, with position validation requiring both line and character for DEFINITION/REFERENCES/HOVER. The argument formatting in app/modules/intelligence/agents/chat_agents/tool_helpers.py (lines 459–472) therefore matches the actual LSP tool arguments.
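For reference, the input shape being verified looks roughly like this (field names taken from the review above; the method field and validator details are illustrative, not the actual implementation):

```python
from typing import Optional
from pydantic import BaseModel, model_validator


class LspQueryToolInput(BaseModel):
    """Illustrative input model mirroring the fields described above."""

    project_id: str
    language: str
    method: str                      # e.g. "definition", "references", "hover", "workspace_symbol"
    path: Optional[str] = None       # repository-relative file path
    line: Optional[int] = None       # zero-based, per LSP convention
    character: Optional[int] = None  # zero-based, per LSP convention

    @model_validator(mode="after")
    def _require_position_for_positional_methods(self):
        # DEFINITION/REFERENCES/HOVER need both line and character
        if self.method in ("definition", "references", "hover"):
            if self.line is None or self.character is None:
                raise ValueError("line and character are required for this method")
        return self
```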

app/modules/parsing/graph_construction/parsing_helper.py (1)

1063-1152: RepoManager worktree integration looks correct and self-contained

The _copy_repo_to_repo_manager, _initialize_base_repo, and _create_worktree flow (initialize/copy base repo, then create per-ref worktrees and register them) is cohesive and keeps all RepoManager-specific behavior behind a single helper.

The fallback path in _create_worktree that copies files and initializes a standalone repo when git worktree add fails is a reasonable safety net for missing refs or synthetic snapshots.
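The pattern described above amounts to roughly the following (a rough sketch; the helper name and signature are hypothetical, not the actual parsing_helper code):

```python
import shutil
import subprocess
from pathlib import Path


def create_worktree_with_fallback(base_repo: Path, worktree_path: Path, ref: str) -> None:
    """Try `git worktree add`; fall back to a standalone copy when the ref is missing."""
    try:
        subprocess.run(
            ["git", "-C", str(base_repo), "worktree", "add", str(worktree_path), ref],
            check=True,
            capture_output=True,
        )
    except subprocess.CalledProcessError:
        # Fallback: copy the working tree and initialize a standalone repo,
        # e.g. for synthetic snapshots that have no corresponding git ref.
        shutil.copytree(base_repo, worktree_path, ignore=shutil.ignore_patterns(".git"))
        subprocess.run(["git", "-C", str(worktree_path), "init"], check=True)
```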

app/modules/intelligence/tools/code_query_tools/lsp_session.py (1)

14-44: Base LSP session abstraction looks clean and sufficient

SessionKey as a frozen dataclass and BaseLspServerSession with ensure_started, send_request, send_notification, and shutdown form a minimal, clear interface that other sessions (clangd, jedi, etc.) can implement consistently.
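For context, the interface amounts to roughly the following condensed sketch (method names from the review; parameter types are assumed):

```python
from abc import ABC, abstractmethod
from dataclasses import dataclass
from typing import Any, Dict, Optional


@dataclass(frozen=True)
class SessionKey:
    project_id: str
    language: str


class BaseLspServerSession(ABC):
    @abstractmethod
    async def ensure_started(self) -> None: ...

    @abstractmethod
    async def send_request(self, method: str, params: Dict[str, Any]) -> Optional[Any]: ...

    @abstractmethod
    async def send_notification(self, method: str, params: Dict[str, Any]) -> None: ...

    @abstractmethod
    async def shutdown(self) -> None: ...
```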

No issues from this file.

app/modules/intelligence/tools/code_query_tools/lsp_query_tool.py (8)

124-147: LGTM: Defensive initialization pattern.

The exception handling during RepoManager initialization is appropriate—it logs failures without blocking tool creation, allowing graceful degradation when repository management is unavailable.


148-156: LGTM: Proper authorization check.

The method correctly validates project ownership before allowing LSP queries, preventing unauthorized access to repository data.


158-181: LGTM: Path traversal protection in place.

The method properly validates that paths are repository-relative, rejecting absolute paths and parent directory references before constructing file URIs. This prevents accessing files outside the intended workspace.
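The guard boils down to something like the following (a simplified sketch; the helper name and exact checks are illustrative, not the actual method):

```python
from pathlib import Path


def to_safe_file_uri(workspace_root: str, relative_path: str) -> str:
    """Reject absolute paths and parent-directory escapes, then build a file:// URI."""
    candidate = Path(relative_path)
    if candidate.is_absolute() or ".." in candidate.parts:
        raise ValueError(f"Path must be repository-relative: {relative_path}")
    resolved = (Path(workspace_root) / candidate).resolve()
    # Double-check the resolved path is still inside the workspace.
    if not resolved.is_relative_to(Path(workspace_root).resolve()):
        raise ValueError(f"Path escapes the workspace: {relative_path}")
    return resolved.as_uri()
```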


183-212: LGTM: Flexible language server configuration.

The environment variable override system provides good flexibility for customizing language server commands. The migration of PYTHON_LSP_COMMAND to the new prefix maintains backward compatibility.


215-283: LGTM: Thorough executable resolution.

The search logic comprehensively checks multiple locations (PATH, venv, .lsp_binaries) and handles platform-specific variations for OmniSharp (capital vs lowercase, .exe extension).


352-418: LGTM: Robust query execution flow.

The method properly validates preconditions (repo manager, project details, worktree existence), constructs the LSP request with normalized parameters, and delegates to the server manager. Defensive exception handling for metadata updates is appropriate.


420-464: LGTM: Standard async/sync wrapper pattern.

The implementation follows the established pattern from other tools (e.g., jira_tools/get_jira_project_users_tool.py), providing both synchronous and asynchronous execution paths.


467-491: LGTM: Guarded tool creation.

The factory correctly prevents tool registration when repository management is unavailable, mirroring the bash_command_tool pattern and ensuring the tool is only available when it can function properly.

app/modules/intelligence/tools/code_query_tools/lsp_server_manager.py (10)

74-114: LGTM: Clean manager initialization and registration API.

The initialization properly sets up session tracking structures, and the registration API provides clear methods for configuring language servers. Error handling in _get_language_config correctly chains the KeyError for debugging.


127-159: LGTM: Method-specific parameter validation.

The parameter building correctly enforces LSP protocol requirements for each method type, with clear error messages when required parameters are missing. This complements the input validation in LspQueryToolInput.
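For reference, the LSP protocol's per-method requirements amount to roughly this (an illustrative sketch, not the actual builder):

```python
from typing import Any, Dict, Optional


def build_params(
    method: str, uri: str, line: Optional[int], character: Optional[int], query: Optional[str]
) -> Dict[str, Any]:
    """Map a request onto LSP params; positional methods need both line and character."""
    if method in ("textDocument/definition", "textDocument/references", "textDocument/hover"):
        if line is None or character is None:
            raise ValueError(f"{method} requires both line and character")
        params: Dict[str, Any] = {
            "textDocument": {"uri": uri},
            "position": {"line": line, "character": character},
        }
        if method == "textDocument/references":
            params["context"] = {"includeDeclaration": True}
        return params
    if method == "workspace/symbol":
        return {"query": query or ""}
    raise ValueError(f"Unsupported LSP method: {method}")
```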


161-309: LGTM: Flexible normalization for multiple LSP implementations.

The normalization methods handle both standard LSP and Jedi-specific formats, enabling compatibility across different language server implementations. The approximation in line 187 (character + 10) is clearly documented as an estimate when the exact end position isn't available.


336-390: LGTM: Robust indexing status check.

The method uses a two-tier approach—checking both in-memory state and persistent marker files—to reliably determine if a workspace has been indexed. Defensive exception handling for file I/O is appropriate.
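Conceptually the check looks like this (illustrative sketch; the `.indexed` marker name and "completed" status match what the warmup code quoted later in this review writes):

```python
import json
from pathlib import Path
from typing import Set, Tuple


def is_workspace_indexed(key: Tuple[str, str], warmed_sessions: Set[Tuple[str, str]], cache_dir: Path) -> bool:
    """Tier 1: in-memory warmup state. Tier 2: the persistent .indexed marker on disk."""
    if key in warmed_sessions:
        return True
    marker = cache_dir / ".indexed"
    try:
        data = json.loads(marker.read_text(encoding="utf-8"))
        return data.get("status") == "completed"
    except (OSError, json.JSONDecodeError):
        return False
```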


392-657: Approve complex but necessary Pyright CLI integration.

The method implements comprehensive Pyright CLI execution with progress tracking, cache management, and output parsing. While complex, the functionality is appropriate for providing reliable indexing completion signals.


658-917: LGTM: Comprehensive cache management.

The cache configuration properly handles per-language requirements with careful symlink management, fallback handling for broken links, and integration with build system detection for C/C++. The defensive exception handling ensures graceful degradation when filesystem operations fail.


919-976: LGTM: Clean session factory pattern.

The method appropriately creates language-specific session types (JediSession for Python, ClangdSession for C/C++, PyglsClientSession for others) with proper cache configuration and startup timing.
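The dispatch reads roughly as follows (constructor arguments are illustrative; the concrete session classes come from the sibling jedi_session, clangd_session, and pygls client modules):

```python
def _create_session(self, language: str, workspace_root: str) -> BaseLspServerSession:
    """Pick a concrete session type per language (illustrative)."""
    if language == "python":
        return JediSession(workspace_root)      # Python: Jedi-backed session
    if language in ("c", "cpp"):
        return ClangdSession(workspace_root)    # C/C++: clangd session
    return PyglsClientSession(workspace_root)   # default: generic pygls LSP client (e.g. OmniSharp for C#)
```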


1907-2022: LGTM: Well-structured query execution.

The method properly handles session lifecycle, document opening/closing, and result normalization. The try-finally ensures documents are closed even on errors.
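The lifecycle follows the standard LSP open/query/close pattern, roughly (sketch only):

```python
from typing import Any, Dict, Optional


async def query_document(session, uri: str, text: str, method: str, params: Dict[str, Any]) -> Optional[Any]:
    """Open the document, run the query, and always close it again."""
    await session.send_notification(
        "textDocument/didOpen",
        {"textDocument": {"uri": uri, "languageId": "python", "version": 1, "text": text}},
    )
    try:
        return await session.send_request(method, params)
    finally:
        # Close even if the request fails, so the server does not accumulate open documents.
        await session.send_notification("textDocument/didClose", {"textDocument": {"uri": uri}})
```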


2024-2118: LGTM: Clean workspace indexing API.

The method provides a clear public API for pre-indexing workspaces during parsing, reusing the existing session warmup logic and properly tracking results per language.


545-545: Incomplete refactor suggestion - timezone import required but not mentioned.

The suggested replacement of datetime.utcnow() with datetime.now(timezone.utc) requires importing timezone from the datetime module. However, the file currently only imports the datetime class (at lines 409, 1198, and 1799), not timezone. The fix requires updating these import statements to include timezone before the replacement can be applied.

Additionally, the project's Python version requirement could not be verified to confirm whether the deprecation applies to the target version.
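For clarity, the full fix the comment is pointing at would look like this (assuming the timestamps do not need to stay naive):

```python
from datetime import datetime, timezone

# Before (naive; datetime.utcnow() is deprecated as of Python 3.12):
timestamp = datetime.utcnow().isoformat()

# After (timezone-aware UTC):
timestamp = datetime.now(timezone.utc).isoformat()
```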

Comment on lines +272 to +307
def create_compile_flags_txt(
    workspace_root: str, language: str = "c"
) -> Optional[Path]:
    """
    Create a basic compile_flags.txt file as a fallback.

    This is a simple alternative to compile_commands.json that works
    for projects with consistent compiler flags.

    Returns:
        Path to created compile_flags.txt, or None if failed
    """
    workspace_path = Path(workspace_root)
    compile_flags = workspace_path / "compile_flags.txt"

    # Don't overwrite existing file
    if compile_flags.exists():
        logger.info(f"[clangd] compile_flags.txt already exists at {compile_flags}")
        return compile_flags

    # Detect include directories
    include_dirs = []
    for include_dir in ["include", "inc", "src", "."]:
        include_path = workspace_path / include_dir
        if include_path.exists() and include_path.is_dir():
            include_dirs.append(f"-I{include_dir}")

    # Set C standard based on language
    std_flag = "-std=c11" if language == "c" else "-std=c++17"

    # Create basic compile_flags.txt
    flags = [
        f"-x{language}",  # Language type
        std_flag,
        "-Wall",
    ] + include_dirs

⚠️ Potential issue | 🟠 Major

Fix -x language flag for C++ in compile_flags.txt

create_compile_flags_txt currently does:

std_flag = "-std=c11" if language == "c" else "-std=c++17"
flags = [
    f"-x{language}",  # Language type
    std_flag,
    "-Wall",
] + include_dirs

If callers pass "cpp" (as suggested in the docstring), this produces -xcpp, which isn’t a valid Clang/GCC language kind and will likely break clangd’s parsing when falling back to compile_flags.txt.

Consider normalizing the language for the -x flag:

-    # Set C standard based on language
-    std_flag = "-std=c11" if language == "c" else "-std=c++17"
+    # Normalize language and set standard
+    norm_lang = "c" if language == "c" else "c++"
+    std_flag = "-std=c11" if norm_lang == "c" else "-std=c++17"
@@
-    flags = [
-        f"-x{language}",  # Language type
+    flags = [
+        f"-x{norm_lang}",  # Language type
         std_flag,
         "-Wall",
     ] + include_dirs

so clangd sees the correct language.

🧰 Tools
🪛 Ruff (0.14.5)

303-307: Consider iterable unpacking instead of concatenation

(RUF005)

🤖 Prompt for AI Agents
In app/modules/intelligence/tools/code_query_tools/clangd_helpers.py around
lines 272-307, the function sets the language flag using f"-x{language}" which
yields invalid flags like "-xcpp"; normalize incoming language values so the -x
flag uses recognized Clang/GCC kinds (use "-xc" for C and "-xc++" for C++), map
accepted inputs like "c", "cpp", "c++" accordingly, keep the std_flag logic (c
=> -std=c11, c++ => -std=c++17), and then build flags using the normalized -x
value plus the std and other flags before writing compile_flags.txt.

Comment on lines +59 to +71
def reset_lsp_server_manager() -> None:
    """Reset the global LSP server manager (useful for testing or cleanup)."""
    global _lsp_server_manager
    if _lsp_server_manager is not None:
        logger.info("LspServerManager: Resetting global manager instance")
        # Shutdown all sessions before resetting
        for session in _lsp_server_manager._sessions.values():
            try:
                # Note: This is a best-effort cleanup. Sessions may need async cleanup.
                pass
            except Exception:
                pass
    _lsp_server_manager = None

⚠️ Potential issue | 🟡 Minor

Incomplete session cleanup in reset function.

The loop over sessions doesn't actually perform shutdown—it just passes. While the comment acknowledges this is "best-effort" and sessions "may need async cleanup," this could leak resources when the manager is reset.

Consider adding async cleanup support or at least attempting synchronous shutdown. Would you like me to propose an implementation that properly cleans up sessions during reset?

🧰 Tools
🪛 Ruff (0.14.5)

65-65: Loop control variable session not used within loop body

Rename unused session to _session

(B007)


69-70: try-except-pass detected, consider logging the exception

(S110)


69-69: Do not catch blind exception: Exception

(BLE001)

🤖 Prompt for AI Agents
In app/modules/intelligence/tools/code_query_tools/lsp_server_manager.py around
lines 59 to 71, the reset_lsp_server_manager function currently iterates
sessions but does nothing, risking resource leaks; update it to perform an
actual shutdown for each session: if the session object exposes a synchronous
close/shutdown method call it and catch/log exceptions; if shutdown is async,
schedule and run it safely (use asyncio.get_event_loop().run_until_complete or
asyncio.run when no running loop) with proper exception handling and timeouts;
after attempting shutdown for all sessions clear the _sessions dict and then set
_lsp_server_manager to None so resources are released and repeated resets are
safe.
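One possible shape for that cleanup, assuming each session exposes the async shutdown() declared on BaseLspServerSession (a sketch, not the committed fix):

```python
import asyncio
import logging

logger = logging.getLogger(__name__)


def reset_lsp_server_manager() -> None:
    """Reset the global manager, attempting a real shutdown of each session first."""
    global _lsp_server_manager
    if _lsp_server_manager is None:
        return
    logger.info("LspServerManager: Resetting global manager instance")
    for key, session in list(_lsp_server_manager._sessions.items()):
        try:
            try:
                loop = asyncio.get_running_loop()
            except RuntimeError:
                loop = None
            if loop is None:
                # No running loop: drive the coroutine to completion with a timeout.
                asyncio.run(asyncio.wait_for(session.shutdown(), timeout=5.0))
            else:
                # Called from async context: schedule shutdown instead of blocking.
                loop.create_task(session.shutdown())
        except Exception as exc:
            logger.warning("Failed to shut down LSP session %s: %s", key, exc)
    _lsp_server_manager._sessions.clear()
    _lsp_server_manager = None
```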


# Read output line by line to track progress
files_analyzed = 0
total_time = 0.0

🛠️ Refactor suggestion | 🟠 Major

Remove unused variable.

The total_time variable is assigned but never used. Apply this diff:

-files_analyzed = 0
-total_time = 0.0
-output_lines = []
+files_analyzed = 0
+output_lines = []

Committable suggestion skipped: line range outside the PR's diff.

🧰 Tools
🪛 Ruff (0.14.5)

505-505: Local variable total_time is assigned to but never used

Remove assignment to unused variable total_time

(F841)

🤖 Prompt for AI Agents
In app/modules/intelligence/tools/code_query_tools/lsp_server_manager.py around
line 505, the local variable assignment "total_time = 0.0" is unused; remove
that line to eliminate the dead variable and ensure no other code later
references it (if it is needed, replace with a used timing mechanism or store it
appropriately), then run tests/lint to confirm no references remain.

Comment on lines +977 to +1906
async def _warmup_session(
self,
request: LspQueryRequest,
workspace_root: str,
session: BaseLspServerSession,
config: LanguageServerConfig,
status_messages: List[str],
) -> None:
key = self._session_key(request.project_id, request.language)
language = request.language

if key in self._warmed_sessions:
logger.debug(f"Session {key} already warmed up, skipping warmup")
return

lock = self._warmup_locks.setdefault(key, asyncio.Lock())
async with lock:
if key in self._warmed_sessions:
return

# Map language to file extensions
language_extensions = {
"python": ["*.py"],
"typescript": ["*.ts", "*.tsx"],
"javascript": ["*.js", "*.jsx"],
"go": ["*.go"],
"java": ["*.java"],
"rust": ["*.rs"],
"c": ["*.c", "*.h"],
"cpp": ["*.cpp", "*.cc", "*.cxx", "*.hpp", "*.h"],
"csharp": ["*.cs"],
"ruby": ["*.rb"],
"php": ["*.php"],
}

extensions = language_extensions.get(language, [])
if not extensions:
# Unknown language, skip indexing
self._warmed_sessions.add(key)
return

# Collect all matching files
all_files = []
for ext in extensions:
all_files.extend(Path(workspace_root).rglob(ext))

if not all_files:
self._warmed_sessions.add(key)
return

status_messages.append(
f"Indexing {len(all_files)} {language} files for workspace analysis"
)

# Pyright doesn't actually "index" - it analyzes files on-demand
# To trigger analysis, we need to:
# 1. Open files with didOpen
# 2. Query files with documentSymbol to trigger analysis
# 3. Then workspace/symbol can return results

opened_files = []
opened_file_data = [] # Store (uri, file_path) for verification

# Step 1: For Jedi, we don't need to open files - it works directly with file paths
# For other language servers, open files with didOpen
if not (language == "python" and isinstance(session, JediSession)):
logger.info(
f"[LSP] Opening {len(all_files)} {language} files and triggering analysis..."
)
# Open all files for non-Jedi language servers
for idx, file_path in enumerate(all_files, start=1):
try:
text = file_path.read_text(encoding="utf-8")
except OSError:
continue

uri = file_path.as_uri()
await session.send_notification(
"textDocument/didOpen",
{
"textDocument": {
"uri": uri,
"languageId": language,
"version": 1,
"text": text,
}
},
)
opened_files.append(uri)
opened_file_data.append((uri, file_path))

if idx % 50 == 0:
await asyncio.sleep(0.1) # Small delay every 50 files
logger.info(f"[LSP] Opened {idx}/{len(all_files)} files...")

logger.info(
f"[LSP] Opened {len(opened_files)} files. Triggering analysis by querying files..."
)
status_messages.append(f"Opened {len(opened_files)} {language} files")
else:
# For Jedi, just prepare file data for indexing
for file_path in all_files:
uri = file_path.as_uri()
opened_file_data.append((uri, file_path))

# Step 2: Index workspace using Jedi for Python
if language == "python" and isinstance(session, JediSession):
status_messages.append("Indexing workspace with Jedi...")
logger.info(
"[LSP] Indexing Python files with Jedi for workspace symbol search..."
)

# Get all Python file paths
python_file_paths = [str(f) for f in all_files]

# Index workspace (this will cache workspace symbols)
await session.index_workspace(python_file_paths)

logger.info(f"[LSP] Jedi indexed {len(python_file_paths)} Python files")
status_messages.append(
f"Indexed {len(python_file_paths)} Python files with Jedi"
)

# Verify indexing by checking workspace symbols
try:
from app.modules.intelligence.tools.code_query_tools.lsp_types import (
LspMethod,
)

ws_result = await session.send_request(
LspMethod.WORKSPACE_SYMBOL, {"query": ""}
)
symbol_count = len(ws_result) if isinstance(ws_result, list) else 0
logger.info(
f"[LSP] Jedi workspace symbol search returned {symbol_count} symbols"
)
status_messages.append(
f"Workspace symbol search: {symbol_count} symbols found"
)
except Exception as exc:
logger.warning(f"[LSP] Failed to verify Jedi indexing: {exc}")

# Mark as warmed up - Jedi doesn't need file opening
verification_passed = True
opened_files = [] # Clear opened files since Jedi doesn't need them
elif language == "python":
# Fallback for non-Jedi Python sessions (shouldn't happen)
status_messages.append("Warning: Python session is not using Jedi")
verification_passed = False

from app.modules.intelligence.tools.code_query_tools.lsp_types import (
LspMethod,
)

# Query a subset of files to trigger analysis
# We don't need to query all files - querying some will trigger Pyright to analyze them
files_to_query = min(20, len(opened_file_data)) # Query up to 20 files
analyzed_count = 0
total_symbols_found = 0

for idx, (uri, file_path) in enumerate(
opened_file_data[:files_to_query], 1
):
try:
# Query documentSymbol to trigger analysis
doc_symbol_result = await session.send_request(
LspMethod.DOCUMENT_SYMBOL, {"textDocument": {"uri": uri}}
)

symbol_count = (
len(doc_symbol_result)
if isinstance(doc_symbol_result, list)
else 0
)

if symbol_count > 0:
analyzed_count += 1
total_symbols_found += symbol_count

# Small delay to avoid overwhelming Pyright
if idx % 10 == 0:
await asyncio.sleep(0.5)
logger.info(
f"[LSP] Queried {idx}/{files_to_query} files, "
f"{analyzed_count} analyzed, {total_symbols_found} symbols found"
)
except Exception as exc:
logger.debug(
f"[LSP] Error querying documentSymbol for {file_path.name}: {exc}"
)

logger.info(
f"[LSP] Triggered analysis on {files_to_query} files: "
f"{analyzed_count} files analyzed, {total_symbols_found} total symbols"
)
status_messages.append(
f"Triggered analysis: {analyzed_count}/{files_to_query} files analyzed "
f"({total_symbols_found} symbols)"
)

# Step 4: Poll to wait for Pyright to finish analyzing files
# We'll poll documentSymbol on sample files until they're all analyzed
if analyzed_count > 0:
status_messages.append(
"Waiting for Pyright to complete analysis of all files..."
)
logger.info(
"[LSP] Polling to wait for Pyright to complete analysis..."
)

# Poll sample files until they're all analyzed
max_poll_attempts = (
30 # Poll for up to 30 attempts (60 seconds with 2s interval)
)
poll_interval = 2.0 # 2 seconds between polls
sample_size = min(10, len(opened_file_data))
analyzed_in_sample = 0 # Initialize for use in else clause

# Get cache directory for incremental updates
cache_dir = self._get_lsp_cache_dir(workspace_root, language)
index_marker = cache_dir / ".indexed"
from datetime import datetime

for poll_attempt in range(max_poll_attempts):
analyzed_in_sample = 0
total_symbols_in_sample = 0

for uri, file_path in opened_file_data[:sample_size]:
try:
doc_result = await session.send_request(
LspMethod.DOCUMENT_SYMBOL,
{"textDocument": {"uri": uri}},
)
symbol_count = (
len(doc_result)
if isinstance(doc_result, list)
else 0
)
if symbol_count > 0:
analyzed_in_sample += 1
total_symbols_in_sample += symbol_count
except Exception:
pass

# Update cache marker file with progress information
try:
# Check Pyright cache file count
pyright_cache_dir = cache_dir / "pyright"
cache_file_count = 0
if pyright_cache_dir.exists():
cache_files = list(pyright_cache_dir.rglob("*"))
cache_file_count = len(
[f for f in cache_files if f.is_file()]
)

# Write progress to marker file
progress_data = {
"status": "indexing",
"progress": {
"files_analyzed": analyzed_in_sample,
"sample_size": sample_size,
"total_symbols": total_symbols_in_sample,
"poll_attempt": poll_attempt + 1,
"max_polls": max_poll_attempts,
"elapsed_seconds": poll_attempt * poll_interval,
"cache_files": cache_file_count,
},
"timestamp": datetime.utcnow().isoformat(),
"workspace": workspace_root,
}

import json

index_marker.write_text(
json.dumps(progress_data, indent=2), encoding="utf-8"
)
except Exception as exc:
logger.debug(f"[LSP] Failed to update cache marker: {exc}")

# If most files in sample are analyzed, we're done
if analyzed_in_sample >= (sample_size * 0.8): # 80% threshold
logger.info(
f"[LSP] Analysis complete: {analyzed_in_sample}/{sample_size} files analyzed "
f"after {poll_attempt + 1} polls ({poll_attempt * poll_interval:.1f}s)"
)
status_messages.append(
f"Analysis complete: {analyzed_in_sample}/{sample_size} files analyzed "
f"({total_symbols_in_sample} symbols)"
)
break

if (
poll_attempt % 5 == 0 and poll_attempt > 0
): # Log every 5 polls (every 10 seconds)
# Get cache file count for logging
cache_file_count_log = 0
try:
pyright_cache_dir_log = cache_dir / "pyright"
if pyright_cache_dir_log.exists():
cache_files_log = list(
pyright_cache_dir_log.rglob("*")
)
cache_file_count_log = len(
[f for f in cache_files_log if f.is_file()]
)
except Exception:
pass

logger.info(
f"[LSP] Polling progress: {poll_attempt + 1}/{max_poll_attempts} polls, "
f"{analyzed_in_sample}/{sample_size} files analyzed "
f"({poll_attempt * poll_interval:.1f}s elapsed, {cache_file_count_log} cache files)"
)
status_messages.append(
f"Analysis in progress: {analyzed_in_sample}/{sample_size} files analyzed..."
)

await asyncio.sleep(poll_interval)
else:
# Polling completed without reaching threshold
logger.warning(
f"[LSP] Analysis polling completed: {analyzed_in_sample}/{sample_size} files analyzed "
f"after {max_poll_attempts} polls"
)
status_messages.append(
f"Analysis polling completed: {analyzed_in_sample}/{sample_size} files analyzed"
)
else:
logger.warning(
"[LSP] No files were analyzed. Pyright might not be responding to queries."
)
status_messages.append(
"Warning: No files analyzed, but continuing..."
)
await asyncio.sleep(5.0)

# Pyright doesn't have a traditional "indexing complete" event
# Since we've already queried files to trigger analysis, we can verify
# by checking if workspace/symbol returns results or if files are analyzed

# If we already analyzed files successfully, we can consider indexing working
# even if workspace/symbol doesn't return results immediately
verification_passed = False

if language == "python" and len(opened_file_data) > 0:
# Check if we successfully analyzed files
# If we did, Pyright is working and can answer queries
status_messages.append("Verifying Pyright is ready for queries...")
logger.info(
"[LSP] Verifying Pyright can answer workspace/symbol queries..."
)

# Try workspace/symbol a few times to see if it returns results
from app.modules.intelligence.tools.code_query_tools.lsp_types import (
LspMethod,
)

workspace_symbol_count = 0
for attempt in range(3):
try:
ws_result = await session.send_request(
LspMethod.WORKSPACE_SYMBOL, {"query": ""}
)
workspace_symbol_count = (
len(ws_result) if isinstance(ws_result, list) else 0
)
if workspace_symbol_count > 0:
logger.info(
f"[LSP] workspace/symbol returned {workspace_symbol_count} symbols "
f"on attempt {attempt + 1}"
)
verification_passed = True
status_messages.append(
f"Workspace symbol search working: {workspace_symbol_count} symbols found"
)
break
await asyncio.sleep(2.0) # Wait between attempts
except Exception as exc:
logger.debug(
f"[LSP] workspace/symbol query failed (attempt {attempt + 1}): {exc}"
)
await asyncio.sleep(2.0)

# If workspace/symbol doesn't work, verify by checking documentSymbol on files
# If files can be analyzed, Pyright is working
if not verification_passed:
logger.info(
"[LSP] workspace/symbol returned 0, verifying files can be analyzed..."
)
status_messages.append("Verifying files can be analyzed...")

# Check a few files to see if they can be analyzed
files_to_check = min(5, len(opened_file_data))
analyzable_count = 0

for uri, file_path in opened_file_data[:files_to_check]:
try:
doc_result = await session.send_request(
LspMethod.DOCUMENT_SYMBOL,
{"textDocument": {"uri": uri}},
)
symbol_count = (
len(doc_result) if isinstance(doc_result, list) else 0
)
if symbol_count > 0:
analyzable_count += 1
except Exception:
pass

# If we can analyze files, Pyright is working
# This is sufficient for our use case
if analyzable_count >= (files_to_check * 0.6): # 60% threshold
verification_passed = True
logger.info(
f"[LSP] Files are analyzable: {analyzable_count}/{files_to_check}. "
"Pyright is working correctly."
)
status_messages.append(
f"Verification passed: {analyzable_count}/{files_to_check} files can be analyzed"
)
else:
logger.warning(
f"[LSP] Only {analyzable_count}/{files_to_check} files are analyzable. "
"Pyright might not be working correctly."
)
status_messages.append(
f"Warning: Only {analyzable_count}/{files_to_check} files are analyzable"
)
elif language in ("c", "cpp") and isinstance(session, ClangdSession):
# For clangd, we need to query files to trigger indexing
# clangd builds its index lazily when files are queried
logger.info(
f"[LSP] Triggering clangd indexing by querying {len(opened_file_data)} files..."
)
status_messages.append("Querying files to trigger clangd indexing...")

# Query a subset of files to trigger clangd to analyze them
# We don't need to query all files - querying some will trigger clangd to index them
files_to_query = min(50, len(opened_file_data)) # Query up to 50 files
analyzed_count = 0
total_symbols_found = 0

from app.modules.intelligence.tools.code_query_tools.lsp_types import (
LspMethod,
)

for idx, (uri, file_path) in enumerate(
opened_file_data[:files_to_query], 1
):
try:
# Query documentSymbol to trigger clangd to analyze the file
doc_symbol_result = await session.send_request(
LspMethod.DOCUMENT_SYMBOL, {"textDocument": {"uri": uri}}
)

symbol_count = (
len(doc_symbol_result)
if isinstance(doc_symbol_result, list)
else 0
)

if symbol_count > 0:
analyzed_count += 1
total_symbols_found += symbol_count

# Small delay to avoid overwhelming clangd
if idx % 10 == 0:
await asyncio.sleep(0.5)
logger.info(
f"[LSP] Queried {idx}/{files_to_query} files, "
f"{analyzed_count} analyzed, {total_symbols_found} symbols found"
)
except Exception as exc:
logger.debug(
f"[LSP] Error querying documentSymbol for {file_path.name}: {exc}"
)

logger.info(
f"[LSP] Triggered clangd indexing on {files_to_query} files: "
f"{analyzed_count} files analyzed, {total_symbols_found} total symbols"
)
status_messages.append(
f"Triggered indexing: {analyzed_count}/{files_to_query} files analyzed "
f"({total_symbols_found} symbols)"
)

# Check if compile_commands.json exists (clangd works much better with it)
cache_dir = self._get_lsp_cache_dir(workspace_root, language)
workspace_path = Path(workspace_root)
compile_commands_path = workspace_path / "compile_commands.json"
compile_flags_path = workspace_path / "compile_flags.txt"
has_compile_commands = compile_commands_path.exists()
has_compile_flags = compile_flags_path.exists()

# Initialize index_file_count for use after the if/else block
index_file_count = 0

# Clangd won't build a comprehensive background index without compile_commands.json
# Without it, clangd only indexes opened files, not the entire project
if not has_compile_commands and not has_compile_flags:
logger.info(
"[LSP] No compile_commands.json or compile_flags.txt found. "
"Clangd will only index opened files, not build a full background index. "
"Skipping native index wait - clangd is ready for queries."
)
status_messages.append(
"Note: No compile_commands.json - clangd will index files on-demand"
)
verification_passed = True
# Don't wait for native index - it won't be built without compile_commands.json
else:
# Wait for clangd to build its native index
# Clangd builds its index in the background after files are queried
# The native index is what makes clangd fast
status_messages.append(
"Waiting for clangd to build native index..."
)
logger.info(
"[LSP] Waiting for clangd to build native index (this makes the LSP server faster)..."
)

# Check clangd's native index location
# Clangd stores index in .cache/clangd/index in workspace root
# We have a symlink pointing to our cache directory
clangd_index_in_workspace = (
workspace_path / ".cache" / "clangd" / "index"
)
clangd_cache_target = (
cache_dir / "clangd_native" / "clangd" / "index"
)

# Check both locations (symlink target and direct path)
clangd_index_dir = clangd_cache_target
if (
clangd_index_in_workspace.exists()
and clangd_index_in_workspace.is_symlink()
):
try:
resolved = clangd_index_in_workspace.resolve()
if resolved.exists():
clangd_index_dir = resolved
except Exception:
pass

# Poll for clangd index files
# With compile_commands.json, clangd should build a comprehensive background index
max_wait_attempts = (
60 # Wait up to 60 attempts (3 minutes with 3s interval)
)
wait_interval = 3.0 # 3 seconds between checks
index_file_count = 0
last_index_file_count = 0
stable_count = (
0 # Count how many times the file count stayed the same
)

for wait_attempt in range(max_wait_attempts):
# Check if clangd has created index files
if clangd_index_dir.exists():
index_files = list(clangd_index_dir.rglob("*"))
index_files = [f for f in index_files if f.is_file()]
index_file_count = len(index_files)

if index_file_count > 0:
# Check if index is still growing
if index_file_count == last_index_file_count:
stable_count += 1
else:
stable_count = 0

last_index_file_count = index_file_count

# Log progress periodically
if wait_attempt % 10 == 0 or index_file_count > 0:
logger.info(
f"[LSP] Clangd index building: {index_file_count} index files "
f"(attempt {wait_attempt + 1}/{max_wait_attempts}, stable for {stable_count} checks)"
)

# If index count is stable for 3 checks (9 seconds), indexing is likely complete
if stable_count >= 3 and wait_attempt >= 10:
verification_passed = True
status_messages.append(
f"Clangd native index complete: {index_file_count} index files"
)
logger.info(
f"[LSP] Clangd native index appears complete: {index_file_count} index files "
f"(stable for {stable_count * wait_interval:.1f}s)"
)
break
else:
# Index directory doesn't exist yet, clangd might not have started
if wait_attempt < 10: # First 30 seconds
logger.debug(
f"[LSP] Clangd index directory not found yet (attempt {wait_attempt + 1})"
)
elif (
wait_attempt % 20 == 0
): # Every 60 seconds after initial wait
logger.info(
f"[LSP] Clangd index directory still not created after {wait_attempt * wait_interval:.0f}s. "
"Indexing may take time for large projects."
)

await asyncio.sleep(wait_interval)

# If we didn't find index files but files were analyzed, that's okay
# Clangd might still be building the index in the background
if index_file_count == 0 and analyzed_count > 0:
logger.info(
"[LSP] Files were analyzed but native index not yet created. "
"Clangd may continue building index in background."
)
status_messages.append(
f"Files analyzed: {analyzed_count} files (index may build in background)"
)
verification_passed = True

if index_file_count > 0:
verification_passed = True
logger.info(
f"[LSP] Clangd indexing complete: {analyzed_count} files analyzed, "
f"{index_file_count} native index files"
)
status_messages.append(
f"Indexing complete: {analyzed_count} files analyzed, "
f"{index_file_count} native index files created"
)
elif analyzed_count > 0:
# Files were analyzed but no native index yet - still mark as success
# The native index might build later
verification_passed = True
logger.info(
f"[LSP] Clangd files analyzed but native index not yet created: "
f"{analyzed_count} files analyzed"
)
status_messages.append(
f"Files analyzed: {analyzed_count} files "
f"(native index may build later)"
)
else:
logger.warning(
"[LSP] No files were successfully analyzed by clangd"
)
status_messages.append(
"Warning: No files were successfully analyzed"
)
verification_passed = False

elif language == "csharp" and isinstance(session, PyglsClientSession):
# For OmniSharp (C#), we need to query files to trigger indexing
# OmniSharp builds its index when files are queried
logger.info(
f"[LSP] Triggering OmniSharp indexing by querying {len(opened_file_data)} files..."
)
status_messages.append(
"Querying files to trigger OmniSharp indexing..."
)

# Query a subset of files to trigger OmniSharp to analyze them
# We don't need to query all files - querying some will trigger OmniSharp to index them
files_to_query = min(50, len(opened_file_data)) # Query up to 50 files
analyzed_count = 0
total_symbols_found = 0

from app.modules.intelligence.tools.code_query_tools.lsp_types import (
LspMethod,
)

for idx, (uri, file_path) in enumerate(
opened_file_data[:files_to_query], 1
):
try:
# Query documentSymbol to trigger OmniSharp to analyze the file
doc_symbol_result = await session.send_request(
LspMethod.DOCUMENT_SYMBOL, {"textDocument": {"uri": uri}}
)

symbol_count = (
len(doc_symbol_result)
if isinstance(doc_symbol_result, list)
else 0
)

if symbol_count > 0:
analyzed_count += 1
total_symbols_found += symbol_count

# Small delay to avoid overwhelming OmniSharp
if idx % 10 == 0:
await asyncio.sleep(0.5)
logger.info(
f"[LSP] Queried {idx}/{files_to_query} files, "
f"{analyzed_count} analyzed, {total_symbols_found} symbols found"
)
except Exception as exc:
logger.debug(
f"[LSP] Error querying documentSymbol for {file_path.name}: {exc}"
)

logger.info(
f"[LSP] Triggered OmniSharp indexing on {files_to_query} files: "
f"{analyzed_count} files analyzed, {total_symbols_found} total symbols"
)
status_messages.append(
f"Triggered indexing: {analyzed_count}/{files_to_query} files analyzed "
f"({total_symbols_found} symbols)"
)

# Wait for OmniSharp to complete indexing
# OmniSharp processes files asynchronously, so we wait a bit for it to catch up
if analyzed_count > 0:
status_messages.append(
"Waiting for OmniSharp to complete indexing..."
)
logger.info("[LSP] Waiting for OmniSharp to complete indexing...")
# Wait a bit for OmniSharp to process the files
await asyncio.sleep(5.0)
verification_passed = True
else:
logger.warning(
"[LSP] No files were successfully analyzed by OmniSharp"
)
status_messages.append(
"Warning: No files were successfully analyzed"
)
verification_passed = False

else:
# For other non-Python languages, use the polling method
if isinstance(session, PyglsClientSession):
logger.info(
f"[LSP] Starting to wait for indexing completion for {language} "
f"(timeout: 60s, interval: 3s)"
)
indexing_complete = await session.wait_for_indexing_complete(
timeout=60.0, test_query_interval=3.0
)
verification_passed = indexing_complete
else:
await asyncio.sleep(5.0)
verification_passed = False

# For Jedi, cache is written synchronously during indexing, so no wait needed
# For other language servers (like Pyright, clangd), keep files open longer to ensure cache is persisted
if language == "python" and not isinstance(session, JediSession):
if verification_passed:
status_messages.append("Waiting for cache to persist...")
logger.info("[LSP] Waiting 5 seconds for cache to persist...")
await asyncio.sleep(5.0) # Give time to write cache files
else:
# Even if verification failed, wait a bit - might still be processing
status_messages.append(
"Waiting additional time for processing files..."
)
logger.info(
"[LSP] Verification failed, but waiting 10 more seconds for processing..."
)
await asyncio.sleep(10.0)
elif language in ("c", "cpp") and isinstance(session, ClangdSession):
# For clangd, keep files open a bit longer to ensure cache is fully persisted
if verification_passed:
status_messages.append("Waiting for clangd cache to persist...")
logger.info(
"[LSP] Waiting 10 seconds for clangd cache to persist..."
)
await asyncio.sleep(10.0) # Give clangd time to write cache files
else:
# Even if verification failed, wait a bit - clangd might still be processing
status_messages.append(
"Waiting additional time for clangd to process files..."
)
logger.info(
"[LSP] Verification failed, but waiting 15 more seconds for clangd processing..."
)
await asyncio.sleep(15.0)
elif language == "csharp" and isinstance(session, PyglsClientSession):
# For OmniSharp, keep files open a bit longer to ensure cache is fully persisted
if verification_passed:
status_messages.append("Waiting for OmniSharp cache to persist...")
logger.info(
"[LSP] Waiting 5 seconds for OmniSharp cache to persist..."
)
await asyncio.sleep(5.0) # Give OmniSharp time to write cache files
else:
# Even if verification failed, wait a bit - OmniSharp might still be processing
status_messages.append(
"Waiting additional time for OmniSharp to process files..."
)
logger.info(
"[LSP] Verification failed, but waiting 10 more seconds for OmniSharp processing..."
)
await asyncio.sleep(10.0)

# Close all opened files now that indexing is complete
for uri in opened_files:
try:
await session.send_notification(
"textDocument/didClose", {"textDocument": {"uri": uri}}
)
except Exception:
pass # Ignore errors when closing files

if not verification_passed:
logger.warning(
f"Index verification failed for {language}, but continuing"
)
status_messages.append(
"Index verification incomplete, but marking as done"
)

self._warmed_sessions.add(key)
status_messages.append(f"Workspace indexing complete for {language}")
logger.info(
f"[LSP] Warmup session completed for {language} (project: {request.project_id})"
)

# Create a marker file to indicate successful indexing
# Write a timestamp to the file so we can track when indexing completed
cache_dir = self._get_lsp_cache_dir(workspace_root, language)
index_marker = cache_dir / ".indexed"
try:
from datetime import datetime

timestamp = datetime.utcnow().isoformat()

# Check cache directory for files (Jedi for Python, clangd for C/C++, Pyright for others)
cache_file_count = 0
if language == "python":
jedi_cache_dir = cache_dir / "jedi_cache"
if jedi_cache_dir.exists():
cache_files = list(jedi_cache_dir.rglob("*.json"))
cache_file_count = len(cache_files)
if cache_file_count > 0:
status_messages.append(
f"Jedi cache created: {cache_file_count} cache files found"
)
logger.info(
f"Jedi cache directory contains {cache_file_count} files"
)
else:
status_messages.append(
"Warning: Jedi cache directory exists but is empty"
)
logger.warning(
f"Jedi cache directory {jedi_cache_dir} exists but contains no files."
)
elif language in ("c", "cpp"):
# Check clangd native index (clangd's built-in cache)
clangd_cache_target = (
cache_dir / "clangd_native" / "clangd" / "index"
)

# Also check workspace .cache location (where clangd actually stores it)
workspace_path = Path(workspace_root)
clangd_index_in_workspace = (
workspace_path / ".cache" / "clangd" / "index"
)

native_index_count = 0

# Count native index files (check both symlink target and workspace location)
index_dirs_to_check = [clangd_cache_target]
if clangd_index_in_workspace.exists():
if clangd_index_in_workspace.is_symlink():
try:
resolved = clangd_index_in_workspace.resolve()
if (
resolved.exists()
and resolved not in index_dirs_to_check
):
index_dirs_to_check.append(resolved)
except Exception:
pass
elif clangd_index_in_workspace.is_dir():
index_dirs_to_check.append(clangd_index_in_workspace)

for index_dir in index_dirs_to_check:
if index_dir.exists():
index_files = list(index_dir.rglob("*"))
index_files = [f for f in index_files if f.is_file()]
native_index_count += len(index_files)

if native_index_count > 0:
status_messages.append(
f"Clangd native index: {native_index_count} index files"
)
logger.info(
f"Clangd native index: {native_index_count} index files "
f"(checked {clangd_index_in_workspace} and {clangd_cache_target})"
)
else:
# For other languages, check Pyright cache
pyright_cache_dir = cache_dir / "pyright"
if pyright_cache_dir.exists():
cache_files = list(pyright_cache_dir.rglob("*"))
cache_files = [f for f in cache_files if f.is_file()]
cache_file_count = len(cache_files)
if cache_file_count > 0:
status_messages.append(
f"Cache created: {cache_file_count} cache files found"
)
logger.info(
f"Cache directory contains {cache_file_count} files"
)

# Write final completion status (JSON format)
# The marker file may have been updated during polling, so we finalize it now
import json

completion_data = {
"status": "completed",
"indexed_at": timestamp,
"workspace": workspace_root,
"verification_passed": verification_passed,
"cache_files": cache_file_count,
"files_opened": len(opened_files),
}

index_marker.write_text(
json.dumps(completion_data, indent=2), encoding="utf-8"
)
logger.info(
f"Created index marker for {language} at {index_marker} "
f"(workspace: {workspace_root}, timestamp: {timestamp}, "
f"cache_files: {cache_file_count})"
)
except Exception as exc:
logger.warning(f"Failed to create index marker for {language}: {exc}")


🛠️ Refactor suggestion | 🟠 Major

Extreme cognitive complexity requires refactoring.

This 929-line method has a cognitive complexity of 285 (allowed: 15), making it difficult to understand, test, and maintain. The method handles multiple language-specific indexing workflows with significant code duplication.

Consider extracting language-specific indexing logic into separate methods:

  • _warmup_jedi_session() for Python/Jedi
  • _warmup_pyright_session() for Python/Pyright
  • _warmup_clangd_session() for C/C++
  • _warmup_omnisharp_session() for C#
  • _warmup_generic_session() for other languages

This would improve readability, testability, and reduce duplication. Would you like me to propose a refactored structure?
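One possible top-level shape for such a refactor (a sketch only; `_collect_source_files` and `_finalize_warmup` are hypothetical helpers that would absorb the logic currently inlined above):

```python
async def _warmup_session(self, request, workspace_root, session, config, status_messages) -> None:
    key = self._session_key(request.project_id, request.language)
    if key in self._warmed_sessions:
        return
    lock = self._warmup_locks.setdefault(key, asyncio.Lock())
    async with lock:
        if key in self._warmed_sessions:
            return
        files = self._collect_source_files(workspace_root, request.language)  # hypothetical helper
        if not files:
            self._warmed_sessions.add(key)
            return
        # Dispatch to one focused helper per language server.
        if isinstance(session, JediSession):
            ok = await self._warmup_jedi_session(session, files, status_messages)
        elif isinstance(session, ClangdSession):
            ok = await self._warmup_clangd_session(session, files, workspace_root, status_messages)
        elif request.language == "csharp":
            ok = await self._warmup_omnisharp_session(session, files, status_messages)
        else:
            ok = await self._warmup_generic_session(session, files, status_messages)
        # Hypothetical helper: close opened docs, write the .indexed marker, mark the key warmed.
        self._finalize_warmup(key, request.language, workspace_root, ok, status_messages)
```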

🧰 Tools
🪛 GitHub Check: SonarCloud Code Analysis

[failure] 977-977: Refactor this function to reduce its Cognitive Complexity from 285 to the 15 allowed.

See more on https://sonarcloud.io/project/issues?id=getmomentum_momentum-server&issues=AZqRom-DwdMfc7aJIegM&open=AZqRom-DwdMfc7aJIegM&pullRequest=492

🪛 Ruff (0.14.5)

982-982: Unused method argument: config

(ARG002)


1116-1116: Do not catch blind exception: Exception

(BLE001)


1163-1163: Do not catch blind exception: Exception

(BLE001)


1204-1204: Loop control variable file_path not used within loop body

(B007)


1218-1219: try-except-pass detected, consider logging the exception

(S110)


1218-1218: Do not catch blind exception: Exception

(BLE001)


1253-1253: Do not catch blind exception: Exception

(BLE001)


1282-1283: try-except-pass detected, consider logging the exception

(S110)


1282-1282: Do not catch blind exception: Exception

(BLE001)


1354-1354: Do not catch blind exception: Exception

(BLE001)


1372-1372: Loop control variable file_path not used within loop body

(B007)


1383-1384: try-except-pass detected, consider logging the exception

(S110)


1383-1383: Do not catch blind exception: Exception

(BLE001)


1449-1449: Do not catch blind exception: Exception

(BLE001)


1518-1519: try-except-pass detected, consider logging the exception

(S110)


1518-1518: Do not catch blind exception: Exception

(BLE001)


1672-1672: Do not catch blind exception: Exception

(BLE001)


1777-1778: try-except-pass detected, consider logging the exception

(S110)


1777-1777: Do not catch blind exception: Exception

(BLE001)


1849-1850: try-except-pass detected, consider logging the exception

(S110)


1849-1849: Do not catch blind exception: Exception

(BLE001)


1904-1904: Do not catch blind exception: Exception

(BLE001)

Comment on lines +230 to 242

⚠️ Potential issue | 🟠 Major

🧩 Analysis chain

Early return breaks gVisor on Mac/Windows with Docker.

The early return at lines 230-240 appears to prevent using Docker with gVisor on Mac/Windows. Here's the issue:

  1. On Mac/Windows, is_gvisor_available() returns True if Docker with runsc runtime is available (lines 64-71), regardless of whether a local runsc binary exists.
  2. get_runsc_binary() may return None on Mac/Windows if there's no local runsc binary (runsc is provided by Docker runtime, not installed locally).
  3. The new early return would then fallback to regular subprocess, bypassing the Docker+gVisor path at lines 274-296.

This breaks the intended functionality where Docker with gVisor should be used on Mac/Windows.

Consider removing the early return or making it platform-specific:

 runsc_path = get_runsc_binary()
-if not runsc_path:
-    logger.warning(
-        "[GVISOR] gVisor runsc binary not found, falling back to regular subprocess (less secure)"
-    )
-    return _run_command_regular(
-        command=command,
-        working_dir=working_dir,
-        env=env,
-        timeout=timeout,
-    )
-
-logger.info(f"[GVISOR] gVisor available, using runsc at {runsc_path}")
+if runsc_path:
+    logger.info(f"[GVISOR] runsc binary found at {runsc_path}")
+else:
+    logger.info("[GVISOR] No local runsc binary, will attempt Docker with gVisor runtime")

Run this script to verify the intended behavior and check if get_runsc_path() returns None on Mac/Windows with Docker:


🏁 Script executed:

#!/bin/bash
# Check the implementation of get_runsc_path to understand its behavior across platforms

echo "=== Checking get_runsc_path implementation ==="
ast-grep --pattern 'def get_runsc_path($$$):
  $$$'

echo -e "\n=== Finding all usages of get_runsc_path ==="
rg -n "get_runsc_path" --type py -C 2

echo -e "\n=== Checking if runsc_path can be None in _run_with_docker_gvisor ==="
rg -n "runsc_path.*Optional" app/modules/utils/gvisor_runner.py -A 2 -B 2

Length of output: 1575


The search result is general shell scripting information, which is not what I need. Let me run scripts to directly examine the codebase.


🏁 Script executed:

#!/bin/bash
echo "=== get_runsc_path() implementation ==="
sed -n '274,305p' app/modules/utils/install_gvisor.py

echo -e "\n=== is_gvisor_available() full implementation ==="
sed -n '50,73p' app/modules/utils/gvisor_runner.py

echo -e "\n=== Code after line 240 (Docker path logic) ==="
sed -n '274,296p' app/modules/utils/gvisor_runner.py

Length of output: 2745


Early return breaks gVisor on Mac/Windows with Docker.

The early return at lines 230-240 prevents using Docker with gVisor on Mac/Windows. The issue is confirmed:

  1. get_runsc_path() only detects a local runsc binary (via install path or system PATH). On Mac/Windows, runsc is not available locally—it's provided by Docker's runtime.
  2. is_gvisor_available() on Mac/Windows only checks if Docker is available; it does NOT require a local runsc binary.
  3. The new early return gates the Docker+gVisor path (lines 274-296) on runsc_path being present, bypassing it when runsc_path is None.
  4. The Docker path at lines 274-296 is designed to handle runsc_path: Optional[Path], so it gracefully supports the None case.

On Mac/Windows with Docker Desktop, this causes a regression: the code falls back to regular subprocess instead of using the intended Docker+gVisor isolation.

Remove the early return or make it platform-specific:

 runsc_path = get_runsc_binary()
-if not runsc_path:
-    logger.warning(
-        "[GVISOR] gVisor runsc binary not found, falling back to regular subprocess (less secure)"
-    )
-    return _run_command_regular(
-        command=command,
-        working_dir=working_dir,
-        env=env,
-        timeout=timeout,
-    )
-
-logger.info(f"[GVISOR] gVisor available, using runsc at {runsc_path}")
+if runsc_path:
+    logger.info(f"[GVISOR] runsc binary found at {runsc_path}")
+else:
+    logger.info("[GVISOR] No local runsc binary, will attempt Docker with gVisor runtime")

pytest
pytest-asyncio
Pillow==10.3.0
python-multipart>=0.0.7

⚠️ Potential issue | 🟠 Major

🧩 Analysis chain

Critical pre-existing security vulnerability in python-multipart.

The dependency python-multipart>=0.0.7 (line 78, not changed in this PR) has a HIGH severity DoS vulnerability (GHSA-59g5-xgcq-4qw3) in the multipart/form-data boundary handling. While this is not introduced by your PR, it should be escalated and fixed before merging to main.

Consider upgrading python-multipart to a patched version or investigate whether this vulnerability affects your use case. FastAPI depends on this for form parsing, so remediation is important.


🌐 Web query:

What is the latest patched version of python-multipart that fixes the DoS vulnerability GHSA-59g5-xgcq-4qw3?

💡 Result:

The vulnerability GHSA-59g5-xgcq-4qw3 is fixed in python-multipart version 0.0.18. [1][2]

Sources:
[1] GitHub Advisory for GHSA-59g5-xgcq-4qw3 (python-multipart).
[2] OSV / CVE-2024-53981 entry (notes fix in 0.0.18).



---

🏁 Script executed:

```shell
#!/bin/bash
# Check for lock files and version pinning
fd -type f \( -name "requirements*.txt" -o -name "poetry.lock" -o -name "setup.lock" -o -name "Pipfile*" -o -name "pyproject.toml" \) | head -20

# Also check the full requirements.txt context around line 78
echo "=== Full requirements.txt ===" 
cat requirements.txt | head -90 | tail -20
```

Length of output: 883



---

🏁 Script executed:

```shell
#!/bin/bash
# Search for lock files without using --type
find . -maxdepth 3 -type f \( -name "*.lock" -o -name "requirements*.txt" -o -name "pyproject.toml" -o -name "setup.py" -o -name "Pipfile" \) 2>/dev/null | sort

# Check if requirements.txt was modified in this PR
echo "=== Git status of requirements.txt ==="
git status requirements.txt 2>/dev/null || echo "Not a git repo or file not tracked"

# Check git diff for line 78
echo "=== Git diff around line 78 ==="
git diff requirements.txt 2>/dev/null | grep -A5 -B5 "python-multipart" || echo "No git diff available"
```

Length of output: 174


Upgrade python-multipart to fix HIGH severity DoS vulnerability GHSA-59g5-xgcq-4qw3.

The dependency python-multipart>=0.0.7 (line 78) contains a HIGH severity DoS vulnerability in multipart/form-data boundary handling. This vulnerability is fixed in python-multipart version 0.0.18. Update the requirement to python-multipart>=0.0.18 to ensure the patched version is installed.
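As a follow-up sanity check after bumping the pin, something along these lines can confirm that the installed distribution is at or above the patched release (an illustrative sketch, not part of this PR; it assumes plain `X.Y.Z` version strings):

```python
from importlib.metadata import version  # Python 3.8+


def version_tuple(v: str) -> tuple:
    # Naive parse; sufficient for python-multipart's plain X.Y.Z releases,
    # but not a general-purpose version comparator.
    return tuple(int(part) for part in v.split(".")[:3])


installed = version("python-multipart")
if version_tuple(installed) < (0, 0, 18):
    raise SystemExit(
        f"python-multipart {installed} is affected by GHSA-59g5-xgcq-4qw3; upgrade to >= 0.0.18"
    )
print(f"python-multipart {installed} includes the GHSA-59g5-xgcq-4qw3 fix")
```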

🧰 Tools
🪛 OSV Scanner (2.2.4)

[HIGH] 78-78: python-multipart 0.0.7: Denial of service (DoS) via deformation multipart/form-data boundary

(GHSA-59g5-xgcq-4qw3)

🤖 Prompt for AI Agents
In requirements.txt around line 78, the pinned requirement
python-multipart>=0.0.7 has a HIGH severity DoS vulnerability
(GHSA-59g5-xgcq-4qw3); update the constraint to require the patched release by
changing it to python-multipart>=0.0.18 so installs will pick a fixed version,
then run your dependency install and tests (and regenerate any lockfile) to
ensure compatibility.

cd "$OMNISHARP_DIR" || exit 1

# Check if already downloaded
if [ -f "omnisharp.${EXT}" ]; then

🛠️ Refactor suggestion | 🟠 Major

Use [[ instead of [ for conditional tests.

For safety and feature richness, replace all single-bracket [ conditionals with double-bracket [[. This prevents word splitting and glob expansion issues. Apply the following changes:

```diff
-    if [ -f "omnisharp.${EXT}" ]; then
+    if [[ -f "omnisharp.${EXT}" ]]; then
-        while [ $RETRY_COUNT -lt $MAX_RETRIES ]; do
+        while [[ $RETRY_COUNT -lt $MAX_RETRIES ]]; do
-    if [ "$EXT" = "tar.gz" ]; then
+    if [[ "$EXT" = "tar.gz" ]]; then
```

Note that the `if command -v unzip &> /dev/null; then` check is not a bracket test and should be left unchanged; wrapping a command invocation in [[ ... ]] would be a syntax error.

Replace all remaining [ ... ] conditionals at lines 112, 190, 212, 223, 246, 247, and 250 with [[ ... ]].

Also applies to: 112-112, 185-185, 190-190, 212-212, 223-223, 246-246, 247-247, 250-250

🧰 Tools
🪛 GitHub Check: SonarCloud Code Analysis

[failure] 102-102: Use '[[' instead of '[' for conditional tests. The '[[' construct is safer and more feature-rich.

See more on https://sonarcloud.io/project/issues?id=getmomentum_momentum-server&issues=AZqRonGawdMfc7aJIehl&open=AZqRonGawdMfc7aJIehl&pullRequest=492

🤖 Prompt for AI Agents
In scripts/install_lsp_servers.sh around lines 102, 112, 185, 190, 212, 223,
246, 247, and 250, the script uses single-bracket [ ... ] conditionals which can
suffer from word-splitting and globbing issues; replace each single-bracket
conditional with the double-bracket [[ ... ]] form (keeping the same test
expressions and spacing) to ensure safer evaluation and richer features (e.g.,
pattern matching), and verify any string comparisons or test operators remain
valid inside [[ ]] after the substitution.
