Feat/lsp servers #492
base: main
Conversation
- Fixed GitHub/GitBucket URL routing for multi-provider setup
- GH_TOKEN_LIST tokens now always use the github.com API
- CODE_PROVIDER_BASE_URL only applies to CODE_PROVIDER_TOKEN
- Implemented targeted 401 fallback for invalid credentials
- Fixed rate limiting: 60 req/hr → 5,000 req/hr (authenticated)
- Fixed OAuth token KeyError with safe dict access
- Removed diagnostic endpoints (were temporary debugging tools)
- Cleaned up DEBUG logging that exposed token details
- Added comprehensive GitHub authentication documentation

Core Authentication Flow:
1. Try GitHub App (if configured)
2. Try PAT pool from GH_TOKEN_LIST (if configured)
3. Try single PAT from CODE_PROVIDER_TOKEN (if configured)
4. Fall back to unauthenticated access for public repos

Rate Limiting:
- Authenticated (PAT): 5,000 requests/hour
- Unauthenticated: 60 requests/hour per IP

Fallback Behavior:
- Only 401 errors trigger unauthenticated fallback
- 404 errors propagate (no redundant retry)
- Prevents double API calls while handling invalid tokens

Files Modified:
- app/modules/code_provider/provider_factory.py (GitHub URL routing)
- app/modules/code_provider/code_provider_service.py (401 fallback)
- app/modules/code_provider/github/github_provider.py (auth logging)
- app/modules/code_provider/github/github_service.py (OAuth safety)
- app/modules/users/user_service.py (provider_info handling)
- app/main.py (removed diagnostic router)
- README.md (added auth documentation)
- .env.template (documented auth variables)

Files Deleted:
- app/diagnostic_router.py (temporary debugging endpoints)

Related Handoffs:
- thoughts/shared/handoffs/general/2025-11-05_12-06-51_github-auth-rate-limit-fix.md
- thoughts/shared/handoffs/general/2025-11-04_14-58-10_github-rate-limit-fix.md
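A minimal sketch of the credential-selection order described above; the function name `resolve_github_token` and the comma-separated `GH_TOKEN_LIST` format are assumptions for illustration, not the repo's actual implementation:

```python
import os
from typing import Optional


def resolve_github_token() -> Optional[str]:
    """Pick a credential in the documented priority order.

    GitHub App auth is handled elsewhere; here we fall through:
    GH_TOKEN_LIST pool -> single CODE_PROVIDER_TOKEN -> None
    (unauthenticated, 60 requests/hour per IP).
    """
    token_pool = [t.strip() for t in os.getenv("GH_TOKEN_LIST", "").split(",") if t.strip()]
    if token_pool:
        # A real pool would rotate on 401s or rate-limit responses.
        return token_pool[0]
    return os.getenv("CODE_PROVIDER_TOKEN") or None
```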
Walkthrough

This PR introduces a comprehensive Language Server Protocol (LSP) integration framework for multi-language code analysis. It adds LSP session management (supporting Python/Jedi, C/C++/Clangd, C#/OmniSharp), compile-commands generation utilities, LSP-aware agent tools, repository manager integration for local repositories, email-based user resolution in GitHub operations, and infrastructure optimizations including API key caching.

Changes
Sequence Diagram(s)

```mermaid
sequenceDiagram
participant Agent as Delegate/Supervisor<br/>Agent
participant Mgr as LSP Server<br/>Manager
participant Sess as Language-Specific<br/>Session
participant Server as LSP Server<br/>(clangd/Jedi/etc)
Agent->>Mgr: execute_query(LspQueryRequest)<br/>(language, method, position)
Mgr->>Mgr: _get_or_create_session(project_id, language)
Mgr->>Sess: ensure_started()
Sess->>Server: initialize + wait_for_indexing
Server-->>Sess: ready
Sess-->>Mgr: session ready
Mgr->>Mgr: _build_params(request)
Mgr->>Sess: send_request(method, payload)
Sess->>Server: LSP request (definition/references/etc)
Server-->>Sess: raw LSP response
Sess-->>Mgr: raw response
Mgr->>Mgr: _normalize_* (locations/symbols/hover)
Mgr-->>Agent: LspQueryResponse<br/>(success, locations, symbols, hover)
Note over Sess,Server: Python: JediSession<br/>C/C++: ClangdSession<br/>C#: PyglsClientSession<br/>Others: PyglsClientSession
```

```mermaid
sequenceDiagram
participant PC as ParsingController
participant PH as ParsingHelper
participant PM as RepoManager
participant PS as ParsingService
participant LSP as LSP Server<br/>Manager
PC->>PC: parse_directory(repo_url, branch)
PC->>PH: setup_project_directory(repo_url, ...)
alt Local repo path
PH->>PH: extract_metadata(local_repo)
PH->>PM: copy_repo + create_worktree
PM-->>PH: worktree_path
else Remote repo
PH->>PH: download_tarball (with retry+backoff)
end
PH-->>PC: project_id, extract_dir
PC->>PS: parse_project(project_id)
PS->>PS: detect_languages (file-based + GitHub API)
PS->>LSP: index_workspace(project_id, languages)
LSP->>LSP: register_languages + _get_or_create_session
LSP-->>PS: indexing complete
PS-->>PC: status (INDEXED)
PC-->>PC: return submission status
```

Estimated code review effort: 🎯 4 (Complex) | ⏱️ ~75 minutes
Areas requiring extra attention:
Possibly related PRs
Suggested reviewers
Poem
Pre-merge checks and finishing touches

❌ Failed checks (1 warning, 1 inconclusive)
✅ Passed checks (1 passed)
✨ Finishing touches
🧪 Generate unit tests (beta)
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.
Actionable comments posted: 14
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (2)
deployment/stage/mom-api/Jenkinsfile_API (1)
48-60: Critical: Image name mismatch between build and push stages will cause pipeline failure. The Build stage creates an image tagged as `mom-api:${imageTag}` (line 48), but the Push stage attempts to push `mom-api-ss:${imageTag}` (line 60). This mismatch will cause the push to fail because the `mom-api-ss` image was never built. Additionally, the Deploy stage (line 106) references `mom-api`, not `mom-api-ss`, creating further inconsistency. Apply this diff to align the push target with the build target:

```diff
- sh "sudo docker push ${DOCKER_REGISTRY}/mom-api-ss:${imageTag}"
+ sh "sudo docker push ${DOCKER_REGISTRY}/mom-api:${imageTag}"
```

deployment/stage/celery/Jenkinsfile_CELERY (1)
60-60: Critical: Image tag mismatch between build and push. Line 48 builds the image as `celery-api:${imageTag}`, but this line attempts to push `celery-api-ss:${imageTag}`. The push will fail because the image doesn't exist with that tag. Apply this diff to fix the tag mismatch:

```diff
- sh "sudo docker push ${DOCKER_REGISTRY}/celery-api-ss:${imageTag}"
+ sh "sudo docker push ${DOCKER_REGISTRY}/celery-api:${imageTag}"
```
♻️ Duplicate comments (1)
app/modules/intelligence/agents/chat_agents/system_agents/code_gen_agent.py (1)
86-86: Same verification needed as in qna_agent.py. The bash_command tool is also added to the CodeGenAgent. Please ensure the same security measures apply here, especially since code generation workflows may involve executing build commands or dependency installations that could have broader system impact.
🧹 Nitpick comments (35)
app/modules/intelligence/tools/change_detection/change_detection_tool.py (1)
678-680: Good addition for observability. The logging statement improves debugging by showing how many functions changed and their identifiers. Consider truncating the identifiers list if it's very long to keep logs readable.
For example:
```diff
 logging.info(
-    f"[CHANGE_DETECTION] Found {len(identifiers)} changed functions: {identifiers}"
+    f"[CHANGE_DETECTION] Found {len(identifiers)} changed functions: {identifiers[:10]}{'...' if len(identifiers) > 10 else ''}"
 )
```

scripts/verify_gvisor_docker.sh (1)
7-30: Make verification failures visible via a non-zero exit codeRight now the script only prints messages and never
exits non-zero; the lastechomeans callers (or CI) will see a 0 status even if runsc/runtime are missing or the test container fails. Consider explicitly exiting with1when checks fail and optionally addingset -euo pipefailso this can be safely used in automation.scripts/install_lsp_servers.sh (1)
13-15: Clean up unused variables and tighten OmniSharp download signaling
PYTHON_BIN(Line 13) andDOWNLOAD_SUCCESS(Line 151) are never used, which triggers Shellcheck warnings and makes the retry logic harder to read. Either remove them or actually useDOWNLOAD_SUCCESSin the post-download check instead of relying solely on the presence ofomnisharp.${EXT}.For example, you could gate extraction on a positive
DOWNLOAD_SUCCESSflag and fall back to the existing archive-exists shortcut for manual/previous downloads.Also applies to: 101-163
scripts/install_gvisor_in_docker_vm.sh (1)
18-21: Guardcdcalls into/out of the temporary directoryShellcheck correctly points out the unguarded
cdoperations (Lines 20 and 84). Ifmktemp -dorcd -ever fail, subsequent commands run in an unexpected directory.You can harden this with minimal change:
-TMPDIR=$(mktemp -d) -cd "$TMPDIR" +TMPDIR=$(mktemp -d) +cd "$TMPDIR" || exit 1 @@ -cd - > /dev/null +cd - > /dev/null || exit 1Also applies to: 84-85
start.sh (1)
42-74: Consider making gVisor installation explicitly opt-in and use script exit codes. The new gVisor section always runs on startup:
python scripts/install_gvisor.pyis invoked every time after dependency install, and- On Mac/Windows,
scripts/install_gvisor_in_docker_vm.shis executed and success is inferred viagrep "runsc installed".For local dev this is fine, but in slower or offline environments (or CI) it can add unexpected latency and network dependency. Two adjustments worth considering:
- Gate these steps behind an env flag (e.g.
ENABLE_GVISOR=1) or a simple check in.envso users/CI can skip them cleanly.- In addition to grepping stdout, also check the helper script’s exit code so “success” is not tied solely to a printed string.
docs/gvisor_mac_setup.md (1)
20-31: Reference the concrete helper scripts to avoid doc/tooling drift. The doc mentions using a "helper script (if available)" for installing
runscin Docker Desktop’s VM, but doesn’t name the actual helpers introduced in this repo (e.g.scripts/setup_gvisor_docker.shandscripts/install_gvisor_in_docker_vm.sh). Explicitly referencing those scripts and their expected usage would make the Mac flow easier to follow and reduce the risk that the docs drift away from the implemented setup.Also applies to: 92-123
app/modules/utils/install_gvisor.py (1)
29-62: Short-circuit non-Linux hosts and narrow exception handling for clearer behavior. A couple of smaller points in the install flow:
Non-Linux hosts
get_architecturewarns whenplatform.system().lower() != "linux"but still returns an architecture (Lines 54‑60). On macOS/Windows, the downloaded Linuxrunscwill never successfully run, soinstall_gvisorwill always fail verification and leave a non-working binary in the install path. Consider returningNone(and havinginstall_gvisorfail fast) for non-Linux systems so the behavior is explicit and no dead binaries are left behind.Exception handling and logging
Several places (check_runsc_installed,download_file,verify_checksum,install_gvisor) catch bareExceptionand log generic errors. Narrowing these to expected exceptions where possible and usinglogging.exceptionfor unexpected ones would improve debuggability without changing semantics.These changes would make the install behavior on unsupported platforms more predictable and keep error reporting more actionable.
Also applies to: 199-216, 274-292
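A rough sketch of the suggested non-Linux short-circuit; the real `get_architecture`/`install_gvisor` signatures in `app/modules/utils/install_gvisor.py` may differ, so treat this as illustrative only:

```python
import logging
import platform
from typing import Optional

logger = logging.getLogger(__name__)


def get_architecture() -> Optional[str]:
    """Return the runsc architecture label, or None on non-Linux hosts."""
    if platform.system().lower() != "linux":
        logger.warning("gVisor runsc only runs on Linux; skipping install")
        return None
    machine = platform.machine().lower()
    return {"x86_64": "x86_64", "amd64": "x86_64", "aarch64": "arm64"}.get(machine)


def install_gvisor() -> bool:
    arch = get_architecture()
    if arch is None:
        # Fail fast instead of downloading a Linux binary that can never run here.
        return False
    ...  # download + checksum verification + install, as today
    return True
```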
app/modules/repo_manager/repo_manager_interface.py (1)
12-220: IRepoManager interface aligns cleanly with RepoManager implementation. The abstract method set and parameter ordering match the concrete
RepoManagermethods shown inapp/modules/repo_manager/repo_manager.py, so this gives a clear, consistent contract for callers without changing behavior.If you want to tighten types later, consider introducing a TypedDict or pydantic model for the repo info dicts returned by
`get_repo_info`/`list_available_repos` instead of using `Dict[str, Any]` everywhere.

app/modules/projects/projects_service.py (3)
267-280: Project ID handling in by-id accessors looks consistent. Allowing
project_id: str | intand deferring toget_project_by_id(which casts tostr(project_id)) keeps behavior consistent withProject.idbeing a text column while giving callers more flexibility. The async and sync variants also now return the same shape (includingrepo_path/user_id), which is good for downstream consumers.You might add an explicit return type like
-> dict[str, Any] | Noneto make the dict shape andNonepossibility clearer to callers.Also applies to: 282-297
299-317: ID casting in repo-details and by-id-and-user helpers is now robust. Using
Project.id == str(project_id)in bothget_project_repo_details_from_dbandget_project_from_db_by_id_and_user_idaligns with the newstr | intannotations and the underlying text PK, so lookups will work regardless of whether callers pass an int or string.For consistency, consider updating the type hints of other methods that still advertise
project_id: int(e.g.,get_repo_and_branch_name,update_project_status) to also acceptstr | int, since they ultimately pass through to these helpers.Also applies to: 331-346
348-350: Helper/query logic is correct; minor optimization possible in `update_project`
get_project_by_id’sstr(project_id)cast is consistent with the rest of the service and centralizes the PK handling. Inupdate_project, you first query for the project, then execute a separateupdatewith the same filter. That’s functionally fine but does one extra round-trip.You could simplify
update_projectto load the project once, mutate its attributes fromkwargs, and commit, or reuse the initial query for the update to avoid the duplicate filter.Also applies to: 371-379
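For illustration, a sketch of the single-round-trip variant; it assumes the existing `Project` model and SQLAlchemy `Session` already imported in this service module, and is not the current implementation:

```python
def update_project(db, project_id, **kwargs):
    # Project is the existing SQLAlchemy model from this service module (assumed import).
    project = db.query(Project).filter(Project.id == str(project_id)).first()
    if project is None:
        raise ValueError(f"Project {project_id} not found")
    for field, value in kwargs.items():
        # Mutate the loaded row instead of issuing a second filtered UPDATE.
        setattr(project, field, value)
    db.commit()
    db.refresh(project)
    return project
```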
app/modules/intelligence/tools/tool_service.py (2)
126-132: Be explicit about auto-injecting `lsp_query` from `get_tools`
get_toolsnow always appends thelsp_querytool to the returned list when configured, even if callers didn’t request it intool_names. That’s probably intentional (ensuring a base LSP tool is always present), but it does slightly change the contract ofget_tools.If you want to keep semantics strictly “only tools requested by name,” consider gating this behind an explicit flag or a separate helper (e.g.,
get_tools_with_lsp_base). Otherwise, a small comment in the docstring explaining this behavior would help future readers.
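A possible shape for the opt-out flag, assuming `self.tools` is the existing name-to-StructuredTool mapping; the parameter name is hypothetical:

```python
def get_tools(self, tool_names, include_lsp_base: bool = True):
    tools = [self.tools[name] for name in tool_names if name in self.tools]
    # Injection of the base LSP tool becomes opt-out, so callers that want strict
    # "only what was asked for" semantics can pass include_lsp_base=False.
    if include_lsp_base and "lsp_query" in self.tools:
        lsp_tool = self.tools["lsp_query"]
        if lsp_tool not in tools:
            tools.append(lsp_tool)
    return tools
```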
192-239: Clean up repeated `# type: ignore[assignment]` on tool registrations. Several tool registrations (
todo_tools,code_changes_tools,webpage_extractor,code_provider_*,web_search_tool) rely on# type: ignore[assignment]when assigning into thetools: Dict[str, StructuredTool]map. This works, but it hides type drift between the declared mapping type and the actual values.If feasible, consider tightening the types at the source (e.g., ensuring these helpers all return
`StructuredTool`) or relaxing the dict annotation (e.g., `Dict[str, StructuredTool]` → `Dict[str, StructuredTool] | Dict[str, Any]` via a Protocol/wrapper) so these ignores are unnecessary.

app/modules/parsing/graph_construction/parsing_controller.py (1)
279-351: Consider tightening `_check_repo_indexed` responsibilities and error handling. The
_check_repo_indexedhelper is doing a lot in one place: reading env, instantiatingRepoManager, hitting the DB again, detecting languages, and consulting the LSP server manager, with a broadexcept Exceptionthat turns any error into “not indexed.”This is reasonable as a safety net, but a couple of minor improvements would make behavior clearer:
- Treat
REPO_MANAGER_ENABLEDas a precondition (handled inparse_directoryas above) and drop the early env check here, so the function’s only concern is “given a repo/worktree, is it indexed?”- Narrow the exception handling to expected failure modes (e.g., filesystem/JSON/DB/LSP-related errors) or at least log a more explicit marker when an unexpected exception path is hit, so configuration bugs don’t silently degrade into infinite re-parsing attempts.
These are optional, but would make future debugging of LSP indexing issues easier.
test_gvisor.py (1)
1-5: Shebang vs. execution modeThere’s a shebang (
#!/usr/bin/env python3) but the file isn’t necessarily executable in the repo, which static analysis is flagging. If this script is meant to be run viapython test_gvisor.py, you can safely drop the shebang to keep Ruff happy:-#!/usr/bin/env python3 -""" -Test script for gVisor functionality. -Tests that gVisor detection and fallback work correctly on Mac/Windows. -""" +""" +Test script for gVisor functionality. +Tests that gVisor detection and fallback work correctly on Mac/Windows. +"""Alternatively, ensure the file is checked in as executable.
test_gvisor_docker.py (1)
1-12: Tighten project-root resolution and consider removing the shebang.
project_root = Path(__file__).parentonly adds the directory containing this file; if tests live under a subdirectory, imports likeapp.modules...may still fail. Consider resolving the actual repo root (e.g.,Path(__file__).resolve().parents[1]or relying on installed package paths instead of mutatingsys.path. Also, since this file is intended to be run via the Python interpreter (and not marked executable in the repo), the shebang is not strictly needed and can be dropped to satisfy EXE001.app/modules/intelligence/agents/chat_agents/tool_helpers.py (1)
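One way to resolve the repo root without hard-coding the directory depth; it assumes the root is the directory that contains the `app/` package:

```python
import sys
from pathlib import Path

# Walk up from this test file until we find the directory containing app/,
# instead of assuming the file sits directly at the repository root.
project_root = next(
    (p for p in Path(__file__).resolve().parents if (p / "app").is_dir()),
    Path(__file__).resolve().parent,
)
sys.path.insert(0, str(project_root))
```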
273-279: Good structured output for LSP and bash tools; be mindful of content contracts.The new
bash_commandand LSP branches produce clear, truncated summaries (command + working dir, LSP project/language/target, status messages, locations/symbol counts, hover preview, stdout/stderr on failures). This is a solid UX. The only caveat is thatget_tool_result_info_contentassumes the LSP payloads includesuccess,method, and the documented fields—if upstream tooling ever omitssuccess, the code will treat it as an error and show “Unknown error”. If that’s acceptable, no changes needed; otherwise consider defaultingsuccesstoTruewhen method-specific data is present.Also applies to: 431-444, 558-594, 595-634
app/modules/intelligence/tools/code_query_tools/lsp_session.py (1)
15-45: Core LSP session abstractions are clean; consider minor cleanup.
SessionKeyandBaseLspServerSessiondefine a straightforward contract that matches the concrete implementations. You can optionally remove the unusedBaseModelimport or convertBaseLspServerSessioninto anABCwith@abstractmethoddecorators for stricter typing, but the current design is functionally fine.app/modules/parsing/graph_construction/parsing_service.py (2)
115-192: Language detection flow is solid but GitHub languages semantics and duplication are worth revisiting. The combination of file-based detection plus filtered GitHub
get_languages()results is a good approach, and preferringfile_based_languageforlanguagekeeps behavior predictable. Two nuances to double‑check:
repo.get_languages()(for GitHub repos) typically returns byte counts per language, not percentages; usingpercentage > 5is effectively “more than 5 bytes of code”. If you truly want a percentage threshold (e.g., >5% of the codebase), you’ll need to normalize by the total and adjust the variable naming.- The
supported_languagesset here duplicates the language set maintained bydetect_repo_language. If that list changes inparsing_helper, this code could silently diverge. Consider centralizing the allowed-language list (or exposing it fromParseHelper) instead of re‑declaring it.
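A small sketch of normalizing the byte counts returned by `get_languages()` into a true percentage threshold; the helper name is hypothetical:

```python
def filter_languages_by_share(byte_counts: dict[str, int], threshold: float = 5.0) -> list[str]:
    """Keep languages that make up more than `threshold` percent of the codebase.

    GitHub's get_languages() returns bytes per language, so normalize by the
    total before comparing against a percentage threshold.
    """
    total = sum(byte_counts.values())
    if total == 0:
        return []
    return [
        lang
        for lang, size in byte_counts.items()
        if (size / total) * 100 > threshold
    ]
```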
193-274: LSP indexing is now a hard dependency for parsing when enabled; confirm that this failure mode is desired. The new
_index_workspace_with_lspintegration makes parsing fail (HTTP 500 + ERROR status + Slack notification) whenever:
REPO_MANAGER_ENABLEDis true,- a RepoManager exists and at least one language is registered with the LSP server manager,
- and any of those languages fail
index_workspace.That’s a strong coupling: transient LSP issues (e.g., misconfigured server, missing executable, indexing flakiness) will mark otherwise successfully parsed projects as failed. If the intent is “LSP is mandatory for supported languages”, this is fine; otherwise, you might want to:
- Treat indexing failures as a degraded-but-usable state (log and notify Slack, but still return success), or
- Gate
_index_workspace_with_lspbehind a separate feature flag or project-level setting.Also,
_index_workspace_with_lspinstantiatesLspQueryTool(self.db, "")solely to register servers; that’s reasonable, but it does rely on RepoManager being available there as well as viaself.parse_helper.repo_manager. If those ever diverge, parses will fail even though graph construction succeeded.Also applies to: 313-443
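If a degraded-but-usable state is preferred, a sketch of gating the hard failure behind a flag; the `LSP_INDEXING_REQUIRED` variable is hypothetical:

```python
import logging
import os

logger = logging.getLogger(__name__)


async def index_with_lsp_or_degrade(index_workspace, project_id: str, languages: list[str]) -> bool:
    """Treat LSP indexing as best-effort unless explicitly required."""
    required = os.getenv("LSP_INDEXING_REQUIRED", "false").lower() == "true"
    try:
        await index_workspace(project_id, languages)
        return True
    except Exception:
        logger.exception("LSP indexing failed for %s", project_id)
        if required:
            raise  # keep today's hard-failure behavior when explicitly requested
        # The parsed graph is still usable; callers can record a degraded status.
        return False
```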
app/modules/intelligence/tools/code_query_tools/clangd_session.py (1)
29-77: Forwarding to PyglsClientSession looks correct; minor cleanup around unused state. The ClangdSession wrapper correctly delegates lifecycle and requests to
PyglsClientSession, relying on clangd’s own cache. Thecache_dirargument and the_initializedflag on this class are currently unused beyond assignments; you can either document them explicitly as legacy/compatibility artifacts or drop them to avoid confusion, since the underlying client already tracks initialization.app/modules/intelligence/agents/chat_agents/pydantic_multi_agent.py (1)
56-92: LSP tool wiring and validation look solid; consider small robustness tweaks. The LSP tool specs and
_create_lsp_method_toolsimplementation are correct and nicely validate position/query requirements, and the closure default arguments avoid the usual loop-binding bug. The integration into delegate/supervisor tools via_lsp_method_toolsis also clean.Two small optional improvements:
- In
__init__, consider stripping all whitespace (\s+) rather than just spaces when normalizing tool names for_structured_tool_map.- Optionally assert/log once at startup that an
lsp_queryStructuredToolis present inself.toolsso misconfiguration fails fast rather than only logging on first use.Also applies to: 338-345, 672-755, 767-770, 800-806
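A sketch of the whitespace-insensitive normalization for `_structured_tool_map`, shown as a standalone helper for illustration:

```python
import re
from typing import Dict


def build_structured_tool_map(tools) -> Dict[str, object]:
    """Key tools by a whitespace-insensitive, lower-cased name."""

    def normalize(name: str) -> str:
        # Collapse any run of whitespace, not just single spaces.
        return re.sub(r"\s+", "_", name.strip().lower())

    mapping = {normalize(tool.name): tool for tool in tools}
    if "lsp_query" not in mapping:
        # Failing (or at least logging) here surfaces misconfiguration at startup
        # rather than on first use.
        raise RuntimeError("lsp_query tool is not configured")
    return mapping
```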
app/modules/parsing/graph_construction/parsing_helper.py (2)
40-53: RepoManager initialization is safe but catching `Exception` is broader than necessary. Lazy, env-gated RepoManager setup with a warning on failure is a good pattern. You might narrow the
except ExceptiontoImportErrorand configuration‑related errors if you want to avoid masking unexpected bugs in initialization, but the current behavior is acceptable for best‑effort integration.
331-373: Tarball retry/backoff logic is correct but duplicated in two branches. The new connection-retry loops for GitBucket Basic Auth and the generic header-based path correctly:
- Retry on
ConnectTimeout/ConnectionErrorwith simple backoff.- Use extended read timeouts suitable for large archives.
- Preserve HTTP‑error semantics via
raise_for_statusand outerHTTPErrorhandling.If you want to reduce maintenance overhead, consider extracting the retrying
requests.get(...)+ backoff into a small helper so GitBucket and non‑GitBucket paths share the same implementation.Also applies to: 383-423, 447-457
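A possible shared retry helper, sketched under the assumption that both tarball paths use plain `requests.get`; the parameter defaults are illustrative:

```python
import time

import requests


def get_with_retries(url, *, headers=None, auth=None, retries=3, backoff=2.0, timeout=(10, 300)):
    """Connection-retry wrapper shared by the GitBucket and GitHub tarball paths."""
    last_exc = None
    for attempt in range(retries):
        try:
            response = requests.get(url, headers=headers, auth=auth, stream=True, timeout=timeout)
            response.raise_for_status()  # HTTP errors still propagate to the caller
            return response
        except (requests.ConnectTimeout, requests.ConnectionError) as exc:
            last_exc = exc
            time.sleep(backoff * (attempt + 1))  # simple linear backoff between attempts
    raise last_exc
```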
app/modules/utils/gvisor_runner.py (3)
50-73: gVisor availability check works but is heavier than necessary. The platform-aware
is_gvisor_availablelogic is sound, but two things to consider:
is_gvisor_availablecalls_check_docker_available()twice on Mac/Windows; you can store the result once to avoid redundant Docker invocations._check_docker_availablealways runs adocker run --runtime=runsc busybox ...test container, which is relatively expensive (and may trigger an image pull) if called frequently.You might want to:
- Cache the result of
_check_docker_availablefor some period, or- Make the “test container” run optional (e.g., gated by an env flag) and rely primarily on
docker inforuntime inspection for common paths.Also applies to: 332-395
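A sketch of a TTL-cached probe that relies on `docker info` rather than running a test container each time; the cache duration is an arbitrary choice:

```python
import subprocess
import time

_CACHE = {"value": None, "checked_at": 0.0}
_TTL_SECONDS = 300.0


def check_docker_runsc_cached() -> bool:
    """Cache the (relatively expensive) Docker runtime probe for a few minutes."""
    now = time.monotonic()
    if _CACHE["value"] is not None and now - _CACHE["checked_at"] < _TTL_SECONDS:
        return _CACHE["value"]
    try:
        info = subprocess.run(
            ["docker", "info", "--format", "{{json .Runtimes}}"],
            capture_output=True, text=True, timeout=10, check=True,
        )
        available = "runsc" in info.stdout
    except (OSError, subprocess.SubprocessError):
        available = False
    _CACHE.update(value=available, checked_at=now)
    return available
```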
182-239: Isolation orchestration is correct, but `_run_with_runsc_direct` never actually uses `runsc`.
run_command_isolatedcorrectly:
- Applies the
use_gvisorflag.- Falls back when gVisor/runsc is not available.
- Chooses between container‑aware and host execution paths.
However,
_run_with_runsc_directcurrently just logs and calls_run_command_regular, completely ignoring therunsc_pathargument. That means:
- In container/K8s and Linux‑host “direct runsc” paths, you never actually get an extra gVisor layer beyond the container itself.
This is logically consistent with the comments (OCI bundle complexity), but the function name and signature are misleading. Consider either:
- Implementing a minimal real
runscexecution path using OCI bundles, or- Renaming and removing the unused arguments (
repo_path,runsc_path) so it’s explicit that these branches intentionally fall back to non‑gVisor execution.Also, note that on any exception inside the gVisor branches, you fall back to
_run_command_regular, which may or may not be desirable from a security posture standpoint. You might want a configuration flag to “fail closed” instead of running unsandboxed if gVisor setup fails.Also applies to: 248-311, 533-573
75-157: Environment filtering and fallback subprocess execution are generally safe and well-contained. The
_filter_safe_environment_variablesand_run_command_regularimplementations are thoughtfully defensive:
- Only a small safe set of env vars is always allowed; others are filtered out based on name patterns and value heuristics to reduce secret leakage.
- The fallback subprocess environment is a minimal, sanitized baseline (
PATH, fakeHOME,USER,SHELL,TERM), which is appropriate for sandboxed tooling.- Docker+gVisor runs mount working_dir/repo_path read‑only, disable network, and use a tiny image (
busybox), all of which are good isolation defaults.If you want to further tighten things:
- Consider making the safe env var allow‑list configurable (e.g., via settings) so operators can explicitly add non‑secret variables as needed.
- Optionally log when you drop env vars due to heuristic value checks at a lower level (e.g., debug) for easier troubleshooting.
Also applies to: 421-491, 576-657
app/modules/intelligence/tools/code_query_tools/jedi_session.py (2)
34-305: JediCache design is solid; consider a lightweight invalidation story for workspace symbols. The caching layer around Jedi is well done:
- Per‑file caches are keyed by a hash of the file contents plus line/column/operation, so they naturally invalidate on content changes.
- All Jedi calls are wrapped in
try/exceptwith logging, returning empty results on failure instead of blowing up the caller.- Hover formatting into a
contentsarray matches theHoverResult.from_lspexpectations.For workspace symbols,
index_workspaceshort‑circuits ifworkspace_symbols.jsonalready exists, which means the index never refreshes even if the project changes. If this becomes an issue, you could add:
- A timestamp in the cache file and a simple age threshold, or
- A checksum of the listed Python files to trigger reindexing when the set changes.
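For example, a checksum-based staleness check could look roughly like this; the file layout and field names are assumptions:

```python
import hashlib
import json
from pathlib import Path


def workspace_fingerprint(root: Path) -> str:
    """Hash the sorted list of Python files (path + mtime) to detect workspace changes."""
    digest = hashlib.sha256()
    for path in sorted(root.rglob("*.py")):
        digest.update(str(path.relative_to(root)).encode())
        digest.update(str(path.stat().st_mtime_ns).encode())
    return digest.hexdigest()


def needs_reindex(cache_file: Path, root: Path) -> bool:
    if not cache_file.exists():
        return True
    cached = json.loads(cache_file.read_text()).get("fingerprint")
    return cached != workspace_fingerprint(root)
```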
307-374: JediSession’s LSP method routing and threading model look correct.
JediSession.send_requestcorrectly maps:
DEFINITION→get_definitionsREFERENCES→get_referencesHOVER→get_hoverDOCUMENT_SYMBOL→get_document_symbolsWORKSPACE_SYMBOL→get_workspace_symbolsusing LSP’s 0‑based line/character positions and letting
JediCachehandle the conversion to Jedi’s 1‑based API. The executor‑basedindex_workspaceavoids blocking the event loop, which is good.Only minor nit: for unsupported methods you raise
ValueError(f"Unsupported method: {method}"); if this ever propagates out of the LSP orchestration layer, you might want a more domain‑specific exception type, but it’s not required.app/modules/intelligence/tools/code_query_tools/lsp_query_tool.py (1)
372-389: Propagate user_id into RepoManager calls for stricter isolation
RepoManager.get_repo_pathandupdate_last_accessedsupport auser_idfilter, but_execute_asyncdoes not pass it. You already validate the project’suser_idviaProjectService, so wiring that through keeps repo metadata usage consistent and avoids accidental cross‑user reuse if you ever support per‑user worktrees.For example:
- worktree_path = self.repo_manager.get_repo_path( - repo_name, branch=branch, commit_id=commit_id - ) + worktree_path = self.repo_manager.get_repo_path( + repo_name, + branch=branch, + commit_id=commit_id, + user_id=details.get("user_id"), + ) @@ - self.repo_manager.update_last_accessed( - repo_name, branch=branch, commit_id=commit_id - ) + self.repo_manager.update_last_accessed( + repo_name, + branch=branch, + commit_id=commit_id, + user_id=details.get("user_id"), + )This keeps the RepoManager’s user scoping aligned with your project scoping.
app/modules/intelligence/tools/code_query_tools/lsp_server_manager.py (1)
59-71: reset_lsp_server_manager leaves sessions running; consider cleanup hook
reset_lsp_server_managercurrently just drops the global reference and does not attempt to shut down active sessions, so any underlying language server processes (pygls, clangd, OmniSharp) will continue running until process exit.If this is used in tests or long‑running workers, you may want to:
- Add an async
shutdown_all()method onLspServerManagerthat iterates sessions and callsawait session.shutdown().- Provide a sync wrapper in
reset_lsp_server_managerthat usesasyncio.run()in non‑event‑loop contexts for test cleanup.At minimum, logging instead of silent
passin the inner try/except would help visibility when cleanup fails.app/modules/intelligence/tools/code_query_tools/clangd_helpers.py (1)
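A sketch of the cleanup hook; it assumes the manager keeps sessions in a `self._sessions` dict, which may not match the actual attribute name:

```python
import asyncio
import logging

logger = logging.getLogger(__name__)


class LspServerManagerShutdownMixin:
    """Illustrative cleanup hook for LspServerManager."""

    async def shutdown_all(self) -> None:
        for key, session in list(self._sessions.items()):
            try:
                await session.shutdown()
            except Exception:
                # Log instead of silently passing so failed cleanup is visible.
                logger.warning("Failed to shut down LSP session %s", key, exc_info=True)
        self._sessions.clear()


def reset_lsp_server_manager_sync(manager) -> None:
    """Synchronous wrapper for test teardown outside an event loop."""
    asyncio.run(manager.shutdown_all())
```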
71-205: Subprocess flow for compile_commands generation looks sound. The helpers build
cmake/bear/compiledbcommands from fixed tokens, only varyingcwd(workspace_root), and handle failures by logging and falling back to other strategies orcompile_flags.txt. There’s no untrusted shell concatenation here, so the security lint warnings about subprocess usage are low‑risk in this context.If you want to tighten things further later, the main improvement would be narrowing some
except Exceptionblocks toOSError/subprocess.TimeoutExpiredand logging stderr more prominently, but the current behavior is acceptable.Also applies to: 207-258, 260-372
app/modules/repo_manager/repo_manager.py (1)
61-70: Consider sanitizing repo_name when constructing metadata paths. You already sanitize branch/commit when building metadata filenames, but
_metadata_dirand_get_repo_local_pathusePath(repo_name)directly:def _metadata_dir(self, repo_name: str) -> Path: return self.metadata_base_path / Path(repo_name) def _get_repo_local_path(self, repo_name: str) -> Path: return self.repos_base_path / Path(repo_name)If
repo_nameever contained..or absolute paths (e.g., via bad upstream data), this could escape the intended.repos/.metatrees. If upstream guaranteesowner/repo‑style names, this is mostly theoretical, but it’s cheap to harden:- def _metadata_dir(self, repo_name: str) -> Path: - """Return the metadata directory for a given repository.""" - return self.metadata_base_path / Path(repo_name) + def _metadata_dir(self, repo_name: str) -> Path: + """Return the metadata directory for a given repository.""" + safe = Path(self._sanitize_for_filename(repo_name)) + return self.metadata_base_path / safe @@ - def _get_repo_local_path(self, repo_name: str) -> Path: - """Expose repository location for callers that already rely on it.""" - return self.repos_base_path / Path(repo_name) + def _get_repo_local_path(self, repo_name: str) -> Path: + """Expose repository location for callers that already rely on it.""" + safe = Path(self._sanitize_for_filename(repo_name)) + return self.repos_base_path / safeThis keeps metadata and repo directories safely contained even if repo_name is ever malformed.
Also applies to: 256-259
app/modules/code_provider/repo_manager_wrapper.py (2)
342-480: Heuristic branch/commit detection and stale docstring. The commit-vs-branch detection is a simple hex-length heuristic; that's probably fine, but note it will treat any hex-like branch name (e.g.
deadbeef) as a commit. Also, the_get_worktree_pathdocstring still mentions “registered in Redis”, which no longer matches theIRepoManagerfilesystem-backed implementation and could confuse future maintainers.If you want to tighten things up:
- Clarify in the docstring that you’re consulting
IRepoManagermetadata/filesystem, not Redis.- Optionally expose branch vs commit explicitly at call sites (or add a small helper) instead of relying solely on the hex heuristic.
Also applies to: 601-623
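If you extract the heuristic into a helper, it could be as small as this sketch (the name is hypothetical):

```python
import re

_COMMIT_SHA_RE = re.compile(r"[0-9a-f]{7,40}")


def looks_like_commit_sha(ref: str) -> bool:
    """Heuristic for callers that don't say whether `ref` is a branch or a commit.

    Note: a branch literally named like a short SHA (e.g. "deadbeef") will still be
    misclassified; callers that know the ref type should pass it explicitly.
    """
    return bool(_COMMIT_SHA_RE.fullmatch(ref.lower()))
```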
481-545: Broad `except Exception` and silent passes make debugging harder. Several places use bare
except Exception(and onetry/exceptwithpass) around filesystem and Git operations, then silently fall back. That’s reasonable for resiliency, but it also risks masking real bugs and makes triaging failures harder.Consider tightening this pattern, for example:
- Catch more specific exception types where feasible (e.g.,
OSError,GitCommandError).- Replace
logger.error(...);on failures withlogger.exception(...)to preserve stack traces.- In the inner
try/exceptthat currentlypasses when removing an invalid worktree, log at least a debug/warning message.Illustrative changes:
- except Exception: - # Worktree exists but is invalid, remove it - logger.warning(f"Invalid worktree at {worktree_dir}, removing") - try: - repo.git.worktree("remove", worktree_dir, force=True) - except Exception: - pass + except Exception: + # Worktree exists but is invalid, try to remove it + logger.warning("Invalid worktree at %s, removing", worktree_dir) + try: + repo.git.worktree("remove", worktree_dir, force=True) + except Exception as remove_exc: + logger.debug( + "Failed to remove invalid worktree at %s: %s", + worktree_dir, + remove_exc, + ) @@ - except Exception as e: - logger.error(f"Failed to create worktree for {ref}: {e}") + except Exception: + logger.exception("Failed to create worktree for %s", ref) @@ - except Exception as e: - logger.warning(f"Error building structure from {full_path}: {e}") + except Exception: + logger.exception("Error building structure from %s", full_path) @@ - except Exception as e: - logger.debug(f"Failed to update last accessed time: {e}") + except Exception: + logger.debug("Failed to update last accessed time", exc_info=True)This preserves your graceful fallbacks while making operational issues more observable.
Also applies to: 514-520, 541-543, 596-598, 624-625
```python
except HTTPException as http_exc:
    raise http_exc
```
🛠️ Refactor suggestion | 🟠 Major
Use bare raise to preserve exception traceback.
When re-raising an exception, use bare raise instead of raise http_exc to preserve the original traceback, which aids debugging.
As per coding guidelines
Apply this diff:
```diff
 except HTTPException as http_exc:
-    raise http_exc
+    raise
 except Exception as e:
```

📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
```diff
 except HTTPException as http_exc:
-    raise http_exc
+    raise
```
🧰 Tools
🪛 Ruff (0.14.4)
171-171: Use raise without specifying exception name
Remove exception name
(TRY201)
🤖 Prompt for AI Agents
In app/modules/code_provider/code_provider_controller.py around lines 170-171,
the except block currently re-raises the caught HTTPException using "raise
http_exc", which loses the original traceback; replace that re-raise with a bare
"raise" so the original exception traceback is preserved for debugging and
logging consistency.
| def get_file_content( | ||
| self, | ||
| repo_name: str, | ||
| file_path: str, | ||
| ref: Optional[str] = None, | ||
| start_line: Optional[int] = None, | ||
| end_line: Optional[int] = None, | ||
| ) -> str: | ||
| """ | ||
| Get file content from local copy if available, otherwise fallback to provider. | ||
| Uses git worktree to access the correct branch/commit. | ||
| """ | ||
| # Try to get local copy path | ||
| worktree_path = self._get_worktree_path(repo_name, ref) | ||
| if worktree_path: | ||
| try: | ||
| # Update last accessed time | ||
| self._update_last_accessed(repo_name, ref) | ||
|
|
||
| # Read file from local filesystem | ||
| full_path = os.path.join(worktree_path, file_path) | ||
| if not os.path.exists(full_path): | ||
| logger.info( | ||
| f"[REPO_MANAGER] File {file_path} not found in local copy at {full_path}, " | ||
| f"falling back to provider API for {repo_name}@{ref}" | ||
| ) | ||
| logger.info( | ||
| f"[PROVIDER_API] Fetching file content: {repo_name}/{file_path}@{ref}" | ||
| ) | ||
| return self._provider.get_file_content( | ||
| repo_name, file_path, ref, start_line, end_line | ||
| ) | ||
|
|
||
| with open(full_path, "r", encoding="utf-8", errors="replace") as f: | ||
| content = f.read() | ||
|
|
||
| # Apply line range if specified | ||
| if start_line is not None or end_line is not None: | ||
| lines = content.split("\n") | ||
| start = (start_line - 1) if start_line is not None else 0 | ||
| end = end_line if end_line is not None else len(lines) | ||
| content = "\n".join(lines[start:end]) | ||
|
|
||
| logger.info( | ||
| f"[REPO_MANAGER] Retrieved file content from local copy: " | ||
| f"{repo_name}/{file_path}@{ref} (path: {full_path})" | ||
| ) | ||
| return content | ||
|
|
||
| except Exception as e: | ||
| logger.warning( | ||
| f"[REPO_MANAGER] Error reading file from local copy: {e}, " | ||
| f"falling back to provider API for {repo_name}/{file_path}@{ref}" | ||
| ) | ||
| logger.info( | ||
| f"[PROVIDER_API] Fetching file content: {repo_name}/{file_path}@{ref}" | ||
| ) | ||
| return self._provider.get_file_content( | ||
| repo_name, file_path, ref, start_line, end_line | ||
| ) | ||
|
|
||
| # Fallback to provider | ||
| logger.info( | ||
| f"[REPO_MANAGER] No local copy available for {repo_name}@{ref}, " | ||
| f"using provider API" | ||
| ) | ||
| logger.info( | ||
| f"[PROVIDER_API] Fetching file content: {repo_name}/{file_path}@{ref}" | ||
| ) | ||
| return self._provider.get_file_content( | ||
| repo_name, file_path, ref, start_line, end_line | ||
| ) |
Path traversal risk when joining file_path with worktree_path
get_file_content and _build_structure_from_filesystem join user-controlled file_path/path directly to the local repo root without normalizing or enforcing that the final path stays under the repository. A malicious or buggy caller could pass values like ../../etc/passwd or an absolute path and cause the server to read or list files outside the repo root, which is a serious security issue.
Recommend normalizing and enforcing containment before accessing the filesystem, e.g.:
@@ def get_file_content(
- # Read file from local filesystem
- full_path = os.path.join(worktree_path, file_path)
- if not os.path.exists(full_path):
+ # Resolve and validate file path under repo root to prevent traversal
+ base_path = os.path.realpath(worktree_path)
+ requested_path = os.path.realpath(
+ os.path.join(base_path, file_path)
+ )
+ # Ensure requested_path stays within the repository
+ if os.path.commonpath([base_path, requested_path]) != base_path:
+ logger.warning(
+ "[REPO_MANAGER] Unsafe file path %s for repo %s@%s; "
+ "falling back to provider API",
+ file_path,
+ repo_name,
+ ref,
+ )
+ logger.info(
+ "[PROVIDER_API] Fetching file content: %s/%s@%s",
+ repo_name,
+ file_path,
+ ref,
+ )
+ return self._provider.get_file_content(
+ repo_name, file_path, ref, start_line, end_line
+ )
+
+ full_path = requested_path
+ if not os.path.exists(full_path):
@@ def _build_structure_from_filesystem(
- structure = []
- full_path = os.path.join(base_path, path) if path else base_path
+ structure = []
+ base_real = os.path.realpath(base_path)
+ full_path = (
+ os.path.realpath(os.path.join(base_real, path)) if path else base_real
+ )
+
+ # Prevent traversal outside the repository root
+ if os.path.commonpath([base_real, full_path]) != base_real:
+ logger.warning(
+ "Unsafe path %s when building structure under %s; skipping",
+ path,
+ base_real,
+ )
+ return structureThis keeps local access semantics aligned with the remote provider’s expectation that paths are repository-relative, while protecting the host filesystem.
Also applies to: 546-595
🧰 Tools
🪛 Ruff (0.14.4)
262-262: Consider moving this statement to an else block
(TRY300)
264-264: Do not catch blind exception: Exception
(BLE001)
| def get_repository_structure( | ||
| self, | ||
| repo_name: str, | ||
| path: str = "", | ||
| ref: Optional[str] = None, | ||
| max_depth: int = 4, | ||
| ) -> List[Dict[str, Any]]: | ||
| """ | ||
| Get repository structure from local copy if available, otherwise fallback to provider. | ||
| Uses git worktree to access the correct branch/commit. | ||
| """ | ||
| # Try to get local copy path | ||
| worktree_path = self._get_worktree_path(repo_name, ref) | ||
| if worktree_path: | ||
| try: | ||
| # Update last accessed time | ||
| self._update_last_accessed(repo_name, ref) | ||
|
|
||
| # Build structure from local filesystem | ||
| structure = self._build_structure_from_filesystem( | ||
| worktree_path, path, max_depth | ||
| ) | ||
|
|
||
| logger.info( | ||
| f"[REPO_MANAGER] Retrieved repository structure from local copy: " | ||
| f"{repo_name}@{ref} (path: {worktree_path}, depth: {max_depth})" | ||
| ) | ||
| return structure | ||
|
|
||
| except Exception as e: | ||
| logger.warning( | ||
| f"[REPO_MANAGER] Error reading structure from local copy: {e}, " | ||
| f"falling back to provider API for {repo_name}@{ref}" | ||
| ) | ||
| logger.info( | ||
| f"[PROVIDER_API] Fetching repository structure: {repo_name}@{ref} (path: {path})" | ||
| ) | ||
| return self._provider.get_repository_structure( | ||
| repo_name, path, ref, max_depth | ||
| ) | ||
|
|
||
| # Fallback to provider | ||
| logger.info( | ||
| f"[REPO_MANAGER] No local copy available for {repo_name}@{ref}, " | ||
| f"using provider API" | ||
| ) | ||
| logger.info( | ||
| f"[PROVIDER_API] Fetching repository structure: {repo_name}@{ref} (path: {path})" | ||
| ) | ||
| return self._provider.get_repository_structure(repo_name, path, ref, max_depth) |
🧩 Analysis chain
Repository-structure listing should reuse the same path-safety checks
Once you harden _build_structure_from_filesystem against traversal, get_repository_structure’s use of path will inherit that safety; just ensure you don’t bypass _build_structure_from_filesystem elsewhere and that callers treat path as repo-relative only.
🏁 Script executed:
#!/bin/bash
# Quick grep to ensure `_build_structure_from_filesystem` is the only place
# walking local directories for repo structure.
rg -n "_build_structure_from_filesystem" appLength of output: 357
🏁 Script executed:
#!/bin/bash
# Examine the _build_structure_from_filesystem implementation
# and how the path parameter is used
sed -n '546,650p' app/modules/code_provider/repo_manager_wrapper.pyLength of output: 2775
🏁 Script executed:
#!/bin/bash
# Also check the exact context around line 308 where the function is called
sed -n '300,320p' app/modules/code_provider/repo_manager_wrapper.pyLength of output: 845
🏁 Script executed:
#!/bin/bash
# Search for any other directory/filesystem walking patterns in the file
rg -n -i "listdir|walk|glob|Path|open.*r\)" app/modules/code_provider/repo_manager_wrapper.py | head -30Length of output: 1967
🏁 Script executed:
#!/bin/bash
# Search for any path validation or normalization logic
# that might handle the `path` parameter before it's used
rg -n -B5 -A5 "get_repository_structure" app/modules/code_provider/repo_manager_wrapper.py | head -50Length of output: 1719
🏁 Script executed:
#!/bin/bash
# Check if there are any normalization utilities used elsewhere
rg -n "normpath|realpath|abspath|sanitize|traversal" app/modules/code_provider/Length of output: 871
🏁 Script executed:
#!/bin/bash
# Look for any path validation in the provider interface or base class
find app -name "*.py" -type f -exec grep -l "get_repository_structure" {} \;Length of output: 323
🏁 Script executed:
#!/bin/bash
# Check the interface definition
cat app/modules/code_provider/base/code_provider_interface.py | grep -A20 "get_repository_structure"Length of output: 639
🏁 Script executed:
#!/bin/bash
# Check how local_provider handles path for repository structure
sed -n '1,50p' app/modules/code_provider/local_repo/local_provider.pyLength of output: 1620
🏁 Script executed:
#!/bin/bash
# Search for get_repository_structure implementation in local_provider
rg -n -A30 "def get_repository_structure" app/modules/code_provider/local_repo/local_provider.pyLength of output: 1101
🏁 Script executed:
#!/bin/bash
# Check if worktree paths are sandboxed or validated
rg -n -A10 "_get_worktree_path" app/modules/code_provider/repo_manager_wrapper.pyLength of output: 1461
Hardening required for path traversal protection
The current code is vulnerable to directory traversal. _build_structure_from_filesystem and similar methods construct os.path.join(base_path, path) without validating that the result stays within the repository boundary. A caller could use path="../../etc/passwd" to escape the repo.
Required fixes:
- Normalize and validate the `path` parameter (e.g., `os.path.normpath()`, then verify it is repo-relative)
- Apply the same validation to `file_path` in `get_file_content` (line 235)
- Ensure the resolved full path is within the intended base directory (use `os.path.realpath()` and check containment)
Lines 308–309, 235 (and recursion at line 584).
🧰 Tools
🪛 Ruff (0.14.4)
316-316: Consider moving this statement to an else block
(TRY300)
318-318: Do not catch blind exception: Exception
(BLE001)
🤖 Prompt for AI Agents
In app/modules/code_provider/repo_manager_wrapper.py around lines 288–338 (and
also address line 235 and recursive calls around line 584), the code joins
user-supplied paths to the repo worktree without validation, allowing directory
traversal (e.g., "../../etc"). Normalize and validate the incoming path before
using it: call os.path.normpath(path) and reject or strip paths that resolve to
parent references (leading ".." segments); then compute the absolute real path
for the target (os.path.realpath(os.path.join(worktree_path, normalized_path)))
and ensure it is contained within the worktree (use os.path.commonpath or
os.path.commonprefix / commonpath(worktree_path, resolved_target) comparison) —
if containment fails, raise a ValueError or return an appropriate error. Apply
the same validation logic in get_file_content (line ~235), in
_build_structure_from_filesystem and any recursive descent (line ~584), so every
place that forms full filesystem paths enforces normalization + realpath
containment checks before reading or listing files.
| def _validate_command_safety(command: str) -> tuple[bool, Optional[str]]: | ||
| """ | ||
| Validate that a command is safe (read-only) and doesn't attempt write operations. | ||
| Args: | ||
| command: The command string to validate | ||
| Returns: | ||
| Tuple of (is_safe, error_message) | ||
| is_safe: True if command is safe, False otherwise | ||
| error_message: Error message if command is unsafe, None if safe | ||
| """ | ||
| command_lower = command.lower().strip() | ||
|
|
||
| # Check for write operation patterns (redirection, in-place editing) | ||
| for pattern in DANGEROUS_PATTERNS: | ||
| if pattern in command_lower: | ||
| return ( | ||
| False, | ||
| f"Command contains write operation pattern '{pattern}'. Write operations are not allowed.", | ||
| ) | ||
|
|
||
| # Check for command injection patterns | ||
| for pattern in INJECTION_PATTERNS: | ||
| if pattern in command: | ||
| return ( | ||
| False, | ||
| f"Command contains injection pattern '{pattern}'. Command chaining/substitution is not allowed.", | ||
| ) | ||
|
|
||
| # Check if command starts with an always-blocked command | ||
| first_word = command_lower.split()[0] if command_lower.split() else "" | ||
| if first_word in ALWAYS_BLOCKED_COMMANDS: | ||
| return ( | ||
| False, | ||
| f"Command '{first_word}' is not allowed. This tool only supports read-only operations.", | ||
| ) | ||
|
|
||
| # Check for write-blocked commands with dangerous flags | ||
| for cmd, dangerous_flags in WRITE_BLOCKED_COMMANDS.items(): | ||
| if command_lower.startswith(cmd): | ||
| for flag in dangerous_flags: | ||
| if flag in command_lower: | ||
| return ( | ||
| False, | ||
| f"Command '{cmd}' with flag '{flag}' is not allowed. This would modify files.", | ||
| ) | ||
|
|
||
| # Block environment variable access that might expose secrets | ||
| if command_lower.strip() == "env": | ||
| return ( | ||
| False, | ||
| "The 'env' command is blocked to prevent exposure of sensitive environment variables.", | ||
| ) | ||
|
|
||
| # Block commands that try to access parent directories | ||
| if "../" in command or ".." in command.split(): | ||
| return ( | ||
| False, | ||
| "Accessing parent directories is not allowed for security reasons.", | ||
| ) | ||
|
|
||
| return (True, None) | ||
|
|
🧩 Analysis chain
Interpreter bypass of “read‑only” enforcement; tighten safety model
The current safety checks are easy to circumvent with an interpreter while still passing _validate_command_safety. For example:
- `python -c "open('some_file', 'w').write('x')"`
- `bash -c 'rm -rf .'` (no `;`, `&&`, `||`, redirection, or `env` usage)
These will:
- Not hit `ALWAYS_BLOCKED_COMMANDS` (the first token is `python`/`bash`/`sh`, which are allowed).
- Not contain any of the configured `DANGEROUS_PATTERNS` or `INJECTION_PATTERNS`.
- Be executed directly in the repo worktree (and possibly without read-only mounts if `run_command_isolated` ever falls back from gVisor).
To keep the “read‑only bash” guarantee realistic, I recommend:
- Blocking generic shells and interpreters by default (unless you have a very constrained use case):
ALREADY_BLOCKED_COMMANDS = {
@@
- "ssh",
+ "ssh",
@@
- "sudo",
- "su",
+ "sudo",
+ "su",
+ "bash",
+ "sh",
+ "zsh",
+ "python",
+ "python3",
+ "node",
+ "ruby",
+ "perl",
}-
Optionally switching to an allow‑list model for first tokens (e.g. only
grep,find,rg,ls,cat,head,tail,awk,sedwith safe flags, etc.) and rejecting any other base command. -
Verifying
run_command_isolatedsemantics to ensure the repo path is truly mounted read‑only even in fallback paths. If that helper only treatsrepo_pathas read‑only, you may want to passrepo_path=worktree_pathand useworking_dirpurely as the in‑container CWD.
- result: CommandResult = run_command_isolated(
- command=final_command,
- working_dir=worktree_path, # Mount only the worktree root (as read-only)
- repo_path=None, # Don't mount separately since working_dir is the repo
+ result: CommandResult = run_command_isolated(
+ command=final_command,
+ working_dir=worktree_path,
+ repo_path=worktree_path,
env=safe_env,
timeout=30,
use_gvisor=True,
)Given how central this tool is and that it’s exposed to agents, I’d treat closing this interpreter bypass as a priority.
Also applies to: 282-445
🏁 Script executed:
#!/bin/bash
# Inspect gvisor_runner to confirm how read-only mounts are configured.
set -euo pipefail
echo "=== Searching for run_command_isolated implementation and mount semantics ==="
rg -n "def run_command_isolated" -C5
rg -n "repo_path" -C3 app/modules/utils/gvisor_runner.py || trueLength of output: 4361
Proceeding with verification of the mount semantics and current usage pattern:Now let me inspect the specific mount configurations and the bash_command_tool usage:
Let me now inspect the full mount logic and the actual bash_command_tool usage:Now let me run one more search to confirm the gvisor mount specifications:
Based on my verification, the original review comment contains valid security concerns. Let me now generate the rewritten review:
Interpreter bypass of read-only enforcement; tighten safety model
The current safety checks can be circumvented by invoking interpreters (python, bash, sh, etc.) while still passing _validate_command_safety. For example:
python -c "open('some_file', 'w').write('x')"bash -c 'rm -rf .'
These commands bypass all configured DANGEROUS_PATTERNS and INJECTION_PATTERNS since the first token is an allowed interpreter, not a dangerous command.
Additionally, bind mounts allow containers to access the bound host filesystem tree, and changes to bind mounts are always propagated to the host and persist after the container exits. The current implementation passes repo_path=None to run_command_isolated, leaving only working_dir mounted. Without explicit read-only enforcement at the mount level, the isolation depends entirely on gVisor's mitigation, which should not be the sole defense.
To strengthen the read-only guarantee:
- Block generic shells and interpreters by default (or switch to an allow-list model for safe commands only):
ALWAYS_BLOCKED_COMMANDS = {
"ssh",
"sudo",
"su",
+ "bash",
+ "sh",
+ "zsh",
+ "python",
+ "python3",
+ "node",
+ "ruby",
+ "perl",
}- Ensure repo path is always mounted read-only by passing both
working_dirandrepo_pathtorun_command_isolated:
- result: CommandResult = run_command_isolated(
- command=final_command,
- working_dir=worktree_path,
- repo_path=None,
+ result: CommandResult = run_command_isolated(
+ command=final_command,
+ working_dir=worktree_path,
+ repo_path=worktree_path,
env=safe_env,
timeout=30,
use_gvisor=True,
)This ensures the directory is mounted with read-only protection (:ro flag) even if gVisor isolation is unavailable or misconfigured. Given how central this tool is, I'd treat closing this interpreter bypass as a priority.
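As an alternative to growing the block-list, an allow-list validator could look roughly like this; the command set and helper name are illustrative, not the tool's current API:

```python
import shlex
from typing import Optional, Tuple

# Allow-list of base commands considered read-only; everything else is rejected,
# which also blocks interpreter escapes like `python -c ...` or `bash -c ...`.
ALLOWED_COMMANDS = {"ls", "cat", "head", "tail", "grep", "rg", "find", "wc", "stat", "file"}


def validate_command_allowlist(command: str) -> Tuple[bool, Optional[str]]:
    try:
        tokens = shlex.split(command)
    except ValueError as exc:
        return False, f"Could not parse command: {exc}"
    if not tokens:
        return False, "Empty command."
    # Normalize paths like /usr/bin/python so they can't slip past the check.
    base = tokens[0].rsplit("/", 1)[-1].lower()
    if base not in ALLOWED_COMMANDS:
        return False, f"Command '{base}' is not in the read-only allow-list."
    return True, None
```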
Committable suggestion skipped: line range outside the PR's diff.
🤖 Prompt for AI Agents
In app/modules/intelligence/tools/code_query_tools/bash_command_tool.py around
lines 75 to 138, the current safety validator can be bypassed by launching
interpreters/shells (python, bash, sh, etc.) which then perform write
operations; also the runtime mounts may not enforce read-only on the repo since
repo_path is passed as None. Fix by adding interpreter names to
ALWAYS_BLOCKED_COMMANDS (or invert to an explicit allow-list of safe commands)
and detect interpreter invocations (e.g., "python -c", "sh -c", "bash -c") in
_validate_command_safety to reject them; and change the call site that invokes
run_command_isolated to always pass repo_path (and ensure it is mounted with a
read-only flag) so the repo is mounted as read-only even if gVisor isolation
fails.
| if repo_path is not None: | ||
| # Local repository detected - return the path directly without downloading tarball | ||
| logger.info(f"ParsingHelper: Using local repository at {repo_path}") | ||
| return repo_path, project_id | ||
| # Local repository detected - use the path directly without downloading tarball | ||
| repo_path_str = str(repo_path) if repo_path else None | ||
| logger.info(f"ParsingHelper: Using local repository at {repo_path_str}") | ||
|
|
||
| local_repo = None | ||
| latest_commit_sha = None | ||
| repo_metadata = {} | ||
|
|
||
| # Get commit SHA for local repo | ||
| try: | ||
| local_repo = Repo(repo_path_str) | ||
| if commit_id: | ||
| latest_commit_sha = commit_id | ||
| elif branch: | ||
| try: | ||
| local_repo.git.checkout(branch) | ||
| latest_commit_sha = local_repo.head.commit.hexsha | ||
| except GitCommandError: | ||
| # Branch might not exist, use current HEAD | ||
| latest_commit_sha = local_repo.head.commit.hexsha | ||
| else: | ||
| latest_commit_sha = local_repo.head.commit.hexsha | ||
| except Exception as e: | ||
| logger.warning(f"Could not get commit SHA for local repo: {e}") | ||
|
|
||
| # Extract metadata for local repo | ||
| if local_repo: | ||
| try: | ||
| repo_metadata = ParseHelper.extract_repository_metadata(local_repo) | ||
| repo_metadata["error_message"] = None | ||
| project_metadata = json.dumps(repo_metadata).encode("utf-8") | ||
| ProjectService.update_project( | ||
| self.db, | ||
| project_id, | ||
| properties=project_metadata, | ||
| commit_id=latest_commit_sha, | ||
| status=ProjectStatusEnum.CLONED.value, | ||
| ) | ||
| except Exception as e: | ||
| logger.warning(f"Could not extract metadata for local repo: {e}") | ||
|
|
||
| # Copy local repo to .repos if repo manager is enabled | ||
| if self.repo_manager and repo_path_str and os.path.exists(repo_path_str): | ||
| try: | ||
| await self._copy_repo_to_repo_manager( | ||
| normalized_full_name, | ||
| repo_path_str, | ||
| branch, | ||
| latest_commit_sha, | ||
| user_id, | ||
| repo_metadata, | ||
| ) | ||
| except Exception as e: | ||
| logger.warning( | ||
| f"Failed to copy local repo to repo manager: {e}. Continuing with parsing." | ||
| ) | ||
|
|
||
| return repo_path_str, project_id |
🧩 Analysis chain
Local‑repo handling now mutates the caller’s repo via git checkout; this is risky.
The new local‑repo path in setup_project_directory plus _copy_repo_to_repo_manager / worktree helpers are generally well‑structured and integrate cleanly with RepoManager. However, one behavior is risky:
- In the local-repo branch you do:

```python
local_repo = Repo(repo_path_str)
if commit_id:
    latest_commit_sha = commit_id
elif branch:
    try:
        local_repo.git.checkout(branch)
        latest_commit_sha = local_repo.head.commit.hexsha
    except GitCommandError:
        latest_commit_sha = local_repo.head.commit.hexsha
```

This actively checks out `branch` in the caller's working tree, potentially changing their local repo state (current branch, working-tree files) just to compute a commit SHA and metadata.
For developer-owned paths (e.g., a user pointing at their local clone), this side effect can be surprising or destructive.
Consider instead:
- Deriving the commit without changing HEAD, e.g.:
```python
if commit_id:
    latest_commit_sha = commit_id
elif branch:
    try:
        latest_commit_sha = local_repo.git.rev_parse(branch).strip()
    except GitCommandError:
        latest_commit_sha = local_repo.head.commit.hexsha
else:
    latest_commit_sha = local_repo.head.commit.hexsha
```

- And not calling `git.checkout` on the local clone at all.
The rest of the RepoManager mirroring (_initialize_base_repo, _create_worktree) operates on copies under .repos, so you already have a safe place to materialize specific branches/commits without touching the original working tree.
Also applies to: 1062-1077, 1080-1295
---
🏁 Script executed:
```shell
#!/bin/bash
# Extract the full context around lines 929-987, 1062-1077, 1080-1295
cat app/modules/parsing/graph_construction/parsing_helper.py | sed -n '900,1300p' | head -400
```

Length of output: 16744
🏁 Script executed:
```shell
#!/bin/bash
# Search for all git.checkout calls in the file
rg -n 'git\.checkout|Repo\(' app/modules/parsing/graph_construction/parsing_helper.py -A 2 -B 2
```

Length of output: 1679
🏁 Script executed:
```shell
#!/bin/bash
# Look for _copy_repo_to_repo_manager and related worktree helpers
rg -n '_copy_repo_to_repo_manager|_initialize_base_repo|_create_worktree' app/modules/parsing/graph_construction/parsing_helper.py -A 5 -B 2
```

Length of output: 2104
🏁 Script executed:
```shell
#!/bin/bash
# Check imports to understand GitPython usage
head -50 app/modules/parsing/graph_construction/parsing_helper.py | grep -E 'import|from'
```

Length of output: 766
🏁 Script executed:
```shell
#!/bin/bash
# Find if local_repo is used elsewhere and how it's initialized
rg -n 'Repo\(repo_path' app/modules/parsing/graph_construction/parsing_helper.py -A 3 -B 1
```

Length of output: 194
---
🌐 Web query:
```web
GitPython git.rev_parse method usage for branch commit SHA
```

💡 Result:
Short answer — use Repo.rev_parse(...) and read the Commit object's hexsha. Example:
- Resolve a branch name to the commit SHA (full 40-char hex):

```python
repo = git.Repo("path/to/repo")
sha = repo.rev_parse("branch-name").hexsha
```

- You can also pass explicit ref names:

```python
sha = repo.rev_parse("refs/heads/branch-name").hexsha
sha = repo.rev_parse("origin/branch-name").hexsha  # remote ref
```
Notes:
- rev_parse returns a git object (Commit, Tag, Tree, etc.); .hexsha gives the commit id. [1]
- rev_parse will raise gitdb.exc.BadObject / ValueError if the name is invalid, so catch exceptions if needed. [1][2]
References:
[1] GitPython API — Repo.rev_parse / return types.
[2] Example usage in projects / Q&A showing .rev_parse(...).hexsha usage.
Replace git.checkout with git.rev_parse to avoid mutating the caller's repository.
The code at line 945 directly mutates the caller's working tree via local_repo.git.checkout(branch). This side effect is risky for developer-owned paths and occurs before the safe mirroring to .repos.
Instead, use:
```python
if commit_id:
    latest_commit_sha = commit_id
elif branch:
    try:
        latest_commit_sha = local_repo.rev_parse(branch).hexsha
    except Exception:
        latest_commit_sha = local_repo.head.commit.hexsha
else:
    latest_commit_sha = local_repo.head.commit.hexsha
```

This same pattern applies to the remote repository case at lines 994 and 997, where repo_details.git.checkout() should also be replaced with repo_details.rev_parse() to avoid unnecessary HEAD mutation.
🧰 Tools
🪛 Ruff (0.14.4)
952-952: Do not catch blind exception: Exception
(BLE001)
968-968: Do not catch blind exception: Exception
(BLE001)
982-982: Do not catch blind exception: Exception
(BLE001)
```python
def verify_checksum(file_path: Path, checksum_url: str) -> bool:
    """
    Verify file checksum.
    Args:
        file_path: Path to the file to verify
        checksum_url: URL to the checksum file
    Returns:
        True if checksum matches, False otherwise
    """
    try:
        # Download checksum file
        checksum_path = file_path.parent / f"{file_path.name}.sha512"
        if not download_file(checksum_url, checksum_path):
            logger.warning("Failed to download checksum, skipping verification")
            return True  # Continue anyway

        # Read expected checksum
        with open(checksum_path, "r") as f:
            expected_checksum = f.read().split()[0]

        # Calculate actual checksum
        import hashlib

        sha512 = hashlib.sha512()
        with open(file_path, "rb") as f:
            for chunk in iter(lambda: f.read(4096), b""):
                sha512.update(chunk)
        actual_checksum = sha512.hexdigest()

        # Clean up checksum file
        checksum_path.unlink()

        if expected_checksum == actual_checksum:
            logger.info("Checksum verification passed")
            return True
        else:
            logger.error("Checksum verification failed")
            return False

    except Exception as e:
        logger.warning(f"Error verifying checksum: {e}, continuing anyway")
        return True  # Continue anyway
```
🧩 Analysis chain
Checksum handling should fail closed instead of silently accepting unverified binaries
verify_checksum currently treats several failure modes as “OK”:
- If the checksum download fails (download_file returns False), it logs a warning and returns True (Lines 167-170).
- If any exception is raised during verification, it logs a warning and returns True (Lines 194-197).
Given this script installs a security boundary binary (runsc) that will be used to isolate untrusted code, silently proceeding without a valid checksum undermines the integrity check. It’s safer to fail the installation and let the rest of the system fall back to “no gVisor available” than to run with a potentially corrupted or tampered binary.
Consider treating checksum download/verification failures as hard failures:
- if not download_file(checksum_url, checksum_path):
- logger.warning("Failed to download checksum, skipping verification")
- return True # Continue anyway
+ if not download_file(checksum_url, checksum_path):
+ logger.warning("Failed to download checksum; aborting installation")
+ return False
@@
- except Exception as e:
- logger.warning(f"Error verifying checksum: {e}, continuing anyway")
- return True # Continue anyway
+ except Exception as e:
+ logger.warning(f"Error verifying checksum: {e}; aborting installation")
+ return False

This way install_gvisor will bail out cleanly on any integrity issues and keep runsc disabled instead of silently accepting an unverified binary.
Also applies to: 199-247
I'll verify the review comment by examining the actual code and its context.
---
🏁 Script executed:
```shell
#!/bin/bash
# Check the actual code at lines 153-197 in install_gvisor.py
cat app/modules/utils/install_gvisor.py | head -200 | tail -50
```

Length of output: 1470
🏁 Script executed:
```shell
#!/bin/bash
# Find all usages of verify_checksum function
rg -n "verify_checksum" app/modules/utils/install_gvisor.py -B2 -A2
```

Length of output: 315
🏁 Script executed:
```shell
#!/bin/bash
# Check install_gvisor function to see how it handles the return value
rg -n "def install_gvisor" app/modules/utils/install_gvisor.py -A50
```

Length of output: 1773
🏁 Script executed:
```shell
#!/bin/bash
# Look for any fallback behavior or error handling around gvisor
rg -n "gVisor\|runsc" app/modules/utils/install_gvisor.py -B1 -A1
```

Length of output: 0
---
🏁 Script executed:
```shell
#!/bin/bash
# Get full install_gvisor function (lines 199 onwards)
rg -n "def install_gvisor" app/modules/utils/install_gvisor.py -A80
```

Length of output: 2781
🏁 Script executed:
```shell
#!/bin/bash
# Find callers of install_gvisor
rg -n "install_gvisor\(" --type py -B3 -A3
```

Length of output: 1255
🏁 Script executed:
```shell
#!/bin/bash
# Check what happens after verify_checksum succeeds - lines 248-260
sed -n '248,265p' app/modules/utils/install_gvisor.py
```

Length of output: 559
Checksum handling should fail closed instead of silently accepting unverified binaries
verify_checksum currently treats several failure modes as "OK":
- If the checksum download fails (download_file returns False), it logs a warning and returns True (Lines 167-170).
- If any exception is raised during verification, it logs a warning and returns True (Lines 194-197).
When verify_checksum returns True (including these failure cases), install_gvisor proceeds to chmod the binary to 0o755, move it to its final location, and return success (Lines 249‑262). This means an unverified binary is installed and becomes executable.
Given that runsc is a security boundary binary used to isolate untrusted code, silently accepting an unverified binary undermines the integrity check and creates a security risk. It is safer to fail the installation and allow the system to operate without gVisor than to run with a potentially corrupted or tampered binary.
Consider treating checksum download/verification failures as hard failures:
- if not download_file(checksum_url, checksum_path):
- logger.warning("Failed to download checksum, skipping verification")
- return True # Continue anyway
+ if not download_file(checksum_url, checksum_path):
+ logger.warning("Failed to download checksum; aborting installation")
+ return False
@@
- except Exception as e:
- logger.warning(f"Error verifying checksum: {e}, continuing anyway")
- return True # Continue anyway
+ except Exception as e:
+ logger.warning(f"Error verifying checksum: {e}; aborting installation")
+ return False

This way install_gvisor will cleanly return False on any integrity issues (Lines 244-247), prevent the binary from being installed, and allow the system to fall back to operating without gVisor instead of with a potentially compromised isolation mechanism.
📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
Current implementation:

```python
def verify_checksum(file_path: Path, checksum_url: str) -> bool:
    """
    Verify file checksum.
    Args:
        file_path: Path to the file to verify
        checksum_url: URL to the checksum file
    Returns:
        True if checksum matches, False otherwise
    """
    try:
        # Download checksum file
        checksum_path = file_path.parent / f"{file_path.name}.sha512"
        if not download_file(checksum_url, checksum_path):
            logger.warning("Failed to download checksum, skipping verification")
            return True  # Continue anyway
        # Read expected checksum
        with open(checksum_path, "r") as f:
            expected_checksum = f.read().split()[0]
        # Calculate actual checksum
        import hashlib

        sha512 = hashlib.sha512()
        with open(file_path, "rb") as f:
            for chunk in iter(lambda: f.read(4096), b""):
                sha512.update(chunk)
        actual_checksum = sha512.hexdigest()
        # Clean up checksum file
        checksum_path.unlink()
        if expected_checksum == actual_checksum:
            logger.info("Checksum verification passed")
            return True
        else:
            logger.error("Checksum verification failed")
            return False
    except Exception as e:
        logger.warning(f"Error verifying checksum: {e}, continuing anyway")
        return True  # Continue anyway
```

Suggested replacement (fail closed):

```python
def verify_checksum(file_path: Path, checksum_url: str) -> bool:
    """
    Verify file checksum.
    Args:
        file_path: Path to the file to verify
        checksum_url: URL to the checksum file
    Returns:
        True if checksum matches, False otherwise
    """
    try:
        # Download checksum file
        checksum_path = file_path.parent / f"{file_path.name}.sha512"
        if not download_file(checksum_url, checksum_path):
            logger.warning("Failed to download checksum; aborting installation")
            return False
        # Read expected checksum
        with open(checksum_path, "r") as f:
            expected_checksum = f.read().split()[0]
        # Calculate actual checksum
        import hashlib

        sha512 = hashlib.sha512()
        with open(file_path, "rb") as f:
            for chunk in iter(lambda: f.read(4096), b""):
                sha512.update(chunk)
        actual_checksum = sha512.hexdigest()
        # Clean up checksum file
        checksum_path.unlink()
        if expected_checksum == actual_checksum:
            logger.info("Checksum verification passed")
            return True
        else:
            logger.error("Checksum verification failed")
            return False
    except Exception as e:
        logger.warning(f"Error verifying checksum: {e}; aborting installation")
        return False
```
🧰 Tools
🪛 Ruff (0.14.4)
194-194: Do not catch blind exception: Exception
(BLE001)
🤖 Prompt for AI Agents
In app/modules/utils/install_gvisor.py around lines 153 to 197, the checksum
routine currently treats download failures and exceptions as success (returning
True) which allows installing unverified binaries; change it to fail closed by
returning False on checksum download failure and on any exception, log an
error-level message with context, and ensure cleanup of the temporary checksum
file only when present; in short: if download_file(...) returns False ->
logger.error(...) and return False; if any exception occurs -> logger.error(...)
and return False; keep successful verification flow unchanged.
```python
from app.modules.utils.gvisor_runner import run_command_isolated

result = run_command_isolated(
    command=["ls", "-la"],
    working_dir="/path/to/repo",
    repo_path="/.repos/repo-name"
)
```
Clarify repo_path example to better match typical .repos layout
The example uses repo_path="/.repos/repo-name", which suggests a top-level filesystem directory and may not match where your repos are actually stored (often something like /path/to/project/.repos/repo-name or whatever RepoManager config uses).
Consider tweaking the example to make that clearer, e.g.:
- working_dir="/path/to/repo",
- repo_path="/.repos/repo-name"
+ working_dir="/path/to/repo",
+ repo_path="/path/to/project/.repos/repo-name"or add a short note that repo_path should be the real host path to the checked-out repo/worktree.
🤖 Prompt for AI Agents
In docs/gvisor_quickstart.md around lines 30 to 37, the repo_path example points
to a misleading top-level path ("/.repos/repo-name"); update the example to show
a realistic host filesystem path that includes the project root (e.g.
"/path/to/project/.repos/repo-name") or add a one-line note stating that
repo_path must be the actual host path to the checked-out repo/worktree
(matching RepoManager config).
```shell
# Create temporary directory
TMPDIR=$(mktemp -d)
cd "$TMPDIR"

echo "Downloading gVisor runsc..."
URL="https://storage.googleapis.com/gvisor/releases/release/latest/${ARCH}"
```
🧩 Analysis chain
Avoid hard failure on best-effort Docker VM copy and be careful overwriting daemon.json
Two things to tighten up under set -e:
- The best-effort container copy can currently abort the script:
  - `docker run ... gvisor-installer ... || true` (Line 82) is guarded, but `docker cp runsc gvisor-installer:/usr/local/bin/runsc` (Line 85) is not.
  - If the container fails to start (image pull error, no network, etc.), `docker cp` will fail and terminate the script before the local `~/.local/bin/runsc` install runs.
  - Add `|| true` (or a container-exists check) to the `docker cp` call to keep this path best-effort.
- Overwriting Docker's daemon configuration is potentially disruptive:
  - The `cat > "$DAEMON_JSON"` block (Lines 128-137) replaces any existing settings with a runtimes-only JSON, which can silently drop other configuration (e.g., registries, proxies).
  - You do create a backup, but it's easy for users to miss this side effect. Consider either:
    - Merging the `runsc` runtime into the existing JSON (e.g., via `jq`), or
    - At minimum, documenting clearly that existing Docker Engine settings will be replaced and how to restore from the backup.
- The
Also consider using cd "$TMPDIR" || exit 1 and cd - > /dev/null || exit 1 to satisfy Shellcheck and handle unexpected cd failures.
- cd "$TMPDIR"
+ cd "$TMPDIR" || exit 1
@@
- docker cp runsc gvisor-installer:/usr/local/bin/runsc
+ docker cp runsc gvisor-installer:/usr/local/bin/runsc || true
@@
-cd - > /dev/null
+cd - > /dev/null || exit 1

Also applies to: 81-99, 118-137, 160-162
🏁 Script executed:
#!/bin/bash
# Retrieve the setup_gvisor_docker.sh script and check for set -e, docker commands, and daemon.json operations
cat scripts/setup_gvisor_docker.sh | head -100

Length of output: 2945
🏁 Script executed:
#!/bin/bash
# Check lines around 82-85 for docker run and docker cp
sed -n '75,95p' scripts/setup_gvisor_docker.sh

Length of output: 745
🏁 Script executed:
#!/bin/bash
# Check lines around 128-137 for daemon.json operations
sed -n '120,145p' scripts/setup_gvisor_docker.sh

Length of output: 642
🏁 Script executed:
#!/bin/bash
# Check lines around 160-162 for cd command
sed -n '155,165p' scripts/setup_gvisor_docker.sh

Length of output: 249
🏁 Script executed:
#!/bin/bash
# Search for 'set -e' or similar options at the start
head -20 scripts/setup_gvisor_docker.sh

Length of output: 446
Verify the suggested fixes to unguarded docker cp call, daemon.json overwrite behavior, and cd error handling
The review comment identifies three legitimate issues in the script, all of which are confirmed:
-
Line 84: Unguarded
docker cp— Thedocker cp runsc gvisor-installer:/usr/local/bin/runsccall is not guarded with|| true, despite being intended as best-effort. Underset -e, if the container fails to start (line 82 guarded with|| true), the subsequentdocker cpwill still execute, fail, and abort the script before the fallback local install path runs. -
Lines 128–137: Destructive daemon.json overwrite — The
cat > "$DAEMON_JSON"unconditionally replaces the entire file with a minimal runtimes-only config. While a timestamped backup is created, existing Docker settings (registries, proxies, log drivers, etc.) are silently lost. -
Line 160: Unguarded
cd -— Thecd - > /dev/nulllacks error handling and will not catch unexpected cd failures underset -e.
The suggested fixes are appropriate:
- Add
|| trueto thedocker cpline (line 84) - Consider merging the
runscruntime into existing daemon.json (e.g., withjq) instead of overwriting, or at minimum document this clearly - Add
|| exit 1to bothcd "$TMPDIR"(line 34) andcd -(line 160)
🤖 Prompt for AI Agents
In scripts/setup_gvisor_docker.sh around lines 32 to 37 (and referencing line
84, lines 128–137, and line 160), the script currently: performs an unguarded
docker cp that can fail under set -e; unconditionally overwrites
/etc/docker/daemon.json losing existing config; and uses cd without checking
failure. Fix by adding a permissive failure operator to the docker cp call
(append "|| true" so the script continues if the copy fails); replace the
destructive cat > "$DAEMON_JSON" approach with a merge strategy (use jq to read
existing daemon.json, inject or update the runsc runtime entry, and write back
safely, keeping other settings and making a timestamped backup), or if jq is
unavailable clearly document the destructive behavior and require explicit
confirmation; and add error handling to directory changes (append "|| exit 1" to
cd "$TMPDIR" and to cd - so the script fails with a clear error if directory
changes fail).
|
Actionable comments posted: 7
♻️ Duplicate comments (5)
app/modules/code_provider/github/github_service.py (1)
243-252: Critical: This uid update logic was already flagged in a previous review.

The concerns raised in the earlier review remain unaddressed:
- Data integrity risk: Modifying the primary key can break foreign key relationships to projects, conversations, prompts, preferences, and custom_agents.
- Security concern: No authorization check before updating uid based on email match.
- Race condition: Concurrent requests could cause conflicts.
- Transaction isolation: Immediate commit outside broader transaction context.
Please refer to the previous review comment for detailed analysis and recommended alternatives (either remove this logic entirely or implement as a separate admin operation with proper safeguards).
app/modules/parsing/graph_construction/parsing_helper.py (1)
929-987: Avoid git.checkout on caller-owned local repositories

In the local-repo path you create a fresh Repo and then run:

```python
elif branch:
    try:
        local_repo.git.checkout(branch)
        latest_commit_sha = local_repo.head.commit.hexsha
    except GitCommandError:
        latest_commit_sha = local_repo.head.commit.hexsha
```

This mutates the caller's working tree (current branch and files) just to discover a commit SHA, which is risky for developer-owned clones and repeats a previously flagged pattern. A safer approach is to resolve the ref without changing HEAD, e.g.:

```diff
-        try:
-            local_repo = Repo(repo_path_str)
-            if commit_id:
-                latest_commit_sha = commit_id
-            elif branch:
-                try:
-                    local_repo.git.checkout(branch)
-                    latest_commit_sha = local_repo.head.commit.hexsha
-                except GitCommandError:
-                    # Branch might not exist, use current HEAD
-                    latest_commit_sha = local_repo.head.commit.hexsha
-            else:
-                latest_commit_sha = local_repo.head.commit.hexsha
-        except Exception as e:
-            logger.warning(f"Could not get commit SHA for local repo: {e}")
+        try:
+            local_repo = Repo(repo_path_str)
+            if commit_id:
+                latest_commit_sha = commit_id
+            elif branch:
+                try:
+                    latest_commit_sha = local_repo.rev_parse(branch).hexsha
+                except GitCommandError:
+                    # Branch might not exist, use current HEAD
+                    latest_commit_sha = local_repo.head.commit.hexsha
+            else:
+                latest_commit_sha = local_repo.head.commit.hexsha
+        except GitCommandError as e:
+            logger.warning(
+                f"Could not resolve commit for local repo {repo_path_str}: {e}"
+            )
```

This keeps the caller's checkout intact and narrows the exception type instead of catching bare Exception (BLE001). The same concern was raised in an earlier review for similar code in this file.

app/modules/parsing/graph_construction/parsing_controller.py (1)
222-260: READY projects will be re-parsed on every call when RepoManager is disabled

Here:

```python
repo_exists_and_indexed = await ParsingController._check_repo_indexed(...)
if not repo_exists_and_indexed:
    # re-submit parsing to create LSP index
```

_check_repo_indexed explicitly returns False when REPO_MANAGER_ENABLED is not "true", so for READY projects with up-to-date commits you'll still re-submit parsing on every parse_directory call whenever RepoManager/LSP is off. This was already flagged in an earlier review.

Two concrete fixes:

- Gate the index check on the env flag and keep READY projects stable when RepoManager is disabled:

```diff
-        # Check if repo exists in .repos and is LSP indexed
-        repo_exists_and_indexed = await ParsingController._check_repo_indexed(
-            project_id,
-            repo_details.repo_name,
-            repo_details.branch_name,
-            repo_details.commit_id,
-            db,
-        )
-
-        if not repo_exists_and_indexed:
+        repo_manager_enabled = (
+            os.getenv("REPO_MANAGER_ENABLED", "false").lower() == "true"
+        )
+
+        if repo_manager_enabled:
+            repo_exists_and_indexed = (
+                await ParsingController._check_repo_indexed(
+                    project_id,
+                    normalized_repo_name,
+                    repo_details.branch_name,
+                    repo_details.commit_id,
+                    db,
+                )
+            )
+
+            if not repo_exists_and_indexed:
                 ...
                 return {
                     "project_id": project_id,
                     "status": ProjectStatusEnum.SUBMITTED.value,
                 }
-
-        return {"project_id": project_id, "status": project.status}
+        return {"project_id": project_id, "status": project.status}
```

- Use the canonical/normalized repo name (normalized_repo_name) when talking to RepoManager so you're consistent with how repos are registered in .repos.

Otherwise, READY projects will churn parsing jobs forever in non-RepoManager deployments.
Also applies to: 279-353
app/modules/intelligence/tools/code_query_tools/lsp_query_tool.py (1)
59-105: Past validation issues remain unaddressed.

The previous review correctly identified that the validator is incomplete:

- It doesn't require a document (path or uri) for DOCUMENT_SYMBOL and other document-based methods
- It doesn't enforce non-negative constraints on line and character at the schema level
- Missing document checks will cause runtime errors in LspServerManager._build_params rather than failing fast at validation

The prior review provided a complete fix. Please apply it to ensure all method-specific requirements are enforced at the schema layer.
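As a rough illustration of the kind of schema-level enforcement being asked for, here is a sketch assuming Pydantic v2 and plain string method names; the PR's actual input model uses an LspMethod enum and different field definitions:

```python
from typing import Optional

from pydantic import BaseModel, Field, model_validator


class LspQueryInputSketch(BaseModel):
    method: str
    path: Optional[str] = None
    uri: Optional[str] = None
    line: Optional[int] = Field(default=None, ge=0)       # non-negative at the schema level
    character: Optional[int] = Field(default=None, ge=0)

    @model_validator(mode="after")
    def check_method_requirements(self) -> "LspQueryInputSketch":
        needs_document = {"definition", "references", "hover", "document_symbol"}
        needs_position = {"definition", "references", "hover"}
        if self.method in needs_document and not (self.path or self.uri):
            raise ValueError(f"{self.method} requires 'path' or 'uri'")
        if self.method in needs_position and (self.line is None or self.character is None):
            raise ValueError(f"{self.method} requires 'line' and 'character'")
        return self
```

This fails fast at validation instead of surfacing the missing document later in _build_params.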
app/modules/intelligence/tools/code_query_tools/lsp_server_manager.py (1)
311-331: CRITICAL: Arbitrary file read vulnerability remains unaddressed.

The previous review correctly identified that this method will read any absolute file:// URI without verifying it's within workspace_root. Combined with LspQueryToolInput.uri (which overrides path), an attacker could access arbitrary host files (e.g., file:///etc/passwd).

The prior review provided a complete fix using Path.resolve() and relative_to() to enforce workspace boundaries. This MUST be applied before merging to prevent information disclosure.
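For reference, a minimal sketch of the boundary check that fix describes, built on Path.resolve() and relative_to(); the function name and signature here are illustrative, not the PR's code:

```python
from pathlib import Path
from urllib.parse import unquote, urlparse


def resolve_uri_within_workspace(uri: str, workspace_root: str) -> Path:
    parsed = urlparse(uri)
    if parsed.scheme != "file":
        raise ValueError(f"Unsupported URI scheme: {parsed.scheme!r}")
    candidate = Path(unquote(parsed.path)).resolve()
    root = Path(workspace_root).resolve()
    # relative_to raises ValueError when candidate escapes the workspace root
    candidate.relative_to(root)
    return candidate
```

A request for file:///etc/passwd then fails the relative_to check instead of being read and returned.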
🧹 Nitpick comments (15)
docs/gvisor_mac_setup.md (1)
113-127: Verify section clarity: "Recommendation" text could be slightly redundant.

The "Recommendation" section (lines 113-127) presents similar content across lines 115-120 and 121-126. While both perspectives are helpful, consider whether the duplication is intentional. If consolidation is desired, the dual perspective could be unified into a single decision tree.
scripts/install_lsp_servers.sh (3)
13-14: Remove unused PYTHON_BIN variable.

Line 13 defines PYTHON_BIN but it's never used; only PIP_BIN is referenced. Remove the unused variable or document its purpose if it's intended for future extensibility.

```diff
-PYTHON_BIN="$VENV_PATH/bin/python"
 PIP_BIN="$VENV_PATH/bin/pip"
```
140-140: Redirect error message to stderr.

Error output on line 140 should be sent to stderr for proper error stream handling:

```diff
- echo " Timeout error - connection took too long."
+ echo " Timeout error - connection took too long." >&2
```

Apply the same redirection (>&2) to related error messages at lines 139, 142, and other diagnostic outputs that report failures or issues.
151-151: Remove unused DOWNLOAD_SUCCESS variable.

Line 151 sets DOWNLOAD_SUCCESS=true, but the variable is never checked. The retry loop relies on break to exit after a successful download. Either remove the variable or use it to guard the logic:

```diff
- DOWNLOAD_SUCCESS=true
- break
+ break
```

requirements.txt (1)
72-74: Consider updating to current stable versions; all are free from CVEs.

Verification confirms no public CVE entries for any dependency. However, all three versions are now outdated:

- pygls: 1.3.1 (in PR) vs. 2.0.0 (latest, Oct 17, 2025), a major version bump
- jedi: 0.19.1 (in PR) vs. 0.19.2 (latest, Nov 11, 2024)
- compiledb: 0.10.1 (in PR) vs. 0.10.7 (latest, Mar 23, 2025)

While these pinned versions are secure and will function correctly, consider updating to the latest stable versions (especially pygls 2.0.0, which represents significant changes) for improved compatibility and long-term maintainability.
docs/clangd_integration_plan.md (1)
185-191: Add a language tag to the fenced file-structure block

Markdownlint flags the code fence without a language. Consider using something like:

````diff
-```
+```text
 app/modules/intelligence/tools/code_query_tools/
 ├── clangd_session.py  # ClangdSession class (existing)
 ...
````

to satisfy MD040 and make the intent explicit.
app/modules/parsing/graph_construction/parsing_helper.py (2)
331-373: Retry logic is solid; tighten exception logging for better diagnostics

The connection-timeout retry loops for both GitBucket basic auth and standard token-based downloads look good (bounded retries, exponential backoff, async sleeps). When the final attempts fail (Lines 365-367, 415-417, 453-456), consider using logger.exception(...) so stack traces are preserved, and consistently chaining with raise ... from e when wrapping into ParsingFailedError to make root causes easier to debug.

This is non-blocking but will make investigating archive-download failures much easier.
Also applies to: 383-423, 447-456
782-785: Language-detection changes look good; verify they align with LSP language keys

The updated C/C++ classification (.c/.h → "c", common C++ extensions → "cpp") and the valid_languages filter that excludes non-code categories ("markdown", "xml", "other") make sense.

Please double-check that the returned strings (e.g., "c_sharp", "cpp", "python") match whatever language keys are used when registering LSP servers in lsp_server_manager so is_workspace_indexed sees the expected languages. A mismatch here would silently disable indexing checks for some languages.

Also applies to: 830-863
42-48: Autotools detection currently just falls back tocompile_flags.txt
detect_build_systemcan return"autotools", butensure_compile_commandsonly has explicit branches for"cmake"and"make". Right now autotools projects will skip directly to thecompile_flags.txtfallback.If you intend autotools-specific handling (e.g.,
./configure+makewith Bear/compiledb), you might either add a dedicated branch or clarify in the docstring that autotools is detected but not yet used for compile_commands generation.Also applies to: 351-391
app/modules/intelligence/tools/tool_service.py (1)
47-49: Be explicit about always injectinglsp_queryinto tool lists
get_toolsnow appends"lsp_query"whenever it is configured, even if the caller did not request it intool_names:lsp_tool = self.tools.get("lsp_query") if lsp_tool is not None and all(...): tools.append(lsp_tool)This is convenient for making the LSP base tool universally available, but it is also a behavioral change: callers can no longer rely on
get_toolsreturning only the requested tools.If that’s intentional, consider documenting it at the method docstring and/or constraining it (e.g., only when a code/introspection tool is requested). If it’s not intentional, you could instead only append
lsp_querywhen"lsp_query"is explicitly requested intool_names.Also applies to: 129-142, 206-214
app/modules/intelligence/tools/code_query_tools/clangd_session.py (1)
38-54: Store cache_dir or mark it intentionally unused

The constructor accepts cache_dir: Path but never uses it, which triggers ARG002 in Ruff and may confuse future readers:

```python
def __init__(..., cache_dir: Path, config: LanguageServerConfig) -> None:
    super().__init__(session_key, workspace_root)
    # cache_dir is ignored
    self._pygls_session = PyglsClientSession(...)
```

If you expect to route clangd's index dir through this later, you can simply retain it:

```diff
 def __init__(..., cache_dir: Path, config: LanguageServerConfig) -> None:
     super().__init__(session_key, workspace_root)
-    # Note: cache_dir is kept for compatibility but not used for custom caching
-    # Clangd uses its native cache in .cache/clangd/index/ via symlink
+    # Clangd uses its native cache in .cache/clangd/index/ via symlink,
+    # but keep cache_dir for future configuration or debugging.
+    self.cache_dir = cache_dir
```

Otherwise, consider renaming to _cache_dir or removing it from the signature.

app/modules/intelligence/tools/code_query_tools/jedi_session.py (1)
247-260: Tighten silent exception handling and use get_running_loop for async indexing

Two small robustness tweaks:

- In get_workspace_symbols, errors while loading the workspace cache are silently swallowed:

```python
if cache_file.exists():
    try:
        with open(cache_file) as f:
            cached_symbols = json.load(f)
    except Exception:
        pass
```

  Silencing all exceptions makes it hard to notice corrupted caches. Consider at least logging at debug level:

```diff
-    except Exception:
-        pass
+    except Exception as exc:
+        logger.debug("Failed to load workspace symbols cache: %s", exc)
```

- In index_workspace, you call asyncio.get_event_loop() inside an async method:

```python
loop = asyncio.get_event_loop()
await loop.run_in_executor(None, self._jedi_cache.index_workspace, python_files)
```

  On modern Python this is better written as:

```diff
-    loop = asyncio.get_event_loop()
+    loop = asyncio.get_running_loop()
```

  to avoid surprises with loop policy changes.
Also applies to: 382-388
app/modules/intelligence/tools/code_query_tools/lsp_query_tool.py (1)
326-326: Consider unpacking over concatenation.

Apply this diff for a more idiomatic approach:

```diff
-final_command = [resolved_command[0]] + final_flags
+final_command = [resolved_command[0], *final_flags]
```
982-982: Remove unused method parameter.The
configparameter in_warmup_sessionis never used within the method. Consider removing it or documenting why it's kept for future use.async def _warmup_session( self, request: LspQueryRequest, workspace_root: str, session: BaseLspServerSession, - config: LanguageServerConfig, status_messages: List[str], ) -> None:
2009-2009: Remove redundant exception object inlogging.exceptioncalls.
logging.exception()automatically includes exception info, so passingexcis redundant:-logger.exception( - "[LSP_MANAGER] Failed to execute %s for %s: %s", - request.method.value, - request.language, - exc, -) +logger.exception( + "[LSP_MANAGER] Failed to execute %s for %s", + request.method.value, + request.language, +)Apply the same fix at line 2111.
Also applies to: 2111-2111
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (19)
app/modules/code_provider/github/github_service.py(5 hunks)app/modules/intelligence/agents/chat_agents/tool_helpers.py(5 hunks)app/modules/intelligence/tools/code_query_tools/clangd_helpers.py(1 hunks)app/modules/intelligence/tools/code_query_tools/clangd_session.py(1 hunks)app/modules/intelligence/tools/code_query_tools/jedi_session.py(1 hunks)app/modules/intelligence/tools/code_query_tools/lsp_query_tool.py(1 hunks)app/modules/intelligence/tools/code_query_tools/lsp_server_manager.py(1 hunks)app/modules/intelligence/tools/code_query_tools/lsp_session.py(1 hunks)app/modules/intelligence/tools/jira_tools/get_jira_project_users_tool.py(2 hunks)app/modules/intelligence/tools/tool_service.py(4 hunks)app/modules/parsing/graph_construction/parsing_controller.py(3 hunks)app/modules/parsing/graph_construction/parsing_helper.py(6 hunks)app/modules/projects/projects_service.py(6 hunks)app/modules/utils/gvisor_runner.py(1 hunks)docs/clangd_integration_plan.md(1 hunks)docs/gvisor_mac_setup.md(4 hunks)docs/gvisor_setup.md(4 hunks)requirements.txt(1 hunks)scripts/install_lsp_servers.sh(1 hunks)
🚧 Files skipped from review as they are similar to previous changes (2)
- docs/gvisor_setup.md
- app/modules/projects/projects_service.py
🧰 Additional context used
🧬 Code graph analysis (6)
app/modules/intelligence/tools/tool_service.py (2)
app/modules/intelligence/tools/code_query_tools/lsp_query_tool.py (1)
lsp_query_tool(467-491)app/modules/intelligence/tools/web_tools/code_provider_tool.py (1)
code_provider_tool(227-256)
app/modules/parsing/graph_construction/parsing_controller.py (4)
app/modules/projects/projects_schema.py (1)
ProjectStatusEnum(6-12)app/modules/repo_manager/repo_manager.py (1)
RepoManager(21-489)app/modules/intelligence/tools/code_query_tools/lsp_server_manager.py (1)
is_workspace_indexed(336-390)app/modules/parsing/graph_construction/parsing_helper.py (1)
ParseHelper(34-1495)
app/modules/intelligence/tools/code_query_tools/lsp_query_tool.py (6)
app/modules/intelligence/tools/code_query_tools/lsp_types.py (1)
LspQueryRequest(146-162)app/modules/intelligence/tools/code_query_tools/pygls_client_session.py (2)
LanguageServerConfig(30-41)from_command_string(40-41)app/modules/projects/projects_service.py (1)
get_project_from_db_by_id_sync(282-295)app/modules/repo_manager/repo_manager.py (1)
RepoManager(21-489)app/modules/intelligence/tools/code_query_tools/lsp_server_manager.py (4)
get_lsp_server_manager(50-56)register_language(93-99)is_language_registered(101-102)execute_query(1907-2022)tests/conftest.py (1)
get(142-150)
app/modules/intelligence/tools/code_query_tools/lsp_server_manager.py (6)
app/modules/intelligence/tools/code_query_tools/lsp_session.py (5)
BaseLspServerSession(22-44)SessionKey(15-19)ensure_started(34-35)send_notification(40-41)send_request(37-38)app/modules/intelligence/tools/code_query_tools/lsp_types.py (11)
HoverResult(69-112)LspMethod(18-25)LspQueryRequest(146-162)LspQueryResponse(165-188)Location(45-57)Position(28-36)SymbolInformation(115-143)from_lsp(30-31)from_lsp(53-57)from_lsp(83-112)from_lsp(117-134)app/modules/intelligence/tools/code_query_tools/pygls_client_session.py (6)
LanguageServerConfig(30-41)PyglsClientSession(44-370)ensure_started(62-228)send_notification(349-354)send_request(342-347)wait_for_indexing_complete(230-340)app/modules/intelligence/tools/code_query_tools/jedi_session.py (6)
JediSession(319-388)ensure_started(334-336)send_notification(372-376)index_workspace(274-316)index_workspace(382-388)send_request(338-370)app/modules/intelligence/tools/code_query_tools/clangd_session.py (4)
ClangdSession(29-79)ensure_started(55-58)send_notification(70-74)send_request(60-68)app/modules/intelligence/tools/code_query_tools/clangd_helpers.py (1)
ensure_compile_commands(320-403)
app/modules/parsing/graph_construction/parsing_helper.py (2)
app/modules/projects/projects_service.py (1)
update_project(369-381)app/modules/projects/projects_schema.py (1)
ProjectStatusEnum(6-12)
app/modules/code_provider/github/github_service.py (1)
app/modules/users/user_model.py (1)
User(17-34)
🪛 GitHub Check: SonarCloud Code Analysis
scripts/install_lsp_servers.sh
[failure] 102-102: Use '[[' instead of '[' for conditional tests. The '[[' construct is safer and more feature-rich.
[failure] 212-212: Use '[[' instead of '[' for conditional tests. The '[[' construct is safer and more feature-rich.
[failure] 112-112: Use '[[' instead of '[' for conditional tests. The '[[' construct is safer and more feature-rich.
[failure] 190-190: Use '[[' instead of '[' for conditional tests. The '[[' construct is safer and more feature-rich.
[warning] 140-140: Redirect this error message to stderr (>&2).
[failure] 246-246: Use '[[' instead of '[' for conditional tests. The '[[' construct is safer and more feature-rich.
[failure] 250-250: Use '[[' instead of '[' for conditional tests. The '[[' construct is safer and more feature-rich.
[failure] 223-223: Use '[[' instead of '[' for conditional tests. The '[[' construct is safer and more feature-rich.
[failure] 247-247: Use '[[' instead of '[' for conditional tests. The '[[' construct is safer and more feature-rich.
[failure] 111-111: Use '[[' instead of '[' for conditional tests. The '[[' construct is safer and more feature-rich.
[failure] 185-185: Use '[[' instead of '[' for conditional tests. The '[[' construct is safer and more feature-rich.
app/modules/intelligence/tools/code_query_tools/lsp_server_manager.py
[failure] 640-640: Don't use datetime.datetime.utcnow to create this datetime object.
[failure] 977-977: Refactor this function to reduce its Cognitive Complexity from 285 to the 15 allowed.
🪛 markdownlint-cli2 (0.18.1)
docs/clangd_integration_plan.md
185-185: Fenced code blocks should have a language specified
(MD040, fenced-code-language)
🪛 OSV Scanner (2.2.4)
requirements.txt
[HIGH] 78-78: python-multipart 0.0.7: Denial of service (DoS) via deformation multipart/form-data boundary
🪛 Ruff (0.14.5)
app/modules/parsing/graph_construction/parsing_controller.py
345-345: Consider moving this statement to an else block
(TRY300)
347-347: Do not catch blind exception: Exception
(BLE001)
app/modules/intelligence/tools/code_query_tools/clangd_helpers.py
101-101: Do not catch blind exception: Exception
(BLE001)
110-110: subprocess call: check for execution of untrusted input
(S603)
139-139: Do not catch blind exception: Exception
(BLE001)
146-146: Do not catch blind exception: Exception
(BLE001)
183-183: subprocess call: check for execution of untrusted input
(S603)
192-192: subprocess call: check for execution of untrusted input
(S603)
213-213: Do not catch blind exception: Exception
(BLE001)
245-245: subprocess call: check for execution of untrusted input
(S603)
266-266: Do not catch blind exception: Exception
(BLE001)
303-307: Consider iterable unpacking instead of concatenation
(RUF005)
314-314: Consider moving this statement to an else block
(TRY300)
315-315: Do not catch blind exception: Exception
(BLE001)
340-340: Local variable workspace_path is assigned to but never used
Remove assignment to unused variable workspace_path
(F841)
app/modules/intelligence/tools/code_query_tools/jedi_session.py
46-46: Consider moving this statement to an else block
(TRY300)
47-47: Do not catch blind exception: Exception
(BLE001)
67-67: Do not catch blind exception: Exception
(BLE001)
95-95: Do not catch blind exception: Exception
(BLE001)
98-98: Consider moving this statement to an else block
(TRY300)
99-99: Do not catch blind exception: Exception
(BLE001)
115-115: Do not catch blind exception: Exception
(BLE001)
140-140: Do not catch blind exception: Exception
(BLE001)
143-143: Consider moving this statement to an else block
(TRY300)
144-144: Do not catch blind exception: Exception
(BLE001)
160-160: Do not catch blind exception: Exception
(BLE001)
194-194: Do not catch blind exception: Exception
(BLE001)
197-197: Consider moving this statement to an else block
(TRY300)
198-198: Do not catch blind exception: Exception
(BLE001)
214-214: Do not catch blind exception: Exception
(BLE001)
239-239: Do not catch blind exception: Exception
(BLE001)
242-242: Consider moving this statement to an else block
(TRY300)
243-243: Do not catch blind exception: Exception
(BLE001)
259-260: try-except-pass detected, consider logging the exception
(S110)
259-259: Do not catch blind exception: Exception
(BLE001)
307-307: Do not catch blind exception: Exception
(BLE001)
315-315: Do not catch blind exception: Exception
(BLE001)
370-370: Avoid specifying long messages outside the exception class
(TRY003)
app/modules/intelligence/tools/code_query_tools/lsp_query_tool.py
101-103: Avoid specifying long messages outside the exception class
(TRY003)
142-142: Do not catch blind exception: Exception
(BLE001)
151-151: Avoid specifying long messages outside the exception class
(TRY003)
153-155: Avoid specifying long messages outside the exception class
(TRY003)
173-175: Avoid specifying long messages outside the exception class
(TRY003)
206-206: Do not catch blind exception: Exception
(BLE001)
252-252: Loop control variable dirs not used within loop body
Rename unused dirs to _dirs
(B007)
326-326: Consider [resolved_command[0], *final_flags] instead of concatenation
Replace with [resolved_command[0], *final_flags]
(RUF005)
387-387: Do not catch blind exception: Exception
(BLE001)
app/modules/intelligence/tools/code_query_tools/lsp_server_manager.py
65-65: Loop control variable session not used within loop body
Rename unused session to _session
(B007)
69-70: try-except-pass detected, consider logging the exception
(S110)
69-69: Do not catch blind exception: Exception
(BLE001)
111-113: Avoid specifying long messages outside the exception class
(TRY003)
138-140: Avoid specifying long messages outside the exception class
(TRY003)
151-153: Avoid specifying long messages outside the exception class
(TRY003)
159-159: Avoid specifying long messages outside the exception class
(TRY003)
328-328: Consider moving this statement to an else block
(TRY300)
386-386: Do not catch blind exception: Exception
(BLE001)
397-397: Unused method argument: opened_files
(ARG002)
451-451: Do not catch blind exception: Exception
(BLE001)
465-465: Do not catch blind exception: Exception
(BLE001)
505-505: Local variable total_time is assigned to but never used
Remove assignment to unused variable total_time
(F841)
552-553: try-except-pass detected, consider logging the exception
(S110)
552-552: Do not catch blind exception: Exception
(BLE001)
617-617: Do not catch blind exception: Exception
(BLE001)
646-647: try-except-pass detected, consider logging the exception
(S110)
646-646: Do not catch blind exception: Exception
(BLE001)
649-649: Consider moving this statement to an else block
(TRY300)
654-654: Do not catch blind exception: Exception
(BLE001)
726-726: Do not catch blind exception: Exception
(BLE001)
730-731: try-except-pass detected, consider logging the exception
(S110)
730-730: Do not catch blind exception: Exception
(BLE001)
744-744: Do not catch blind exception: Exception
(BLE001)
773-773: Do not catch blind exception: Exception
(BLE001)
777-778: try-except-pass detected, consider logging the exception
(S110)
777-777: Do not catch blind exception: Exception
(BLE001)
797-797: Do not catch blind exception: Exception
(BLE001)
830-830: Do not catch blind exception: Exception
(BLE001)
982-982: Unused method argument: config
(ARG002)
1116-1116: Do not catch blind exception: Exception
(BLE001)
1163-1163: Do not catch blind exception: Exception
(BLE001)
1204-1204: Loop control variable file_path not used within loop body
(B007)
1218-1219: try-except-pass detected, consider logging the exception
(S110)
1218-1218: Do not catch blind exception: Exception
(BLE001)
1253-1253: Do not catch blind exception: Exception
(BLE001)
1282-1283: try-except-pass detected, consider logging the exception
(S110)
1282-1282: Do not catch blind exception: Exception
(BLE001)
1354-1354: Do not catch blind exception: Exception
(BLE001)
1372-1372: Loop control variable file_path not used within loop body
(B007)
1383-1384: try-except-pass detected, consider logging the exception
(S110)
1383-1383: Do not catch blind exception: Exception
(BLE001)
1449-1449: Do not catch blind exception: Exception
(BLE001)
1518-1519: try-except-pass detected, consider logging the exception
(S110)
1518-1518: Do not catch blind exception: Exception
(BLE001)
1672-1672: Do not catch blind exception: Exception
(BLE001)
1777-1778: try-except-pass detected, consider logging the exception
(S110)
1777-1777: Do not catch blind exception: Exception
(BLE001)
1849-1850: try-except-pass detected, consider logging the exception
(S110)
1849-1849: Do not catch blind exception: Exception
(BLE001)
1904-1904: Do not catch blind exception: Exception
(BLE001)
2003-2003: Consider moving this statement to an else block
(TRY300)
2009-2009: Redundant exception object included in logging.exception call
(TRY401)
2111-2111: Redundant exception object included in logging.exception call
(TRY401)
app/modules/parsing/graph_construction/parsing_helper.py
365-367: Use logging.exception instead of logging.error
Replace with exception
(TRY400)
371-373: Within an except clause, raise exceptions with raise ... from err or raise ... from None to distinguish them from errors in exception handling
(B904)
371-373: Avoid specifying long messages outside the exception class
(TRY003)
415-417: Use logging.exception instead of logging.error
Replace with exception
(TRY400)
421-423: Abstract raise to an inner function
(TRY301)
421-423: Avoid specifying long messages outside the exception class
(TRY003)
453-456: Use logging.exception instead of logging.error
Replace with exception
(TRY400)
952-952: Do not catch blind exception: Exception
(BLE001)
968-968: Do not catch blind exception: Exception
(BLE001)
982-982: Do not catch blind exception: Exception
(BLE001)
app/modules/intelligence/tools/code_query_tools/clangd_session.py
42-42: Unused method argument: cache_dir
(ARG002)
🪛 Shellcheck (0.11.0)
scripts/install_lsp_servers.sh
[warning] 13-13: PYTHON_BIN appears unused. Verify use (or export if used externally).
(SC2034)
[warning] 151-151: DOWNLOAD_SUCCESS appears unused. Verify use (or export if used externally).
(SC2034)
🔇 Additional comments (32)
docs/gvisor_mac_setup.md (2)
106-111: Table reformatting improves readability and alignment.The updated table structure with consistent column headers and alignment makes the option comparison clearer at a glance. The content and recommendations remain intact.
1-156: Documentation content is accurate and well-structured.The gVisor setup guide accurately reflects the constraints (gVisor is Linux-only, fallback to subprocess on Mac), provides clear options with appropriate complexity levels, and includes actionable testing examples. The formatting updates enhance readability without compromising clarity or completeness.
app/modules/code_provider/github/github_service.py (3)
234-234: LGTM! Backward-compatible signature change.The addition of the optional
user_emailparameter maintains backward compatibility while enabling email-based user resolution as a fallback.
500-501: LGTM! Proper exception handling.Explicitly re-raising
HTTPExceptionpreserves the original status code and error detail, preventing them from being wrapped in a generic 500 error. This improves error clarity for API consumers.
513-515: LGTM! Consistent parameter propagation.The signature change and parameter forwarding are consistent with the
get_repos_for_userupdate. The changes maintain backward compatibility while enabling email-based user resolution throughout the call chain.Also applies to: 550-552
app/modules/intelligence/tools/jira_tools/get_jira_project_users_tool.py (1)
33-33: No issues found — the type change is properly coordinated.The
user_idparameter change frominttostrhas been correctly implemented:
- Downstream functions accept
str: Bothcheck_jira_integration_existsandget_jira_client_for_userin jira_client.py have signatures(user_id: str, db: Session).- Caller updated: ToolService passes
self.user_idwhich is typed asstr(line 103, tool_service.py).- Consistent pattern: All tool factories throughout the codebase uniformly use
user_id: str, with no type mismatches.requirements.txt (1)
80-83: Documentation comment provides helpful context for LSP server setup.The added note clearly documents that LSP servers require separate installation and directs users to the installation script and respective package managers. This is helpful for contributors unfamiliar with LSP infrastructure.
However, please verify that
scripts/install_lsp_servers.shexists and contains instructions for all mentioned servers (pyright, OmniSharp, clangd, gopls, rust-analyzer).app/modules/intelligence/agents/chat_agents/tool_helpers.py (5)
3-9: LGTM!The LSP tool labels dictionary is well-structured with clear, user-friendly descriptions for each LSP operation.
125-126: LGTM!The guard pattern correctly handles LSP tools and provides appropriate starting messages.
241-242: LGTM!The completion message for LSP tools is clear and consistent with the starting message pattern.
630-666: LSP result structures verified as correct.The verification confirms that LSP tool implementations return results matching the expected structure. The
LspQueryResponseincludessuccess,method,status_messages,locations,hover,symbols, anderrorfields, and the code in lines 630-666 correctly handles all these fields with appropriate defensive checks and data extraction.
459-472: Approve — LSP argument structure verified. LspQueryToolInput (app/modules/intelligence/tools/code_query_tools/lsp_query_tool.py) defines project_id, language, uri/path, line and character, and LspQueryTool._execute_async/_run/_arun accept and validate those fields (position validation requires both line and character for DEFINITION/REFERENCES/HOVER), so the formatting in app/modules/intelligence/agents/chat_agents/tool_helpers.py (lines 459–472) correctly matches the actual LSP tool arguments.app/modules/parsing/graph_construction/parsing_helper.py (1)
1063-1152: RepoManager worktree integration looks correct and self-containedThe
_copy_repo_to_repo_manager,_initialize_base_repo, and_create_worktreeflow (initialize/copy base repo, then create per-ref worktrees and register them) is cohesive and keeps all RepoManager-specific behavior behind a single helper.The fallback path in
_create_worktreethat copies files and initializes a standalone repo whengit worktree addfails is a reasonable safety net for missing refs or synthetic snapshots.app/modules/intelligence/tools/code_query_tools/lsp_session.py (1)
14-44: Base LSP session abstraction looks clean and sufficient
SessionKeyas a frozen dataclass andBaseLspServerSessionwithensure_started,send_request,send_notification, andshutdownform a minimal, clear interface that other sessions (clangd, jedi, etc.) can implement consistently.No issues from this file.
app/modules/intelligence/tools/code_query_tools/lsp_query_tool.py (8)
124-147: LGTM: Defensive initialization pattern.The exception handling during RepoManager initialization is appropriate—it logs failures without blocking tool creation, allowing graceful degradation when repository management is unavailable.
148-156: LGTM: Proper authorization check.The method correctly validates project ownership before allowing LSP queries, preventing unauthorized access to repository data.
158-181: LGTM: Path traversal protection in place.The method properly validates that paths are repository-relative, rejecting absolute paths and parent directory references before constructing file URIs. This prevents accessing files outside the intended workspace.
183-212: LGTM: Flexible language server configuration.The environment variable override system provides good flexibility for customizing language server commands. The migration of
PYTHON_LSP_COMMANDto the new prefix maintains backward compatibility.
215-283: LGTM: Thorough executable resolution.The search logic comprehensively checks multiple locations (PATH, venv, .lsp_binaries) and handles platform-specific variations for OmniSharp (capital vs lowercase, .exe extension).
352-418: LGTM: Robust query execution flow.The method properly validates preconditions (repo manager, project details, worktree existence), constructs the LSP request with normalized parameters, and delegates to the server manager. Defensive exception handling for metadata updates is appropriate.
420-464: LGTM: Standard async/sync wrapper pattern.The implementation follows the established pattern from other tools (e.g.,
jira_tools/get_jira_project_users_tool.py), providing both synchronous and asynchronous execution paths.
467-491: LGTM: Guarded tool creation.The factory correctly prevents tool registration when repository management is unavailable, mirroring the
bash_command_toolpattern and ensuring the tool is only available when it can function properly.app/modules/intelligence/tools/code_query_tools/lsp_server_manager.py (10)
74-114: LGTM: Clean manager initialization and registration API.The initialization properly sets up session tracking structures, and the registration API provides clear methods for configuring language servers. Error handling in
_get_language_configcorrectly chains the KeyError for debugging.
127-159: LGTM: Method-specific parameter validation.The parameter building correctly enforces LSP protocol requirements for each method type, with clear error messages when required parameters are missing. This complements the input validation in
LspQueryToolInput.
161-309: LGTM: Flexible normalization for multiple LSP implementations.The normalization methods handle both standard LSP and Jedi-specific formats, enabling compatibility across different language server implementations. The approximation in line 187 (
character + 10) is clearly documented as an estimate when the exact end position isn't available.
336-390: LGTM: Robust indexing status check.The method uses a two-tier approach—checking both in-memory state and persistent marker files—to reliably determine if a workspace has been indexed. Defensive exception handling for file I/O is appropriate.
392-657: Approve: complex but necessary Pyright CLI integration. The method implements comprehensive Pyright CLI execution with progress tracking, cache management, and output parsing. While complex, the functionality is appropriate for providing reliable indexing completion signals.
658-917: LGTM: Comprehensive cache management. The cache configuration properly handles per-language requirements with careful symlink management, fallback handling for broken links, and integration with build system detection for C/C++. The defensive exception handling ensures graceful degradation when filesystem operations fail.
919-976: LGTM: Clean session factory pattern. The method appropriately creates language-specific session types (JediSession for Python, ClangdSession for C/C++, PyglsClientSession for others) with proper cache configuration and startup timing.
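A condensed sketch of that dispatch; the session classes below are placeholder stand-ins for the PR's real ones, and the constructor signature is assumed:

```python
class JediSession:  # stand-in for the PR's Jedi-backed Python session
    def __init__(self, workspace_root: str, cache_dir: str) -> None:
        self.workspace_root, self.cache_dir = workspace_root, cache_dir


class ClangdSession(JediSession):  # stand-in for the clangd-backed session
    pass


class PyglsClientSession(JediSession):  # stand-in for the generic pygls client
    pass


def create_session(language: str, workspace_root: str, cache_dir: str) -> JediSession:
    """Pick a language-specific session type, falling back to the generic client."""
    if language == "python":
        return JediSession(workspace_root, cache_dir)
    if language in ("c", "cpp"):
        return ClangdSession(workspace_root, cache_dir)
    # C# (OmniSharp) and remaining languages speak plain LSP over stdio.
    return PyglsClientSession(workspace_root, cache_dir)
```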
1907-2022: LGTM: Well-structured query execution. The method properly handles session lifecycle, document opening/closing, and result normalization. The try-finally ensures documents are closed even on errors.
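Stripped down, the open, query, close flow with a try/finally looks roughly like this, assuming the async send_request/send_notification interface described earlier:

```python
from pathlib import Path
from typing import Any


async def query_document_symbols(session: Any, file_path: Path, language_id: str) -> Any:
    """Open a document, run one LSP query against it, and always close it again."""
    uri = file_path.as_uri()
    await session.send_notification(
        "textDocument/didOpen",
        {
            "textDocument": {
                "uri": uri,
                "languageId": language_id,
                "version": 1,
                "text": file_path.read_text(encoding="utf-8"),
            }
        },
    )
    try:
        return await session.send_request(
            "textDocument/documentSymbol", {"textDocument": {"uri": uri}}
        )
    finally:
        # Close even when the request fails so the server does not accumulate open docs.
        await session.send_notification(
            "textDocument/didClose", {"textDocument": {"uri": uri}}
        )
```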
2024-2118: LGTM: Clean workspace indexing API. The method provides a clear public API for pre-indexing workspaces during parsing, reusing the existing session warmup logic and properly tracking results per language.
545-545: Incomplete refactor suggestion: timezone import required but not mentioned. The suggested replacement of `datetime.utcnow()` with `datetime.now(timezone.utc)` requires importing `timezone` from the datetime module. However, the file currently only imports the `datetime` class (at lines 409, 1198, and 1799), not `timezone`. The fix requires updating these import statements to include `timezone` before the replacement can be applied. Additionally, the project's Python version requirement could not be verified to confirm whether the deprecation applies to the target version.
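The change the review is pointing at amounts to the following before/after, shown here as a small standalone sketch:

```python
from datetime import datetime, timezone

# Current pattern: deprecated in newer Python releases and returns a naive datetime.
naive_timestamp = datetime.utcnow().isoformat()

# Suggested replacement: timezone-aware, but requires importing timezone as well.
aware_timestamp = datetime.now(timezone.utc).isoformat()
```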
```python
def create_compile_flags_txt(
    workspace_root: str, language: str = "c"
) -> Optional[Path]:
    """
    Create a basic compile_flags.txt file as a fallback.

    This is a simple alternative to compile_commands.json that works
    for projects with consistent compiler flags.

    Returns:
        Path to created compile_flags.txt, or None if failed
    """
    workspace_path = Path(workspace_root)
    compile_flags = workspace_path / "compile_flags.txt"

    # Don't overwrite existing file
    if compile_flags.exists():
        logger.info(f"[clangd] compile_flags.txt already exists at {compile_flags}")
        return compile_flags

    # Detect include directories
    include_dirs = []
    for include_dir in ["include", "inc", "src", "."]:
        include_path = workspace_path / include_dir
        if include_path.exists() and include_path.is_dir():
            include_dirs.append(f"-I{include_dir}")

    # Set C standard based on language
    std_flag = "-std=c11" if language == "c" else "-std=c++17"

    # Create basic compile_flags.txt
    flags = [
        f"-x{language}",  # Language type
        std_flag,
        "-Wall",
    ] + include_dirs
```
Fix -x language flag for C++ in compile_flags.txt
create_compile_flags_txt currently does:
```python
std_flag = "-std=c11" if language == "c" else "-std=c++17"
flags = [
    f"-x{language}",  # Language type
    std_flag,
    "-Wall",
] + include_dirs
```

If callers pass "cpp" (as suggested in the docstring), this produces -xcpp, which isn't a valid Clang/GCC language kind and will likely break clangd's parsing when falling back to compile_flags.txt.
Consider normalizing the language for the -x flag:
- # Set C standard based on language
- std_flag = "-std=c11" if language == "c" else "-std=c++17"
+ # Normalize language and set standard
+ norm_lang = "c" if language == "c" else "c++"
+ std_flag = "-std=c11" if norm_lang == "c" else "-std=c++17"
@@
- flags = [
- f"-x{language}", # Language type
+ flags = [
+ f"-x{norm_lang}", # Language type
std_flag,
"-Wall",
  ] + include_dirs

so clangd sees the correct language.
🧰 Tools
🪛 Ruff (0.14.5)
303-307: Consider iterable unpacking instead of concatenation
(RUF005)
🤖 Prompt for AI Agents
In app/modules/intelligence/tools/code_query_tools/clangd_helpers.py around
lines 272-307, the function sets the language flag using f"-x{language}" which
yields invalid flags like "-xcpp"; normalize incoming language values so the -x
flag uses recognized Clang/GCC kinds (use "-xc" for C and "-xc++" for C++), map
accepted inputs like "c", "cpp", "c++" accordingly, keep the std_flag logic (c
=> -std=c11, c++ => -std=c++17), and then build flags using the normalized -x
value plus the std and other flags before writing compile_flags.txt.
```python
def reset_lsp_server_manager() -> None:
    """Reset the global LSP server manager (useful for testing or cleanup)."""
    global _lsp_server_manager
    if _lsp_server_manager is not None:
        logger.info("LspServerManager: Resetting global manager instance")
        # Shutdown all sessions before resetting
        for session in _lsp_server_manager._sessions.values():
            try:
                # Note: This is a best-effort cleanup. Sessions may need async cleanup.
                pass
            except Exception:
                pass
        _lsp_server_manager = None
```
Incomplete session cleanup in reset function.
The loop over sessions doesn't actually perform shutdown; it just passes. While the comment acknowledges this is "best-effort" and sessions "may need async cleanup," this could leak resources when the manager is reset.
Consider adding async cleanup support or at least attempting synchronous shutdown. Would you like me to propose an implementation that properly cleans up sessions during reset?
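One possible shape for that cleanup, offered only as a sketch: it assumes each session exposes an async `shutdown()` and that the manager's `_sessions` dict is safe to iterate here; the timeout and event-loop handling are illustrative.

```python
import asyncio
import logging

logger = logging.getLogger(__name__)

_lsp_server_manager = None  # module-level global, assigned elsewhere in the real file


def reset_lsp_server_manager() -> None:
    """Reset the global manager, awaiting each session's shutdown best-effort."""
    global _lsp_server_manager
    manager = _lsp_server_manager
    if manager is None:
        return

    async def _shutdown_all() -> None:
        for session in list(manager._sessions.values()):
            try:
                await asyncio.wait_for(session.shutdown(), timeout=5.0)
            except Exception as exc:  # keep resetting even if one session misbehaves
                logger.warning("LSP session shutdown failed during reset: %s", exc)

    try:
        asyncio.run(_shutdown_all())
    except RuntimeError:
        # Called from inside a running event loop; schedule cleanup instead of blocking.
        asyncio.get_running_loop().create_task(_shutdown_all())

    manager._sessions.clear()
    _lsp_server_manager = None
```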
🧰 Tools
🪛 Ruff (0.14.5)
65-65: Loop control variable session not used within loop body
Rename unused session to _session
(B007)
69-70: try-except-pass detected, consider logging the exception
(S110)
69-69: Do not catch blind exception: Exception
(BLE001)
🤖 Prompt for AI Agents
In app/modules/intelligence/tools/code_query_tools/lsp_server_manager.py around
lines 59 to 71, the reset_lsp_server_manager function currently iterates
sessions but does nothing, risking resource leaks; update it to perform an
actual shutdown for each session: if the session object exposes a synchronous
close/shutdown method call it and catch/log exceptions; if shutdown is async,
schedule and run it safely (use asyncio.get_event_loop().run_until_complete or
asyncio.run when no running loop) with proper exception handling and timeouts;
after attempting shutdown for all sessions clear the _sessions dict and then set
_lsp_server_manager to None so resources are released and repeated resets are
safe.
```python
# Read output line by line to track progress
files_analyzed = 0
total_time = 0.0
```
🛠️ Refactor suggestion | 🟠 Major
Remove unused variable.
The total_time variable is assigned but never used. Apply this diff:
-files_analyzed = 0
-total_time = 0.0
-output_lines = []
+files_analyzed = 0
+output_lines = []

Committable suggestion skipped: line range outside the PR's diff.
🧰 Tools
🪛 Ruff (0.14.5)
505-505: Local variable total_time is assigned to but never used
Remove assignment to unused variable total_time
(F841)
🤖 Prompt for AI Agents
In app/modules/intelligence/tools/code_query_tools/lsp_server_manager.py around
line 505, the local variable assignment "total_time = 0.0" is unused; remove
that line to eliminate the dead variable and ensure no other code later
references it (if it is needed, replace with a used timing mechanism or store it
appropriately), then run tests/lint to confirm no references remain.
| async def _warmup_session( | ||
| self, | ||
| request: LspQueryRequest, | ||
| workspace_root: str, | ||
| session: BaseLspServerSession, | ||
| config: LanguageServerConfig, | ||
| status_messages: List[str], | ||
| ) -> None: | ||
| key = self._session_key(request.project_id, request.language) | ||
| language = request.language | ||
|
|
||
| if key in self._warmed_sessions: | ||
| logger.debug(f"Session {key} already warmed up, skipping warmup") | ||
| return | ||
|
|
||
| lock = self._warmup_locks.setdefault(key, asyncio.Lock()) | ||
| async with lock: | ||
| if key in self._warmed_sessions: | ||
| return | ||
|
|
||
| # Map language to file extensions | ||
| language_extensions = { | ||
| "python": ["*.py"], | ||
| "typescript": ["*.ts", "*.tsx"], | ||
| "javascript": ["*.js", "*.jsx"], | ||
| "go": ["*.go"], | ||
| "java": ["*.java"], | ||
| "rust": ["*.rs"], | ||
| "c": ["*.c", "*.h"], | ||
| "cpp": ["*.cpp", "*.cc", "*.cxx", "*.hpp", "*.h"], | ||
| "csharp": ["*.cs"], | ||
| "ruby": ["*.rb"], | ||
| "php": ["*.php"], | ||
| } | ||
|
|
||
| extensions = language_extensions.get(language, []) | ||
| if not extensions: | ||
| # Unknown language, skip indexing | ||
| self._warmed_sessions.add(key) | ||
| return | ||
|
|
||
| # Collect all matching files | ||
| all_files = [] | ||
| for ext in extensions: | ||
| all_files.extend(Path(workspace_root).rglob(ext)) | ||
|
|
||
| if not all_files: | ||
| self._warmed_sessions.add(key) | ||
| return | ||
|
|
||
| status_messages.append( | ||
| f"Indexing {len(all_files)} {language} files for workspace analysis" | ||
| ) | ||
|
|
||
| # Pyright doesn't actually "index" - it analyzes files on-demand | ||
| # To trigger analysis, we need to: | ||
| # 1. Open files with didOpen | ||
| # 2. Query files with documentSymbol to trigger analysis | ||
| # 3. Then workspace/symbol can return results | ||
|
|
||
| opened_files = [] | ||
| opened_file_data = [] # Store (uri, file_path) for verification | ||
|
|
||
| # Step 1: For Jedi, we don't need to open files - it works directly with file paths | ||
| # For other language servers, open files with didOpen | ||
| if not (language == "python" and isinstance(session, JediSession)): | ||
| logger.info( | ||
| f"[LSP] Opening {len(all_files)} {language} files and triggering analysis..." | ||
| ) | ||
| # Open all files for non-Jedi language servers | ||
| for idx, file_path in enumerate(all_files, start=1): | ||
| try: | ||
| text = file_path.read_text(encoding="utf-8") | ||
| except OSError: | ||
| continue | ||
|
|
||
| uri = file_path.as_uri() | ||
| await session.send_notification( | ||
| "textDocument/didOpen", | ||
| { | ||
| "textDocument": { | ||
| "uri": uri, | ||
| "languageId": language, | ||
| "version": 1, | ||
| "text": text, | ||
| } | ||
| }, | ||
| ) | ||
| opened_files.append(uri) | ||
| opened_file_data.append((uri, file_path)) | ||
|
|
||
| if idx % 50 == 0: | ||
| await asyncio.sleep(0.1) # Small delay every 50 files | ||
| logger.info(f"[LSP] Opened {idx}/{len(all_files)} files...") | ||
|
|
||
| logger.info( | ||
| f"[LSP] Opened {len(opened_files)} files. Triggering analysis by querying files..." | ||
| ) | ||
| status_messages.append(f"Opened {len(opened_files)} {language} files") | ||
| else: | ||
| # For Jedi, just prepare file data for indexing | ||
| for file_path in all_files: | ||
| uri = file_path.as_uri() | ||
| opened_file_data.append((uri, file_path)) | ||
|
|
||
| # Step 2: Index workspace using Jedi for Python | ||
| if language == "python" and isinstance(session, JediSession): | ||
| status_messages.append("Indexing workspace with Jedi...") | ||
| logger.info( | ||
| "[LSP] Indexing Python files with Jedi for workspace symbol search..." | ||
| ) | ||
|
|
||
| # Get all Python file paths | ||
| python_file_paths = [str(f) for f in all_files] | ||
|
|
||
| # Index workspace (this will cache workspace symbols) | ||
| await session.index_workspace(python_file_paths) | ||
|
|
||
| logger.info(f"[LSP] Jedi indexed {len(python_file_paths)} Python files") | ||
| status_messages.append( | ||
| f"Indexed {len(python_file_paths)} Python files with Jedi" | ||
| ) | ||
|
|
||
| # Verify indexing by checking workspace symbols | ||
| try: | ||
| from app.modules.intelligence.tools.code_query_tools.lsp_types import ( | ||
| LspMethod, | ||
| ) | ||
|
|
||
| ws_result = await session.send_request( | ||
| LspMethod.WORKSPACE_SYMBOL, {"query": ""} | ||
| ) | ||
| symbol_count = len(ws_result) if isinstance(ws_result, list) else 0 | ||
| logger.info( | ||
| f"[LSP] Jedi workspace symbol search returned {symbol_count} symbols" | ||
| ) | ||
| status_messages.append( | ||
| f"Workspace symbol search: {symbol_count} symbols found" | ||
| ) | ||
| except Exception as exc: | ||
| logger.warning(f"[LSP] Failed to verify Jedi indexing: {exc}") | ||
|
|
||
| # Mark as warmed up - Jedi doesn't need file opening | ||
| verification_passed = True | ||
| opened_files = [] # Clear opened files since Jedi doesn't need them | ||
| elif language == "python": | ||
| # Fallback for non-Jedi Python sessions (shouldn't happen) | ||
| status_messages.append("Warning: Python session is not using Jedi") | ||
| verification_passed = False | ||
|
|
||
| from app.modules.intelligence.tools.code_query_tools.lsp_types import ( | ||
| LspMethod, | ||
| ) | ||
|
|
||
| # Query a subset of files to trigger analysis | ||
| # We don't need to query all files - querying some will trigger Pyright to analyze them | ||
| files_to_query = min(20, len(opened_file_data)) # Query up to 20 files | ||
| analyzed_count = 0 | ||
| total_symbols_found = 0 | ||
|
|
||
| for idx, (uri, file_path) in enumerate( | ||
| opened_file_data[:files_to_query], 1 | ||
| ): | ||
| try: | ||
| # Query documentSymbol to trigger analysis | ||
| doc_symbol_result = await session.send_request( | ||
| LspMethod.DOCUMENT_SYMBOL, {"textDocument": {"uri": uri}} | ||
| ) | ||
|
|
||
| symbol_count = ( | ||
| len(doc_symbol_result) | ||
| if isinstance(doc_symbol_result, list) | ||
| else 0 | ||
| ) | ||
|
|
||
| if symbol_count > 0: | ||
| analyzed_count += 1 | ||
| total_symbols_found += symbol_count | ||
|
|
||
| # Small delay to avoid overwhelming Pyright | ||
| if idx % 10 == 0: | ||
| await asyncio.sleep(0.5) | ||
| logger.info( | ||
| f"[LSP] Queried {idx}/{files_to_query} files, " | ||
| f"{analyzed_count} analyzed, {total_symbols_found} symbols found" | ||
| ) | ||
| except Exception as exc: | ||
| logger.debug( | ||
| f"[LSP] Error querying documentSymbol for {file_path.name}: {exc}" | ||
| ) | ||
|
|
||
| logger.info( | ||
| f"[LSP] Triggered analysis on {files_to_query} files: " | ||
| f"{analyzed_count} files analyzed, {total_symbols_found} total symbols" | ||
| ) | ||
| status_messages.append( | ||
| f"Triggered analysis: {analyzed_count}/{files_to_query} files analyzed " | ||
| f"({total_symbols_found} symbols)" | ||
| ) | ||
|
|
||
| # Step 4: Poll to wait for Pyright to finish analyzing files | ||
| # We'll poll documentSymbol on sample files until they're all analyzed | ||
| if analyzed_count > 0: | ||
| status_messages.append( | ||
| "Waiting for Pyright to complete analysis of all files..." | ||
| ) | ||
| logger.info( | ||
| "[LSP] Polling to wait for Pyright to complete analysis..." | ||
| ) | ||
|
|
||
| # Poll sample files until they're all analyzed | ||
| max_poll_attempts = ( | ||
| 30 # Poll for up to 30 attempts (60 seconds with 2s interval) | ||
| ) | ||
| poll_interval = 2.0 # 2 seconds between polls | ||
| sample_size = min(10, len(opened_file_data)) | ||
| analyzed_in_sample = 0 # Initialize for use in else clause | ||
|
|
||
| # Get cache directory for incremental updates | ||
| cache_dir = self._get_lsp_cache_dir(workspace_root, language) | ||
| index_marker = cache_dir / ".indexed" | ||
| from datetime import datetime | ||
|
|
||
| for poll_attempt in range(max_poll_attempts): | ||
| analyzed_in_sample = 0 | ||
| total_symbols_in_sample = 0 | ||
|
|
||
| for uri, file_path in opened_file_data[:sample_size]: | ||
| try: | ||
| doc_result = await session.send_request( | ||
| LspMethod.DOCUMENT_SYMBOL, | ||
| {"textDocument": {"uri": uri}}, | ||
| ) | ||
| symbol_count = ( | ||
| len(doc_result) | ||
| if isinstance(doc_result, list) | ||
| else 0 | ||
| ) | ||
| if symbol_count > 0: | ||
| analyzed_in_sample += 1 | ||
| total_symbols_in_sample += symbol_count | ||
| except Exception: | ||
| pass | ||
|
|
||
| # Update cache marker file with progress information | ||
| try: | ||
| # Check Pyright cache file count | ||
| pyright_cache_dir = cache_dir / "pyright" | ||
| cache_file_count = 0 | ||
| if pyright_cache_dir.exists(): | ||
| cache_files = list(pyright_cache_dir.rglob("*")) | ||
| cache_file_count = len( | ||
| [f for f in cache_files if f.is_file()] | ||
| ) | ||
|
|
||
| # Write progress to marker file | ||
| progress_data = { | ||
| "status": "indexing", | ||
| "progress": { | ||
| "files_analyzed": analyzed_in_sample, | ||
| "sample_size": sample_size, | ||
| "total_symbols": total_symbols_in_sample, | ||
| "poll_attempt": poll_attempt + 1, | ||
| "max_polls": max_poll_attempts, | ||
| "elapsed_seconds": poll_attempt * poll_interval, | ||
| "cache_files": cache_file_count, | ||
| }, | ||
| "timestamp": datetime.utcnow().isoformat(), | ||
| "workspace": workspace_root, | ||
| } | ||
|
|
||
| import json | ||
|
|
||
| index_marker.write_text( | ||
| json.dumps(progress_data, indent=2), encoding="utf-8" | ||
| ) | ||
| except Exception as exc: | ||
| logger.debug(f"[LSP] Failed to update cache marker: {exc}") | ||
|
|
||
| # If most files in sample are analyzed, we're done | ||
| if analyzed_in_sample >= (sample_size * 0.8): # 80% threshold | ||
| logger.info( | ||
| f"[LSP] Analysis complete: {analyzed_in_sample}/{sample_size} files analyzed " | ||
| f"after {poll_attempt + 1} polls ({poll_attempt * poll_interval:.1f}s)" | ||
| ) | ||
| status_messages.append( | ||
| f"Analysis complete: {analyzed_in_sample}/{sample_size} files analyzed " | ||
| f"({total_symbols_in_sample} symbols)" | ||
| ) | ||
| break | ||
|
|
||
| if ( | ||
| poll_attempt % 5 == 0 and poll_attempt > 0 | ||
| ): # Log every 5 polls (every 10 seconds) | ||
| # Get cache file count for logging | ||
| cache_file_count_log = 0 | ||
| try: | ||
| pyright_cache_dir_log = cache_dir / "pyright" | ||
| if pyright_cache_dir_log.exists(): | ||
| cache_files_log = list( | ||
| pyright_cache_dir_log.rglob("*") | ||
| ) | ||
| cache_file_count_log = len( | ||
| [f for f in cache_files_log if f.is_file()] | ||
| ) | ||
| except Exception: | ||
| pass | ||
|
|
||
| logger.info( | ||
| f"[LSP] Polling progress: {poll_attempt + 1}/{max_poll_attempts} polls, " | ||
| f"{analyzed_in_sample}/{sample_size} files analyzed " | ||
| f"({poll_attempt * poll_interval:.1f}s elapsed, {cache_file_count_log} cache files)" | ||
| ) | ||
| status_messages.append( | ||
| f"Analysis in progress: {analyzed_in_sample}/{sample_size} files analyzed..." | ||
| ) | ||
|
|
||
| await asyncio.sleep(poll_interval) | ||
| else: | ||
| # Polling completed without reaching threshold | ||
| logger.warning( | ||
| f"[LSP] Analysis polling completed: {analyzed_in_sample}/{sample_size} files analyzed " | ||
| f"after {max_poll_attempts} polls" | ||
| ) | ||
| status_messages.append( | ||
| f"Analysis polling completed: {analyzed_in_sample}/{sample_size} files analyzed" | ||
| ) | ||
| else: | ||
| logger.warning( | ||
| "[LSP] No files were analyzed. Pyright might not be responding to queries." | ||
| ) | ||
| status_messages.append( | ||
| "Warning: No files analyzed, but continuing..." | ||
| ) | ||
| await asyncio.sleep(5.0) | ||
|
|
||
| # Pyright doesn't have a traditional "indexing complete" event | ||
| # Since we've already queried files to trigger analysis, we can verify | ||
| # by checking if workspace/symbol returns results or if files are analyzed | ||
|
|
||
| # If we already analyzed files successfully, we can consider indexing working | ||
| # even if workspace/symbol doesn't return results immediately | ||
| verification_passed = False | ||
|
|
||
| if language == "python" and len(opened_file_data) > 0: | ||
| # Check if we successfully analyzed files | ||
| # If we did, Pyright is working and can answer queries | ||
| status_messages.append("Verifying Pyright is ready for queries...") | ||
| logger.info( | ||
| "[LSP] Verifying Pyright can answer workspace/symbol queries..." | ||
| ) | ||
|
|
||
| # Try workspace/symbol a few times to see if it returns results | ||
| from app.modules.intelligence.tools.code_query_tools.lsp_types import ( | ||
| LspMethod, | ||
| ) | ||
|
|
||
| workspace_symbol_count = 0 | ||
| for attempt in range(3): | ||
| try: | ||
| ws_result = await session.send_request( | ||
| LspMethod.WORKSPACE_SYMBOL, {"query": ""} | ||
| ) | ||
| workspace_symbol_count = ( | ||
| len(ws_result) if isinstance(ws_result, list) else 0 | ||
| ) | ||
| if workspace_symbol_count > 0: | ||
| logger.info( | ||
| f"[LSP] workspace/symbol returned {workspace_symbol_count} symbols " | ||
| f"on attempt {attempt + 1}" | ||
| ) | ||
| verification_passed = True | ||
| status_messages.append( | ||
| f"Workspace symbol search working: {workspace_symbol_count} symbols found" | ||
| ) | ||
| break | ||
| await asyncio.sleep(2.0) # Wait between attempts | ||
| except Exception as exc: | ||
| logger.debug( | ||
| f"[LSP] workspace/symbol query failed (attempt {attempt + 1}): {exc}" | ||
| ) | ||
| await asyncio.sleep(2.0) | ||
|
|
||
| # If workspace/symbol doesn't work, verify by checking documentSymbol on files | ||
| # If files can be analyzed, Pyright is working | ||
| if not verification_passed: | ||
| logger.info( | ||
| "[LSP] workspace/symbol returned 0, verifying files can be analyzed..." | ||
| ) | ||
| status_messages.append("Verifying files can be analyzed...") | ||
|
|
||
| # Check a few files to see if they can be analyzed | ||
| files_to_check = min(5, len(opened_file_data)) | ||
| analyzable_count = 0 | ||
|
|
||
| for uri, file_path in opened_file_data[:files_to_check]: | ||
| try: | ||
| doc_result = await session.send_request( | ||
| LspMethod.DOCUMENT_SYMBOL, | ||
| {"textDocument": {"uri": uri}}, | ||
| ) | ||
| symbol_count = ( | ||
| len(doc_result) if isinstance(doc_result, list) else 0 | ||
| ) | ||
| if symbol_count > 0: | ||
| analyzable_count += 1 | ||
| except Exception: | ||
| pass | ||
|
|
||
| # If we can analyze files, Pyright is working | ||
| # This is sufficient for our use case | ||
| if analyzable_count >= (files_to_check * 0.6): # 60% threshold | ||
| verification_passed = True | ||
| logger.info( | ||
| f"[LSP] Files are analyzable: {analyzable_count}/{files_to_check}. " | ||
| "Pyright is working correctly." | ||
| ) | ||
| status_messages.append( | ||
| f"Verification passed: {analyzable_count}/{files_to_check} files can be analyzed" | ||
| ) | ||
| else: | ||
| logger.warning( | ||
| f"[LSP] Only {analyzable_count}/{files_to_check} files are analyzable. " | ||
| "Pyright might not be working correctly." | ||
| ) | ||
| status_messages.append( | ||
| f"Warning: Only {analyzable_count}/{files_to_check} files are analyzable" | ||
| ) | ||
| elif language in ("c", "cpp") and isinstance(session, ClangdSession): | ||
| # For clangd, we need to query files to trigger indexing | ||
| # clangd builds its index lazily when files are queried | ||
| logger.info( | ||
| f"[LSP] Triggering clangd indexing by querying {len(opened_file_data)} files..." | ||
| ) | ||
| status_messages.append("Querying files to trigger clangd indexing...") | ||
|
|
||
| # Query a subset of files to trigger clangd to analyze them | ||
| # We don't need to query all files - querying some will trigger clangd to index them | ||
| files_to_query = min(50, len(opened_file_data)) # Query up to 50 files | ||
| analyzed_count = 0 | ||
| total_symbols_found = 0 | ||
|
|
||
| from app.modules.intelligence.tools.code_query_tools.lsp_types import ( | ||
| LspMethod, | ||
| ) | ||
|
|
||
| for idx, (uri, file_path) in enumerate( | ||
| opened_file_data[:files_to_query], 1 | ||
| ): | ||
| try: | ||
| # Query documentSymbol to trigger clangd to analyze the file | ||
| doc_symbol_result = await session.send_request( | ||
| LspMethod.DOCUMENT_SYMBOL, {"textDocument": {"uri": uri}} | ||
| ) | ||
|
|
||
| symbol_count = ( | ||
| len(doc_symbol_result) | ||
| if isinstance(doc_symbol_result, list) | ||
| else 0 | ||
| ) | ||
|
|
||
| if symbol_count > 0: | ||
| analyzed_count += 1 | ||
| total_symbols_found += symbol_count | ||
|
|
||
| # Small delay to avoid overwhelming clangd | ||
| if idx % 10 == 0: | ||
| await asyncio.sleep(0.5) | ||
| logger.info( | ||
| f"[LSP] Queried {idx}/{files_to_query} files, " | ||
| f"{analyzed_count} analyzed, {total_symbols_found} symbols found" | ||
| ) | ||
| except Exception as exc: | ||
| logger.debug( | ||
| f"[LSP] Error querying documentSymbol for {file_path.name}: {exc}" | ||
| ) | ||
|
|
||
| logger.info( | ||
| f"[LSP] Triggered clangd indexing on {files_to_query} files: " | ||
| f"{analyzed_count} files analyzed, {total_symbols_found} total symbols" | ||
| ) | ||
| status_messages.append( | ||
| f"Triggered indexing: {analyzed_count}/{files_to_query} files analyzed " | ||
| f"({total_symbols_found} symbols)" | ||
| ) | ||
|
|
||
| # Check if compile_commands.json exists (clangd works much better with it) | ||
| cache_dir = self._get_lsp_cache_dir(workspace_root, language) | ||
| workspace_path = Path(workspace_root) | ||
| compile_commands_path = workspace_path / "compile_commands.json" | ||
| compile_flags_path = workspace_path / "compile_flags.txt" | ||
| has_compile_commands = compile_commands_path.exists() | ||
| has_compile_flags = compile_flags_path.exists() | ||
|
|
||
| # Initialize index_file_count for use after the if/else block | ||
| index_file_count = 0 | ||
|
|
||
| # Clangd won't build a comprehensive background index without compile_commands.json | ||
| # Without it, clangd only indexes opened files, not the entire project | ||
| if not has_compile_commands and not has_compile_flags: | ||
| logger.info( | ||
| "[LSP] No compile_commands.json or compile_flags.txt found. " | ||
| "Clangd will only index opened files, not build a full background index. " | ||
| "Skipping native index wait - clangd is ready for queries." | ||
| ) | ||
| status_messages.append( | ||
| "Note: No compile_commands.json - clangd will index files on-demand" | ||
| ) | ||
| verification_passed = True | ||
| # Don't wait for native index - it won't be built without compile_commands.json | ||
| else: | ||
| # Wait for clangd to build its native index | ||
| # Clangd builds its index in the background after files are queried | ||
| # The native index is what makes clangd fast | ||
| status_messages.append( | ||
| "Waiting for clangd to build native index..." | ||
| ) | ||
| logger.info( | ||
| "[LSP] Waiting for clangd to build native index (this makes the LSP server faster)..." | ||
| ) | ||
|
|
||
| # Check clangd's native index location | ||
| # Clangd stores index in .cache/clangd/index in workspace root | ||
| # We have a symlink pointing to our cache directory | ||
| clangd_index_in_workspace = ( | ||
| workspace_path / ".cache" / "clangd" / "index" | ||
| ) | ||
| clangd_cache_target = ( | ||
| cache_dir / "clangd_native" / "clangd" / "index" | ||
| ) | ||
|
|
||
| # Check both locations (symlink target and direct path) | ||
| clangd_index_dir = clangd_cache_target | ||
| if ( | ||
| clangd_index_in_workspace.exists() | ||
| and clangd_index_in_workspace.is_symlink() | ||
| ): | ||
| try: | ||
| resolved = clangd_index_in_workspace.resolve() | ||
| if resolved.exists(): | ||
| clangd_index_dir = resolved | ||
| except Exception: | ||
| pass | ||
|
|
||
| # Poll for clangd index files | ||
| # With compile_commands.json, clangd should build a comprehensive background index | ||
| max_wait_attempts = ( | ||
| 60 # Wait up to 60 attempts (3 minutes with 3s interval) | ||
| ) | ||
| wait_interval = 3.0 # 3 seconds between checks | ||
| index_file_count = 0 | ||
| last_index_file_count = 0 | ||
| stable_count = ( | ||
| 0 # Count how many times the file count stayed the same | ||
| ) | ||
|
|
||
| for wait_attempt in range(max_wait_attempts): | ||
| # Check if clangd has created index files | ||
| if clangd_index_dir.exists(): | ||
| index_files = list(clangd_index_dir.rglob("*")) | ||
| index_files = [f for f in index_files if f.is_file()] | ||
| index_file_count = len(index_files) | ||
|
|
||
| if index_file_count > 0: | ||
| # Check if index is still growing | ||
| if index_file_count == last_index_file_count: | ||
| stable_count += 1 | ||
| else: | ||
| stable_count = 0 | ||
|
|
||
| last_index_file_count = index_file_count | ||
|
|
||
| # Log progress periodically | ||
| if wait_attempt % 10 == 0 or index_file_count > 0: | ||
| logger.info( | ||
| f"[LSP] Clangd index building: {index_file_count} index files " | ||
| f"(attempt {wait_attempt + 1}/{max_wait_attempts}, stable for {stable_count} checks)" | ||
| ) | ||
|
|
||
| # If index count is stable for 3 checks (9 seconds), indexing is likely complete | ||
| if stable_count >= 3 and wait_attempt >= 10: | ||
| verification_passed = True | ||
| status_messages.append( | ||
| f"Clangd native index complete: {index_file_count} index files" | ||
| ) | ||
| logger.info( | ||
| f"[LSP] Clangd native index appears complete: {index_file_count} index files " | ||
| f"(stable for {stable_count * wait_interval:.1f}s)" | ||
| ) | ||
| break | ||
| else: | ||
| # Index directory doesn't exist yet, clangd might not have started | ||
| if wait_attempt < 10: # First 30 seconds | ||
| logger.debug( | ||
| f"[LSP] Clangd index directory not found yet (attempt {wait_attempt + 1})" | ||
| ) | ||
| elif ( | ||
| wait_attempt % 20 == 0 | ||
| ): # Every 60 seconds after initial wait | ||
| logger.info( | ||
| f"[LSP] Clangd index directory still not created after {wait_attempt * wait_interval:.0f}s. " | ||
| "Indexing may take time for large projects." | ||
| ) | ||
|
|
||
| await asyncio.sleep(wait_interval) | ||
|
|
||
| # If we didn't find index files but files were analyzed, that's okay | ||
| # Clangd might still be building the index in the background | ||
| if index_file_count == 0 and analyzed_count > 0: | ||
| logger.info( | ||
| "[LSP] Files were analyzed but native index not yet created. " | ||
| "Clangd may continue building index in background." | ||
| ) | ||
| status_messages.append( | ||
| f"Files analyzed: {analyzed_count} files (index may build in background)" | ||
| ) | ||
| verification_passed = True | ||
|
|
||
| if index_file_count > 0: | ||
| verification_passed = True | ||
| logger.info( | ||
| f"[LSP] Clangd indexing complete: {analyzed_count} files analyzed, " | ||
| f"{index_file_count} native index files" | ||
| ) | ||
| status_messages.append( | ||
| f"Indexing complete: {analyzed_count} files analyzed, " | ||
| f"{index_file_count} native index files created" | ||
| ) | ||
| elif analyzed_count > 0: | ||
| # Files were analyzed but no native index yet - still mark as success | ||
| # The native index might build later | ||
| verification_passed = True | ||
| logger.info( | ||
| f"[LSP] Clangd files analyzed but native index not yet created: " | ||
| f"{analyzed_count} files analyzed" | ||
| ) | ||
| status_messages.append( | ||
| f"Files analyzed: {analyzed_count} files " | ||
| f"(native index may build later)" | ||
| ) | ||
| else: | ||
| logger.warning( | ||
| "[LSP] No files were successfully analyzed by clangd" | ||
| ) | ||
| status_messages.append( | ||
| "Warning: No files were successfully analyzed" | ||
| ) | ||
| verification_passed = False | ||
|
|
||
| elif language == "csharp" and isinstance(session, PyglsClientSession): | ||
| # For OmniSharp (C#), we need to query files to trigger indexing | ||
| # OmniSharp builds its index when files are queried | ||
| logger.info( | ||
| f"[LSP] Triggering OmniSharp indexing by querying {len(opened_file_data)} files..." | ||
| ) | ||
| status_messages.append( | ||
| "Querying files to trigger OmniSharp indexing..." | ||
| ) | ||
|
|
||
| # Query a subset of files to trigger OmniSharp to analyze them | ||
| # We don't need to query all files - querying some will trigger OmniSharp to index them | ||
| files_to_query = min(50, len(opened_file_data)) # Query up to 50 files | ||
| analyzed_count = 0 | ||
| total_symbols_found = 0 | ||
|
|
||
| from app.modules.intelligence.tools.code_query_tools.lsp_types import ( | ||
| LspMethod, | ||
| ) | ||
|
|
||
| for idx, (uri, file_path) in enumerate( | ||
| opened_file_data[:files_to_query], 1 | ||
| ): | ||
| try: | ||
| # Query documentSymbol to trigger OmniSharp to analyze the file | ||
| doc_symbol_result = await session.send_request( | ||
| LspMethod.DOCUMENT_SYMBOL, {"textDocument": {"uri": uri}} | ||
| ) | ||
|
|
||
| symbol_count = ( | ||
| len(doc_symbol_result) | ||
| if isinstance(doc_symbol_result, list) | ||
| else 0 | ||
| ) | ||
|
|
||
| if symbol_count > 0: | ||
| analyzed_count += 1 | ||
| total_symbols_found += symbol_count | ||
|
|
||
| # Small delay to avoid overwhelming OmniSharp | ||
| if idx % 10 == 0: | ||
| await asyncio.sleep(0.5) | ||
| logger.info( | ||
| f"[LSP] Queried {idx}/{files_to_query} files, " | ||
| f"{analyzed_count} analyzed, {total_symbols_found} symbols found" | ||
| ) | ||
| except Exception as exc: | ||
| logger.debug( | ||
| f"[LSP] Error querying documentSymbol for {file_path.name}: {exc}" | ||
| ) | ||
|
|
||
| logger.info( | ||
| f"[LSP] Triggered OmniSharp indexing on {files_to_query} files: " | ||
| f"{analyzed_count} files analyzed, {total_symbols_found} total symbols" | ||
| ) | ||
| status_messages.append( | ||
| f"Triggered indexing: {analyzed_count}/{files_to_query} files analyzed " | ||
| f"({total_symbols_found} symbols)" | ||
| ) | ||
|
|
||
| # Wait for OmniSharp to complete indexing | ||
| # OmniSharp processes files asynchronously, so we wait a bit for it to catch up | ||
| if analyzed_count > 0: | ||
| status_messages.append( | ||
| "Waiting for OmniSharp to complete indexing..." | ||
| ) | ||
| logger.info("[LSP] Waiting for OmniSharp to complete indexing...") | ||
| # Wait a bit for OmniSharp to process the files | ||
| await asyncio.sleep(5.0) | ||
| verification_passed = True | ||
| else: | ||
| logger.warning( | ||
| "[LSP] No files were successfully analyzed by OmniSharp" | ||
| ) | ||
| status_messages.append( | ||
| "Warning: No files were successfully analyzed" | ||
| ) | ||
| verification_passed = False | ||
|
|
||
| else: | ||
| # For other non-Python languages, use the polling method | ||
| if isinstance(session, PyglsClientSession): | ||
| logger.info( | ||
| f"[LSP] Starting to wait for indexing completion for {language} " | ||
| f"(timeout: 60s, interval: 3s)" | ||
| ) | ||
| indexing_complete = await session.wait_for_indexing_complete( | ||
| timeout=60.0, test_query_interval=3.0 | ||
| ) | ||
| verification_passed = indexing_complete | ||
| else: | ||
| await asyncio.sleep(5.0) | ||
| verification_passed = False | ||
|
|
||
| # For Jedi, cache is written synchronously during indexing, so no wait needed | ||
| # For other language servers (like Pyright, clangd), keep files open longer to ensure cache is persisted | ||
| if language == "python" and not isinstance(session, JediSession): | ||
| if verification_passed: | ||
| status_messages.append("Waiting for cache to persist...") | ||
| logger.info("[LSP] Waiting 5 seconds for cache to persist...") | ||
| await asyncio.sleep(5.0) # Give time to write cache files | ||
| else: | ||
| # Even if verification failed, wait a bit - might still be processing | ||
| status_messages.append( | ||
| "Waiting additional time for processing files..." | ||
| ) | ||
| logger.info( | ||
| "[LSP] Verification failed, but waiting 10 more seconds for processing..." | ||
| ) | ||
| await asyncio.sleep(10.0) | ||
| elif language in ("c", "cpp") and isinstance(session, ClangdSession): | ||
| # For clangd, keep files open a bit longer to ensure cache is fully persisted | ||
| if verification_passed: | ||
| status_messages.append("Waiting for clangd cache to persist...") | ||
| logger.info( | ||
| "[LSP] Waiting 10 seconds for clangd cache to persist..." | ||
| ) | ||
| await asyncio.sleep(10.0) # Give clangd time to write cache files | ||
| else: | ||
| # Even if verification failed, wait a bit - clangd might still be processing | ||
| status_messages.append( | ||
| "Waiting additional time for clangd to process files..." | ||
| ) | ||
| logger.info( | ||
| "[LSP] Verification failed, but waiting 15 more seconds for clangd processing..." | ||
| ) | ||
| await asyncio.sleep(15.0) | ||
| elif language == "csharp" and isinstance(session, PyglsClientSession): | ||
| # For OmniSharp, keep files open a bit longer to ensure cache is fully persisted | ||
| if verification_passed: | ||
| status_messages.append("Waiting for OmniSharp cache to persist...") | ||
| logger.info( | ||
| "[LSP] Waiting 5 seconds for OmniSharp cache to persist..." | ||
| ) | ||
| await asyncio.sleep(5.0) # Give OmniSharp time to write cache files | ||
| else: | ||
| # Even if verification failed, wait a bit - OmniSharp might still be processing | ||
| status_messages.append( | ||
| "Waiting additional time for OmniSharp to process files..." | ||
| ) | ||
| logger.info( | ||
| "[LSP] Verification failed, but waiting 10 more seconds for OmniSharp processing..." | ||
| ) | ||
| await asyncio.sleep(10.0) | ||
|
|
||
| # Close all opened files now that indexing is complete | ||
| for uri in opened_files: | ||
| try: | ||
| await session.send_notification( | ||
| "textDocument/didClose", {"textDocument": {"uri": uri}} | ||
| ) | ||
| except Exception: | ||
| pass # Ignore errors when closing files | ||
|
|
||
| if not verification_passed: | ||
| logger.warning( | ||
| f"Index verification failed for {language}, but continuing" | ||
| ) | ||
| status_messages.append( | ||
| "Index verification incomplete, but marking as done" | ||
| ) | ||
|
|
||
| self._warmed_sessions.add(key) | ||
| status_messages.append(f"Workspace indexing complete for {language}") | ||
| logger.info( | ||
| f"[LSP] Warmup session completed for {language} (project: {request.project_id})" | ||
| ) | ||
|
|
||
| # Create a marker file to indicate successful indexing | ||
| # Write a timestamp to the file so we can track when indexing completed | ||
| cache_dir = self._get_lsp_cache_dir(workspace_root, language) | ||
| index_marker = cache_dir / ".indexed" | ||
| try: | ||
| from datetime import datetime | ||
|
|
||
| timestamp = datetime.utcnow().isoformat() | ||
|
|
||
| # Check cache directory for files (Jedi for Python, clangd for C/C++, Pyright for others) | ||
| cache_file_count = 0 | ||
| if language == "python": | ||
| jedi_cache_dir = cache_dir / "jedi_cache" | ||
| if jedi_cache_dir.exists(): | ||
| cache_files = list(jedi_cache_dir.rglob("*.json")) | ||
| cache_file_count = len(cache_files) | ||
| if cache_file_count > 0: | ||
| status_messages.append( | ||
| f"Jedi cache created: {cache_file_count} cache files found" | ||
| ) | ||
| logger.info( | ||
| f"Jedi cache directory contains {cache_file_count} files" | ||
| ) | ||
| else: | ||
| status_messages.append( | ||
| "Warning: Jedi cache directory exists but is empty" | ||
| ) | ||
| logger.warning( | ||
| f"Jedi cache directory {jedi_cache_dir} exists but contains no files." | ||
| ) | ||
| elif language in ("c", "cpp"): | ||
| # Check clangd native index (clangd's built-in cache) | ||
| clangd_cache_target = ( | ||
| cache_dir / "clangd_native" / "clangd" / "index" | ||
| ) | ||
|
|
||
| # Also check workspace .cache location (where clangd actually stores it) | ||
| workspace_path = Path(workspace_root) | ||
| clangd_index_in_workspace = ( | ||
| workspace_path / ".cache" / "clangd" / "index" | ||
| ) | ||
|
|
||
| native_index_count = 0 | ||
|
|
||
| # Count native index files (check both symlink target and workspace location) | ||
| index_dirs_to_check = [clangd_cache_target] | ||
| if clangd_index_in_workspace.exists(): | ||
| if clangd_index_in_workspace.is_symlink(): | ||
| try: | ||
| resolved = clangd_index_in_workspace.resolve() | ||
| if ( | ||
| resolved.exists() | ||
| and resolved not in index_dirs_to_check | ||
| ): | ||
| index_dirs_to_check.append(resolved) | ||
| except Exception: | ||
| pass | ||
| elif clangd_index_in_workspace.is_dir(): | ||
| index_dirs_to_check.append(clangd_index_in_workspace) | ||
|
|
||
| for index_dir in index_dirs_to_check: | ||
| if index_dir.exists(): | ||
| index_files = list(index_dir.rglob("*")) | ||
| index_files = [f for f in index_files if f.is_file()] | ||
| native_index_count += len(index_files) | ||
|
|
||
| if native_index_count > 0: | ||
| status_messages.append( | ||
| f"Clangd native index: {native_index_count} index files" | ||
| ) | ||
| logger.info( | ||
| f"Clangd native index: {native_index_count} index files " | ||
| f"(checked {clangd_index_in_workspace} and {clangd_cache_target})" | ||
| ) | ||
| else: | ||
| # For other languages, check Pyright cache | ||
| pyright_cache_dir = cache_dir / "pyright" | ||
| if pyright_cache_dir.exists(): | ||
| cache_files = list(pyright_cache_dir.rglob("*")) | ||
| cache_files = [f for f in cache_files if f.is_file()] | ||
| cache_file_count = len(cache_files) | ||
| if cache_file_count > 0: | ||
| status_messages.append( | ||
| f"Cache created: {cache_file_count} cache files found" | ||
| ) | ||
| logger.info( | ||
| f"Cache directory contains {cache_file_count} files" | ||
| ) | ||
|
|
||
| # Write final completion status (JSON format) | ||
| # The marker file may have been updated during polling, so we finalize it now | ||
| import json | ||
|
|
||
| completion_data = { | ||
| "status": "completed", | ||
| "indexed_at": timestamp, | ||
| "workspace": workspace_root, | ||
| "verification_passed": verification_passed, | ||
| "cache_files": cache_file_count, | ||
| "files_opened": len(opened_files), | ||
| } | ||
|
|
||
| index_marker.write_text( | ||
| json.dumps(completion_data, indent=2), encoding="utf-8" | ||
| ) | ||
| logger.info( | ||
| f"Created index marker for {language} at {index_marker} " | ||
| f"(workspace: {workspace_root}, timestamp: {timestamp}, " | ||
| f"cache_files: {cache_file_count})" | ||
| ) | ||
| except Exception as exc: | ||
| logger.warning(f"Failed to create index marker for {language}: {exc}") | ||
|
|
🛠️ Refactor suggestion | 🟠 Major
Extreme cognitive complexity requires refactoring.
This 929-line method has a cognitive complexity of 285 (allowed: 15), making it difficult to understand, test, and maintain. The method handles multiple language-specific indexing workflows with significant code duplication.
Consider extracting language-specific indexing logic into separate methods:
- `_warmup_jedi_session()` for Python/Jedi
- `_warmup_pyright_session()` for Python/Pyright
- `_warmup_clangd_session()` for C/C++
- `_warmup_omnisharp_session()` for C#
- `_warmup_generic_session()` for other languages
This would improve readability, testability, and reduce duplication. Would you like me to propose a refactored structure?
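One possible top-level shape for that split, keeping `_warmup_session` as a thin dispatcher; helper names follow the list above, the signatures are assumptions, and the locking and marker-file writing would move into the helpers:

```python
async def _warmup_session(self, request, workspace_root, session, config, status_messages):
    """Dispatch to a per-language warmup helper instead of one monolithic body."""
    key = self._session_key(request.project_id, request.language)
    if key in self._warmed_sessions:
        return

    handlers = {
        "python": self._warmup_jedi_session,
        "c": self._warmup_clangd_session,
        "cpp": self._warmup_clangd_session,
        "csharp": self._warmup_omnisharp_session,
    }
    handler = handlers.get(request.language, self._warmup_generic_session)
    await handler(request, workspace_root, session, status_messages)

    self._warmed_sessions.add(key)
    status_messages.append(f"Workspace indexing complete for {request.language}")
```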
🧰 Tools
🪛 GitHub Check: SonarCloud Code Analysis
[failure] 977-977: Refactor this function to reduce its Cognitive Complexity from 285 to the 15 allowed.
🪛 Ruff (0.14.5)
982-982: Unused method argument: config
(ARG002)
1116-1116: Do not catch blind exception: Exception
(BLE001)
1163-1163: Do not catch blind exception: Exception
(BLE001)
1204-1204: Loop control variable file_path not used within loop body
(B007)
1218-1219: try-except-pass detected, consider logging the exception
(S110)
1218-1218: Do not catch blind exception: Exception
(BLE001)
1253-1253: Do not catch blind exception: Exception
(BLE001)
1282-1283: try-except-pass detected, consider logging the exception
(S110)
1282-1282: Do not catch blind exception: Exception
(BLE001)
1354-1354: Do not catch blind exception: Exception
(BLE001)
1372-1372: Loop control variable file_path not used within loop body
(B007)
1383-1384: try-except-pass detected, consider logging the exception
(S110)
1383-1383: Do not catch blind exception: Exception
(BLE001)
1449-1449: Do not catch blind exception: Exception
(BLE001)
1518-1519: try-except-pass detected, consider logging the exception
(S110)
1518-1518: Do not catch blind exception: Exception
(BLE001)
1672-1672: Do not catch blind exception: Exception
(BLE001)
1777-1778: try-except-pass detected, consider logging the exception
(S110)
1777-1777: Do not catch blind exception: Exception
(BLE001)
1849-1850: try-except-pass detected, consider logging the exception
(S110)
1849-1849: Do not catch blind exception: Exception
(BLE001)
1904-1904: Do not catch blind exception: Exception
(BLE001)
app/modules/utils/gvisor_runner.py
Outdated
🧩 Analysis chain
Early return breaks gVisor on Mac/Windows with Docker.
The early return at lines 230-240 appears to prevent using Docker with gVisor on Mac/Windows. Here's the issue:
- On Mac/Windows, `is_gvisor_available()` returns `True` if Docker with the runsc runtime is available (lines 64-71), regardless of whether a local runsc binary exists.
- `get_runsc_binary()` may return `None` on Mac/Windows if there's no local runsc binary (runsc is provided by the Docker runtime, not installed locally).
- The new early return would then fall back to regular subprocess, bypassing the Docker+gVisor path at lines 274-296.
This breaks the intended functionality where Docker with gVisor should be used on Mac/Windows.
Consider removing the early return or making it platform-specific:
runsc_path = get_runsc_binary()
-if not runsc_path:
- logger.warning(
- "[GVISOR] gVisor runsc binary not found, falling back to regular subprocess (less secure)"
- )
- return _run_command_regular(
- command=command,
- working_dir=working_dir,
- env=env,
- timeout=timeout,
- )
-
-logger.info(f"[GVISOR] gVisor available, using runsc at {runsc_path}")
+if runsc_path:
+ logger.info(f"[GVISOR] runsc binary found at {runsc_path}")
+else:
+    logger.info("[GVISOR] No local runsc binary, will attempt Docker with gVisor runtime")

Run this script to verify the intended behavior and check if get_runsc_path() returns None on Mac/Windows with Docker:
🏁 Script executed:
```shell
#!/bin/bash
# Check the implementation of get_runsc_path to understand its behavior across platforms
echo "=== Checking get_runsc_path implementation ==="
ast-grep --pattern 'def get_runsc_path($$$):
$$$'
echo -e "\n=== Finding all usages of get_runsc_path ==="
rg -n "get_runsc_path" --type py -C 2
echo -e "\n=== Checking if runsc_path can be None in _run_with_docker_gvisor ==="
rg -n "runsc_path.*Optional" app/modules/utils/gvisor_runner.py -A 2 -B 2
```

Length of output: 1575
The search result is general shell scripting information, which is not what I need. Let me run scripts to directly examine the codebase.
🏁 Script executed:
```shell
#!/bin/bash
echo "=== get_runsc_path() implementation ==="
sed -n '274,305p' app/modules/utils/install_gvisor.py
echo -e "\n=== is_gvisor_available() full implementation ==="
sed -n '50,73p' app/modules/utils/gvisor_runner.py
echo -e "\n=== Code after line 240 (Docker path logic) ==="
sed -n '274,296p' app/modules/utils/gvisor_runner.py
```

Length of output: 2745
Early return breaks gVisor on Mac/Windows with Docker.
The early return at lines 230-240 prevents using Docker with gVisor on Mac/Windows. The issue is confirmed:
- `get_runsc_path()` only detects a local runsc binary (via install path or system PATH). On Mac/Windows, runsc is not available locally; it's provided by Docker's runtime.
- `is_gvisor_available()` on Mac/Windows only checks if Docker is available; it does NOT require a local runsc binary.
- The new early return gates the Docker+gVisor path (lines 274-296) on `runsc_path` being present, bypassing it when `runsc_path` is `None`.
- The Docker path at lines 274-296 is designed to handle `runsc_path: Optional[Path]`, so it gracefully supports the `None` case.
On Mac/Windows with Docker Desktop, this causes a regression: the code falls back to regular subprocess instead of using the intended Docker+gVisor isolation.
Remove the early return or make it platform-specific:
runsc_path = get_runsc_binary()
-if not runsc_path:
- logger.warning(
- "[GVISOR] gVisor runsc binary not found, falling back to regular subprocess (less secure)"
- )
- return _run_command_regular(
- command=command,
- working_dir=working_dir,
- env=env,
- timeout=timeout,
- )
-
-logger.info(f"[GVISOR] gVisor available, using runsc at {runsc_path}")
+if runsc_path:
+ logger.info(f"[GVISOR] runsc binary found at {runsc_path}")
+else:
+ logger.info("[GVISOR] No local runsc binary, will attempt Docker with gVisor runtime")| pytest | ||
| pytest-asyncio | ||
| Pillow==10.3.0 | ||
| python-multipart>=0.0.7 |
🧩 Analysis chain
Critical pre-existing security vulnerability in python-multipart.
The dependency python-multipart>=0.0.7 (line 78, not changed in this PR) has a HIGH severity DoS vulnerability (GHSA-59g5-xgcq-4qw3) in the multipart/form-data boundary handling. While this is not introduced by your PR, it should be escalated and fixed before merging to main.
Consider upgrading python-multipart to a patched version, or investigate whether this vulnerability affects your use case. FastAPI depends on it for form parsing, so remediation is important.
🌐 Web query:
What is the latest patched version of python-multipart that fixes the DoS vulnerability GHSA-59g5-xgcq-4qw3?
💡 Result:
The vulnerability GHSA-59g5-xgcq-4qw3 is fixed in python-multipart version 0.0.18. [1][2]
Sources:
[1] GitHub Advisory for GHSA-59g5-xgcq-4qw3 (python-multipart).
[2] OSV / CVE-2024-53981 entry (notes fix in 0.0.18).
---
🏁 Script executed:
```shell
#!/bin/bash
# Check for lock files and version pinning
fd -type f \( -name "requirements*.txt" -o -name "poetry.lock" -o -name "setup.lock" -o -name "Pipfile*" -o -name "pyproject.toml" \) | head -20
# Also check the full requirements.txt context around line 78
echo "=== Full requirements.txt ==="
cat requirements.txt | head -90 | tail -20
```

Length of output: 883
---
🏁 Script executed:
```shell
#!/bin/bash
# Search for lock files without using --type
find . -maxdepth 3 -type f \( -name "*.lock" -o -name "requirements*.txt" -o -name "pyproject.toml" -o -name "setup.py" -o -name "Pipfile" \) 2>/dev/null | sort
# Check if requirements.txt was modified in this PR
echo "=== Git status of requirements.txt ==="
git status requirements.txt 2>/dev/null || echo "Not a git repo or file not tracked"
# Check git diff for line 78
echo "=== Git diff around line 78 ==="
git diff requirements.txt 2>/dev/null | grep -A5 -B5 "python-multipart" || echo "No git diff available"
```

Length of output: 174
Upgrade python-multipart to fix HIGH severity DoS vulnerability GHSA-59g5-xgcq-4qw3.
The dependency python-multipart>=0.0.7 (line 78) contains a HIGH severity DoS vulnerability in multipart/form-data boundary handling. This vulnerability is fixed in python-multipart version 0.0.18. Update the requirement to python-multipart>=0.0.18 to ensure the patched version is installed.
🧰 Tools
🪛 OSV Scanner (2.2.4)
[HIGH] 78-78: python-multipart 0.0.7: Denial of service (DoS) via deformation multipart/form-data boundary
🤖 Prompt for AI Agents
In requirements.txt around line 78, the pinned requirement
python-multipart>=0.0.7 has a HIGH severity DoS vulnerability
(GHSA-59g5-xgcq-4qw3); update the constraint to require the patched release by
changing it to python-multipart>=0.0.18 so installs will pick a fixed version,
then run your dependency install and tests (and regenerate any lockfile) to
ensure compatibility.
```bash
cd "$OMNISHARP_DIR" || exit 1

# Check if already downloaded
if [ -f "omnisharp.${EXT}" ]; then
```
🛠️ Refactor suggestion | 🟠 Major
Use [[ instead of [ for conditional tests.
For safety and feature richness, replace all single-bracket [ conditionals with double-bracket [[. This prevents word splitting and glob expansion issues. Apply the following changes:
- if [ -f "omnisharp.${EXT}" ]; then
+ if [[ -f "omnisharp.${EXT}" ]]; then

- while [ $RETRY_COUNT -lt $MAX_RETRIES ]; do
+ while [[ $RETRY_COUNT -lt $MAX_RETRIES ]]; do

- if [ "$EXT" = "tar.gz" ]; then
+ if [[ "$EXT" = "tar.gz" ]]; then

(The `if command -v unzip &> /dev/null; then` check is a plain command rather than a `[` test, so it should stay as-is; wrapping a command in `[[ ... ]]` would not be valid Bash.)

Replace all remaining [ ... ] conditionals at lines 112, 190, 212, 223, 246, 247, and 250 with [[ ... ]].
Also applies to: 112-112, 185-185, 190-190, 212-212, 223-223, 246-246, 247-247, 250-250
🧰 Tools
🪛 GitHub Check: SonarCloud Code Analysis
[failure] 102-102: Use '[[' instead of '[' for conditional tests. The '[[' construct is safer and more feature-rich.
🤖 Prompt for AI Agents
In scripts/install_lsp_servers.sh around lines 102, 112, 185, 190, 212, 223,
246, 247, and 250, the script uses single-bracket [ ... ] conditionals which can
suffer from word-splitting and globbing issues; replace each single-bracket
conditional with the double-bracket [[ ... ]] form (keeping the same test
expressions and spacing) to ensure safer evaluation and richer features (e.g.,
pattern matching), and verify any string comparisons or test operators remain
valid inside [[ ]] after the substitution.




Summary by CodeRabbit
Release Notes
New Features
Improvements
Documentation