Merge PR #9: PathFilter, file_path scoping, and progress reporting #10
Open
Codeturion wants to merge 26 commits into master from review/pr-9
26 commits
4f87c3c docs: add filtering and worktree support design (michael-howell-island)
54376ca docs: add filtering implementation plan (michael-howell-island)
6c299b2 feat: add PathFilter with default worktree/submodule skip rules (michael-howell-island)
81cc39c feat: add .codesurfaceignore and --exclude glob support to PathFilter (michael-howell-island)
b6815e4 feat: thread PathFilter through parse_directory for dir and file excl… (michael-howell-island)
d13cb01 fix: add path_filter support to Go, Java, Python parse_directory over… (michael-howell-island)
c2dd5ff feat: add --exclude and --include-submodules CLI args, wire PathFilte… (michael-howell-island)
67fe469 feat: add file_path scoping to search, get_signature, get_class tools (michael-howell-island)
4f08f53 fix: Path(rel) in incremental reindex, apply file_path to get_class_m… (michael-howell-island)
f7116b5 docs: add filtering features, install instructions for fork (michael-howell-island)
0385cc6 fix: replace rglob with os.walk to prune excluded dirs before descent (michael-howell-island)
cc24549 Merge pull request #1 from michael-howell-island/feat/filtering-and-w… (michael-howell-island)
8fd7a72 docs: add startup progress reporting design (michael-howell-island)
15d5c6b docs: add startup progress implementation plan (michael-howell-island)
b56ef8a feat: add on_progress callback to BaseParser.parse_directory (michael-howell-island)
8929c53 feat: forward on_progress callback in Python, Go, Java parser overrides (michael-howell-island)
be30dec feat: stream indexing progress to stderr with file count and percentage (michael-howell-island)
3753853 fix: remove duplicate done line in main, strengthen progress test ass… (michael-howell-island)
075f619 fix: apply is_file_excluded in _count_files to match parser behavior (michael-howell-island)
b918d8b fix: call on_progress even on parse failure, move sys imports to modu… (michael-howell-island)
260e019 perf: optimize indexing speed and fix startup hang on large JS/TS repos (michael-howell-island)
4dc4af7 fix: expand tilde in --project path to support ~/work/cloud style args (michael-howell-island)
84432c4 chore: remove fork-specific README additions and planning docs (michael-howell-island)
8075185 Merge PR #9 with conflict resolution (Codeturion)
77af855 Align C++ parser with BaseParser, restore TS test file skip (Codeturion)
e5380c6 Fix progress count accuracy and eliminate third directory walk (Codeturion)
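
Commits 0385cc6 and b56ef8a describe two patterns: pruning excluded directories with os.walk before descending into them, and reporting progress through an on_progress callback. The parser code itself is not shown on this page, so the following is only a rough sketch of how those two ideas can fit together. The names `walk_source_files`, `index`, and `EXCLUDED`, and the `(done, total)` callback signature, are assumptions made for illustration and are not taken from the PR.

```python
from __future__ import annotations

import os
from pathlib import Path
from typing import Callable, Iterator

# Hypothetical stand-in for the PR's default exclusion set.
EXCLUDED = frozenset({"node_modules", ".git", "build"})


def walk_source_files(root: Path, suffix: str = ".py") -> Iterator[Path]:
    """Yield matching files under root without entering excluded dirs."""
    for dirpath, dirnames, filenames in os.walk(root):
        # Assigning to the slice mutates the list os.walk iterates next,
        # so excluded directories are never descended into (top-down walk).
        dirnames[:] = sorted(d for d in dirnames if d not in EXCLUDED)
        for name in sorted(filenames):
            if name.endswith(suffix):
                yield Path(dirpath) / name


def index(
    root: Path,
    on_progress: Callable[[int, int], None] | None = None,
) -> list[Path]:
    """Collect files, then report (done, total) after each one."""
    files = list(walk_source_files(root))
    for done, _path in enumerate(files, start=1):
        # Real parsing would happen here; the callback fires regardless
        # of whether parsing a given file succeeds.
        if on_progress is not None:
            on_progress(done, len(files))
    return files
```

By contrast, `Path.rglob` yields every path under the root, so an excluded tree such as node_modules would still be traversed in full and only filtered afterwards.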
@@ -0,0 +1,126 @@

    """Path filtering for codesurface indexing.

    Handles default exclusions (worktrees, submodules, vendored/build dirs)
    and user-configured exclusions (.codesurfaceignore, --exclude CLI flag).
    """
    from __future__ import annotations

    import fnmatch
    from pathlib import Path

    # Directories excluded by name in every project — vendored deps, build
    # output, VCS internals, and IDE config that never contain user source.
    _DEFAULT_EXCLUDED_DIRS: frozenset[str] = frozenset({
        # JS / Node
        "node_modules", "bower_components",
        # Python
        ".venv", "venv", "env", "__pycache__", ".tox", ".mypy_cache",
        ".pytest_cache", "site-packages",
        # Go
        "vendor", "testdata", "third_party", "examples", "example",
        # .NET / Java
        "bin", "obj", "packages", ".gradle", ".mvn",
        "generated", "generated-sources", "generated-test-sources",
        # Build output / caches
        "dist", "build", "out", "target", ".next", ".nuxt", ".nx",
        # VCS / IDE
        ".git", ".hg", ".svn",
        ".idea", ".vscode", ".vs",
        # Misc
        ".yarn", ".pnp", "coverage", ".turbo", ".cache", ".worktrees",
    })


    def _read_git_file(path: Path) -> str | None:
        """Read .git FILE content if present. Returns None if .git is a directory."""
        git = path / ".git"
        if git.is_file():
            try:
                return git.read_text().strip()
            except OSError:
                return None
        return None


    def _is_git_worktree(git_content: str) -> bool:
        """True if .git file references a worktrees/ path."""
        return "/worktrees/" in git_content


    def _is_git_submodule(git_content: str) -> bool:
        """True if .git file references a modules/ path."""
        return "/modules/" in git_content


    def _read_ignore_file(project_root: Path) -> list[str]:
        """Read .codesurfaceignore and return non-empty, non-comment lines."""
        ignore_path = project_root / ".codesurfaceignore"
        if not ignore_path.is_file():
            return []
        lines = []
        for line in ignore_path.read_text().splitlines():
            stripped = line.strip()
            if stripped and not stripped.startswith("#"):
                lines.append(stripped)
        return lines


    class PathFilter:
        """Determines which directories and files to skip during indexing.

        Default exclusions (always applied):
        - Any directory named .worktrees
        - Any subdirectory with a .git FILE referencing /worktrees/ (git worktree)
        - Any subdirectory with a .git FILE referencing /modules/ (submodule),
          unless include_submodules=True

        User exclusions via exclude_globs (CLI) and .codesurfaceignore (project file).
        """

        def __init__(
            self,
            project_root: Path,
            exclude_globs: list[str] | None = None,
            include_submodules: bool = False,
        ) -> None:
            self._root = project_root
            self._include_submodules = include_submodules
            self._globs: list[str] = list(exclude_globs or [])
            self._globs.extend(_read_ignore_file(project_root))

        def is_dir_excluded_name(self, name: str) -> bool:
            """Fast check using only the directory basename (no I/O)."""
            return name in _DEFAULT_EXCLUDED_DIRS

        def is_dir_excluded(self, path: Path) -> bool:
            """Return True if this directory should be skipped entirely."""
            name = path.name

            if name in _DEFAULT_EXCLUDED_DIRS:
                return True

            # .git FILE detection (worktrees / submodules)
            git_content = _read_git_file(path)
            if git_content is not None:
                if _is_git_worktree(git_content):
                    return True
                if _is_git_submodule(git_content) and not self._include_submodules:
                    return True

            return False

        def is_file_excluded(self, path: Path) -> bool:
            """Return True if this file matches any user exclusion glob."""
            if not self._globs:
                return False
            try:
                rel = str(path.relative_to(self._root)).replace("\\", "/")
            except ValueError:
                return False
            return any(fnmatch.fnmatch(rel, g) for g in self._globs)

        def is_file_excluded_rel(self, rel_path: str) -> bool:
            """Return True if a relative path matches any user exclusion glob."""
            if not self._globs:
                return False
            return any(fnmatch.fnmatch(rel_path, g) for g in self._globs)
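
Two behaviors of the file above are easy to misread: how `fnmatch` treats path separators in exclusion globs, and which `.git` pointer contents count as a worktree or a submodule. The sketch below mirrors the checks from the diff in standalone form so they can be tried in isolation. `matches_any` and `classify_git_pointer` are hypothetical helper names, and the sample `gitdir:` strings are illustrative values, not taken from the PR.

```python
import fnmatch


def matches_any(rel_path: str, globs: list[str]) -> bool:
    """Same test as is_file_excluded_rel: fnmatch against each glob."""
    return any(fnmatch.fnmatch(rel_path, g) for g in globs)


def classify_git_pointer(content: str) -> str:
    """Mirror the substring checks, in the order is_dir_excluded applies them."""
    if "/worktrees/" in content:
        return "worktree"
    if "/modules/" in content:
        return "submodule"
    return "other"


# fnmatch's "*" also matches "/", so one pattern spans nested directories.
print(matches_any("src/gen/models.py", ["src/gen/*"]))   # True
print(matches_any("src/gen/deep/x.py", ["src/gen/*"]))   # True
print(matches_any("pkg/util_test.py", ["*_test.py"]))    # True
print(matches_any("src/app.py", ["*_test.py"]))          # False
# A gitignore-style trailing slash has no special meaning to fnmatch.
print(matches_any("tests/test_a.py", ["tests/"]))        # False

# Illustrative .git FILE contents for a linked worktree and a submodule.
print(classify_git_pointer("gitdir: /home/me/repo/.git/worktrees/feature-x"))  # worktree
print(classify_git_pointer("gitdir: ../.git/modules/libs/sub"))                # submodule
```

One consequence worth noting: a `.codesurfaceignore` entry written in gitignore style, such as `tests/`, would not exclude files under `tests/` with this matching; `tests/*` would.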
The logic for constructing the `file_path` SQL condition is duplicated here and in `get_class_members` (lines 212-218). Consider extracting this into a shared utility function to ensure consistency and reduce maintenance overhead.
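
The query code under review is not shown on this page, so the following is only a sketch of what such a shared helper might look like. It assumes that `file_path` scoping means a prefix match on a `file_path` column; the helper name `file_path_condition`, the `api_records` table, and the `LIKE ... ESCAPE` approach are all hypothetical and may differ from what the PR actually does.

```python
from __future__ import annotations

import sqlite3


def file_path_condition(
    file_path: str | None,
    column: str = "file_path",
) -> tuple[str, list[str]]:
    """Return (sql_fragment, params) that scopes a query to a path prefix.

    `column` must be a trusted identifier from the code, never user input.
    """
    if not file_path:
        return "", []
    # Escape LIKE wildcards so the caller's path is matched literally.
    escaped = (
        file_path.replace("\\", "\\\\")
        .replace("%", "\\%")
        .replace("_", "\\_")
    )
    return f" AND {column} LIKE ? ESCAPE '\\'", [escaped + "%"]


# Demo against a throwaway in-memory table (hypothetical schema).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE api_records (name TEXT, file_path TEXT)")
conn.executemany(
    "INSERT INTO api_records VALUES (?, ?)",
    [
        ("Foo", "src/core/foo.py"),
        ("Bar", "src/ui/bar.py"),
        ("Baz", "src_core/baz.py"),
    ],
)

cond, params = file_path_condition("src/core")
scoped = conn.execute(
    "SELECT name FROM api_records WHERE 1=1" + cond + " ORDER BY name",
    params,
).fetchall()
print(scoped)  # [('Foo',)]
```

With a helper of this shape, each tool would append the returned fragment and parameters to its own query, so the escaping and matching rules live in one place.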