Perf/reduce redundant syscalls #201
SugarFreeDoNoSo wants to merge 8 commits into mpfaffenberger:main from
Conversation
📝 Walkthrough
Refactors directory listing and file inspection to use os.scandir and stat-based checks, centralizes and caches ripgrep discovery, simplifies config directory creation with os.makedirs(..., exist_ok=True), and tightens per-operation error handling for listing, reading, and log rotation.
Estimated code review effort: 🎯 3 (Moderate) | ⏱️ ~25 minutes
🚥 Pre-merge checks | ✅ 2 | ❌ 1
❌ Failed checks (1 warning)
✅ Passed checks (2 passed)
Actionable comments posted: 2
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@code_puppy/command_line/utils.py`:
- Around line 16-17: The list comprehensions that build dirs and files from
entries call DirEntry.is_dir(follow_symlinks=True) which can raise
OSError/PermissionError and crash the listing; replace those comprehensions in
utils.py with an explicit loop over entries that calls e.is_dir(...) inside a
try/except OSError block, skipping any entry that raises and appending e.name to
either dirs or files accordingly (keep the variables dirs and files and
follow_symlinks=True usage so behavior is preserved).
In `@code_puppy/tools/file_operations.py`:
- Around line 153-166: The cached _find_rg currently stores None permanently;
change it so missing results are not cached: inside _find_rg, after failing to
locate ripgrep (before returning None) call _find_rg.cache_clear() (or implement
a small module-level cache variable instead) so subsequent calls (from
list_files/grep) can rediscover a newly installed "rg"; ensure the function name
_find_rg and callers (e.g., list_files, grep) continue to use the same API.
ℹ️ Review info
Configuration used: defaults
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (4)
- code_puppy/command_line/utils.py
- code_puppy/config.py
- code_puppy/error_logging.py
- code_puppy/tools/file_operations.py
🧹 Nitpick comments (1)
code_puppy/tools/file_operations.py (1)
312-339: Optional: reduce extra stat calls in non-recursive listing.
Right now each entry may hit multiple stat-like calls (`is_dir`, `is_file`, `stat`). You can mirror the recursive path by doing a single `DirEntry.stat()` and branching on `stat.S_ISDIR`/`S_ISREG`, with a small fallback on error to preserve behavior.

♻️ Suggested refactor (single stat per entry)

```diff
-        with os.scandir(directory) as it:
-            for entry in sorted(it, key=lambda e: e.name):
-                if entry.is_dir(follow_symlinks=True):
-                    if entry.name.startswith("."):
-                        continue
-                    results.append(
-                        ListedFile(
-                            path=entry.name,
-                            type="directory",
-                            size=0,
-                            full_path=entry.path,
-                            depth=0,
-                        )
-                    )
-                elif entry.is_file(follow_symlinks=True):
-                    try:
-                        size = entry.stat().st_size
-                    except OSError:
-                        size = 0
-                    results.append(
-                        ListedFile(
-                            path=entry.name,
-                            type="file",
-                            size=size,
-                            full_path=entry.path,
-                            depth=0,
-                        )
-                    )
+        with os.scandir(directory) as it:
+            for entry in sorted(it, key=lambda e: e.name):
+                try:
+                    st = entry.stat()
+                    mode = st.st_mode
+                except OSError:
+                    # Fallback: keep previous behavior on stat failures
+                    if entry.is_dir(follow_symlinks=True):
+                        if entry.name.startswith("."):
+                            continue
+                        results.append(
+                            ListedFile(
+                                path=entry.name,
+                                type="directory",
+                                size=0,
+                                full_path=entry.path,
+                                depth=0,
+                            )
+                        )
+                    elif entry.is_file(follow_symlinks=True):
+                        results.append(
+                            ListedFile(
+                                path=entry.name,
+                                type="file",
+                                size=0,
+                                full_path=entry.path,
+                                depth=0,
+                            )
+                        )
+                    continue
+
+                if stat.S_ISDIR(mode):
+                    if entry.name.startswith("."):
+                        continue
+                    results.append(
+                        ListedFile(
+                            path=entry.name,
+                            type="directory",
+                            size=0,
+                            full_path=entry.path,
+                            depth=0,
+                        )
+                    )
+                elif stat.S_ISREG(mode):
+                    results.append(
+                        ListedFile(
+                            path=entry.name,
+                            type="file",
+                            size=st.st_size,
+                            full_path=entry.path,
+                            depth=0,
+                        )
+                    )
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@code_puppy/tools/file_operations.py` around lines 312 - 339, Reduce duplicate stat-like calls by calling entry.stat(follow_symlinks=True) once and branching on stat.S_ISDIR/stat.S_ISREG: inside the os.scandir(directory) loop (where ListedFile objects are created), replace the is_dir/is_file checks with a single try: st = entry.stat(follow_symlinks=True) except OSError: set size=0 and skip or treat as file per current behavior; then use stat.S_ISDIR(st.st_mode) to detect directories (preserve the entry.name.startswith(".") skip) and stat.S_ISREG for files (use st.st_size for size). Ensure you still populate ListedFile(path=entry.name, type=..., size=..., full_path=entry.path, depth=0) and preserve follow_symlinks semantics and the OSError fallback.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Nitpick comments:
In `@code_puppy/tools/file_operations.py`:
- Around line 312-339: Reduce duplicate stat-like calls by calling
entry.stat(follow_symlinks=True) once and branching on
stat.S_ISDIR/stat.S_ISREG: inside the os.scandir(directory) loop (where
ListedFile objects are created), replace the is_dir/is_file checks with a single
try: st = entry.stat(follow_symlinks=True) except OSError: set size=0 and skip
or treat as file per current behavior; then use stat.S_ISDIR(st.st_mode) to
detect directories (preserve the entry.name.startswith(".") skip) and
stat.S_ISREG for files (use st.st_size for size). Ensure you still populate
ListedFile(path=entry.name, type=..., size=..., full_path=entry.path, depth=0)
and preserve follow_symlinks semantics and the OSError fallback.
Are you working on code bases with like 50 million lines or something? Please explain slowness... You sure it wasn't just the LLM getting rate limited and doing exponential backoff?
Not critical, just a micro-optimization. With multiple agents running in parallel on the same filesystem, contention was causing the same ops to slow down dramatically, so I tidied it up.
It was taking longer than it had any right to, so I took a look under the hood. This trims redundant filesystem syscalls (scandir/stat batching), caches ripgrep lookup, and removes a few avoidable checks.
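The scandir/stat batching described above comes down to replacing a per-name `os.stat` (or `is_dir`/`is_file`/`stat` triple) with a single `DirEntry.stat()` per entry; a minimal sketch, with an assumed helper name `list_file_sizes`:

```python
import os
import stat


def list_file_sizes(directory):
    """One scandir pass over a directory, returning {name: size} for
    regular files. DirEntry.stat() can reuse metadata the OS already
    returned from the directory scan, avoiding a second syscall per
    name on platforms that support it."""
    sizes = {}
    with os.scandir(directory) as it:
        for entry in it:
            try:
                st = entry.stat(follow_symlinks=True)
            except OSError:
                continue  # skip entries we cannot stat
            if stat.S_ISREG(st.st_mode):
                sizes[entry.name] = st.st_size
    return sizes
```

The win is largest on directories with many entries, where the naive approach issues one extra stat syscall per name.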