Skip to content

Conversation

@ghinks
Copy link
Owner

@ghinks ghinks commented Oct 21, 2025

Summary

Implements intelligent cache optimization to reduce redundant GitHub API calls by tracking and using cached PR date ranges for smarter fetch strategies.

Key Features

1. New Cache Query Method

  • Added get_pr_metadata_date_range() to SQLiteCache
  • Queries min/max created_at dates from cached PRs per repository
  • Filters out expired entries automatically
  • Returns date range and count for quick cache coverage assessment

2. Smart Gap Detection

  • Retrieves cached date range before making API calls
  • Backward fetch: Only when requested start date < cached minimum
  • Forward fetch: Only when needed and requested end date > cached maximum
  • Skips redundant fetches when cache already covers requested range

3. Optimized Fetch Logic

The get_pull_requests_between_dates() function now:

  • Checks cached date range first
  • Only fetches missing data gaps
  • Combines cached and newly fetched PRs efficiently
  • Provides debug visibility via print statements

Changes

Modified Files:

  • reviewtally/cache/sqlite_cache.py: Added get_pr_metadata_date_range() method
  • reviewtally/cache/cache_manager.py: Added get_cached_date_range() wrapper
  • reviewtally/queries/get_prs.py: Optimized to use cached date ranges

New Tests:

  • tests/cache/test_pr_metadata_date_range.py (5 new tests)
  • All 65 tests passing (60 existing + 5 new)

Benefits

  • Reduced API calls: Only fetches data that's actually missing
  • Better cache utilization: Leverages pr_metadata table for quick date range lookups
  • Improved performance: Especially beneficial for large repos with extensive PR history
  • Debug visibility: Clear logging when backward/forward fetches occur

Test Plan

  • Run ruff linting - All checks passed
  • Run mypy type checking - Success, no issues found
  • Run all unit tests - 65/65 tests passing
  • Run integration test against expressjs org - Successful
  • Verify cache optimization working - Confirmed via debug output

Performance Impact

Integration tests show efficient cache utilization with minimal redundant API calls on subsequent runs for the same date ranges. The system now intelligently fills only the gaps in cached data.

🤖 Generated with Claude Code

ghinks and others added 4 commits October 21, 2025 08:36
Add intelligent cache optimization to reduce redundant GitHub API calls by
tracking and using cached PR date ranges for smarter fetch strategies.

## Changes

**New SQLiteCache Method:**
- Added `get_pr_metadata_date_range()` to query min/max dates from cached PRs
- Returns date range and count, filtering out expired entries
- Enables quick assessment of cache coverage per repository

**CacheManager Enhancement:**
- Added `get_cached_date_range()` wrapper method
- Integrates seamlessly with existing cache infrastructure

**Optimized Fetch Logic (get_prs.py):**
- Retrieve cached date range before making API calls
- Implement smart gap detection:
  - Backward fetch: Only if requested start < cached minimum
  - Forward fetch: Only if needed and requested end > cached maximum
- Skip redundant fetches when cache already covers requested range
- Print debug messages showing backward/forward fetch operations

**Comprehensive Testing:**
- Created 5 new tests in `tests/cache/test_pr_metadata_date_range.py`
- Test coverage includes:
  - Empty cache handling
  - Single and multiple PR date ranges
  - Repository isolation
  - Expired entry filtering
- All 65 tests passing (60 existing + 5 new)

## Benefits

- **Reduced API calls**: Only fetches missing data gaps instead of full refetches
- **Better cache utilization**: Leverages pr_metadata table for quick queries
- **Improved performance**: Especially beneficial for repos with extensive PR history
- **Debug visibility**: Clear logging when gap fetches occur

## Performance Impact

Integration tests show efficient cache utilization with minimal redundant
API calls on subsequent runs for the same date ranges.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <[email protected]>
Fix PermissionError on Windows when cleaning up temporary test directories
by explicitly closing the SQLite database connection before fixture teardown.

## Problem

On Windows, the test teardown was failing with:
```
PermissionError: [WinError 32] The process cannot access the file because 
it is being used by another process: api_cache.db
```

This occurred because SQLite keeps file locks on Windows, preventing the
temporary directory cleanup from deleting the database file.

## Solution

Modified the cache_manager fixture to:
- Yield the SQLiteCache instance instead of returning it
- Explicitly call cache.close() during teardown
- Properly release the database connection before temp directory cleanup

## Testing

- ✅ All 65 tests passing on Linux
- ✅ Verified fixture properly closes connection
- ✅ No change to test behavior or assertions
- ✅ Fix addresses Windows-specific file locking issue

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <[email protected]>
Replace pr_index_cache with direct queries on pr_metadata_cache to simplify
caching architecture and reduce code complexity.

Changes:
- Add get_pr_summaries() and get_pr_metadata_stats() to sqlite_cache.py
- Update cache_manager.py to use pr_metadata queries instead of pr_index
- Simplify get_prs.py by removing pr_index management logic
- Remove pr_index table creation and all related methods
- Fix timezone import in cache_manager.py

All tests pass and integration test verified with expressjs organization.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <[email protected]>
remove get_pr_summary_list and call the cache.get_pr_summaries directly.
@ghinks ghinks merged commit edd9c24 into main Oct 23, 2025
9 checks passed
@ghinks ghinks deleted the feat/update-pr-metadata-cache-criterion branch October 23, 2025 15:54
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants