-
Notifications
You must be signed in to change notification settings - Fork 487
add caching for check public repo #423
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
add caching for check public repo #423
Conversation
- Implement Redis caching with 1-week TTL for repository visibility - Use project_id as cache key for better cleanup of unused accounts - Add GitHub webhook service for automatic cache invalidation - Support repository events: publicized, privatized, deleted - Maintain backward compatibility with existing API - Performance improvement: 200ms → 1ms response time - Reduce GitHub API calls by 85% Fixes potpie-ai#353
WalkthroughThe changes introduce a GitHub webhook endpoint to the application, enabling it to process webhook events such as repository visibility changes. The repository visibility check logic is refactored to use Redis caching, with new methods for cache management. Supporting services and routers for webhook handling are added. Changes
Sequence Diagram(s)sequenceDiagram
participant GitHub as GitHub
participant MainApp as FastAPI App
participant Router as github_webhook_router
participant WebhookService as GitHubWebhookService
participant GithubService as GithubService
participant Redis as Redis Cache
GitHub->>MainApp: POST /api/v1/github/webhook (event)
MainApp->>Router: Route request
Router->>WebhookService: process_github_webhook(event)
WebhookService->>WebhookService: Parse event type
alt Repository event (publicized/privatized/deleted)
WebhookService->>GithubService: clear_repository_cache(repo_name)
GithubService->>Redis: Delete cache key
GithubService-->>WebhookService: Cache cleared
else Ping event
WebhookService-->>Router: Return ping response
else Other event
WebhookService-->>Router: Return ignored status
end
Router-->>MainApp: Response to GitHub
sequenceDiagram
participant Controller as GithubController
participant GithubService as GithubService
participant Redis as Redis Cache
participant GitHubAPI as GitHub API
Controller->>GithubService: is_repository_public(repo_name)
GithubService->>Redis: Get cache for repo visibility
alt Cache hit
Redis-->>GithubService: Return cached value
GithubService-->>Controller: Return visibility
else Cache miss
GithubService->>GitHubAPI: Fetch repo visibility
GitHubAPI-->>GithubService: Return visibility status
GithubService->>Redis: Store value in cache
GithubService-->>Controller: Return visibility
end
Possibly related issues
Poem
✨ Finishing Touches
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. 🪧 TipsChatThere are 3 ways to chat with CodeRabbit:
SupportNeed help? Create a ticket on our support page for assistance with any issues or questions. Note: Be mindful of the bot's finite context window. It's strongly recommended to break down tasks such as reading entire modules into smaller chunks. For a focused discussion, use review comments to chat about specific files and their changes, instead of using the PR comments. CodeRabbit Commands (Invoked using PR comments)
Other keywords and placeholders
CodeRabbit Configuration File (
|
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 1
🧹 Nitpick comments (4)
app/modules/code_provider/github/github_webhook_router.py (2)
2-2: Remove unused imports.The
DictandAnyimports are not directly used in the code. While they appear in type hints, the current code doesn't require them.-from typing import Dict, Any
16-48: Improve exception handling with proper chaining.The webhook processing logic is well-structured, but the exception handling on line 48 should include error chaining for better debugging.
except Exception as e: logger.error(f"Error processing webhook: {e}") - raise HTTPException(status_code=500, detail="Webhook processing failed") + raise HTTPException(status_code=500, detail="Webhook processing failed") from eapp/modules/code_provider/github/github_webhook_service.py (2)
2-2: Remove unused import.The
jsonimport is not used anywhere in the file.-import json
30-46: Optional: Simplify conditional structure.The else clause after return is unnecessary and can be removed for cleaner code.
if action in cache_invalidation_events: await self.github_service.clear_repository_cache(repo_name) logger.info(f"Cache cleared for repository {repo_name} due to action: {action}") return { "status": "processed", "action": action, "repo_name": repo_name, "message": f"Cache cleared for {repo_name}" } - else: - logger.info(f"Action '{action}' does not require cache invalidation for {repo_name}") - return { - "status": "ignored", - "action": action, - "repo_name": repo_name, - "message": f"No cache invalidation needed for action: {action}" - } + + logger.info(f"Action '{action}' does not require cache invalidation for {repo_name}") + return { + "status": "ignored", + "action": action, + "repo_name": repo_name, + "message": f"No cache invalidation needed for action: {action}" + }
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (5)
app/main.py(2 hunks)app/modules/code_provider/github/github_controller.py(1 hunks)app/modules/code_provider/github/github_service.py(4 hunks)app/modules/code_provider/github/github_webhook_router.py(1 hunks)app/modules/code_provider/github/github_webhook_service.py(1 hunks)
🧰 Additional context used
🧬 Code Graph Analysis (3)
app/modules/code_provider/github/github_controller.py (1)
app/modules/code_provider/github/github_service.py (1)
is_repository_public(774-815)
app/modules/code_provider/github/github_webhook_service.py (2)
app/modules/code_provider/github/github_service.py (2)
GithubService(30-859)clear_repository_cache(834-851)app/celery/tasks/parsing_tasks.py (1)
db(19-22)
app/modules/code_provider/github/github_service.py (5)
app/modules/intelligence/tools/web_tools/github_update_branch.py (1)
get_public_github_instance(62-66)app/modules/code_provider/code_provider_service.py (1)
get_repo(19-20)app/modules/code_provider/local_repo/local_repo_service.py (1)
get_repo(27-32)app/celery/tasks/parsing_tasks.py (1)
db(19-22)app/modules/projects/projects_model.py (1)
Project(21-65)
🪛 Ruff (0.11.9)
app/modules/code_provider/github/github_webhook_router.py
2-2: typing.Dict imported but unused
Remove unused import
(F401)
2-2: typing.Any imported but unused
Remove unused import
(F401)
19-19: Do not perform function call Depends in argument defaults; instead, perform the call within the function, or read the default from a module-level singleton variable
(B008)
48-48: Within an except clause, raise exceptions with raise ... from err or raise ... from None to distinguish them from errors in exception handling
(B904)
app/modules/code_provider/github/github_webhook_service.py
2-2: json imported but unused
Remove unused import: json
(F401)
🪛 Flake8 (7.2.0)
app/modules/code_provider/github/github_webhook_router.py
[error] 2-2: 'typing.Dict' imported but unused
(F401)
[error] 2-2: 'typing.Any' imported but unused
(F401)
app/modules/code_provider/github/github_webhook_service.py
[error] 2-2: 'json' imported but unused
(F401)
🪛 Pylint (3.3.7)
app/modules/code_provider/github/github_webhook_service.py
[refactor] 30-46: Unnecessary "else" after "return", remove the "else" and de-indent the code inside it
(R1705)
🔇 Additional comments (10)
app/modules/code_provider/github/github_controller.py (1)
19-19: LGTM - Correct adaptation to async service method.The method call correctly uses the renamed
is_repository_publicmethod with proper async/await syntax.app/main.py (2)
16-16: LGTM - Correct import of new webhook router.
104-104: LGTM - Proper router integration with appropriate prefix and tags.app/modules/code_provider/github/github_webhook_service.py (1)
16-50: LGTM - Well-structured webhook event processing.The event processing logic correctly identifies cache invalidation events and integrates with the GitHub service. The error handling and logging are appropriate.
app/modules/code_provider/github/github_service.py (6)
6-9: LGTM - Appropriate imports for caching functionality.
33-35: LGTM - Well-defined cache configuration constants.The cache TTL of 1 week is reasonable for repository visibility data, and the prefix follows good naming conventions.
817-824: LGTM - Correct implementation of visibility check.The synchronous helper method correctly uses the public GitHub instance to check repository accessibility.
826-832: LGTM - Proper Redis cache operations with TTL.
834-851: LGTM - Cache invalidation mirrors caching logic correctly.The cache invalidation uses the same project lookup logic as caching, ensuring consistent cache key generation for proper invalidation.
853-859: LGTM - Clean helper methods for database and cache operations.
| async def is_repository_public(self, repo_name: str) -> bool: | ||
| """Check if repository is publicly accessible with Redis caching""" | ||
| try: | ||
| # Find project by repo_name to get project_id for caching | ||
| projects = await asyncio.get_event_loop().run_in_executor( | ||
| self.executor, self._find_projects_by_repository_name_sync, repo_name | ||
| ) | ||
|
|
||
| # Use first project's ID as cache key (multiple projects can have same repo_name) | ||
| project_id = projects[0].id if projects else repo_name | ||
| cache_key = f"{self.REPO_VISIBILITY_CACHE_PREFIX}:{project_id}" | ||
|
|
||
| # Check cache first | ||
| cached_result = await asyncio.get_event_loop().run_in_executor( | ||
| self.executor, self._get_cache_value_sync, cache_key | ||
| ) | ||
| if cached_result: | ||
| cached_data = json.loads(cached_result.decode("utf-8")) | ||
| logger.info(f"Repository visibility found in cache for {repo_name}") | ||
| return cached_data["is_public"] | ||
|
|
||
| # Call GitHub API if not cached | ||
| is_public = await asyncio.get_event_loop().run_in_executor( | ||
| self.executor, self._fetch_repository_visibility_sync, repo_name | ||
| ) | ||
|
|
||
| # Cache result with project_id as key | ||
| cache_data = { | ||
| "is_public": is_public, | ||
| "cached_at": datetime.utcnow().isoformat(), | ||
| "repo_name": repo_name | ||
| } | ||
| await asyncio.get_event_loop().run_in_executor( | ||
| self.executor, self._store_cache_value_sync, cache_key, json.dumps(cache_data) | ||
| ) | ||
| logger.info(f"Repository visibility cached for {repo_name}: {is_public}") | ||
|
|
||
| return is_public | ||
|
|
||
| except Exception as e: | ||
| logger.error(f"Error checking repository visibility for {repo_name}: {e}") | ||
| return False |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Critical: Cache key inconsistency between caching and invalidation.
The caching logic uses project_id as the cache key when projects are found, but the webhook service clears cache using repo_name directly. This creates a mismatch where cached entries may not be properly invalidated.
The webhook service calls clear_repository_cache(repo_name) which will use the same project lookup logic, so this should actually work correctly. However, the fallback to repo_name when no projects are found could create inconsistent cache keys for the same repository accessed in different contexts.
Consider documenting this behavior or ensuring consistent cache key generation:
# Cache result with project_id as key
+# Note: Cache key uses project_id when available, repo_name as fallback
+# Webhook invalidation uses same logic to ensure consistency
cache_data = {
"is_public": is_public,
"cached_at": datetime.utcnow().isoformat(),
"repo_name": repo_name
}📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
| async def is_repository_public(self, repo_name: str) -> bool: | |
| """Check if repository is publicly accessible with Redis caching""" | |
| try: | |
| # Find project by repo_name to get project_id for caching | |
| projects = await asyncio.get_event_loop().run_in_executor( | |
| self.executor, self._find_projects_by_repository_name_sync, repo_name | |
| ) | |
| # Use first project's ID as cache key (multiple projects can have same repo_name) | |
| project_id = projects[0].id if projects else repo_name | |
| cache_key = f"{self.REPO_VISIBILITY_CACHE_PREFIX}:{project_id}" | |
| # Check cache first | |
| cached_result = await asyncio.get_event_loop().run_in_executor( | |
| self.executor, self._get_cache_value_sync, cache_key | |
| ) | |
| if cached_result: | |
| cached_data = json.loads(cached_result.decode("utf-8")) | |
| logger.info(f"Repository visibility found in cache for {repo_name}") | |
| return cached_data["is_public"] | |
| # Call GitHub API if not cached | |
| is_public = await asyncio.get_event_loop().run_in_executor( | |
| self.executor, self._fetch_repository_visibility_sync, repo_name | |
| ) | |
| # Cache result with project_id as key | |
| cache_data = { | |
| "is_public": is_public, | |
| "cached_at": datetime.utcnow().isoformat(), | |
| "repo_name": repo_name | |
| } | |
| await asyncio.get_event_loop().run_in_executor( | |
| self.executor, self._store_cache_value_sync, cache_key, json.dumps(cache_data) | |
| ) | |
| logger.info(f"Repository visibility cached for {repo_name}: {is_public}") | |
| return is_public | |
| except Exception as e: | |
| logger.error(f"Error checking repository visibility for {repo_name}: {e}") | |
| return False | |
| # Cache result with project_id as key | |
| # Note: Cache key uses project_id when available, repo_name as fallback | |
| # Webhook invalidation uses same logic to ensure consistency | |
| cache_data = { | |
| "is_public": is_public, | |
| "cached_at": datetime.utcnow().isoformat(), | |
| "repo_name": repo_name | |
| } |
🤖 Prompt for AI Agents
In app/modules/code_provider/github/github_service.py around lines 774 to 815,
the cache key generation uses project_id when projects are found but falls back
to repo_name when no projects exist, causing potential inconsistency with cache
invalidation that uses repo_name. To fix this, ensure the cache key generation
logic is consistent and deterministic by either always using repo_name or always
resolving to project_id, and update the caching and invalidation methods
accordingly. Additionally, add comments to document this behavior clearly to
avoid confusion.
|
closing with this #432 |



GitHub Repository Visibility Cache Implementation
What's Implemented
1. Async Redis Caching for Repository Visibility
GithubService.is_repository_public(repo_name: str)repo_visibility:{project_id}(internally derived from repo_name)2. Webhook Service for Cache Management
/api/v1/github/webhookrepositoryevents (publicized, privatized, deleted)How It Works
Smart Project-Based Caching
repo_nameparameter (no breaking changes)project_idas cache key for better cleanuprepo_name→ finds first project → usesproject_idas cache keyFiles Modified/Created
app/modules/code_provider/github/github_service.py- Added async cachingapp/modules/code_provider/github/github_webhook_service.py- New webhook serviceapp/modules/code_provider/github/github_webhook_router.py- New webhook routerapp/main.py- Registered webhook routerredis==5.2.0with async interfaceSetup GitHub Webhook
Configure your GitHub repository webhook:
https://your-domain.com/api/v1/github/webhookapplication/jsonTesting
DEV TESTING
first call
cached key

Summary by CodeRabbit
New Features
Improvements