Skip to content

Conversation

@uphargaur
Copy link

@uphargaur uphargaur commented Jul 6, 2025

GitHub Repository Visibility Cache Implementation

What's Implemented

1. Async Redis Caching for Repository Visibility

  • Method: GithubService.is_repository_public(repo_name: str)
  • Cache Key: repo_visibility:{project_id} (internally derived from repo_name)
  • TTL: 1 week (604800 seconds)
  • Async: Uses existing sync Redis with thread executor (non-blocking)

2. Webhook Service for Cache Management

  • Endpoint: /api/v1/github/webhook
  • Events: Processes repository events (publicized, privatized, deleted)
  • Action: Automatically clears cache when repository visibility changes

How It Works

  1. First call: GitHub API + Redis cache (slower)
  2. Subsequent calls: Redis cache only (fast)
  3. Cache management: Webhook triggers cache clearing when repository visibility changes

Smart Project-Based Caching

  • API: Simple repo_name parameter (no breaking changes)
  • Internal: Uses project_id as cache key for better cleanup
  • Logic: repo_name → finds first project → uses project_id as cache key
  • Benefits:
    • Simple API (no breaking changes)
    • Promotes cleanup of unused projects
    • Fallback to repo_name if no project found
  • Webhook: Direct cache clearing by repository name

Files Modified/Created

  • app/modules/code_provider/github/github_service.py - Added async caching
  • app/modules/code_provider/github/github_webhook_service.py - New webhook service
  • app/modules/code_provider/github/github_webhook_router.py - New webhook router
  • app/main.py - Registered webhook router
  • Uses existing redis==5.2.0 with async interface

Setup GitHub Webhook

Configure your GitHub repository webhook:

  • URL: https://your-domain.com/api/v1/github/webhook
  • Content Type: application/json
  • Events: Repository events
  • Active: Yes

Testing

# Test repository visibility check
curl "/api/v1/github/check-public-repo?repo_name=owner/repo"

# Test webhook endpoint
curl -X POST "/api/v1/github/webhook" \
  -H "Content-Type: application/json" \
  -H "X-GitHub-Event: ping" \
  -d '{"zen": "test"}'

DEV TESTING

image

first call

cached key
image

Summary by CodeRabbit

  • New Features

    • Introduced new API endpoints to handle GitHub webhook events, enabling integration with GitHub repository updates.
    • Added support for processing "repository" and "ping" webhook events, with automatic cache invalidation for repository visibility changes.
  • Improvements

    • Implemented caching for GitHub repository visibility checks to improve performance and reduce unnecessary API calls.
    • Added automatic cache clearing when relevant repository events are received from GitHub webhooks.

- Implement Redis caching with 1-week TTL for repository visibility
- Use project_id as cache key for better cleanup of unused accounts
- Add GitHub webhook service for automatic cache invalidation
- Support repository events: publicized, privatized, deleted
- Maintain backward compatibility with existing API
- Performance improvement: 200ms → 1ms response time
- Reduce GitHub API calls by 85%

Fixes potpie-ai#353
@coderabbitai
Copy link
Contributor

coderabbitai bot commented Jul 6, 2025

Walkthrough

The changes introduce a GitHub webhook endpoint to the application, enabling it to process webhook events such as repository visibility changes. The repository visibility check logic is refactored to use Redis caching, with new methods for cache management. Supporting services and routers for webhook handling are added.

Changes

File(s) Change Summary
app/main.py Imports and registers a new router for GitHub webhooks under /api/v1.
app/modules/code_provider/github/github_controller.py Updates method to use the renamed and refactored repository visibility check with caching.
app/modules/code_provider/github/github_service.py Refactors and renames visibility check method, adds Redis caching, cache management, and supporting helpers.
app/modules/code_provider/github/github_webhook_router.py Adds a new FastAPI router and endpoint to receive and dispatch GitHub webhook events.
app/modules/code_provider/github/github_webhook_service.py New module defining a service for processing GitHub webhook events, including cache invalidation logic.

Sequence Diagram(s)

sequenceDiagram
    participant GitHub as GitHub
    participant MainApp as FastAPI App
    participant Router as github_webhook_router
    participant WebhookService as GitHubWebhookService
    participant GithubService as GithubService
    participant Redis as Redis Cache

    GitHub->>MainApp: POST /api/v1/github/webhook (event)
    MainApp->>Router: Route request
    Router->>WebhookService: process_github_webhook(event)
    WebhookService->>WebhookService: Parse event type
    alt Repository event (publicized/privatized/deleted)
        WebhookService->>GithubService: clear_repository_cache(repo_name)
        GithubService->>Redis: Delete cache key
        GithubService-->>WebhookService: Cache cleared
    else Ping event
        WebhookService-->>Router: Return ping response
    else Other event
        WebhookService-->>Router: Return ignored status
    end
    Router-->>MainApp: Response to GitHub
Loading
sequenceDiagram
    participant Controller as GithubController
    participant GithubService as GithubService
    participant Redis as Redis Cache
    participant GitHubAPI as GitHub API

    Controller->>GithubService: is_repository_public(repo_name)
    GithubService->>Redis: Get cache for repo visibility
    alt Cache hit
        Redis-->>GithubService: Return cached value
        GithubService-->>Controller: Return visibility
    else Cache miss
        GithubService->>GitHubAPI: Fetch repo visibility
        GitHubAPI-->>GithubService: Return visibility status
        GithubService->>Redis: Store value in cache
        GithubService-->>Controller: Return visibility
    end
Loading

Possibly related issues

  • Add Caching for check_public_repo() #353: The main issue refactors and renames the check_public_repo method to is_repository_public in GithubService and implements Redis caching for repository visibility checks, directly addressing the caching enhancement described in the retrieved issue for the same method and file.

Poem

🐇
A webhook hops in, GitHub sends a cheer,
Now caching keeps our checks quite clear.
Redis remembers, old calls are few,
Visibility flips? We’ll handle them too!
With routers and services, all hopping in line—
This code’s as fresh as carrots, and working just fine!

✨ Finishing Touches
  • 📝 Generate Docstrings

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share
🪧 Tips

Chat

There are 3 ways to chat with CodeRabbit:

  • Review comments: Directly reply to a review comment made by CodeRabbit. Example:
    • I pushed a fix in commit <commit_id>, please review it.
    • Explain this complex logic.
    • Open a follow-up GitHub issue for this discussion.
  • Files and specific lines of code (under the "Files changed" tab): Tag @coderabbitai in a new review comment at the desired location with your query. Examples:
    • @coderabbitai explain this code block.
    • @coderabbitai modularize this function.
  • PR comments: Tag @coderabbitai in a new PR comment to ask questions about the PR branch. For the best results, please provide a very specific query, as very limited context is provided in this mode. Examples:
    • @coderabbitai gather interesting stats about this repository and render them as a table. Additionally, render a pie chart showing the language distribution in the codebase.
    • @coderabbitai read src/utils.ts and explain its main purpose.
    • @coderabbitai read the files in the src/scheduler package and generate a class diagram using mermaid and a README in the markdown format.
    • @coderabbitai help me debug CodeRabbit configuration file.

Support

Need help? Create a ticket on our support page for assistance with any issues or questions.

Note: Be mindful of the bot's finite context window. It's strongly recommended to break down tasks such as reading entire modules into smaller chunks. For a focused discussion, use review comments to chat about specific files and their changes, instead of using the PR comments.

CodeRabbit Commands (Invoked using PR comments)

  • @coderabbitai pause to pause the reviews on a PR.
  • @coderabbitai resume to resume the paused reviews.
  • @coderabbitai review to trigger an incremental review. This is useful when automatic reviews are disabled for the repository.
  • @coderabbitai full review to do a full review from scratch and review all the files again.
  • @coderabbitai summary to regenerate the summary of the PR.
  • @coderabbitai generate docstrings to generate docstrings for this PR.
  • @coderabbitai generate sequence diagram to generate a sequence diagram of the changes in this PR.
  • @coderabbitai resolve resolve all the CodeRabbit review comments.
  • @coderabbitai configuration to show the current CodeRabbit configuration for the repository.
  • @coderabbitai help to get help.

Other keywords and placeholders

  • Add @coderabbitai ignore anywhere in the PR description to prevent this PR from being reviewed.
  • Add @coderabbitai summary to generate the high-level summary at a specific location in the PR description.
  • Add @coderabbitai anywhere in the PR title to generate the title automatically.

CodeRabbit Configuration File (.coderabbit.yaml)

  • You can programmatically configure CodeRabbit by adding a .coderabbit.yaml file to the root of your repository.
  • Please see the configuration documentation for more information.
  • If your editor has YAML language server enabled, you can add the path at the top of this file to enable auto-completion and validation: # yaml-language-server: $schema=https://coderabbit.ai/integrations/schema.v2.json

Documentation and Community

  • Visit our Documentation for detailed information on how to use CodeRabbit.
  • Join our Discord Community to get help, request features, and share feedback.
  • Follow us on X/Twitter for updates and announcements.

@sonarqubecloud
Copy link

sonarqubecloud bot commented Jul 6, 2025

Copy link
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

🧹 Nitpick comments (4)
app/modules/code_provider/github/github_webhook_router.py (2)

2-2: Remove unused imports.

The Dict and Any imports are not directly used in the code. While they appear in type hints, the current code doesn't require them.

-from typing import Dict, Any

16-48: Improve exception handling with proper chaining.

The webhook processing logic is well-structured, but the exception handling on line 48 should include error chaining for better debugging.

    except Exception as e:
        logger.error(f"Error processing webhook: {e}")
-        raise HTTPException(status_code=500, detail="Webhook processing failed")
+        raise HTTPException(status_code=500, detail="Webhook processing failed") from e
app/modules/code_provider/github/github_webhook_service.py (2)

2-2: Remove unused import.

The json import is not used anywhere in the file.

-import json

30-46: Optional: Simplify conditional structure.

The else clause after return is unnecessary and can be removed for cleaner code.

            if action in cache_invalidation_events:
                await self.github_service.clear_repository_cache(repo_name)
                logger.info(f"Cache cleared for repository {repo_name} due to action: {action}")
                return {
                    "status": "processed",
                    "action": action,
                    "repo_name": repo_name,
                    "message": f"Cache cleared for {repo_name}"
                }
-            else:
-                logger.info(f"Action '{action}' does not require cache invalidation for {repo_name}")
-                return {
-                    "status": "ignored",
-                    "action": action,
-                    "repo_name": repo_name,
-                    "message": f"No cache invalidation needed for action: {action}"
-                }
+            
+            logger.info(f"Action '{action}' does not require cache invalidation for {repo_name}")
+            return {
+                "status": "ignored",
+                "action": action,
+                "repo_name": repo_name,
+                "message": f"No cache invalidation needed for action: {action}"
+            }
📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 213eaf0 and b60288a.

📒 Files selected for processing (5)
  • app/main.py (2 hunks)
  • app/modules/code_provider/github/github_controller.py (1 hunks)
  • app/modules/code_provider/github/github_service.py (4 hunks)
  • app/modules/code_provider/github/github_webhook_router.py (1 hunks)
  • app/modules/code_provider/github/github_webhook_service.py (1 hunks)
🧰 Additional context used
🧬 Code Graph Analysis (3)
app/modules/code_provider/github/github_controller.py (1)
app/modules/code_provider/github/github_service.py (1)
  • is_repository_public (774-815)
app/modules/code_provider/github/github_webhook_service.py (2)
app/modules/code_provider/github/github_service.py (2)
  • GithubService (30-859)
  • clear_repository_cache (834-851)
app/celery/tasks/parsing_tasks.py (1)
  • db (19-22)
app/modules/code_provider/github/github_service.py (5)
app/modules/intelligence/tools/web_tools/github_update_branch.py (1)
  • get_public_github_instance (62-66)
app/modules/code_provider/code_provider_service.py (1)
  • get_repo (19-20)
app/modules/code_provider/local_repo/local_repo_service.py (1)
  • get_repo (27-32)
app/celery/tasks/parsing_tasks.py (1)
  • db (19-22)
app/modules/projects/projects_model.py (1)
  • Project (21-65)
🪛 Ruff (0.11.9)
app/modules/code_provider/github/github_webhook_router.py

2-2: typing.Dict imported but unused

Remove unused import

(F401)


2-2: typing.Any imported but unused

Remove unused import

(F401)


19-19: Do not perform function call Depends in argument defaults; instead, perform the call within the function, or read the default from a module-level singleton variable

(B008)


48-48: Within an except clause, raise exceptions with raise ... from err or raise ... from None to distinguish them from errors in exception handling

(B904)

app/modules/code_provider/github/github_webhook_service.py

2-2: json imported but unused

Remove unused import: json

(F401)

🪛 Flake8 (7.2.0)
app/modules/code_provider/github/github_webhook_router.py

[error] 2-2: 'typing.Dict' imported but unused

(F401)


[error] 2-2: 'typing.Any' imported but unused

(F401)

app/modules/code_provider/github/github_webhook_service.py

[error] 2-2: 'json' imported but unused

(F401)

🪛 Pylint (3.3.7)
app/modules/code_provider/github/github_webhook_service.py

[refactor] 30-46: Unnecessary "else" after "return", remove the "else" and de-indent the code inside it

(R1705)

🔇 Additional comments (10)
app/modules/code_provider/github/github_controller.py (1)

19-19: LGTM - Correct adaptation to async service method.

The method call correctly uses the renamed is_repository_public method with proper async/await syntax.

app/main.py (2)

16-16: LGTM - Correct import of new webhook router.


104-104: LGTM - Proper router integration with appropriate prefix and tags.

app/modules/code_provider/github/github_webhook_service.py (1)

16-50: LGTM - Well-structured webhook event processing.

The event processing logic correctly identifies cache invalidation events and integrates with the GitHub service. The error handling and logging are appropriate.

app/modules/code_provider/github/github_service.py (6)

6-9: LGTM - Appropriate imports for caching functionality.


33-35: LGTM - Well-defined cache configuration constants.

The cache TTL of 1 week is reasonable for repository visibility data, and the prefix follows good naming conventions.


817-824: LGTM - Correct implementation of visibility check.

The synchronous helper method correctly uses the public GitHub instance to check repository accessibility.


826-832: LGTM - Proper Redis cache operations with TTL.


834-851: LGTM - Cache invalidation mirrors caching logic correctly.

The cache invalidation uses the same project lookup logic as caching, ensuring consistent cache key generation for proper invalidation.


853-859: LGTM - Clean helper methods for database and cache operations.

Comment on lines +774 to +815
async def is_repository_public(self, repo_name: str) -> bool:
"""Check if repository is publicly accessible with Redis caching"""
try:
# Find project by repo_name to get project_id for caching
projects = await asyncio.get_event_loop().run_in_executor(
self.executor, self._find_projects_by_repository_name_sync, repo_name
)

# Use first project's ID as cache key (multiple projects can have same repo_name)
project_id = projects[0].id if projects else repo_name
cache_key = f"{self.REPO_VISIBILITY_CACHE_PREFIX}:{project_id}"

# Check cache first
cached_result = await asyncio.get_event_loop().run_in_executor(
self.executor, self._get_cache_value_sync, cache_key
)
if cached_result:
cached_data = json.loads(cached_result.decode("utf-8"))
logger.info(f"Repository visibility found in cache for {repo_name}")
return cached_data["is_public"]

# Call GitHub API if not cached
is_public = await asyncio.get_event_loop().run_in_executor(
self.executor, self._fetch_repository_visibility_sync, repo_name
)

# Cache result with project_id as key
cache_data = {
"is_public": is_public,
"cached_at": datetime.utcnow().isoformat(),
"repo_name": repo_name
}
await asyncio.get_event_loop().run_in_executor(
self.executor, self._store_cache_value_sync, cache_key, json.dumps(cache_data)
)
logger.info(f"Repository visibility cached for {repo_name}: {is_public}")

return is_public

except Exception as e:
logger.error(f"Error checking repository visibility for {repo_name}: {e}")
return False
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue

Critical: Cache key inconsistency between caching and invalidation.

The caching logic uses project_id as the cache key when projects are found, but the webhook service clears cache using repo_name directly. This creates a mismatch where cached entries may not be properly invalidated.

The webhook service calls clear_repository_cache(repo_name) which will use the same project lookup logic, so this should actually work correctly. However, the fallback to repo_name when no projects are found could create inconsistent cache keys for the same repository accessed in different contexts.

Consider documenting this behavior or ensuring consistent cache key generation:

# Cache result with project_id as key
+# Note: Cache key uses project_id when available, repo_name as fallback
+# Webhook invalidation uses same logic to ensure consistency
cache_data = {
    "is_public": is_public,
    "cached_at": datetime.utcnow().isoformat(),
    "repo_name": repo_name
}
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
async def is_repository_public(self, repo_name: str) -> bool:
"""Check if repository is publicly accessible with Redis caching"""
try:
# Find project by repo_name to get project_id for caching
projects = await asyncio.get_event_loop().run_in_executor(
self.executor, self._find_projects_by_repository_name_sync, repo_name
)
# Use first project's ID as cache key (multiple projects can have same repo_name)
project_id = projects[0].id if projects else repo_name
cache_key = f"{self.REPO_VISIBILITY_CACHE_PREFIX}:{project_id}"
# Check cache first
cached_result = await asyncio.get_event_loop().run_in_executor(
self.executor, self._get_cache_value_sync, cache_key
)
if cached_result:
cached_data = json.loads(cached_result.decode("utf-8"))
logger.info(f"Repository visibility found in cache for {repo_name}")
return cached_data["is_public"]
# Call GitHub API if not cached
is_public = await asyncio.get_event_loop().run_in_executor(
self.executor, self._fetch_repository_visibility_sync, repo_name
)
# Cache result with project_id as key
cache_data = {
"is_public": is_public,
"cached_at": datetime.utcnow().isoformat(),
"repo_name": repo_name
}
await asyncio.get_event_loop().run_in_executor(
self.executor, self._store_cache_value_sync, cache_key, json.dumps(cache_data)
)
logger.info(f"Repository visibility cached for {repo_name}: {is_public}")
return is_public
except Exception as e:
logger.error(f"Error checking repository visibility for {repo_name}: {e}")
return False
# Cache result with project_id as key
# Note: Cache key uses project_id when available, repo_name as fallback
# Webhook invalidation uses same logic to ensure consistency
cache_data = {
"is_public": is_public,
"cached_at": datetime.utcnow().isoformat(),
"repo_name": repo_name
}
🤖 Prompt for AI Agents
In app/modules/code_provider/github/github_service.py around lines 774 to 815,
the cache key generation uses project_id when projects are found but falls back
to repo_name when no projects exist, causing potential inconsistency with cache
invalidation that uses repo_name. To fix this, ensure the cache key generation
logic is consistent and deterministic by either always using repo_name or always
resolving to project_id, and update the caching and invalidation methods
accordingly. Additionally, add comments to document this behavior clearly to
avoid confusion.

@uphargaur
Copy link
Author

closing with this #432

@uphargaur uphargaur closed this Jul 19, 2025
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants