fix(vt): prevent pending scan starvation and retry unresolved results#468
Open
Summary
This PR fixes the VirusTotal pending-state sync gap where skills can stay `pending` (or miss cached VT analysis) long after VT already has a Code Insight verdict.

Bug
Users reported skills showing pending in ClawHub even when VirusTotal already shows a completed analysis.
Production Analysis (sample skill proof)
Sample: `openclaw-workspace-governance-installer`

Measured directly from production Convex data:

- `version.createdAt`: `2026-02-21T12:58:03.591Z`
- `llmAnalysis.checkedAt`: `2026-02-21T12:58:22.979Z` (+19.4s)
- `vtAnalysis.checkedAt`: `2026-02-21T17:16:48.115Z` (+258.74m)
- `skill.updatedAt`: `2026-02-21T17:16:48.278Z` (+163ms after VT cache)

This proves that once ClawHub actually fetches VT, persistence is immediate; the large delay is in the poll/backfill selection path.
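For anyone re-checking the offsets, the sketch below recomputes them from the four timestamps quoted in this section. The object and function names are illustrative only and do not come from the ClawHub codebase.

```typescript
// Timestamps copied verbatim from the production sample in this PR description.
const timeline = {
  versionCreatedAt: "2026-02-21T12:58:03.591Z",
  llmCheckedAt: "2026-02-21T12:58:22.979Z",
  vtCheckedAt: "2026-02-21T17:16:48.115Z",
  skillUpdatedAt: "2026-02-21T17:16:48.278Z",
};

// Milliseconds elapsed between two ISO-8601 timestamps.
function deltaMs(from: string, to: string): number {
  return Date.parse(to) - Date.parse(from);
}

// LLM and VT delays are measured from version creation;
// the persistence delay is measured from the VT cache write.
const llmDelaySec = deltaMs(timeline.versionCreatedAt, timeline.llmCheckedAt) / 1000;
const vtDelayMin = deltaMs(timeline.versionCreatedAt, timeline.vtCheckedAt) / 60000;
const persistDelayMs = deltaMs(timeline.vtCheckedAt, timeline.skillUpdatedAt);

console.log(llmDelaySec.toFixed(1)); // 19.4
console.log(vtDelayMin.toFixed(2)); // 258.74
console.log(persistDelayMs); // 163
```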
Additional proof from the same skill history:

- 11 versions had `sha256hash` but were missing cached `vtAnalysis`.
- `vt:fetchResults` for those hashes returned:
  - 6 clean
  - 3 suspicious
  - 2 pending

So for 9/11, VT had final results while ClawHub still had no cached VT analysis.

Root Cause
- `getPendingScanSkillsInternal` sampled only a bounded recent window from `by_active_updated` (max 1000), which can starve older pending records under high update churn.
- Skills were skipped whenever `vtAnalysis` existed at all, so unresolved states (`pending`/`stale`) could become non-retriable.
- `backfillPendingScans` was intended to process all pending skills but used the same bounded selector + recency suppression.

Fix
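As a rough illustration of the normal-mode selection change in this fix (a recent slice plus an oldest slice, then dedupe + shuffle), here is a minimal sketch over plain arrays. It is not the code in `convex/skills.ts`: `SkillRef`, `shuffle`, and `mixPools` are hypothetical names, and applying `limit` after the shuffle is an assumption.

```typescript
// Hypothetical minimal record shape; real skill documents carry more fields.
type SkillRef = { id: string };

// Fisher-Yates shuffle on a copy. `rand` is injectable so tests can be deterministic.
function shuffle<T>(items: readonly T[], rand: () => number = Math.random): T[] {
  const out = items.slice();
  for (let i = out.length - 1; i > 0; i--) {
    const j = Math.floor(rand() * (i + 1));
    const tmp = out[i] as T;
    out[i] = out[j] as T;
    out[j] = tmp;
  }
  return out;
}

// Merge the "recently updated" and "oldest created" slices, drop duplicates by id,
// shuffle, and cap at `limit` (the cap placement is an assumption of this sketch).
function mixPools(
  recent: readonly SkillRef[],
  oldest: readonly SkillRef[],
  limit: number,
  rand: () => number = Math.random,
): SkillRef[] {
  const seen = new Set<string>();
  const merged: SkillRef[] = [];
  for (const skill of recent.concat(oldest)) {
    if (!seen.has(skill.id)) {
      seen.add(skill.id);
      merged.push(skill);
    }
  }
  return shuffle(merged, rand).slice(0, limit);
}
```

Because old records enter through the oldest-created slice regardless of how often newer records are updated, a long-pending skill always has a chance of being picked in a given run.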
`convex/skills.ts`

- Added `exhaustive` mode to `getPendingScanSkillsInternal` for true backfill scans.
- Normal mode now mixes two pools:
  - `by_active_updated` (desc)
  - `by_active_created` (asc)

  Then dedupe + shuffle.
- Recency suppression is bypassed when `skipRecentMinutes <= 0`.
- Skip only finalized VT statuses (`clean`, `malicious`, `suspicious`), and continue polling unresolved statuses (`pending`, `stale`, etc).

`convex/vt.ts`

- Updated `backfillPendingScans` to call pending selection with:
  - `exhaustive: true`
  - `skipRecentMinutes: 0`

Tests
Added:

- `convex/skills.pendingScanQueue.test.ts`

Executed:

- `pnpm test convex/skills.pendingScanQueue.test.ts convex/skills.rateLimit.test.ts`

Impact
Greptile Summary
This PR fixes VT scan queue starvation where older skills stayed `pending` indefinitely while high-churn records monopolized the bounded queue. The fix adds exhaustive mode for backfills, mixes recent+oldest slices in normal mode, and retries unresolved VT statuses (`pending`, `stale`) while skipping finalized ones (`clean`, `malicious`, `suspicious`).

- Added `exhaustive` mode for `getPendingScanSkillsInternal` that uses `.collect()` for backfills; normal mode now queries both `by_active_updated` (desc) and `by_active_created` (asc) pools, dedupes, and shuffles to prevent starvation
- Bypasses the recency filter when `skipRecentMinutes <= 0` to allow immediate retry in backfill flows
- Changed skip logic from "skip if `vtAnalysis` exists" to "skip only if `vtAnalysis.status` is finalized (`clean`, `malicious`, `suspicious`)", allowing retry of `pending`/`stale`/error states
- Backfill now passes `exhaustive: true` and `skipRecentMinutes: 0` for comprehensive scans

Confidence Score: 5/5
Last reviewed commit: 62955e3
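The revised skip rule summarized in this review can be sketched as a single predicate. This is an illustration under stated assumptions, not the implementation in `convex/skills.ts`: the `Candidate` shape, the `lastCheckedAt` field, and the choice of timestamp used for recency suppression are all hypothetical.

```typescript
// Statuses the PR description treats as final.
const FINAL_VT_STATUSES = new Set<string>(["clean", "malicious", "suspicious"]);

// Hypothetical candidate shape, for illustration only.
type Candidate = {
  vtStatus?: string; // undefined when no vtAnalysis is cached
  lastCheckedAt?: number; // epoch ms of the last VT poll, if any (assumed field)
};

function shouldPoll(c: Candidate, now: number, skipRecentMinutes: number): boolean {
  // Finalized verdicts are never re-polled.
  if (c.vtStatus !== undefined && FINAL_VT_STATUSES.has(c.vtStatus)) return false;
  // skipRecentMinutes <= 0 disables recency suppression entirely.
  if (skipRecentMinutes <= 0) return true;
  // Never-checked candidates are always eligible.
  if (c.lastCheckedAt === undefined) return true;
  // Otherwise wait until the suppression window has elapsed.
  return now - c.lastCheckedAt >= skipRecentMinutes * 60000;
}
```

Under this rule a record with a cached `pending` or `stale` status stays eligible for retry, which is the behavior the old "skip if `vtAnalysis` exists" check ruled out.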
Follow-up fix
After review, exhaustive pending-scan selection still inherited a `limit <= 100` clamp intended for cron polling. This PR now removes that clamp in exhaustive mode so `backfillPendingScans(limit: 10000, exhaustive: true)` can actually process more than 100 records per invocation.
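A minimal sketch of the clamp behavior described here, assuming a helper of this shape exists; `effectiveLimit` and `CRON_MAX_LIMIT` are hypothetical names, and the lower bound of 1 is an assumption not stated in the PR.

```typescript
// Cap taken from the PR text ("limit <= 100"); the constant name is hypothetical.
const CRON_MAX_LIMIT = 100;

function effectiveLimit(requested: number, exhaustive: boolean): number {
  // Assumed lower bound: always select at least one record.
  const atLeastOne = Math.max(1, Math.floor(requested));
  // Exhaustive backfills skip the cron-oriented cap; normal polling keeps it.
  return exhaustive ? atLeastOne : Math.min(atLeastOne, CRON_MAX_LIMIT);
}

console.log(effectiveLimit(10000, true)); // 10000
console.log(effectiveLimit(10000, false)); // 100
```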