fix(vt): prevent pending scan starvation and retry unresolved results #468

Open
ngutman wants to merge 2 commits into main from fix/vt-pending-poll-starvation

ngutman commented Feb 21, 2026

Summary

This PR fixes the VirusTotal pending-state sync gap where skills can stay pending (or lack a cached VT analysis) long after VT already has a Code Insight verdict.

Bug

Users reported skills showing pending in ClawHub even when VirusTotal already shows a completed analysis.

Production Analysis (sample skill proof)

Sample: openclaw-workspace-governance-installer

Measured directly from production Convex data:

  • Publish (version.createdAt): 2026-02-21T12:58:03.591Z
  • LLM cache (llmAnalysis.checkedAt): 2026-02-21T12:58:22.979Z (+19.4s)
  • VT cache (vtAnalysis.checkedAt): 2026-02-21T17:16:48.115Z (+258.74m)
  • Skill patch (skill.updatedAt): 2026-02-21T17:16:48.278Z (+163ms after VT cache)

This proves that once ClawHub actually fetches VT, persistence is immediate; the large delay is in the poll/backfill selection path.
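
The deltas quoted above can be re-derived from the four timestamps. The snippet below is only a worked check of that arithmetic; the variable names are illustrative and do not come from the codebase.

```typescript
// Timestamps copied from the production sample above.
const createdAt = Date.parse("2026-02-21T12:58:03.591Z");
const llmCheckedAt = Date.parse("2026-02-21T12:58:22.979Z");
const vtCheckedAt = Date.parse("2026-02-21T17:16:48.115Z");
const skillUpdatedAt = Date.parse("2026-02-21T17:16:48.278Z");

// Publish -> LLM cache, in seconds (about 19.4).
const llmDelaySeconds = (llmCheckedAt - createdAt) / 1000;
// Publish -> VT cache, in minutes (about 258.74).
const vtDelayMinutes = (vtCheckedAt - createdAt) / 60000;
// VT cache -> skill patch, in milliseconds (163).
const persistDelayMs = skillUpdatedAt - vtCheckedAt;

console.log({ llmDelaySeconds, vtDelayMinutes, persistDelayMs });
```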

Additional proof from the same skill history:

  • 11 versions had sha256hash but missing cached vtAnalysis.
  • Running live vt:fetchResults for those hashes returned:
    • 6 clean
    • 3 suspicious
    • 2 pending

So for 9/11, VT had final results while ClawHub still had no cached VT analysis.

Root Cause

  1. getPendingScanSkillsInternal sampled only a bounded recent window from by_active_updated (max 1000), which can starve older pending records under high update churn.
  2. Versions were excluded from polling if vtAnalysis existed at all, so unresolved states (pending / stale) could become non-retriable.
  3. backfillPendingScans was intended to process all pending skills but used the same bounded selector + recency suppression.
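
To illustrate cause 1, here is a minimal in-memory sketch of how a bounded "newest by updatedAt" window can starve an old record. It does not use Convex; `SkillRow` and `recentWindow` are hypothetical stand-ins, and only the window size of 1000 comes from the description above.

```typescript
// Hypothetical row shape; the real skill documents have more fields.
type SkillRow = { id: string; createdAt: number; updatedAt: number };

// Stand-in for the old selector: only the newest `windowSize` rows by updatedAt.
function recentWindow(rows: SkillRow[], windowSize: number): SkillRow[] {
  return [...rows]
    .sort((a, b) => b.updatedAt - a.updatedAt)
    .slice(0, windowSize);
}

// One old pending record plus 1500 records that keep getting updated.
const rows: SkillRow[] = [{ id: "old-pending", createdAt: 0, updatedAt: 0 }];
for (let i = 1; i <= 1500; i++) {
  rows.push({ id: `busy-${i}`, createdAt: i, updatedAt: 10000 + i });
}

const windowIds = new Set(recentWindow(rows, 1000).map((r) => r.id));

// The old record never enters the window while churn continues.
const oldPendingIsPolled = windowIds.has("old-pending");
console.log(oldPendingIsPolled); // false
```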

Fix

convex/skills.ts

  • Add exhaustive mode to getPendingScanSkillsInternal for true backfill scans.
  • In normal mode, mix two bounded pools, then dedupe + shuffle:
    • newest by by_active_updated (desc)
    • oldest by by_active_created (asc)
  • Disable recency suppression when skipRecentMinutes <= 0.
  • Skip only finalized VT statuses (clean, malicious, suspicious), and continue polling unresolved statuses (pending, stale, etc).
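
The normal-mode selection described above can be sketched roughly as follows. This is an in-memory approximation, not the code in convex/skills.ts: `Candidate`, `needsPoll`, `shuffleCopy`, and `selectPending` are hypothetical names, and array sorts stand in for the by_active_updated and by_active_created index queries.

```typescript
// Hypothetical candidate shape; vtStatus is undefined when no VT analysis is cached.
interface Candidate {
  id: string;
  createdAt: number;
  updatedAt: number;
  vtStatus?: string;
}

// Finalized statuses named in the description; anything else stays retriable.
const FINALIZED_STATUSES = new Set(["clean", "malicious", "suspicious"]);

function needsPoll(c: Candidate): boolean {
  return c.vtStatus === undefined || !FINALIZED_STATUSES.has(c.vtStatus);
}

// Fisher-Yates shuffle on a copy, so the input order is left untouched.
function shuffleCopy<T>(items: T[]): T[] {
  const out = [...items];
  for (let i = out.length - 1; i > 0; i--) {
    const j = Math.floor(Math.random() * (i + 1));
    const tmp = out[i];
    out[i] = out[j];
    out[j] = tmp;
  }
  return out;
}

function selectPending(all: Candidate[], poolSize: number, limit: number): Candidate[] {
  // Pool 1: newest by updatedAt (desc). Pool 2: oldest by createdAt (asc).
  const newest = [...all].sort((a, b) => b.updatedAt - a.updatedAt).slice(0, poolSize);
  const oldest = [...all].sort((a, b) => a.createdAt - b.createdAt).slice(0, poolSize);

  // Dedupe by id, since a record can appear in both pools.
  const byId = new Map<string, Candidate>();
  for (const c of newest.concat(oldest)) byId.set(c.id, c);

  // Shuffle, drop finalized entries, then cap at the requested limit.
  return shuffleCopy(Array.from(byId.values())).filter(needsPoll).slice(0, limit);
}
```

Because the oldest slice always contributes candidates, a long-pending record can be picked even when the newest slice is saturated by frequently updated skills.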

convex/vt.ts

  • Update backfillPendingScans to call pending selection with:
    • exhaustive: true
    • skipRecentMinutes: 0
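
The effect of passing skipRecentMinutes: 0 can be sketched as a small predicate. This is an assumption about the shape of the check, not the actual code; `isRecentlyChecked` and its parameters are hypothetical.

```typescript
// Returns true when a record was checked recently enough to be skipped.
// A non-positive skipRecentMinutes disables suppression entirely.
function isRecentlyChecked(
  checkedAt: number | undefined,
  now: number,
  skipRecentMinutes: number,
): boolean {
  if (skipRecentMinutes <= 0) return false; // backfill: never suppress
  if (checkedAt === undefined) return false; // never checked: always eligible
  return now - checkedAt < skipRecentMinutes * 60000;
}

// Example: a record checked 5 minutes ago.
const nowMs = Date.parse("2026-02-21T18:00:00.000Z");
const fiveMinutesAgo = nowMs - 5 * 60000;

console.log(isRecentlyChecked(fiveMinutesAgo, nowMs, 30)); // true: suppressed in a normal poll
console.log(isRecentlyChecked(fiveMinutesAgo, nowMs, 0)); // false: eligible in a backfill
```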

Tests

Added: convex/skills.pendingScanQueue.test.ts

  • Verifies unresolved VT entries from the oldest slice are selected while finalized entries are skipped.
  • Verifies exhaustive mode ignores recent-check suppression for manual/backfill flows.

Executed:

  • pnpm test convex/skills.pendingScanQueue.test.ts convex/skills.rateLimit.test.ts

Impact

  • Reduces long-lived false pending/missing VT state.
  • Improves fairness of queue processing under high write churn.
  • Makes backfill behavior match its intent (all pending skills, not just the recently updated window).

Greptile Summary

This PR fixes VT scan queue starvation where older skills stayed pending indefinitely while high-churn records monopolized the bounded queue. The fix adds exhaustive mode for backfills, mixes recent+oldest slices in normal mode, and retries unresolved VT statuses (pending, stale) while skipping finalized ones (clean, malicious, suspicious).

  • convex/skills.ts:1758-1849 — Added exhaustive mode for getPendingScanSkillsInternal that uses .collect() for backfills; normal mode now queries both by_active_updated (desc) and by_active_created (asc) pools, dedupes, and shuffles to prevent starvation
  • convex/skills.ts:1814-1818 — Recency filtering now respects skipRecentMinutes <= 0 to allow immediate retry in backfill flows
  • convex/skills.ts:1834-1840 — Changed filtering logic from "skip if vtAnalysis exists" to "skip only if vtAnalysis.status is finalized (clean, malicious, suspicious)", allowing retry of pending/stale/error states
  • convex/vt.ts:675-676 — Backfill now calls with exhaustive: true and skipRecentMinutes: 0 for comprehensive scans
  • convex/skills.pendingScanQueue.test.ts — Tests verify unresolved records from oldest slice are selected, finalized ones skipped, and exhaustive mode bypasses recency suppression

Confidence Score: 5/5

  • Safe to merge with high confidence — well-tested bug fix with clear scope
  • The PR addresses a specific, well-documented production bug with a targeted fix. The implementation is sound: exhaustive mode properly bypasses limits for backfills, dual-pool querying prevents starvation, and the finalized status set correctly matches VT's possible outcomes. Tests cover both the starvation scenario and exhaustive mode behavior. Code follows existing patterns and includes proper comments.
  • No files require special attention

Last reviewed commit: 62955e3

Follow-up fix

After review, exhaustive pending-scan selection still inherited a limit <= 100 clamp intended for cron polling. This PR now removes that clamp in exhaustive mode so backfillPendingScans(limit: 10000, exhaustive: true) can actually process more than 100 records per invocation.
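
A minimal sketch of the clamp change, assuming the limit is normalized in one place. `effectiveLimit` and `CRON_MAX_LIMIT` are hypothetical names; only the value 100 and the exhaustive flag come from the description above.

```typescript
// Upper bound intended for cron polling, per the description above.
const CRON_MAX_LIMIT = 100;

// In exhaustive mode the requested limit is honored; otherwise it is clamped.
function effectiveLimit(requested: number, exhaustive: boolean): number {
  const normalized = Math.max(1, Math.floor(requested));
  return exhaustive ? normalized : Math.min(normalized, CRON_MAX_LIMIT);
}

console.log(effectiveLimit(10000, true)); // 10000
console.log(effectiveLimit(10000, false)); // 100
```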

vercel bot commented Feb 21, 2026

Project | Deployment | Actions          | Updated (UTC)
clawhub | Ready      | Preview, Comment | Feb 21, 2026 7:44pm
