fix(patches): grace-period prune of stale patch tombstones (#1004)#1098
Merged
ToddHebebrand merged 3 commits intoJun 8, 2026
Merged
Conversation
…heck findings) govulncheck (Security Scanning workflow) is failing on main and every open PR with 2 Go standard-library vulnerabilities, both fixed in go1.25.11: - GO-2026-5039 net/textproto (unescaped inputs in errors) - GO-2026-5037 crypto/x509 (inefficient candidate hostname parsing) These are stdlib-only (no third-party module bumps needed); a freshly-published advisory started failing the scan on unchanged code because CI pinned 1.25.10. Bump the GO_VERSION / go-version pins in ci.yml, codeql.yml and release.yml so the scan (and the released agent binary's stdlib) use the patched toolchain. go.mod's go directive (1.25.10 floor) is intentionally left as-is.
security.yml runs the Go Vulnerability Check job and had its own GO_VERSION: '1.25.10' (separate from ci.yml). Bumping it to 1.25.11 is what actually makes govulncheck use the patched stdlib.
…rnOps#1004) The scan ingest marks every device_patches row status='missing' at the start of a scan, then re-upserts the rows the scan reported. Rows left 'missing' are stale tombstones (e.g. a Linux package upgraded to a new externalId orphans the old row) and accumulated unbounded — US prod had devices with 300-960 'missing' rows. Add pruneStaleTombstones(): after each scan commits, delete this device's 'missing' rows whose updatedAt is older than a grace window (default 7d, PATCH_TOMBSTONE_PRUNE_AFTER_HOURS). updatedAt is the right signal — it's bumped only when a scan actually reports the patch; the bulk mark-missing leaves it untouched (unlike lastCheckedAt, which is refreshed every scan). This realizes the API-only 'grace-period' option from the issue thread safely: the destructive edge Todd flagged (source buckets are coarser than providers; a winget failure under the shared 'third_party' bucket while chocolatey succeeds) self-heals, because the row is re-upserted on the next clean scan inside the window. Only genuinely-removed packages age out. Scoped to one device + org. Tests: 6 integration cases against a real DB (a Drizzle mock would never run the make_interval filter) — stale row pruned; recent row kept (covers empty/zero-item payload + same-bucket self-heal); pending/installed never touched; idempotency; target-device-only; org-scope guard. Existing patches.test.ts unit suite updated (db.delete + and() mocks) and green.
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Closes #1004.
Problem
Scan ingest marks every
device_patchesrowstatus='missing'at the start of a scan, then re-upserts what the scan reported. Rows left'missing'are stale tombstones and accumulate unbounded (a Linux package upgraded to a newexternalIdorphans the old row forever — US prod had devices at 300–960'missing'rows).Fix — API-only grace-period prune (the safe option from the thread)
pruneStaleTombstones()runs after each scan commits and deletes this device's'missing'rows whoseupdatedAtis older than a grace window (default 7 days,PATCH_TOMBSTONE_PRUNE_AFTER_HOURS).Why
updatedAt(notlastCheckedAt): the bulk mark-missing setsstatus='missing'+lastCheckedAt=nowon every row each scan, solastCheckedAtis useless as an age signal. It leavesupdatedAtuntouched — only a real upsert (a patch the scan actually reported) bumps it — soupdatedAtdates the patch's last real sighting.Why it's safe re: the destructive edge you flagged (source buckets coarser than providers; agent submits partial payloads on partial-provider failure): a winget failure under the shared
third_partybucket while chocolatey succeeds leaves winget's rows'missing'this cycle, but theirupdatedAtis still recent, so they're inside the window and not pruned. They self-heal on the next clean scan. Only genuinely-removed packages age out. Scoped to one device + org (cross-tenant safe).Tests
6 integration cases against a real DB (a Drizzle mock never runs the
make_intervalfilter): stale row pruned; recent row kept (covers empty/zero-item payload + same-bucket partial-provider self-heal); pending/installed never touched; idempotency; target-device-only; org-scope guard. Existingpatches.test.tsunit suite updated and green. Fullapps/apitsc clean.Note
Stacked on #1096 (the Go-1.25.11 govuln fix) so CI's Security Scanning is green — the 4 Go-workflow lines in this diff belong to #1096 and drop out once it merges (I'll rebase onto main then).