Load stale cache immediately on TTL expiry, refresh in background #244
nhogade wants to merge 1 commit into korotovsky:master
When the disk cache exists but has expired, load it into the snapshot immediately so tools are available without waiting for a full API refetch. The fresh data is then fetched and the snapshot is atomically swapped when complete. On large Enterprise Grid workspaces (30K+ channels), the full cache rebuild takes ~9 minutes. Previously, all tool calls would block with "cache is not ready yet" during this time. Now they work instantly with slightly stale data while the refresh happens transparently. This applies to both the users and channels cache paths. Fixes korotovsky#243
chriscoey left a comment
Good approach to the cold-start problem. The key change — always loading the existing cache into the snapshot and setting usersReady/channelsReady before falling through to the API fetch — means tools are available instantly with slightly stale data. The atomic snapshot swap on refresh completion keeps concurrent access safe.
The pattern is applied symmetrically across both users and channels caches, which makes the change easy to follow. An additional benefit: if the API fetch after loading stale data fails (e.g., rate limited), the stale data is still available — strictly better than the current behavior where an expired cache is discarded before re-fetching.
This would be a big improvement for large workspaces where cold refreshes take minutes.
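The failure case described in the review can be illustrated with a minimal sketch. This is not the code from the PR: `usersSnapshot`, `current`, `refresh`, `lookup`, and `errRateLimited` are hypothetical names, and the only assumption carried over is that the snapshot is published through an `atomic.Pointer` and replaced only after a successful fetch.

```go
package main

import (
	"errors"
	"fmt"
	"sync/atomic"
)

// usersSnapshot is a hypothetical immutable view of the users cache.
type usersSnapshot struct{ byID map[string]string }

// current holds the snapshot that readers see.
var current atomic.Pointer[usersSnapshot]

// errRateLimited stands in for an API failure during refresh.
var errRateLimited = errors.New("rate limited")

// refresh swaps in a new snapshot only when fetch succeeds.
// On error the previously stored (stale) snapshot is left untouched.
func refresh(fetch func() (map[string]string, error)) error {
	users, err := fetch()
	if err != nil {
		return fmt.Errorf("refresh failed, keeping stale snapshot: %w", err)
	}
	current.Store(&usersSnapshot{byID: users})
	return nil
}

// lookup reads from whatever snapshot is currently published.
func lookup(id string) (string, bool) {
	s := current.Load()
	if s == nil {
		return "", false
	}
	name, ok := s.byID[id]
	return name, ok
}

// demoFailedRefresh loads stale data, runs a failing refresh,
// and reports what readers can still see afterwards.
func demoFailedRefresh() (name string, failed bool) {
	// Stale data, as if loaded from the expired disk cache.
	current.Store(&usersSnapshot{byID: map[string]string{"U1": "alice"}})
	err := refresh(func() (map[string]string, error) { return nil, errRateLimited })
	name, _ = lookup("U1")
	return name, errors.Is(err, errRateLimited)
}

func main() {
	name, failed := demoFailedRefresh()
	fmt.Println("refresh failed:", failed)
	fmt.Println("U1 still resolves to:", name)
}
```

In this sketch a failed refresh returns an error to the caller, while lookups keep resolving against the stale snapshot.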
I want to second @chriscoey's comment above. I run this server on a large Enterprise Grid workspace with a long cache TTL and a persistent volume mount so the cache survives container restarts. This setup gives me sub-second startups throughout the day, but the first start after the TTL expires still blocks for 5–10 minutes while the full API rebuild runs. This PR would eliminate that last cold-start penalty. I would love to see this get merged. Happy to help test if that would be useful.
Summary
When the disk cache exists but has expired (TTL), load it into the snapshot immediately so tools are available without waiting for a full API refetch. The fresh data is then fetched and the snapshot is atomically swapped when complete.
On large Enterprise Grid workspaces (30K+ channels), the full cache rebuild takes ~9 minutes. Previously, all tool calls blocked with "cache is not ready yet" during this period. Now they work instantly with slightly stale data while the refresh happens transparently.
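The flow described above can be sketched roughly as follows. This is an illustration under stated assumptions, not the code in `pkg/provider/api.go`: the `Snapshot` and `Cache` types, the `Refresh` signature, the `ttl` parameter, and the `fetch` callback are all hypothetical, and the real provider may track readiness differently.

```go
package main

import (
	"fmt"
	"sync"
	"sync/atomic"
	"time"
)

// Snapshot is a hypothetical immutable view of cached channels.
type Snapshot struct {
	Channels map[string]string // channel ID -> name
	SavedAt  time.Time         // when the disk cache was written
}

// Cache is a hypothetical stand-in for the provider.
type Cache struct {
	mu       sync.Mutex // serializes refreshes
	snapshot atomic.Pointer[Snapshot]
	ready    atomic.Bool
}

func (c *Cache) IsReady() bool { return c.ready.Load() }

// Name reads from the published snapshot without taking c.mu.
func (c *Cache) Name(id string) string {
	s := c.snapshot.Load()
	if s == nil {
		return ""
	}
	return s.Channels[id]
}

// Refresh publishes the disk cache first, even if it has expired,
// and only then fetches fresh data and swaps the snapshot.
func (c *Cache) Refresh(disk *Snapshot, ttl time.Duration, fetch func() (map[string]string, error)) error {
	c.mu.Lock()
	defer c.mu.Unlock()
	if disk != nil {
		c.snapshot.Store(disk)
		c.ready.Store(true)
		if time.Since(disk.SavedAt) < ttl {
			return nil // cache is still fresh, no refetch needed
		}
	}
	fresh, err := fetch() // slow path; readers keep using the stale snapshot
	if err != nil {
		return err
	}
	c.snapshot.Store(&Snapshot{Channels: fresh, SavedAt: time.Now()})
	c.ready.Store(true)
	return nil
}

// demo runs a refresh against an expired disk cache and records what a
// reader sees while the fetch is in flight and after it completes.
func demo() (during, after string) {
	c := &Cache{}
	disk := &Snapshot{
		Channels: map[string]string{"C1": "general-old"},
		SavedAt:  time.Now().Add(-2 * time.Hour), // older than the 1h TTL below
	}
	started, release, done := make(chan struct{}), make(chan struct{}), make(chan error, 1)
	fetch := func() (map[string]string, error) {
		close(started) // signal that the slow fetch has begun
		<-release      // block until the reader has looked at the cache
		return map[string]string{"C1": "general"}, nil
	}
	go func() { done <- c.Refresh(disk, time.Hour, fetch) }()
	<-started
	if c.IsReady() {
		during = c.Name("C1")
	}
	close(release)
	<-done
	return during, c.Name("C1")
}

func main() {
	during, after := demo()
	fmt.Println("during refresh:", during)
	fmt.Println("after refresh:", after)
}
```

Here the reader sees `general-old` while the fetch is blocked and `general` once the swap has happened, which mirrors the "stale first, fresh later" behavior the summary describes.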
Changes
Single file change (`pkg/provider/api.go`), in both `refreshUsersInternal` and `refreshChannelsInternal`:
- … `ready = true`
- … `ready = true` → fetch fresh data → atomically swap snapshot

The key insight is that `ProvideUsersMap()` and `ProvideChannelsMaps()` use `atomic.Pointer` loads (no lock needed), so tool calls can read the stale snapshot while the refresh goroutine holds the write lock for the API fetch.
How it works
- … `atomic.Pointer` snapshot → `channelsReady = true` / `usersReady = true`
- `IsReady()` returns `true` → server starts accepting tool calls immediately
- … `atomic.Pointer.Store()` swaps the snapshot seamlessly

Testing
- … (`go test ./pkg/provider/...`)
- `go build ./...` succeeds

Fixes #243
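The point about lock-free reads can be shown with a small sketch. The `provider` type, `provideUsers`, and `demoLockFreeRead` are hypothetical and only loosely modeled on the description above. The sketch assumes a `sync.RWMutex` guards the refresh path while readers go through an `atomic.Pointer` load, and it uses `TryRLock` (Go 1.18+) purely to show that a lock-based reader would have been held up.

```go
package main

import (
	"fmt"
	"sync"
	"sync/atomic"
)

// usersMap is a hypothetical immutable snapshot of users.
type usersMap struct{ Users map[string]string }

// provider is a hypothetical stand-in for the real provider type.
type provider struct {
	mu    sync.RWMutex             // held by the refresh path during the API fetch
	users atomic.Pointer[usersMap] // read by tool calls without locking
}

// provideUsers performs only an atomic load; it never touches p.mu.
func (p *provider) provideUsers() *usersMap { return p.users.Load() }

// demoLockFreeRead simulates a refresh holding the write lock and reports
// whether a lock-based reader would be shut out, plus what the atomic
// reader sees in the meantime.
func demoLockFreeRead() (lockedOut bool, name string) {
	p := &provider{}
	p.users.Store(&usersMap{Users: map[string]string{"U1": "alice"}})

	p.mu.Lock() // simulate a refresh in progress
	defer p.mu.Unlock()

	// A reader that needed the read lock could not get it right now.
	if p.mu.TryRLock() {
		p.mu.RUnlock()
	} else {
		lockedOut = true
	}

	// The atomic load succeeds regardless of the held write lock.
	name = p.provideUsers().Users["U1"]
	return lockedOut, name
}

func main() {
	lockedOut, name := demoLockFreeRead()
	fmt.Println("lock-based reader would wait:", lockedOut)
	fmt.Println("atomic reader sees:", name)
}
```

The design choice this illustrates is that the write lock only needs to serialize refreshes, since readers never contend for it.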