Skip to content

Load stale cache immediately on TTL expiry, refresh in background#244

Open
nhogade wants to merge 1 commit intokorotovsky:masterfrom
nhogade:feat/stale-cache-immediate-load
Open

Load stale cache immediately on TTL expiry, refresh in background#244
nhogade wants to merge 1 commit intokorotovsky:masterfrom
nhogade:feat/stale-cache-immediate-load

Conversation

@nhogade
Copy link
Copy Markdown

@nhogade nhogade commented Mar 18, 2026

Summary

When the disk cache exists but has expired (TTL), load it into the snapshot immediately so tools are available without waiting for a full API refetch. The fresh data is then fetched and the snapshot is atomically swapped when complete.

On large Enterprise Grid workspaces (30K+ channels), the full cache rebuild takes ~9 minutes. Previously, all tool calls blocked with "cache is not ready yet" during this period. Now they work instantly with slightly stale data while the refresh happens transparently.

Changes

Single file change (pkg/provider/api.go) — both refreshUsersInternal and refreshChannelsInternal:

  • Before: expired cache → discard → block on full API fetch → set ready = true
  • After: expired cache → load stale data → set ready = true → fetch fresh data → atomically swap snapshot

The key insight is that ProvideUsersMap() and ProvideChannelsMaps() use atomic.Pointer loads (no lock needed), so tool calls can read the stale snapshot while the refresh goroutine holds the write lock for the API fetch.

How it works

  1. Cache file exists but TTL expired
  2. Stale data loaded into atomic.Pointer snapshot → channelsReady = true / usersReady = true
  3. IsReady() returns true → server starts accepting tool calls immediately
  4. Function continues to fetch fresh data from Slack API (still under write lock — prevents concurrent refreshes)
  5. When fresh data arrives, atomic.Pointer.Store() swaps the snapshot seamlessly
  6. Updated cache written to disk

Testing

  • All existing tests pass (go test ./pkg/provider/...)
  • Verified go build ./... succeeds

Fixes #243

When the disk cache exists but has expired, load it into the snapshot
immediately so tools are available without waiting for a full API
refetch. The fresh data is then fetched and the snapshot is atomically
swapped when complete.

On large Enterprise Grid workspaces (30K+ channels), the full cache
rebuild takes ~9 minutes. Previously, all tool calls would block with
"cache is not ready yet" during this time. Now they work instantly
with slightly stale data while the refresh happens transparently.

This applies to both the users and channels cache paths.

Fixes korotovsky#243
Copy link
Copy Markdown

@chriscoey chriscoey left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good approach to the cold-start problem. The key change — always loading the existing cache into the snapshot and setting usersReady/channelsReady before falling through to the API fetch — means tools are available instantly with slightly stale data. The atomic snapshot swap on refresh completion keeps concurrent access safe.

The pattern is applied symmetrically across both users and channels caches, which makes the change easy to follow. An additional benefit: if the API fetch after loading stale data fails (e.g., rate limited), the stale data is still available — strictly better than the current behavior where an expired cache is discarded before re-fetching.

This would be a big improvement for large workspaces where cold refreshes take minutes.

@gkatz2
Copy link
Copy Markdown

gkatz2 commented Apr 1, 2026

I want to second @chriscoey's comment above. I run this server on a large Enterprise Grid workspace with a long cache TTL and a persistent volume mount so the cache survives container restarts. This setup gives me sub-second startups throughout the day, but the first start after the TTL expires still blocks for 5–10 minutes while the full API rebuild runs.

This PR would eliminate that last cold-start penalty. I would love to see this get merged. Happy to help test if that would be useful.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Support incremental cache refresh on TTL expiry instead of full rebuild

3 participants