fix(macos-app): disable URL response caching for cluster-state polling #2005
Merged
AlexCheema merged 4 commits into exo-explore:main from May 1, 2026
`ClusterStateService` polls `/state` at 2 Hz via `URLSession.shared`, which keeps an on-disk `URLCache` attached by default. Each polled response body gets persisted under `~/Library/Caches/exolabs.EXO/...`, and macOS counts those persistent-cache writes against the process's `disk writes` resource limit.

On a long-running cluster node, this trips macOS into emitting microstackshot diagnostic reports (`/Library/Logs/DiagnosticReports/EXO_*.diag`) with `Event: disk writes` exceeding the per-process daily-average limit. The heaviest stack on every report points at `__CFURLCache::CreateAndStoreCacheNode → write` in CFNetwork, with ~500–620 KB/sec sustained for as long as the polling loop runs.

Six reports observed on a single Mac Studio M3 Ultra over eight days (2026-04-22 → 2026-04-29), totalling 53+ GB of cache writes. Highest single sample: 34.36 GB / 15-hour run.

Fix: route polling through an ephemeral, non-caching `URLSession`. Cluster-state responses are time-sensitive and small; nothing benefits from being cached on disk. The per-request `.reloadIgnoringLocalCacheData` calls already in the file are kept as defense in depth (they only affect read behavior; the session-level `urlCache = nil` is what stops the writes).

No behavioral change for in-flight requests. SSD wear and background CPU drop noticeably on long-running nodes.
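For illustration, a minimal sketch of what an ephemeral, non-caching session with an injectable default could look like. The names `makeNonCachingSession` and `StatePoller` are hypothetical stand-ins, not identifiers from the actual change; only `URLSessionConfiguration.ephemeral` and `urlCache = nil` are taken from the description above.

```swift
import Foundation
#if canImport(FoundationNetworking)
import FoundationNetworking  // URLSession lives here on non-Apple platforms
#endif

/// Hypothetical helper (name is illustrative, not from the PR).
/// Builds a session that never stores response bodies on disk.
func makeNonCachingSession() -> URLSession {
    // Ephemeral configurations keep no persistent cache, cookies, or credentials.
    let configuration = URLSessionConfiguration.ephemeral
    // Dropping the cache object entirely also removes the in-memory cache,
    // so nothing is stored for polled responses.
    configuration.urlCache = nil
    return URLSession(configuration: configuration)
}

/// Hypothetical stand-in for a polling service. It defaults to the
/// non-caching session but still accepts an injected one, e.g. for tests.
final class StatePoller {
    private let session: URLSession

    init(session: URLSession = makeNonCachingSession()) {
        self.session = session
    }

    /// True if the underlying session has any URLCache attached.
    var hasURLCache: Bool {
        session.configuration.urlCache != nil
    }
}
```

Keeping the session as an initializer parameter means a test can still pass in its own `URLSession`, while production code picks up the non-caching default without any call-site changes.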
Fixes #2004.
`ClusterStateService` polls `/state` at 2 Hz via `URLSession.shared`, which keeps an on-disk `URLCache` attached by default. Every polled response body gets persisted under `~/Library/Caches/exolabs.EXO/`, sustaining ~500–620 KB/sec of file-backed memory dirtied, far above macOS's ~25 KB/sec per-process daily-average baseline. Six microstackshot reports observed on a single Mac Studio M3 Ultra over eight days, with one 15-hour run accumulating 34.36 GB of cache writes.
Heaviest stack on every diagnostic report (96–98% of samples):
Full diagnostic data and analysis in #2004.
What changed
`ClusterStateService` now defaults to an ephemeral, non-caching `URLSession` instead of `URLSession.shared`. Cluster-state responses are time-sensitive and small; nothing benefits from being cached on disk.
The existing per-request `request.cachePolicy = .reloadIgnoringLocalCacheData` calls are kept as defense in depth. They only affect read behavior, but they are harmless to leave alongside the session-level config.
Scope
`session:` parameter remains in `init`, so tests can still inject a custom mock session unchanged.
`BugReportService` and other `URLSession.shared` callers: untouched. If maintainers prefer an app-wide `URLCache` disable instead, happy to switch the approach (issue body has the alternative spelled out).
Verification
Verified locally that compiling EXO with this change produces a working menubar app and `ClusterStateService` continues to fetch state correctly. After ~30 min of idle polling, no new entries in `/Library/Logs/DiagnosticReports/EXO_*.diag` and no growth in `~/Library/Caches/exolabs.EXO/`.
Test plan
`~/Library/Caches/exolabs.EXO/Cache.db*` does not grow
🤖 Generated with Claude Code