
fix(macos-app): disable URL response caching for cluster-state polling #2005

Merged

AlexCheema merged 4 commits into exo-explore:main from ecohash-co:fix/disable-urlcache-cluster-state
May 1, 2026


@ecohash-co
Contributor

Fixes #2004.

ClusterStateService polls /state at 2 Hz via URLSession.shared, which keeps an on-disk URLCache attached by default. Every polled response body gets persisted under ~/Library/Caches/exolabs.EXO/, sustaining ~500–620 KB/sec of dirtied file-backed memory (reported by macOS as disk writes), far above macOS's ~25 KB/sec per-process daily-average limit. Six microstackshot reports were observed on a single Mac Studio M3 Ultra over eight days; one 15-hour run alone accumulated 34.36 GB of cache writes.
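To put those rates in perspective, here is a back-of-the-envelope calculation. It is a rough sketch only: it assumes decimal units (1 KB = 1,000 bytes) and treats the ~25 KB/sec figure as an exact daily-average limit, both of which are simplifications.

```swift
// Back-of-the-envelope check of the observed write rate against the
// ~25 KB/sec per-process daily-average limit.
// Assumption: decimal units (1 KB = 1,000 bytes, 1 GB = 1,000,000 KB).
let secondsPerDay = 86_400.0
let limitKBps = 25.0              // approximate limit quoted above
let observedKBps = [500.0, 620.0] // low and high ends of the observed range

// Total KB the limit allows over one day, and the same figure in GB.
let dailyBudgetKB = limitKBps * secondsPerDay
let dailyBudgetGB = dailyBudgetKB / 1_000_000

// Data written per day, in GB, if polling ran continuously at each rate.
let dailyWrittenGB = observedKBps.map { $0 * secondsPerDay / 1_000_000 }

// How many times over the limit each rate is.
let timesOverLimit = observedKBps.map { $0 / limitKBps }

// Minutes until a full day's budget is used up at each rate.
let minutesToExhaust = observedKBps.map { dailyBudgetKB / $0 / 60 }

print("daily budget (GB):", dailyBudgetGB)
print("daily writes (GB):", dailyWrittenGB)
print("times over limit:", timesOverLimit)
print("minutes to exhaust budget:", minutesToExhaust)
```

Under those assumptions the daily budget is about 2.16 GB, sustained polling at 500–620 KB/sec runs 20–25 times over the limit, and a whole day's budget is used up in roughly 58–72 minutes.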

Heaviest stack on every diagnostic report (96–98% of samples):

_dispatch_workloop_worker_thread → _dispatch_block_async_invoke2 →
  __CFURLCache::CreateAndStoreCacheNode → write

Full diagnostic data and analysis in #2004.

What changed

ClusterStateService now defaults to an ephemeral, non-caching URLSession instead of URLSession.shared. Cluster-state responses are time-sensitive and small; nothing benefits from being cached on disk.

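```swift
private static func makeNonCachingSession() -> URLSession {
    // Ephemeral: session data stays in memory and is never written to disk.
    let config = URLSessionConfiguration.ephemeral
    // Drop the cache object entirely, so responses are not stored at all.
    config.urlCache = nil
    // Always load from the server, never from a local cache.
    config.requestCachePolicy = .reloadIgnoringLocalCacheData
    return URLSession(configuration: config)
}
```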

The existing per-request request.cachePolicy = .reloadIgnoringLocalCacheData calls are kept as defense in depth. They only affect read behavior, but they are harmless to keep alongside the session-level config.
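The two layers can be sketched side by side. This is a minimal illustration, not EXO's code: the helper names `makeNonCachingConfiguration` and `makeStateRequest` are hypothetical, and the base URL is a placeholder because the real endpoint is not shown in this PR.

```swift
import Foundation
#if canImport(FoundationNetworking)
import FoundationNetworking   // URLSession and URLRequest live here on Linux
#endif

// Layer 1 (session): no URLCache object at all, so responses have nowhere
// to be stored. Per the PR, this is the setting that stops the disk writes.
func makeNonCachingConfiguration() -> URLSessionConfiguration {
    let config = URLSessionConfiguration.ephemeral
    config.urlCache = nil
    config.requestCachePolicy = .reloadIgnoringLocalCacheData
    return config
}

// Layer 2 (request): the per-request policy already present in the file.
// Per the PR description it only affects whether a cached response may be
// read, so on its own it would not prevent writes.
// Hypothetical helper; `baseURL` is a placeholder, not EXO's real endpoint.
func makeStateRequest(baseURL: URL) -> URLRequest {
    var request = URLRequest(url: baseURL.appendingPathComponent("state"))
    request.cachePolicy = .reloadIgnoringLocalCacheData
    return request
}

let session = URLSession(configuration: makeNonCachingConfiguration())
let request = makeStateRequest(baseURL: URL(string: "http://127.0.0.1:8080")!)
// session.dataTask(with: request) { ... }.resume() would then issue one poll.
```

Keeping both is cheap: the request-level policy documents intent at the call site, while the session-level `urlCache = nil` is what actually removes the storage path.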

Scope

  • Behavioral: none. Polled requests still go out at the same cadence; responses still parse the same; no semantic change to any API surface.
  • Test injection: the session: parameter remains in init, so tests can still inject a custom mock session unchanged.
  • BugReportService and other URLSession.shared callers: untouched. If maintainers prefer an app-wide URLCache disable instead, happy to switch the approach (issue body has the alternative spelled out).
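The test-injection point from the list above can be sketched as follows. `ClusterStateServiceSketch` is a hypothetical stand-in, not EXO's actual type; it assumes the real `init` takes `session: URLSession` with the non-caching session as its default value, which the PR implies but does not show.

```swift
import Foundation
#if canImport(FoundationNetworking)
import FoundationNetworking
#endif

// Hypothetical stand-in for ClusterStateService: only the init-injection
// pattern is modeled; the real type's members are not shown in this PR.
final class ClusterStateServiceSketch {
    let session: URLSession

    // Production callers get the non-caching session by default;
    // tests can pass any URLSession (e.g. one configured with a mock URLProtocol).
    init(session: URLSession = ClusterStateServiceSketch.makeNonCachingSession()) {
        self.session = session
    }

    // Same settings as the PR's makeNonCachingSession().
    static func makeNonCachingSession() -> URLSession {
        let config = URLSessionConfiguration.ephemeral
        config.urlCache = nil
        config.requestCachePolicy = .reloadIgnoringLocalCacheData
        return URLSession(configuration: config)
    }
}

let production = ClusterStateServiceSketch()
let injected = URLSession(configuration: .default)
let underTest = ClusterStateServiceSketch(session: injected)
```

Because the default lives in the parameter list, existing call sites that pass a mock session compile unchanged, and only callers relying on the default pick up the new behavior.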

Verification

Verified locally that compiling EXO with this change produces a working menubar app and ClusterStateService continues to fetch state correctly. After ~30 min of idle polling, no new entries in /Library/Logs/DiagnosticReports/EXO_*.diag and no growth in ~/Library/Caches/exolabs.EXO/.

Test plan

  • Build EXO from this branch on macOS 26.4
  • Launch, let cluster state polling run for 30+ min
  • Confirm no new microstackshot diagnostic reports
  • Confirm ~/Library/Caches/exolabs.EXO/Cache.db* does not grow
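For the last check, a minimal sketch of a helper that totals the size of the cache directory; running it before and after a 30+ minute polling session and comparing the numbers shows whether `Cache.db*` (or anything else in that directory) grew. The function name `directorySize` is hypothetical, and summing the whole directory rather than only `Cache.db*` is a simplifying assumption.

```swift
import Foundation

// Sum of the sizes (in bytes) of all regular files under `directory`,
// recursing into subdirectories. Returns 0 if the directory is missing.
func directorySize(atPath directory: String) -> UInt64 {
    let fm = FileManager.default
    guard fm.fileExists(atPath: directory),
          let enumerator = fm.enumerator(atPath: directory) else { return 0 }

    let root = URL(fileURLWithPath: directory)
    var total: UInt64 = 0
    for case let relativePath as String in enumerator {
        let fullPath = root.appendingPathComponent(relativePath).path
        var isDirectory: ObjCBool = false
        // Skip directories; only files contribute to the total.
        guard fm.fileExists(atPath: fullPath, isDirectory: &isDirectory),
              !isDirectory.boolValue,
              let attributes = try? fm.attributesOfItem(atPath: fullPath),
              let size = attributes[.size] as? NSNumber else { continue }
        total += size.uint64Value
    }
    return total
}

// Current size of EXO's cache directory (0 if it does not exist).
let cacheDir = FileManager.default.homeDirectoryForCurrentUser
    .appendingPathComponent("Library/Caches/exolabs.EXO").path
print("\(cacheDir): \(directorySize(atPath: cacheDir)) bytes")
```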

🤖 Generated with Claude Code

`ClusterStateService` polls `/state` at 2 Hz via `URLSession.shared`,
which keeps an on-disk `URLCache` attached by default. Each polled
response body gets persisted under `~/Library/Caches/exolabs.EXO/...`,
and macOS counts those persistent-cache writes against the process's
`disk writes` resource limit.

On a long-running cluster node, this trips macOS into emitting
microstackshot diagnostic reports
(`/Library/Logs/DiagnosticReports/EXO_*.diag`) with `Event: disk writes`
exceeding the per-process
daily-average limit. The heaviest stack on every report points at
`__CFURLCache::CreateAndStoreCacheNode → write` in CFNetwork, with
~500–620 KB/sec sustained for as long as the polling loop runs.

Six reports observed on a single Mac Studio M3 Ultra over eight days
(2026-04-22 → 2026-04-29), totalling 53+ GB of cache writes. Highest
single sample: 34.36 GB / 15-hour run.

Fix: route polling through an ephemeral, non-caching `URLSession`.
Cluster-state responses are time-sensitive and small; nothing
benefits from being cached on disk. The per-request
`.reloadIgnoringLocalCacheData` calls already in the file are kept
as defense in depth (they only affect read behavior; the session-
level `urlCache = nil` is what stops the writes).

No behavioral change for in-flight requests. With the per-response
cache writes gone, SSD wear and background CPU drop noticeably on
long-running nodes.
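The polling side of the fix can be sketched as a bounded loop. This is a hypothetical illustration, not EXO's implementation: `pollClusterState` is an invented name, and the request itself is injected as a `fetch` closure so the loop can be exercised without a server. In the app, `fetch` would be something like `{ try await session.data(for: request).0 }` using the non-caching session.

```swift
import Foundation

// Hypothetical, bounded version of a roughly 2 Hz polling loop.
// `fetch` stands in for the real request on the non-caching session.
func pollClusterState(
    ticks: Int,
    intervalNanoseconds: UInt64 = 500_000_000,  // 500 ms between polls, about 2 Hz
    fetch: () async throws -> Data
) async -> [Data] {
    var responses: [Data] = []
    for _ in 0..<ticks {
        if Task.isCancelled { break }
        // A failed poll is skipped; the loop keeps its cadence.
        if let body = try? await fetch() {
            responses.append(body)
        }
        try? await Task.sleep(nanoseconds: intervalNanoseconds)
    }
    return responses
}
```

Nothing in the loop depends on which session `fetch` uses, which matches the claim that swapping the session changes cache behavior but not the polling cadence.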

@AlexCheema AlexCheema left a comment


great, thank you

@AlexCheema AlexCheema enabled auto-merge (squash) April 30, 2026 12:53
@AlexCheema AlexCheema merged commit b26268d into exo-explore:main May 1, 2026
3 checks passed


Linked issue:

macos-app: ClusterStateService polling writes ~600 KB/sec to disk via URLCache

3 participants