
fix(sync): prevent App↔CLI sync permanent wedge #701

Open
seibe wants to merge 5 commits into slopus:main from seibe:fix/sync-wedge

seibe commented Feb 21, 2026

Summary

Fixes a class of bugs where the App or CLI message sync could become permanently stuck ("wedged"), requiring a full app restart to recover.

Related issues

Addresses: #504, #297, #108, #136, #191, #587, #209, #358 — reports of messages not syncing, sessions stuck on "Waiting for messages...", real-time sync failures between devices.

Comparison with similar PRs

Root causes addressed

  • Unbounded retries: InvalidateSync and backoff would retry forever on permanent failures (e.g. 404, decryption errors). Now they have configurable maxRetries and enter a recoverable "wedged" state that can be reset by external triggers like socket reconnect.
  • Missing timeouts: Socket emitWithAck and HTTP fetch calls could hang indefinitely. All now have explicit timeouts (15s socket, 30s RPC, 60s HTTP).
  • Cursor rewind race: sessionLastSeq could be rewound when a slower HTTP response completed after a faster socket update had already advanced the cursor. Now uses monotonic Math.max advancement.
  • Initial load fetches all messages: Opening a session previously fetched every message from seq 0. It now uses reverse pagination to fetch only the latest 50, with a "Load older messages" button for backward pagination.
  • User messages invisible: Feature-flag gating in typesRaw.ts dropped user messages depending on the state of the ENABLE_SESSION_PROTOCOL_SEND flag. Both legacy and session-protocol user messages are now always normalized.
  • Silent server callback omission: When the session was missing, the server's update-metadata socket handler returned without invoking the ack callback, leaving the client's emitWithAck hanging until the new client-side timeout fires. The handler now sends an error callback instead.
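For illustration, the bounded-retry behaviour from the first bullet might look like the following minimal sketch. Only `createBackoff`, `maxRetries` and `BackoffGaveUpError` are names taken from this PR; the delay options (`minDelayMs`, `maxDelayMs`) and the function shape are assumptions, not the actual happy-app code.

```typescript
// Sketch of bounded exponential backoff. createBackoff, maxRetries and
// BackoffGaveUpError are named in the PR; minDelayMs/maxDelayMs are hypothetical.

class BackoffGaveUpError extends Error {
  readonly attempts: number;
  readonly lastError: unknown;

  constructor(attempts: number, lastError: unknown) {
    super(`backoff gave up after ${attempts} attempts`);
    this.name = "BackoffGaveUpError";
    this.attempts = attempts;
    this.lastError = lastError;
    // Keep `instanceof` working even when compiled for an ES5 target.
    Object.setPrototypeOf(this, new.target.prototype);
  }
}

interface BackoffOptions {
  minDelayMs: number;
  maxDelayMs: number;
  // Retries allowed after the first attempt; undefined keeps the old retry-forever behaviour.
  maxRetries?: number;
}

const sleep = (ms: number) => new Promise<void>((resolve) => setTimeout(resolve, ms));

function createBackoff(opts: BackoffOptions) {
  return async function backoff<T>(fn: () => Promise<T>): Promise<T> {
    let attempts = 0;
    for (;;) {
      try {
        return await fn();
      } catch (err) {
        attempts++;
        // A permanent failure (e.g. 404, decryption error) now ends here instead of looping forever.
        if (opts.maxRetries !== undefined && attempts > opts.maxRetries) {
          throw new BackoffGaveUpError(attempts, err);
        }
        // Exponential delay capped at maxDelayMs.
        await sleep(Math.min(opts.maxDelayMs, opts.minDelayMs * 2 ** (attempts - 1)));
      }
    }
  };
}
```

A caller that catches `BackoffGaveUpError` can switch to a recoverable wedged state rather than retrying a permanent failure such as a 404 forever.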

Changes by package

happy-app

  • InvalidateSync: Add wedge state, maxRetries, onError callback, and wedge recovery on re-invalidation
  • createBackoff: Add maxRetries option with BackoffGaveUpError typed error
  • sync.ts: Refactor fetchMessages into fetchLatestMessages (reverse pagination) and fetchForwardMessages (incremental); add fetchOlderMessages for UI-driven backward pagination; monotonic seq advancement; bounded retries on all sync instances
  • apiSocket.ts: Add 30s/15s timeouts on emitWithAck, 60s timeout on fetch with AbortController
  • typesRaw.ts: Remove feature-flag gating on user message normalization
  • ChatList.tsx: Add "Load older messages" button at list top
  • storage.ts: Add hasOlderMessages, oldestLoadedSeq, isLoadingOlder fields
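A rough sketch of the `InvalidateSync` semantics listed above: invalidations are coalesced, each run gets a bounded retry budget, exhausting it sets a wedged flag and fires `onError`, and the next `invalidate()` (for example on socket reconnect) starts over. The constructor shape, `retryDelayMs` and `isWedged` are hypothetical; the real class may be structured differently.

```typescript
// Hypothetical sketch of an InvalidateSync-style primitive with a recoverable wedge.
// Constructor shape, retryDelayMs and isWedged are assumptions, not the PR's API.

interface InvalidateSyncOptions {
  maxRetries: number;
  retryDelayMs: number;
  onError?: (err: unknown) => void;
}

type RunResult = { ok: true } | { ok: false; error: unknown };

class InvalidateSync {
  private running = false;
  private pending = false; // an invalidate() arrived while a run was in flight
  private wedged = false;
  private readonly command: () => Promise<void>;
  private readonly opts: InvalidateSyncOptions;

  constructor(command: () => Promise<void>, opts: InvalidateSyncOptions) {
    this.command = command;
    this.opts = opts;
  }

  get isWedged(): boolean {
    return this.wedged;
  }

  // External triggers (e.g. socket reconnect) call this to leave the wedged state.
  invalidate(): void {
    if (this.running) {
      this.pending = true; // coalesce: one more run after the current one
      return;
    }
    void this.loop();
  }

  private async loop(): Promise<void> {
    this.running = true;
    try {
      do {
        this.pending = false;
        this.wedged = false; // every fresh invalidation gets a full retry budget
        const result = await this.runWithRetries();
        if (!result.ok) {
          this.wedged = true; // stop retrying until the next external invalidate()
          this.opts.onError?.(result.error);
        }
      } while (this.pending);
    } finally {
      this.running = false;
    }
  }

  private async runWithRetries(): Promise<RunResult> {
    let lastError: unknown;
    for (let attempt = 0; attempt <= this.opts.maxRetries; attempt++) {
      try {
        await this.command();
        return { ok: true };
      } catch (err) {
        lastError = err;
        if (attempt < this.opts.maxRetries) {
          await new Promise<void>((r) => setTimeout(r, this.opts.retryDelayMs));
        }
      }
    }
    return { ok: false, error: lastError };
  }
}
```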

happy-cli

  • Mirror InvalidateSync and createBackoff changes (wedge state, maxRetries)
  • Add 15s timeouts on socket emitWithAck calls in apiSession.ts and apiMachine.ts

happy-server

  • v3SessionRoutes.ts: Add before_seq reverse pagination parameter, make after_seq optional, reject both specified simultaneously
  • v3SessionRoutes.test.ts: Add tests for reverse pagination, mutual exclusion, and latest-messages default
  • sessionUpdateHandler.ts: Send error callback on missing session instead of silently returning
  • eventRouter.ts: Add repeatKey field to feed post update type
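The query rules described for `v3SessionRoutes.ts` can be sketched as a pure function over an in-memory list. Only `after_seq` and `before_seq` come from this PR; `listMessages`, the default `limit` of 50 (matching the app's initial page), the ascending response order and `hasMore` are assumptions, and the real route queries the database instead.

```typescript
// Hypothetical sketch of the v3 message-listing rules. Only after_seq/before_seq
// come from the PR; listMessages, limit and hasMore are illustrative assumptions.

interface StoredMessage {
  seq: number;
  body: string;
}

interface PageQuery {
  after_seq?: number;
  before_seq?: number;
  limit?: number;
}

type PageResult =
  | { ok: true; messages: StoredMessage[]; hasMore: boolean }
  | { ok: false; error: string };

function listMessages(all: StoredMessage[], q: PageQuery): PageResult {
  const limit = q.limit ?? 50;
  const after = q.after_seq;
  const before = q.before_seq;
  if (after !== undefined && before !== undefined) {
    return { ok: false, error: "after_seq and before_seq are mutually exclusive" };
  }
  const sorted = [...all].sort((a, b) => a.seq - b.seq);

  if (after !== undefined) {
    // Forward (incremental) sync: the oldest `limit` messages newer than the cursor.
    const newer = sorted.filter((m) => m.seq > after);
    return { ok: true, messages: newer.slice(0, limit), hasMore: newer.length > limit };
  }

  // Reverse pagination: the newest `limit` messages older than before_seq,
  // or simply the latest page when no cursor is given.
  const cutoff = before ?? Number.POSITIVE_INFINITY;
  const older = sorted.filter((m) => m.seq < cutoff);
  return { ok: true, messages: older.slice(Math.max(0, older.length - limit)), hasMore: older.length > limit };
}
```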

Test plan

  • Server: New tests for reverse pagination and edge cases pass
  • App: Open a session with many messages → only latest 50 load initially → "Load older messages" loads more
  • App: Kill server mid-sync → sync enters wedged state → reconnect recovers sync automatically
  • App: User messages visible in both legacy and session-protocol modes
  • CLI: emitWithAck timeout fires correctly on unresponsive server (15s for metadata, 30s for session RPC)

…c failure

- Add maxRetries option to backoff() with BackoffGaveUpError
- InvalidateSync: support bounded retries, wedge detection, auto-recovery
- handleUpdate: await session refresh with 10s timeout when encryption missing,
  fall through to fetchMessages instead of silently dropping
- getMessagesSync: use maxRetries=30 to prevent infinite retry loops
- fetchMessages: add diagnostic counters (decryptFailed, normalizeFailed)
- normalizeRawMessage: log drop reasons for unexpected failures (Zod, missing
  uuid, missing turn) while silencing expected drops (meta, compact summary)
- Clean up verbose debug logs in handleUpdate
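The diagnostic counters amount to tallying each reason a raw message was dropped instead of discarding it silently. A minimal sketch, assuming hypothetical `decrypt` and `normalize` callbacks that return `null` on failure; the app's actual functions and log format are not shown in this PR.

```typescript
// Hypothetical sketch: count why raw messages were dropped instead of losing them silently.
// decrypt/normalize are stand-ins that return null on failure.

interface FetchStats {
  received: number;
  decryptFailed: number;
  normalizeFailed: number;
  applied: number;
}

function processBatch<Raw, Decrypted, Normalized>(
  batch: Raw[],
  decrypt: (raw: Raw) => Decrypted | null,
  normalize: (msg: Decrypted) => Normalized | null,
): { messages: Normalized[]; stats: FetchStats } {
  const stats: FetchStats = { received: batch.length, decryptFailed: 0, normalizeFailed: 0, applied: 0 };
  const messages: Normalized[] = [];
  for (const raw of batch) {
    const decrypted = decrypt(raw);
    if (decrypted === null) {
      stats.decryptFailed++;
      continue;
    }
    const normalized = normalize(decrypted);
    if (normalized === null) {
      stats.normalizeFailed++;
      continue;
    }
    messages.push(normalized);
    stats.applied++;
  }
  return { messages, stats };
}
```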
…dges

Socket emitWithAck calls now have explicit timeouts (15-30s), fetch gets a 60s
abort controller, InvalidateSync gains maxRetries/wedge detection with automatic
recovery on next invalidate, and the server returns error callbacks for missing
sessions.
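The fetch timeout follows the usual `AbortController` pattern: schedule `abort()` and clear the timer once the request settles. `fetchWithTimeout` and the injectable `fetchImpl` parameter are hypothetical, and the sketch assumes a runtime that provides global `fetch` and `AbortController`, such as Node 18+.

```typescript
// Hypothetical sketch: abort a request that has not completed within timeoutMs.
// fetchImpl defaults to the global fetch so a stub can be injected.

type FetchLike = (input: string, init?: RequestInit) => Promise<Response>;

async function fetchWithTimeout(
  url: string,
  init: RequestInit = {},
  timeoutMs = 60_000,
  fetchImpl: FetchLike = fetch,
): Promise<Response> {
  const controller = new AbortController();
  const timer = setTimeout(() => controller.abort(), timeoutMs);
  try {
    // A well-behaved fetch rejects (AbortError) once the signal is aborted.
    return await fetchImpl(url, { ...init, signal: controller.signal });
  } finally {
    // Success, failure or abort: never leave the timer running.
    clearTimeout(timer);
  }
}
```

This sketch replaces any `signal` the caller passed in `init`; honouring both would need extra wiring.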

Ensure monotonic cursor advancement in fetchMessages and flushOutbox using
Math.max to prevent socket update races from rewinding sessionLastSeq.
Remove redundant fetchMessages invalidation from new-message handler —
fast-path already handles the message, so the extra invalidation only
widens the race window.
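The cursor race is easiest to see side by side. In this sketch (names other than `sessionLastSeq` are hypothetical, and a `Map` keyed by session id is an assumption), a socket update sets the cursor to 12 and a slower HTTP response then reports 10: plain assignment rewinds the cursor, while `Math.max` keeps it at 12.

```typescript
// Hypothetical sketch: per-session cursors keyed by session id. Only sessionLastSeq
// and the Math.max rule come from the commit message; the helpers are illustrative.

const sessionLastSeq = new Map<string, number>();

// Old behaviour: last writer wins, so a slow HTTP response can rewind the cursor.
function setCursorUnsafe(sessionId: string, seq: number): void {
  sessionLastSeq.set(sessionId, seq);
}

// Fixed behaviour: the cursor only ever moves forward.
function advanceCursor(sessionId: string, seq: number): number {
  const next = Math.max(sessionLastSeq.get(sessionId) ?? 0, seq);
  sessionLastSeq.set(sessionId, next);
  return next;
}

// A socket update (seq 12) is applied first, then a slower HTTP response reports seq 10.
setCursorUnsafe("a", 12);
setCursorUnsafe("a", 10); // rewound to 10: messages 11 and 12 would be fetched again
advanceCursor("b", 12);
advanceCursor("b", 10); // stays at 12
```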
…ock contention

Two issues prevented conversation history from displaying when resuming
a session:

1. normalizeRawMessage dropped session-protocol user text messages when
   ENABLE_SESSION_PROTOCOL_SEND was disabled (the default). Since the CLI
   sends all messages via session protocol, every historical user message
   was silently discarded. Removed the flag-gated filtering so both legacy
   and session-protocol user messages are always normalised.

2. fetchMessages enqueued messages via scheduleQueuedMessagesProcessing
   which tried to acquire the same lock fetchMessages already held. The
   queue processing was deferred until after applyMessagesLoaded had
   already set isLoaded=true with an empty message list, causing a race
   where the UI briefly (or permanently) showed the empty state. Replaced
   enqueueMessages with a direct applyMessages call inside the lock so
   messages are in the store before isLoaded flips.
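Point 2 can be reproduced with a promise-chain mutex. In this sketch `AsyncLock`, `Store`, `loadBuggy` and `loadFixed` are hypothetical stand-ins for the app's lock, `applyMessages` and `applyMessagesLoaded`; the `messagesAtLoad` field records how many messages were present when `isLoaded` flipped.

```typescript
// Hypothetical reproduction of the race: isLoaded flips before the messages land.
// AsyncLock, Store, loadBuggy and loadFixed are illustrative stand-ins.

class AsyncLock {
  private tail: Promise<unknown> = Promise.resolve();

  // Runs fn after every previously queued holder has finished (not reentrant).
  run<T>(fn: () => Promise<T>): Promise<T> {
    const result = this.tail.then(fn);
    this.tail = result.catch(() => undefined);
    return result;
  }
}

interface Store {
  messages: string[];
  isLoaded: boolean;
  messagesAtLoad?: number; // how many messages the UI saw when isLoaded flipped
}

function markLoaded(store: Store): void {
  store.isLoaded = true;
  store.messagesAtLoad = store.messages.length;
}

// Before: processing is queued on the lock we already hold, so it runs only after release.
async function loadBuggy(lock: AsyncLock, store: Store, fetched: string[]): Promise<void> {
  await lock.run(async () => {
    void lock.run(async () => {
      store.messages.push(...fetched);
    });
    markLoaded(store); // isLoaded=true while the list is still empty
  });
}

// After: apply directly while holding the lock, then flip isLoaded.
async function loadFixed(lock: AsyncLock, store: Store, fetched: string[]): Promise<void> {
  await lock.run(async () => {
    store.messages.push(...fetched);
    markLoaded(store);
  });
}
```

Awaiting the nested `lock.run` instead of firing it off would turn the deferral into an outright deadlock, since the inner task waits for the outer one to release the lock.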

When resuming sessions with long histories, syncing all messages caused
excessive re-renders and scrolling. Now fetches only the latest 50
messages on initial load, with a "load older messages" button for
explicit pagination. Server supports reverse pagination via before_seq.
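On the client, the initial load and the "Load older messages" button reduce to two cursor-based calls tracked with the `hasOlderMessages`, `oldestLoadedSeq` and `isLoadingOlder` fields listed for `storage.ts`. This sketch assumes a hypothetical `FetchPage` transport that accepts `before_seq` and `limit` and returns messages in ascending `seq` order; `loadLatest` and `loadOlder` loosely mirror `fetchLatestMessages` and `fetchOlderMessages` but are not their actual signatures.

```typescript
// Hypothetical client-side sketch: latest page first, older pages on demand.
// FetchPage, loadLatest and loadOlder are illustrative, not the app's real API.

interface Msg {
  seq: number;
  text: string;
}

type FetchPage = (q: { before_seq?: number; limit: number }) => Promise<Msg[]>;

interface SessionMessages {
  messages: Msg[]; // ascending by seq
  hasOlderMessages: boolean;
  oldestLoadedSeq: number | null;
  isLoadingOlder: boolean;
}

const PAGE_SIZE = 50;

async function loadLatest(fetchPage: FetchPage): Promise<SessionMessages> {
  const page = await fetchPage({ limit: PAGE_SIZE });
  return {
    messages: page,
    hasOlderMessages: page.length === PAGE_SIZE, // a full page means there may be more
    oldestLoadedSeq: page.length > 0 ? page[0].seq : null,
    isLoadingOlder: false,
  };
}

// Handler for the "Load older messages" button.
async function loadOlder(state: SessionMessages, fetchPage: FetchPage): Promise<void> {
  if (state.isLoadingOlder || !state.hasOlderMessages || state.oldestLoadedSeq === null) return;
  state.isLoadingOlder = true; // ignore repeated taps while a request is in flight
  try {
    const page = await fetchPage({ before_seq: state.oldestLoadedSeq, limit: PAGE_SIZE });
    state.messages = [...page, ...state.messages];
    if (page.length > 0) state.oldestLoadedSeq = page[0].seq;
    state.hasOlderMessages = page.length === PAGE_SIZE;
  } finally {
    state.isLoadingOlder = false;
  }
}
```

Treating a full page as "there may be more" costs at most one extra empty request when the history length is an exact multiple of the page size.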