fix(sync): prevent App↔CLI sync permanent wedge #701
Open
seibe wants to merge 5 commits into slopus:main
…c failure
- Add maxRetries option to backoff() with BackoffGaveUpError
- InvalidateSync: support bounded retries, wedge detection, auto-recovery
- handleUpdate: await session refresh with 10s timeout when encryption missing, fall through to fetchMessages instead of silently dropping
- getMessagesSync: use maxRetries=30 to prevent infinite retry loops
- fetchMessages: add diagnostic counters (decryptFailed, normalizeFailed)
- normalizeRawMessage: log drop reasons for unexpected failures (Zod, missing uuid, missing turn) while silencing expected drops (meta, compact summary)
- Clean up verbose debug logs in handleUpdate
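The bounded-retry idea in this commit can be sketched as follows. This is a minimal illustration, not the code from the PR: only the names backoff, maxRetries and BackoffGaveUpError come from the commit message. The option names baseDelayMs, maxDelayMs and sleep, the exponential delay schedule, and the choice to count maxRetries as retries after the first attempt are all assumptions.

```typescript
// Hypothetical sketch of a bounded backoff helper. Only the names backoff,
// maxRetries and BackoffGaveUpError appear in the PR; everything else is assumed.
class BackoffGaveUpError extends Error {
    readonly attempts: number;
    readonly lastError: unknown;
    constructor(attempts: number, lastError: unknown) {
        super('backoff gave up after ' + attempts + ' attempts');
        // Keep instanceof working even when compiled to older targets.
        Object.setPrototypeOf(this, new.target.prototype);
        this.name = 'BackoffGaveUpError';
        this.attempts = attempts;
        this.lastError = lastError;
    }
}

interface BackoffOptions {
    maxRetries?: number;                    // undefined = retry forever
    baseDelayMs?: number;                   // assumed option
    maxDelayMs?: number;                    // assumed option
    sleep?: (ms: number) => Promise<void>;  // injectable so tests need not wait
}

async function backoff<T>(fn: () => Promise<T>, opts: BackoffOptions = {}): Promise<T> {
    const maxRetries = opts.maxRetries;
    const baseDelayMs = opts.baseDelayMs ?? 250;
    const maxDelayMs = opts.maxDelayMs ?? 5000;
    const sleep = opts.sleep ?? ((ms: number) => new Promise<void>(r => setTimeout(r, ms)));
    let attempts = 0;
    while (true) {
        try {
            return await fn();
        } catch (err) {
            attempts++;
            // In this sketch maxRetries counts retries after the first attempt,
            // so fn is called at most maxRetries + 1 times.
            if (maxRetries !== undefined && attempts > maxRetries) {
                throw new BackoffGaveUpError(attempts, err);
            }
            // Exponential delay, capped at maxDelayMs.
            await sleep(Math.min(maxDelayMs, baseDelayMs * Math.pow(2, attempts - 1)));
        }
    }
}
```

A caller can then distinguish "gave up" from other errors with an instanceof check and move the sync into a recoverable state instead of looping on a permanent failure such as a 404.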
…dges
Socket emitWithAck calls now have explicit timeouts (15-30s), and fetch gets a 60s abort controller. InvalidateSync gains maxRetries/wedge detection with automatic recovery on the next invalidate, and the server returns error callbacks for missing sessions.
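One way to put an upper bound on an acknowledgement or an HTTP request is sketched below. The helpers withTimeout, withAbortTimeout and TimeoutError are hypothetical names for illustration; the PR does not show how the timeouts are implemented, and Socket.IO also offers its own timeout mechanism that the real code may use instead.

```typescript
// Hypothetical helpers; names are assumptions, not taken from the PR.
class TimeoutError extends Error {
    constructor(message: string) {
        super(message);
        Object.setPrototypeOf(this, new.target.prototype);
        this.name = 'TimeoutError';
    }
}

// Rejects with TimeoutError if the promise has not settled within ms.
function withTimeout<T>(promise: Promise<T>, ms: number, label: string): Promise<T> {
    return new Promise<T>((resolve, reject) => {
        const timer = setTimeout(
            () => reject(new TimeoutError(label + ' timed out after ' + ms + 'ms')),
            ms,
        );
        promise.then(
            value => { clearTimeout(timer); resolve(value); },
            err => { clearTimeout(timer); reject(err); },
        );
    });
}

// Runs an abortable operation and aborts its signal after ms.
// The operation decides how to react to the abort (fetch rejects, for example).
async function withAbortTimeout<T>(
    run: (signal: AbortSignal) => Promise<T>,
    ms: number,
): Promise<T> {
    const controller = new AbortController();
    const timer = setTimeout(() => controller.abort(), ms);
    try {
        return await run(controller.signal);
    } finally {
        clearTimeout(timer);
    }
}
```

Under these assumptions, a 60s HTTP bound would look like withAbortTimeout(signal => fetch(url, { signal }), 60000), and a 15s bound on an acknowledgement like withTimeout(ackPromise, 15000, 'update-metadata').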
Ensure monotonic cursor advancement in fetchMessages and flushOutbox using Math.max to prevent socket update races from rewinding sessionLastSeq. Remove redundant fetchMessages invalidation from new-message handler — fast-path already handles the message, so the extra invalidation only widens the race window.
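The rewind race and the Math.max guard can be illustrated with a small sketch. SeqCursor is a hypothetical class; the PR only states that sessionLastSeq is advanced with Math.max in fetchMessages and flushOutbox.

```typescript
// Hypothetical per-session cursor store; only the Math.max rule comes from the PR.
class SeqCursor {
    private lastSeq = new Map<string, number>();

    // Advance the cursor for a session, never moving it backwards.
    advance(sessionId: string, seq: number): number {
        const current = this.lastSeq.get(sessionId) ?? 0;
        const next = Math.max(current, seq);
        this.lastSeq.set(sessionId, next);
        return next;
    }

    get(sessionId: string): number {
        return this.lastSeq.get(sessionId) ?? 0;
    }
}

const cursor = new SeqCursor();
// A socket update arrives first and moves the cursor to seq 42.
cursor.advance('session-a', 42);
// A slower HTTP response that only covered up to seq 40 completes afterwards.
// With plain assignment the cursor would rewind to 40; with Math.max it stays at 42.
cursor.advance('session-a', 40);
```

Keeping the comparison inside a single advance method means every writer, socket or HTTP, goes through the same rule.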
…ock contention
Two issues prevented conversation history from displaying when resuming a session:
1. normalizeRawMessage dropped session-protocol user text messages when ENABLE_SESSION_PROTOCOL_SEND was disabled (the default). Since the CLI sends all messages via session protocol, every historical user message was silently discarded. Removed the flag-gated filtering so both legacy and session-protocol user messages are always normalised.
2. fetchMessages enqueued messages via scheduleQueuedMessagesProcessing, which tried to acquire the same lock fetchMessages already held. The queue processing was deferred until after applyMessagesLoaded had already set isLoaded=true with an empty message list, causing a race where the UI briefly (or permanently) showed the empty state. Replaced enqueueMessages with a direct applyMessages call inside the lock so messages are in the store before isLoaded flips.
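The ordering problem in the second issue can be reproduced with a toy model. AsyncLock, Store, loadDeferred and loadDirect below are hypothetical stand-ins for illustration; they are not the app's lock or store, and the real functions do considerably more.

```typescript
// Hypothetical non-reentrant async lock: callbacks run strictly one after another.
class AsyncLock {
    private tail: Promise<void> = Promise.resolve();
    inLock<T>(fn: () => Promise<T> | T): Promise<T> {
        const run = this.tail.then(fn);
        // The next waiter starts once this callback settles, whether it succeeds or fails.
        this.tail = run.then(() => undefined, () => undefined);
        return run;
    }
}

interface Store {
    messages: string[];
    isLoaded: boolean;
    countWhenLoaded: number; // what the UI would see at the moment isLoaded flips
}

function emptyStore(): Store {
    return { messages: [], isLoaded: false, countWhenLoaded: -1 };
}

// Models the old behaviour: applying messages is queued on the lock we already hold,
// so it cannot run until this callback has finished and isLoaded is already true.
async function loadDeferred(lock: AsyncLock, store: Store, batch: string[]): Promise<void> {
    await lock.inLock(() => {
        void lock.inLock(() => { store.messages.push(...batch); });
        store.isLoaded = true;
        store.countWhenLoaded = store.messages.length;
    });
}

// Models the fix: apply messages directly while holding the lock, then flip isLoaded.
async function loadDirect(lock: AsyncLock, store: Store, batch: string[]): Promise<void> {
    await lock.inLock(() => {
        store.messages.push(...batch);
        store.isLoaded = true;
        store.countWhenLoaded = store.messages.length;
    });
}
```

In the deferred variant the store is marked loaded while it still holds zero messages; in the direct variant the messages are present before isLoaded becomes true.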
When resuming sessions with long histories, syncing all messages caused excessive re-renders and scrolling. Now fetches only the latest 50 messages on initial load, with a "load older messages" button for explicit pagination. Server supports reverse pagination via before_seq.
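The pagination behaviour described here can be sketched as a pure function. The parameter names before_seq and after_seq and the page size of 50 come from the PR; the function pageMessages, the Msg and Page shapes, the hasMore flag, the ascending result order and the error message are assumptions made for illustration.

```typescript
// Hypothetical in-memory model of the paging rules; the real route queries a database.
interface Msg { seq: number; body: string; }
interface PageQuery { after_seq?: number; before_seq?: number; limit?: number; }
interface Page { messages: Msg[]; hasMore: boolean; }

function pageMessages(all: Msg[], q: PageQuery): Page {
    // The two cursors select opposite directions, so specifying both is rejected.
    if (q.after_seq !== undefined && q.before_seq !== undefined) {
        throw new Error('after_seq and before_seq are mutually exclusive');
    }
    const limit = q.limit ?? 50;
    const sorted = [...all].sort((a, b) => a.seq - b.seq);

    if (q.after_seq !== undefined) {
        // Forward (incremental) page: the oldest `limit` messages newer than the cursor.
        const after = q.after_seq;
        const newer = sorted.filter(m => m.seq > after);
        return { messages: newer.slice(0, limit), hasMore: newer.length > limit };
    }

    // Reverse page: the newest `limit` messages older than the cursor.
    // With no cursor at all this yields the latest messages.
    const before = q.before_seq ?? Infinity;
    const older = sorted.filter(m => m.seq < before);
    const start = Math.max(0, older.length - limit);
    return { messages: older.slice(start), hasMore: start > 0 };
}
```

A client following this model would request the default page on initial load, remember the smallest seq it received, and pass that value as before_seq when the user asks for older messages, stopping once hasMore is false.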
Summary
Fixes a class of bugs where the App or CLI message sync could become permanently stuck ("wedged"), requiring a full app restart to recover.
Related issues
Addresses: #504, #297, #108, #136, #191, #587, #209, #358 — reports of messages not syncing, sessions stuck on "Waiting for messages...", real-time sync failures between devices.
Comparison with similar PRs
- … OutgoingMessageQueue by skipping unreleased items. This is a valid fix for one symptom (#639, "OutgoingMessageQueue head-of-line blocking causes 'one behind' responses with background tasks"), but does not address the underlying causes: unbounded retries, missing timeouts, cursor rewind races, and the app fetching all messages from seq 0. This PR addresses the full chain of failure modes that lead to permanent sync wedge, including server-side reverse pagination support.
- … before_seq parameter, mutual exclusion validation, proper test coverage, and integration with the sync engine's cursor management.
Root causes addressed
- InvalidateSync and backoff would retry forever on permanent failures (e.g. 404, decryption errors). Now they have configurable maxRetries and enter a recoverable "wedged" state that can be reset by external triggers like socket reconnect.
- emitWithAck and HTTP fetch calls could hang indefinitely. All now have explicit timeouts (15s socket, 30s RPC, 60s HTTP).
- sessionLastSeq could be rewound when a slower HTTP response completed after a faster socket update had already advanced the cursor. Now uses monotonic Math.max advancement.
- typesRaw.ts caused user messages to be dropped depending on the session protocol send flag state. Both legacy and modern user messages are now always normalized.
- update-metadata socket handler could silently return without calling back, leaving the client's emitWithAck hanging until the new timeout fires.
Changes by package
happy-app
- InvalidateSync: Add wedge state, maxRetries, onError callback, and wedge recovery on re-invalidation
- createBackoff: Add maxRetries option with BackoffGaveUpError typed error
- sync.ts: Refactor fetchMessages into fetchLatestMessages (reverse pagination) and fetchForwardMessages (incremental); add fetchOlderMessages for UI-driven backward pagination; monotonic seq advancement; bounded retries on all sync instances
- apiSocket.ts: Add 30s/15s timeouts on emitWithAck, 60s timeout on fetch with AbortController
- typesRaw.ts: Remove feature-flag gating on user message normalization
- ChatList.tsx: Add "Load older messages" button at list top
- storage.ts: Add hasOlderMessages, oldestLoadedSeq, isLoadingOlder fields
happy-cli
- InvalidateSync and createBackoff changes (wedge state, maxRetries)
- emitWithAck calls in apiSession.ts and apiMachine.ts
happy-server
- v3SessionRoutes.ts: Add before_seq reverse pagination parameter, make after_seq optional, reject both specified simultaneously
- v3SessionRoutes.test.ts: Add tests for reverse pagination, mutual exclusion, and latest-messages default
- sessionUpdateHandler.ts: Send error callback on missing session instead of silently returning
- eventRouter.ts: Add repeatKey field to feed post update type
Test plan
- emitWithAck timeout fires correctly on unresponsive server (15s for metadata, 30s for session RPC)