camel-salesforce: Fix race condition in channel subscription and improve reconnect advice handling #22874

Open
shaipan wants to merge 5 commits into apache:main from shaipan:salesforce-fix-connect-listener-race

@shaipan
Contributor

@shaipan shaipan commented Apr 30, 2026

Fix a race condition and improve reconnect behavior in the Streaming API subscription helper.

  • Synchronize clear() + addAll() on channelsToSubscribe in the handshake listener to prevent
    the connection listener from reading a partially updated set
  • Take a snapshot of channelsToSubscribe before subscribing in the connection listener to avoid
    concurrent modification
  • Handle all non-retry reconnect advice (not just "none") per the CometD/Bayeux spec, including
    on successful connect messages
  • Initiate handshake on temporary errors during connect to recover the session
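
The first two bullets can be sketched as follows. This is a minimal illustration, not the code in the PR: the class `PendingChannels` and its methods `resetTo` and `drain` are hypothetical names, and only the field names `channelsToSubscribe` and `channelsLock` are taken from the discussion.

```java
import java.util.ArrayList;
import java.util.Collection;
import java.util.List;
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical stand-in for the subscription helper; not the actual Camel class.
final class PendingChannels {
    private final Set<String> channelsToSubscribe = ConcurrentHashMap.newKeySet();
    private final Object channelsLock = new Object();

    // Handshake side: clear() + addAll() happen as one unit, so drain()
    // can never observe the set between the two calls.
    void resetTo(Collection<String> channels) {
        synchronized (channelsLock) {
            channelsToSubscribe.clear();
            channelsToSubscribe.addAll(channels);
        }
    }

    // Connect side: copy the pending channels and clear them as one unit,
    // then let the caller subscribe from the private snapshot.
    List<String> drain() {
        synchronized (channelsLock) {
            List<String> snapshot = new ArrayList<>(channelsToSubscribe);
            channelsToSubscribe.clear();
            return snapshot;
        }
    }
}
```

Iterating over the returned snapshot rather than the shared set means later changes to `channelsToSubscribe` cannot affect the loop that issues the subscriptions.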

@github-actions
Contributor

🌟 Thank you for your contribution to the Apache Camel project! 🌟
🤖 CI automation will test this PR automatically.

🐫 Apache Camel Committers, please review the following items:

  • First-time contributors require MANUAL approval for the GitHub Actions to run
  • You can use the command /component-test (camel-)component-name1 (camel-)component-name2.. to request a test from the test bot although they are normally detected and executed by CI.
  • You can label PRs using skip-tests and test-dependents to fine-tune the checks executed by this PR.
  • Build and test logs are available in the summary page. Only Apache Camel committers have access to the summary.

⚠️ Be careful when sharing logs. Review their contents before sharing them publicly.

@github-actions
Contributor

github-actions Bot commented Apr 30, 2026

🧪 CI tested the following changed modules:

  • components/camel-salesforce/camel-salesforce-component
All tested modules (10 modules)
  • Camel :: JBang :: MCP
  • Camel :: JBang :: Plugin :: Route Parser
  • Camel :: JBang :: Plugin :: TUI
  • Camel :: JBang :: Plugin :: Validate
  • Camel :: Launcher :: Container
  • Camel :: Salesforce
  • Camel :: Salesforce :: CodeGen
  • Camel :: Salesforce :: Maven Plugin
  • Camel :: YAML DSL :: Validator
  • Camel :: YAML DSL :: Validator Maven Plugin

⚙️ View full build and test results

Contributor

@gnodet gnodet left a comment

Thanks for the follow-up to #22840 — the race condition fix and reconnect advice improvements are directionally good. A few items to address before this can be merged:

1. Missing JIRA ticket

Per project conventions, PRs should reference a JIRA ticket with the format CAMEL-XXXX: Brief description in the title and branch name. Please create a JIRA ticket and update the PR title/branch accordingly.

2. No tests for changed behavior

The PR changes concurrency semantics (lock + snapshot) and reconnect advice handling (multiple new branches), but adds no tests. The existing SubscriptionHelperTest only covers replay ID logic.

At minimum, the reconnect advice logic could be unit-tested by invoking the connection listener with mocked messages carrying different advice values ("none", "handshake", "retry", null).

3. Incomplete lock coverage — potential lost update on channelsToSubscribe

The new channelsLock protects the compound operations in the handshake and connection listeners, but subscribe() at line 451 calls channelsToSubscribe.add(channelName) under a different lock (lock), not channelsLock. This creates a window where subscribe() adds a channel after the snapshot is taken but before clear() in the connection listener — the channel is added, then immediately cleared, and never subscribed.

Similarly, channelsToSubscribe.remove() at lines 204 and 266 is unprotected, though those are benign (removing from a cleared set is a no-op).

Consider either:

  • Using channelsLock consistently for all mutations of channelsToSubscribe, or
  • Documenting why the partial coverage is sufficient.

4. Behavioral change when advice is null on failed connect

The old code handshakes when advice == null OR reconnect == "none":

if (message.getAdvice() == null || "none".equals(message.getAdvice().get("reconnect")))

The new code only handshakes when reconnect is non-null and not "retry":

if (reconnectAdvice != null && !"retry".equals(reconnectAdvice))

When the server sends no advice at all and the error is not a temporary error, the old code would handshake but the new code does nothing. Could this regress certain failure scenarios? If this change is intentional per the Bayeux spec, it would be helpful to add a comment explaining the rationale.
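
To make the difference concrete, here is a small sketch that wraps the two quoted conditions in methods. The class and method names are hypothetical, and the way `reconnectAdvice` is derived from the advice map is an assumption, since the PR code that computes it is not quoted here.

```java
import java.util.Map;

// Hypothetical helper for comparing the two conditions; not part of the PR.
final class ReconnectAdviceCheck {

    // Condition quoted above as the old code.
    static boolean oldShouldHandshake(Map<String, Object> advice) {
        return advice == null || "none".equals(advice.get("reconnect"));
    }

    // Condition quoted above as the new code.
    // Assumption: reconnectAdvice is null when the message carries no advice.
    static boolean newShouldHandshake(Map<String, Object> advice) {
        Object reconnectAdvice = advice == null ? null : advice.get("reconnect");
        return reconnectAdvice != null && !"retry".equals(reconnectAdvice);
    }
}
```

Under that assumption the two conditions agree for "none" (both handshake) and "retry" (neither does), and differ in two cases: with no advice the old condition handshakes and the new one does not, and with "handshake" the new condition handshakes and the old one does not.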

5. Minor question — reconnect advice on successful connect

The new code handles non-retry reconnect advice on successful connect messages. Has this scenario been observed in practice with Salesforce's CometD server, or is it a defensive addition per the spec? Either way is fine, just curious.


Overall this is a solid improvement to the streaming subscription resilience. Addressing the above (especially items 2–4) would make it ready to merge. Thank you for the contribution!

Claude Code on behalf of Guillaume Nodet

@shaipan
Contributor Author

shaipan commented May 1, 2026

#1 (Missing JIRA ticket):

We can use this one https://issues.apache.org/jira/browse/CAMEL-23391

#2 (No tests):
The reconnect advice logic runs inside private CometD callbacks that require heavy mocking of BayeuxClient, the worker pool, and the component. The existing tests in SubscriptionHelperTest follow the same pattern: only static/package-private methods are covered. Happy to add tests in a separate PR if you can suggest a preferred approach for testing these listeners.

#3 (Incomplete lock coverage):
channelsToSubscribe is a ConcurrentHashMap.newKeySet() — individual operations like add() and remove() are already atomic and thread-safe. The channelsLock is only needed for compound operations (clear() + addAll(), snapshot + clear()) where atomicity of the pair matters. Wrapping add() in subscribe() with channelsLock would risk deadlock: subscribe() is called under the inherited lock, and the connection listener acquires channelsLock then calls subscribe() which acquires lock — a lock ordering inversion. The remove() calls are benign as you noted.

#4 (Null advice regression):
Good catch — fixed. Changed to reconnectAdvice == null || !"retry".equals(reconnectAdvice) which restores the original handshake-on-null-advice behavior. Added a comment explaining the Bayeux spec rationale.
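
For reference, a sketch of the fixed condition in isolation. The class and method names are hypothetical, and `reconnectAdvice` is assumed to be a `String` that is null when no advice is present.

```java
// Hypothetical wrapper around the fixed condition; not the PR's actual method.
final class FixedAdviceCheck {

    // The condition as written in the fix: handshake on missing advice
    // and on any advice other than "retry".
    static boolean shouldHandshake(String reconnectAdvice) {
        return reconnectAdvice == null || !"retry".equals(reconnectAdvice);
    }

    // Logically equivalent form: "retry".equals(null) is false, so the
    // explicit null check above only documents intent.
    static boolean shouldHandshakeSimplified(String reconnectAdvice) {
        return !"retry".equals(reconnectAdvice);
    }
}
```

Both forms return the same result for every input; keeping the explicit null check is a readability choice rather than a functional one.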

#5 (Successful connect advice):
Yes. We hit a production issue where the listener stopped working after receiving this message:
30-Apr-2026 13:37:30.774 GMT DEBUG [Camel (camel-1) thread #90 - SalesforceHttpClient]
[org.apache.camel.component.salesforce.internal.streaming.SubscriptionHelper]
[{}] - [CHANNEL:META_CONNECT]: {clientId=rc1evauvtfawssvkcmb7qse50uj, advice={reconnect=none},
channel=/meta/connect, id=82, successful=true}

@shaipan shaipan requested a review from gnodet May 1, 2026 05:26
…ubscribe() to prevent re-subscription on every connect cycle
@shaipan
Contributor Author

shaipan commented May 1, 2026

@gnodet During testing we discovered that subscribe() was adding the channel back to channelsToSubscribe via channelsToSubscribe.add(channelName). Since the connection listener takes a snapshot of channelsToSubscribe, clears it, and calls subscribe() for each consumer — subscribe() was re-adding the channel, causing the next connect cycle to find it non-empty and call subscribe() again. This stacked duplicate CometD listeners on the same channel, resulting in each event being delivered N times (once per stacked listener, where N = number of connect cycles since handshake).

The fix removes channelsToSubscribe.add(channelName) from subscribe(). This is safe because channelsToSubscribe is only meant to track channels pending subscription after a handshake — the handshake listener populates it from channelToConsumers.keySet(), which already includes the channel after channelToConsumers.computeIfAbsent(...).add(consumer).
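
The stacking effect can be reproduced with a small simulation that involves no CometD at all. Everything here is a hypothetical model of the behavior described above: the class `StackedListenerDemo`, the `reAddOnSubscribe` flag, and the list that stands in for registered listeners are illustrative only.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical model of the handshake/connect cycle; not the Camel implementation.
final class StackedListenerDemo {
    private final Set<String> channelsToSubscribe = ConcurrentHashMap.newKeySet();
    // One entry per listener registration, standing in for CometD listeners.
    private final List<String> listeners = new ArrayList<>();
    private final boolean reAddOnSubscribe;

    StackedListenerDemo(boolean reAddOnSubscribe) {
        this.reAddOnSubscribe = reAddOnSubscribe;
    }

    // Handshake marks the channel as pending subscription.
    void onHandshake(String channel) {
        channelsToSubscribe.clear();
        channelsToSubscribe.add(channel);
    }

    // Each connect cycle snapshots the pending set, clears it, and subscribes.
    void onConnect() {
        List<String> snapshot = new ArrayList<>(channelsToSubscribe);
        channelsToSubscribe.clear();
        for (String channel : snapshot) {
            subscribe(channel);
        }
    }

    void subscribe(String channel) {
        listeners.add(channel);
        if (reAddOnSubscribe) {
            // Models the removed channelsToSubscribe.add(channelName) call.
            channelsToSubscribe.add(channel);
        }
    }

    int listenerCount() {
        return listeners.size();
    }
}
```

With `reAddOnSubscribe` set to true, one handshake followed by three connect cycles leaves three registrations for the same channel; with it set to false, only the first connect cycle subscribes and the count stays at one.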
