
Conversation

chesnokoff (Contributor) commented:

Thank you for submitting the pull request to Apache Ignite.

To streamline the review of the contribution,
we ask you to ensure the following steps have been taken:

The Contribution Checklist

  • There is a single JIRA ticket related to the pull request.
  • The web-link to the pull request is attached to the JIRA ticket.
  • The JIRA ticket has the Patch Available state.
  • The pull request body describes changes that have been made.
    The description explains WHAT and WHY was done instead of HOW.
  • The pull request title is treated as the final commit message.
    The following pattern must be used: IGNITE-XXXX Change summary, where XXXX is the JIRA issue number.
  • A reviewer has been mentioned through the JIRA comments
    (see the Maintainers list).
  • The pull request has been checked by the TeamCity Bot and
    the green visa is attached to the JIRA ticket (see TC.Bot: Check PR).

Notes

If you need any help, please email [email protected] or ask for advice in the #ignite channel on http://asf.slack.com.

Comment on lines 113 to 99
Set<Byte> versions = ctx.discovery().allNodes().stream()
.map(node -> IgniteProductVersion.fromString(node.attribute(ATTR_BUILD_VER)).minor())
.collect(Collectors.toSet());
@chesnokoff (Contributor, Author) commented on Sep 5, 2025:

The previous patch used a field to store the available versions only the first time. But there are two problems:

  1. If the coordinator changes to a node with a higher version, it may accept a +2 version, so it is possible to end up with three different node versions in the cluster.
  2. If a +1 version node was accepted and later disconnected, and then we want to add a -1 version node, it will be rejected.

So we have to scan the available versions in discovery().allNodes() every time.
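
For illustration, a minimal sketch of what such a per-join check could look like; the method name isJoinAllowed and the one-minor-distance rule are assumptions for this example, not the exact patch code:

// Hypothetical sketch: recompute the allowed minor versions on every join attempt so the
// check always reflects the current topology (nodes may have left, or a new coordinator
// with a higher version may have been elected since the previous join).
// Assumes the usual imports: Set, Collectors, ClusterNode, IgniteProductVersion, ATTR_BUILD_VER.
private boolean isJoinAllowed(ClusterNode joiningNode) {
    byte joiningMinor = IgniteProductVersion
        .fromString(joiningNode.<String>attribute(ATTR_BUILD_VER))
        .minor();

    // Minor versions currently present in the cluster, recomputed on each call.
    Set<Byte> present = ctx.discovery().allNodes().stream()
        .map(n -> IgniteProductVersion.fromString(n.<String>attribute(ATTR_BUILD_VER)).minor())
        .collect(Collectors.toSet());

    // The joining node must stay within one minor version of every node already in the cluster.
    return present.stream().allMatch(m -> Math.abs(m - joiningMinor) <= 1);
}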

NavigableMap<Long, Collection<ClusterNode>> hist = updateTopologyHistory(topVer, top);

lsnr.onDiscovery(
IgniteFuture<?> fut = lsnr.onDiscovery(
@chesnokoff (Contributor, Author) commented on Sep 5, 2025:

The future is responsible for updating discovery().allNodes(), which is used in allowedMinorVersions, so we need to wait for it. Otherwise there is a race condition where we can accept both a +1 and a -1 version.
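
For context, a minimal sketch of waiting on that future at the call site; the notification variable is a placeholder, only the listener call itself comes from the diff above:

// Hypothetical sketch: block until the discovery listener has processed the event, so that
// discovery().allNodes() already reflects the joined node before the next join is validated.
IgniteFuture<?> fut = lsnr.onDiscovery(notification);

if (fut != null)
    fut.get();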

@chesnokoff (Contributor, Author) commented on Sep 18, 2025:

I have not implemented a test for this because I think there usually will be no race condition with discovery().allNodes(). It can be reproduced only on highly loaded systems, for instance when ten nodes with +1 and -1 versions try to connect to a slow master at the same time.

I think the long TCP communication will usually save us from the race, so a reproducer would be flaky.


String stackTrace = errors.toString();

assert stackTrace.contains("Remote node rejected due to incompatible version for cluster join");
Member commented:

Can be replaced with X.hasCause.
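
For illustration, a hedged sketch of that replacement; the exception variable err, the expected cause type and the message-matching overload of X.hasCause are assumptions:

// Hypothetical sketch: check the cause chain directly instead of scanning the collected
// stack trace text.
assertTrue(X.hasCause(err, "Remote node rejected due to incompatible version for cluster join",
    IgniteSpiException.class));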


assertTrue(waitForCondition(() -> Ignition.allGrids().size() == 2, getTestTimeout()));

startGrid(3, "2.19.0", false);
Member commented:

For RU you should reuse node ids: grid(0) is restarted with the new version.
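
For illustration only, a rough sketch of a rolling upgrade step that reuses the node index; startGrid(idx, version, isClient) is the helper visible in the surrounding diff, the rest is an assumption:

// Hypothetical sketch: emulate a rolling upgrade by restarting an existing node index with
// the newer version instead of adding a brand-new node index.
stopGrid(0);
startGrid(0, "2.19.0", false); // grid(0) comes back with the upgraded version.

assertTrue(waitForCondition(() -> Ignition.allGrids().size() == 2, getTestTimeout()));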


/** */
@Test
public void testRollingUpgrade0() throws Exception {
Member commented:

Let's add more tests (a rough sketch of the first case follows the list):

  1. Downgrade the cluster version
  2. Upgrade and then downgrade back
  3. Start the upgrade not from the coordinator node
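
As a rough illustration of the first case, assuming the same startGrid(idx, version, isClient) helper and placeholder versions:

/** Hypothetical sketch: the cluster is downgraded by one minor version, node by node. */
@Test
public void testRollingDowngrade() throws Exception {
    startGrid(0, "2.19.0", false);
    startGrid(1, "2.19.0", false);

    // Restart each node with the previous minor version, one at a time.
    stopGrid(1);
    startGrid(1, "2.18.0", false);

    stopGrid(0);
    startGrid(0, "2.18.0", false);

    assertTrue(waitForCondition(() -> Ignition.allGrids().size() == 2, getTestTimeout()));
}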

+ "Allowed versions for joining:\n"
+ " - " + locVer.major() + '.' + locVer.minor() + ".X\n"
+ " - " + locVer.major() + '.' + (locVer.minor() + 1) + ".X\n"
+ " - " + locVer.major() + '.' + (locVer.minor() - 1) + ".X";
Member commented:

In case the cluster has 2.18 (locVer) and 2.19 nodes, we can't accept 2.17, while this log message promises that we can.
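
For illustration, a hedged sketch of deriving the advertised versions from every version present in the cluster rather than from locVer alone; the `present` set and the variable names are assumptions:

// Hypothetical sketch: compute the allowed minor range from all versions present in the
// cluster, so the message never advertises a version that would actually be rejected.
// Assumes java.util.Collections and that 'present' holds the minor versions of all cluster nodes.
byte minPresent = Collections.min(present);
byte maxPresent = Collections.max(present);

// A joining node must stay within one minor version of every node already in the cluster.
int minAllowed = maxPresent - 1;
int maxAllowed = minPresent + 1;

StringBuilder allowed = new StringBuilder("Allowed versions for joining:\n");

for (int minor = minAllowed; minor <= maxAllowed; minor++)
    allowed.append(" - ").append(locVer.major()).append('.').append(minor).append(".X\n");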

boolean withClients
) throws Exception {
startGrid(0, acceptedVer1, false);
startGrid(1, acceptedVer2, withClients);
Member commented:

Let's have a cluster with multiple server nodes. In this case different client nodes can be connected to different server nodes. We should check that the versions of all client nodes participate in the check.
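
For illustration, a rough sketch of such a topology; the versions and node counts are placeholders, while startGrid(idx, version, isClient) and waitForCondition come from the surrounding test code:

// Hypothetical sketch: several server nodes with clients of different versions attached, so
// the version check has to account for the versions seen on every server node, not only on
// the coordinator.
startGrid(0, "2.18.0", false); // server
startGrid(1, "2.18.0", false); // server
startGrid(2, "2.19.0", true);  // client on a newer version
startGrid(3, "2.18.0", true);  // client on the base version

assertTrue(waitForCondition(() -> Ignition.allGrids().size() == 4, getTestTimeout()));

// A 2.17 node must now be rejected because the 2.19 client already participates in the
// version check; the rejection can be asserted the same way as in the existing tests.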

* Test grids starting with non compatible release types.
* Test Rolling Upgrade release types.
*/
@WithSystemProperty(key = "IGNITE.ROLLING.UPGRADE.VERSION.CHECK", value = "true")
Contributor commented:

Let's use OsDiscoveryNodeValidationProcessor#IGNITE_ROLLING_UPGRADE_VERSION_CHECK as a key.
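
For illustration, how the annotation could reference the constant, assuming a static import (the package path is an assumption):

// Hypothetical sketch: reference the constant instead of repeating the string literal.
import static org.apache.ignite.internal.processors.nodevalidation.OsDiscoveryNodeValidationProcessor.IGNITE_ROLLING_UPGRADE_VERSION_CHECK;

@WithSystemProperty(key = IGNITE_ROLLING_UPGRADE_VERSION_CHECK, value = "true")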

}

/** Checks that the third grid is not compatible. */
private void testConflictVersions(String acceptedVer1, String acceptedVer2, String rejVer, boolean withClients) {
Contributor commented:

withClients -> isClient

}

/** Tests that starting a node with rejected version fails with remote rejection. */
private void testConflictVersions(String acceptedVer, String rejVer, boolean withClient) {
@wernerdv (Contributor) commented on Sep 11, 2025:

withClient -> isClient

public class OsDiscoveryNodeValidationProcessor extends GridProcessorAdapter implements DiscoveryNodeValidationProcessor {
/** Enables version check for rolling upgrade. */
@SystemProperty(value = "Enables version check for rolling upgrade.")
public static final String IGNITE_ROLLING_UPGRADE_VERSION_CHECK = "IGNITE.ROLLING.UPGRADE.VERSION.CHECK";
Contributor commented:

Underscore is used as a separator in system properties across the Ignite codebase. Please fix the string literal: IGNITE_ROLLING_UPGRADE_VERSION_CHECK instead of IGNITE.ROLLING.UPGRADE.VERSION.CHECK.
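
For clarity, the declaration with the corrected literal would read as follows (only the string value changes):

/** Enables version check for rolling upgrade. */
@SystemProperty(value = "Enables version check for rolling upgrade.")
public static final String IGNITE_ROLLING_UPGRADE_VERSION_CHECK = "IGNITE_ROLLING_UPGRADE_VERSION_CHECK";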

@chesnokoff (Contributor, Author) commented on Sep 19, 2025:

Initially, I found LocalDeploymentSpi#IGNITE_DEPLOYMENT_ADDITIONAL_CHECK:

/** Enables additional check for resource name on resources removal. */
@SystemProperty(value = "Enables an additional check of a resource name on resources removal")
public static final String IGNITE_DEPLOYMENT_ADDITIONAL_CHECK = "IGNITE.DEPLOYMENT.ADDITIONAL.CHECK";

That's why I used dots. But now I see that other properties use underscores. Resolved

@chesnokoff force-pushed the ignite-25534 branch 2 times, most recently from c4fef5e to 8539410 on October 1, 2025 at 12:09.