IGNITE-25534 Update Ignite versions validation for cluster #12301
Conversation
Set<Byte> versions = ctx.discovery().allNodes().stream()
    .map(node -> IgniteProductVersion.fromString(node.attribute(ATTR_BUILD_VER)).minor())
    .collect(Collectors.toSet());
The previous patch used a field that stored the available version only the first time. But there are two problems:
- If the coordinator changes to a node with a higher version, it may accept a +2 version, so the cluster could end up holding three different node versions.
- If a +1 version node was accepted and then disconnected, a -1 version node that we later want to add will be rejected.
So we have to scan the available versions in discovery().allNodes() every time.
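To make the rule concrete, here is a minimal self-contained sketch, assuming the intended invariant is that a joining node's minor version must stay within one of every minor version already present in the cluster (the class and method names are illustrative, not the PR's actual API):

import java.util.Set;

public class MinorVersionCheckSketch {
    /** Hypothetical helper: true if joiningMinor is within one minor of every existing minor. */
    static boolean canJoin(Set<Byte> clusterMinors, byte joiningMinor) {
        for (byte minor : clusterMinors) {
            if (Math.abs(minor - joiningMinor) > 1)
                return false;
        }

        return true;
    }

    public static void main(String[] args) {
        // Cluster already runs minors 18 and 19 (the three-version scenario above).
        Set<Byte> minors = Set.of((byte)18, (byte)19);

        System.out.println(canJoin(minors, (byte)17)); // false: two minors away from 19
        System.out.println(canJoin(minors, (byte)20)); // false: two minors away from 18
        System.out.println(canJoin(minors, (byte)19)); // true
    }
}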
NavigableMap<Long, Collection<ClusterNode>> hist = updateTopologyHistory(topVer, top);

-    lsnr.onDiscovery(
+    IgniteFuture<?> fut = lsnr.onDiscovery(
The future is responsible for updating discovery().allNodes(), which is used in allowedMinorVersions, so we need to wait for it. Otherwise there is a race condition where we can accept both a +1 and a -1 version.
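The race is generic: if the validator reads a shared topology view that a listener updates asynchronously, two concurrent joins can both pass validation against the stale view. A self-contained illustration of the fix, i.e. waiting on the listener's future before the next validation; all names here are illustrative, not Ignite internals:

import java.util.Set;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

public class ListenerFutureSketch {
    /** Shared view of cluster minor versions, updated by an asynchronous listener. */
    static final Set<Byte> minors = ConcurrentHashMap.newKeySet();

    static final ExecutorService listenerPool = Executors.newSingleThreadExecutor();

    /** Applies a join event asynchronously and returns a future, like onDiscovery does. */
    static CompletableFuture<Void> onJoin(byte minor) {
        return CompletableFuture.runAsync(() -> minors.add(minor), listenerPool);
    }

    public static void main(String[] args) {
        minors.add((byte)18);

        // Without join(), a concurrent validation could still see only {18}
        // and wrongly accept both 17 and 19. Waiting closes that window.
        onJoin((byte)19).join();

        // With the view up to date, 17 is correctly rejected (two minors from 19).
        System.out.println(minors.stream().allMatch(m -> Math.abs(m - 17) <= 1)); // false

        listenerPool.shutdown();
    }
}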
I have not implemented a test for this, because I think there will usually be no race condition with discovery().allNodes(). It can be reproduced only on highly loaded systems, for instance when ten nodes with +1 and -1 versions try to connect to a slow coordinator at the same time. I think the long TCP communication will usually save us from the race, so a reproducer would be flaky.
String stackTrace = errors.toString();

assert stackTrace.contains("Remote node rejected due to incompatible version for cluster join");
Can be replaced with X.hasCause
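A sketch of the suggested replacement, assuming the caught exception is available as e; the overload of X#hasCause taking a message fragment and the IgniteSpiException cause class are assumptions to verify against the actual failure:

assertTrue(X.hasCause(e, "Remote node rejected due to incompatible version for cluster join",
    IgniteSpiException.class));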
assertTrue(waitForCondition(() -> Ignition.allGrids().size() == 2, getTestTimeout()));

startGrid(3, "2.19.0", false);
For RU you should reuse node IDs: grid(0) is restarted with the new version.
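A sketch of that step, reusing the startGrid(int, String, boolean) helper from this patch (the stopGrid call and version string are illustrative):

// Simulate a rolling upgrade step: restart an existing node under the new
// version instead of adding a fourth node.
stopGrid(0);
startGrid(0, "2.19.0", false);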
/** */
@Test
public void testRollingUpgrade0() throws Exception {
Let's add more tests (see the sketch after this list for the second scenario):
- Downgrading the cluster version
- Upgrading and then downgrading back
- Starting the upgrade from a node other than the coordinator
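A hedged sketch of the upgrade-then-downgrade case, reusing the startGrid(int, String, boolean) and stopGrid helpers; version strings and the final assertion are illustrative:

/** Upgrades one node to the next minor version, then rolls it back. */
@Test
public void testUpgradeThenDowngrade() throws Exception {
    startGrid(0, "2.18.0", false);
    startGrid(1, "2.18.0", false);

    // Upgrade node 1 to the next minor version.
    stopGrid(1);
    startGrid(1, "2.19.0", false);

    // Roll node 1 back to the original version; this must still be accepted.
    stopGrid(1);
    startGrid(1, "2.18.0", false);

    assertEquals(2, Ignition.allGrids().size());
}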
+ "Allowed versions for joining:\n" | ||
+ " - " + locVer.major() + '.' + locVer.minor() + ".X\n" | ||
+ " - " + locVer.major() + '.' + (locVer.minor() + 1) + ".X\n" | ||
+ " - " + locVer.major() + '.' + (locVer.minor() - 1) + ".X"; |
In case the cluster has 2.18 (locVer) and 2.19 versions, we can't accept 2.17, while this log message promises it.
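A self-contained sketch of how the message could be derived from the cluster-wide range instead of locVer alone, assuming a joining minor must be within one of both the minimum and maximum minors already present:

import java.util.Collections;
import java.util.Set;

public class AllowedVersionsMessageSketch {
    public static void main(String[] args) {
        Set<Byte> versions = Set.of((byte)18, (byte)19); // minors present in the cluster
        int major = 2;

        byte minMinor = Collections.min(versions);
        byte maxMinor = Collections.max(versions);

        // The allowed interval is [maxMinor - 1, minMinor + 1].
        StringBuilder allowed = new StringBuilder("Allowed versions for joining:\n");

        for (int m = maxMinor - 1; m <= minMinor + 1; m++)
            allowed.append(" - ").append(major).append('.').append(m).append(".X\n");

        System.out.print(allowed); // lists 2.18.X and 2.19.X only, not 2.17.X
    }
}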
    boolean withClients
) throws Exception {
    startGrid(0, acceptedVer1, false);
    startGrid(1, acceptedVer2, withClients);
Let's have a cluster with multiple server nodes. In that case different client nodes can be connected to different server nodes, and we should check that the versions of all client nodes participate in the check.
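A sketch of such a topology, reusing startGrid(int, String, boolean) and assuming its last argument makes the node start as a client; versions are illustrative:

// Two servers on different versions, each with a client on its version.
startGrid(0, "2.18.0", false);
startGrid(1, "2.19.0", false);
startGrid(2, "2.18.0", true);
startGrid(3, "2.19.0", true);

// Starting a 2.17 server must now fail: the 2.19 client nodes already pin
// the upper bound of the allowed range, even though a 2.18 server exists.
startGrid(4, "2.17.0", false); // expected to be rejected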
 * Test grids starting with incompatible release types.
 * Test Rolling Upgrade release types.
 */
@WithSystemProperty(key = "IGNITE.ROLLING.UPGRADE.VERSION.CHECK", value = "true")
Let's use OsDiscoveryNodeValidationProcessor#IGNITE_ROLLING_UPGRADE_VERSION_CHECK as the key.
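That is, a sketch of the annotation referencing the constant instead of the string literal:

@WithSystemProperty(key = OsDiscoveryNodeValidationProcessor.IGNITE_ROLLING_UPGRADE_VERSION_CHECK, value = "true")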
}

/** Checks that the third grid is not compatible. */
private void testConflictVersions(String acceptedVer1, String acceptedVer2, String rejVer, boolean withClients) {
withClients -> isClient
}

/** Tests that starting a node with a rejected version fails with remote rejection. */
private void testConflictVersions(String acceptedVer, String rejVer, boolean withClient) {
withClient -> isClient
public class OsDiscoveryNodeValidationProcessor extends GridProcessorAdapter implements DiscoveryNodeValidationProcessor {
    /** Enables version check for rolling upgrade. */
    @SystemProperty(value = "Enables version check for rolling upgrade.")
    public static final String IGNITE_ROLLING_UPGRADE_VERSION_CHECK = "IGNITE.ROLLING.UPGRADE.VERSION.CHECK";
Underscores are used as separators in system properties across the Ignite codebase. Please fix this in the string literal: IGNITE_ROLLING_UPGRADE_VERSION_CHECK instead of IGNITE.ROLLING.UPGRADE.VERSION.CHECK.
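The corrected declaration would then read:

@SystemProperty(value = "Enables version check for rolling upgrade.")
public static final String IGNITE_ROLLING_UPGRADE_VERSION_CHECK = "IGNITE_ROLLING_UPGRADE_VERSION_CHECK";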
Initially, I found LocalDeploymentSpi#IGNITE_DEPLOYMENT_ADDITIONAL_CHECK:

/** Enables additional check for resource name on resources removal. */
@SystemProperty(value = "Enables an additional check of a resource name on resources removal")
public static final String IGNITE_DEPLOYMENT_ADDITIONAL_CHECK = "IGNITE.DEPLOYMENT.ADDITIONAL.CHECK";

That's why I used dots. But now I see that other properties use underscores. Resolved.