SYSTEM PRESHUTDOWN command for graceful shutdown swarm node #852

ianton-ru · 2025-06-10T21:16:03Z

Changelog category (leave one):

New Feature

Changelog entry (a user-readable short description of the changes that goes to CHANGELOG.md):

SYSTEM PRESHUTDOWN command for graceful shutdown of a swarm node

Documentation entry for user-facing changes

Solved #759
New command SYSTEM PRESHUTDOWN.
Scenario:
We want to scale down a swarm cluster. On the node we want to shut down, we call SYSTEM PRESHUTDOWN; after that, the node stops accepting new distributed commands. It can still process objects whose processing started before SYSTEM PRESHUTDOWN. Once all those objects have been processed successfully, we can kill the node without any errors or lost data in the responses on the initiator.

After SYSTEM PRESHUTDOWN on swarm node:

unregister the node from autodiscovery clusters, if it is registered in any
stop taking new tasks for objectStorageCluster-family functions (s3Cluster/icebergCluster/etc.)
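The node-side behavior can be illustrated with a minimal sketch. It assumes a single atomic flag that is checked before a new task is taken; `SwarmNodeSketch`, `preShutdown`, and `takeNextTask` are hypothetical names used only for illustration and are not the classes touched by this PR (the diff itself checks `getContext()->isPreShutdownCalled()`).

```cpp
#include <atomic>
#include <cassert>
#include <optional>
#include <queue>
#include <string>

// Hypothetical stand-in for a swarm node; this is not the ClickHouse Context class.
class SwarmNodeSketch
{
public:
    // Mirrors the effect of SYSTEM PRESHUTDOWN: from now on, refuse new tasks.
    void preShutdown() { preshutdown_called.store(true); }

    bool isPreShutdownCalled() const { return preshutdown_called.load(); }

    // Takes the next object name from the pending queue, unless the node is draining.
    // Objects handed out earlier are not affected and can still be finished.
    std::optional<std::string> takeNextTask(std::queue<std::string> & pending)
    {
        if (isPreShutdownCalled() || pending.empty())
            return std::nullopt;

        std::string task = pending.front();
        pending.pop();
        return task;
    }

private:
    std::atomic<bool> preshutdown_called{false};
};
```

In this sketch, a task taken before `preShutdown()` stays with the node, while every later call to `takeNextTask` returns nothing and leaves the remaining objects in the queue for other nodes.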

On initiator node:

for all distributed requests with the setting skip_unavailable_shards=true, an unexpected socket close is treated as legal if no data packets were received before it. This also allows shutting down a node that is not part of an autodiscovery cluster.
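The initiator-side rule can be sketched as a small decision function. This is an assumption-laden illustration, not the code in RemoteQueryExecutorReadContext.cpp: `ReplicaReadState`, `EofAction`, and `onUnexpectedEof` are hypothetical names, and the real change works with exceptions and packet types rather than a plain struct.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical per-connection state kept by the initiator.
struct ReplicaReadState
{
    bool skip_unavailable_shards = false;   // value of the query setting
    std::size_t data_packets_received = 0;  // data packets already accepted from this node
};

enum class EofAction
{
    TreatReplicaAsUnavailable,  // the close is tolerated, the node is skipped
    FailQuery                   // the close is reported as an error
};

// Decides what to do when the socket to a swarm node closes unexpectedly.
EofAction onUnexpectedEof(const ReplicaReadState & state)
{
    // Tolerate the close only if the setting allows skipping shards
    // and nothing from this node has reached the result yet.
    if (state.skip_unavailable_shards && state.data_packets_received == 0)
        return EofAction::TreatReplicaAsUnavailable;

    return EofAction::FailQuery;
}
```

The check on received data packets is what keeps the response complete: once part of a node's data has been accepted, dropping the node silently would lose rows.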

Exclude tests:

ilejn · 2025-06-11T08:21:41Z

src/Storages/ObjectStorage/StorageObjectStorageSource.cpp

 {
+    if (getContext()->isPreShutdownCalled())


Is it true that operations prohibited in the PreShutdown phase (e.g. getting new tasks) are allowed in the Shutdown phase?
If yes, is it correct?

As I understand it, ClickHouse does not have a specific shutdown phase. SYSTEM SHUTDOWN just calls kill(0, SIGTERM). Without PRESHUTDOWN this causes an error on the initiator and leaves already taken tasks unfinished.

Then what is the purpose of the shutdown_called flag?

At first glance I would expect that all checks that are true for preshutdown_called should be true for shutdown_called as well.

Ah, understood, the flag is set in the destructor.
Yes, it makes sense to set preshutdown there too.

arthurpassos · 2025-06-11T11:04:25Z

As I understand, regular queries would still be processed. Literally the only thing that's stopped is the "swarm node" work. Wouldn't it make more sense to rename the command to something more meaningful? (e.g UNREGISTER FROM SWARM)

This is just a question, not a change request

ianton-ru · 2025-06-12T08:04:16Z

As I understand, regular queries would still be processed. Literally the only thing that's stopped is the "swarm node" work. Wouldn't it make more sense to rename the command to something more meaningful? (e.g UNREGISTER FROM SWARM)

It's a topic for discussion.
For now I don't have other use cases where we need and can do a graceful shutdown from the ClickHouse side (shutting down the main node is not such a case, because the errors occur on the client side, in code that is outside our control. It may be possible to add a special error "New query can't be executed, node prepared to shutdown", but it is useless without support on the client side). But if we get one, I think it is better to have a single PRESHUTDOWN command to prepare for shutdown instead of several commands such as UNREGISTER FROM SWARM, STOP DOING SOMETHING, CLOSE SOMETHING ELSE, because all these hypothetical commands make sense only when called together. It is better to add new logic under the PRESHUTDOWN command.

ianton-ru · 2025-06-23T11:18:23Z

Need to make something like "system stop swarm/system start swarm"

ianton-ru · 2025-06-27T14:09:13Z

SYSTEM STOP SWARM MODE
SYSTEM START SWARM MODE
SYSTEM SHOW SWARM MODE

Co-authored-by: Davit Mnatobishvili <[email protected]>

ianton-ru · 2025-07-03T13:57:08Z

SYSTEM STOP SWARM MODE
SYSTEM START SWARM MODE
SELECT value FROM system.metrics WHERE metric='IsSwarmModeEnabled'

arthurpassos · 2025-07-03T14:20:34Z

src/Interpreters/Context.cpp

 void Context::shutdown() TSA_NO_THREAD_SAFETY_ANALYSIS
 {
    shared->shutdown();
 }

+void Context::stopSwarmMode()


The individual operations of this function are atomic, but the function itself is not. Can't this cause problems?

For instance, consider two STOP/START queries running in different threads:

thread1 -> stop swarm mode
metrics updated, it is now false
context switch

thread2 -> start swarm mode
metrics updated, it is now true
swarm mode enabled

thread 1 -> resume
swarm mode disabled

The resulting state is: swarm mode boolean = false, metrics = true.
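One possible way to rule out the interleaving described above is to update the flag and the metric under the same lock, so that no other thread can observe or interleave with a half-applied transition. The sketch below is only an illustration of that idea: `SwarmModeSketch` and its members are hypothetical names, and a plain `int` stands in for the real metric.

```cpp
#include <cassert>
#include <mutex>

// Hypothetical holder of the swarm mode flag and its mirrored metric.
class SwarmModeSketch
{
public:
    // Returns false if swarm mode was already stopped.
    bool stop()
    {
        std::lock_guard<std::mutex> lock(mutex);
        if (!enabled)
            return false;
        enabled = false;
        metric = 0;  // updated in the same critical section as the flag
        return true;
    }

    // Returns false if swarm mode was already running.
    bool start()
    {
        std::lock_guard<std::mutex> lock(mutex);
        if (enabled)
            return false;
        enabled = true;
        metric = 1;
        return true;
    }

    bool isEnabled() const
    {
        std::lock_guard<std::mutex> lock(mutex);
        return enabled;
    }

    int metricValue() const
    {
        std::lock_guard<std::mutex> lock(mutex);
        return metric;
    }

private:
    mutable std::mutex mutex;
    bool enabled = true;
    int metric = 1;  // stand-in for a metric such as IsSwarmModeEnabled
};
```

With this shape, concurrent STOP and START calls are serialized, so the flag and the metric always agree once a call returns. The boolean return value also lets the caller skip side effects (for example, unregistering from clusters) when the state did not actually change.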

arthurpassos · 2025-07-03T14:23:42Z

src/Interpreters/InterpreterSystemQuery.cpp

@@ -693,6 +693,20 @@ BlockIO InterpreterSystemQuery::execute()
        case Type::START_MOVES:
            startStopAction(ActionLocks::PartsMove, true);
            break;
+        case Type::STOP_SWARM_MODE:


Perhaps this should also be a single atomic operation?

arthurpassos · 2025-07-03T14:27:07Z

src/QueryPipeline/RemoteQueryExecutorReadContext.cpp

-            read_context.packet.type = read_context.executor.getConnections().receivePacketTypeUnlocked(async_callback);
-            read_context.has_read_packet_part = PacketPart::Type;
-            suspend_callback();
+            if (e.code() == ErrorCodes::ATTEMPT_TO_READ_AFTER_EOF


Could you please add a comment explaining what ErrorCodes::ATTEMPT_TO_READ_AFTER_EOF usually means and why you are catching it?

The same for the below if statement

arthurpassos · 2025-07-03T14:29:33Z

src/Interpreters/Context.cpp

@@ -4500,6 +4502,21 @@ std::shared_ptr<Cluster> Context::tryGetCluster(const std::string & cluster_name
    return res;
 }

+void Context::unregisterInDynamicClusters()


Sorry for my ignorance.

What does "dynamic" cluster actually mean?

arthurpassos

The structure looks ok, we just need to sort out the atomic thing and I need to understand the dynamic cluster thing

ianton-ru added antalya antalya-25.3 labels Jun 11, 2025

ilejn reviewed Jun 11, 2025


SYSTEM PRESHUTDOWN to allow graceful shutdown node

9642c42

ianton-ru force-pushed the feature/system_preshutdown branch from a842ce6 to 9642c42 Compare June 11, 2025 09:17

ianton-ru and others added 3 commits June 12, 2025 12:26

Fix tests

42c201c

I hate coroutines

49cf8bf

Merge branch 'antalya-25.3' into feature/system_preshutdown

b75ce28

Enmk added antalya-25.3.3 swarms Antalya Roadmap: Swarms labels Jun 19, 2025

Change PRESUTDOWN on STOP SWARM command

8310a92

Enmk and others added 15 commits July 3, 2025 13:21

Using Altinity's branding instead of upstream's

5c3c509

Updated packages metadata

27ff656

Minor fixups

9a66f65

Fixed maintainer information in packages

40f4765

Compact favicon

5476f61

Altinity branding and colors for play.html

86cac4f

Minor: Typo fix + other small changes

00c3954

Update github-repo

d3df0ad

Co-authored-by: Davit Mnatobishvili <[email protected]>

Update binary.html

a7254d2

Co-authored-by: Davit Mnatobishvili <[email protected]>

lock_object_storage_task_distribution_ms setting

e4c48fa

Remove timeouts

d19ead0

Faster test

b998184

Moved changes to 25.3 section

e820d88

Fix after review

5105455

Allow data and metadata by different paths

14d42eb

ianton-ru added 3 commits July 3, 2025 13:21

Cut bucket from path

870025d

Iceberg catalog with S3 tables

cb0bf72

Dirty workaround to resolve correct endpoint in HEAD requests

eed67f8

svb-alt removed the antalya-25.3.3 label Jul 3, 2025

ianton-ru and others added 2 commits July 3, 2025 16:06

IsSwarmModeEnabled metric

5f00c10

Merge branch 'antalya-25.3' into feature/system_preshutdown

650b0f0

arthurpassos reviewed Jul 3, 2025
