Skip to content

Fixes accumulo-cluster script behavior:#5730

Merged
kevinrr888 merged 4 commits into
apache:2.1from
kevinrr888:2.1-feature-5698
Jul 24, 2025
Merged

Fixes accumulo-cluster script behavior:#5730
kevinrr888 merged 4 commits into
apache:2.1from
kevinrr888:2.1-feature-5698

Conversation

@kevinrr888

Copy link
Copy Markdown
Member

"accumulo-cluster start" would briefly return control back to command line, continue execution, then hang but everything seemed to start as expected. Cause was background tasks were started and not waited on before exiting, but would still complete after exit. So, functionality wasn't wrong, it just was strange behavior/confusing to see. Now wait for those tasks to complete.

closes #5698

"accumulo-cluster start" would briefly return control back to command
line, continue execution, then hang but everything seemed to start as
expected. Cause was background tasks were started and not waited on
before exiting, but would still complete after exit. So, functionality
wasn't wrong, it just was strange behavior/confusing to see. Now wait
for those tasks to complete.

closes apache#5698
@kevinrr888 kevinrr888 added this to the 2.1.4 milestone Jul 9, 2025
@kevinrr888 kevinrr888 self-assigned this Jul 9, 2025
Comment thread assemble/bin/accumulo-cluster Outdated

@ctubbsii ctubbsii left a comment

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It would be better to wait on subprocesses forked from the current script.

A subshell can be created by wrapping a command in parens.

for host in "${hosts[@]}"; do
  (ssh $host ....)
done

The script can be forked so the subshells can be run concurrently, using & to follow a command:

for host in "${hosts[@]}"; do
  (ssh $host ....) &
done

You can cause the script itself to wait on all forked child processes by calling the wait command:

for host in "${hosts[@]}"; do
  (ssh $host ....) &
done
wait

This would not rely on any ps output or grep pattern, and would just rely on child processes forked by this process.

Here's an example that executes two child processes, and finishes the second one first, and waits on both:

(sleep 10; echo "  first") &  (sleep 5; echo "  second") &  echo waiting...; wait; echo finished

Comment thread assemble/bin/accumulo-cluster Outdated
@kevinrr888 kevinrr888 requested a review from ctubbsii July 14, 2025 16:06
Comment thread assemble/bin/accumulo-cluster
# call debug to print the command only, or execute asynchronously if debug is off
function debugOrRun() {
debug "$(quote "$@")" || "$@"
debug "$(quote "$@")" || ("$@") &

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not suggesting a change for this PR, just sharing something I was looking into. Was wondering if we could wait where when the number of children is greater than some threshold. Found this while looking into that https://stackoverflow.com/a/21996387. Calling pgrep -c each time we start a new process would be N^2 type behavior though because all prev procs a listed each time a new one is added.

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We could keep a count and wait in batches, but it's hard to know how big the count should be, and making it configurable is a pain and adds complexity. I think that we should remember that these scripts are a cluster management reference implementation. If a user has a more complicated case that requires such batching, they can either run this command repeatedly with non-overlapping subsets of the servers to start processes on, or they can use alternate scripts centered around something like pssh or pdsh that has those kinds of features. I would hold off on this until we get feedback from users and determine that we actually need to have it.

@ctubbsii ctubbsii left a comment

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good. A few small suggestions.

Comment thread assemble/bin/accumulo-cluster Outdated
Comment thread assemble/bin/accumulo-cluster
# call debug to print the command only, or execute asynchronously if debug is off
function debugOrRun() {
debug "$(quote "$@")" || "$@"
debug "$(quote "$@")" || ("$@") &

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We could keep a count and wait in batches, but it's hard to know how big the count should be, and making it configurable is a pain and adds complexity. I think that we should remember that these scripts are a cluster management reference implementation. If a user has a more complicated case that requires such batching, they can either run this command repeatedly with non-overlapping subsets of the servers to start processes on, or they can use alternate scripts centered around something like pssh or pdsh that has those kinds of features. I would hold off on this until we get feedback from users and determine that we actually need to have it.

kevinrr888 and others added 2 commits July 24, 2025 09:43
Co-authored-by: Christopher Tubbs <ctubbsii@apache.org>

@ctubbsii ctubbsii left a comment

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good. I verified that the -o option can be used multiple times, because I wasn't 100% sure about that, but I tested it with variations of ssh -o PreferredAuthentications=password -o BatchMode=yes localhost and both options were effective.

@kevinrr888 kevinrr888 merged commit 188fc54 into apache:2.1 Jul 24, 2025
8 checks passed
@kevinrr888 kevinrr888 deleted the 2.1-feature-5698 branch July 24, 2025 19:27
kevinrr888 added a commit that referenced this pull request Jul 24, 2025
* Fixes accumulo-cluster script behavior:

"accumulo-cluster start" would briefly return control back to command
line, continue execution, then hang but everything seemed to start as
expected. Cause was background tasks were started and not waited on
before exiting, but would still complete after exit. So, functionality
wasn't wrong, it just was strange behavior/confusing to see. Now wait
for those tasks to complete.

closes #5698

---------

Co-authored-by: Christopher Tubbs <ctubbsii@apache.org>
@kevinrr888

Copy link
Copy Markdown
Member Author

Some notes for documentation:

See 8cd70d1 for changes made in main that are different from what was done in 2.1:

  • removes ssh_wait function from the script in main... replaced with wait
  • updates some comments to more accurately reflect the new async functionality

Same as changes for 2.1:

  • debugOrRun renamed to debugOrRunAsync
  • SSH flag changes

Also verified that we are still only calling debugOrRunAsync within control_services... Wasn't sure if this changed from 2.1 to main.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants