Fixes accumulo-cluster script behavior: #5730
"accumulo-cluster start" would briefly return control to the command line, continue executing, and then appear to hang, even though everything started as expected. The cause was that background tasks were started but not waited on before the script exited, so they still completed after exit. The functionality was not wrong, but the behavior was confusing to see. The script now waits for those tasks to complete. Closes apache#5698
It would be better to wait on subprocesses forked from the current script.
A subshell can be created by wrapping a command in parens.
    for host in "${hosts[@]}"; do
      (ssh $host ....)
    done

The script can be forked so the subshells can be run concurrently, using & to follow a command:

    for host in "${hosts[@]}"; do
      (ssh $host ....) &
    done

You can cause the script itself to wait on all forked child processes by calling the wait command:

    for host in "${hosts[@]}"; do
      (ssh $host ....) &
    done
    wait

This would not rely on any ps output or grep pattern, and would just rely on child processes forked by this process.
Here's an example that executes two child processes, and finishes the second one first, and waits on both:

    (sleep 10; echo " first") & (sleep 5; echo " second") & echo waiting...; wait; echo finished

    # call debug to print the command only, or execute asynchronously if debug is off
    function debugOrRun() {
    -  debug "$(quote "$@")" || "$@"
    +  debug "$(quote "$@")" || ("$@") &
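To illustrate how the changed debugOrRun line combines with a final wait, here is a minimal sketch. The debug and quote helpers below are hypothetical stand-ins (the real ones are not shown in this conversation), and run_demo and the DEBUG variable are assumptions made only for this example.

```shell
#!/usr/bin/env bash
# Hypothetical stand-ins for the script's helpers; the real implementations
# are not part of this conversation.
DEBUG=false
function debug() { [[ $DEBUG == true ]] && echo "DEBUG: $*"; }
function quote() { printf '%q ' "$@"; }

# Print the command when debugging; otherwise run it in a background subshell.
function debugOrRun() {
  debug "$(quote "$@")" || ("$@") &
}

# Start two background commands, then block until both children have exited.
function run_demo() {
  debugOrRun bash -c 'sleep 2; echo first'
  debugOrRun bash -c 'sleep 1; echo second'
  echo "waiting..."
  wait
  echo "finished"
}

run_demo
```

Because the function runs in the calling shell, the background subshells are children of the script itself, so a bare wait at the end is enough to keep the script from exiting early.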
Not suggesting a change for this PR, just sharing something I was looking into. I was wondering if we could wait when the number of children is greater than some threshold. I found this while looking into that: https://stackoverflow.com/a/21996387. Calling pgrep -c each time we start a new process would be N^2-type behavior, though, because all previous processes are listed each time a new one is added.
We could keep a count and wait in batches, but it's hard to know how big the count should be, and making it configurable is a pain and adds complexity. I think that we should remember that these scripts are a cluster management reference implementation. If a user has a more complicated case that requires such batching, they can either run this command repeatedly with non-overlapping subsets of the servers to start processes on, or they can use alternate scripts centered around something like pssh or pdsh that has those kinds of features. I would hold off on this until we get feedback from users and determine that we actually need to have it.
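For reference, the "keep a count and wait in batches" idea could look roughly like the sketch below. This is not part of the PR; BATCH_SIZE, run_in_batches, and fake_start are hypothetical names, and fake_start only simulates the per-host work that the real script would do over ssh.

```shell
#!/usr/bin/env bash
# Hypothetical batch size; as noted above, a good value is hard to choose.
BATCH_SIZE=3

# Run the command named by $1 once per remaining argument,
# with at most BATCH_SIZE children running at a time.
function run_in_batches() {
  local cmd=$1
  shift
  local count=0 item
  for item in "$@"; do
    ("$cmd" "$item") &
    count=$((count + 1))
    if ((count >= BATCH_SIZE)); then
      wait # block until every child in this batch has exited
      count=0
    fi
  done
  wait # pick up the final, possibly partial, batch
}

# Stand-in for the real per-host work (for example, an ssh call).
function fake_start() {
  sleep 1
  echo "started $1"
}

run_in_batches fake_start host1 host2 host3 host4 host5
```

Keeping a plain counter avoids listing processes on every iteration, but a whole batch must finish before the next one starts, which is one reason tools such as pssh or pdsh may be a better fit for that use case.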
ctubbsii left a comment
Looks good. A few small suggestions.
Co-authored-by: Christopher Tubbs <ctubbsii@apache.org>
ctubbsii left a comment
Looks good. I verified that the -o option can be used multiple times, because I wasn't 100% sure about that, but I tested it with variations of ssh -o PreferredAuthentications=password -o BatchMode=yes localhost and both options were effective.
* Fixes accumulo-cluster script behavior: "accumulo-cluster start" would briefly return control back to command line, continue execution, then hang but everything seemed to start as expected. Cause was background tasks were started and not waited on before exiting, but would still complete after exit. So, functionality wasn't wrong, it just was strange behavior/confusing to see. Now wait for those tasks to complete. closes #5698 --------- Co-authored-by: Christopher Tubbs <ctubbsii@apache.org>
Some notes for documentation: See 8cd70d1 for changes made in main that are different from what was done in 2.1:
Same as changes for 2.1:
Also verified that we are still only calling