Description
In some cases, running `gce --quit-soon <name>` does not delete the VM after a firework completes. I would expect a worker to check the `quit` metadata attribute after each firework completes, but it looks like `rapidfire` can launch many fireworks before control returns to the metadata check here:
borealis/borealis/fireworker.py, lines 185 to 208 in d24b972:

```python
rocket_launcher.rapidfire(
    self.launchpad, self.fireworker, strm_lvl=self.strm_lvl,
    max_loops=1, sleep_time=self.sleep_secs)

# Idle to the max.
idled = self.sleep_secs  # rapidfire() just slept once
while not self.launchpad.run_exists(self.fireworker):  # none ready to run
    future_work = self.launchpad.future_run_exists(self.fireworker)  # any ready or waiting?
    if idled >= (self.idle_for_waiters if future_work else self.idle_for_rockets):
        return 'idle'

    req = gcp.instance_attribute('quit')
    if req == 'soon' or req == 'when-idle':
        return '"quit={}" request'.format(req)

    FW_CONSOLE_LOGGER.debug(
        'Sleeping for %s secs waiting for launchable rockets',
        self.sleep_secs)
    time.sleep(self.sleep_secs)
    idled += self.sleep_secs

req = gcp.instance_attribute('quit')
if req == 'soon':
    return '"quit={}" request'.format(req)
```
I think `nlaunches=1` should be passed to `rapidfire` so that it returns after launching a single firework and we can then check the quit metadata. As far as I can tell, `rapidfire` launches as many ready rockets as it can, since it appears to skip the loop check whenever more fireworks are ready:
https://github.com/materialsproject/fireworks/blob/6cb2a66d35239611ec2a1ccb807be38976198a0b/fireworks/core/rocket_launcher.py#L107-L126
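To illustrate the behavior I suspect, here is a toy model. It is not the FireWorks or borealis code: `toy_rapidfire`, `worker_loop`, and `demo` are hypothetical names, and the only thing borrowed from the real API is the idea that `nlaunches=0` means "no per-call launch limit". The sketch assumes the quit request arrives while the first task is running.

```python
from collections import deque


def toy_rapidfire(ready, nlaunches=0):
    """Toy stand-in for rapidfire(): run queued tasks back to back.

    Stops early only when nlaunches > 0 and that many tasks have run.
    """
    launched = 0
    while ready:
        task = ready.popleft()
        task()
        launched += 1
        if nlaunches > 0 and launched == nlaunches:
            break
    return launched


def worker_loop(ready, quit_requested, nlaunches=0):
    """Call toy_rapidfire() repeatedly, checking the quit flag between calls."""
    total = 0
    while ready:
        total += toy_rapidfire(ready, nlaunches=nlaunches)
        if quit_requested():
            return total, 'quit'
    return total, 'idle'


def demo(nlaunches):
    """Queue 5 tasks; the quit request becomes visible once the first task has run."""
    state = {'done': 0}

    def task():
        state['done'] += 1

    ready = deque([task] * 5)
    return worker_loop(ready, lambda: state['done'] >= 1, nlaunches=nlaunches)


print(demo(0))  # no launch limit: all queued tasks run before the quit check
print(demo(1))  # one launch per call: the quit check runs after the first task
```

In this model, the unlimited call drains the whole queue before the quit flag is ever read, while the limited call returns after one task, which is the behavior I would expect from `--quit-soon`.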
Is the expectation to check for the metadata after each firework or to let rapidfire launch as many as it wants before checking?