gce --quit-soon does not have expected behavior #6

@tahorst

Running gce --quit-soon <name> does not always delete the VM after a firework has completed. I would expect a worker to check the metadata status after each firework completes, but it looks like rapidfire can launch many fireworks before returning for the metadata check here:

    rocket_launcher.rapidfire(
        self.launchpad, self.fireworker, strm_lvl=self.strm_lvl,
        max_loops=1, sleep_time=self.sleep_secs)
    # Idle to the max.
    idled = self.sleep_secs  # rapidfire() just slept once
    while not self.launchpad.run_exists(self.fireworker):  # none ready to run
        future_work = self.launchpad.future_run_exists(self.fireworker)  # any ready or waiting?
        if idled >= (self.idle_for_waiters if future_work else self.idle_for_rockets):
            return 'idle'
        req = gcp.instance_attribute('quit')
        if req == 'soon' or req == 'when-idle':
            return '"quit={}" request'.format(req)
        FW_CONSOLE_LOGGER.debug(
            'Sleeping for %s secs waiting for launchable rockets',
            self.sleep_secs)
        time.sleep(self.sleep_secs)
        idled += self.sleep_secs
    req = gcp.instance_attribute('quit')
    if req == 'soon':
        return '"quit={}" request'.format(req)

I think the arg nlaunches=1 should be passed to rapidfire so that it exits after launching only one firework and we can then check for the quit metadata. As written, I think rapidfire will launch as many waiting rockets as it can, since it looks like it skips the loop check whenever more fireworks are ready.
https://github.com/materialsproject/fireworks/blob/6cb2a66d35239611ec2a1ccb807be38976198a0b/fireworks/core/rocket_launcher.py#L107-L126
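
To illustrate the difference I mean, here is a small standalone simulation. It does not call FireWorks or gcp at all: run_worker, launches_per_check, and quit_after_first are hypothetical names made up for this sketch, and the deque is only a stand-in for the rockets that are ready to run. It assumes the quit metadata gets set while the first firework is running.

```python
from collections import deque


def run_worker(queue, read_quit, launches_per_check=None):
    """Simulate the worker loop. Returns (exit reason, launched firework ids).

    launches_per_check=None mimics a launcher that drains everything that is
    ready before returning; launches_per_check=1 mimics returning after a
    single launch so the quit flag can be checked in between.
    """
    launched = []
    while queue:
        # Stand-in for one call to the launcher (hypothetical, not rapidfire itself).
        count = 0
        while queue and (launches_per_check is None or count < launches_per_check):
            launched.append(queue.popleft())
            count += 1
        # Stand-in for reading the 'quit' instance attribute after the call returns.
        if read_quit(len(launched)) == 'soon':
            return '"quit=soon" request', launched
    return 'idle', launched


def quit_after_first(num_completed):
    """Pretend the quit flag was set while the first firework was running."""
    return 'soon' if num_completed >= 1 else None


drain_reason, drain_launched = run_worker(deque([1, 2, 3, 4]), quit_after_first)
single_reason, single_launched = run_worker(
    deque([1, 2, 3, 4]), quit_after_first, launches_per_check=1)
print(drain_launched, single_launched)
```

With the drain behavior all four fireworks run before the flag is ever read, while with one launch per check the worker stops after the first. Whether nlaunches=1 gives exactly the second behavior in the real rapidfire is my reading of the linked code, not something I have tested.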

Is the expectation to check for the metadata after each firework or to let rapidfire launch as many as it wants before checking?
