fix(core): eliminate TOCTOU in MachineManager::create by holding write lock#151
fix(core): eliminate TOCTOU in MachineManager::create by holding write lock#151
Conversation
There was a problem hiding this comment.
Pull request overview
This PR addresses a race in MachineManager::create() where concurrent creates using the same machine name could both pass an existence check and then overwrite each other on insert, potentially leaving orphaned VM resources/directories.
Changes:
- Acquire the
machineswrite lock at the start ofcreate()to make the existence check + insert atomic. - Reuse the same write guard for the final
insert()to avoid the read-then-write TOCTOU window.
| // Hold the write lock for the entire create operation to prevent TOCTOU | ||
| // races: without this, two concurrent creates with the same name could | ||
| // both pass the existence check before either inserts. | ||
| let mut machines = self.machines.write().map_err(|_| CoreError::LockPoisoned)?; | ||
|
|
There was a problem hiding this comment.
create() now holds a std::sync::RwLock write guard while doing filesystem I/O (create_dir_all), VM creation, and persistence writes. That can block all readers/writers of machines (e.g., list/get/start) for the entire duration of a potentially slow operation, impacting daemon responsiveness. Consider reserving the name under the lock (e.g., track an in-progress set or insert a temporary "creating" entry) and then releasing the lock while performing I/O, or using an atomic filesystem reservation (e.g., create_dir/lockfile with create_new) to avoid the TOCTOU without a long-lived global write lock.
| // Hold the write lock for the entire create operation to prevent TOCTOU | ||
| // races: without this, two concurrent creates with the same name could | ||
| // both pass the existence check before either inserts. | ||
| let mut machines = self.machines.write().map_err(|_| CoreError::LockPoisoned)?; | ||
|
|
||
| if machines.contains_key(&config.name) { | ||
| return Err(CoreError::already_exists(config.name)); |
There was a problem hiding this comment.
This change is specifically about preventing concurrent creates from duplicating/overwriting machines, but there’s no test asserting the new behavior. Since this file already has async tests, please add a concurrency test that runs two create() calls with the same name in parallel and asserts exactly one succeeds (the other returns already-exists) and that only one machine is registered/persisted.
Summary
create()instead of read-then-writeTest plan
arcbox machine createfrom CLI + desktop doesn't duplicateCloses ABX-230