
core/allocator: heap_free_all should periodically yield the allocator lock #599

@nwf

Description

At the moment, the entirety of heap_free_all runs under a single acquire/release of the allocator lock:

```c++
__cheriot_minimum_stack(0x1a0) ssize_t
heap_free_all(AllocatorCapability heapCapability)
{
	STACK_CHECK(0x1a0);
	LockGuard g{lock};
	auto     *capability = malloc_capability_unseal(heapCapability);
	if (capability == nullptr)
	{
		Debug::log<DebugLevel::Warning>("Invalid heap capability {}",
		                                heapCapability);
		return -EPERM;
	}
	auto      chunk   = gm->heapStart.cast<MChunkHeader>();
	ptraddr_t heapEnd = chunk.top();
	ssize_t   freed   = 0;
	do
	{
		if (chunk->is_in_use() && !chunk->isSealedObject)
		{
			auto size = chunk->size_get();
			if (heap_free_chunk(
			      *capability, *chunk, gm->chunk_body_size(*chunk)) == 0)
			{
				freed += size;
			}
		}
		chunk = static_cast<MChunkHeader *>(chunk->cell_next());
	} while (chunk.address() < heapEnd);
	// If there are any threads blocked allocating memory, wake them up.
	if ((freeFutex > 0) && (freed > 0))
	{
		Debug::log("Some threads are blocking on allocations, waking them");
		freeFutex = 0;
		freeFutex.notify_all();
	}
	return freed;
}
```

While it's true that our heaps will never have gazillions of objects in them, this is nevertheless a source of long-tail latency, which we make some effort to avoid elsewhere (for example, our O(1) de-quarantining per heap operation).

I believe heap_free_all can safely drop and reacquire the heap lock if it first takes an ephemeral claim on the chunk it is currently processing. (The thread currently running heap_free_all by definition will have no ephemeral claims that this risks clobbering: heap_free_all is a cross-compartment call, and so any ephemeral claims have been shed.) Because the allocator lock is held, the hazard epoch is necessarily both stable and even, and so there's no need for the full heap_claim_ephemeral machinery: just store the pointer to the chunk body into one of the slots returned by switcher_thread_hazard_slots.
No, that doesn't work: re-acquiring the lock might cross-call into the scheduler, at which point we lose ephemeral claims. Rats. We could, however, because we are the allocator, take a non-ephemeral claim on the object with an internal "infinite" quota. That's a bit more work, but only a bit.
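As a generic illustration of the shape of the fix (not the CHERIoT allocator API), the idea is: every few chunks, take a claim on the chunk being processed, drop the lock so waiters can make progress, then reacquire it and drop the claim. The sketch below uses `std::mutex` and a per-chunk claim counter as stand-ins for the allocator lock and the internal non-ephemeral claim; all names here are hypothetical.

```cpp
#include <cassert>
#include <list>
#include <mutex>

// Hypothetical chunk: a claim count stands in for the allocator's internal
// non-ephemeral claim, which keeps the chunk valid across a lock yield.
struct Chunk
{
	int  claims = 0;
	bool inUse  = true;
};

std::mutex    allocatorLock;
constexpr int YieldInterval = 8;

size_t free_all(std::list<Chunk> &heap)
{
	std::unique_lock g{allocatorLock};
	size_t           freed      = 0;
	int              sinceYield = 0;
	for (auto it = heap.begin(); it != heap.end(); ++it)
	{
		if (++sinceYield == YieldInterval)
		{
			sinceYield = 0;
			// Claim the current chunk so it remains valid while the lock
			// is dropped, yield the lock, then reacquire and unclaim.
			it->claims++;
			g.unlock();
			g.lock();
			it->claims--;
		}
		if (it->inUse)
		{
			it->inUse = false;
			freed++;
		}
	}
	return freed;
}
```

In the real allocator, the post-yield step would also need to cope with the heap having changed while the lock was dropped (the claimed chunk is the only thing guaranteed to survive), so the traversal would resume from the claimed chunk rather than trusting any other cached state.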

Whether or not we should wake sleeping threads as part of this lock yield is a good question, but that can be a follow-up issue. (Maybe we should be recording something about how much progress needs to be made before we wake the thundering herd?)
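One possible shape for that follow-up, sketched generically (none of these names exist in the allocator today): record the largest blocked request, and only notify waiters once at least that much has been freed since the last wake, rather than waking the herd on every free.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical progress gate: waiters record how much they need; the free
// path only signals a wake once that much has been freed cumulatively.
struct WakeGate
{
	size_t neededBytes    = 0; // Largest blocked request (0: nobody waiting)
	size_t freedSinceWake = 0;

	// Called by an allocation path before it blocks.
	void record_wait(size_t bytes)
	{
		if (bytes > neededBytes)
		{
			neededBytes = bytes;
		}
	}

	// Called after freeing memory; returns true when waiters should be woken.
	bool note_freed(size_t bytes)
	{
		if (neededBytes == 0)
		{
			return false;
		}
		freedSinceWake += bytes;
		if (freedSinceWake >= neededBytes)
		{
			neededBytes = freedSinceWake = 0;
			return true;
		}
		return false;
	}
};
```

This keeps the wake decision O(1) per free, at the cost of one word of state per heap, and degenerates to the current behaviour if `record_wait` is always called with 1.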
