Description
While investigating some intermittent deadlocks being observed in a process that is part of a suite of software I help maintain, I think I ended up tracking down a potential deadlock.
(Partially due to how pointer hashes could collide for the 'spinlocks' used in a few places... but I'm not sure any hash algorithm would ensure this couldn't be hit, just help reduce the rate it happens at.)
The process seems to only lock up during initial startup. If it gets through initial startup, it seems to continue chugging along without any hiccups. The rabbit hole I'm about to describe uses this fact as the basis for some assumptions about which code paths are taken.
Stack trace of the deadlocked thread (this is a multi-threaded application, but none of the other threads are really interesting to look at):
Thread 1 (Thread 24814.24814):
#0 0x0000ffffa4ba0120 in clock_nanosleep ()
#1 0x0000ffffa4ba570c in nanosleep ()
#2 0x0000ffffa4ba55e4 in sleep ()
#3 0x0000ffffa4d4d9e0 in ?? ()
#4 0x0000ffffa4d4dec4 in objc_sync_enter ()
#5 0x0000ffffa4d3e6dc in ?? ()
#6 0x0000ffffa4d46b34 in objc_msg_lookup_sender ()
#7 0x0000ffffa4d4bfe0 in objc_retain ()
#8 0x0000ffffa4d4eb64 in objc_getProperty ()
#9 0x0000ffffa14a2fa0 in -[SomeObj start] (self=0xaaaafd5ab608, _cmd=<optimized out>)
... remaining trace isn't very relevant
So it looks like, inside objc_getProperty, we attempt to retain while inside a spin_lock(0xaaaafd5ab608 + propOffset):
OBJC_PUBLIC id objc_getProperty(id obj, SEL _cmd, ptrdiff_t offset, BOOL isAtomic) {
// ...
char *addr = (char*)obj;
addr += offset;
// ...
if (isAtomic) { // in our case, `isAtomic` is `true`
volatile int *lock = lock_for_pointer(addr);
lock_spinlock(lock); // locked on lock_for_pointer(obj + offset)
ret = *(id*)addr;
ret = objc_retain(ret); // <==== we're inside here @ bt #7
// ...
At this point we've taken the lock_spinlock(0xaaaafd5ab608 + propOffset). At #6, objc_msg_lookup_sender() calls the objc_msg_lookup_internal() version, and since this deadlock only happens during initial startup, I'm assuming we're hitting the 'uninstalled dtable' path:
static __attribute__((always_inline)) struct objc_slot2 *objc_msg_lookup_internal(id *receiver, SEL selector, uint64_t *version) {
// ...
struct objc_slot2 * result = objc_dtable_lookup(class->dtable, selector->index);
if (UNLIKELY(0 == result)) {
dtable_t dtable = dtable_for_class(class);
/* Install the dtable if it hasn't already been initialized. */
if (dtable == uninstalled_dtable) {
objc_send_initialize(*receiver); // <===== `receiver == (0xaaaafd5ab608 + propOffset)`
// ...
Inside objc_send_initialize(), we essentially hit a @synchronized() block:
OBJC_PUBLIC void objc_send_initialize(id object) {
Class class = classForObject(object);
// ...
Class meta = class->isa;
// ...
LOCK_OBJECT_FOR_SCOPE((id)meta); // <=== essentially `meta == [(0xaaaafd5ab608 + propOffset).class]->isa`
// ...
}
Essentially, LOCK_OBJECT_FOR_SCOPE() is a macro that runs objc_sync_enter() / objc_sync_exit() automatically for the current lexical scope, like a @synchronized block would.
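As a rough sketch of what I mean (this is just the general scope-cleanup pattern with made-up helper names, not necessarily the runtime's actual definition of the macro):

/* Sketch only: scope-based locking via the GCC/Clang cleanup attribute.
 * Assumes the runtime's declarations of id / objc_sync_enter / objc_sync_exit
 * are already in scope. */
static inline void release_sync_lock(id *object)
{
    /* Runs automatically when the annotated variable leaves its scope. */
    objc_sync_exit(*object);
}
#define LOCK_OBJECT_FOR_SCOPE(object) \
    __attribute__((cleanup(release_sync_lock))) \
    id lock_object_for_scope = (object); \
    objc_sync_enter(lock_object_for_scope)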
So now we're at #4, and we attempt to either look up or create the lock for that object:
OBJC_PUBLIC int objc_sync_enter(id object) {
if ((object == 0) || isSmallObject(object)) { return 0; }
struct reference_list *list = referenceListForObject(object, YES); // <== looking up the `reference_list` can cause a `lock_spinlock()`
// to be taken on the `object` when `create == YES`
// ...
}
Once referenceListForObject() is called with create == YES, then if the reference_list->lock needs to be initialized, we take another spin lock, this time essentially on [(0xaaaafd5ab608 + propOffset).class]->isa:
static struct reference_list* referenceListForObject(id object, BOOL create) {
if (class_isMetaClass(object->isa)) {
Class cls = (Class)object;
if ((NULL == cls->extra_data) && create) {
volatile int *lock = lock_for_pointer(cls);
// ...
lock_spinlock(lock); // <==== maybe here?
// ...
}
return cls->extra_data;
}
Class hiddenClass = findHiddenClass(object);
if ((NULL == hiddenClass) && create) {
volatile int *lock = lock_for_pointer(object);
lock_spinlock(lock); // <==== maybe here?
// ...
}
return hiddenClass ? object_getIndexedIvars(hiddenClass) : NULL;
}
In either code path where the lock needs to be initialized, we're essentially now taking a second lock_for_pointer(), and I'm thinking that in some cases the pointer hashes collide, making us lock_spinlock() on the same spinlock for both pointers... and in turn, deadlocking the main thread.
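To make the collision idea concrete, here's a standalone toy I put together (the table size, hash function, and scanned address range are all made up for illustration; the runtime's real lock_for_pointer() will differ, but any fixed-size table of spinlocks keyed on a pointer hash admits collisions like this):

#include <inttypes.h>
#include <stdint.h>
#include <stdio.h>

/* Toy stand-in for the runtime's spinlock table: a fixed number of lock
 * words, selected by hashing a pointer.  Hash and size are invented here. */
#define LOCK_TABLE_SIZE 128
static volatile int spinlocks[LOCK_TABLE_SIZE];

static volatile int *lock_for_pointer(void *ptr)
{
    uintptr_t p = (uintptr_t)ptr;
    return &spinlocks[(p ^ (p >> 7)) % LOCK_TABLE_SIZE];
}

int main(void)
{
    /* The property address from the backtrace (obj + propOffset). */
    void *propertyAddr = (void *)0xaaaafd5ab608;
    volatile int *held = lock_for_pointer(propertyAddr);

    /* Scan nearby heap-looking addresses for one whose lock slot collides
     * with the lock we already hold.  If the metaclass happened to land at
     * such an address, the thread would spin on a lock it already owns. */
    for (uintptr_t p = 0xaaaafd590000; p < 0xaaaafd5a0000; p += 8)
    {
        if (lock_for_pointer((void *)p) == held)
        {
            printf("0x%" PRIxPTR " collides with %p\n", p, propertyAddr);
            break;
        }
    }
    return 0;
}

With a table like that, once the thread already holds the slot for (obj + propOffset), a second lock_spinlock() on any colliding pointer from the same thread never returns, which matches the backtrace above.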
Unfortunately, I haven't figured out a good solution for this one yet. But I at least wanted to get this issue posted, in case anyone else has seen weird behavior with atomic properties, or this sparks any ideas on how to fix the issue.