Description
While investigating some intermittent deadlocks being observed in a process that is part of a suite of software I help maintain, I think I ended up tracking down a potential deadlock.
(Partially due to how pointer hashes could collide for the 'spinlocks' used in a few places... but I'm not sure any hash algorithm would ensure this couldn't be hit, just help reduce the rate it happens at.)
The process seems to only lock up during initial startup. If it gets through initial startup, it seems to continue chugging along without any hiccups. The rabbit hole I'm about to describe uses this fact as the basis for some assumptions about which code paths are taken.
Stack trace of the deadlocked thread (this is a multi-threaded application, but none of the other threads are really interesting to look at):
Thread 1 (Thread 24814.24814):
#0 0x0000ffffa4ba0120 in clock_nanosleep ()
#1 0x0000ffffa4ba570c in nanosleep ()
#2 0x0000ffffa4ba55e4 in sleep ()
#3 0x0000ffffa4d4d9e0 in ?? ()
#4 0x0000ffffa4d4dec4 in objc_sync_enter ()
#5 0x0000ffffa4d3e6dc in ?? ()
#6 0x0000ffffa4d46b34 in objc_msg_lookup_sender ()
#7 0x0000ffffa4d4bfe0 in objc_retain ()
#8 0x0000ffffa4d4eb64 in objc_getProperty ()
#9 0x0000ffffa14a2fa0 in -[SomeObj start] (self=0xaaaafd5ab608, _cmd=<optimized out>)
... remaining trace isn't very relevant
So it looks like, inside objc_getProperty, we attempt to retain while inside a spin_lock(0xaaaafd5ab608 + propOffset):
OBJC_PUBLIC id objc_getProperty(id obj, SEL _cmd, ptrdiff_t offset, BOOL isAtomic) {
// ...
char *addr = (char*)obj;
addr += offset;
// ...
if (isAtomic) { // in our case, `isAtomic` is `true`
volatile int *lock = lock_for_pointer(addr);
lock_spinlock(lock); // locked on lock_for_pointer(obj + offset)
ret = *(id*)addr;
ret = objc_retain(ret); // <==== we're inside here @ bt #7
// ...
At this point we've taken the lock_spinlock(0xaaaafd5ab608 + propOffset). At #6, objc_msg_lookup_sender() calls the objc_msg_lookup_internal() version, and since this deadlock only happens during initial startup, I'm assuming we're hitting the 'uninstalled dtable' path:
static __attribute__((always_inline)) struct objc_slot2 *objc_msg_lookup_internal(id *receiver, SEL selector, uint64_t *version) {
// ...
struct objc_slot2 * result = objc_dtable_lookup(class->dtable, selector->index);
if (UNLIKELY(0 == result)) {
dtable_t dtable = dtable_for_class(class);
/* Install the dtable if it hasn't already been initialized. */
if (dtable == uninstalled_dtable) {
objc_send_initialize(*receiver); // <===== `receiver == (0xaaaafd5ab608 + propOffset)`
// ...
Inside objc_send_initialize(), we essentially hit a @synchronized() block:
OBJC_PUBLIC void objc_send_initialize(id object) {
Class class = classForObject(object);
// ...
Class meta = class->isa;
// ...
LOCK_OBJECT_FOR_SCOPE((id)meta); // <=== essentially `meta == [(0xaaaafd5ab608 + propOffset).class]->isa`
// ...
}
Essentially, LOCK_OBJECT_FOR_SCOPE() is a macro that runs objc_sync_enter() / objc_sync_exit() automatically for the current lexical scope, like a @synchronized block would.
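As a rough sketch of what I mean (this is just the general scope-cleanup pattern with made-up helper names, not necessarily the runtime's actual definition of the macro):

/* Sketch only: scope-based locking via the GCC/Clang cleanup attribute.
 * Assumes the runtime's declarations of id / objc_sync_enter / objc_sync_exit
 * are already in scope. */
static inline void release_sync_lock(id *object)
{
    /* Runs automatically when the annotated variable leaves its scope. */
    objc_sync_exit(*object);
}
#define LOCK_OBJECT_FOR_SCOPE(object) \
    __attribute__((cleanup(release_sync_lock))) \
    id lock_object_for_scope = (object); \
    objc_sync_enter(lock_object_for_scope)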
So now we're at #4, and we attempt to either look up or create the lock for that object:
OBJC_PUBLIC int objc_sync_enter(id object) {
if ((object == 0) || isSmallObject(object)) { return 0; }
struct reference_list *list = referenceListForObject(object, YES); // <== looking up the `reference_list` can cause a `lock_spinlock()`
// to be taken on the `object` when `create == YES`
// ...
}
Once referenceListForObject() is called with create == YES, then if the reference_list->lock needs to be initialized, we take another spin lock, this time essentially on [(0xaaaafd5ab608 + propOffset).class]->isa:
static struct reference_list* referenceListForObject(id object, BOOL create) {
if (class_isMetaClass(object->isa)) {
Class cls = (Class)object;
if ((NULL == cls->extra_data) && create) {
volatile int *lock = lock_for_pointer(cls);
// ...
lock_spinlock(lock); // <==== maybe here?
// ...
}
return cls->extra_data;
}
Class hiddenClass = findHiddenClass(object);
if ((NULL == hiddenClass) && create) {
volatile int *lock = lock_for_pointer(object);
lock_spinlock(lock); // <==== maybe here?
// ...
}
return hiddenClass ? object_getIndexedIvars(hiddenClass) : NULL;
}
In either code path where the lock needs to be initialized, we're essentially now taking a second lock_for_pointer(), and I'm thinking that in some cases the pointer hashes collide, making us lock_spinlock() on the same spinlock for both pointers... and in turn, deadlocking the main thread.
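To make the collision idea concrete, here's a standalone toy I put together (the table size, hash function, and scanned address range are all made up for illustration; the runtime's real lock_for_pointer() will differ, but any fixed-size table of spinlocks keyed on a pointer hash admits collisions like this):

#include <inttypes.h>
#include <stdint.h>
#include <stdio.h>

/* Toy stand-in for the runtime's spinlock table: a fixed number of lock
 * words, selected by hashing a pointer.  Hash and size are invented here. */
#define LOCK_TABLE_SIZE 128
static volatile int spinlocks[LOCK_TABLE_SIZE];

static volatile int *lock_for_pointer(void *ptr)
{
    uintptr_t p = (uintptr_t)ptr;
    return &spinlocks[(p ^ (p >> 7)) % LOCK_TABLE_SIZE];
}

int main(void)
{
    /* The property address from the backtrace (obj + propOffset). */
    void *propertyAddr = (void *)0xaaaafd5ab608;
    volatile int *held = lock_for_pointer(propertyAddr);

    /* Scan nearby heap-looking addresses for one whose lock slot collides
     * with the lock we already hold.  If the metaclass happened to land at
     * such an address, the thread would spin on a lock it already owns. */
    for (uintptr_t p = 0xaaaafd590000; p < 0xaaaafd5a0000; p += 8)
    {
        if (lock_for_pointer((void *)p) == held)
        {
            printf("0x%" PRIxPTR " collides with %p\n", p, propertyAddr);
            break;
        }
    }
    return 0;
}

With a table like that, once the thread already holds the slot for (obj + propOffset), a second lock_spinlock() on any colliding pointer from the same thread never returns, which matches the backtrace above.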
Unfortunately, I haven't figured out a good solution for this one yet. But I at least wanted to get this issue posted, in case anyone else has seen weird behavior with atomic properties, or this sparks any ideas on how to fix the issue.