While I've not measured the performance of the approaches, from reading the Python patch discussed in the article it would appear that Objective-C uses a more intelligent approach to maintaining the reference count in the face of concurrent manipulation.
The patch to Python involves guarding every increment and decrement of a reference count with a single pthread mutex. This pthread mutex would become a major source of contention if multiple threads are attempting operations that manipulate the reference count. Pthread mutexes are also a relatively heavyweight synchronization mechanism, and their overhead would impact performance even when the single mutex was uncontended.
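To make the contention point concrete, here is a minimal sketch of reference counting behind one global pthread mutex. The `object_t` type and the `object_retain`/`object_release` names are hypothetical and are not taken from the Python patch; the sketch only illustrates the shape of the approach described above.

```c
#include <assert.h>
#include <pthread.h>

/* Hypothetical object header; not CPython's actual PyObject layout. */
typedef struct {
    long refcount;
} object_t;

/* One process-wide lock shared by every object's reference count. */
static pthread_mutex_t refcount_lock = PTHREAD_MUTEX_INITIALIZER;

static void object_retain(object_t *obj) {
    pthread_mutex_lock(&refcount_lock);
    obj->refcount++;
    pthread_mutex_unlock(&refcount_lock);
}

/* Returns 1 when the count reaches zero and the caller should free obj. */
static int object_release(object_t *obj) {
    pthread_mutex_lock(&refcount_lock);
    assert(obj->refcount > 0);
    int dead = (--obj->refcount == 0);
    pthread_mutex_unlock(&refcount_lock);
    return dead;
}
```

Because every retain and release takes the same lock, two threads touching entirely unrelated objects still serialize on `refcount_lock`.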
In contrast, Objective-C uses more efficient means of managing the reference count. The implementation of -[NSObject retain] uses spinlocks to guard the side tables that hold the reference counts. There are multiple such side tables, each with its own spinlock, so that threads manipulating the reference counts of different objects are less likely to contend with one another. CoreFoundation, which provides the implementations of many common types such as strings and arrays, uses an inline reference count that is manipulated using atomic compare-and-swap operations. This reduces contention further, at the cost of increasing the storage size of every object of these types.
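The two techniques can be sketched roughly as follows in C11. This is not the actual Objective-C runtime or CoreFoundation source: the table count, the fixed-capacity linear-scan tables, and all names (`side_table_t`, `side_retain`, `cf_like_t`, `inline_retain`, and so on) are assumptions made for illustration.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* ---- Approach 1: side tables, each guarded by its own spinlock ---- */

#define TABLE_COUNT 8   /* assumed number of side tables */
#define TABLE_SLOTS 64  /* assumed fixed capacity per table */

typedef struct {
    atomic_bool locked;                 /* spinlock: false means free */
    const void *keys[TABLE_SLOTS];      /* object addresses */
    unsigned long counts[TABLE_SLOTS];  /* reference count per object */
} side_table_t;

static side_table_t side_tables[TABLE_COUNT];

static side_table_t *table_for(const void *obj) {
    /* Drop low bits that alignment leaves at zero, then pick a table. */
    return &side_tables[((uintptr_t)obj >> 4) % TABLE_COUNT];
}

static void table_lock(side_table_t *t) {
    while (atomic_exchange_explicit(&t->locked, true, memory_order_acquire)) {
        /* spin until the previous holder stores false */
    }
}

static void table_unlock(side_table_t *t) {
    atomic_store_explicit(&t->locked, false, memory_order_release);
}

/* Returns the new count, or 0 if the table is full. */
static unsigned long side_retain(const void *obj) {
    side_table_t *t = table_for(obj);
    unsigned long result = 0;
    table_lock(t);
    for (size_t i = 0; i < TABLE_SLOTS; i++) {
        if (t->keys[i] == obj || t->keys[i] == NULL) {
            t->keys[i] = obj;
            result = ++t->counts[i];
            break;
        }
    }
    table_unlock(t);
    return result;
}

/* Returns the new count; the slot is kept (at 0) rather than removed. */
static unsigned long side_release(const void *obj) {
    side_table_t *t = table_for(obj);
    unsigned long result = 0;
    table_lock(t);
    for (size_t i = 0; i < TABLE_SLOTS && t->keys[i] != NULL; i++) {
        if (t->keys[i] == obj) {
            assert(t->counts[i] > 0);
            result = --t->counts[i];
            break;
        }
    }
    table_unlock(t);
    return result;
}

/* ---- Approach 2: inline count updated with compare-and-swap ---- */

typedef struct {
    atomic_ulong refcount;  /* stored in the object itself */
} cf_like_t;

static void inline_retain(cf_like_t *obj) {
    unsigned long old = atomic_load_explicit(&obj->refcount, memory_order_relaxed);
    /* On failure, old is refreshed with the current value and we retry. */
    while (!atomic_compare_exchange_weak_explicit(
               &obj->refcount, &old, old + 1,
               memory_order_relaxed, memory_order_relaxed)) {
    }
}

/* Returns true when this release dropped the count to zero. */
static bool inline_release(cf_like_t *obj) {
    unsigned long old = atomic_load_explicit(&obj->refcount, memory_order_relaxed);
    while (!atomic_compare_exchange_weak_explicit(
               &obj->refcount, &old, old - 1,
               memory_order_acq_rel, memory_order_relaxed)) {
    }
    return old == 1;
}
```

In the first approach, threads only wait on each other when their objects hash to the same table. In the second there is no lock at all, but every object carries its own counter field, which is the storage cost mentioned above.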