That was really interesting, especially this part:
'Reference counting is a really lousy memory-management technique for free-threading. This was already widely known, but the performance numbers put a more concrete figure on it. This will definitely be the most challenging issue for anyone attempting a GIL removal patch.'
If ref counting is so bad with threads, how does Objective-C do it performantly?
While I've not measured the performance of the approaches, from reading the Python patch discussed in the article it would appear that Objective-C uses a more intelligent approach to maintaining the reference count in the face of concurrent manipulation.
The patch to Python involves guarding every increment and decrement of a reference count with a single pthread mutex. This pthread mutex would become a major source of contention if multiple threads are attempting operations that manipulate the reference count. Pthread mutexes are also a relatively heavyweight synchronization mechanism, and their overhead would impact performance even when the single mutex was uncontended.
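To make the contention point concrete, here is a minimal sketch of the single-lock scheme as I described it above. It is not code from the patch: `obj_t`, `obj_incref`, `obj_decref`, and `demo_global_lock` are hypothetical names, and the object header is not CPython's real layout.

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>

/* Hypothetical object header; not CPython's actual PyObject layout. */
typedef struct {
    long refcount;
} obj_t;

/* One process-wide lock shared by every object (assumed design). */
static pthread_mutex_t refcount_lock = PTHREAD_MUTEX_INITIALIZER;

void obj_incref(obj_t *o) {
    pthread_mutex_lock(&refcount_lock);
    o->refcount++;
    pthread_mutex_unlock(&refcount_lock);
}

/* Returns 1 when the count reached zero and the caller should free o. */
int obj_decref(obj_t *o) {
    pthread_mutex_lock(&refcount_lock);
    int dead = (--o->refcount == 0);
    pthread_mutex_unlock(&refcount_lock);
    return dead;
}

#define N_ITER 100000

static void *worker(void *arg) {
    obj_t *o = arg;
    for (int i = 0; i < N_ITER; i++) {
        obj_incref(o);
        obj_decref(o);
    }
    return NULL;
}

/* Two threads touch two unrelated objects, yet every operation still
 * queues on the same mutex. Returns the sum of the final counts. */
long demo_global_lock(void) {
    obj_t a = {1}, b = {1};
    pthread_t ta, tb;
    pthread_create(&ta, NULL, worker, &a);
    pthread_create(&tb, NULL, worker, &b);
    pthread_join(ta, NULL);
    pthread_join(tb, NULL);
    return a.refcount + b.refcount;
}
```

The counts stay correct, but the two threads never make progress on their reference counts at the same time, even though they share no objects.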
In contrast, Objective-C uses more efficient means of managing the reference count. The implementation of -[NSObject retain] uses spinlocks to guard the side tables that hold the reference counts. There are multiple such side tables and associated spinlocks in order to reduce contention if multiple threads are attempting to manipulate the reference counts of different objects. CoreFoundation, which provides the implementations of many common types such as strings and arrays, uses an inline reference count that is manipulated using atomic compare-and-swap operations. This reduces contention at the cost of increasing the storage size of every object of this type.
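A rough sketch of both techniques, under loud assumptions: this is not Apple's runtime or CoreFoundation source. The stripe count, the fixed-size probing table, and every name here (`side_table_adjust`, `inline_retain`, `inline_release`, and so on) are made up for illustration, and entries are never removed from the table.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define N_STRIPES 8   /* assumed number of side tables */
#define N_SLOTS   64  /* assumed fixed capacity per table */

/* One side table plus the spinlock that guards only that table. */
typedef struct {
    atomic_bool locked;
    const void *keys[N_SLOTS];
    long counts[N_SLOTS];
} side_table_t;

static side_table_t tables[N_STRIPES];  /* zero-initialized: all unlocked */

static void stripe_lock(side_table_t *t) {
    while (atomic_exchange_explicit(&t->locked, true, memory_order_acquire)) {
        /* spin until the previous holder stores false */
    }
}

static void stripe_unlock(side_table_t *t) {
    atomic_store_explicit(&t->locked, false, memory_order_release);
}

/* Adds delta to obj's side-table count and returns the new count,
 * or -1 if the chosen table is full. Objects that hash to different
 * stripes never contend with each other. */
long side_table_adjust(const void *obj, long delta) {
    size_t h = (size_t)((uintptr_t)obj >> 4);
    side_table_t *t = &tables[h % N_STRIPES];
    long result = -1;
    stripe_lock(t);
    for (size_t i = 0; i < N_SLOTS; i++) {
        size_t slot = (h / N_STRIPES + i) % N_SLOTS;  /* linear probing */
        if (t->keys[slot] == NULL)
            t->keys[slot] = obj;                      /* claim empty slot */
        if (t->keys[slot] == obj) {
            result = (t->counts[slot] += delta);
            break;
        }
    }
    stripe_unlock(t);
    return result;
}

/* Inline variant: the count lives in the object and is updated with a
 * compare-and-swap loop, costing one extra word per object. */
typedef struct {
    atomic_long refcount;
} inline_obj_t;

void inline_retain(inline_obj_t *o) {
    long old = atomic_load_explicit(&o->refcount, memory_order_relaxed);
    while (!atomic_compare_exchange_weak_explicit(
               &o->refcount, &old, old + 1,
               memory_order_relaxed, memory_order_relaxed)) {
        /* old now holds the current value; retry with it */
    }
}

/* Returns 1 when this call dropped the last reference. */
int inline_release(inline_obj_t *o) {
    long old = atomic_load_explicit(&o->refcount, memory_order_relaxed);
    while (!atomic_compare_exchange_weak_explicit(
               &o->refcount, &old, old - 1,
               memory_order_acq_rel, memory_order_relaxed)) {
        /* retry with the refreshed value */
    }
    return old == 1;
}
```

In both variants, threads working on unrelated objects mostly stay out of each other's way, which is the difference from the single-mutex approach.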
I think it is mostly by not counting as much. In typical Objective-C code, you will find that only the UI is actual Objective-C. Also, many fields of Cocoa UI classes are 'plain old data' such as 'int', 'BOOL' or enums. That keeps the number of objects down and decreases the amount of bookkeeping.
The GUI library is also smart enough not to allocate more objects than needed. For example, in a table, only the cells actually on the screen really exist, and all controls in a single window share an NSTextView called 'the field editor' that is used for editing text (http://developer.apple.com/library/mac/documentation/Cocoa/C...)
Finally, I do not think it is that fast. It is just modern hardware that is fast.
1. AFAIK, Objective-C doesn't do automatic runtime reference counting the way Python does. You either do it yourself where needed, or it's automatically inserted and heavily optimized at compile time by ARC. (I could have misunderstood how Python does it, but I think it does more refcounting than Objective-C does.)
2. Although Objective-C is pretty fast in the grand scheme of things, using its object system does entail a performance penalty when compared to similar languages like plain C or C++.
3. The Objective-C garbage collector, despite being really immature and falling out of favor pretty quickly, often did give better performance in heavily threaded situations.