“A reference counting strategy would be more efficient in processor utilization compared to garbage collection as it does not need to perform processor intensive sweeps through memory identifying unreferenced objects. So reference counting trades memory for processor cycles.”
I think it’s the reverse.
Firstly, garbage collection (GC) doesn’t identify unreferenced objects; it identifies referenced objects (GC doesn’t collect garbage). That’s not just a different way of phrasing it: because the collector traces only what’s reachable, the amount of garbage isn’t a big factor in the time spent in garbage collection; the cost is dominated by the live data. That’s what makes GC (relatively) competitive, execution-time-wise. However, it isn’t competitive in memory usage. There, the consensus is that you need several times more memory for the same performance (https://people.cs.umass.edu/~emery/pubs/gcvsmalloc.pdf: with five times as much memory, an Appel-style generational collector with a non-copying mature space matches the performance of reachability-based explicit memory management; with only three times as much memory, the collector runs on average 17% slower than explicit memory management).
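To illustrate that first point, here is a minimal sketch of a mark phase (illustrative names, not any particular runtime’s collector): it only ever walks objects reachable from the roots, so dead objects are simply never visited.

```cpp
#include <vector>

struct Object {
    bool marked = false;
    std::vector<Object*> children;  // outgoing references
};

// Mark everything reachable from this object; garbage is never touched.
void mark(Object* obj) {
    if (obj == nullptr || obj->marked) return;
    obj->marked = true;
    for (Object* child : obj->children)
        mark(child);
}

// Trace from the roots; the work done here scales with the live heap.
void collectFrom(const std::vector<Object*>& roots) {
    for (Object* root : roots)
        mark(root);
    // A sweep or copy step then reclaims whatever is still unmarked; a
    // copying collector skips the dead objects even here, since it only
    // evacuates the marked (live) ones.
}
```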
(That also explains why iPhones can get by with so much less memory than phones running Android.)
Secondly, the textbook implementation of reference counting (RC) in a multi-processor system is inefficient because modifying reference counts requires expensive atomic instructions.
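Sketched in C++ (hypothetical retain/release names, not any real runtime’s API), the textbook scheme looks roughly like this; the point is that every ownership change is an atomic read-modify-write on a counter that other cores may be touching at the same time:

```cpp
#include <atomic>

struct RefCounted {
    std::atomic<long> refcount{1};  // starts at 1: the creating reference
};

void retain(RefCounted* obj) {
    // One atomic read-modify-write per new strong reference.
    obj->refcount.fetch_add(1, std::memory_order_seq_cst);
}

void release(RefCounted* obj) {
    // Another atomic RMW per dropped reference; free on the last one.
    if (obj->refcount.fetch_sub(1, std::memory_order_seq_cst) == 1)
        delete obj;
}
```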
Swift programs spend about 40% of their time modifying reference counts (http://iacoma.cs.uiuc.edu/iacoma-papers/pact18.pdf)
So, reference counting gets better memory usage at the price of more atomic operations, and therefore less speed.
That last PDF (the PACT ’18 paper) describes a technique that roughly doubles the speed of RC operations, decreasing that ~40% overhead to about 20-25%.
It wouldn’t surprise me if these new ARM Macs use a similar technique to speed up RC operations.
It might also help that the memory model of ARM is weaker than that of x64, but I’m not sure that’s much of an advantage for keeping reference counts in sync across cores.
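As a footnote on that: the standard trick in practice (the reference-counting example in the Boost.Atomic documentation is the usual citation) is that the increment only needs to be atomic, not ordered, while the final decrement needs release/acquire semantics so that all writes to the object happen-before its destruction. Reworking the hypothetical retain/release sketch above with those orderings:

```cpp
#include <atomic>

struct RefCounted {
    std::atomic<long> refcount{1};
};

void retain(RefCounted* obj) {
    // The increment only has to be atomic; no ordering is required.
    obj->refcount.fetch_add(1, std::memory_order_relaxed);
}

void release(RefCounted* obj) {
    if (obj->refcount.fetch_sub(1, std::memory_order_release) == 1) {
        // Make all prior writes to the object visible before destroying it.
        std::atomic_thread_fence(std::memory_order_acquire);
        delete obj;
    }
}
```

Even with those weaker orderings, each operation is still an atomic read-modify-write on a potentially contended cache line, so a weaker hardware memory model by itself doesn’t make cross-core reference counting cheap.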