This might be implemented with the same feature as the lack of clusters: a label suppresses other labels around it, with bigger cities suppressing more. Would be easy to do with a greedy algorithm.
It's not the implementation that surprises me - once I know it's the right thing to do, I can easily figure out a method how. What's surprising is the insight that it is the right thing to do.