Isn't rendering text inside a graphics framework and then saying that rendering text is surprisingly resource-intensive a bit like solving an O(n log n) problem with nested for-loops and saying that the problem domain is surprisingly resource-intensive? Surely the thing that made text rendering resource-intensive was doing it inside a graphics framework?
If you don't rotate or zoom, then most of the work can be cached (at least when using Latin fonts with few ligatures), which makes things appear to be fast. But if you tried to render text in real time without caching (because you're allowing rotation and zoom, and you can't cache everything), then you'd see how expensive it is.
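To make the caching point concrete, here is a rough toy sketch, not a real renderer. `rasterize`, `GlyphCache`, and `render_frames` are hypothetical names made up for illustration, and the "bitmap" is just a string standing in for the expensive outline-to-pixels step. It only counts how often that step would run when the cache key has to include the rotation angle.

```python
def rasterize(char, size, angle):
    """Hypothetical stand-in for the expensive step (outline -> bitmap)."""
    return f"bitmap({char!r}, {size}, {angle})"


class GlyphCache:
    """Toy cache keyed by everything that changes the rasterized result."""

    def __init__(self):
        self.store = {}
        self.hits = 0
        self.misses = 0

    def get(self, char, size, angle):
        key = (char, size, angle)
        if key in self.store:
            self.hits += 1
        else:
            # Cache miss: this is where the real cost would be paid.
            self.misses += 1
            self.store[key] = rasterize(char, size, angle)
        return self.store[key]


def render_frames(text, frames, angle_for_frame):
    """Look up every character of `text` once per frame; return the cache."""
    cache = GlyphCache()
    for frame in range(frames):
        angle = angle_for_frame(frame)
        for char in text:
            cache.get(char, 16, angle)
    return cache


# Fixed orientation: each distinct character is rasterized once, ever.
static = render_frames("hello world", 100, lambda frame: 0.0)

# Angle changes every frame: each distinct character is rasterized again
# on every frame, so the cache only helps with repeats inside one frame.
rotating = render_frames("hello world", 100, lambda frame: frame * 0.5)

print("static:  ", static.misses, "misses,", static.hits, "hits")
print("rotating:", rotating.misses, "misses,", rotating.hits, "hits")
```

With 8 distinct characters in "hello world", the fixed case rasterizes 8 glyphs in total, while the rotating case rasterizes 8 per frame. A real engine could presumably rasterize once and transform the result, or quantize the angles, at some cost in quality, but the miss counts show why the uncached path is the one that reveals the true cost.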