Back in those days we didn't have that many cores,etc so the raw computation power of the PS3 was an issue in itself and the SPU was a kind of middle-ground between shaders and CPU's so you probably had to use a regular CPU to emulate it.
We have multicore machines with far more cores today so we can match the computation unit count and performance.
The other part was that older consoles (8 and 16 bit era) really needed a lot of cycle-exact emulation to not fail and that requires an order of magnitude faster CPU's to manage emulating it properly and with CPU's hitting the Ghz limits around the same time we thought it'd be impossible to do that level of cycle-exactness needed.
Luckily though, because the PS3 needed optimal multi-core programming and the way to achieve the maximum throughput for that was to use DMA channels to shuffle data between CPU/SPU parts, emulator authors can probably use them as choke-points to handle emulation on a slightly more high level and avoid trying to manage cycle-exact timings.
The nice thing about more recent architectures is that no one (including the original authors) can rely on cycle-exactness because of the basically unpredictable effects of caches and speculative execution and bus contention and (...).
Most of these, as a big exception, do not apply to the Cell running code on data in its local memory, but fortunately, it's different as seen from other system components, as you say.
We have multicore machines with far more cores today so we can match the computation unit count and performance.
The other part was that older consoles (8 and 16 bit era) really needed a lot of cycle-exact emulation to not fail and that requires an order of magnitude faster CPU's to manage emulating it properly and with CPU's hitting the Ghz limits around the same time we thought it'd be impossible to do that level of cycle-exactness needed.
Luckily though, because the PS3 needed optimal multi-core programming and the way to achieve the maximum throughput for that was to use DMA channels to shuffle data between CPU/SPU parts, emulator authors can probably use them as choke-points to handle emulation on a slightly more high level and avoid trying to manage cycle-exact timings.