How Much Processing Power Does it Take to be Fast? (dadgum.com)
86 points by 6ren on Jan 25, 2012 | 38 comments



Intermediate point of reference: Crash Bandicoot, which came out at the end of 1996 and ran on the original PlayStation, used 2 MB of RAM and a 33 MHz processor. As in Defender, there was no operating system to speak of, and though Sony "required" developers to use libraries, we ignored that dictum and coded the rendering pipeline in R3000 assembly. We could render about 1000 polygons per frame, 30 frames per second.

Now I have to work similarly hard just to get IE8's crappy JavaScript implementation to sort 1000 objects in under 33ms on a 3GHz machine. Sigh.

For even more extreme examples of performance within confines, see the excellent chronicle of Atari 2600 development, "Racing the Beam".


If you can live without such luxuries as an operating system, libraries, and drivers then you too can write software as efficient as Defender! But resources are well spent on abstraction, most of the time.


But is it mere abstraction that causes me (to take a completely random example) to wait on Windows so often, when I've got so many GHz and so many GB at my disposal?

I think it's more likely to be a matter of priorities. It's more important to ship your PC software than it is to expend the effort required to make it slick and responsive. On non-PC platforms (like the iPad) that's not true; the platform's relative simplicity and responsiveness is a big selling point, so it's crucial to be fast and slick to fit in.


Agree that it is completely a matter of priorities. With hardware generations succeeding each other every few years, much faster every time, there was hardly reason to optimize everything. Bolting on new features was the highest priority.

But this has changed. See also http://herbsutter.com/welcome-to-the-jungle/ .


Right. I didn't have time to write a short program, so I wrote a long one instead.


I think this is entirely backwards. Programs don't necessarily get slower with length, or faster for lack of it.

I didn't have time to write a program that handles all corner cases without losing responsiveness at any point, so I just wrote the simplest thing that mostly worked.

It's true that over-complication is the bane of good performance, but let's not pretend it is, or ever was, easy to get great performance.


> It's true that over-complication is the bane of good performance, but let's not pretend it is, or ever was, easy to get great performance.

It's easier to overcomplicate things than it is to keep them simple, especially when there are changes to the specification (which means always :)). And often performance comes as a bonus when we simplify things. So I think GP had it right.

Recently I've noticed in my code (ok, a friend pointed it out to me) that I often leave overly general code mixed with code that makes specific assumptions. At first I assumed nothing, so the code was general, but more complicated than necessary. Then I learnt a few things and made more assumptions to get what I needed faster, but I left the general parts of the code in place (maybe I'll need them later; besides, more general == better, right?), even though I could replace them with simpler, shorter, faster code if I rethought those parts with the new assumptions.

So the code is more complicated than it needs to be, won't work in the general case anyway, and depending on where somebody starts reading it, they will come to different conclusions about what can be assumed about the data.

My friend replaced two instances of the NumberAssigner class, synchronised between threads, with two private int variables, and removed the NumberAssigner class altogether :)


It's not backwards at all. Understand that the allusion to concise letter writing speaks to more than literal lexical length; it speaks to the effort required to optimize expression. And none of this pretends that "it is, or ever was, easy to get great performance."


Maybe, maybe not. There are exactly two hardware configurations for iPads (and fewer than a dozen if you consider iPhones as well). The abstractions in the OS can be much thinner and more specialized to what is required to get good performance out of the hardware. For the PC there is much larger variation, so the abstraction layers have more ground to cover. Though I have to agree with you, it seems performance is rather low on the feature list.


What about the simulator?


No flamewars... this must have something to do with Windows' giant mess of libraries and backwards compatibilities. My Mac is running the same OS install, cloned, upgraded, and moved to newer gear (but older than anyone I know hardware-wise... the point being it's not a fresh install by any means; it's years older than its hardware). It does have an Intel SSD I stuck in there... but it boots to fully usable in about 10 seconds and shuts down (not sleep or hibernate) in less than 2 seconds. I only used the Mac as an example because it's what I'm using right now. I know part of the speed comes from the SSD... but the rest, is it just lack of backwards compatibility and a simpler design? Attention to detail? Dunno... but it rarely makes me wait for anything except stuff like compiling or number crunching, both of those being unique cases where the amount of hardware needed to keep them from interrupting your workflow is variable.


Abstraction can be very expensive. You wouldn't have to wait on your Windows app if it didn't have to share memory and storage with other apps. Web sites spend a few orders of magnitude in performance in order to abstract out operating systems, low-level programming, hardware form factors, physical proximity, deployment, and a host of other things.


The sort of delays I'm speaking of happen even with no other applications running. Windows itself is probably the most demanding software system on many PCs.


Just the capability to share memory with other apps is what requires the sacrifice in efficiency. It means that apps can't control what and when parts of themselves are paged out to disk. If they could, they might only page out infrequently used data, as opposed to data that needs to be available quickly for responsiveness.


>when I've got so many GHz and so many GB at my disposal?

Because like most laptop and desktop users, you're I/O bound by your slow mechanical spinning disk. My desktops at work and at home and my laptop all have SSDs now. I have a level of responsiveness that's very close to what I get with an iPad or my Transformer.

Don't blame the software, blame the hardware. You can't really compare flash-based storage to mechanical storage. A lot of the "bloaty" OSes really aren't bloaty. It's the damn mechanical storage. I mean, there's literally a mechanical arm that roams around spinning platters. You can't compare that to electrons dancing on flash media.


I'm aware of the mechanical I/O bottleneck, and I'd love to have SSDs in the machines I use, but funds do not permit.

But I doubt that's the whole story. The first time I type "Ctrl-F" in a VS2010 session, the disk thrashes and there's a noticeable delay of a couple of seconds before the Find tool loads. The delay is so bad that keystrokes are lost; if I not-particularly-quickly type "Ctrl-F mystring", the find tool searches for "tring."

I'm sure an SSD could improve that situation, but the real issue here isn't that the disk is slow, it's that the software is causing disk access where none should be necessary.

I'm sure there are reasons behind it; the find tool is a module in a modular system, it's not hanging around when it's not needed, it makes a smooth fit with the rest of the framework that was used to build VS2010, etc.

But the end result, the end user experience, is worse than it was in older, simpler versions of the product. (In this particular way, it's actually worse than a DOS-era editor, running on far more severely bottlenecked hardware, was 20 years ago.) The find tool was probably built in whatever was the most straightforward way to get it to fit into the horrendously complicated system (VS2010 on Windows) it's a part of, and no effort (or not enough effort) was expended to make it any better than that.

This is just one example that struck me today. There are plenty of other situations where a Windows machine full of Windows apps will cause the user to wait in situations that shouldn't require waiting. You can classify them as I/O bottlenecks, and that's not wrong, but I think it's missing the point. Everyone knows there's an I/O bottleneck there; when you develop code, you're supposed to bear that in mind.

But backwards compatibility and shipping the damn thing are more important than optimizing, and that goes for Windows itself and all the components thereof as well. The stuff we write today has to accommodate the foolish-in-retrospect decisions of yesterday.


This complaint is the reason why some apps include "quick" loaders under Windows. Effectively they have a tray icon, run after startup, and load the files that make up an app so they are already in memory when you run the app. Then when you launch the app for real, it appears very quickly, including all the functionality.
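
A rough sketch of the idea in Python (purely illustrative; real quick loaders are native tray programs, and the install path and file types below are made up):

    import glob
    import os

    APP_DIR = r"C:\Program Files\SomeApp"  # hypothetical install location

    # Read every binary once and discard the data; the point is just to pull
    # the files into the OS file cache so the real launch hits RAM, not disk.
    for path in glob.glob(os.path.join(APP_DIR, "*.dll")) + \
                glob.glob(os.path.join(APP_DIR, "*.exe")):
        with open(path, "rb") as f:
            while f.read(1 << 20):  # 1 MB chunks
                pass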

Of course people get annoyed at these things for taking up "memory", "delaying" system start, etc., and get rid of them.

I/O just tends to be more expensive on Windows. Not only does the code have to deal with backwards compatibility, old drivers and tag-alongs (anti-virus, backup), potentially obstinate hardware, and numerous other things, but often the OS is supported for a long time. For example, Windows XP uses a 10 MB buffer cache, irrespective of how much RAM you really have. (Can be changed via the registry; the code changed in Vista.)


Windows is slow because of the spinning disk. That's why simply replacing it with an SSD gives such dramatic speed improvements.



I enjoyed reading this piece from the same site: http://prog21.dadgum.com/52.html

It's about how modern interpreted languages such as Ruby and Python are orders of magnitude faster than the BASICs found on typical 1980s systems.


The speedup is actually precisely what you would expect from hardware over that time period.

2^((2009-1984) / 1.5) = 104031.915

He's accidentally demonstrated that we would expect a 1984 implementation of BASIC on modern hardware to be exactly as fast as modern Python.
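
The same back-of-the-envelope in Python, assuming the usual rule-of-thumb doubling every 1.5 years:

    # Expected hardware speedup from 1984 to 2009, doubling every 1.5 years.
    years = 2009 - 1984
    speedup = 2 ** (years / 1.5)
    print(speedup)  # ~104031.9, i.e. roughly five orders of magnitude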


Which actually isn't damning of Python at all. Python has optimized for things other than speed. I'm pretty sure that you wouldn't choose a 1984 BASIC implementation (on modern hardware) over Python for any of your projects. You can have a Better Language at no cost to performance.


True, I expect (but don't know for sure) that BASIC was actually highly optimized to be as fast as it was.


Well, it was also usually written in the local machine language.


I'd wager that BASIC on a modern machine would outperform Python. There are fully compiled versions of BASIC that are quite fast, but they aren't cool.


Funny, there's no discussion of the modern bottlenecks: the bus between the CPU & GPU, and the speed difference between the CPU and RAM. And no discussion of what else the modern machine has to do now: maintain a lot more devices (like radios), handle background work, and have a network stack.

It's not like we have faster hardware but suddenly got dumber programmers.


The CPU & GPU bus is hardly a meaningful bottleneck when dealing with basic UI interactions. 3D games often have less input lag than the background windowing environment because they are optimized for that. The technical problem is simply excessive buffering and a poor interrupt implementation.


The real problem is levels of abstraction that let you do anything, but not necessarily quickly.

Compared to the old games, where you could do only one thing, but you could do it fast.


Abstraction is not what causes modern operating systems to buffer user input. You can often have the same image drawn to 7-10 buffers before a user can see it. And honestly, most of the time there is plenty of time for this, so it's not a big deal; what you don't have time for is doing the full path for each tiny input. Let's suppose you want to drag a window. The secret is that you don't need to use the same window location for every stage of the process, as long as you find an acceptable middle ground so that not rendering part of a dragged window doesn't look that bad.

Now you need to do this for menu options etc etc.

PS: Games often do this, trading a little accuracy in a single frame for low-latency responses.
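
A minimal sketch of that coalescing idea (Python, names purely illustrative): drain everything queued since the last frame, keep only the newest position, and redraw once.

    import collections

    events = collections.deque()  # filled elsewhere by the OS / input thread

    def frame(draw_window_at):
        # One redraw per frame with the latest position, instead of a full
        # redraw for every queued input event.
        latest = None
        while events:
            latest = events.popleft()
        if latest is not None:
            draw_window_at(latest)

    # Toy usage: three queued drag positions, a single redraw.
    events.extend([(10, 10), (12, 11), (15, 13)])
    frame(lambda pos: print("draw window at", pos))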


There were a lot of other tricks to be had:

- Some form of sprites were the rule, not the exception. Albeit very hardware coupled, usually a fixed width and maybe unlimited height. Fixed # of sprites per line as well, or they'd just turn invisible.

- Tile-based (character) graphics, so there are a lot of games that aren't bitmapped, but rather character- or tile-based. So think updating a 40x25 screen (e.g.), not a 320x200 screen.

- Palette (indirect) based graphics = some tricks for animation here.

- A hardware supported transparent color.

- Direct control of frame buffer pointer = lots of goodness.

- Interrupts based on raster position

- DMA for sound

Sometimes:

- Some control of modulus based scrolling (think starting bitwise horizontal and vertical for up to 8 bits) = easy tile based bit scrolling.

- Blitter (with boolean ops) - maybe.

- Planar graphics.

Still, we're lazy today. Lots of layers in between us and the hardware.


The screen resolution he mentions is 0.1% of a 1024x768 resolution. So let's say we need 1000 MHz to render 1024x768. Then I think modern software is doing alright, because today we calculate more color bits, keep track of all the input devices, output 'real audio', keep the device online, and so on.


320x256 is 10% of 1024x768 in terms of raw pixels, not considering bit depth. But bit depth and raw pixels shouldn't matter here; the GPU takes care of mucking about with those, and the GPU in the iPad is amazingly fast for its power consumption.

I think the point is that the iPad is sluggish and it shouldn't be. 30 years ago people knew how to make something fast reacting with way less hardware behind it, somehow we've lost that and added complexity that doesn't really seem visible to the user.


> I think the point is that the iPad is sluggish and it shouldn't be

The OP is arguing the opposite -- the responsiveness of the iPad demonstrates that we can have responsive systems even with all of today's luxuries.


320x256/(1024*768) = 0.10416666, which is about 10%. So we got ten times more pixels (and another factor of 4-8 for more colors), compared to a thousand times more cycles (and a big factor for pipelining, dedicated graphics hardware, etc.). So I think OP has got a point...


(320. * 256)/(1024 * 768) = 0.104 = 10.4%


Whoops I got my calculation wrong. It should be 10% indeed. So now I wonder why a GPU is needed...


Well, they're all forgetting a factor of 8 bits per pixel...
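
A quick back-of-the-envelope in Python, assuming (purely for illustration) 320x256 at 4 bits per pixel then versus 1024x768 at 32 bits per pixel now -- that factor of 8 in bit depth:

    old_bits = 320 * 256 * 4           # ~40 KB per frame
    new_bits = 1024 * 768 * 32         # ~3 MB per frame
    print(320 * 256 / (1024 * 768))    # ~0.104: raw pixel count is about 10%
    print(old_bits / new_bits)         # ~0.013: ~1.3% once bit depth is included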


"And of course there's overhead involved in the application framework where messages get passed around to delegates and so on." Am I the only one who finds this an amusing & concise summary of iPhone development?




