
If only hardware research and optimization had gone the way of stack machines instead of register machines. I can only imagine the world of Lisp machines, Forth machines and APL/J Machines that would exist if the hardware more closely matched our expressive languages.


It wasn't stack machines so much as it was RISC and UNIX. The RISC philosophy basically took every instruction not needed to run C efficiently and punted it to software, even when it could have been implemented efficiently in hardware. Then UNIX took pretty much every machine feature not needed for running C programs and hid it from software.

Take, for example, read and write barriers for GC. On a modern system with virtual memory, each memory access is run through a TLB, which has, among other things, protection and page-out bits. That could easily be supplemented with a couple of extra bits to implement GC barriers. While we were getting greedy, we could even add a lightweight trap mechanism for handling the associated faults in user space, at the user's privilege level, to avoid the expense of transitioning to kernel privilege level (indeed, Intel and AMD implement all the necessary functionality in their virtualization extensions).
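Absent those bits, this is roughly what runtimes have to do today: mark heap pages read-only with mprotect() and catch the first write in a SIGSEGV handler, paying a kernel round trip each time. A minimal sketch of that workaround (the names are mine, and the dirty-page bookkeeping that a real collector, e.g. the Appel-Li scheme, would do is left as a stub):

    #include <signal.h>
    #include <stdint.h>
    #include <sys/mman.h>

    #define PAGE_SIZE 4096

    /* First write to a protected page lands here; note the page for the
       collector, then re-enable writes so the faulting store can retry. */
    static void barrier_handler(int sig, siginfo_t *si, void *ctx)
    {
        (void)sig; (void)ctx;
        uintptr_t page = (uintptr_t)si->si_addr & ~(uintptr_t)(PAGE_SIZE - 1);
        /* record_dirty_page(page);  -- GC bookkeeping would go here */
        mprotect((void *)page, PAGE_SIZE, PROT_READ | PROT_WRITE);
    }

    /* Protect the heap so every first write to a page traps. */
    void install_write_barrier(void *heap, size_t len)
    {
        struct sigaction sa = {0};
        sa.sa_sigaction = barrier_handler;
        sa.sa_flags = SA_SIGINFO;
        sigaction(SIGSEGV, &sa, NULL);
        mprotect(heap, len, PROT_READ);
    }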


This is very much in line with what Alan Kay says about current chip architectures compared to what was available when he was coming up in the field. He often talks about the Burroughs machines and how much more advanced they were than our current CPUs, and laments that, for all the gains Moore's Law has given us, we have lost incredible amounts of speed because our architectures are aimed solely at C.

One anecdote he likes to use compares Smalltalk running on the Xerox Alto with Smalltalk running on a current CPU that is 50,000x faster than the Alto. He notes that benchmarks run on both systems show the modern machine only about 50x faster, and claims this means we've lost a factor of 1,000x in efficiency simply by using inferior architectures (inferior, at least, if your target language isn't C).

Part of me is thankful for the relentless push of x86 and the speed gains realized, but another part of me really regrets that all of the crazy architectures from the '70s and '80s have been lost.


The 1000x figure is probably an overstatement, as is the 50,000x figure.

The Alto's main memory had a cycle time of about 850 nsec, and could transfer 2 16-bit words per cycle: http://www.computer-refuge.org/bitsavers/pdf/xerox/parc/tech....

This gives a main memory bandwidth of roughly 5 MB/sec. A top-end single CPU system today has probably 25 GB/sec available to it, a factor of 5,000 more. Moreover, much of that is achieved through optimizing burst reads--actual sustained random access throughput is going to be much lower and the delta much less.
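Back-of-the-envelope, taking those spec-sheet numbers at face value (the 25 GB/sec figure is a rough peak, not sustained random access):

    #include <stdio.h>

    int main(void)
    {
        double alto_bw   = 4.0 / 850e-9;  /* two 16-bit words per 850 ns cycle: ~4.7 MB/s */
        double modern_bw = 25e9;          /* ~25 GB/s peak, per the figure above          */
        printf("Alto:  %.1f MB/s\n", alto_bw / 1e6);
        printf("ratio: %.0fx\n", modern_bw / alto_bw);   /* ~5,300x */
        return 0;
    }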

Given modern implementation techniques, the actual efficiency loss is probably on the order of 10x rather than 1000x. And much of it is the result of the memory wall, which has been driven by DRAM physics rather than micro-architecture. Doing a couple of memory lookups to support dynamic dispatch is a hell of a lot more expensive, relative to an ALU operation, these days than it was 30 years ago.
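To make the dispatch point concrete, here is a rough sketch (the names are illustrative, not from any particular Smalltalk or Lisp runtime) of why a dynamically dispatched call costs a couple of dependent loads before the branch even resolves:

    struct vtable { long (*area)(void *self); };
    struct object { const struct vtable *vt; /* ...instance data... */ };

    long dispatch_area(struct object *o)
    {
        /* load #1: o->vt; load #2: o->vt->area; then an indirect call.
           On the Alto a memory reference cost about as much as an ALU op;
           today a miss on either load can cost hundreds of ALU ops. */
        return o->vt->area(o);
    }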


Kay is greatly exaggerating those figures, and tends to blame problems with the modern software stack on the hardware.

Dan Ingalls gave a talk in 2005 about the history of Smalltalk implementations in which he mentioned the Xerox NoteTaker. The NoteTaker was a PC powered by the 8086, and according to Ingalls executed Smalltalk VM bytecode at twice the speed of the Alto. Here is the link to the talk: http://www.youtube.com/watch?v=pACoq7r6KVI#t=42m50s and here is my analysis with more details on the specs and economics of the NoteTaker: http://carcaddar.blogspot.com/2012/01/personal-computer-youv...


What you're saying about the VM is a software problem, not a hardware one. You're right about the VT-x extensions, and read barriers for GC are exactly what Azul is trying to do with their kernel patches, but there was no reason the GC couldn't have been moved into kernel space before virtualization extensions came along.

Same thing with stack machines vs registers (why would you ever want a stack machine for CPS-compiled code?), tagged arithmetic (SPARC has tagged arithmetic instructions, but it turns out pipelining makes "manual" tag-checking just as fast), etc.
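To make the tagged-arithmetic point concrete, this is roughly the "manual" check a Lisp compiler emits, assuming the common low-tag scheme where fixnums carry tag 00 (tag layouts vary by implementation, and the overflow check is elided):

    #include <stdint.h>

    #define TAG_MASK   0x3     /* low two bits hold the type tag    */
    #define FIXNUM_TAG 0x0     /* fixnums are shifted left two bits */

    extern intptr_t slow_path_add(intptr_t x, intptr_t y);  /* boxed/bignum case */

    intptr_t generic_add(intptr_t x, intptr_t y)
    {
        /* One OR, one AND, one well-predicted branch -- which is why a
           pipelined RISC doesn't miss SPARC's TADDcc-style instructions. */
        if (((x | y) & TAG_MASK) == FIXNUM_TAG)
            return x + y;               /* tagged fixnums add directly */
        return slow_path_add(x, y);
    }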

If anything, a pipelined, superscalar RISC CPU benefits Lisp more than it does C.


> What you're saying about the VM is a software problem, not a hardware one.

The strict conceptual partitioning of software problems and hardware problems is quite passé these days. In the last 10 years, Intel and AMD have added a tremendous amount of very CISC-y functionality into x86 (e.g. string search instructions), in recognition of the fact that exploding transistor budgets make hardware the right place to implement certain things.
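For instance, the SSE4.2 string-search instructions (PCMPISTRI and friends) are exposed through compiler intrinsics. A rough sketch, compiled with -msse4.2; the function name is mine, and the tail and bounds handling a real strpbrk-style routine needs is omitted:

    #include <nmmintrin.h>   /* SSE4.2 intrinsics */

    /* Index (0-15) of the first byte in a 16-byte chunk of `text` that
       matches any byte of the NUL-terminated `set`, or 16 if none do.
       Caller must guarantee 16 readable bytes at both pointers. */
    int find_any_of_16(const char *text, const char *set)
    {
        __m128i chunk   = _mm_loadu_si128((const __m128i *)text);
        __m128i charset = _mm_loadu_si128((const __m128i *)set);
        return _mm_cmpistri(charset, chunk,
                            _SIDD_UBYTE_OPS | _SIDD_CMP_EQUAL_ANY);
    }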

> but there was no reason why the GC couldn't have been moved into kernel space before virtualization extensions came along.

GC couldn't have been moved into kernel space because of the second part of my argument: UNIX hides hardware features not necessary to run C programs. The MMU can do quite a lot that is obscured behind the very limited mmap() abstraction.


I have a profound regret for what was lost in the transition to the Windows/Unix and C worlds. We've gained in the transition, but what was lost is too easily dismissed under the `popularity = usefulness` metric that is so common.


Much love for Lisp machines, but stack machines do not offer any performance advantages over register machines, and they actually make optimization much harder. See Ungar's 1993 thesis on the Self 93 compiler for an early realization of this, where he examines what microcode, register windows, etc. could have done for him, and how he was able to do just as well with registers.
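A toy illustration of the point (not from Ungar's work): the same expression, once forced through a memory-resident evaluation stack as a stack architecture specifies, and once as ordinary code a register allocator keeps entirely in registers. A hardware stack machine has to rediscover the reuse of a and b at runtime; the register form states it outright.

    long stack_eval(long a, long b)      /* (a + b) * (a - b), stack style */
    {
        long stk[4]; int sp = 0;
        stk[sp++] = a;                   /* push a                    */
        stk[sp++] = b;                   /* push b                    */
        sp--; stk[sp-1] += stk[sp];      /* add                       */
        stk[sp++] = a;                   /* push a (again)            */
        stk[sp++] = b;                   /* push b (again)            */
        sp--; stk[sp-1] -= stk[sp];      /* sub                       */
        sp--; stk[sp-1] *= stk[sp];      /* mul                       */
        return stk[--sp];                /* every step touches memory */
    }

    long reg_eval(long a, long b)        /* same thing, register style */
    {
        return (a + b) * (a - b);        /* a and b live in registers  */
    }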


Several RISC chips for Lisp machines were under development: Xerox, Symbolics (Sunstone), and UC Berkeley (SPUR) all had projects. The AI winter then killed them. The Lisp machines were later ported as emulators to the Alpha (Symbolics), SPARC (Interlisp), and other processors.

"Also, note that the Sunstone project did address many of the competitive concerns, especially the continual mention of Sun in this analysis. The Sunstone project included a chip design for a platform meant to run Unix and C, as well as Lisp. It was a safe C exploiting the tagged architecture, for example, to allow checking of array bounds. And the Sunstone project was being produced on-time. But to back up the analysis of Symbolics’ priorities, it was cancelled as we were getting the first chips back from LSI Logic."


And then we'd have had people writing applications that Lisp isn't suited for complaining at what could have been if only we'd gone with a simple register architecture. Don't get me wrong - I believe in owning the whole stack, and I'd love to see some Lisp machines. I fully believe that something like this may reappear in the future. But let's not kid ourselves: none of these singular visions of simplicity is going to be good enough for everything.


It's not necessarily the case that Lisp machines would have been the only way. I'd think of something more like what Intel is doing these days, adding instructions to SSE to speed up things like string processing.



