I totally agree with you. I bought his book on writing a compiler (the T3X language) and although the code and text could be more educationally structured, it certainly conveys how easy it can be to implement a simple compiler.
Started reading the sample text, but as soon as I read "abstract machine" it was all over: we have suffered far too much and far too long under the inefficiencies and distribution nightmares of abstract machines like those of Java, .NET, or Python. If I cannot compile it straight to a machine code executable, I want nothing to do with it. Intermediate representation: just say no.
> if I cannot compile it straight to a machine code executable, I want nothing to do with it.
Pretty much every modern implementation of every programming language has an intermediate representation -- GCC has one[1], LLVM has several [2,3], and unless you're programming in Forth (which notably generally doesn't have an intermediate representation), you're probably already happy with a language with an intermediate representation of some kind and just don't know enough to appreciate it.
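For readers who haven't worked inside a compiler, here is a minimal sketch of what an intermediate representation is (in Python, and not tied to GCC's or LLVM's actual IRs): the front end lowers an expression tree into a small stack-machine IR, and a back end then consumes that IR, whether by interpreting it or by translating it further toward machine code.

```python
# Minimal sketch of an intermediate representation (IR): a front end
# lowers an expression tree to stack-machine instructions, and a back
# end (here a tiny interpreter) consumes the IR. Real IRs (GIMPLE,
# LLVM IR) are far richer, but the separation of concerns is the same.
import operator

def lower(expr):
    """Lower a nested-tuple AST like ('+', 2, ('*', 3, 4)) to stack IR."""
    if isinstance(expr, (int, float)):
        return [("push", expr)]
    op, lhs, rhs = expr
    return lower(lhs) + lower(rhs) + [("binop", op)]

def run(ir):
    """One possible back end: interpret the stack IR directly."""
    ops = {"+": operator.add, "-": operator.sub, "*": operator.mul}
    stack = []
    for kind, arg in ir:
        if kind == "push":
            stack.append(arg)
        else:  # "binop": apply the operator to the top two values
            b, a = stack.pop(), stack.pop()
            stack.append(ops[arg](a, b))
    return stack.pop()

ir = lower(("+", 2, ("*", 3, 4)))
print(ir)       # the IR is just data; another back end could emit machine code from it
print(run(ir))  # 14
```

The point of the indirection: `lower` knows nothing about the target, and each back end knows nothing about the source language, which is exactly why one compiler can support many languages and many targets.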
The idea that "compile[d] straight to a machine code executable" is a line in the sand worth squat is just ignorant.
I know how easily internet comments land on sensitive feelings—it happens to me every day—but this reply breaks the site guidelines. It takes the thread in an off-topic, unsubstantive direction, and it doesn't do what we ask here: "Please respond to the strongest plausible interpretation of what someone says, not a weaker one that's easier to criticize. Assume good faith." (https://news.ycombinator.com/newsguidelines.html) Even though it felt like geocar was lecturing you, the stronger plausible and good-faith interpretation was to assume otherwise. In that case the discussion could have continued to exchange substantive information and readers could have learned more.
I'm sorry, but people with real expertise view everything as a complicated engineering trade-off that's highly dependent on context.
That you're willing to jump in and unilaterally declare a common strategy used by hordes of clever people a Bad Idea, pointing only to your years of experience as proof, probably says more about you than it does about the merits of the technique in question.
I sure hope it says something about me. And just for the record, IT is this terrible cesspit of reinvented wheels by hordes, so yeah, hordes are not right. Never will be.
So what do you propose instead of using an IR? No serious compiler in decades has gone without an IR, straight from source to machine code. Do you have some new idea? Or do you think we should go back to syntax-directed translation? I'm not sure that's tractable - performance would be terrible.
There are some well thought-out abstract machines out there that are quite efficient and map pretty directly to machine code, like the SECD machine or the WAM (for Prolog).
The LISP9 Abstract Machine, which is described in the book, is very close to the metal. So much so that you might as well emit machine code instead. The approach is outlined (but not described in detail, as you noted) in the last chapter.
Then again, abstract machines have one huge advantage, and that is portability. I can just compile the system on x86, ARM, SPARC, MIPS, Alpha, or whatever, and it will simply run. So whatever the reader has, the code will work. (I acknowledge that pretty much everybody uses x86 today, but still...)
Counting general purpose CPUs only, I have so many in my house it isn't even funny. 6502, Z80, MIPS (64-bit), Alpha, SPARC, Ivory, 680x0, x86(_64), ARM (32/64), PowerPC and a lot of variants, and some I do not even know offhand (Xerox, TI). I am actually ashamed to have no RS/6000, PA-RISC or Itanium to go with it.
I assume he means that the intermediate code allows for easier abstractions to support all those different processors instead of writing a specific machine code compiler for each one.
SECD is _not_ an efficient abstract machine, not by a long shot. Something a lot closer to the state of the art would be Leroy's modern version of his Zinc machine or, far more extreme, GRIN by Boquist (there are several implementations and a new one underway).
"So much so that you might as well emit machine code instead."
Exactly, thank you!!! So why the hell would you screw around with bytecode!?!?!
I don't want portability, my source code is portability! I want the fastest possible machine code translation for the smallest possible program!!! That "portability" thing with bytecode makes me absolutely livid!!!
Remember: translation to machine code is not guaranteed to be faster than translation to VM bytecode, nor is execution of that machine code guaranteed to be faster than execution of the bytecode.
An abstract machine doesn't necessarily mean that your code will be interpreted. An intermediate representation can just be a practical way to reason about low-level code (the very word "intermediate" hints that it is not necessarily the "final" representation).
Many Lisps in fact generate native code; in Common Lisp you can even use "disassemble" to get the native code generated for a function.
And many lisps that generate native code go through an intermediate representation. SBCL has two, simply named IR1 and IR2.
Also, even an interpreter of "intermediate code" can (dynamically) translate that intermediate code to native code. That's how many JVMs work, for example.
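As a loose analogy for that dynamic translation step (with a tiny stack IR standing in for JVM bytecode, and generated Python source standing in for native code - neither resembles a real JVM's internals), the same intermediate code can be translated once into a directly executable form instead of being re-interpreted on every call:

```python
# Loose analogy for dynamic translation: the "intermediate code" (a tiny
# stack IR) is translated once into a host-executable form (a compiled
# Python function), after which calls run the translated code directly.
# A real JVM does the same in spirit, but emits actual machine code.

def translate(ir):
    """Turn stack IR into a compiled, directly callable function."""
    stack = []
    for kind, arg in ir:
        if kind == "push":
            stack.append(str(arg))
        else:  # "binop": fold the top two sub-expressions into one
            b, a = stack.pop(), stack.pop()
            stack.append(f"({a} {arg} {b})")
    source = f"lambda: {stack.pop()}"
    return eval(compile(source, "<jit>", "eval"))

# IR for 2 + 3 * 4, as a front end might have produced it
ir = [("push", 2), ("push", 3), ("push", 4), ("binop", "*"), ("binop", "+")]
fn = translate(ir)  # translation happens once...
print(fn())         # ...and every subsequent call runs the translated code: 14
```

The translation cost is paid once per piece of code, not once per execution, which is why this wins for anything executed more than a handful of times.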
This is one of those comments that the remark "not sure if serious..." was made for. You do know that C has an abstract machine, too? The C standard tells you all about it. Please don't troll or flame bait so blatantly, it's against the commenting guidelines here.
I find people who have viewpoints like these interesting. Even if I don't agree with most of their points, I like reading them sometimes; it gives an interesting perspective on how some people think, and once in a while there are some decent points.
Take for example their comment on distributing your product as a Docker container, forcing your customers onto infrastructure they may not have. Many people may not think of it that way, but it's definitely a valid concern.
I had the same thoughts reading the book's preface and exploring the author's site (I had to apply a CSS override, my eyes can't seem to handle green on black for long anymore). And this thread is another instance. Most of me is just fine with the JVM's performance and I sometimes have to correct naive coworkers concerned that splitting up a method for testability and readability is going to be a performance problem -- due to the way the JIT works it may very well improve performance.
But part of me agrees and wants things to go further, since "machine code" is just an abstraction over the microcode. x86 is too high a layer for what's actually going on in modern CPUs, and I hate that to write the most optimal code by hand you have to lay out the assembly in a way that coerces the CPU to do what you want at the lower layer (and coerce the compiler to lay out the assembly that way too, if you aren't working at the assembly layer). At least FPGAs offer salvation to mostly do what you want without lower-level interference (though there are no "pure" FPGAs - they actually contain dedicated hardware like DSP slices to stay competitive for common tasks, rather than just pure programmable gates), but then you're going to be fairly application-specific instead of having something more general purpose with lower-level hooks, or a meta protocol that gives you choices on the tradeoffs instead of forcing the one the creators chose. I guess the complaint is that there's just lots of performance left on the table that can't be captured generally.
The abstract machine in C is a concept, not an actual machine. The entire C compiler toolchain is designed to translate that abstract machine into machine code suitable for the target.
I don't think that's the end goal: "In the final part of the book, compilation of LISP-N, compilation to stand-alone binaries, and compilation to native machine code are outlined."
That is theoretically impossible: a bytecode VM must perform machine code generation and it must do so on the fly or pre-assemble the code, both of which impose a tremendous performance penalty. That's why Java is so slow, for example.
Just-in-time compilation can use for compilation data available right before code execution - for example, results of previous loop executions. This data can be unavailable during static compilation time, so static compilation can theoretically produce inferior code.
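A hedged sketch of that idea (illustrative only - no real JIT exposes an API like this): once a value that was unknown at static-compile time is observed to be stable at runtime, the hot path can be specialized for it, here by folding a constant divisor into a precomputed factor.

```python
# Illustrative sketch of runtime specialization: after enough calls, the
# "JIT" observes the divisor, treats it as a constant, and installs a
# specialized fast path. Statically compiled code would have to re-read
# the divisor every call, since its value isn't known at compile time.

def make_scaled_sum(divisor_source):
    state = {"calls": 0, "specialized": None}

    def generic(xs):
        # Generic path: fetch the divisor on every call.
        d = divisor_source()
        return sum(x / d for x in xs)

    def scaled_sum(xs):
        if state["specialized"] is not None:
            return state["specialized"](xs)
        state["calls"] += 1
        if state["calls"] >= 3:           # "hot": seen often enough to specialize
            factor = 1.0 / divisor_source()  # runtime-observed constant, folded once
            state["specialized"] = lambda xs: sum(x * factor for x in xs)
        return generic(xs)

    return scaled_sum

f = make_scaled_sum(lambda: 4.0)
print([f([4.0, 8.0]) for _ in range(4)])  # same answer before and after specialization
```

The specialized path replaces a division with a multiplication by a known constant - a small win here, but the same mechanism lets real JITs inline through call targets, eliminate type checks, and unbox values based on what actually happened at runtime.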
Just-in-time compilation is still slower overall because you incur the assembler penalty. Another problem with just-in-time compilation is that one cannot control when and how many times it happens. I'm sick and tired of waiting for the JIT every time I run a program. JIT is an extremely stupid idea.
I can in fact compile with profile collection; both Sun Studio and the HP compilers have supported that for decades. Then I run a representative workload and recompile while feeding the compiler the collected data.
Just-in-time compilation is supposed to save you time when compilation time is less than the savings in execution time, compared to static compilation. Can you prove that JIT savings can never happen, or don't happen on average? And even if you do profile collection, a JIT will adapt to changing load - think of long-running services with variations in request pattern frequencies and data - while static compilation can't do that.
I'm not saying that JIT is always better. It seems to me there are cases for that though.
Can you prove that binaries compiled with profile feedback are slower? No, I did not keep track of JIT but it is a very raw wound or I wouldn't have commented.
I want machine code executables. No JIT. Not even in the theoretical case where it might be faster long term: first, because those cases are practically non-existent (daemons don't usually crunch numbers, but JIT versions consume inordinate amounts of memory), and second, because they don't actually provide portability: take any audiovisual demo and try to run it on all the platforms the VM supports. I want to know how many will play sound and how many will correctly and quickly display graphics.
What about bootstrapping by writing an interpreter for language X and then implementing a compiler in X? That used to be common practice and has many advantages, such as easy targeting of new systems.
Maybe with tools like LLVM it has become a bit less common to do all that on your own, but not everybody wants to use LLVM and spitting out your own machine code directly, without even having a bytecode interpreter, still seems like a rather bad idea. Besides loss of retargetability, the generated machine code will never be fast without an IR, because it will basically just allow for peephole optimizations. I know one or two ingenious people who compiled to x86 assembler, because it was considered the fastest option then, and now regret that choice.
It's not a problem that comes up very often. It's only when a language is first invented. After that, you just compile with older versions of the compiler.
Abstract machines and just-in-time compilers make languages like Java more efficient, not less. For more dynamic languages, compiling directly to machine code produces inefficient code, because the compiler has to be pessimistic. Even C can sometimes run faster on an abstract machine (a concrete example is inlining through function pointers at runtime, when it can't be done at compile time).
Compare the efficiency of the same Java machine-code compilers in AOT mode versus JIT mode - JIT is more efficient for anything beyond short-lived programs.
That doesn't address your concern about distribution - so that does still remain.
You can always include the abstract machine as part of a native executable. Sometimes (JIT compilation comes to mind) this leads to superior performance compared to "classical" native code. Thus both some inefficiencies and some distribution nightmares will be mitigated (you only need to write the abstract machine once for a given platform).