Zero Feet: a proposal for a systems-free Lisp (applied-langua.ge)
111 points by tosh 8 months ago | hide | past | favorite | 51 comments



Writing your obscure language in a popular language has an overwhelming benefit: simplicity of bootstrapping.

The user can easily obtain an executable C compiler from their distro or whatever. With that, they build your language.

If your language is written in itself entirely, then the user needs an executable of the compiler for your language.

There is a second benefit: C is an external language. Nothing that you do in your program changes the C language. If your language is written in itself, and you make changes to it, you're changing the language in which your language is written. Mistakes in this can cause hair-pulling bugs.

Bootstrapping with another language like C doesn't have to mean that anything written in C remains in the bootstrapped product!

A self-in-self thing can be turned into a C-bootstrapped thing by simply adding an independent, minimal implementation of that language written in C, which the user can compile and run, and which is capable of running the real self-in-self compiler to compile that compiler and the run-time. Once that is done, the real compiler can be used to compile the compiler again into a faster compiler, and the run-time into a faster run-time. After that, the C bootstrapper is not used for anything, and there is no evidence of it in the product.
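The staging described above can be modeled as a fixed-point check. Here is a toy Python sketch, where a "compiler" is just a pure function from source text to output text; the names (stage0_compile, COMPILER_SOURCE) are illustrative, not from any real project:

```python
# Toy model of staged bootstrapping. stage0 plays the role of the minimal
# C implementation: correct but naive, producing "slow" binaries.

COMPILER_SOURCE = "compiler-v1"

def stage0_compile(source):
    # Minimal bootstrap implementation written in C: correct but slow output.
    return f"slow-binary({source})"

def run_compiled(binary, source):
    # Running any correctly built compiler binary on a source yields the same
    # optimized output; only the compiler's own execution speed differs.
    return f"fast-binary({source})"

# Stage 1: the C bootstrapper builds the real compiler (slowly).
stage1 = stage0_compile(COMPILER_SOURCE)

# Stage 2: the real compiler rebuilds itself, producing the fast binary.
stage2 = run_compiled(stage1, COMPILER_SOURCE)

# Stage 3: rebuilding once more must reach a fixed point; at that point no
# trace of the stage0 bootstrapper remains in the product.
stage3 = run_compiled(stage2, COMPILER_SOURCE)

assert stage2 == stage3  # fixed point: the C bootstrapper has dropped out
print(stage2)            # prints "fast-binary(compiler-v1)"
```

Real systems (GCC, Rust) check this fixed point by comparing stage-2 and stage-3 build artifacts byte for byte.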


While nothing you said is technically wrong, my experience has been that languages that force themselves to dogfood are generally much more pleasant to use than those that don't, both as languages and as tooling. I strongly feel this is not a coincidence. Today's "ease of bootstrapping" is 3 years from now "incoherent library hell".


It is fiendishly hard to disentangle cause from effect here. The sort of person to implement a dogfooded language is likely also the sort of person to care deeply about the quality of that language.

It may be that just being in a position to use the language a lot (because you're implementing it) and to change that language (because you're the one building it) is enough to get the effect you've observed, but I wouldn't be at all surprised if you would only start down that path in the first place if you already had strong opinions about what a good language looks like.


Another counterpoint is the mismatch between the intended use of languages. Languages are designed for many types of goals. The language you’re building should be tested against the goals it’s designed to achieve.

Some languages, on the other hand, are really good at making compilers. And some tools are specifically designed for program transformation and analysis. Using those languages can dramatically improve your compiler, which might indirectly increase the number of goals the other language hits.


I'm curious about specific examples you are referring to, going in either direction. Off the top of my head, Java is not implemented in itself yet has perfectly fine tooling. Whereas Haskell is implemented in itself and its tooling leaves much to be desired (despite my love for the language overall).


I'm curious too. Among the major language implementations I can think of, almost every single one is dogfooded as much as it can be. In other words, dogfooding is almost always derivative of other technical decisions, not an independent factor. Which makes it hard to distinguish correlation from causation.

Fully dogfooded: C++, Rust, Go, D, Zig, [Java with Graal or Jikes], [Python with PyPy]

Not dogfooded: JavaScript, Python, Ruby, Lisp, PHP, Lua, Swift

Compiler is dogfooded but VM/runtime is not: Java, C#, Haskell, Kotlin Native

The compiler (i.e. whatever parses source files and generates either native code or bytecode) is almost always written in the language itself, except for purely interpreted languages where that would be too slow.

If there's a VM or runtime, it's written in C or C++ if the language is slow; it's written in the language itself if the language aims to be as fast as C; and for the many languages in the middle, it varies.

The biggest exception I can think of is Swift, which is mostly not dogfooded, although it's becoming more so. And even that is explainable by other design decisions. Like C++ or Rust, Swift requires the compiler to do a lot of work to compile it properly. Like Java or Haskell, Swift is really slow at runtime. The combination of those would be deadly for a dogfooded compiler.

Besides that, C and Objective-C are technically exceptions because the most popular compilers for them are all written in C++. But I didn't count them because C++ is in the same language family.


Lisp isn't one language like Ruby or PHP; it is a family.

Some members of that family are fully self hosting. E.g. to build SBCL, you need an implementation of Common Lisp. It doesn't have to be SBCL.


Compiler and runtime are partially dogfooded now in .NET:

- CoreLib is entirely C# (let alone anything outside it)

- Low-level facilities like thread-locals are now mostly C# plus compiler idioms plus C#-to-unmanaged exports, instead of C++

- Type system facilities have more of their code moved to C# as well

- Memmove/memcpy and other performance-sensitive paths, that are part of CoreLib, are either pure C# using its SIMD API or special-cased in compiler (like unrolling of copies, comparisons (including text) of constant lengths and similar last-mile optimizations)

In addition, ILC (IL AOT Compiler) is fully written in C#, although uses the same back-end as JIT.

The JIT could reasonably have been written in C# now that NativeAOT exists, but it would require more work (also considering that C# is a GC-based language). Apparently it was considered at some point, but ultimately there just isn't the capacity to do so - the .NET compiler team is fairly small.


> If your language is written in itself, and you make changes to it, you're changing the language in which your language is written. Mistakes in this can cause hair-pulling bugs.

Fair, but, hold on, shouldn't you just make sure that if you're working on version 1.2 of your language, you compile it using your compiler from a stable 1.1 build? (other than maybe in tests where you need to test new features etc)

This is not a problem specific to programming languages; it applies to any system that uses part of itself to build itself.


That does introduce the possibility of a Trusting Trust attack, where the original C program compiles the compiler or runtime with a vulnerability that persists through the final compile. Or a vector similar to the xz attack, where everything looks completely normal except for an old script that was subtly changed: if you bootstrap from C you are vulnerable, but if you bootstrapped from the language source itself you are good.


>but if you bootstrapped from the language source itself you are good

No, that’s exactly what Ken Thompson was describing in Trusting Trust. You always have to bootstrap the compiler, but once you do, you use the compiler to compile itself and all future versions. So you use that circularity to hide the evidence of a back door in the compiler. It applies to all bootstrapped languages.


You cannot get around the Ken Thompson problem because all bootstrapping starts with some trusted binary that compiles your level 0 sources.

Now, sure, if your chain does not involve C, then it is immune to compromised C compilers; how could a C compiler that is never installed or run do anything to it?

However, you're vulnerable to a doctored binary of your language.

Bootstrapping your Language X from C helps you here also, for multiple reasons.

Reason 1: there is no recursive cycle when you externally bootstrap from another language. Even if a C compiler is doctored to recognize that it is bootstrapping Language X and to do something malicious, that malicious thing will not propagate to new Language X installations. The reason is that the build output of Language X is not used for further bootstrapping of new Language X instances; the other language is always used for bootstrapping. Any proliferation of the hack has to propagate through the ecosystem of that other language.

Reason 2: Language X is less popular than the bootstrapping language, which makes its ecosystem vulnerable. If a vulnerable language with a tiny ecosystem is bootstrapped using ready-made binaries of itself, those binaries are likely only available from one site. If those are infected, it's game over.

Reason 3: Language X development does not develop the host language (such as C). C is not of interest to the Language X project, and can be used conservatively, so that Language X will build fine with a 20-year-old GCC. Even if a bad actor attacks Language X by getting malicious code upstreamed into GCC, that attack won't appear in old versions of GCC. (By the same token, Language X written in Language X could also use a conservative dialect of Language X for bootstrapping, so that a many-years-old implementation of it with a widely known SHA-256 can be used.)


> You cannot get around the Ken Thompson problem

I thought Dr. David A. Wheeler's 2009 doctoral dissertation "Fully Countering Trusting Trust through Diverse Double-Compiling" had been generally accepted as proof that you could?

https://dwheeler.com/trusting-trust/


Yep I slipped up there.

There is a defense against the trusting trust attack. It basically involves using two different compilers and checking their output matches. You can find the article about it if you search.
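That defense is Wheeler's Diverse Double-Compiling. The core comparison can be sketched as a toy in Python, modeling compilers as pure functions from source to deterministic output; the names (gcc_like, clang_like) are illustrative stand-ins, not real tools:

```python
# Toy sketch of Diverse Double-Compiling: build the compiler-under-test with
# two independent compilers, have each result rebuild it from the same
# source, and check that the second-stage outputs are identical.

COMPILER_SOURCE = "my-compiler.src"

def gcc_like(source):
    # Compiler A: its binaries run correctly but are "A-flavored".
    return ("A", source)

def clang_like(source):
    # Independent compiler B: different flavor, same language semantics.
    return ("B", source)

def run(binary, source):
    # A *correct* compiler binary, whichever compiler built it, produces
    # output determined only by the source it compiles - not by its builder.
    return ("X", source)

# Stage 1: two unrelated compilers each build the compiler-under-test.
stage1_a = gcc_like(COMPILER_SOURCE)
stage1_b = clang_like(COMPILER_SOURCE)

# Stage 2: each stage-1 binary rebuilds the compiler from the same source.
stage2_a = run(stage1_a, COMPILER_SOURCE)
stage2_b = run(stage1_b, COMPILER_SOURCE)

# If both second stages are bit-identical, a Thompson-style backdoor would
# have had to be planted in both independent compilers simultaneously.
assert stage2_a == stage2_b
```

A real backdoor in compiler A would show up as a flavor difference in stage2_a, breaking the final comparison.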


> However, you're vulnerable to a doctored binary of your language.

I think obscure language X being written in language X is more of a security benefit: a much greater portion of those who use language X are then able to recognise weirdness, compared to a computing environment with parts that are essentially black boxes.


I’ll add one more benefit to using languages like C or Java for output or building. That is their tooling ecosystem. Almost all research in static analysis, code generation, etc was for these languages. Using them means you can use all their tooling. Talent, too, for contributions and reviewers.


Could be a good idea to maintain a very simple stable reference C implementation for bootstrapping and testing purposes while later developing a more complex one in the language itself.


So, you are arguing that it's easier to download the source and compile it yourself than to download the compiler?


The goal is to boostrap the compiler. So we need two things: the source code, and the compiler which understands it.

If the compiler which understands it is in some widely used language, things are easier because we likely already have it. If we don't have it, it's likely nicely packaged.

We also likely have it on platforms to which the language we are trying to build has not been ported. That gives us a big head start in porting.


If the compiler exists only as a binary, and you cannot compile it yourself, how can you be sure what the compiler is actually doing?


As proven by Ken Thompson, unless you are a security guru, having source code won't help.


> A more interesting problem is that, if almost every construct in the language is a message send, and all of the implementation is written in the language, handling a message send may cause infinite recursion.

This is where I suggest reading Ian Piumarta's Open, extensible object models [1] and later his COLA paper [2]

Piumarta tried to imagine an entire late-bound, meta-circular system based around message passing, starting from as little C as possible to build a Smalltalk-like language. It is a small paper that had me obsessed for a year, trying to imagine how to implement an entire OS based on it. Worth a read.

And in the theme of meta-circular Lisp, he also worked on Maru [3]

1: https://www.piumarta.com/software/id-objmodel/objmodel2.pdf

2: https://www.piumarta.com/software/cola/

3: https://www.piumarta.com/software/maru/
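The recursion problem is easy to make concrete: if method lookup is itself a message send, then looking up 'lookup' requires looking up 'lookup', forever. A minimal Python sketch, loosely in the spirit of Piumarta's objmodel paper (names are illustrative, not his API), breaks the cycle by special-casing sends whose receiver's vtable is the vtable-of-vtables:

```python
# Toy late-bound object model: every operation is a message send, including
# method lookup, except for one hard-coded base case that stops recursion.

class Obj:
    def __init__(self, vtable):
        self.vtable = vtable
        self.slots = {}

vtable_vt = Obj(None)          # the vtable that describes vtables
vtable_vt.vtable = vtable_vt   # ... including itself

def vtable_lookup(vtable, selector):
    # Plain function: walk the slot dictionaries up the parent chain.
    while vtable is not None:
        if selector in vtable.slots:
            return vtable.slots[selector]
        vtable = vtable.slots.get("parent")
    raise AttributeError(selector)

def send(receiver, selector, *args):
    # Lookup is itself a message send ('lookup' to the vtable) - except when
    # the vtable is the vtable-vtable, where we call vtable_lookup directly.
    # Without this base case, send() would recurse infinitely.
    vt = receiver.vtable
    if vt is vtable_vt:
        method = vtable_lookup(vt, selector)
    else:
        method = send(vt, "lookup", selector)
    return method(receiver, *args)

# 'lookup' is installed as an ordinary, replaceable method.
vtable_vt.slots["lookup"] = lambda vt, sel: vtable_lookup(vt, sel)

# A user-level object whose behavior lives entirely in its vtable.
point_vt = Obj(vtable_vt)
point_vt.slots["magnitude"] = lambda self: (self.slots["x"] ** 2 +
                                            self.slots["y"] ** 2) ** 0.5
p = Obj(point_vt)
p.slots["x"], p.slots["y"] = 3, 4
print(send(p, "magnitude"))  # prints 5.0
```

Because 'lookup' is just a slot, a program can replace it at runtime to change dispatch semantics - which is exactly the late-bound extensibility Piumarta was after.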


Do you have any idea why the final product of STEPS didn't actually use COLA? I assume there was some difficulty that there wasn't time to work around, but I haven't found what it was.


If I understand the point of this, I believe Coral Common Lisp (now Clozure CL) has always been this way, if not more so. The lowest level is a Lisp program generating native machine code. [0] There’s no lower-level implementation language or even a bytecode translator.

[0] https://github.com/Clozure/ccl/tree/master/level-0/ARM


Interesting. Although I think the author advocates for going a step further than CCL. My understanding is that the GC of CCL is implemented by its lisp kernel [0]. The lisp kernel is a platform specific C / ASM program which seems to provide a runtime for CCL. The author states that the GC should be written in the language itself.

[0] https://github.com/Clozure/ccl/blob/master/lisp-kernel/gc-co...


Ah, so there is some C! Thanks for the correction. I haven’t worked on it since 68K CCL days (like, 1990) and I’m almost positive there was no C in that version, just inline assembler in Lisp. It was awesome.


I really hope this won’t be taken the wrong way, but why are these people always fighting the last war? Brendan Eich was already doing this in the 90s when developing JavaScript.

I disagree with a lot of what Jonathan Blow says, but he's correct when he says that even the most trivial of projects are regarded with awe, as if they're some insane and innovative skunkworks project. There are problems that need solving, that are in dire need of solving, but nobody seems to care about them. What we end up with is more and more bloated frameworks and languages that nobody uses after a week.


Because it's fun. Not everything is a war. I just want to make a language that I would enjoy using. I don't exactly expect it to reach the TIOBE top 10.

There is an infinite number of problems out there that "need" solving. The thing is I don't really care about those problems. They'd have to pay money to get me to even think about them. These "trivial" projects though? Those provide me with plenty of intrinsic motivation. I like working with the unseen infrastructure that everybody takes for granted.


What was Brendan Eich doing that is covered in this article?


LISP in the browser, although he used Self instead of Newspeak for objects and Scheme as the blueprint for LISP. For systems language LISP, there are already many languages available for BEAM, like LFE and Elixir.


This isn't about Lisp in the browser though, it's about high-level low-level programming (to borrow the name from Daniel Frampton <https://www.steveblackburn.org/pubs/theses/frampton-2010.pdf>).


What do you consider in dire need of solving?


WebGL and WASM, for one thing. Of course it is dependent on the implementation, but the performance seems abysmal.

In line with that, there needs to be a good FOSS replacement of Unity. I know Godot exists but more open game engines will help.

Those are just a few off of the top of my head.


The problems that need "solving" in this space are 99% politics. Making something become a standard and/or popular is a fundamentally different task from making something good.

The answer to your earlier question, of course, is that—to some people, anyway—trying to make something good is far more satisfying than trying to make something popular.


WASM performance seems mostly fine for most applications [1]. Certainly better than Python or Ruby. Maybe running in the browser is a bit of a different story, given that it has to go through JS to manipulate the DOM, but I don't think this is a problem that can be solved with more developers.

[1] https://00f.net/2023/01/04/webassembly-benchmark-2023/


“… all human progress depends on the unreasonable man.”

What’s old can easily become new again if a technology change in one corner of the system propagates outward, modifying the constraints & requirements on other components.


Wasn't Symbolics' Genera OS fully Lisp all the way down?


The MIT/LMI/TI Lisp Machines were not, they ran a simple RTOS written in "assembler" that checked for interrupts from devices and woke up Lisp threads to handle the event. The assembler code also interpreted the instruction set of the VM.

Someone could write a modern lisp that sat on top of a Xen unikernel that would work in a similar way, the compiler could generate native instructions though.

I think you would need to examine the Symbolics microcode carefully to categorize what it was doing.


Yes, it was. You can still read the sources for most of it if you know how to unpack them.


Sort of. Some ops that a typical lisp would have in the std lib it has in hardware, though.


some ops a typical C has in the stdlib are in hardware


Probably not things like singly linked list ops, though.


some machines had a lisp-friendly CPU even


Yes, definitely.


It seems fairly normal to write Lisp compilers in Lisp. You tend to see them wrapped in a C shell sometimes, using C to talk to the system for I/O. It also seems easier to write the garbage collector in C than in the language being garbage-collected. But take a typical Lisp compiler and generate a standalone exe, possibly reimplement the GC, and at least, unlike in other languages, you still have the compiler and can reimplement the C wrapper via your native FFI. Job done, I think.


Not sure I understand all the ideas presented but it was an interesting read, thanks!


Why "feet"?


The Smalltalk special of Byte Magazine infamously has a hot air balloon floating away from an island with ivory towers on it. Flying things measure their altitude in feet, but the low level is low. Furthermore, according to Tubeway Army, bombers fly at zero feet <https://www.youtube.com/watch?v=GAj89NEqX-M>.


C isn't magic, and "systems programming language" is a Rust buzzword with little meaning. C is just the native language of Unix, and you keep needing to use it if and only if you keep wanting to run things on Unix. (The system is really C-and-Unix, not one of C or Unix.)

Nothing high-minded is needed, you just implement what you implement. Operating systems (including Lisp ones!) existed before C-and-Unix and they'll exist after C-and-Unix (haha just kidding there is no "after C" or "after Unix").

(Offended Rust fans, please note that your pet language is just as free to escape Unix, or not, as any Lisp ever was.)


Rust deliberately abandoned using the "systems" word many years ago, just so you know.


Unfortunately, you can't put a meme back in the bottle, no matter how hard you try. There are people who are not yet born who will say things like "Rust is a blazingly fast rocketship emoji memory safe systems programming language", just like there are people currently living who say inanities like "C is close to the metal".



