Another revelation: lambda calculus can be reduced to 4 primitive operations[0].
I had this revelation after a pilgrimage into the land of Binary Lambda Calculus[1], a binary encoding of lambda calculus that represents variables with numerical (de Bruijn) indices[2] in unary.
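To make the encoding concrete, here is a tiny sketch (the tuple representation and the `encode` helper are just my illustration, not any official tooling): abstraction is 00, application is 01, and a variable with de Bruijn index n is n ones followed by a zero.

  # Minimal sketch of the BLC encoding; terms use 1-based de Bruijn indices.
  def encode(term):
      kind = term[0]
      if kind == "lam":   # abstraction: 00 <body>
          return "00" + encode(term[1])
      if kind == "app":   # application: 01 <function> <argument>
          return "01" + encode(term[1]) + encode(term[2])
      if kind == "var":   # variable n: n ones followed by a zero (unary)
          return "1" * term[1] + "0"

  # The identity, lambda x. x, i.e. lambda 1 in de Bruijn form:
  print(encode(("lam", ("var", 1))))            # 0010
  # lambda x. lambda y. x, i.e. lambda lambda 2:
  print(encode(("lam", ("lam", ("var", 2)))))   # 0000110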
I was more directly inspired by Lisps, but I do prefer the original M-expressions and the syntactic choices that REBOL and Red make.
I think placing the operator before the opening bracket better emphasizes its special significance and can reduce nesting for constructs like `f[x][y]` (vs. `((f x) y)` in Lisps). Square brackets somehow seem more aesthetically pleasing to me. And there is a practical reason to prefer them, especially if your syntax uses only one kind of bracket -- square brackets are the easiest to type on an average keyboard.
So REBOL-like syntax is nicer, as were M-expressions. They probably didn't catch on because they were not minimal enough compared to S-expressions. And maybe because S-expressions were fully implemented first.
What if we could have minimal redundancy and still be capable of error detection?
Check this[0] out:
[
  id [0003]
  type [donut]
  name [Old Fashioned]
  ppu [0.55]
  batters [
    batter [
      [
        id [1001]
        type [Regular]
      ]
    ]
  ]
  topping [
    [
      id [5004]
      type [Maple]
    ]
  ]
]
Now if we delete `id`, we will get a syntax error. And yet: no commas, no colons, no quote marks! Only square brackets. Minimal redundancy.
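One way such a check could work (a minimal sketch of my own, not a spec): within any single bracket, either every subtree has a non-empty name (a map) or none do (a list); deleting `id` leaves a nameless value among named ones, which the check rejects.

  # Rough sketch: parse the square-bracket syntax into (name, subtree) pairs,
  # then require each bracket to be all-named (a map) or all-unnamed (a list).
  def parse(src, i=0):
      pairs, text = [], ""
      while i < len(src):
          c = src[i]
          if c == "[":
              sub, i = parse(src, i + 1)
              pairs.append((text.strip(), sub))
              text = ""
          elif c == "]":
              return (pairs, text.strip()), i + 1
          else:
              text += c
              i += 1
      return (pairs, text.strip()), i

  def check(node):
      pairs, _leaf_text = node
      named = [name != "" for name, _ in pairs]
      if any(named) and not all(named):
          raise SyntaxError("mixed named and unnamed values in one bracket")
      for _, sub in pairs:
          check(sub)

  good, _ = parse("[id [0003] type [donut]]")
  check(good)                                  # passes
  bad, _ = parse("[[0003] type [donut]]")      # `id` deleted
  check(bad)                                   # raises SyntaxError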
For a bit more error checking through redundancy, we could analyze indentation (one reason why I recommend C-style rather than Lisp-style formatting) and warn if we detect any inconsistencies.
Ergo: through design magic we can get rid of a lot of the redundancy and gain desirable properties, without giving up much on the positive side.
As long as we are on topic of JSON alternatives suitable for configuration, I've been tinkering with various minimal syntaxes and formats for a long time, most notably Jevko[0].
One format based on that I've been conjuring up recently would represent the first example from the RON README like so:
GameConfig[ optional struct name
  window size [[800] [600]]
  window title [PAC-MAN]
  fullscreen [false]
  mouse sensitivity [1.4]
  key bindings [
    up [Up]
    down [Down]
    left [Left]
    right [Right]
    Uncomment to enable WASD controls
    ;[
      W [Up]
      A [Down]
      S [Left]
      D [Right]
    ]
  ]
  difficulty options [
    start difficulty [Easy]
    adaptive [false]
  ]
]
I put a syntax-highlighted version and some more details in this gist[1].
I wonder what you guys think about such a minimal alternative.
> With XML, the complexity is the baseline, and it only goes up from there. With JSON, the complexity is just an option, the baseline is pretty simple.
Very well put. And we could lower the baseline substantially towards simplicity, even from JSON.
It's pretty clear that a lot of people think this way. Some even seriously try to figure out what such a baseline of simplicity would look like.
There are lots of simple indentation-based designs (similar to YAML) such as NestedText[0], Tree Notation[1], StrictYAML[2], or even @Kuyawa's Dixy[3] linked in this thread.
There seem to be fewer new ideas based around nested brackets, the way S-expressions are. Over the years, I have developed a few in this space, most notably Jevko[4]. If there is ever another lowering of the simplicity baseline, I believe something like Jevko is the most sensible next step.
Hey, just want to say that BLC is a remarkable artifact -- to me it's an art piece in computational minimalism.
I actually got so obsessed with it last year that I worked out a variant of lambda calculus[0] in which, with some trickery, a port of the BLC interpreter could be squeezed into 194 bits.
Which would be only 2 bits more than the intriguing conjecture from your paper[0] says is optimal:
> We conjecture that any self-interpreter for any binary representation of lambda calculus must be at least 24 bytes in size, which would make E close to optimal.
I wonder what the assumptions behind this conjecture are. Surely my trickery broke some of them.
LAST is an interesting variation that is in essence identical to the oddly named "Real Fast Nora's Hair Salon 3: Shear Disaster Download" language [1].
Instead of L A S T, it names the 4 tokens LAMBDA APP ONE_MORE_THAN ZERO.
I noticed that using two separate tokens for variable handling allows BLC to interpret LAST in only 193 bits.
Still, I suspect that for most programs, the savings from S-optimization do not quite make up for the (n-1) extra bits needed for every occurrence of variable n. What would for instance be the length of the shortest LAST program for the prime number character sequence, which takes 167 bits in BLC?
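To spell out that accounting (assuming each LAST token costs 2 bits in the quaternary encoding, versus BLC's unary 1^n 0 for variable n):

  # Rough bit accounting; the 2-bits-per-token figure is the assumption here.
  def blc_var_bits(n):    # BLC: de Bruijn variable n is n ones and a zero
      return n + 1
  def last_var_bits(n):   # LAST: (n - 1) ONE_MORE_THAN tokens plus one ZERO token
      return 2 * n
  # last_var_bits(n) - blc_var_bits(n) == n - 1 extra bits per occurrence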
> I wonder what the assumptions behind this conjecture are.
I chose 24 bytes because it's a nice round number (3 * 2^3 * 2^3 bits) that sat a seemingly comfortable 14 bit margin below my best effort.
The conjecture assumes a binary input, that must be read bit-by-bit. How long is your LAST self-interpreter with a binary rather than quaternary input?
> The other encodings for variables would certainly increase parsing complexity, so self-interpreters for these BLC variants would be much longer than the original.
Indeed, this is the reason why I don't find those alternatives too interesting. In most practical programs the variable index frequencies are reasonably well approximated by an exponential distribution, for which the very simple unary encoding is optimal.
> In the end, as we relax the restrictions on the encoding of the input, it seems that we can decrease the length of the self-interpreter almost arbitrarily -- down to λm.m(λx.x)(λx.x) [Mogensen] (as pointed out by @sargstuff in this thread) or even λm.m(λx.x) [Brown and Palsberg]
Those are of a very different nature; mine has to tokenize and parse from a binary stream; theirs already has the whole term in a higher order abstract syntax tree.
> While obsessing about this and letting the mind wander I noticed the nice direct correspondence between the quaternary encoding of LAST and the DNA/RNA nucleobases
Yep; I didn't fail to notice that link either :-)
> Now this is a pretty cool coincidence that can provide some individuals with peculiar interests with even more peculiar entertainment, such as going through all possible permutations of AGCT, looking for the longest valid LC program embedded in random mosquito DNA.
The only way to not be valid is to run out of DNA or to not be closed. If you find a long repetition of a single nucleotide and make that L, then the latter is unlikely to happen soon...
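For the peculiarly entertained, here is a rough sketch of that check (my own; the A/C/G/T-to-token assignment is arbitrary -- pick any bijection):

  # Read a LAST token stream and find where a term starting at position 0 ends,
  # failing if we run out of tokens or the term is not closed.
  TOKEN = {"A": "L", "C": "A", "G": "S", "T": "T"}   # arbitrary assignment

  def term_end(tokens, i=0, depth=0):
      if i >= len(tokens):
          raise ValueError("ran out of DNA")
      t = tokens[i]
      if t == "L":                   # abstraction: L <body>
          return term_end(tokens, i + 1, depth + 1)
      if t == "A":                   # application: A <function> <argument>
          return term_end(tokens, term_end(tokens, i + 1, depth), depth)
      index = 0                      # variable: S ... S T, index = #S + 1
      while i < len(tokens) and tokens[i] == "S":
          index, i = index + 1, i + 1
      if i >= len(tokens):
          raise ValueError("ran out of DNA")
      if index + 1 > depth:
          raise ValueError("free variable: not a closed term")
      return i + 1

  dna = "ACAATT"                     # just an example strand
  print(term_end([TOKEN[c] for c in dna]))   # 6: a closed term spans the strand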
> Thank you for coherently writing down your thoughts for anybody to build upon what you have figured out. It's fun, it's inspiring, and it works!
Thanks for all the effort in creating and explaining LAST. I found it intellectually stimulating!
Snarky conjecture: the 7-bit ASCII in-RAM version of "λm.m(λx.x)(λx.x)" is `mxx`, which, if hardware supports bit addressing, only uses 21 bits. The smallest ASCII Lisp is 14 bits though.
The article mentions SICP, aka the wizard book. In it, there are a lot of metaphors pertaining to magic and spirits, which IMO make the book much better than it would be without them. One of the authors of the book, Hal Abelson, famously said:
> There's a good part of Computer Science that's like magic. Unfortunately there's a bad part of Computer Science that's like religion.
This points to a perhaps useful distinction to be made here between religion (as in tribalism, which leads to holy wars) and mysticism/spirituality/magic (as in a deep fascination in search of ultimate truth).
Another example would be Lisp. And yeah generally, fair point.
In Elm, though, it would cause some ambiguity, because function call syntax doesn't require parentheses. For example:
[foo bar baz]
Is this a List with three elements, or is it one element whose value is a function call on foo with the arguments bar and baz? Who knows. Sure, you could have implicit commas on line breaks, but that would be weird.
The line breaks don't change the tree structure; they are inserted purely for readability.
There are no implicit commas or anything.
Each subtree is also potentially a name-value pair. If you insert names before the opening brackets:
[x [foo] y [bar] z [baz]]
You have 3 pairs.
The names themselves are also pieces of text (whitespace is not a separator), so you can have names like:
[my x [foo] your y [bar] their z [baz]]
Still 3 pairs.
Now to turn trees like this into something useful, for example XML/SVG, you'd do a little bit of processing on them, e.g. by transforming the names.
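For instance (just an illustrative convention of mine, not a rule of the syntax), you could turn each name into an XML tag by trimming it and dash-joining its words:

  # Illustrative only: map already-parsed (name, value) pairs to XML elements.
  def to_xml(pairs):
      out = []
      for name, value in pairs:
          tag = "-".join(name.split())     # "my x" -> "my-x"
          body = value if isinstance(value, str) else to_xml(value)
          out.append(f"<{tag}>{body}</{tag}>")
      return "".join(out)

  print(to_xml([("my x", "foo"), ("your y", "bar"), ("their z", "baz")]))
  # <my-x>foo</my-x><your-y>bar</your-y><their-z>baz</their-z>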
Now I intend to write specifications that codify conventions like this into different formats based on this fundamental syntax of square brackets.
It can be useful for all kinds of things. Its advantage is extreme simplicity and flexibility.
BTW, for clarity I have to say that the format I used here: https://news.ycombinator.com/item?id=32995047 does a few more transformations -- it actually sometimes treats whitespace as a separator (e.g. in `svg width[391]` the space is a separator). That allows for extreme conciseness, but is not necessary and introduces complexity.
Ultimately, 0 and 1 is all we need. ;)
[0] https://jevko.github.io/writing/2023-07-23-revelation.html -- the pun in the article (I'm the author) fits perfectly into the OP ;D
[1] https://tromp.github.io/cl/Binary_lambda_calculus.html
[2] https://en.wikipedia.org/wiki/De_Bruijn_index