> The other encodings for variables would certainly increase parsing complexity, so self-interpreters for these BLC variants would be much longer than the original.
Indeed, this is the reason why I find those alternatives not too interesting. In most practical programs, the variable index frequencies are reasonably well approximated by an exponential distribution, for which the very simple unary encoding is optimal.
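
To make the optimality claim concrete, here is a small sketch under an assumed model (not measured from real programs): index i >= 1 occurs with probability 2^-i, a geometric distribution with ratio 1/2. Under that assumption the expected unary code length equals the entropy, so no prefix code can do better. The names `unary_length` and `geometric` are purely illustrative.

```python
import math

def unary_length(i: int) -> int:
    """Bits in the unary code 1^(i-1) 0 for index i >= 1.
    (BLC writes a variable as 1^i 0; its first bit only tags the token
    as a variable, as opposed to 00 for lambda and 01 for application.)"""
    return i

def geometric(i: int, p: float = 0.5) -> float:
    """Assumed probability of index i: (1 - p)^(i - 1) * p."""
    return (1 - p) ** (i - 1) * p

N = 60  # truncation point; the tail beyond it is negligible
expected_bits = sum(geometric(i) * unary_length(i) for i in range(1, N + 1))
entropy_bits = -sum(geometric(i) * math.log2(geometric(i)) for i in range(1, N + 1))

print(expected_bits, entropy_bits)  # both come out at about 2 bits
```

For other ratios the two numbers drift apart, which is why the claim is only "reasonably well approximated" rather than exact.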
> In the end, as we relax the restrictions on the encoding of the input, it seems that we can decrease the length of the self-interpreter almost arbitrarily -- down to λm.m(λx.x)(λx.x) [Mogensen] (as pointed out by @sargstuff in this thread) or even λm.m(λx.x) [Brown and Palsberg]
Those are of a very different nature: mine has to tokenize and parse from a binary stream, whereas theirs is already handed the whole term as a higher-order abstract syntax tree.
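
For a feel of what "tokenize and parse from a binary stream" involves, here is a minimal Python sketch of a BLC reader (00 = abstraction, 01 = application, 1^i 0 = variable with de Bruijn index i). It is only an illustration of the parsing work, not the self-interpreter itself; the tuple representation and the name `parse_blc` are assumptions made for this example.

```python
def parse_blc(bits: str, pos: int = 0):
    """Parse one BLC term starting at bits[pos].
    Returns (term, next_pos); terms are nested tuples."""
    if pos >= len(bits):
        raise ValueError("ran out of bits")
    if bits[pos] == "1":
        # Variable: count the run of 1s, then consume the closing 0.
        index = 0
        while pos < len(bits) and bits[pos] == "1":
            index += 1
            pos += 1
        if pos >= len(bits):
            raise ValueError("ran out of bits inside a variable")
        return ("var", index), pos + 1
    if pos + 1 >= len(bits):
        raise ValueError("ran out of bits after a leading 0")
    if bits[pos + 1] == "0":
        # 00: abstraction, followed by its body.
        body, pos = parse_blc(bits, pos + 2)
        return ("lam", body), pos
    # 01: application, followed by function and argument.
    fun, pos = parse_blc(bits, pos + 2)
    arg, pos = parse_blc(bits, pos)
    return ("app", fun, arg), pos

# The identity function, written 0010 in BLC.
print(parse_blc("0010"))
```

A higher-order abstract syntax encoding skips all of this, since the interpreter receives the term already structured.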
> While obsessing about this and letting the mind wander I noticed the nice direct correspondence between the quaternary encoding of LAST and the DNA/RNA nucleobases
Yep; I didn't fail to notice that link either :-)
> Now this is a pretty cool coincidence that can provide some individuals with peculiar interests with even more peculiar entertainment, such as going through all possible permutations of AGCT, looking for the longest valid LC program embedded in random mosquito DNA.
The only way to not be valid is to run out of DNA or to not be closed. If you find a long repetition of a single nucleotide and make that L, then the latter is unlikely to happen soon...
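
The two failure modes can be checked mechanically. The sketch below assumes a hypothetical token grammar that is not taken from the LAST specification: `L` takes one subterm and adds a binder, `A` takes two subterms, `S` takes one subterm and drops the innermost binder from scope, and `T` refers to the innermost binder. With that grammar every token sequence is syntactically fine, so a prefix fails only by running out of input or by a `T` that has no binder left. The function name and the nucleotide mapping are likewise made up for illustration.

```python
def closed_prefix_length(dna: str, mapping: dict):
    """Return the length of the closed term at the start of dna,
    or None if the input runs out or the term is not closed."""
    pending = [0]  # binder depth of each subterm still to be read
    pos = 0
    while pending:
        if pos >= len(dna):
            return None  # ran out of DNA
        depth = pending.pop()
        token = mapping[dna[pos]]
        pos += 1
        if token == "L":
            pending.append(depth + 1)       # body sees one more binder
        elif token == "A":
            pending.extend([depth, depth])  # function and argument
        elif token == "S":
            pending.append(depth - 1)       # skip the innermost binder
        elif depth < 1:                     # token == "T"
            return None  # not closed
    return pos

# One of the 24 possible assignments of nucleotides to tokens (arbitrary).
mapping = {"A": "L", "G": "A", "C": "S", "T": "T"}
print(closed_prefix_length("AAAAGTCTGGG", mapping))  # 8
```

The example also shows the effect mentioned above: a leading run of the nucleotide mapped to `L` builds up binder depth, so later `T` and `S` tokens rarely leave the term unclosed.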
> Thank you for coherently writing down your thoughts for anybody to build upon what you have figured out. It's fun, it's inspiring, and it works!
Thanks for all the effort in creating and explaining LAST. I found it intellectually stimulating!