I think this is close, but sort of missing the point: it’s possible to extend Common Lisp to take code in the form of Python source code (clpython), JavaScript source code (cljs) or many other textual syntaxes: originally, the iconic s-expression syntax of lisps was intended to be a sort of IR dump for m-expressions.
What makes CL homoiconic has nothing to do with the textual syntax, but rather the existence of a function READ that takes _some_ textual syntax and produces the code to be evaluated in the form of the language’s ordinary datastructures: EVAL and friends consume a representation of the code that is neither just a blob of bytes nor some exotic AST objects, it’s lists, hash-tables, strings, vectors, and all the types of things programmers manipulate every day.
The implication of this is that intercepting and modifying the code between READ and EVAL doesn’t really require any special knowledge: you have to know the semantics of your language, and how to manipulate its basic datastructures, but you don’t need to understand any special “metaobjects” for representing code.
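To make that concrete, here's the whole pipeline in a few lines of standard CL: just READ, ordinary list surgery, and EVAL.

    ;; READ turns text into an ordinary list...
    (let ((form (read-from-string "(+ 1 2)")))
      ;; ...which we can rewrite with everyday list operations...
      (setf (first form) '*)
      ;; ...before handing it to EVAL: (* 1 2) => 2
      (eval form))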
Dylan did not die because of any technical deficiency. Dylan died because Apple management decided to kill the project and so it never achieved critical mass.
Programming languages never live or die on their technical merits. JavaScript is a hot mess, and yet it is one of the most popular languages in the world.
Also:
> macros become much harder once you move away from leading parenthesis
That is not really true. You do lose the backquote syntax, but that is only useful for writing the simplest, most trivial macros. That turns out to be the kind most people write, and so backquote is a big deal, but if you want to write more complex macros (e.g. cl-who) then backquote becomes less useful and the surface syntax becomes less relevant.
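For a feel of it, even a trivial macro can be written with plain list construction, no backquote anywhere:

    ;; my-when built by consing up the expansion directly;
    ;; backquote would only be shorthand for exactly this
    (defmacro my-when (test &rest body)
      (list 'if test (cons 'progn body)))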
> The point is that macros become much harder once you move away from leading parenthesis.
This is... an extraordinary claim. Prolog uses foo(hello, world) syntax (note the comma) and has no problem with macros or any other manipulation of such terms. Note that there are longer, better reasoned, and higher voted answers to that StackOverflow question than the accepted answer which opts for "macros would become more difficult" FUD.
If it parses to primitive data structures (and actually produces a structure) and allows you to intercept the code between parsing and evaluation/compilation, this is exactly what I’m saying.
I think your best insight here is the importance of the "lid off" intermediate representation; plenty of languages have an eval function of some sort, but it does both READ and EVAL in your terms with no intermediate access to the parsed representation.
The issue I’ve had with things like that is that I have to learn a whole new API for manipulating syntax: while there are definite advantages to this, representing code as everyday types that very generic functions can operate on has the advantage of making metaprogramming look like normal code. e.g. if I’m doing try...finally... stuff a lot, I can write a macro that transforms it, for example:
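(A sketch of the kind of thing I mean; the macro name and body are made up:)

    ;; wraps its body in the unwind-protect boilerplate I'd
    ;; otherwise repeat by hand -- CL's try...finally analogue
    (defmacro with-noisy-cleanup ((label) &body body)
      `(unwind-protect
           (progn ,@body)
         (format t "cleaning up ~a~%" ,label)))

    ;; (with-noisy-cleanup ("db") (do-stuff)) expands to the full
    ;; unwind-protect form before evaluation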
The backtick/tilde notation isn’t macro-specific: it’s a generic way to template readable datastructures that I can use anywhere, so when I see it in a macro, it isn’t some strange API to learn, it’s just a handy way to build up the datatypes I use all the time.
Nope. The homoiconicity of Common Lisp and its related ancestor Lisps is based on the fact that the source code you compile is not textual. Yes, there is CL:READ. Notice however that CL:COMPILE and CL:EVAL do not take text. They take a data structure that is also the data structure manipulated by macros, which you can easily operate upon, and which can be easily serialized in the form of S-Expressions.
Wouldn't a parser library that returns a data structure that is the one passed to the equivalent of EVAL and also the one acted on by macros fit this definition? And then also the ability to easily serialize it back into textual syntax? If so, this doesn't seem to require S-Expressions or lisp, but rather, like the parent comment said, just a good macro and parser library design. (It is true that languages haven't generally seemed to prioritize this feature, I'm just saying that I don't see any reason it must be fundamentally unique to lisps.)
Think of it more in terms of how the language is specified. Common Lisp is specified in terms of data structure, not text, and you can depend on that particular data structure and manipulate it etc.
A 3rd-party library that, to be portable, ultimately needs to serialize to text does not support homoiconicity.
Is it really specified in terms of data structure? Are there not rules in the language specification regarding how s-expressions are parsed? Could I create a, for instance, C-like syntax that parses to and can be serialized from this data structure specification, and call that a valid Common Lisp? If so, neat! I do think more languages should be less defined by their concrete syntax.
Yes - the standard does cover the "Common Lisp Reader", but it's essentially a self-contained chapter - every special form, standard macro etc. is defined in terms of the data structures. So what you'd have is, at most, an extension - to be compatible with Common Lisp, CL:READ would still need to read S-Expressions with the standard readtable, and CL:WRITE would still need to write S-Expressions, but nothing stops you from adding an extra reader that uses a different syntax.
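For a toy example of such an extra reader, using nothing but standard readtable machinery:

    ;; make [a b c] read as (list a b c) -- standard CL reader macros
    (set-macro-character #\[
      (lambda (stream char)
        (declare (ignore char))
        (cons 'list (read-delimited-list #\] stream t))))
    (set-macro-character #\] (get-macro-character #\)))

    (read-from-string "[1 2 3]")  ; => (LIST 1 2 3)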
Nope. It’s not about “parsing”, it’s about representation.
Languages such as Python and C draw a clear distinction between literal values on one hand and flow-control statements and operators on the other. Numbers, strings, arrays, structs are first-class data. Commands, conditionals, math operators, etc. are not; you cannot instantiate them, you cannot manipulate them.
What homoiconic languages do is get rid of that (artificial) distinction.
Lisp takes one approach, which is to describe commands using an existing data structure (list). This overloading means a Lisp program is context-sensitive: evaluate it one way, and you get a nested data structure; evaluate it another, you get behaviors expressed. The former representation, of course, is what Lisp macros manipulate, transforming one set of commands into another.
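For instance, in Common Lisp a conditional, left unevaluated, is just a list:

    ;; the "statement" is ordinary data you can inspect:
    (first '(if (> x 0) :pos :neg))   ; => IF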
Programming in Algol-descended languages, we tend to think algorithmically: a sequence of instructions to be performed, one after the other, in order of appearance. Whereas Lisp-like languages tend to encourage more compositional thinking: composing existing behaviors to form new behaviors; in Lisp’s case, by literally composing lists.
Another (novel?) approach to homoiconicity is to make commands themselves a first-class datatype within the language. A programming language does not need swathes of Python/C-style operators and statements to be fully featured; only commands are actually required.
I did this in my kiwi language: a command is written natively as `foo (arg1, arg2)`, which is represented under the hood as a value of type Command, which is itself composed of a Name, a List of zero or more arguments, and a Scope (lexical binding). You can create a command, you can store it and pass it around, and you can evaluate it by retrieving it from storage within a command evaluation (“Run”) context:
R> store value (foo, show value (“Hello, ”{$input}“!”))
R>
R> input (“Bob”)
# “Bob”
R>
R> {$foo}
Hello, Bob!
Curly braces here indicate tags, which kiwi uses instead of variables to retrieve values from storage. (Tags are first-class values too, literally values describing a substitution to be performed when evaluated.)
..
When it comes to homoiconicity, Lisp actually “cheats” a bit. Because it eagerly (“dumbly”) evaluates argument lists, some commands such as conditionals and lambdas end up being implemented as special forms. They might look the same as every other command but their non-standard behaviors are custom-wired into the runtime. (TBH, Lisp is not that good a Lisp.)
Kiwi, like John Shutt’s Kernel, eliminates the need for special forms entirely by one additional change: decoupling command evaluation from argument evaluation. Commands capture their argument lists unevaluated, thunked with their original scope, leaving each argument to be evaluated by the receiving handler as/when/only if necessary. Thus `AND`/`OR`, `if…else…`, `repeat…`, and other “short-circuiting” operators and statements in Python and C are, in kiwi, just ordinary commands.
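The flavor of it can be approximated in CL, with the caveat that kiwi and Kernel capture the thunks automatically, whereas here the caller wraps each argument by hand (lazy-and is a made-up name):

    ;; an AND that receives thunks, so it can short-circuit as an
    ;; ordinary function rather than a special form or macro
    (defun lazy-and (&rest thunks)
      (loop for thunk in thunks
            always (funcall thunk)))

    (lazy-and (lambda () (numberp 1))
              (lambda () (> 2 1)))   ; second thunk runs only if the first is true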
What’s striking is how much non-essential complexity these two fundamental design choices eliminate from the language’s semantics, as well as from the subsequent implementation. kiwi has just two built-in behaviors: tag substitution and command evaluation. The core language implementation is tiny; maybe 3000 LOC for six standard data types, environment, and evaluator. All other behaviors are provided by external handler libraries: even “basics” like math, flow control, storing values, and defining handlers of your own. Had I tried to build a Python-like language, I’d still be writing it 10 years on.
There are other advantages too. K&R spends chapters discussing C’s various operators and flow-control statements; and that’s even before it gets to the stdlibs. I once did a book on a Python-like language; hundreds of pages just to cover the built-in behaviors: murder for me, and probably not much better for readers.
In kiwi, the core documentation covering the built-in data types and how to use them, is less than three dozen pages. You can read it all in half an hour. Command handlers are documented separately, each as its own standardized “manpage” (currently auto-generated in CLI and HTML formats), complete with automated indexing and categorization, TOC and search engine. You can look up any language feature if/when/as you need it, either statically or in an interactive shell. Far quicker than spelunking the Python/C docs. A lot nicer than Bash.
Oh, and because all behaviors are library-defined, kiwi can be used as a data-only language à la JSON just by running a kiwi interpreter without any libraries loaded. Contrast that with JavaScript’s notorious `eval(jsonString)`. kiwi wasn’t created with this use-case in mind either; the capability just shook out of the design as a nice free bonus. We ended up using it as our preferred data interchange format for external data sources.
Honestly, I didn’t even plumb half the capabilities the language has. (Meta-programming, GUI form auto-generation, IPC-distributable job descriptions…)
..
Mind, kiwi’s a highly specialized DSL, and its pure command syntax makes for some awkward-to-read code when it comes to tasks such as math. For instance, having to write `input (2), + (2)` rather than the much more familiar `2 + 2`, or even `(+ 2 2)`. Alas, it’s also proprietary, which is why I can’t link to it directly; I use it here because it’s the homoiconic language I’m most familiar with, and because it demonstrates that even a relative dumbass like me can easily implement a sophisticated working language just by eliminating all the syntactic and semantic complexity that other languages put in for no better reason than “that’s how other languages do it”.
More recently, I’ve been working on a general-purpose language that keeps the same underlying “everything is a command” homoiconicity while also allowing commands to be “skinned” with library-defined operator syntax to aid readability. (i.e. Algebraic syntax is the original DSL!) It’s very much a work in progress and may or may not achieve its design goals, but you can get some idea of how it looks here:
Partly inspired by Dylan, a Lisp designed to be skinnable with an extensible Pascal-like syntax, and also worth a look for those less familiar with non-Algol languages:
> it’s lists, hash-tables, strings, vectors, and all the types of things programmers manipulate every day.
Except the lisps with good error messages don't really use those things without tagging on the file and line number where they came from. You end up working with "syntax objects" or something instead.
TXR Lisp keeps a weak hash table which associates objects in the syntax with file/line information, without changing their representation in any way.
There is no need for ugly, cumbersome syntax objects that destroy Lisp.
Whether or not the recording of source loc info is enabled is controlled by a special variable. It is disabled by default, because you probably don't want that overhead if you're calling the reader for reams of data. Functions like compile-file and load enable it locally.
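The core of the mechanism is small enough to sketch (an approximation using SBCL's weak-key hash-table extension, not TXR's actual code; note-loc and form-loc are made-up names):

    ;; forms map weakly to source locations; their representation is untouched
    (defvar *source-locs*
      (make-hash-table :test 'eq :weakness :key))  ; SBCL extension

    (defun note-loc (form file line)        ; called by the reader
      (setf (gethash form *source-locs*) (cons file line)))

    (defun form-loc (form)                  ; used by error reporters
      (gethash form *source-locs*))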
To take advantage of the info for error reporting in a user defined macro, you can simply do this:
    (defmacro mymacro (:form f arg1 arg2 ...)  ;; get the form using :form
      (when (somehow-no-good arg1)             ;; something wrong with arguments
        (compile-error f "first argument ~s is no good!" arg1))
      ...)
You get an error which mentions mymacro, and the location.
(foo.lisp:10) mymacro: first argument 42 is no good!
This is interesting and clever - thank you for sharing it. However, it seems like the hash table would work on either the address or the shape of the code, and in either case you could build a pathological case which would break it. It's probably a really nice solution for the common/realistic case though.
Racket does this. I personally have no issues with the error messages in Common Lisp implementations, which generally implement this sort of thing by generating special source forms that are just lists wrapping other parts of the program.
Sort of, but I think the third bullet point is irrelevant: the correspondence of the visual syntax to the data structure representation isn’t really essential to homoiconicity. The alternative syntax in http://readable.sourceforge.io is still homoiconic, despite being a somewhat different syntax for the same program.
EDIT: If anything, I’d define homoiconicity by the ability to write the same program (as EVAL sees it) in a variety of different textual formats: e.g. ‘foo is equivalent to (quote foo) and both programs are indistinguishable to EVAL
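Concretely, in CL:

    ;; both spellings READ to the same structure, so EVAL can't tell them apart
    (equal (read-from-string "'foo")
           (read-from-string "(quote foo)"))   ; => T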