public final class Cons extends LispObject
{
public LispObject car;
public LispObject cdr;
}
I would have thought it would provide that abstraction to the programmer but use another representation internally. Chasing pointers like this on modern hardware is surely crazy?
We're using object-oriented languages like Java on modern hardware, so pointer chasing is quite natural. Typical Java programs will do that all the time. In many ways the Java runtime is relatively similar to a Lisp runtime - it's just not tailored on the low-level to Lisp - for example cons cells for linked lists are not provided on the virtual machine level.
Since lists in typical Lisp implementations are made with cons cells, some form of two element data structure is needed. Using Java objects has the drawback, that this is not the most efficient representation.
Good Lisp implementations typically make sure that the cons cell has very little overhead:
* cons cells are just objects made of two memory words without type tag. The tag is encoded in pointers to these cons cells.
* cons cells can directly encode some simple data like fixnum integers, characters, ...
* cons cells might be allocated in special memory regions and GCs might take special care
OTOH, more efficient low-level representations like CDR-coding, which allows some lists to be represented in vector like ways is mostly not used in current implementations, since it was not see worth the effort anymore - it was thought worth on older systems with much small main memories than we have today. Smaller main memories than the caches on a modern CPU.
True, especially considering that parametric polymorphism is done via "everything is a pointer" way (although I've heard something about introducing "value types" in Java in the future).
Also Common Lisp's pairs are heterogeneous, polymorphic by nature, so I doubt there could be any other solution but pointers.
For performant computations you would need unboxed arrays in Java, just as you would use SIMPLE-ARRAY in Common Lisp (which should be implemented efficiently in ABCL as well).
In addition to cons cells being forced in some scenarios, this is Common Lisp, not Lisp 1.5: Programmers who want performance for linear collections are expected to use data types like vectors, which are primitive and not implemented in terms of cons cells at all.
> Vectors are Common Lisp's basic integer-indexed collection, and they come in two flavors. Fixed-size vectors are a lot like arrays in a language such as Java: a thin veneer over a chunk of contiguous memory that holds the vector's elements.2 Resizable vectors, on the other hand, are more like arrays in Perl or Ruby, lists in Python, or the ArrayList class in Java: they abstract the actual storage, allowing the vector to grow and shrink as elements are added and removed.
Because CONS cells aren't used only for lists, but are fundamental data type used as building blocks for few other things.
As such, they are required to hold any value or reference possible, and aren't exactly amenable to different format.
There is one technique which had somewhat large use, CDR-coding, based in the idea that you can tag a cell in a way that tells the system that the next field isn't a reference to a CONS cell, but the CAR of one, and thus linearize lists in memory. However, doing it fast requires a custom cpu so you can resolve this fast in transparent way, as well as it turned out to not give much benefit. However, widening gap between cpu and memory might make it attractive again...
A lot of things in Lisp force the implementation of lists as cons cells. It’s probably possible to do something different, but no one has, and the result won’t be a Common Lisp.
(Consider that the detail of improper lists is enshrined in the language.)
> A lot of things in Lisp force the implementation of lists as cons cells
This is one of the reasons that Clojure wasn’t written to be backwards compatible with CL. Rich chose to implement all of his core functions on the first/rest abstraction instead, as explained in his Clojure for Lisp Programmers talk.
Common Lisp does have proper arrays as well, which can be used if the situation calls for it.
As it turns out, arrays aren't actually used that much (except for strings of course) in CL code. Most lists are just a few elements long, and regular lists are perfect for that.
The structure of a list is still exposed, and that decision is baked into the essence of Lisp. If you have a list, you cannot substitute a skip list or an array. If you have a skip list, it is (I think) not a language-level sequence and so things like rest and nth won’t work on it.
Arrays, struts, all other things are possible, but the things that make Lisp Lisp mean that you may get a list where repeatedly calling cdr may result in something that is not listp (cons or nil). And that’s weird. It enables some things that are wonderful, but I think the Lisps are practically unique in conflating pairs and list elements.
Common Lisp has a standard abstract data structure called 'sequence' and generic operations for it. Sequences are either vectors or lists. Vectors are bit vectors, strings, general vectors, ... Sequence operations are then supported for all subtypes of sequences.
There are some problems with this design. The main motivations for it were:
* basic operations in Lisp are type specific, one wanted to have some generic operations and a generic sequence abstraction over lists and vectors
* the low-level data structures like conses, arrays, tables are not object-oriented and generic
* generic operations in a dynamically typed language are slower, since there is always a runtime dispatch necessary
* to remove runtime dispatch one can use type declarations (which are optional in Common Lisp) or a JIT compiler (which is not used in most implementations, they do AOT compilation)
* the language at the type of this design had no agreed OOP extension - CLOS came some years later -> sequences are non-extensible and don't support adding new sequence types
The design of this was later improved in languages like Dylan: see the Collections of Dylan
I use arrays much more than lists, for the same reasons arrays are useful in any language. Plus they're dynamically resizable and you can map across them in Lisp just like lists. For many purposes arrays are a much better choice than lists in CL.
Last I heard (years ago), some Lisp implementations had tried using arrays for non-branching cons chains, but found that performance wasn't actually better than actual conses in general.
> In computer science CDR coding is a compressed data representation for Lisp linked lists. It was developed and patented by the MIT Artificial Intelligence Laboratory, and implemented in computer hardware in a number of Lisp machines derived from the MIT CADR.
> CDR coding is in fact a fairly general idea; whenever a data object A ends in a reference to another data structure B, we can instead place the structure B itself there, overlapping and running off the end of A. By doing this we free the space required by the reference, which can add up if done many times, and also improve locality of reference, enhancing performance on modern machines. The transformation is especially effective for the cons-based lists it was created for; we free about half of the space for each node we perform this transformation on.