From the README: <quote>We're going to declaratively define a parser for the tar format, and once we're done, our code will not only extract .tar files, but create them as well.</quote>
I recall reading about this somewhere, but haven't seen such nice real world example before. Well done.
I think if done right, this falls out of Prolog automatically, as you program in relations, rather than functions. Depending on which places you leave empty in a query, Prolog will work to fill in the empty places. One is probably the tar file and one is the folder being added to a tar archive, but none of the 2 is any more important than the other.
The bidirectional nature of predicates is one of the cool aspects of logic programming.
It is also cool that is completely transparent whether a predicate returns a single value or multiple values. It's sort of like if every function were implicitly a generator.
Nicely done. Prolog is killer for database programming/querying and parsing/formatting. Since these are not exactly "niche" tasks, its kinda sad that prolog isn't more popular.
One reason it is not more popular is that it was, but then its Datalog subset, and supersets of Datalog that are different than Prolog, took a lot of that use from it. (Often with the Prolog syntax removed since it is frequently embedded in other languages, so it becomes a semantic rather than syntactic subset of Prolog.)
There does at least seem to be a recent uptick in popularity and interest in Datalog, which expresses a more constrained form of similar concepts specialized on data management tasks. Not as general purpose as Prolog, but more suitable for a database environment.
In this case, it's even worse. Prolog is computationally complete. With everything that comes with rice's theorem. Every non-trivial property of a program being generally undecidable.
Datalog, on the other hand, isn't. It's terminating, like SQL. And many program properties are indeed decidable.
Because of it's restrictions it traditionally uses different search/resolution strategies than Prolog. And in the other direction the SLD resolution strategy Prolog traditionally uses can fail on syntactically valid Datalog (clause ordering issues).
That said, the big Prolog implementations all support multiple resolution strategies although I suspect there are a lot of various optimizations Datalog squeezes out of it's restrictions that let it handle things that might not be reasonable in a given Prolog.
Good point; in the olden days of prolog only the SLD search strategy came out of the box, and you had to roll your own if you wanted others. However, happily a lot of the tabled programming research from the XSB prolog group (and others) has osmosed out into the mainstream prolog implementations by now.
Aren't there significant differences in implementation between a Datalog system and Prolog running similar rules, though? Bottom-up vs top-down eval, is what I read...
Can you elaborate on what you mean by this? Like, custom database implementations? I wrote a little bit of Prolog in college and thought it was neat but I haven't looked at it since. I've seen Datalog mentioned recently in relation to biscuits, a technology I think is incredibly cool but that I don't fully grok.
Yes. Datalog is not Turing-complete; thus is much easier to reason about, and good query optimizers can be much more aggressive. Once you introduce cuts (as in prolog), you have to throw a lot of that out the window. But Datalog is a subset of prolog.
Fun trivia: I got a Emacs user to switch to Vim because in one of our class projects using Prolog, Emacs kept trying to highlight it as Perl (though it might be valid Perl syntax now that I think about it…). When I showed him that Vim did that for an empty file, but if it detected "Prolog-y" content (IIRC, the comment marker or a `:-` did it), it defaulted to Prolog highlighting.
No idea if Emacs ever fixed it (this was 2009 or so).
Yes, it looks like modern Prolog systems using ".pl". I checked Ciao, GNU Prolog (which recognizes ".pl" and ".pro"), SICStus Prolog (also ".pl" and .pro"), SWI-Prolog, and B-Prolog.
The documentation for SWI-Prolog 5.10 (2010) even says "Tradition calls for .pl, but conflicts with Perl force the use of another extension on systems where extensions have global meaning, such as MS-Windows. On such systems .pro is the common alternative." https://www.swi-prolog.org/download/stable/doc/SWI-Prolog-5.....
Thing is, I can't confirm that tradition predates Perl 4 - call it 1993.
I think it's even more explicit on p25 where it says "A module name can have the form either file.ext or just file. In the latter case the extension 'PL' is implied."
I like these data description languages but most of them just have support for sequential structures and don't support offsets within the file. These are very common in file formats created for use with C or a similar language, which is most of them. It's not hard to add but it destroys the elegance of the formalism.
What's the point of a comment like this when Prolog supports file seeking just fine? That's like complaining about the lack of structs in procedural languages when the post you're commenting on is a C++ post...
You can file seek in any language, but not in most file format DSLs (or this one, because it's local). The author mentions QuickBMS and thus is aware of this, but most academic format languages don't support it.
I took a course in college with the first half in Haskell and the second half in Prolog. Programming in Prolog felt like learning to program all over again.
I took a course called Principles of Programming Languages. Covered grammars, regular languages, state machines, and then had practical assignments in Haskell and Prolog. I had the exact same experience with Prolog. I loved it, but I haven't looked at it since college.
Sounds exotic at first glance, but how different is this from the Erlang implementation of tar parsing? I.e. how much of this is relying on Prolog-specific features, vs. how much is just succinct head-clause pattern matching?
The unique part is that you get encoding and decoding for free when just describing the spec. It is essentially just succinct head-clause matching though.
An idiomatic Erlang/Elixir implementation of validation for a binary format would also give you decoding for free — see e.g. https://zohaib.me/binary-pattern-matching-in-elixir/. So I guess it's the encoding part that's unique to how Prolog works.
Which makes sense; my understanding is that in Prolog, insofar as a function encodingOf(A, B) is defined by using A and B on either side of a bind expression to equate B as the encoding of A, if encodingOf(A, knownEnc) would act as a parser for knownEnc, then encodingOf(knownDec, B) would act as an encoder for knownDec — essentially swapping the question "what would this parse to" given an encoded binary, for "what would parse to this" given a decoded term.
Erlang can't do exactly that — a given variable's "bound-ness" is determined during compilation, and head-clause variables can't be indeterminately bound — but if it could, then you would indeed get an encoder for the same format "for free", since binary-pattern matching in Erlang becomes a binary-building expression depending on whether the variables in the expression are bound or unbound.
(Now I'm curious how hard it would be to add indeterminately-bound compile-time function polymorphism to Erlang — i.e. generating bytecode for the powerset of bound-ness-es of the variables in the head clause, only where at least one generates valid bytecode, probably name-mangled with a bitfield suffix of said bound-ness-es; and then being able to bind function parameters from such functions as new output variables in expressions, where the call sites also generate a call to the equivalent mangled name. The usefulness of this would probably be pretty small, without the other aspects of Prolog, but you never know...)
GP talks about getting encoding and decoding together by implementing the spec once. In Erlang/Elixir you can only get one at a time because you can't do the equivalent of this:
But in Prolog, that's what many predicates permit as a baseline. In Erlang, yes, a validator is, or can be, a parser/decoder. But it's not an encoder which still needs to be a separate function (or set of functions) to describe movement in the other direction.
This is fantastic bite-sized example of exactly what makes Prolog and other logic languages so different to program in and will probably become my go-to example for introducing what's so special about Prolog to programmers with backgrounds in other programming languages who are looking for examples that involve "real-world" programming tasks.
That works okay only for fixed-sized structs that don't require any transformation of their members-- otherwise you'll need to write a traditional parser.
What I mean is that if you need to parse a format that has variable-sized structures, you can't just memcpy them into a struct because structs have a fixed size. Additionally sometimes you need to parse a value into a more tractable representation, for example those octal ASCII strings in the tar format, which requires some additional processing that isn't possible with loading/serializing a struct.
That's been my experience, and chimes with what the author writes at the end about performance. I remember trying to stream a few hundred MB of CSV data with Prolog a few years ago and giving up after not being able to either write or find a library that could cope with even that kind of size (which I wouldn't consider particularly big). (In the end, I managed to bulk import the data into SQLite, which took a few seconds, then used SQL instead).
I should give Scryer Prolog a go to see if it is any better, and I'm by no means an expert Prolog programmer so YMMV.
It’s all mind-bending fun until the interpreter starts blowing one of its internal stacks. Then it’s hours of tedium tracing through queries looking for ways to contain the computation.
Around 30 years ago I was really into Prolog but recently I have tried getting back into it and I am having a rough go at it. I still have the old Art of Prolog book, so that might be a good reentry point.
> recently I have tried getting back into it and I am having a rough go at it
Fwiw, doing problem setup and result processing in another language can avoid much pain. So prolog only where it excels, and a mainstream ecosystem for the rest. I've liked batch mode "write a prolog file, run it, read result (eg json, or wrapper-language code to eval)", but was recently thinking of trying PySwip[1].
Seems like a common chicken-egg problem of niche languages. Performance doesn't improve until lots of people are using it, but lots of people won't use it until performance improves.
Eh, people will happily use slow languages if they are marketed the right way. Look at Python, it has a very large following, yet it's so slow that the dog's descendants will have evolved to adapt to a marine environment by the time Python finishes running Hello World.
Thanks for this link. Prolog is sort of the last language on my list of things to look at. However, I never really made it past the classic towers of hanoi example, always longing for a small self-contained example that achieves a real world goal. And here it is! Very neat showcase.
I don't get it. tar is such a simple format you could do this in like 50 lines of C. Most of that would be declaring a struct. And unlike this implementation, it would be performant.
The cool thing is that it uses the same code as the reader and writer.In essence, you write one function, taking two arguments:
tar(dstTarFile, srcFiles)
Next you provide one of the two arguments and query Prolog to find the other: you pass in a tar file as the first argument, and it tells you what `srcFiles` would need to be to create that archive, effectively extracting it. Alternately, you pass in `srcFiles` and it tells you what the corresponding dstTarFile should be, creating an archive.
Another way to think about it is that the Prolog code is not directly telling you how to read/write tar files. Instead, imagine it as a cost function that tells you how well this tar file corresponds to the uncompressed representation of those files, which you're minimizing.
If you were to write it in C, the code would imperative: do this, then that, then this other thing. Consequently, the reader and writer code are totally different. For example, to process the file size, you would write functions two functions:
char* uIntToOctalASCII(unsigned int sz)`
and
octalASCIIToUInt(char* szstr).
Semantically, these are inverses[0]:
octalASCIIToUInt(uIntToOctalASCII(x)) == x
but the implementations have very little in common. At best, you can use one to test the other.
Similarly if you write a sudoku solver in prolog, you automatically get a solver AND a problem generator, both for unfilled and partially filled grids.
First time I see Prolog...I thougt that seeing the compiler invert a function automatically is quite impressing. I mean just being able to take only the forwards declaration and figure out the inverse is interesting in itself.
Writing tar in PROLOG sounds "cool", a neat hack, but it is as practical as writing the Linux kernel in COBOL. Possible, perhaps, but not the best idea.
It's hard enough to write things in PROLOG that _are_ ideally suited for the language - like natural language parsers (they used to be LISP or PROLOG, but these days they are all in C++ or FORTRAN wrapped in Python so no-one gets hurt; why? Because of efficiency and numeric libraries required - modern parsers for human languages implement probabilistic or neural models, no longer symbol manipulation and pattern matching).
EDIT: I'm not a PROLOG hater - in fact, enjoyed reading the article (and I will read sequels on implementing "mount" in RPG3, "htop" in COMAL etc.) and my friends include some PROLOG lovers. I also find it important and stimulating to learn about another paradigm, and logical programming is an important one for many reasons. This rant is more about using it on the specific task of implementing "tar", for which it is a poor fit, as the article's PS shows.
Rubbish. It’s attitude like that which keeps less common languages from taking off.
Hidden mindset-language bias stemming from using the hammer you’re familiar with for every nail.
Moreover, Prolog in particular requires a complete mindset change in order to naturally write in it. Once you reach that level - anything is possible and even not that hard to write in Prolog.
Binary file formats, like text formats, are simply languages, and declarative logic languages like Prolog, are excellent for parsing, as you said. Thus, it stands to reason that Prolog would be a very good fit for parsing data. That doesn't change just because the data is not ASCII-encoded.
I recall reading about this somewhere, but haven't seen such nice real world example before. Well done.