Tar.pl – A tar creator and extractor in ~130 lines of Prolog

marbu · on Jan 18, 2023

From the README: <quote>We're going to declaratively define a parser for the tar format, and once we're done, our code will not only extract .tar files, but create them as well.</quote>

I recall reading about this somewhere, but haven't seen such nice real world example before. Well done.

zelphirkalt · on Jan 18, 2023

I think if done right, this falls out of Prolog automatically, as you program in relations, rather than functions. Depending on which places you leave empty in a query, Prolog will work to fill in the empty places. One is probably the tar file and one is the folder being added to a tar archive, but none of the 2 is any more important than the other.

Avshalom · on Jan 18, 2023

There's a cool little bencode library that also works like that, all of 30 short lines too.

https://github.com/mndrix/bencode/blob/master/prolog/bencode...

https://en.wikipedia.org/wiki/Bencode

samuell · on Jan 19, 2023

Check out "Production Prolog" by Michael Hendricks on YouTube, which demonstrates the same thing, at a talk at StrangeLoop:

https://youtu.be/G_eYTctGZw8

lliamander · on Jan 18, 2023

The bidirectional nature of predicates is one of the cool aspects of logic programming.

It is also cool that is completely transparent whether a predicate returns a single value or multiple values. It's sort of like if every function were implicitly a generator.

rhelz · on Jan 18, 2023

Nicely done. Prolog is killer for database programming/querying and parsing/formatting. Since these are not exactly "niche" tasks, its kinda sad that prolog isn't more popular.

dragonwriter · on Jan 18, 2023

One reason it is not more popular is that it was, but then its Datalog subset, and supersets of Datalog that are different than Prolog, took a lot of that use from it. (Often with the Prolog syntax removed since it is frequently embedded in other languages, so it becomes a semantic rather than syntactic subset of Prolog.)

cmrdporcupine · on Jan 18, 2023

There does at least seem to be a recent uptick in popularity and interest in Datalog, which expresses a more constrained form of similar concepts specialized on data management tasks. Not as general purpose as Prolog, but more suitable for a database environment.

(Also here's a gratuitous plug for my employer's product in a similar vein: https://docs.relational.ai/rel/intro/overview)

optimalsolver · on Jan 18, 2023

Wouldn’t Datalog be a better alternative?

gamache · on Jan 18, 2023

On one hand, Prolog is more powerful than Datalog. But on the other hand, Prolog is more powerful than Datalog.

DonHopkins · on Jan 18, 2023

Just as C combines the power and performance of assembly language with the flexibility and ease-of-use of assembly language.

maweki · on Jan 18, 2023

In this case, it's even worse. Prolog is computationally complete. With everything that comes with rice's theorem. Every non-trivial property of a program being generally undecidable.

Datalog, on the other hand, isn't. It's terminating, like SQL. And many program properties are indeed decidable.

tpoacher · on Jan 18, 2023

Query: morepowerfullthan(Prolog, Datalog).

yes; yes; yes; yes; yes; yes; yes; yes; ...

telekid · on Jan 18, 2023

With great power comes great irresponsibility.

rhelz · on Jan 18, 2023

Datalog is great for database programming, and since it is a restricted subset of prolog, a datalog comes in every box of prolog :-)

Avshalom · on Jan 18, 2023

Sort of.

Because of it's restrictions it traditionally uses different search/resolution strategies than Prolog. And in the other direction the SLD resolution strategy Prolog traditionally uses can fail on syntactically valid Datalog (clause ordering issues).

That said, the big Prolog implementations all support multiple resolution strategies although I suspect there are a lot of various optimizations Datalog squeezes out of it's restrictions that let it handle things that might not be reasonable in a given Prolog.

rhelz · on Jan 18, 2023

Good point; in the olden days of prolog only the SLD search strategy came out of the box, and you had to roll your own if you wanted others. However, happily a lot of the tabled programming research from the XSB prolog group (and others) has osmosed out into the mainstream prolog implementations by now.

cmrdporcupine · on Jan 18, 2023

Aren't there significant differences in implementation between a Datalog system and Prolog running similar rules, though? Bottom-up vs top-down eval, is what I read...

rhelz · on Jan 18, 2023

Classically, yes, but most prolog implementations these days support a tabled search strategy as Datalog typically does.

ddejohn · on Jan 19, 2023

> great for database programming

Can you elaborate on what you mean by this? Like, custom database implementations? I wrote a little bit of Prolog in college and thought it was neat but I haven't looked at it since. I've seen Datalog mentioned recently in relation to biscuits, a technology I think is incredibly cool but that I don't fully grok.

anon291 · on Jan 18, 2023

Yes. Datalog is not Turing-complete; thus is much easier to reason about, and good query optimizers can be much more aggressive. Once you introduce cuts (as in prolog), you have to throw a lot of that out the window. But Datalog is a subset of prolog.

d33 · on Jan 18, 2023

Just curious, is .pl the standard extension for Prolog programs? Why would Perl and Prolog choose the same one, instead of one using e.g. .pg?

Tomte · on Jan 18, 2023

Probably because few people used both.

.pro is also common for Prolog code, if you care about the extension clash.

tmtvl · on Jan 18, 2023

.pro is also used by IDL and Qt. I imagine there are few two or three letter extensions that aren't overloaded.

DonHopkins · on Jan 18, 2023

Because .log was taken.

mathstuf · on Jan 18, 2023

Fun trivia: I got a Emacs user to switch to Vim because in one of our class projects using Prolog, Emacs kept trying to highlight it as Perl (though it might be valid Perl syntax now that I think about it…). When I showed him that Vim did that for an empty file, but if it detected "Prolog-y" content (IIRC, the comment marker or a `:-` did it), it defaulted to Prolog highlighting.

No idea if Emacs ever fixed it (this was 2009 or so).

waynecochran · on Jan 18, 2023

This is a trivial fix for an emacs user. M-x prolog-mode.

puffoflogic · on Jan 18, 2023

Instructions unclear, pinky finger fell off.

civopsec · on Jan 18, 2023

> instead of one using e.g. .pg?

Why not just `.prolog`? In this day and age. (Maybe there’s some reason like “then I can’t put these source files on FAT filesystems”, I don’t know.)

Java seems to do fine with it’s longer-than-three-letters `.java`.

Avshalom · on Jan 18, 2023

Most editors will understand .prolog, but on the other hand it's 15 years older than perl and it's not hard to teach editors to treat .pl as prolog.

anon291 · on Jan 18, 2023

.pl is used because prolog is really old and FAT used to enforce extension limits. Not relevant nowadays, but it's stuck around.

pjmlp · on Jan 19, 2023

FAT is 8 + 3, so it still had space for .pro as well.

infradig · on Jan 19, 2023

Prolog is older than Perl. Perl usurped the .pl extension. Some are trying to claim it back.

eesmith · on Jan 20, 2023

Yes, it looks like modern Prolog systems using ".pl". I checked Ciao, GNU Prolog (which recognizes ".pl" and ".pro"), SICStus Prolog (also ".pl" and .pro"), SWI-Prolog, and B-Prolog.

The documentation for SWI-Prolog 5.10 (2010) even says "Tradition calls for .pl, but conflicts with Perl force the use of another extension on systems where extensions have global meaning, such as MS-Windows. On such systems .pro is the common alternative." https://www.swi-prolog.org/download/stable/doc/SWI-Prolog-5.....

Thing is, I can't confirm that tradition predates Perl 4 - call it 1993.

Turbo Prolog used ".pro" in the late 1980s - https://archive.org/details/bitsavers_borlandturOwnersHandbo... .

So did Prolog-2, according to the 1990 book at https://archive.org/details/advancedlogicpro0000unse/page/80... .

VPI Prolog from 1991 has an '".hc" file name extension (which stands for Horn Clause, the logic upon which the PROLOG language is based).'.

I even found one Prolog system from that era using ".txt", at https://archive.org/details/programminginpro0000cloc/page/26... .

falsissime · on Jan 20, 2023

The 1978 DEC10 Prolog user guide mentions PROG.PL as an example of a Prolog file with an extension. See 3.2 in https://userweb.fct.unl.pt/~lmp/publications/online-papers/U...

Maybe the 1975 Prolog I manual has more to say...

eesmith · on Jan 21, 2023

Good work!

I think it's even more explicit on p25 where it says "A module name can have the form either file.ext or just file. In the latter case the extension 'PL' is implied."

davelee · on Jan 18, 2023

A less common extension is .pro.

aeneasmackenzie · on Jan 18, 2023

I like these data description languages but most of them just have support for sequential structures and don't support offsets within the file. These are very common in file formats created for use with C or a similar language, which is most of them. It's not hard to add but it destroys the elegance of the formalism.

TheRealPomax · on Jan 18, 2023

What's the point of a comment like this when Prolog supports file seeking just fine? That's like complaining about the lack of structs in procedural languages when the post you're commenting on is a C++ post...

aeneasmackenzie · on Jan 18, 2023

You can file seek in any language, but not in most file format DSLs (or this one, because it's local). The author mentions QuickBMS and thus is aware of this, but most academic format languages don't support it.

TheRealPomax · on Jan 20, 2023

Except we're talking about Prolog, so we're talking about SWI-Prolog, the insanely academic flavour of this programming language.

It's basically the flavour of Prolog that won the popularity contest exactly because of its academic sponsorship (by the University of Amsterdam).

kragen · on Jan 19, 2023

you'll probably be interested in daedalus from galois

luuuzeta · on Jan 18, 2023

I took a course in college with the first half in Haskell and the second half in Prolog. Programming in Prolog felt like learning to program all over again.

ddejohn · on Jan 19, 2023

I took a course called Principles of Programming Languages. Covered grammars, regular languages, state machines, and then had practical assignments in Haskell and Prolog. I had the exact same experience with Prolog. I loved it, but I haven't looked at it since college.

jen729w · on Jan 19, 2023

Prolog was responsible for me getting kicked out of uni! I just couldn't wrap my head round it, and I failed the exam resit.

In fairness I was a shocking student and deserved it.

derefr · on Jan 18, 2023

Sounds exotic at first glance, but how different is this from the Erlang implementation of tar parsing? I.e. how much of this is relying on Prolog-specific features, vs. how much is just succinct head-clause pattern matching?

superdisk · on Jan 18, 2023

The unique part is that you get encoding and decoding for free when just describing the spec. It is essentially just succinct head-clause matching though.

derefr · on Jan 18, 2023

An idiomatic Erlang/Elixir implementation of validation for a binary format would also give you decoding for free — see e.g. https://zohaib.me/binary-pattern-matching-in-elixir/. So I guess it's the encoding part that's unique to how Prolog works.

Which makes sense; my understanding is that in Prolog, insofar as a function encodingOf(A, B) is defined by using A and B on either side of a bind expression to equate B as the encoding of A, if encodingOf(A, knownEnc) would act as a parser for knownEnc, then encodingOf(knownDec, B) would act as an encoder for knownDec — essentially swapping the question "what would this parse to" given an encoded binary, for "what would parse to this" given a decoded term.

Erlang can't do exactly that — a given variable's "bound-ness" is determined during compilation, and head-clause variables can't be indeterminately bound — but if it could, then you would indeed get an encoder for the same format "for free", since binary-pattern matching in Erlang becomes a binary-building expression depending on whether the variables in the expression are bound or unbound.

(Now I'm curious how hard it would be to add indeterminately-bound compile-time function polymorphism to Erlang — i.e. generating bytecode for the powerset of bound-ness-es of the variables in the head clause, only where at least one generates valid bytecode, probably name-mangled with a bitfield suffix of said bound-ness-es; and then being able to bind function parameters from such functions as new output variables in expressions, where the call sites also generate a call to the equivalent mangled name. The usefulness of this would probably be pretty small, without the other aspects of Prolog, but you never know...)

Jtsummers · on Jan 18, 2023

GP talks about getting encoding and decoding together by implementing the spec once. In Erlang/Elixir you can only get one at a time because you can't do the equivalent of this:

  some_predicate(<concrete values>, Decoded). % decoder
  some_predicate(Encoded, <concrete values>). % encoder
  some_predicate(<concrete values>, <concrete values>). % validator

But in Prolog, that's what many predicates permit as a baseline. In Erlang, yes, a validator is, or can be, a parser/decoder. But it's not an encoder which still needs to be a separate function (or set of functions) to describe movement in the other direction.

dwohnitmok · on Jan 18, 2023

This is fantastic bite-sized example of exactly what makes Prolog and other logic languages so different to program in and will probably become my go-to example for introducing what's so special about Prolog to programmers with backgrounds in other programming languages who are looking for examples that involve "real-world" programming tasks.

lostmsu · on Jan 18, 2023

To go even further, how much is it different from C# or C++? There you'd just declare the appropriate structs and bitcast.

superdisk · on Jan 18, 2023

That works okay only for fixed-sized structs that don't require any transformation of their members-- otherwise you'll need to write a traditional parser.

lostmsu · on Jan 18, 2023

It is there in Prolog version too. Search for "FileSize" on the page.

superdisk · on Jan 18, 2023

What I mean is that if you need to parse a format that has variable-sized structures, you can't just memcpy them into a struct because structs have a fixed size. Additionally sometimes you need to parse a value into a more tractable representation, for example those octal ASCII strings in the tar format, which requires some additional processing that isn't possible with loading/serializing a struct.

lostmsu · on Jan 18, 2023

Which is also there in the Prolog code.

Filligree · on Jan 19, 2023

Sorry, isn't that undefined behaviour?

lostmsu · on Jan 19, 2023

Not in C# for sure. Not a C++ expert, but I believe reinterpret_cast behavior is defined too.

pjmlp · on Jan 19, 2023

You can have it as part of unsafe code, hence why the bytecode verifier won't validate Assemblies with unsafe code.

https://learn.microsoft.com/en-us/dotnet/csharp/language-ref...

lostmsu · on Jan 19, 2023

I believe verifier is fine with unsafe code.

Also not sure what does it have to do with defined/undefined behavior. Even if it is not verifiable, the behavior is still defined.

aebtebeten · on Jan 18, 2023

DCGs are an awesome way to specify a format but a poor way to implement it?

nmadden · on Jan 18, 2023

That's been my experience, and chimes with what the author writes at the end about performance. I remember trying to stream a few hundred MB of CSV data with Prolog a few years ago and giving up after not being able to either write or find a library that could cope with even that kind of size (which I wouldn't consider particularly big). (In the end, I managed to bulk import the data into SQLite, which took a few seconds, then used SQL instead).

I should give Scryer Prolog a go to see if it is any better, and I'm by no means an expert Prolog programmer so YMMV.

richard_todd · on Jan 18, 2023

It’s all mind-bending fun until the interpreter starts blowing one of its internal stacks. Then it’s hours of tedium tracing through queries looking for ways to contain the computation.

mark_l_watson · on Jan 18, 2023

I agree and the code itself is awesome also.

Around 30 years ago I was really into Prolog but recently I have tried getting back into it and I am having a rough go at it. I still have the old Art of Prolog book, so that might be a good reentry point.

mncharity · on Jan 18, 2023

> recently I have tried getting back into it and I am having a rough go at it

Fwiw, doing problem setup and result processing in another language can avoid much pain. So prolog only where it excels, and a mainstream ecosystem for the rest. I've liked batch mode "write a prolog file, run it, read result (eg json, or wrapper-language code to eval)", but was recently thinking of trying PySwip[1].

[1] https://github.com/yuce/pyswip

riesenheim · on Jan 19, 2023

I had the same troubles until I encountered Markus Triska's modern perspective on revitalizing Prolog: https://www.metalevel.at/prolog. His video 'Preparing Prolog' may be the best place to (re)start: https://youtu.be/kGQNeeRp4sM

philjohn · on Jan 18, 2023

I still maintain that learning Prolog 25 years ago as part of my degree (AI + Comp Sci) made me a better programmer.

umvi · on Jan 18, 2023

> It's unfortunately absolutely dog slow

Seems like a common chicken-egg problem of niche languages. Performance doesn't improve until lots of people are using it, but lots of people won't use it until performance improves.

tmtvl · on Jan 18, 2023

Eh, people will happily use slow languages if they are marketed the right way. Look at Python, it has a very large following, yet it's so slow that the dog's descendants will have evolved to adapt to a marine environment by the time Python finishes running Hello World.

kartic · on Jan 18, 2023

Makes me want to learn Prolog.

lynx23 · on Jan 19, 2023

Thanks for this link. Prolog is sort of the last language on my list of things to look at. However, I never really made it past the classic towers of hanoi example, always longing for a small self-contained example that achieves a real world goal. And here it is! Very neat showcase.

moss2 · on Jan 18, 2023

Absolutely marvelous.

dmd · on Jan 18, 2023

This makes me miss using Ab Initio, which in its language "DML" lets you describe (even extremely complex) record formats declaratively.

pmarreck · on Jan 18, 2023

Now do it in Bash. ;)

olddustytrail · on Jan 18, 2023

Sure

    #!/bin/bash
    tar "$@"

Takes the same arguments as the original!

pmarreck · on Jan 18, 2023

without using `tar`

jonas21 · on Jan 18, 2023

I don't get it. tar is such a simple format you could do this in like 50 lines of C. Most of that would be declaring a struct. And unlike this implementation, it would be performant.

mattkrause · on Jan 18, 2023

The cool thing is that it uses the same code as the reader and writer.In essence, you write one function, taking two arguments:

    tar(dstTarFile, srcFiles)

Next you provide one of the two arguments and query Prolog to find the other: you pass in a tar file as the first argument, and it tells you what `srcFiles` would need to be to create that archive, effectively extracting it. Alternately, you pass in `srcFiles` and it tells you what the corresponding dstTarFile should be, creating an archive.

Another way to think about it is that the Prolog code is not directly telling you how to read/write tar files. Instead, imagine it as a cost function that tells you how well this tar file corresponds to the uncompressed representation of those files, which you're minimizing.

If you were to write it in C, the code would imperative: do this, then that, then this other thing. Consequently, the reader and writer code are totally different. For example, to process the file size, you would write functions two functions:

    char* uIntToOctalASCII(unsigned int sz)`

and

     octalASCIIToUInt(char* szstr).

Semantically, these are inverses[0]:

    octalASCIIToUInt(uIntToOctalASCII(x)) == x

but the implementations have very little in common. At best, you can use one to test the other.

[0] And leaks memory, but whatever....

xdfgh1112 · on Jan 19, 2023

Similarly if you write a sudoku solver in prolog, you automatically get a solver AND a problem generator, both for unfilled and partially filled grids.

dagss · on Jan 18, 2023

First time I see Prolog...I thougt that seeing the compiler invert a function automatically is quite impressing. I mean just being able to take only the forwards declaration and figure out the inverse is interesting in itself.

Avshalom · on Jan 19, 2023

You can pack bytes into a structure in prolog succintly enough too

   open('test.tar',read,TarFile)
   HSizes = [100,8,8,8,12,12,8,1,100,6,2,32,32,8,8,155,12],
   maplist(length,HBytes,HSizes),
   append(HBytes,~read_bytes(TarFile,512)),

HBytes is now a list of lists of your bytes, do with them as you please, we can even name them if we want

  HFields = [ filename, filemode, ownerid, groupid, filesize
            , modificationtime, checksum, typeflag, linkedfilename
            , ownerusername, ownergroupname, devicemajornumber
            , deviceminornumber, filenameprefix, padding],
  keys_and_values(HFields,HBytes,Header),

you could push it off into a predicate and it'd even more or less work relationally and it'd probably be faster than the DCG...

but it wouldn't be interesting.

And like they said https://news.ycombinator.com/item?id=34431450

anticensor · on Jan 19, 2023

Initially misread it as tax.pl.

jll29 · on Jan 18, 2023

Writing tar in PROLOG sounds "cool", a neat hack, but it is as practical as writing the Linux kernel in COBOL. Possible, perhaps, but not the best idea.

It's hard enough to write things in PROLOG that _are_ ideally suited for the language - like natural language parsers (they used to be LISP or PROLOG, but these days they are all in C++ or FORTRAN wrapped in Python so no-one gets hurt; why? Because of efficiency and numeric libraries required - modern parsers for human languages implement probabilistic or neural models, no longer symbol manipulation and pattern matching).

EDIT: I'm not a PROLOG hater - in fact, enjoyed reading the article (and I will read sequels on implementing "mount" in RPG3, "htop" in COMAL etc.) and my friends include some PROLOG lovers. I also find it important and stimulating to learn about another paradigm, and logical programming is an important one for many reasons. This rant is more about using it on the specific task of implementing "tar", for which it is a poor fit, as the article's PS shows.

graymatters · on Jan 19, 2023

Rubbish. It’s attitude like that which keeps less common languages from taking off. Hidden mindset-language bias stemming from using the hammer you’re familiar with for every nail.

Moreover, Prolog in particular requires a complete mindset change in order to naturally write in it. Once you reach that level - anything is possible and even not that hard to write in Prolog.

anon291 · on Jan 18, 2023

Binary file formats, like text formats, are simply languages, and declarative logic languages like Prolog, are excellent for parsing, as you said. Thus, it stands to reason that Prolog would be a very good fit for parsing data. That doesn't change just because the data is not ASCII-encoded.

phyzome · on Jan 18, 2023

This version is also insecure, as it does not prevent file path traversal attacks.