I'm a big awk fan, but I'm not sold on this. The awk program is not very readable. I think that's fine for a dense one-liner; I'm not sure it carries over to a 60-line script. I think for something like this I'd prefer a bash script, maybe with awk invoked somewhere; that would be much easier to understand at a glance.
Is there something in the awk script that makes it advantageous over a shell script?
Edit: I hadn't read the author's conclusion yet when I posted, he agrees
> I consider AWK amazing, but I think it should remain where it excels: for exploratory data analysis and for one-liner data extraction scripts
A while ago I wrote a program in awk to renumber TRS-80 Model 100 BASIC code. Then I rewrote it in bash (pure bash, no sed/grep/cut/tr/awk/bc/etc), and the two are practically identical. It surprised me just how similar they were in the end.
awk is like a hidden miracle of utility just sitting there unused on every machine since the dawn of time.
Normally, if you want something to be ultra-portable, you write it in sh or ksh (though by now bash would be OK; I mean, there is bash for Xenix), but to get the most out of ksh or bash, you have to use all the available features and tricks that are powerful and useful but NOT readable. 50% of the logic of a given line of code is not spelled out in the keywords but in arcane brace-expansion and word-splitting rules.
But every system that might have some version of bash or ksh or plain sh always has awk too, and even the oldest plain non-GNU awk is a real, "normal", more or less straightforward, explicit programming language compared to bash. Not all that much more powerful, but more readable and more straightforward to write. Things are done with functions that take parameters and do things to the parameters, not with special syntax that does magic transformations of variables which you then parlay into various uses.
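To make that concrete, here is a minimal sketch of the difference (the delimiter and data are made up): splitting a string in bash leans on IFS and word-splitting rules, while awk gives you an explicit split() function that takes parameters and returns a count:

    # bash needs IFS games or here-strings:  IFS=: read -r x y z <<< "$rec"
    # awk spells the same operation out as a plain function call:
    BEGIN {
        n = split("a:b:c", parts, ":")   # n = 3; parts[1..3] = "a", "b", "c"
        for (i = 1; i <= n; i++)
            print i, parts[i]
    }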
Everyone uses perl/python/ruby/php/whatever when the project goes beyond bash scope, but they all need to be installed, need to be a particular version, and almost always need some library of modules as well, and every python script breaks every other year or on every other new platform. But awk is already there, even on ancient obscure systems that absolutely cannot have the current version of python or ruby and all the gems.
I don't use it for current day-to-day stuff either; there are too many common things today that it has no knowledge of. I don't want to try to do https transactions or parse xml in awk. I'm just saying it's interesting, or somehow notable, how generically useful awk is, pretty much just like bash or python, installed everywhere already, and almost utterly unused.
Okay, so the first dev writes 60 lines of indecipherable code, runs some sample invocations, looks at the output, says it looks good. A few months later, someone - maybe the original dev, maybe some other sucker - notices that in some edge case the code misbehaves. Now what? (Obviously, any answer that involves "don't write code with bugs" or "write perfect tests" is a nonstarter)
If we're going with the "start from scratch if it ever proves inadequate" philosophy, then the person who notices the misbehavior looks at the original code, sees that it's written in some obscure language, is undecipherable, but also is only 60 lines long, and decides that it will probably be simpler to make a new (short) implementation in their own favorite language that correctly handles both the original use case and their new requirement. The key insight is that given how much easier it is to write fresh code than understand old stuff, they could very well be correct in that guess, and the end result is a single piece of small clean code, rather than a simple core with layers of patches glued on top.
In this particular case, we're talking about a "make" replacement, so testing the new implementation can be done by simply running "make all" for the project. If it passes, then the new implementation must be identical to the old one in all the ways that actually matter for the project at hand. In all likelihood, for a simple program like this, fixing one bug will also silently fix others because the new architecture is probably better than the old one.
I actually really like this approach, and have been thinking about it with regard to coding with an LLM: for a sufficiently simple program (and assuming no security concerns), once you trust your test suite, you should trust AI-generated code that passes it. And then if requirements change, you should be able to amend the test cases, rerun the AI until it passes all tests and linters, maybe give the code a quick glance, and be on with your life.
What do you find hard to read about it? If you know what make does, I think it is fairly easy to read, even for those who don't know awk at all but do know the Unix shell (to recognize ‘ls -t’) and C (both of which the audience for this book probably knew, given that the book is from 1988).
> I think for something like this I'd prefer a bash script
But would it be easier to read? I don't see why it would.
It would have been ksh, which was the bash of the day, as in, the more featureful sh-compatible sh-superset.
But a bash or ksh script would have been less readable than awk.
bash (or ksh88 or ksh93) is powerful and useful, but not readable if you're actually using the powerful, useful features.
In bash, a lot of functionality comes in the form of brace expansions and word splitting, basically abusing the command parser to get results for which there is no actual function. In awk, and in any other more normal programming language, those same features come in the form of an explicit function that does that thing.
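As a small illustration (the file name is invented): stripping a file extension in bash is done with parameter-expansion syntax, while awk uses an ordinary function call on an ordinary parameter:

    # bash:  noext=${file%.*}    (magic expansion syntax, nothing to look up by name)
    # awk:   an explicit function, explicit parameter, explicit return value
    function strip_ext(file) {
        sub(/\.[^.]*$/, "", file)   # remove the final ".ext", if present
        return file
    }
    BEGIN { print strip_ext("foo.tar.gz") }   # prints "foo.tar"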
> In bash, a lot of functionality comes in the form of brace expansions and word splitting, basically abusing the command parser to get results for which there is no actual function. In awk, and in any other more normal programming language, those same features come in the form of an explicit function that does that thing.
Right. That's one of the reasons why the man page for bash is so long. IIRC, going way back, even the page for plain sh was long, for the same reason.
> for exploratory data analysis and for one-liner data extraction scripts
I think both you and the author just don't like AWK, if that's the takeaway. What you're describing is literally 1% of the AWK language. You don't have to like it, and it's weird in many respects, but you're treating AWK like it's jq when it's actually closer to a Perl-Lite/Bash mix. An AWK focused on just those use cases would look very different.
I think it should be appreciated in context: it's a good way to teach both awk(1) and make(1) to someone new to UNIX. It also demonstrates how to use awk(1) for prototyping, which IMO is a good programming habit to "develop": it forces you to focus on the essential, and not to overthink unnecessarily.
> Is there something in the awk script that makes it advantageous over a shell script?
Pseudo multi-dimensional associative arrays for representing the dependency graph of make. This part:
    for (i = 2; i <= NF; i++)                # fields 2..NF are the prerequisites
        slist[nm, ++scnt[nm]] = $i           # store them keyed by (target, index)
The way awk supports them is hacky, and not really a multidimensional array, but it is still better than what you would have to do in bash, because of split() and some other language features.
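For anyone who hasn't seen the trick, here is a minimal sketch (target and object names invented) of how those pseudo-multidimensional subscripts work: the comma joins the subscripts into one flat string key using SUBSEP, and split() recovers the parts:

    BEGIN {
        slist["prog", ++scnt["prog"]] = "main.o"
        slist["prog", ++scnt["prog"]] = "util.o"
        for (key in slist) {
            split(key, parts, SUBSEP)   # parts[1] = target, parts[2] = index
            print parts[1], parts[2], slist[key]
        }
    }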
It would be much easier with any scripting language though, Perl for example.
It seems pretty readable to me, in particular the "update" function parses as JavaScript if you fix the implicit string concatenation (template literals or +) and replace the # comments with //. I'm actually surprised JavaScript is so similar to awk; it feels like a descendant language tbh.
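To illustrate the resemblance (a hypothetical snippet, not the actual function from the book): take an awk function that builds a rule line, change # to //, and change the implicit concatenation to +, and it reads as JavaScript almost token for token:

    function rule_line(target, deps, ndeps,    i, line) {
        line = target ":"               # JS: let line = target + ":"
        for (i = 1; i <= ndeps; i++)    # the for loop is identical in JS
            line = line " " deps[i]     # JS: line += " " + deps[i]
        print line                      # JS: console.log(line)
    }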
Bash would really be a bad idea for this kind of job if it means stitching together so many GNU utils.
I once had to rewrite a fairly big bash script into awk[1], and it made the program more readable; the total execution time came down from 12 minutes to less than 1 second.
I think maybe the original bash script was written badly (each util command spawns its own process and has to be piped to the others, whereas awk runs the whole job in a single process).
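A sketch of that difference (the log format and file name are made up): a typical pipeline forks one process per tool, whereas awk can hold the whole job in one process:

    # pipeline version, three processes:  cut -d' ' -f1 access.log | sort | uniq -c
    # awk version, one process and no sorting pass:
    { count[$1]++ }                            # tally the first field of each line
    END { for (k in count) print count[k], k }

(run as: awk -f tally.awk access.log)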