
I'm a big awk fan, but I'm not sold on this. The awk program is not very readable. I think that's fine for a dense one-liner, but I'm not sure that carries over to a 60-line script. For something like this I'd prefer a bash script, maybe with awk invoked somewhere; that would be much easier to understand at a glance.

Is there something in the awk script that makes it advantageous over a shell script?

Edit: I hadn't read the author's conclusion yet when I posted; he agrees:

  I consider AWK amazing, but I think it should remain where it excels: for exploratory data analysis and for one-liner data extraction scripts



A while ago I wrote a program in awk to renumber TRS-80 Model 100 BASIC code, then rewrote it in bash (pure bash, no sed/grep/cut/tr/awk/bc/etc), and the two are practically identical. It surprised me just how similar they ended up being.

awk is like a hidden miracle of utility just sitting there unused on every machine since the dawn of time.

Normally, if you want something to be ultra portable, you write it in sh or ksh (though by now bash would be OK; there is bash for Xenix, after all), but to get the most out of ksh or bash, you have to use all the available features and tricks that are powerful and useful but NOT readable. Half the logic of a given line of code is not spelled out in the keywords but in arcane brace expansion and word splitting rules.

But every system that might have some version of bash or ksh or plain sh always has awk too, and even the oldest plain not-gnu awk is a real, "normal", more or less straightforward, explicit programming language compared to bash. Not all that much more powerful, but more readable and more straightforward to write. Things are done with functions that take parameters and do things to those parameters, not with special syntax that does magic transformations of variables which you then parlay into various uses.
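
For example (my own toy sketch, not from the article): stripping the extension off each input word is an ordinary function call in awk, where bash would reach for parameter expansion like ${f%.*}:

  # print each input field with its trailing extension removed;
  # sub() is an explicit function call, not special expansion syntax
  {
      for (i = 1; i <= NF; i++) {
          f = $i
          sub(/\.[^.]*$/, "", f)
          print f
      }
  }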

Everyone uses perl/python/ruby/php/whatever when the project goes beyond bash scope, but they all need to be installed and need to be a particular version, and almost always need some library of modules as well, and every python script breaks every other year or on every other new platform. But awk is already there, even on ancient obscure systems that absolutely can not have the current version of python or ruby and all the gems.

I don't use it for current day-to-day stuff either; there are too many common things today that it has no knowledge of. I don't want to try to do HTTPS transactions or parse XML in awk. I'm just saying it's interesting, or somehow notable, how generically useful awk is, much like bash or python: installed everywhere already, and almost utterly unused.


Well I think generally a 60 line program fits in that spot of "write once, read never, start from scratch if it ever turns out to be inadequate"

... also known as the APL Zone


I'm not dead set against it, but if there were any mistakes or bugs, I don't know how you'd find and fix them with that approach


By checking the correctness of the outputs, which you need to do anyway?


Okay, so the first dev writes 60 lines of indecipherable code, runs some sample invocations, looks at the output, says it looks good. A few months later, someone - maybe the original dev, maybe some other sucker - notices that in some edge case the code misbehaves. Now what? (Obviously, any answer that involves "don't write code with bugs" or "write perfect tests" is a nonstarter)


If we're going with the "start from scratch if it ever proves inadequate" philosophy, then the person who notices the misbehavior looks at the original code, sees that it's written in some obscure language, is undecipherable, but also is only 60 lines long, and decides that it will probably be simpler to make a new (short) implementation in their own favorite language that correctly handles both the original use case and their new requirement. The key insight is that given how much easier it is to write fresh code than understand old stuff, they could very well be correct in that guess, and the end result is a single piece of small clean code, rather than a simple core with layers of patches glued on top.

In this particular case, we're talking about a "make" replacement, so testing the new implementation can be done by simply running "make all" for the project. If it passes, then the new implementation must be identical to the old one in all the ways that actually matter for the project at hand. In all likelihood, for a simple program like this, fixing one bug will also silently fix others because the new architecture is probably better than the old one.


I actually really like this approach, and have been thinking about this in regards to coding with an LLM - for a sufficiently simple program (and assuming no security concerns), once you trust your test suite, you should trust AI generated code that passes it. And then if requirements change, you should be able to amend the test cases, rerun the AI until it passes all tests and linters, maybe give the code a quick glance, and be on with your life.


The point is the “and fix them”


Not only is fixing more difficult, but so is looking for likely weaknesses (and thus the inputs and outputs to focus on for testing).


> The awk program is not very readable

What do you find hard to read about it? If you know what make does, I think it is fairly easy to read, even for those who don't know awk at all but do know the Unix shell (to recognize 'ls -t') and C (both of which the audience for this book probably knew, given that the book is from 1988)
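
For anyone who hasn't seen it, the 'ls -t' part refers to shelling out to decide which file is newer; roughly this kind of thing (a hedged sketch, not the book's exact code):

  # rough sketch of the `ls -t` idea: ask ls to sort two files by
  # modification time and read back whichever comes first (the newest)
  function newer(a, b,    cmd, first) {
      cmd = "ls -t " a " " b
      cmd | getline first
      close(cmd)
      return first == a
  }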

> I think for something like this I'd prefer a bash script

But would it be easier to read? I don't see why it would.


Bash also would have been an unlikely choice for a book published in 1988, considering it wasn't released until 1989 (per Wikipedia).


It would have been ksh, which was the bash of the day, as in, the more featureful sh-compatible sh-superset.

But a bash or ksh script would have been less readable than awk.

bash (or ksh88 or ksh93) is powerful and useful but not readable if you're actually using the powerful, useful features.

In bash, a lot of functionality comes in the form of brace expansions and word splitting, basically abusing the command parser to get results there is no actual function for. In awk and any other more normal programming language, those same features come in the form of an explicit function to do that thing.
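
A tiny illustration (my own sketch, not from the book): splitting a PATH-like string is an explicit split() call in awk, where bash leans on IFS and word splitting:

  # split a colon-separated string with an explicit function call,
  # rather than relying on IFS-driven word splitting
  BEGIN {
      n = split("/usr/bin:/bin:/usr/local/bin", parts, ":")
      for (i = 1; i <= n; i++)
          print parts[i]
  }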


>In bash, a lot of functionality comes in the form of brace expansions and word splitting, basically abusing the command parser to get results there is no actual function for. In awk and any other more normal programming language, those same features come in the form of an explicit function to do that thing.

Right. That's one of the reasons why the man page for bash is so long. IIRC, going way back, even the page for plain sh was long, for the same reason.


Indeed. But at least it acknowledges it, with the iconic "It's too big and too slow."


Interesting, didn't know. Been a while since I read the page.


> It would have been ksh

No, it wouldn’t have been ksh or any other shell, nor C or Perl, nor anything else but awk, in a book titled “The AWK Programming Language”.


Someone didn't read the thread (or lost the plot), but that didn't stop them from making a nonsensical remark about it.


> for exploratory data analysis and for one-liner data extraction scripts

I think both you and the author just don't like AWK if that's the takeaway. What you're describing is literally 1% of the AWK language. You don't have to like it (it's weird in many respects), but you're treating AWK like it's jq when it's actually closer to a Perl-Lite/Bash mix. An AWK focused on just those use cases would look very different.

One of my favorite resources on AWK: https://www.grymoire.com/Unix/Awk.html


I think it should be appreciated in context: it's a good way to teach both awk(1) and make(1) to someone new to UNIX. It also demonstrates how to use awk(1) for prototyping, which IMO is a good programming habit to "develop": it forces you to focus on the essentials and not to overthink unnecessarily.


> Is there something in the awk script that makes it advantageous over a shell script?

Pseudo multi-dimensional associative arrays for representing the dependency graph of make. This part:

  for (i = 2; i <= NF; i++)
      slist[nm, ++scnt[nm]] = $i

The way awk supports them is hacky and not really a multidimensional array, but it is still better than what you would have to do in bash, because of split() and some other language features.

It would be much easier with any scripting language though, Perl for example.
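
For anyone curious how the hack works (a sketch of standard awk behavior, not code from the article): the comma in a subscript just joins the keys with the SUBSEP character, and split() on SUBSEP gets them back:

  # awk's "multidimensional" subscript a[x, y] is really a[x SUBSEP y];
  # split() on SUBSEP recovers the key parts when iterating
  BEGIN {
      slist["prog", 1] = "main.o"
      slist["prog", 2] = "util.o"
      for (k in slist) {
          split(k, parts, SUBSEP)
          printf "%s -> dep %s: %s\n", parts[1], parts[2], slist[k]
      }
  }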


It seems pretty readable to me; in particular, the "update" function parses as JavaScript if you fix the implicit string concatenation (use template literals or +) and replace the # comments with //. I'm actually surprised JavaScript is so similar to awk; it feels like a descendant language, tbh.
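
To make that concrete (a toy function of my own, not the book's update): the only things you would really have to touch are the juxtaposition-style string concatenation and the comment character:

  # toy awk function: strings concatenate by juxtaposition; in JavaScript
  # you would use + or a template literal, and // instead of # comments
  function cmd(target, prereqs) {
      return "cc -o " target " " prereqs
  }
  BEGIN { print cmd("prog", "main.o util.o") }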


Bash would really be a bad idea if it meant stitching together so many GNU utils for this kind of job.

I once had to rewrite a fairly large bash script into awk[1]; it made the program more readable, and total execution time came down from 12 minutes to less than 1 second.

I think maybe the original bash script was just written badly (each util command spawns its own process and has to be piped to the others, whereas awk runs in a single process).

[1] - https://github.com/berry-thawson/diff2html/blob/master/diff2...


Writing this make program in bash would involve even more difficult-to-read hacks, as bash also does not support multidimensional arrays.


I would find it easier to read with more sensible, non-abbreviated variable names.



