
Where I run into trouble with awk is gawk incompatibilities with the implementation on Mac. The gawk manual really sucks at telling you what exactly is an extension to the language, and I haven't been able to find a good source -- you just have to either guess and check, or cross-check against other implementations' manuals (like BSD's). Otherwise it's an amazing tool...



I'd suggest just installing `gawk` from Homebrew and using it instead of having to guess which sort of crippled `awk` you're lucky to be using.

The same thing stands for GNU coreutils (`brew install coreutils`); here's macOS `cut` vs GNU `cut` as a quick example.

    ~ $ cut --help
    cut: illegal option -- -
    usage: cut -b list [-n] [file ...]
           cut -c list [file ...]
           cut -f list [-s] [-d delim] [file ...]    

    ~ $ gcut --help
    Usage: gcut OPTION... [FILE]...
    Print selected parts of lines from each FILE to standard output.    

    With no FILE, or when FILE is -, read standard input.    

    Mandatory arguments to long options are mandatory for short options too.
      -b, --bytes=LIST        select only these bytes
      -c, --characters=LIST   select only these characters
      -d, --delimiter=DELIM   use DELIM instead of TAB for field delimiter
      -f, --fields=LIST       select only these fields...
      -n                      (ignored)
          --complement        complement the set of selected bytes, characters or fields
      -s, --only-delimited    do not print lines not containing delimiters
          --output-delimiter=STRING  use STRING as the output delimiter
                                the default is to use the input delimiter
      -z, --zero-terminated    line delimiter is NUL, not newline
          --help     display this help and exit
          --version  output version information and exit


Looking at the Mac OS X source code https://opensource.apple.com/source/awk/awk-24/src/ its version of awk is the “one true awk” maintained by Brian Kernighan. For some reason Apple have deleted the README but you can find a copy at https://svnweb.freebsd.org/base/head/contrib/one-true-awk/


You should be looking at the POSIX specification, and assuming anything GNU awk documents on top of that is an extension: http://pubs.opengroup.org/onlinepubs/009695399/utilities/awk...


Honestly that page isn't great at showing up in search results, but in any case -- POSIX can be too restrictive. I don't specifically recall an example for awk, but when every implementation supports something then it's a de-facto standard. If you follow POSIX then you become over-restricted compared to any shell you'll actually encounter. It'd be really nice if there was a page that showed the actual common features between various implementations of POSIX tools, not just the POSIX official features...


> that page isn't great at showing up in search results

I have twenty years of experience in getting that page to show up in search results. :)

Currently, a good way to get to it is these search terms:

  posix issue 7

that actually takes us to the newer version; the above is issue 6.


> I have twenty years of experience in getting that page to show up in search results. :)

Haha! I love that you acknowledge this because usually people just ignore all the experience they have in getting to the right page and make it look like you're dumb for not being able to find it. Thanks for the pointer! :-)


Before POSIX merged with the Single Unix Specification, I used to search using "single unix spec version 2" type queries; and that one brings us back to 1997:

http://pubs.opengroup.org/onlinepubs/7908799/

Ha, that didn't use frames yet! Totally forgot about that.


It's the first search result for "awk posix" for me and the fourth for "awk standard".

But yes, sometimes POSIX or the C standard or whatever is too restrictive, but it's still a good starting point for figuring out how to write portable code.

You'll know anything it doesn't cover is implementation-specific, and then either decide it's not worth it to pursue it, or if it is figure out whether the implementations you're targeting support the feature.


I thought the gawk book/documentation [1] did a good job of mentioning differences between various implementations, do you have an example?

You might find this [2] helpful (oops, seems like it got deleted, see [3] - thanks @bionoid)

[1] https://www.gnu.org/software/gawk/manual/gawk.html

[2] https://www.reddit.com/r/awk/comments/4omosp/differences_bet...

[3] https://archive.is/btGky


> do you have an example?

Sure, try this:

  echo 1 2 | awk '{ print gensub(/1/, "3", "g", $1); }'

The logical thing for them to do would be to mention in bold and/or big and/or red font under gensub's documentation that it's an extension (e.g. try nawk), whereas looking through it I don't see any mention at all: https://www.gnu.org/software/gawk/manual/html_node/String-Fu...
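
For what it's worth, this particular call has a portable spelling. A minimal sketch, assuming only POSIX `gsub` (the names `s` and `result` are mine, not from any manual): copy the field, then substitute in the copy so `$1` itself is left alone.

```shell
# Portable stand-in for gensub(/1/, "3", "g", $1):
# gsub() edits its target in place, so work on a copy of the field.
result=$(echo 1 2 | awk '{ s = $1; gsub(/1/, "3", s); print s }')
echo "$result"   # prints 3
```

This only covers the plain global-replace case. It is not a substitute for gensub's backreference support.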

If I may rant about this for a bit, GNU software manuals are generally rather awful (though they're neither alone in this nor is it impossible to find exceptions). They frequently make absolutely zero effort to display important information more prominently and unimportant information less so (if you're even lucky enough that they tell you the important information in the first place). Like if passing --food will accidentally blow up a nuke in your hometown, you can expect that if they documented it at all, they just casually buried it in the middle of some random paragraph. Their operating assumption seems to be that if you can't be bothered to spend the next 4 hours reading a novel before writing your one-liner then it's just obviously your fault for sucking so much.


While I agree it should be more obvious, it does say in the opening section:

> Those functions that are specific to gawk are marked with a pound sign (‘#’). They are not available in compatibility mode (see section Command-Line Options)
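
That compatibility mode can double as a rough check. A sketch, assuming gawk is installed and that `--traditional` is the flag for the mode the quote refers to (the `status` variable is just for illustration): run a call in compatibility mode and see whether gawk rejects it.

```shell
# Needs gawk on PATH; does nothing otherwise.
if command -v gawk >/dev/null 2>&1; then
    # In compatibility mode, gawk-only functions such as gensub() should be unavailable.
    if gawk --traditional 'BEGIN { print gensub(/1/, "3", "g", "1") }' >/dev/null 2>&1; then
        status=accepted
    else
        status=rejected
    fi
    echo "gensub in compatibility mode: $status"
fi
```

As far as I can tell, the failure may only surface when the call is actually reached, so run a real script against representative input. A clean run is also no proof that other awks will accept the script.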


Oh dear lord. I've looked at that page probably twenty times in the past year and still hadn't noticed the note about that pound sign. Thanks for pointing it out. Man, it's infuriating.



I was there as well, and eventually decided to just always `brew install gawk` and alias `awk` to `gawk`, because more often than not I want to rely on gawk extensions (gawk has includes, for instance!).

To be fair though, every time I have read the manual for a gawk function it clearly says "this is a gawk extension" for non-standard features (case in point, delete[1], which is now POSIX, although Mac's awk is too old to have that implemented).

[1] https://www.gnu.org/software/gawk/manual/html_node/Delete.ht...


brew install gawk


Yes, but I don't control every computer my code runs on...
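
When you can't pick the awk, one option is to probe for the feature at run time and fall back. A sketch under that assumption (`has_gensub` is a hypothetical helper name, and the fallback only handles the simple global-replace case from the gensub example in this thread):

```shell
# Hypothetical helper: true if the given awk (default: awk) has a working gensub().
has_gensub() {
    "${1:-awk}" 'BEGIN { if (gensub(/a/, "b", "g", "a") == "b") exit 0; exit 1 }' \
        >/dev/null 2>&1
}

if has_gensub awk; then
    out=$(echo 1 2 | awk '{ print gensub(/1/, "3", "g", $1) }')
else
    # Plain gsub() on a copy of the field does the same job here.
    out=$(echo 1 2 | awk '{ s = $1; gsub(/1/, "3", s); print s }')
fi
echo "$out"
```

In this toy case the fallback alone would do. The probe is more useful when the extension does something the portable path can only approximate.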



