Hacker News

There was a Twitter thread ages ago where someone had written a collection of (PHP?) utilities, and the twitterer posted a laughing slap-down asking why anyone would write a utility when this one-liner and that one-liner would do.

There was a lot of pushback, and this article is a good example of why.

If I wanted to remove duplicate lines from a file, I would almost certainly not use awk.

I have never spent the time to get good enough with the whole new and different language of awk, and am unlikely to need to (my large-scale file processing needs seem small, and when I do have them it's almost always in the context of other processing chains, so a normal language like Python would be the natural choice).

I could whip up something like this in Python in less time than it would take to google the answer, read up on why the syntax works that way, and verify on a few test files that I have not mistyped anything.

Basically, using awk takes me out of my comfort zone: for a one-off task it loses me time, and for a production-like repeat task I am going to reach for a slew of other solutions.

I mean, the title of this page loses the exclamation mark, and it took me two goes to spot it.




For many years my AWK knowledge was limited to basic '{print $1}' style usage. I never bothered to learn more. I tended to use perl when I needed a custom text-processing operation. Later, as perl became less a part of my working life, I began using ruby instead - they are pretty similar in spirit.

One day, nearly 20 years after it was published, I picked up a used copy of The AWK Programming Language by Aho, Kernighan, and Weinberger. Yes, they are credited in that order on the cover... I suspect intentionally. I only read the first N chapters, but it was enough. I used AWK many times within the following month, and I continue to use AWK on a daily basis. When the task is complicated, I will still use ruby, but often enough AWK is easier.

The point: you think "Why would I learn X when I can use Y?", but you won't really know the answer until you learn X. If I had never learned perl, python, ruby, AWK, shell script, vi macros, then I would probably be editing files by hand (!) like I sometimes catch developers actually doing (!!!). For a person who doesn't know these tools, that might actually be the path of least resistance. Investing some time here and there to learn new tools pays off in the future in ways that are unpredictable.


The basic imperative Python version is much easier to remember and read though, even for not-that-experienced Python programmers. I would expect laypeople to be able to more-or-less figure out what it is supposed to do.

  import sys

  filename = sys.argv[1]  # path to the input file
  seen = set()
  with open(filename, "r") as file:
    for line in file:
      if line not in seen:
        print(line, end="")  # line already ends with "\n"
        seen.add(line)

Often (at least in my experience) this kind of operation is either (a) part of some larger automated data processing pipeline for which it’s really nice to have version control, tests, ... or (b) part of some interactive data exploration by a programmer sitting at a repl somewhere, not just a one-off action we want to apply to one file from the command line.

In those contexts, the Python (or Ruby or Clojure or whatever general-purpose programming language) version is easy to type out more-or-less bug-free from memory, debug when it fails, slot into the rest of the project, modify as part of a team with varied experience, etc. etc.
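To make the "tests, version control" point concrete, here is a minimal sketch of the same loop wrapped as a function with a pytest-style test. The names unique_lines and test_unique_lines are hypothetical, and io.StringIO stands in for a real file so the test needs nothing on disk.

```python
import io
from typing import Iterable, Iterator


def unique_lines(lines: Iterable[str]) -> Iterator[str]:
    """Yield each line the first time it appears, preserving input order."""
    seen = set()
    for line in lines:
        if line not in seen:
            seen.add(line)
            yield line


def test_unique_lines() -> None:
    # io.StringIO behaves like an open text file: iterating yields lines with "\n".
    data = io.StringIO("b\na\nb\nc\na\n")
    assert list(unique_lines(data)) == ["b\n", "a\n", "c\n"]


if __name__ == "__main__":
    test_unique_lines()
    print("ok")
```

Because unique_lines takes any iterable of strings, the same function works on an open file, sys.stdin, or a list in a REPL session.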


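One advantage is that

  seen.add(line)

can be changed to

  seen.add(hash(line))

which can be significantly more memory efficient for files with long lines. The membership test has to change to match (if hash(line) not in seen:), and there is a small risk that two distinct lines share a hash, in which case the second one is wrongly dropped.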
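For the hash(line) idea to work, the membership test has to use the hash too, not just the add. A minimal sketch, with one substitution of my own: instead of Python's built-in hash(), it stores a 16-byte BLAKE2 digest from the standard hashlib module, which keeps every entry the same size and makes accidental collisions far less likely. The function name unique_lines_hashed is hypothetical.

```python
import hashlib
from typing import Iterable, Iterator


def unique_lines_hashed(lines: Iterable[str]) -> Iterator[str]:
    """Deduplicate lines while storing only a 16-byte digest per distinct line.

    Memory per entry no longer grows with line length. Trade-off: two different
    lines with the same digest would be treated as duplicates, although with a
    128-bit digest that is extremely unlikely.
    """
    seen = set()
    for line in lines:
        key = hashlib.blake2b(line.encode("utf-8"), digest_size=16).digest()
        # The membership test and the add must both use the same key.
        if key not in seen:
            seen.add(key)
            yield line
```

The original lines are still yielded unchanged; only the bookkeeping in seen is hashed.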


Or perhaps better, if needs change, the seen = set() object can be swapped out for any alternative object seen = foo that provides foo.__contains__ and foo.add methods.

This could involve saving previously seen lines in a radix tree, adding multiple layers of caching, saving infrequently seen lines to disk or over the network, etc. as appropriate for the use case.
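As a rough illustration of that swap, here is a minimal sketch of the "save previously seen lines to disk" option using the standard-library sqlite3 module. SqliteSeen and dedupe are hypothetical names. The dedupe loop is the same as in the earlier snippet; only the object passed in as seen changes, and a plain set() still works.

```python
import sqlite3
from typing import Iterable, Iterator


class SqliteSeen:
    """Set-like object backed by SQLite, exposing only __contains__ and add.

    Pass a file path to keep seen lines on disk instead of in RAM;
    the default ":memory:" uses an in-memory database.
    """

    def __init__(self, path: str = ":memory:") -> None:
        self.conn = sqlite3.connect(path)
        self.conn.execute("CREATE TABLE IF NOT EXISTS seen (line TEXT PRIMARY KEY)")

    def __contains__(self, line: str) -> bool:
        row = self.conn.execute("SELECT 1 FROM seen WHERE line = ?", (line,)).fetchone()
        return row is not None

    def add(self, line: str) -> None:
        # INSERT OR IGNORE makes adding an already-seen line a no-op, like set.add.
        self.conn.execute("INSERT OR IGNORE INTO seen (line) VALUES (?)", (line,))

    def close(self) -> None:
        self.conn.commit()  # persist rows when path is a real file
        self.conn.close()


def dedupe(lines: Iterable[str], seen) -> Iterator[str]:
    """Yield first occurrences; `seen` is anything with __contains__ and add."""
    for line in lines:
        if line not in seen:
            seen.add(line)
            yield line
```

Each lookup is now a database query, so this is slower than a set; it only pays off when the distinct lines no longer fit in memory.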


I don't get your point; it seems like you don't often use CLI text-processing tools.

Just as with Python, there are users who live on the CLI and are comfortable using grep/sed/awk/sort/etc.


The point is that the people who do use such tools tend to have a derisive attitude towards those who don't, and that the derisiveness is completely unwarranted.


It's very likely that those people know both Python and awk, and thus their attitude of superiority is not unwarranted.

It's much faster to type out the awk line than to write the same in Python.


> It's very likely that those people know both Python and awk, and thus their attitude of superiority is not unwarranted.

Ok....

> It's much faster to type out the awk line than to write the same in Python.

Is there some sort of speed-typing award that's being handed out that I'm missing? If there isn't, why would they feel superior?

We're all† smug pricks, but that's no cause for celebration. And 99% of the time we're not even justified in our smugness.

† All = a huge chunk of IT people, developers especially.


No, if you're a dick you're still a dick.

I know both Python and Awk. Do I go around telling people "stop using your preferred tool, even though it's efficient enough and works fine, use this other esoteric one instead"? Hell no.


And what if, instead of "stop using your preferred tool", the person says "there's this other tool I use and I find it makes these sorts of tasks easier; you might like it, too."


That's different, and obviously not a case of the "derisive attitude" I pointed out. Just because you don't do it, doesn't mean people don't do it.


To me, this sounds like coming up with an ad hominem argument to rationalize not learning something that one finds different and challenging. For the record, I do not know awk, but that's because I just haven't taken the time to learn it yet, not because I (ironically) believe it's only for people who think they're better than me.


Perhaps what I am trying to say is this:

One-off one-liners dashed off without syntax errors speak of long and deep use of a command-line tool. That's cool. But continuing to rely on those one-liners worries me, for reasons that have nothing to do with skill.

I would worry about how much of the work here is done manually rather than automated. I can think of many cases where a sed/awk solution will work really well, but they will almost always be part of a larger, developed and supported pipeline.

But if you are using the one-liner for anything nontrivial, you are still doing too much manual work.

Trying to be even shorter: if awk is your tool, great! But at some point (and that point is much closer today than it used to be), anything we do needs a suite of tools that we have hacked together, rewritten and passed around, from log file analysis to whatever.

And while awk can absolutely play a role in those tools, I doubt very much that anyone is good enough to write those one-liners on the fly.

A quick example might be "show me all the logs for the request sent by user X in the last five minutes off the front web servers, but ignore the heartbeat from that app marketing put out and ..."

I want that in my path, alongside everything else I and others working on the systems think useful.
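As a rough illustration of the kind of small, version-controlled tool described here, a sketch of the "user X, last five minutes, front web servers, no heartbeat" filter. Everything about the logs is an assumption, not something from the thread: JSON lines with ts (ISO 8601 with a UTC offset), host, user and path fields, front servers whose names start with web-, and heartbeat requests hitting /heartbeat. recent_user_requests, FRONT_HOST_PREFIX and HEARTBEAT_PATH are hypothetical names.

```python
import json
from datetime import datetime, timedelta, timezone
from typing import Iterable, Iterator, Optional

# Hypothetical log conventions; adjust these to the real logs.
FRONT_HOST_PREFIX = "web-"     # assumed naming scheme for front web servers
HEARTBEAT_PATH = "/heartbeat"  # assumed path the marketing app polls


def recent_user_requests(
    lines: Iterable[str],
    user: str,
    window: timedelta = timedelta(minutes=5),
    now: Optional[datetime] = None,
) -> Iterator[dict]:
    """Yield parsed records for `user` from front web servers within `window`."""
    now = now or datetime.now(timezone.utc)
    cutoff = now - window
    for line in lines:
        try:
            rec = json.loads(line)
        except json.JSONDecodeError:
            continue  # skip lines that are not JSON
        if rec.get("user") != user:
            continue
        if not rec.get("host", "").startswith(FRONT_HOST_PREFIX):
            continue
        if rec.get("path") == HEARTBEAT_PATH:
            continue  # ignore the heartbeat noise
        ts = rec.get("ts")
        if ts is not None and datetime.fromisoformat(ts) >= cutoff:
            yield rec
```

Wrapping this in a small script that reads sys.stdin and takes the user as an argument would put it on the path alongside the team's other tools, which is the commenter's point: write it once, review it, and reuse it rather than retyping it.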

Yes, hack together your tools in any language you like. Put them in a separate repo with all the linting turned off.

But don't try to one-liner them from scratch.


I agree. My perception is that tools like awk are best used for one-off tasks, whereas anything part of a greater pipeline should be written in a more readable/maintainable language.

I was just pushing back against the sentiment that awk is undesirable because of attitudes that its users may have, which I don't think you were expressing :)


No? As I said elsewhere, I know and use Awk. I just don't have an attitude about it.


>The point is that the people who do use such tools tend to have a derisive attitude towards those who don't,

This has not been my experience.


I don’t think you need to learn every cli tool in depth. I don’t know awk, but I recognize its power. However, I’ve got a set of tools I find easier to use, a few that I’ve written, and Ruby or Python at the command line. I’m sure awk could replace many cli tools, but at the cost of learning a new language (and cognitive load with each use) it hasn’t been worth it.


Thank you for putting my multiple paragraphs into three sentences :-)



