
Awk is wonderful. It's an odd way to write programs, but for quick one-off processing tasks it almost can't be beaten.

Somewhat related blog post which I like to refer people to: "Command-line Tools can be 235x Faster than your Hadoop Cluster" https://adamdrake.com/command-line-tools-can-be-235x-faster-...




Why do you find it odd? I find it to be the very best introduction to C-style control structures.

Chapter 2 of The AWK Programming Language has incredible benefits for a novice.

https://archive.org/download/pdfy-MgN0H1joIoDVoIC7/The_AWK_P...


They probably refer to how it has the top-level as

  <line-condition> { <code> }
That's unusual enough among programming languages to call it odd. Being able to do stuff like

  if (/some-pattern/) { ...
and have the regex be evaluated like a condition where it matches with the current line implicitly is also pretty unique.
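To make that concrete, here is a minimal sketch of both forms, assuming a POSIX awk. The function names `top_level` and `inside_action` and the sample input are made up for illustration:

```shell
# Top-level form: the pattern guards the action, and the loop over
# input lines is implicit.
top_level() {
  awk '/error/ { print NR ": " $0 }'
}

# The same test written as a condition inside an action; the bare regex
# is matched against the current line ($0) without naming it.
inside_action() {
  awk '{ if (/error/) print NR ": " $0 }'
}

printf 'ok\nerror: disk full\nok\n' | top_level
printf 'ok\nerror: disk full\nok\n' | inside_action
```

Both calls print the matching line prefixed with its line number.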


  if (/some-pattern/) { ...
This isn't really unique when you consider perl.

  while (<>) {
    if (/pattern/) {
This does the same. Awk simply has the implicit loop.


Yes, I think Perl based that on Awk, but those are the only two languages I know of that support something like that, so it's still pretty unique. As for the implicit loop, Awk, Perl and Ruby are the only three languages I know of that support something like that. That's also pretty unique, and Awk is the only one where the implicit loop is a requirement.


The implicit loop is uncommon among general-purpose languages, but very common among filtering-centric languages like awk, grep and sed. For a more recent example, see jq. Perl 5 may be close to the only mainstream general-purpose language to have embraced that paradigm, though.


Articles like OP's about removing duplicate lines make me want to learn awk.

But it feels like it would be a net loss based on how seldom I currently need to write one-off scripts.

Based on experience, I'd probably have a perfect use-case for it every 2-3 years.
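For anyone curious what that looks like, here is a sketch of the well-known awk duplicate-removal idiom, assuming that is the one the article covers. The `dedupe` wrapper name is hypothetical:

```shell
# Print each distinct line once, in first-seen order, without sorting.
# seen[$0]++ evaluates to 0 (false) the first time a line appears, so
# !seen[$0]++ is true only then; a true pattern with no action prints
# the line.
dedupe() {
  awk '!seen[$0]++' "$@"
}

printf 'b\na\nb\nc\na\n' | dedupe   # prints b, a, c on separate lines
```

The array holds every distinct line, so memory grows with the number of unique lines in the input.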


Or, once you know you can handle some tasks with one of these tools, you might suddenly realize that more tasks than you imagined can be solved with them.


Do you consider learning a tool you won't use a waste of time?

IMHO knowing what kinds of tools exist and how they are used for different tasks is enormously useful. Most software projects require me to create a set of tools to solve problems in a certain space efficiently. In any long-living non-trivial project there will be feature requests you couldn't have anticipated in the beginning. They tend to be painful if your program is just a bunch of features hacked together. But if you take a tools-first approach, the unexpected features can often be solved with what you have.

Of course, time is limited and you can't learn everything. But learning one of every different kind of tool is a very good use of time.

EDIT: Note that I'm not claiming you'll build a web app with awk. I'm saying you might write code that can be used similarly to awk in some abstract sense, and that might be a core part of a web app.


Even if you do, it's a better investment to learn a general high-level language with strong scripting capabilities that is also good at many other things.

Sure, on the days you need awk, you'll take 15 minutes instead of 2 to write your script. So what?

But the rest of the year, you'll have a more versatile toolbox at your disposal for automating things, testing, prototyping network processes, making quick web sites or APIs, and exploring data sets.

That being said, I can see the point of learning awk because, well, it's fun.


> Even if you do, it's a better investment to learn a general high-level language with strong scripting capabilities that is also good at many other things.

And we already have that: it's called perl. :)


I tried very hard not to name a specific language so that the point is not cancelled by some lang war.


Are there any pipeline tools for command line stream processing? Because when you have several terabytes of data you can't exactly afford to restart due to a stray comma in your CSV file.


If you have a stray comma in your multi-TB CSV file, you probably don't _want_ it to keep going. You risk misinterpreting the mistake and producing grossly malformed output... There's no way to reliably and elegantly recover from something like that. Validation should preferably happen before processing.
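A rough sketch of such a validation pass, assuming a header row and no quoted fields containing commas (a real CSV parser is needed for those). The `check_csv` name is hypothetical:

```shell
# Naive pre-flight check: report every line whose comma-separated field
# count differs from the header's, and exit non-zero if any were found.
# Assumption: fields never contain quoted commas.
check_csv() {
  awk -F, '
    NR == 1    { want = NF; next }
    NF != want { printf "line %d: expected %d fields, got %d\n", NR, want, NF; bad = 1 }
    END        { exit (bad ? 1 : 0) }
  ' "$@"
}

printf 'id,name\n1,alice\n2,bob,extra\n' | check_csv || echo "validation failed"
```

Running this as a separate pass means the expensive job only starts once the whole file has the expected shape.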



