These blog posts and discussion usually pit one language against others and ofte...

benhoyt · on Sept 11, 2023

I don't think Python is very well suited to one-liners, but it's not due to interpreter startup time (20ms on my machine). Rather, it's due to all the scaffolding needed, which AWK provides implicitly: AWK automatically reads input lines and splits them into fields, automatically initializes variables to the type's default value, and has terser syntax for things like regex matching.

Consider the following AWK one-liner which, for every input line that starts with a letter, prints the line number and the line's second field:

  awk '/^[A-Za-z]/ { print NR, $2 }'

The equivalent Python program has a ton more boilerplate: import statements, explicit input reading and field splitting, and more verbose regex matching:

  import re
  import fileinput

  inp = fileinput.input(encoding='utf-8')
  for line in inp:
      if re.match(r'[A-Za-z]', line):
          fields = line.split()
          print(inp.lineno(), fields[1])

vidarh · on Sept 11, 2023

Ruby and Perl have the -n switch to provide that boilerplate. E.g., Ruby:

    ruby -nae 'print $.," ",$F[1],"\n" if $_ =~ /^[A-Za-z]/'

"-n" wraps an implicit "while gets; ... ;end" around the code; "-a" adds an implicit "$F = $_.split" at the start of the loop; "-e" takes an expression from the command line; $_ contains the result of the `gets`; $. contains the line number of the last line read.

Alternatively:

    ruby -ne '$_.match(/^[A-Za-z]+(.*)/) { puts "#{$.}#{$1}" }'

`match` sets $1, $2 etc to the corresponding capture group, and calls the block if successful.

The scaffolding would be easy to provide w/Python too, but the extra Awk/Perl-isms to make it convenient are another matter (and while I use them occasionally for one-liners, I will get shouty if I find $1 etc. in production code...).

Even the Ruby differences are sufficient extra noise that I still reach for awk for simple stuff like that.
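
For what it's worth, here is a minimal sketch of that scaffolding in Python. It assumes a hypothetical helper called `each_line` (my name, not part of any library) that plays the role of `-n` and `-a` by handing back the line number, the line, and the split fields:

```python
import re
from typing import Iterable, Iterator, List, Tuple


def each_line(lines: Iterable[str]) -> Iterator[Tuple[int, str, List[str]]]:
    """Hypothetical helper: yield (line number, line, fields), roughly like ruby -na."""
    for lineno, line in enumerate(lines, start=1):
        line = line.rstrip("\n")
        yield lineno, line, line.split()


def second_fields(lines: Iterable[str]) -> List[str]:
    """Same task as the awk one-liner: NR and $2 for lines starting with a letter."""
    out = []
    for nr, line, f in each_line(lines):
        if re.match(r"[A-Za-z]", line):
            # awk prints an empty string for a missing field; mimic that here.
            second = f[1] if len(f) > 1 else ""
            out.append(f"{nr} {second}")
    return out


# Demo on in-memory lines; any iterable of lines (e.g. sys.stdin) would work too.
for row in second_fields(["foo bar\n", "1 skipped\n", "baz qux quux\n"]):
    print(row)
```

The loop and the split are the easy part. What this does not give you is the terseness of `$2` or `/regex/`, which is the point above.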

1vuio0pswjnm7 · on Sept 11, 2023

Everyone has their own personal preferences.

Here is how I would do that task, assuming (a) I had to do it more than once and (b) I could choose any software. On the computer I'm using, the statically-linked, stripped binary is 50k versus a dynamically-linked gawk which is 623k. This solution is faster than AWK, Python, Go, etc. and uses much less CPU and memory. This is quick and dirty, written in a few minutes. I am not a paid programmer. I'm the so-called average user. I'm not compensated for writing programs.

usage: a.out <-- minimal typing

NB. There is a two space indent added to each line. One must remove exactly two spaces from each line or there will be error messages and this will not compile.

  #!/bin/sh
  flex -8Crf <<eof
   int fileno (FILE*);
   int x,y,n=1;
  %option noyywrap noinput nounput 
  %%
  ^[A-Za-z][^\n]+ {
   printf("%d ",n);
   for(x=0;x<yyleng;x++){if(yytext[x]==32)y++;
   if(y==1)putc(yytext[x],yyout);
   }
   putchar(10);y=0;
   }
  \n n++;
  .
  %%
  int main(){ yylex();exit(0);}
  eof
  cc -O3 -std=c89 -W -Wall -pedantic -pipe lex.yy.c -static

Brian_K_White · on Sept 11, 2023

If you suggested this as a joke, it is hilarious. Well done.

make3 · on Sept 11, 2023

I always thought we should make a sort of Python for one-liners inspired by awk, where the loop over the lines would be implied.

the line, lineno and fields would be predefined, and I guess re, os, shutil, pathlib and sys would be pre-imported. maybe the whole stdlib acts as if it's pre-imported, while only being imported lazily

here it would be something like

```

if re.match(r'[A-Za-z]', line):
    fields = line.split()
    print(inp.lineno(), fields[1])

```

so

```

cat makefile | pyawk 'if re.match(r"[A-Za-z]", line): print(lineno, fields[1])'

```

I don't see a way out of multiple if statements requiring multiple lines though, otherwise you would have to introduce brackets to Python lol
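
A rough sketch of what that `pyawk` could look like, assuming the simplest possible design: compile the program text once and `exec` it per line with `line`, `lineno` and `fields` predefined. The function name and the variable names are just the hypothetical ones from above, and only `re` is pre-imported here (no lazy stdlib import):

```python
import io
import re
from contextlib import redirect_stdout


def pyawk(program: str, lines) -> str:
    """Hypothetical pyawk: run `program` once per input line, return what it printed."""
    code = compile(program, "<pyawk>", "exec")
    env = {"re": re}  # pre-imported module; lazy imports are left out of this sketch
    buf = io.StringIO()
    with redirect_stdout(buf):
        for lineno, line in enumerate(lines, start=1):
            line = line.rstrip("\n")
            # Predefine the per-line variables before each run of the program.
            env.update(line=line, lineno=lineno, fields=line.split())
            exec(code, env)
    return buf.getvalue()


makefile = ["all: build\n", "\tcc main.c\n", "clean: rm\n"]
print(pyawk('if re.match(r"[A-Za-z]", line): print(lineno, fields[1])', makefile), end="")
```

Since the program text goes through `compile` in "exec" mode, a multi-line program with several `if` statements would also work if the shell quoting lets you pass newlines.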

cb321 · on Sept 11, 2023

Often whole program generation in a prog.lang (& ecosystem!) that you already know can substitute for a new prog.lang. Python even has eval. You may be interested in: https://github.com/c-blake/bu/blob/main/doc/rp.md

You can actually get pretty far depending upon boundaries with the always implicit command-option language (when launched from the shell language, anyway). For example, Ben's example can be adapted to:

    rp -m^\[A-Za-z\] 'echo nr," ",s[1]'

which is only 5 more characters and only 3 more key downs (less SHIFT-ing) than the space-optimized version of his `awk`. { key downs are, of course, just a start to a deep rabbit hole on HCI ergonometrics ending in heatmaps, finger reach/strain/keyboard layouts, left-right hand switching dynamics, etc., but they seem the most portable idea. }

Nim is not Python - it is actually a bit more concise while also being statically typed and can be compiled to code which runs as fast as the best C/C++ (at more expense than one usually wants for 1-liner interactive iteration, though, unless you need to test on very large data). That said, I find it roughly "as easy" to enter `rp` commands as `awk`.

If doing this in Python tickles your fancy, Ben actually has an interesting write-up on these ideas that you might also like: https://benhoyt.com/writings/prig/

EDIT: and while I was typing, in a sibling comment @networked mentions a bunch more examples, but I think my comment here remains non-redundant. I'm not sure even one of those examples has some simple `-m` for auto-match mode (although many would say a grep pre-filter is enough for this).

networked · on Sept 11, 2023

Sorry, I have removed the list of awk replacements for other languages from that comment because I thought it wasn't the right place for it in the thread. I'll just post it here.

- Common Lisp: https://github.com/sharplispers/clawk

- Haskell: https://github.com/gelisam/hawk

- Racket: https://gitlab.com/xgqt/racket-rawk

- Tcl: https://wiki.tcl-lang.org/page/owh+%2D+a+fileless+tclsh (disclosure: the page links to my fork)

One use for an awk replacement is emitting more structured data. I have used my fork of owh a few times to emit JSON after awk-style parsing. I know GNU Awk can generate JSON with https://www.gnu.org/software/gawk/manual/html_node/gawkextli..., but I haven't tried it.
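
To illustrate the "structured data" use case, here is a small sketch in Python (not owh or gawk), assuming whitespace-separated input and a caller-supplied list of column names. `fields_to_json` is a made-up name for this example:

```python
import json


def fields_to_json(lines, names):
    """Split each line awk-style on whitespace and build one JSON object per line."""
    out = []
    for lineno, line in enumerate(lines, start=1):
        fields = line.split()
        # zip() stops at the shorter sequence, so missing fields are simply omitted.
        record = {"nr": lineno, **dict(zip(names, fields))}
        out.append(json.dumps(record))
    return out


for obj in fields_to_json(["alice 30\n", "bob 25\n"], ["name", "age"]):
    print(obj)
```

All values come out as strings here; converting numeric fields would be an extra step.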

cb321 · on Sept 11, 2023

No problem. It might also bear mentioning that if one is willing to learn more specialized tools, even less key-downing is possible, such as (using https://github.com/c-blake/bu/blob/main/doc/cols.md):

    grep ^[A-Za-z]|cols 2

You just lose that row number in the original input coordinates feature of Ben's example which could probably be recovered with `grep -n` & `cols -d' :'`, etc., etc. In exchange, you can say `cols 2:5` to get a block of columns trivially. And then, of course, once you have any oft-repeated atom you can save it in a tiny script/etc.
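
For readers who don't have `cols`, the column-block idea can be sketched in a few lines of Python. This is only an illustration: `select_cols` is a made-up name, and I'm assuming 1-based, inclusive ranges and whitespace splitting here, which is not a statement of how `cols` parses its arguments:

```python
def select_cols(line: str, spec: str) -> str:
    """Hypothetical column picker: spec is "N" or "LO:HI", 1-based and inclusive."""
    fields = line.split()
    lo, _, hi = spec.partition(":")
    start = int(lo)
    end = int(hi) if hi else start
    # Slicing quietly returns fewer fields if the line is short.
    return " ".join(fields[start - 1:end])


print(select_cols("a b c d e f", "2:5"))
print(select_cols("a b c", "2"))
```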

A lot of these choices come down to atom discovery & how willing/facile someone is juggling/remembering syntax/sub-languages. In my experience, willingness tracks facility and both are highly variable distributions over the human population.

benhoyt · on Sept 11, 2023

Alec Thomas wrote a script like this called pawk.py (https://github.com/alecthomas/pawk). It reads input automatically, and for each line, defines "n" and "f" to the line number and fields list (among other things). It even supports /regex/ patterns. Even the print is implicit. So the example above would be:

  pawk '/^[A-Za-z]/ (n, f[1])'

By the way, triple backticks don't work on HN. You have to indent by 2 spaces to get a code block.

make3 · on Sept 11, 2023

thanks a lot for mentioning pawk, it really looks like what I had in mind

networked · on Sept 11, 2023

A sibling comment already mentions PAWK. You can do

  cat makefile | pyawk 'if re.match(r"[A-Za-z]", line): print(lineno, fields[1])'

in a Python one-liner without PAWK by abusing list comprehensions:

  python -c 'import fileinput, re; [print(fileinput.lineno(), re.split(r"\s+", line)[1]) for line in fileinput.input() if re.match(r"[A-Za-z]", line)]' makefile

Edit: Removed a list of other awk replacements to post in a separate comment (https://news.ycombinator.com/item?id=37465164).

fuzztester · on Sept 11, 2023

>The equivalent Python program has a ton more boilerplate: import statements, explicit input reading and field splitting, and more verbose regex matching:

"awks and pythons"

fuzztester · on Sept 12, 2023

Like apples and oranges.