
> Programs that just divide up work across processes are much easier to write without introducing obscure bugs due to the lack of atomicity.

You often don't even need to do this yourself. GNU parallel is the way to go for dividing work up amongst CPU cores. Why reinvent the wheel?

I agree with you that threads are talked about way more than they should be. It's like all programmers learn this one simple rule: to be fast you have to be multi-threaded. It's really not the case. There is also massive confusion amongst programmers on the difference between concurrency and parallelism. I sometimes ask applicants to describe the difference and few can. Python is fine at concurrency if that's all you want to do.
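To illustrate the concurrency-vs-parallelism point: a minimal sketch showing that Python threads are fine for concurrency, i.e. overlapping I/O-style waits, even though the GIL prevents CPU parallelism. The `wait_io` helper and the 0.2 s delay are made up for the illustration.

```python
import time
from concurrent.futures import ThreadPoolExecutor

def wait_io(_):
    time.sleep(0.2)  # stands in for an I/O wait, which releases the GIL
    return 1

start = time.perf_counter()
with ThreadPoolExecutor(max_workers=4) as pool:
    results = list(pool.map(wait_io, range(4)))
elapsed = time.perf_counter() - start

# The four 0.2 s waits overlap, so the total is roughly 0.2 s, not 0.8 s:
# that's concurrency. A CPU-bound loop in wait_io would see no speedup.
```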




> GNU parallel is the way to go for dividing work up amongst CPU cores. Why reinvent the wheel?

Because most problems are not the embarrassingly parallel kind suitable for use with GNU parallel. For example, any problems that require some communication between the individual tasks.
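For example, a task shape GNU parallel can't express: workers that exchange intermediate results through shared queues. A rough sketch with the stdlib (the `worker` function and the squaring workload are invented for illustration; the "fork" start method is assumed, as on Linux — "spawn" platforms would need a `__main__` guard):

```python
import multiprocessing as mp

def worker(in_q, out_q):
    # Pull items until a sentinel arrives, pushing results back.
    while True:
        item = in_q.get()
        if item is None:
            break
        out_q.put(item * item)

ctx = mp.get_context("fork")
in_q, out_q = ctx.Queue(), ctx.Queue()
procs = [ctx.Process(target=worker, args=(in_q, out_q)) for _ in range(2)]
for p in procs:
    p.start()
for n in range(5):
    in_q.put(n)
for _ in procs:
    in_q.put(None)  # one sentinel per worker
results = sorted(out_q.get() for _ in range(5))
for p in procs:
    p.join()
print(results)  # [0, 1, 4, 9, 16]
```

The queues are the part GNU parallel has no equivalent for: it can fan work out to independent commands, but it gives the tasks no channel to talk to each other.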


I don't think the parent was proposing reinventing the wheel, Python has straightforward process parallelism support in the 'multiprocessing' library and for Python that's generally a better idea than GNU Parallel, IMO.
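Something like this is all it takes — a hedged sketch of stdlib process parallelism with `multiprocessing.Pool` (the `square` workload is invented; the "fork" start method is assumed, as on Linux, which is why no `__main__` guard is shown):

```python
from multiprocessing import get_context

def square(n):
    return n * n

ctx = get_context("fork")
with ctx.Pool(processes=4) as pool:
    # Work is divided across worker processes, no external tool needed.
    results = pool.map(square, range(10))
print(results)  # [0, 1, 4, 9, 16, 25, 36, 49, 64, 81]
```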


The advantage of GNU parallel is it's a standard tool that works for any non-parallel process. This has all the usual advantages of following the Unix principle.


> that works for any non-parallel process

No, it doesn't. Only for processes where you can trivially split the input and concatenate the outputs. Try using GNU parallel to sort a list of numbers, or to compute their prefix sum – it's not possible, and those are even simpler use cases than most of what you'll encounter in practice.


Oh come on. It should be obvious that I'm talking about the processes that can be split up in that way. Those problems are so common that someone literally wrote GNU parallel to solve them.


> I'm talking about the processes that can be split up in that way

No, you weren't. You said: "[...] GNU parallel [...] works for any non-parallel process" (emphasis mine)

> Those problems are so common that someone literally wrote GNU parallel to solve them.

As part of my job I write multi-threaded, parallel programs all the time, and in all those years only a single problem would have been feasible to parallelize with GNU parallel; but since I was using Rust, it was trivial to do the parallelization right there in my code without having to resort to an outer script/binary that calls GNU parallel on my program.


> Try using GNU parallel to sort a list of numbers,

`parsort` is part of GNU Parallel.


... and it uses a manually implemented post-processing step. You can't just run the sort program with GNU parallel and expect to get a fully sorted list.
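The shape of that post-processing step, sketched in Python: sorting the chunks in parallel is the easy part, but a k-way merge is still needed afterwards to get one fully sorted output. (The `sort_chunk` helper, the toy data, and the even/odd split are invented for illustration; the "fork" start method is assumed.)

```python
import heapq
from multiprocessing import get_context

def sort_chunk(chunk):
    return sorted(chunk)

data = [5, 3, 8, 1, 9, 2, 7, 4, 6, 0]
chunks = [data[0::2], data[1::2]]  # naive split across two workers

ctx = get_context("fork")
with ctx.Pool(2) as pool:
    sorted_chunks = pool.map(sort_chunk, chunks)

# Concatenating the chunks would NOT give a sorted list;
# the merge is the step GNU parallel alone cannot do for you.
merged = list(heapq.merge(*sorted_chunks))
print(merged)  # [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
```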


> Try using GNU parallel to sort a list of numbers, [...] – it's not possible,

Yet it clearly is possible, so your blanket statement is clearly wrong.

`parsort` is a simple wrapper, and this really goes for many uses of GNU Parallel: you need to prepare your data for the parallel step and post-process the output.

Maybe you originally meant to say: "Only for processes where you can preprocess the input and post-process the outputs."


Why would you use GNU parallel if you have to implement your own non-trivial pre- or post-processing logic anyway? Just spawn the worker processes yourself.

GNU parallel is great if you have, e.g., a bunch of files, each of which needs to be processed individually, like running awk or sed over it. Then you can just plop parallel in front and get a speedup for free. That's not what parsort does.


> GNU parallel is the way to go for dividing work up amongst CPU cores. Why reinvent the wheel?

We’re not talking about writing scripts to run on your laptop. We’re talking about code written for production applications. Deploying GNU parallel to production nodes / containers would be a major change to production systems that may not be feasible, and even if it is, it would come with a high cost in terms of added complexity, maintenance, and production troubleshooting.


I used to use GNU parallel to run big data tasks on supercomputers. There's nothing special about "production". It's all just computers.


That's actually what I'm doing a lot of the time. Or even just bash: for i in $(seq "$threadcount"); do pypy my.py "$i/$threadcount" & done; wait


We’re not talking about how to write scripts to run on your laptop, we’re talking about production systems.




